Boost mailing page: Re: [text] SIMD UTF-8 decoding

From: Alexander Grund (alexander.grund_at_[hidden])
Date: 2020-06-18 06:45:52

Next message: pbristow_at_[hidden]: "Re: Boost.Random"
Previous message: degski: "Re: [container] Proposal for ring queues"
In reply to: Phil Endecott: "Re: [text] SIMD UTF-8 decoding"
Next in thread: Phil Endecott: "Re: [text] SIMD UTF-8 decoding"
Reply: Phil Endecott: "Re: [text] SIMD UTF-8 decoding"

> I think it has most of what's needed, though it seems that the
> type conversion __builtin_convertvector, which is needed to
> expand e.g. a UTF-8 byte to UTF-32 with zero bytes, is only present
> in newer versions of g++ than I have.
Then it's likely not very useful for now. Maybe later, once that compiler
version is more widespread.
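For illustration, here is a minimal sketch of the widening step the quoted text refers to. It assumes GCC 9 or newer (or Clang) with the vector extensions. The names u8x16, u32x16 and widen16 are hypothetical. Note that it only zero-extends bytes, so it is a correct UTF-8 to UTF-32 conversion only when all 16 input bytes are ASCII.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension types: 16 lanes of 8-bit and 16 lanes of
// 32-bit unsigned integers. __builtin_convertvector needs equal lane counts.
typedef std::uint8_t  u8x16  __attribute__((vector_size(16)));
typedef std::uint32_t u32x16 __attribute__((vector_size(64)));

// Zero-extends 16 input bytes into 16 32-bit code units. The conversion is
// element-wise, so each byte becomes a lane whose upper 3 bytes are zero.
// Only meaningful as UTF-8 -> UTF-32 if every input byte is < 0x80.
inline void widen16(const unsigned char* in, char32_t* out) {
    u8x16 bytes;
    std::memcpy(&bytes, in, sizeof bytes);
    u32x16 wide = __builtin_convertvector(bytes, u32x16);
    std::memcpy(out, &wide, sizeof wide);
}
```

On compilers without the builtin, a plain loop does the same job and is often auto-vectorized anyway.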
> // Attempt to decode the subset of UTF-8 with code points < 256.
> // Format is either 0xxxxxxx -> 0xxxxxxx
> // or 110---xx 10yyyyyy -> xxyyyyyy
> // The input mustn't start or finish in the middle of a multi-byte
> // character.
> // Other inputs produce undefined outputs.
Good code for that special case. But I think "undefined outputs" is not
acceptable. I've seen other SIMD UTF-8 conversions around, and they
basically all focus on converting as much ASCII as possible and fall back
to one-by-one decoding once a non-ASCII byte is found.
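A rough sketch of that pattern applied to the "code points < 256" subset from the quoted code, assuming SWAR on a 64-bit word as a portable stand-in for a real SIMD register. DecodeResult and decode_latin1_subset are hypothetical names. Instead of undefined output, anything outside the subset (or a truncated sequence) stops decoding and is reported.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// How many UTF-32 units were written, how many input bytes were consumed,
// and whether the whole input was decoded (false = stopped at a bad byte).
struct DecodeResult {
    std::size_t written;
    std::size_t consumed;
    bool ok;
};

// Decodes UTF-8 restricted to code points < 256 into UTF-32.
inline DecodeResult decode_latin1_subset(const unsigned char* in, std::size_t n,
                                         char32_t* out) {
    std::size_t i = 0, o = 0;
    while (i < n) {
        // Fast path: 8 bytes at a time while no byte has its high bit set.
        while (n - i >= 8) {
            std::uint64_t word;
            std::memcpy(&word, in + i, 8);
            if (word & 0x8080808080808080ULL) break;  // a non-ASCII byte
            for (int k = 0; k < 8; ++k) out[o++] = in[i + k];
            i += 8;
        }
        if (i >= n) break;
        // Slow path: one character at a time, always checking end of input.
        const unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;
            i += 1;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < n &&
                   (in[i + 1] & 0xC0) == 0x80) {
            // 1100001x 10yyyyyy -> xyyyyyy with high bit set (0x80..0xFF);
            // 0xC0/0xC1 leads would be overlong and are rejected.
            out[o++] = static_cast<char32_t>(((b & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return {o, i, false};  // outside the subset, or truncated
        }
    }
    return {o, i, true};
}
```

A full decoder would replace the slow path with general UTF-8 decoding rather than giving up at the first code point >= 256.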
> That will be quick, but it does lack a few things; it doesn't check if
> it has reached the end of the input and it doesn't do any error checking.

So not really usable either. BUT: compare Boost.Locale, which has
`decode` and `decode_valid` functions, where the latter assumes valid
UTF-8. However, checking for end of input is obviously a must.
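To make the distinction concrete, here is a sketch of the checked/trusting split, loosely modeled on that idea. It is not Boost.Locale's code. decode_checked, decode_assume_valid and the kIllegal/kIncomplete sentinels are hypothetical. Both variants take an end pointer, so even the trusting one never reads past the input.

```cpp
// Hypothetical error sentinels (values outside the Unicode range).
constexpr char32_t kIllegal    = 0xFFFFFFFFu;
constexpr char32_t kIncomplete = 0xFFFFFFFEu;

// Payload bits of the lead byte, indexed by sequence length (1..4).
constexpr unsigned char kLeadMask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
// Smallest code point each length may encode; anything lower is overlong.
constexpr char32_t kMinCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};

// Checked: validates every byte and never reads past `end`.
// On error, `p` is left unchanged.
inline char32_t decode_checked(const unsigned char*& p, const unsigned char* end) {
    if (p == end) return kIncomplete;
    const unsigned char b = *p;
    const int len = b < 0x80 ? 1
                  : (b >= 0xC2 && b <= 0xDF) ? 2
                  : (b >= 0xE0 && b <= 0xEF) ? 3
                  : (b >= 0xF0 && b <= 0xF4) ? 4 : 0;
    if (len == 0) return kIllegal;          // stray continuation, C0/C1, F5..FF
    if (end - p < len) return kIncomplete;  // truncated sequence
    char32_t cp = b & kLeadMask[len];
    for (int k = 1; k < len; ++k) {
        if ((p[k] & 0xC0) != 0x80) return kIllegal;  // not a continuation byte
        cp = (cp << 6) | (p[k] & 0x3F);
    }
    if (cp < kMinCodePoint[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return kIllegal;                    // overlong, out of range, surrogate
    p += len;
    return cp;
}

// Trusting: assumes well-formed UTF-8 and skips validation, but still
// checks that the whole sequence lies before `end`.
inline char32_t decode_assume_valid(const unsigned char*& p, const unsigned char* end) {
    if (p == end) return kIncomplete;
    const unsigned char b = *p;
    const int len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
    if (end - p < len) return kIncomplete;
    char32_t cp = b & kLeadMask[len];
    for (int k = 1; k < len; ++k) cp = (cp << 6) | (p[k] & 0x3F);
    p += len;
    return cp;
}
```

The trusting variant saves the per-byte checks, but the single bounds check per character is cheap and prevents reading past the buffer.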

BTW: Does Boost.Text have functions or overloads where you can specify
that text is already in a specific encoding/normalization form?
If not, I think this should be added. Sometimes you get text from an
internal function and already know these properties, so you could skip
verification and conversion.
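One possible shape for such overloads is a tag type, sketched below under the assumption of C++17. This is not Boost.Text's API. assume_valid_utf8_t, assume_valid_utf8 and count_code_points are hypothetical. The validating overload is deliberately simplified: it checks lead/continuation structure and truncation but does not reject overlong 3/4-byte forms or surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical tag: the caller asserts the text is already valid UTF-8.
struct assume_valid_utf8_t { explicit assume_valid_utf8_t() = default; };
inline constexpr assume_valid_utf8_t assume_valid_utf8{};

// Default overload: verifies the (simplified) UTF-8 structure while counting.
inline std::size_t count_code_points(std::string_view s) {
    std::size_t count = 0, i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        const std::size_t len = b < 0x80 ? 1
                              : (b >= 0xC2 && b <= 0xDF) ? 2
                              : (b >= 0xE0 && b <= 0xEF) ? 3
                              : (b >= 0xF0 && b <= 0xF4) ? 4 : 0;
        if (len == 0 || s.size() - i < len)
            throw std::invalid_argument("invalid or truncated UTF-8");
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
        i += len;
        ++count;
    }
    return count;
}

// Tagged overload: skips verification and just counts non-continuation bytes.
inline std::size_t count_code_points(std::string_view s, assume_valid_utf8_t) {
    std::size_t count = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++count;
    return count;
}
```

The explicit default constructor (the same trick std::in_place_t uses) keeps a bare `{}` from silently selecting the trusting overload; callers have to write `assume_valid_utf8` on purpose.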
