Pure-Haskell proper unicode string handling (http://lelf.lu/prose)
root
Pure-Haskell proper unicode strings
λ> graphemes "བོད་ཀྱི་སྐད་ཡིག།"
["བོ","ད","་","ཀྱི","་","སྐ","ད","་","ཡི","ག","།"]
segmentation:
✓⃞ grapheme
✓⃞ words ⃞ tailored
⃞ sentences ⃞ tailored
⃞ line-breaking ⃞ tailored
normalization:
✓⃞ NFD ✓⃞ NFKD ✓⃞ NFC ⃞ NFKC
collating ⃞ …
transformation ⃞ …
character properties ⃞ …
other cldr ⃞ …
optimizations: none
| Pros | Prose/𝘚 | ICU |
---|---|---|
segmentation/graphemes one-lang text | 1.60ms | 0.47ms |
segmentation/graphemes chars sample | 15.84ms | 16.30ms |