Pure-Haskell proper Unicode string handling (http://lelf.lu/prose)

Pure-Haskell proper Unicode strings

λ> graphemes "བོད་ཀྱི་སྐད་ཡིག།"
["བོ","ད","་","ཀྱི","་","སྐ","ད","་","ཡི","ག","།"]
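
For intuition about why the clusters above differ from a plain list of code points, here is a minimal sketch. It is not this library's implementation: it only attaches combining marks to the preceding character, using isMark from base's Data.Char, and ignores the rest of the Unicode segmentation rules (Hangul syllables, emoji sequences, CR LF, and so on). The name naiveGraphemes is hypothetical and is used purely for illustration.

```haskell
import Data.Char (isMark)
import Data.List (groupBy)

-- | Simplified clustering (an assumption for illustration, not the
-- library's algorithm): every combining mark (general category Mn, Mc
-- or Me) is glued to the character that starts the current group.
naiveGraphemes :: String -> [String]
naiveGraphemes = groupBy (\_ c -> isMark c)

main :: IO ()
main =
  -- 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'o':
  -- three code points, two clusters.
  print (map length (naiveGraphemes "e\x0301o"))  -- prints [2,1]
```

On the Tibetan sample this rule should be enough to keep vowel signs and subjoined letters with their base letter, but full grapheme segmentation needs the complete rule set and property tables.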

segmentation:
  ✓⃞ grapheme
  ✓⃞ words  ⃞ tailored
  ⃞ sentences  ⃞ tailored
  ⃞ line-breaking  ⃞ tailored

normalization:
  ✓⃞ NFD ✓⃞ NFKD ✓⃞ NFC  ⃞ NFKC

collating  ⃞ …
transformation  ⃞ …
character properties  ⃞ …
other cldr  ⃞ …


optimizations: none

|                                      | Prose/𝘚  | ICU      |
| segmentation/graphemes one-lang text | 1.60 ms  | 0.47 ms  |
| segmentation/graphemes chars sample  | 15.84 ms | 16.30 ms |