Data.CodePoint.Unicode
- Package: unicode
- Repository: id3as/purescript-unicode
#isAsciiLower Source
isAsciiLower :: CodePoint -> Boolean
Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.
#isAsciiUpper Source
isAsciiUpper :: CodePoint -> Boolean
Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.
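As a small illustration of combining the two ASCII predicates, the sketch below checks whether a string contains at least one ASCII lower-case and one ASCII upper-case letter. The module name and the helper hasMixedAsciiCase are hypothetical and not part of the library.

```purescript
module Example.MixedCase where

import Prelude

import Data.CodePoint.Unicode (isAsciiLower, isAsciiUpper)
import Data.Foldable (any)
import Data.String.CodePoints (toCodePointArray)

-- | Hypothetical helper: true when the string holds at least one ASCII
-- | lower-case letter and at least one ASCII upper-case letter.
hasMixedAsciiCase :: String -> Boolean
hasMixedAsciiCase str = any isAsciiLower cps && any isAsciiUpper cps
  where
  -- Split the string into code points once and reuse the array.
  cps = toCodePointArray str
```

With this definition, hasMixedAsciiCase "aB" should be true, while "abc" and "ABC" should both be false.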
#isAlphaNum Source
isAlphaNum :: CodePoint -> Boolean
Selects alphabetic or numeric digit Unicode characters.
Note that numeric digits outside the ASCII range are selected by this function but not by isDecDigit. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
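One possible use of isAlphaNum is a simple identifier check. The sketch below assumes an identifier is a letter followed by any number of alphanumeric code points; that rule, the module name and the helper isSimpleIdentifier are illustrative assumptions, not something the library defines.

```purescript
module Example.Identifier where

import Prelude

import Data.Array (uncons)
import Data.CodePoint.Unicode (isAlphaNum, isLetter)
import Data.Foldable (all)
import Data.Maybe (Maybe(..))
import Data.String.CodePoints (toCodePointArray)

-- | Hypothetical helper: true when the string starts with a letter and
-- | every remaining code point is alphanumeric.
isSimpleIdentifier :: String -> Boolean
isSimpleIdentifier str =
  case uncons (toCodePointArray str) of
    -- The empty string is not an identifier.
    Nothing -> false
    Just { head, tail } -> isLetter head && all isAlphaNum tail
```

Under this rule "x1" is accepted, while "1x" and the empty string are rejected. Because isAlphaNum also selects non-ASCII digits, identifiers containing such digits pass the check too.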
#isLetter Source
isLetter :: CodePoint -> Boolean
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters).
This function returns true if its argument has one of the following GeneralCategory values, or false otherwise:
UppercaseLetter
LowercaseLetter
TitlecaseLetter
ModifierLetter
OtherLetter
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Letter".
Examples
Basic usage:
>>> isLetter (codePointFromChar 'a')
true
>>> isLetter (codePointFromChar 'A')
true
>>> isLetter (codePointFromChar '0')
false
>>> isLetter (codePointFromChar '%')
false
>>> isLetter (codePointFromChar '♥')
false
>>> isLetter (codePointFromChar '\x1F')
false
Ensure that 'isLetter' and 'isAlpha' are equivalent.
>>> chars = enumFromTo bottom top :: Array CodePoint
>>> letters = map isLetter chars
>>> alphas = map isAlpha chars
>>> letters == alphas
true
#isDecDigit Source
isDecDigit :: CodePoint -> Boolean
Selects ASCII decimal digits, i.e. 0..9.
#isOctDigit Source
isOctDigit :: CodePoint -> Boolean
Selects ASCII octal digits, i.e. 0..7.
#isHexDigit Source
isHexDigit :: CodePoint -> Boolean
Selects ASCII hexadecimal digits, i.e. 0..9, A..F, a..f.
#isSymbol Source
isSymbol :: CodePoint -> Boolean
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns true if its argument has one of the following GeneralCategory values, or false otherwise:
MathSymbol
CurrencySymbol
ModifierSymbol
OtherSymbol
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".
Examples
Basic usage:
>>> isSymbol (codePointFromChar 'a')
false
>>> isSymbol (codePointFromChar '6')
false
>>> isSymbol (codePointFromChar '=')
true
The definition of "math symbol" may be a little counter-intuitive depending on one's background:
>>> isSymbol (codePointFromChar '+')
true
>>> isSymbol (codePointFromChar '-')
false
#isSeparator Source
isSeparator :: CodePoint -> Boolean
Selects Unicode space and separator characters.
This function returns true if its argument has one of the following GeneralCategory values, or false otherwise:
Space
LineSeparator
ParagraphSeparator
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Separator".
Examples
Basic usage:
>>> isSeparator (codePointFromChar 'a')
false
>>> isSeparator (codePointFromChar '6')
false
>>> isSeparator (codePointFromChar ' ')
true
>>> isSeparator (codePointFromChar '-')
false
Warning: newlines and tab characters are not considered separators.
>>> isSeparator (codePointFromChar '\n')
false
>>> isSeparator (codePointFromChar '\t')
false
But some more exotic characters are (like HTML's &nbsp;, the no-break space):
>>> isSeparator (codePointFromChar '\xA0')
true
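Since isSeparator selects characters such as the no-break space but not tabs or newlines, it can be used to normalise unusual spacing. The following is a minimal sketch; the module name and the helper normalizeSeparators are hypothetical.

```purescript
module Example.Separators where

import Prelude

import Data.CodePoint.Unicode (isSeparator)
import Data.String.CodePoints (codePointFromChar, fromCodePointArray, toCodePointArray)

-- | Hypothetical helper: replace every Unicode separator (for example a
-- | no-break space) with a plain ASCII space. Tabs and newlines are left
-- | alone because isSeparator does not select them.
normalizeSeparators :: String -> String
normalizeSeparators =
  toCodePointArray
    >>> map (\cp -> if isSeparator cp then codePointFromChar ' ' else cp)
    >>> fromCodePointArray
```

A string that already uses only ASCII spaces, tabs and newlines should come back unchanged.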
#isPunctuation Source
isPunctuation :: CodePoint -> Boolean
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.
This function returns true if its argument has one of the following GeneralCategory values, or false otherwise:
ConnectorPunctuation
DashPunctuation
OpenPunctuation
ClosePunctuation
InitialQuote
FinalQuote
OtherPunctuation
These classes are defined in the Unicode Character Database (http://www.unicode.org/reports/tr44/tr44-14.html#GC_Values_Table), part of the Unicode standard. The same document defines what is and is not a "Punctuation".
Examples
Basic usage:
>>> isPunctuation (codePointFromChar 'a')
false
>>> isPunctuation (codePointFromChar '7')
false
>>> isPunctuation (codePointFromChar '♥')
false
>>> isPunctuation (codePointFromChar '"')
true
>>> isPunctuation (codePointFromChar '?')
true
>>> isPunctuation (codePointFromChar '—')
true
#isMark Source
isMark :: CodePoint -> Boolean
Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.
This function returns true if its argument has one of the following GeneralCategory values, or false otherwise:
NonSpacingMark
SpacingCombiningMark
EnclosingMark
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Mark".
Examples
Basic usage:
>>> isMark (codePointFromChar 'a')
false
>>> isMark (codePointFromChar '0')
false
Combining marks such as accent characters usually need to follow another character before they become printable:
>>> map isMark (toCodePointArray "ò")
[false,true]
Puns are not necessarily supported:
>>> isMark (codePointFromChar '✓')
false
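A common use of isMark is removing combining marks from text. The sketch below is an assumption-laden illustration: the module name and the helper stripMarks are hypothetical, and it only affects marks that appear as separate code points (decomposed text). A precomposed accented letter is a single letter code point and is left untouched.

```purescript
module Example.StripMarks where

import Prelude

import Data.Array (filter)
import Data.CodePoint.Unicode (isMark)
import Data.String.CodePoints (fromCodePointArray, toCodePointArray)

-- | Hypothetical helper: drop every code point that isMark selects,
-- | keeping the base characters.
stripMarks :: String -> String
stripMarks =
  toCodePointArray
    >>> filter (not <<< isMark)
    >>> fromCodePointArray
```

For the two-code-point "ò" shown above (an "o" followed by a combining grave accent), this should leave just "o".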
#isNumber Source
isNumber :: CodePoint -> Boolean
Selects Unicode numeric characters, including digits from various scripts, Roman numerals, et cetera.
This function returns true if its argument has one of the following GeneralCategory values, or false otherwise:
DecimalNumber
LetterNumber
OtherNumber
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Number".
Examples
Basic usage:
>>> isNumber (codePointFromChar 'a')
false
>>> isNumber (codePointFromChar '%')
false
>>> isNumber (codePointFromChar '3')
true
ASCII '0' through '9' are all numbers:
>>> and $ map (isNumber <<< codePointFromChar) (enumFromTo '0' '9' :: Array Char)
true
Unicode Roman numerals are "numbers" as well:
>>> isNumber (codePointFromChar 'Ⅸ')
true
#hexDigitToInt Source
hexDigitToInt :: CodePoint -> Maybe Int
Convert a single digit CodePoint to the corresponding Just Int if its argument satisfies isHexDigit (one of 0..9, A..F, a..f). Anything else converts to Nothing.
>>> import Data.Traversable
>>> traverse (hexDigitToInt <<< codePointFromChar) ['0','1','2','3','4','5','6','7','8','9']
(Just [0,1,2,3,4,5,6,7,8,9])
>>> traverse (hexDigitToInt <<< codePointFromChar) ['a','b','c','d','e','f']
(Just [10,11,12,13,14,15])
>>> traverse (hexDigitToInt <<< codePointFromChar) ['A','B','C','D','E','F']
(Just [10,11,12,13,14,15])
>>> hexDigitToInt (codePointFromChar 'G')
Nothing
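Building on the traverse examples above, a whole hexadecimal string can be turned into a number by folding over the digit values. This is a minimal sketch: the module name and the helper parseHex are hypothetical, and it deliberately ignores Int overflow and treats the empty string as zero.

```purescript
module Example.Hex where

import Prelude

import Data.CodePoint.Unicode (hexDigitToInt)
import Data.Foldable (foldl)
import Data.Maybe (Maybe)
import Data.String.CodePoints (toCodePointArray)
import Data.Traversable (traverse)

-- | Hypothetical helper: parse a string of hex digits into an Int.
-- | Returns Nothing if any code point is not a hex digit.
-- | Does not guard against empty input (yields Just 0) or Int overflow.
parseHex :: String -> Maybe Int
parseHex str =
  -- Accumulate most significant digit first: acc * 16 + digit.
  foldl (\acc digit -> acc * 16 + digit) 0
    <$> traverse hexDigitToInt (toCodePointArray str)
```

For example, parseHex "ff" and parseHex "FF" should both give Just 255, and parseHex "xyz" should give Nothing.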
#decDigitToInt Source
decDigitToInt :: CodePoint -> Maybe Int
Convert a single digit CodePoint to the corresponding Just Int if its argument satisfies isDecDigit (one of 0..9). Anything else converts to Nothing.
>>> import Data.Traversable
>>> traverse (decDigitToInt <<< codePointFromChar) ['0','1','2','3','4','5','6','7','8','9']
(Just [0,1,2,3,4,5,6,7,8,9])
>>> decDigitToInt (codePointFromChar 'a')
Nothing
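Because decDigitToInt returns a Maybe, it also works as a filter that keeps only the digits of a mixed string. The sketch below uses mapMaybe from Data.Array; the module name and the helper digitsIn are hypothetical.

```purescript
module Example.Digits where

import Prelude

import Data.Array (mapMaybe)
import Data.CodePoint.Unicode (decDigitToInt)
import Data.String.CodePoints (toCodePointArray)

-- | Hypothetical helper: collect the values of all ASCII decimal digits
-- | in a string, skipping every other code point.
digitsIn :: String -> Array Int
digitsIn = toCodePointArray >>> mapMaybe decDigitToInt
```

For example, digitsIn "a1b2c3" should give [1,2,3], and a string without ASCII digits should give an empty array.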
#octDigitToInt Source
octDigitToInt :: CodePoint -> Maybe Int
Convert a single digit CodePoint to the corresponding Just Int if its argument satisfies isOctDigit (one of 0..7). Anything else converts to Nothing.
>>> import Data.Traversable
>>> traverse (octDigitToInt <<< codePointFromChar) ['0','1','2','3','4','5','6','7']
(Just [0,1,2,3,4,5,6,7])
>>> octDigitToInt (codePointFromChar '8')
Nothing
#toLowerSimple Source
toLowerSimple :: CodePoint -> CodePoint
Convert a code point to the corresponding lower-case code point, if any. Any other character is returned unchanged.
#toUpperSimple Source
toUpperSimple :: CodePoint -> CodePoint
Convert a code point to the corresponding upper-case code point, if any. Any other character is returned unchanged.
#toTitleSimple Source
toTitleSimple :: CodePoint -> CodePoint
Convert a code point to the corresponding title-case or upper-case code point, if any. (Title case differs from upper case only for a small number of ligature characters.) Any other character is returned unchanged.
#caseFoldSimple Source
caseFoldSimple :: CodePoint -> CodePoint
Convert a code point to the corresponding case-folded code point. Any other character is returned unchanged.
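The simple case functions operate on one code point at a time, so applying them to a string means mapping over its code points. The sketch below shows two illustrative uses; the module name and the helpers upperSimple and equalsIgnoringCase are hypothetical. Because each mapping is one-to-one, it cannot expand a single code point into several, so results may differ from full Unicode case conversion.

```purescript
module Example.Case where

import Prelude

import Data.CodePoint.Unicode (caseFoldSimple, toUpperSimple)
import Data.String.CodePoints (fromCodePointArray, toCodePointArray)

-- | Hypothetical helper: upper-case a string one code point at a time.
upperSimple :: String -> String
upperSimple = toCodePointArray >>> map toUpperSimple >>> fromCodePointArray

-- | Hypothetical helper: compare two strings after simple case folding
-- | of every code point.
equalsIgnoringCase :: String -> String -> Boolean
equalsIgnoringCase a b = folded a == folded b
  where
  folded = toCodePointArray >>> map caseFoldSimple
```

For plain ASCII input, upperSimple "abc" should give "ABC", and equalsIgnoringCase "Hello" "hELLO" should be true.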
#GeneralCategory Source
data GeneralCategory
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).
Examples
Basic usage:
>>> :t OtherLetter
OtherLetter :: GeneralCategory
Eq instance:
>>> UppercaseLetter == UppercaseLetter
true
>>> UppercaseLetter == LowercaseLetter
false
Ord instance:
>>> NonSpacingMark <= MathSymbol
true
Enum instance (TODO: this is not implemented yet):
>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]
Show instance:
>>> show EnclosingMark
"EnclosingMark"
Bounded instance:
>>> bottom :: GeneralCategory
UppercaseLetter
>>> top :: GeneralCategory
NotAssigned
Constructors
UppercaseLetter
LowercaseLetter
TitlecaseLetter
ModifierLetter
OtherLetter
NonSpacingMark
SpacingCombiningMark
EnclosingMark
DecimalNumber
LetterNumber
OtherNumber
ConnectorPunctuation
DashPunctuation
OpenPunctuation
ClosePunctuation
InitialQuote
FinalQuote
OtherPunctuation
MathSymbol
CurrencySymbol
ModifierSymbol
OtherSymbol
Space
LineSeparator
ParagraphSeparator
Control
Format
Surrogate
PrivateUse
NotAssigned
Instances
#generalCategory Source
generalCategory :: CodePoint -> Maybe GeneralCategory
The Unicode general category of the character.
Examples
Basic usage:
>>> generalCategory (codePointFromChar 'a')
Just LowercaseLetter
>>> generalCategory (codePointFromChar 'A')
Just UppercaseLetter
>>> generalCategory (codePointFromChar '0')
Just DecimalNumber
>>> generalCategory (codePointFromChar '%')
Just OtherPunctuation
>>> generalCategory (codePointFromChar '♥')
Just OtherSymbol
>>> generalCategory (codePointFromChar '\x1F')
Just Control
>>> generalCategory (codePointFromChar ' ')
Just Space
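The result of generalCategory can be collapsed into coarser groups. The sketch below relies on an assumption: that the Ord instance follows the constructor order listed above, which the Ord example in this section suggests. The module name, the helper describe and the group labels are hypothetical.

```purescript
module Example.Category where

import Prelude

import Data.CodePoint.Unicode (GeneralCategory(..), generalCategory)
import Data.Maybe (Maybe(..))
import Data.String.CodePoints (CodePoint)

-- | Hypothetical helper: map a code point to a coarse group name.
-- | Assumes the Ord instance follows constructor order, so each guard
-- | catches everything up to the last constructor of its group.
describe :: CodePoint -> String
describe cp = case generalCategory cp of
  Nothing -> "unknown"
  Just cat
    | cat <= OtherLetter -> "letter"
    | cat <= EnclosingMark -> "mark"
    | cat <= OtherNumber -> "number"
    | cat <= OtherPunctuation -> "punctuation"
    | cat <= OtherSymbol -> "symbol"
    | cat <= ParagraphSeparator -> "separator"
    | otherwise -> "other"
```

Under that assumption, 'a' maps to "letter", '0' to "number", '%' to "punctuation" and ' ' to "separator", matching the categories shown in the examples above.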