flambe.tokenizer.word
Module Contents
class flambe.tokenizer.word.WordTokenizer
Bases: flambe.tokenizer.Tokenizer
Implement a word-level tokenizer.
class flambe.tokenizer.word.NGramsTokenizer(ngrams: Union[int, List[int]] = 1)
Bases: flambe.tokenizer.Tokenizer
Implement an n-gram tokenizer.
Examples
>>> NGramsTokenizer(ngrams=2).tokenize("hi how are you?")
['hi how', 'how are', 'are you?']
>>> NGramsTokenizer(ngrams=[1,2]).tokenize("hi how are you?")
['hi', 'how', 'are', 'you?', 'hi how', 'how are', 'are you?']
Parameters: ngrams (Union[int, List[int]]) – The n-gram size, or a list of sizes. If a list is given, the tokenizer returns the n-grams for every size in the list, grouped in the order the sizes appear.
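To make the meaning of the ngrams parameter concrete, here is a minimal standalone sketch of the windowing idea. It is not flambe's implementation: the helper name ngram_tokens is hypothetical, and it assumes words are separated by plain whitespace, which may differ from how the library splits text (for example around punctuation).

```python
from typing import List, Union


def ngram_tokens(text: str, ngrams: Union[int, List[int]] = 1) -> List[str]:
    """Return whitespace-delimited n-grams of `text` for each requested size.

    Hypothetical helper for illustration only; not part of flambe.
    """
    words = text.split()  # assumption: plain whitespace splitting
    # Accept a single size or a list of sizes, mirroring the `ngrams` parameter.
    sizes = [ngrams] if isinstance(ngrams, int) else list(ngrams)
    tokens: List[str] = []
    for n in sizes:
        # Slide a window of n words across the sentence.
        for start in range(len(words) - n + 1):
            tokens.append(" ".join(words[start:start + n]))
    return tokens


print(ngram_tokens("hi how are you?", ngrams=2))
# ['hi how', 'how are', 'are you?']
print(ngram_tokens("hi how are you?", ngrams=[1, 2]))
# ['hi', 'how', 'are', 'you?', 'hi how', 'how are', 'are you?']
```

With a list of sizes, the results for each size are concatenated in list order, so all unigrams come before all bigrams in the second call.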