flambe.tokenizer.subword
Module Contents
class flambe.tokenizer.subword.BPETokenizer(codes_path: str)
Bases: flambe.tokenizer.Tokenizer
Implements a subword-level tokenizer using byte pair encoding (BPE). Tokenization is performed with fastBPE (https://github.com/glample/fastBPE) and requires a fastBPE codes file, whose path is passed as codes_path.
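
To illustrate what byte pair encoding does to a single word, here is a minimal pure-Python sketch. It is not fastBPE's implementation and does not use flambe; the function name apply_bpe and the hand-written merge list are hypothetical, standing in for the ordered merge rules that a codes file would supply. It also omits details such as end-of-word markers.

```python
from typing import Dict, List, Tuple


def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split one word into subwords by applying merge rules in priority order.

    `merges` is an ordered list of symbol pairs; earlier pairs have higher
    priority. This is a toy stand-in for the rules in a BPE codes file.
    """
    # Map each pair to its rank; a lower rank is merged first.
    ranks: Dict[Tuple[str, str], int] = {pair: i for i, pair in enumerate(merges)}

    # Start from individual characters.
    symbols = list(word)

    while len(symbols) > 1:
        # Collect adjacent pairs and pick the one with the best (lowest) rank.
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule remains

        # Merge every occurrence of the chosen pair, scanning left to right.
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged

    return symbols


# Hypothetical merge rules, highest priority first.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(apply_bpe("lower", merges))  # ['low', 'er']
```

With these three rules, "lower" is first merged to "lo w e r", then "low e r", then "low er". A word containing none of the listed pairs is returned as individual characters.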