flambe.nlp.language_modeling.datasets
Module Contents
class flambe.nlp.language_modeling.datasets.PTBDataset(split_by_sentence: bool = False, end_of_line_token: Optional[str] = '<eol>', cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)
Bases: flambe.dataset.TabularDataset
The official Penn Treebank (PTB) dataset.
class flambe.nlp.language_modeling.datasets.Wiki103(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', remove_headers: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)
Bases: flambe.dataset.TabularDataset
The official WikiText103 dataset.
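The page does not describe how split_by_line and end_of_line_token interact. The sketch below shows one plausible reading: each line is optionally terminated with the end-of-line token, and the text is then either kept as one example per line or joined into a single continuous stream. The helper prepare_text is hypothetical and not part of flambe; the behavior it shows is an assumption, not the library's confirmed implementation.

```python
from typing import List, Optional


def prepare_text(raw: str,
                 split_by_line: bool = False,
                 end_of_line_token: Optional[str] = "<eol>") -> List[str]:
    """Hypothetical helper (not part of flambe) illustrating the two flags."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]

    # Assumption: the token is appended to every line when one is given.
    if end_of_line_token is not None:
        lines = [f"{line} {end_of_line_token}" for line in lines]

    # One example per line, or a single continuous stream.
    if split_by_line:
        return lines
    return [" ".join(lines)]


sample = "the cat sat\non the mat\n"
print(prepare_text(sample))
print(prepare_text(sample, split_by_line=True, end_of_line_token=None))
```

Under this reading, the defaults produce a single stream in which line boundaries survive only as '<eol>' markers, while split_by_line=True with end_of_line_token=None yields one untagged example per line.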
class flambe.nlp.language_modeling.datasets.Enwiki8(num_eval_symbols: int = 5000000, remove_end_of_line: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)
Bases: flambe.dataset.TabularDataset
The official enwik8 dataset.
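For Enwiki8, the num_eval_symbols default of 5,000,000 matches the split commonly used for the enwik8 character-level benchmark: the last 5M symbols are held out for testing, the 5M before them for validation, and the rest is used for training. The sketch below illustrates that convention on a toy sequence. The function split_symbols is hypothetical, and whether flambe slices the data in exactly this way is an assumption.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_symbols(symbols: Sequence[T],
                  num_eval_symbols: int = 5_000_000
                  ) -> Tuple[Sequence[T], Sequence[T], Sequence[T]]:
    """Hypothetical helper: hold out the last 2 * num_eval_symbols symbols."""
    n = num_eval_symbols
    if n <= 0:
        raise ValueError("num_eval_symbols must be positive")
    if len(symbols) <= 2 * n:
        raise ValueError("sequence too short for the requested split")

    train = symbols[:-2 * n]      # everything before the held-out tail
    valid = symbols[-2 * n:-n]    # first half of the tail
    test = symbols[-n:]           # second half of the tail
    return train, valid, test


train, valid, test = split_symbols(list(range(10)), num_eval_symbols=2)
print(train, valid, test)
```

With 10 symbols and num_eval_symbols=2, the toy call leaves 6 symbols for training and 2 each for validation and testing.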