flambe.nlp.language_modeling
Submodules
Package Contents
-
class flambe.nlp.language_modeling.PTBDataset(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)
Bases: flambe.dataset.TabularDataset
The official Penn Treebank (PTB) dataset.
-
PTB_URL = https://raw.githubusercontent.com/yoonkim/lstm-char-cnn/master/data/ptb/
-
_process(self, file: bytes)
Process the input file.
Parameters: file (bytes) – The input file, as a byte string
Returns: List of examples, where each example is a single-element tuple containing the text.
Return type: List[Tuple[str]]
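A minimal sketch of how `split_by_line` and `end_of_line_token` could shape the examples that `_process` returns. It assumes UTF-8 input, that surrounding whitespace and blank lines are dropped, and that the end-of-line token is appended to every non-empty line; `process_ptb_bytes` is a hypothetical helper, not flambe's implementation.

```python
from typing import List, Optional, Tuple


def process_ptb_bytes(file: bytes,
                      split_by_line: bool = False,
                      end_of_line_token: Optional[str] = '<eol>') -> List[Tuple[str]]:
    """Hypothetical stand-in for PTBDataset._process (not flambe's code)."""
    # Decode, strip surrounding whitespace, and drop blank lines
    lines = [line.strip() for line in file.decode('utf-8').splitlines()]
    lines = [line for line in lines if line]
    # Mark the end of each line, unless the token is disabled with None
    if end_of_line_token is not None:
        lines = [f'{line} {end_of_line_token}' for line in lines]
    if split_by_line:
        # One single-element tuple per line
        return [(line,) for line in lines]
    # Otherwise the whole file becomes a single example
    return [(' '.join(lines),)]


raw = b' the cat sat \n a dog ran \n'
print(process_ptb_bytes(raw))                      # [('the cat sat <eol> a dog ran <eol>',)]
print(process_ptb_bytes(raw, split_by_line=True))  # [('the cat sat <eol>',), ('a dog ran <eol>',)]
```

Keeping the whole file as one example suits ordered corpus batching, while one example per line suits sentence-level tasks.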
-
-
class flambe.nlp.language_modeling.Wiki103(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', remove_headers: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)
Bases: flambe.dataset.TabularDataset
The official WikiText103 dataset.
-
WIKI_URL = https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
-
_process(self, file: bytes)
Process the input file.
Parameters: file (bytes) – The input file, as a byte string
Returns: List of examples, where each example is a single-element tuple containing the text.
Return type: List[Tuple[str]]
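The `remove_headers` option is not described further here. The sketch below assumes WikiText's header convention, where titles and section headings are lines such as ` = Title = ` or ` = = Section = = `, and treats any stripped line that starts and ends with `=` as a header. Both helpers are hypothetical and the sample lines are made up.

```python
from typing import List


def is_wikitext_header(line: str) -> bool:
    """Heuristic: a stripped line that both starts and ends with '='."""
    stripped = line.strip()
    return len(stripped) > 1 and stripped.startswith('=') and stripped.endswith('=')


def strip_headers(lines: List[str]) -> List[str]:
    """Hypothetical helper: drop blank and header lines, keep article text."""
    return [line.strip() for line in lines
            if line.strip() and not is_wikitext_header(line)]


sample = [
    ' = Article Title = ',
    '',
    ' First paragraph text . ',
    ' = = Section = = ',
    ' Second paragraph text . ',
]
print(strip_headers(sample))  # ['First paragraph text .', 'Second paragraph text .']
```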
-
-
class flambe.nlp.language_modeling.Enwiki8(num_eval_symbols: int = 5000000, remove_end_of_line: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)
Bases: flambe.dataset.TabularDataset
The official enwik8 dataset: the first 100 million bytes of an English Wikipedia XML dump.
-
ENWIKI_URL = http://mattmahoney.net/dc/enwik8.zip
-
_process(self, file: bytes)
Process the input file.
Parameters: file (bytes) – The input file, as a byte string
Returns: List of examples, where each example is a single-element tuple containing the text.
Return type: List[Tuple[str]]
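With the default `num_eval_symbols=5000000`, a 100-million-byte corpus lends itself to the common 90M / 5M / 5M train, validation, and test split. The sketch below assumes that the last `2 * num_eval_symbols` symbols are held out, half for validation and half for test; `split_enwik8` is hypothetical, and flambe's exact split is not confirmed here.

```python
from typing import Tuple


def split_enwik8(data: bytes, num_eval_symbols: int = 5_000_000) -> Tuple[bytes, bytes, bytes]:
    """Hypothetical split: the last 2 * num_eval_symbols symbols are held out."""
    if num_eval_symbols <= 0 or 2 * num_eval_symbols >= len(data):
        raise ValueError('num_eval_symbols must be positive and leave training data')
    train = data[:-2 * num_eval_symbols]
    valid = data[-2 * num_eval_symbols:-num_eval_symbols]
    test = data[-num_eval_symbols:]
    return train, valid, test


train, valid, test = split_enwik8(b'abcdefghij', num_eval_symbols=2)
print(train, valid, test)  # b'abcdef' b'gh' b'ij'
```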
-
-
class flambe.nlp.language_modeling.LMField(**kwargs)
Bases: flambe.field.TextField
Language Model field.
Generates the original tensor alongside its shifted version.
-
process(self, example: str)
Process an example and create two Tensors.
Parameters: example (str) – The example to process, as a single string
Returns: The processed example, tokenized and numericalized, together with its shifted version
Return type: Tuple[torch.Tensor, ...]
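The original/shifted pair can be pictured with plain Python lists standing in for `torch.Tensor`. The sketch assumes whitespace tokenization, a dict vocabulary whose unknown-word index is 1, and that the source drops the last token while the target drops the first; `lm_process` is hypothetical and skips flambe's actual TextField pipeline.

```python
from typing import Dict, List, Tuple


def lm_process(example: str, vocab: Dict[str, int],
               unk_index: int = 1) -> Tuple[List[int], List[int]]:
    """Hypothetical LMField.process: tokenize, numericalize, then shift by one."""
    # Whitespace tokenization and vocabulary lookup (unknown words -> unk_index)
    ids = [vocab.get(token, unk_index) for token in example.split()]
    # target[i] is the token that follows source[i]
    return ids[:-1], ids[1:]


vocab = {'<pad>': 0, '<unk>': 1, 'the': 2, 'cat': 3, 'sat': 4}
print(lm_process('the cat sat', vocab))  # ([2, 3], [3, 4])
```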
-
-
class flambe.nlp.language_modeling.LanguageModel(embedder: Embedder, output_layer: Module, dropout: float = 0, pad_index: int = 0, tie_weights: bool = False, tie_weight_attr: str = 'embedding')
Bases: flambe.nn.Module
Implement a LanguageModel for sequential classification.
This model can be used for language modeling, as well as other sequential classification tasks. The model produces a prediction for every position in the sequence, so the effective number of examples is the batch size multiplied by the sequence length.
-
forward(self, data: Tensor, target: Optional[Tensor] = None)
Run a forward pass through the network.
Parameters: - data (Tensor) – The input data
- target (Tensor, optional) – Optional target data
Returns: The output predictions of shape seq_len x batch_size x n_out
Return type: Union[Tensor, Tuple[Tensor, Tensor]]
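The sketch below illustrates two ideas from the signature: weight tying and per-position predictions. `TinyLM` is a hypothetical stand-in (embedding, then GRU, then linear), not flambe's class. Its `embedding` attribute name echoes the `tie_weight_attr` default. Using `pad_index` as the loss's `ignore_index` is also an assumption. Tying works here because `nn.Embedding(vocab, dim)` and `nn.Linear(dim, vocab)` both store a `vocab x dim` weight matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLM(nn.Module):
    """Hypothetical stand-in for LanguageModel: embedding -> GRU -> linear."""

    def __init__(self, vocab_size: int, dim: int, pad_index: int = 0, tie_weights: bool = True):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim, padding_idx=pad_index)
        self.rnn = nn.GRU(dim, dim)  # seq-first: seq_len x batch_size x dim
        self.output_layer = nn.Linear(dim, vocab_size)
        if tie_weights:
            # Reuse the (vocab_size x dim) embedding matrix as the output projection
            self.output_layer.weight = self.embedding.weight

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        # data: seq_len x batch_size token ids
        hidden, _ = self.rnn(self.embedding(data))
        return self.output_layer(hidden)  # seq_len x batch_size x vocab_size


torch.manual_seed(0)
vocab_size, seq_len, batch_size = 10, 5, 3
model = TinyLM(vocab_size, dim=8)
source = torch.randint(1, vocab_size, (seq_len, batch_size))
target = torch.randint(1, vocab_size, (seq_len, batch_size))
logits = model(source)
# Flatten so every position is one example: (seq_len * batch_size) x vocab_size
loss = F.cross_entropy(logits.reshape(-1, vocab_size), target.reshape(-1), ignore_index=0)
```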
-
-
class flambe.nlp.language_modeling.CorpusSampler(batch_size: int = 128, unroll_size: int = 128, n_workers: int = 0, pin_memory: bool = False, downsample: Optional[float] = None, drop_last: bool = True)
Bases: flambe.sampler.sampler.Sampler
Implement a CorpusSampler object.
This object is useful for iterating over a large corpus of text in order. It takes as input a dataset with a single example containing the full sequence of tokens, and yields batches that contain source sequences drawn from the corpus text together with the same sequences shifted by one token as the target.
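A minimal sketch of ordered corpus batching, using plain lists instead of tensors and a batch-first layout. It assumes the token stream is cut into `batch_size` contiguous columns, with the remainder discarded. Each batch then takes the next window of up to `unroll_size` positions, and `drop_last` is assumed to skip a final, shorter window. `corpus_batches` is hypothetical, not flambe's implementation.

```python
from typing import Iterator, List, Sequence, Tuple

Batch = Tuple[List[List[int]], List[List[int]]]


def corpus_batches(tokens: Sequence[int], batch_size: int, unroll_size: int,
                   drop_last: bool = True) -> Iterator[Batch]:
    """Hypothetical ordered batching of one long token stream."""
    # Split the stream into batch_size contiguous columns, discarding the remainder
    n_per_column = len(tokens) // batch_size
    columns = [list(tokens[c * n_per_column:(c + 1) * n_per_column])
               for c in range(batch_size)]
    # The last token of each column has no successor, so it is never a source
    for start in range(0, n_per_column - 1, unroll_size):
        length = min(unroll_size, n_per_column - 1 - start)
        if drop_last and length < unroll_size:
            break  # skip the final, shorter window
        source = [col[start:start + length] for col in columns]
        target = [col[start + 1:start + 1 + length] for col in columns]
        yield source, target


for source, target in corpus_batches(list(range(10)), batch_size=2, unroll_size=2):
    print(source, target)
# [[0, 1], [5, 6]] [[1, 2], [6, 7]]
# [[2, 3], [7, 8]] [[3, 4], [8, 9]]
```

Because each row picks up where the same row of the previous batch stopped, a recurrent model can carry its hidden state from one batch to the next.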
-
static collate_fn(data: Sequence[Tuple[Tensor, Tensor]])
Create a batch from data.
Parameters: data (Sequence[Tuple[Tensor, Tensor]]) – List of (source, target) tuples.
Returns: Source and target Tensors.
Return type: Tuple[Tensor, Tensor]
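A collate step of this kind usually stacks the per-example tensors into one tensor per field. The sketch assumes each pair holds two 1-D tensors of equal length. It stacks along `dim=1`, which gives the `seq_len x batch_size` layout that `LanguageModel.forward` documents for its output. `collate_pairs` is hypothetical, and flambe's actual layout is not confirmed here.

```python
from typing import Sequence, Tuple

import torch
from torch import Tensor


def collate_pairs(data: Sequence[Tuple[Tensor, Tensor]]) -> Tuple[Tensor, Tensor]:
    """Hypothetical collate: stack 1-D (source, target) pairs into two batch tensors."""
    sources, targets = zip(*data)
    # dim=1 yields seq_len x batch_size (an assumed, seq-first layout)
    return torch.stack(sources, dim=1), torch.stack(targets, dim=1)


pairs = [(torch.tensor([1, 2, 3]), torch.tensor([2, 3, 4])),
         (torch.tensor([5, 6, 7]), torch.tensor([6, 7, 8]))]
source, target = collate_pairs(pairs)
print(source.shape)  # torch.Size([3, 2])
```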
-
sample(self, data: Sequence[Sequence[Tensor]], n_epochs: int = 1)
Sample from the list of features and yield batches.
Parameters: - data (Sequence[Sequence[Tensor]]) – The input data to sample from
- n_epochs (int, optional) – The number of epochs to run in the output iterator. Use -1 to run infinitely.
Yields: Iterator[Tuple[Tensor, ...]] – A batch of data, as a tuple of Tensors
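The `n_epochs` convention, where -1 means run forever, can be expressed with `itertools.count`. In this sketch, `run_epochs` and its `make_epoch` callback are hypothetical. The callback stands for whatever rebuilds one epoch's batches.

```python
import itertools
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar('T')


def run_epochs(make_epoch: Callable[[], Iterable[T]], n_epochs: int = 1) -> Iterator[T]:
    """Replay one epoch's batches n_epochs times; -1 repeats forever."""
    epochs = itertools.count() if n_epochs == -1 else range(n_epochs)
    for _ in epochs:
        yield from make_epoch()


print(list(run_epochs(lambda: ['b0', 'b1'], n_epochs=2)))  # ['b0', 'b1', 'b0', 'b1']
# An infinite iterator must be truncated by the consumer, e.g. with islice
print(list(itertools.islice(run_epochs(lambda: ['b0', 'b1'], n_epochs=-1), 3)))  # ['b0', 'b1', 'b0']
```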
-
length(self, data: Sequence[Sequence[torch.Tensor]])
Return the number of batches in the sampler.
Parameters: data (Sequence[Sequence[torch.Tensor]]) – The input data to sample from
Returns: The number of batches that would be created per epoch
Return type: int
-
static