flambe.nlp.language_modeling.sampler
Module Contents
class flambe.nlp.language_modeling.sampler.CorpusSampler(batch_size: int = 128, unroll_size: int = 128, n_workers: int = 0, pin_memory: bool = False, downsample: Optional[float] = None, drop_last: bool = True)
Bases: flambe.sampler.sampler.Sampler
Implement a CorpusSampler object.
This object is useful for iterating over a large corpus of text in order. It takes as input a dataset with a single example containing the sequence of tokens, and yields batches containing both source sequences taken from the corpus text and the same sequences shifted by one token as the target.
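The source/target relationship described above can be illustrated with a minimal sketch in plain Python lists. This is not flambe's implementation: the helper name `corpus_batches` is hypothetical, the way the token stream is split into `batch_size` contiguous rows is an assumption, and options such as `downsample` and `drop_last` are not modeled.

```python
from typing import Iterator, List, Tuple

Batch = Tuple[List[List[int]], List[List[int]]]


def corpus_batches(tokens: List[int], batch_size: int, unroll_size: int) -> Iterator[Batch]:
    """Hypothetical helper: yield (source, target) windows over one token stream."""
    # Assumption: cut the stream into batch_size contiguous rows of equal
    # length, discarding any leftover tokens at the end.
    row_len = len(tokens) // batch_size
    rows = [tokens[i * row_len:(i + 1) * row_len] for i in range(batch_size)]
    # The last token of each row has no successor, so sources stop one short.
    for start in range(0, row_len - 1, unroll_size):
        end = min(start + unroll_size, row_len - 1)
        source = [row[start:end] for row in rows]
        # The target is the same window shifted forward by one position.
        target = [row[start + 1:end + 1] for row in rows]
        yield source, target


batches = list(corpus_batches(list(range(10)), batch_size=2, unroll_size=2))
```

With ten tokens, two rows, and an unroll size of two, each source window is paired with a target holding the next token at every position.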
static collate_fn(data: Sequence[Tuple[Tensor, Tensor]])
Create a batch from data.
Parameters:
- data (Sequence[Tuple[Tensor, Tensor]]) – List of (source, target) tuples.
Returns: Source and target Tensors.
Return type: Tuple[Tensor, Tensor]
sample(self, data: Sequence[Sequence[Tensor]], n_epochs: int = 1)
Sample from the list of features and yield batches.
Parameters:
- data (Sequence[Sequence[Tensor, ..]]) – The input data to sample from.
- n_epochs (int, optional) – The number of epochs to run in the output iterator. Use -1 to run infinitely.
Yields: Iterator[Tuple[Tensor]] – A batch of data, as a tuple of Tensors.
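The n_epochs convention (a fixed number of passes, or -1 for an endless iterator) can be sketched as a small generator. The name `repeat_epochs` is hypothetical and only illustrates the documented behavior; it assumes the batches are held in a re-iterable container such as a list, and it is not taken from flambe's source.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def repeat_epochs(batches: Iterable[T], n_epochs: int = 1) -> Iterator[T]:
    """Hypothetical helper: replay batches for n_epochs passes (-1 = forever)."""
    # count() never terminates, which gives the "run infinitely" behavior.
    epochs = count() if n_epochs == -1 else range(n_epochs)
    for _ in epochs:
        yield from batches


two_epochs = list(repeat_epochs(["b0", "b1"], n_epochs=2))
# An infinite iterator must be bounded by the caller, here with islice.
first_five = list(islice(repeat_epochs(["b0", "b1"], n_epochs=-1), 5))
```

When n_epochs is -1, the consumer is responsible for stopping iteration, for example after a fixed number of training steps.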
-
static