flambe.nn.transformer_sru
Module Contents
class flambe.nn.transformer_sru.TransformerSRU(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_encoder_layers: int = 6, num_decoder_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidrectional: bool = False, **kwargs: Dict[str, Any])

Bases: flambe.nn.Module
A Transformer with an SRU replacing the FFN.
forward(self, src: torch.Tensor, tgt: torch.Tensor, src_mask: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None, tgt_key_padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)

Take in and process masked source/target sequences.
Parameters: - src (torch.Tensor) – the sequence to the encoder (required). shape: \((N, S, E)\).
- tgt (torch.Tensor) – the sequence to the decoder (required). shape: \((N, T, E)\).
- src_mask (torch.Tensor, optional) – the additive mask for the src sequence (optional). shape: \((S, S)\).
- tgt_mask (torch.Tensor, optional) – the additive mask for the tgt sequence (optional). shape: \((T, T)\).
- memory_mask (torch.Tensor, optional) – the additive mask for the encoder output (optional). shape: \((T, S)\).
- src_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for src keys per batch (optional). shape: \((N, S)\).
- tgt_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for tgt keys per batch (optional). shape: \((N, T)\).
- memory_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for memory keys per batch (optional). shape: \((N, S)\).
Returns: output (torch.Tensor) – The output sequence, shape: \((T, N, E)\).
Note: [src/tgt/memory]_mask should be filled with float('-inf') for the masked positions and float(0.0) elsewhere. These masks ensure that predictions for position i depend only on the unmasked positions j, and are applied identically for each sequence in a batch. [src/tgt/memory]_key_padding_mask should be a ByteTensor where False values are positions that should be masked with float('-inf') and True values will be left unchanged. This mask ensures that no information is taken from position i if it is masked, and has a separate mask for each sequence in a batch.
Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder.
where S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature dimension.
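The mask and output-length conventions above can be illustrated with the stock torch.nn.Transformer, which shares the additive-mask semantics and the output-length behaviour described in the notes. This is only a sketch with a stand-in model, not a call into flambe itself (note that torch.nn.Transformer expects inputs of shape \((S, N, E)\)):

```python
import torch

S, T, N, E = 5, 4, 2, 16  # source length, target length, batch size, feature dim

# Additive causal mask for the target: float('-inf') above the diagonal,
# float(0.0) elsewhere, exactly as the note above describes. Shape (T, T).
tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

# Stand-in model with the same mask conventions as TransformerSRU.
model = torch.nn.Transformer(d_model=E, nhead=2, num_encoder_layers=1,
                             num_decoder_layers=1, dim_feedforward=32)
src = torch.randn(S, N, E)
tgt = torch.randn(T, N, E)
out = model(src, tgt, tgt_mask=tgt_mask)

# Output length equals the target length, per the second note.
assert out.shape == (T, N, E)
```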
class flambe.nn.transformer_sru.TransformerSRUEncoder(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])

Bases: flambe.nn.Module
A TransformerSRUEncoder with an SRU replacing the FFN.
forward(self, src: torch.Tensor, state: Optional[torch.Tensor] = None, mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None)

Pass the input through the encoder layers in turn.
Parameters: - src (torch.Tensor) – The sequence to the encoder (required).
- state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
- mask (torch.Tensor, optional) – The mask for the src sequence (optional).
- padding_mask (torch.Tensor, optional) – The mask for the src keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
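A padding_mask with the documented semantics (True for real tokens, False for padding) can be built directly from the token ids. The padding index 0 here is a hypothetical convention for the sketch, not one fixed by flambe:

```python
import torch

pad_idx = 0  # hypothetical padding index
src_tokens = torch.tensor([[4, 7, 9, 0, 0],
                           [5, 2, 8, 6, 1]])  # batch of 2, max length 5

# True for tokens to leave untouched, False for padding tokens.
padding_mask = src_tokens.ne(pad_idx)
```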
class flambe.nn.transformer_sru.TransformerSRUDecoder(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, **kwargs: Dict[str, Any])

Bases: flambe.nn.Module
A TransformerSRUDecoder with an SRU replacing the FFN.
forward(self, tgt: torch.Tensor, memory: torch.Tensor, state: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)

Pass the inputs (and masks) through the decoder layers in turn.
Parameters: - tgt (torch.Tensor) – The sequence to the decoder (required).
- memory (torch.Tensor) – The sequence from the last layer of the encoder (required).
- state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
- tgt_mask (torch.Tensor, optional) – The mask for the tgt sequence (optional).
- memory_mask (torch.Tensor, optional) – The mask for the memory sequence (optional).
- padding_mask (torch.Tensor, optional) – The mask for the tgt keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
- memory_key_padding_mask (torch.Tensor, optional) – The mask for the memory keys per batch (optional).
Return type: torch.Tensor
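A minimal sketch of the decoder-side call, again using the stock torch.nn.TransformerDecoder as a stand-in (the SRU variant replaces the FFN but, the extra state argument aside, the arguments here mirror it):

```python
import torch

T, S, N, E = 4, 5, 2, 16  # target len, source len, batch size, feature dim

layer = torch.nn.TransformerDecoderLayer(d_model=E, nhead=2, dim_feedforward=32)
decoder = torch.nn.TransformerDecoder(layer, num_layers=2)

tgt = torch.randn(T, N, E)     # sequence to the decoder
memory = torch.randn(S, N, E)  # sequence from the last layer of the encoder
# Causal mask so position i only attends to positions <= i.
tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

out = decoder(tgt, memory, tgt_mask=tgt_mask)
assert out.shape == (T, N, E)  # returns a torch.Tensor
```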
class flambe.nn.transformer_sru.TransformerSRUEncoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])

Bases: flambe.nn.Module
A TransformerSRUEncoderLayer with an SRU replacing the FFN.
forward(self, src: torch.Tensor, state: Optional[torch.Tensor] = None, src_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None)

Pass the input through the encoder layer.
Parameters: - src (torch.Tensor) – The sequence to the encoder layer (required).
- state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
- src_mask (torch.Tensor, optional) – The mask for the src sequence (optional).
- padding_mask (torch.Tensor, optional) – The mask for the src keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
Returns: - torch.Tensor – Output Tensor of shape [S x B x H]
- torch.Tensor – Output state of the SRU of shape [N x B x H]
class flambe.nn.transformer_sru.TransformerSRUDecoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, **kwargs: Dict[str, Any])

Bases: flambe.nn.Module
A TransformerSRUDecoderLayer with an SRU replacing the FFN.
forward(self, tgt: torch.Tensor, memory: torch.Tensor, state: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)

Pass the inputs (and masks) through the decoder layer.
Parameters: - tgt (torch.Tensor) – The sequence to the decoder layer (required).
- memory (torch.Tensor) – The sequence from the last layer of the encoder (required).
- state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
- tgt_mask (torch.Tensor, optional) – The mask for the tgt sequence (optional).
- memory_mask (torch.Tensor, optional) – The mask for the memory sequence (optional).
- padding_mask (torch.Tensor, optional) – The mask for the tgt keys per batch (optional).
- memory_key_padding_mask (torch.Tensor, optional) – The mask for the memory keys per batch (optional).
Returns: Output Tensor of shape [S x B x H]
Return type: torch.Tensor