Daily Reading

Daily Reading

Tensor2Tensor: One Model to Learn to Them All

Summary & Intuitions

  • multi-modality multi-task learning
  • modality-specific subnets: typical pipelines
  • modality-agnostic body: separable convolution (row conv + depth-wise conv, due to 1-d, 2-d inputs) and attention mechanism (self-attended + Query-presence-attended)
  • joint training of tasks with deficient and sufficient data

Contributions

  • engineering considerations of modality subnets
    • language input: linear mapping
    • language output: linear mapping + softmax
    • image input: 2 separable conv + 1 pooling + residual link
    • categorical output: 3 separable conv + 1 pooling + residual link + 2 separable conv + GAP + linear
    • Audio input and output: wave (1d) or spectrogram (2d) is the same as aforementioned image input

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer

Summary & Intuitions

  • weight matrices based experts
  • block based dropout
  • learnable dropout and expert contribution weight

Contributions

  • noisy top-k softmax gating network outputs sparsity vector, which is used for weighted sum and choose experts (i.e., 0 value of one component means absence of associated expert)
  • large scale with thousands of experts

Mixture Content Selection for Diverse Sequence Generation

Summary & Intuitions

  • hard mixture-of-expert with 0, 1 selective vector
  • separate selection and generation, because joint training need intractable enumeration
  • training by heuristic focus and testing by sampling focus (i.e., guidance as the target of selector during training)

Contributions

  • modeling latent focus explicitly by Bi-GRU + Bernoulli distribution, which has same length as the input sequence
  • E-step: select expert have best fitting to guidance
  • M-step: concatenate sequence embedding with focus vector of the best expert and generate sequence as usual
  • guidance design ranges from word stem occurrence to minimal cover according to specific generation task