Practice in Huggingface Transformers

This post is a note summarizing my practice with Pre-trained Language Models (PLMs) using the Huggingface Transformers package. I believe the points I summarize here are also confusing to other newcomers to coding transformer-based NLP models.
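
To make the setting concrete, here is a minimal sketch of the kind of usage the post is about, assuming the standard `AutoTokenizer`/`AutoModel` API; the checkpoint name and input sentence are only illustrative, not taken from the post.

```python
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained checkpoint; "bert-base-uncased" is an illustrative choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and run a forward pass to get contextual hidden states.
inputs = tokenizer("Pre-trained language models are handy.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```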

Click to read more ...

Variational Inference

One of the core problems of modern statistics is approximating difficult-to-compute (intractable) probability densities, usually conditional ones. There are two mainstream families of methods: Markov chain Monte Carlo (MCMC) sampling and variational inference. The former approximates the target distribution with a large number of samples drawn from it, while the latter chooses a tractable, simple distribution and pushes it toward the true distribution. Let’s understand this through the example of the Variational Auto-encoder (VAE).
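
For reference, the objective a VAE optimizes is the evidence lower bound (ELBO); this is the standard identity, written in my own notation rather than quoted from the post:

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr] \;-\; \mathrm{KL}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr),
$$

where $q_\phi(z \mid x)$ is exactly the tractable, simple distribution being pushed toward the intractable posterior $p_\theta(z \mid x)$; maximizing the right-hand side over $\phi$ tightens the bound.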

Click to read more ...

SSL Survey

We are on the cusp of a major research revolution driven by deep learning. In my view, there are two outstanding architectural contributions in this revolution: $\textit{ResNet}$ and $\textit{Transformer}$. As research exploration deepens and, especially, as computational capacity grows, techniques that exploit unlabeled data attract more and more attention. There is no doubt that self-supervised learning (SSL) is a direction worth diving into and a general methodological contribution of this revolution. Therefore, this post surveys the cutting-edge development of SSL from the following aspects: theoretical guarantees, image SSL, sequence SSL, and graph SSL.

Click to read more ...

Subjective, Objective, Assumption and Modeling

Overview

There are two main methods for estimating parameters in statistics: the frequentist method (Maximum Likelihood Estimation) and the Bayesian method (Bayesian estimation). Frankly speaking, we should gain deep insight into them and build an intuitive understanding of estimation. In this note, I provisionally offer some dimensions along which to explore them, i.e. motivation, theoretical guarantee, and algorithm; maybe I will enrich this in my future study. Note that the two methods differ only in how they model the unknown quantity: both are statistical methods aiming to estimate the whole distribution from the sampled data (in some cases, by making a hypothesis and then verifying it).
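
A toy contrast may make the difference concrete (my own illustrative example, not from the post): for a coin with unknown head probability $\theta$ and $k$ heads observed in $n$ tosses,

$$
\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta\, \theta^{k}(1-\theta)^{n-k} = \frac{k}{n},
\qquad
p(\theta \mid \mathcal{D}) \;\propto\; \theta^{k}(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1},
$$

so the frequentist view returns a single point estimate, while the Bayesian view (here with a $\mathrm{Beta}(\alpha,\beta)$ prior) returns a whole posterior distribution, $\mathrm{Beta}(k+\alpha,\, n-k+\beta)$, over the unknown parameter.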

Click to read more ...

Daily Reading

A Diversity-Promoting Objective Function for Neural Conversation Models

Summary & Intuitions

  • mutual information between source (message) and target (response)
  • lack of theoretical guarantee

Contributions

  • decompose the formula of mutual information (sketched below this list):
    • anti-LM: the plain language-model penalty punishes not only high-frequency, generic responses but also grammatical sentences $\rightarrow$ use token weights that decrease monotonically with position (early tokens matter most; the LM term would otherwise dominate later)
    • bidi-LM: not searching but reranking (generate grammatical candidates with the forward model, then re-rank them according to the objective containing the reversed probability $p(\text{source} \mid \text{target})$)
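
For context, the decomposition referred to above is, as I recall it from the paper (my notation; $\lambda$ weights the language-model / reverse term):

$$
\hat{T} = \arg\max_{T}\bigl\{\log p(T \mid S) - \lambda \log p(T)\bigr\} \quad\text{(MMI-antiLM)},
\qquad
\hat{T} = \arg\max_{T}\bigl\{(1-\lambda)\log p(T \mid S) + \lambda \log p(S \mid T)\bigr\} \quad\text{(MMI-bidi)},
$$

where $S$ is the source message and $T$ the response; via Bayes' rule the two forms are equivalent up to a constant in $T$.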

Click to read more ...

Daily Reading

Daily Reading

Tensor2Tensor: One Model To Learn Them All

Summary & Intuitions

  • multi-modality multi-task learning
  • modality-specific sub-nets: typical per-modality pipelines
  • modality-agnostic body: separable convolution (row conv + depth-wise conv, to cover both 1-D and 2-D inputs) and attention mechanism (self-attended + query-presence-attended); a minimal sketch of a separable convolution follows the contributions list below
  • joint training of tasks with scarce and with abundant data

Contributions

  • engineering considerations for the modality sub-nets
    • language input: linear mapping
    • language output: linear mapping + softmax
    • image input: 2 separable conv + 1 pooling + residual link
    • categorical output: 3 separable conv + 1 pooling + residual link + 2 separable conv + GAP (global average pooling) + linear
    • audio input and output: the waveform (1-D) or spectrogram (2-D) is handled the same way as the image input above
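
Since the separable convolution does most of the work in the body, here is a minimal sketch of a depthwise separable convolution in its standard (depth-wise + point-wise) form, assuming a PyTorch-style 2-D setting; the layer sizes are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise separable convolution: a per-channel (depth-wise) conv
    followed by a 1x1 (point-wise) conv that mixes channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # groups=in_ch convolves each input channel independently
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Illustrative usage on a fake batch of 2-D, image-like inputs.
x = torch.randn(1, 32, 64, 64)
block = SeparableConv2d(32, 64)
print(block(x).shape)  # torch.Size([1, 64, 64, 64])
```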

Click to read more ...