# Variational Inference

One of the core problems of modern statistics is approximating difficult-to-compute (intractable) probability densities, usually conditional densities such as posteriors. Two mainstream families of methods address this: Markov chain Monte Carlo (MCMC) sampling and variational inference. The former approximates the target distribution with samples drawn from a Markov chain whose stationary distribution is the target, while the latter posits a tractable family of simple distributions and optimizes within it to approach the true distribution. Let's build intuition through the example of the Variational Auto-Encoder (VAE).
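The optimization view can be made concrete with a one-dimensional toy sketch (my own illustration, not part of the VAE itself): using the reparameterization trick $z = \mu + \sigma\epsilon$, a Monte Carlo estimate of $\mathrm{KL}(q\,\|\,p)$ between two Gaussians can be checked against the closed form. All function names and numbers below are assumptions for illustration.

```python
import numpy as np

def log_normal(z, mu, sigma):
    # log density of N(mu, sigma^2) evaluated at z
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2

def kl_analytic(mu_q, s_q, mu_p, s_p):
    # closed-form KL(q || p) between two 1-D Gaussians
    return np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p)**2) / (2 * s_p**2) - 0.5

def kl_monte_carlo(mu_q, s_q, mu_p, s_p, n=100_000, seed=0):
    # estimate E_q[log q(z) - log p(z)] by sampling
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = mu_q + s_q * eps          # reparameterization trick: z = mu + sigma * eps
    return np.mean(log_normal(z, mu_q, s_q) - log_normal(z, mu_p, s_p))

print(kl_analytic(1.0, 0.5, 0.0, 1.0))      # exact value
print(kl_monte_carlo(1.0, 0.5, 0.0, 1.0))   # stochastic estimate, close to exact
```

The reparameterization makes the sampled $z$ a deterministic, differentiable function of $(\mu, \sigma)$, which is what lets the VAE backpropagate through the sampling step.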

# SSL Survey

We are on the cusp of a major research revolution driven by deep learning. In my view, there are two outstanding architectural contributions in this revolution: $\textit{ResNet}$ and $\textit{Transformer}$. As research exploration deepens and, especially, computational capacity grows, techniques that exploit unlabeled data are attracting more and more attention. There is no doubt that self-supervised learning (SSL) is a direction worth diving into and a general methodological contribution of this revolution. Therefore, this post surveys the cutting-edge development of SSL from the following aspects: theoretical guarantees, image SSL, sequence SSL, and graph SSL.

# Subjective, Objective, Assumption and Modeling

## Overview

There are two main approaches to parameter estimation in statistics: the frequentist approach (e.g. Maximum Likelihood Estimation) and the Bayesian approach (Bayesian estimation). Frankly speaking, we should gain a deep insight into them and build an intuitive understanding of estimation. In this note, I provisionally offer some dimensions along which to explore them: motivation, theoretical guarantee, and algorithm. Maybe I will enrich it in my future study. Note that the two approaches differ only in how they model the unknown quantity (a fixed parameter vs. a random variable); both are statistical methods that aim to estimate the whole distribution from the sampled data (in some cases, by forming a hypothesis and then verifying it).
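A minimal coin-flip sketch of the contrast (the data and the Beta prior hyperparameters below are hypothetical, chosen for illustration): the frequentist MLE treats the parameter as a fixed unknown, while the Bayesian posterior treats it as a random variable with a prior.

```python
import numpy as np

# Toy data: 10 coin flips, 7 heads (hypothetical numbers for illustration)
flips = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

# Frequentist: the MLE of the Bernoulli parameter is just the sample mean
theta_mle = flips.mean()

# Bayesian: with a Beta(a, b) prior, the posterior is Beta(a + heads, b + tails),
# i.e. the unknown parameter itself is modeled as a random variable
a, b = 2.0, 2.0                                  # assumed prior hyperparameters
heads, tails = flips.sum(), len(flips) - flips.sum()
theta_post_mean = (a + heads) / (a + b + len(flips))

print(theta_mle)        # 0.7
print(theta_post_mean)  # (2 + 7) / (4 + 10) = 9/14 ≈ 0.643
```

As the sample size grows, the prior's influence fades and the posterior mean converges toward the MLE, which is one way to see that the two methods differ only in the modeling of the unknown.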

## A Diversity-Promoting Objective Function for Neural Conversation Models

### Summary & Intuitions

• mutual information between source (message) and target (response)
• lack of theoretical guarantee

### Contributions

• decomposition of the mutual-information formula into two practical objectives:
• anti-lm: penalizing $\log p(T)$ punishes not only high-frequency, generic responses but also grammatical sentences $\rightarrow$ token weights decrease monotonically (early tokens matter most; the language model dominates later)
• bidi-lm: not searching but reranking (generate grammatical sequences first, then re-rank them according to the inverse-probability term $p(S\mid T)$ of the objective)
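The anti-LM scoring can be sketched as follows, under my own simplifying assumptions: the per-token log-probabilities are made up, and I use a threshold-style weighting in which only the first $\gamma$ tokens receive the language-model penalty (one concrete instance of monotonically decreasing weights).

```python
import numpy as np

def anti_lm_score(logp_t_given_s, logp_t_tokens, lam=0.5, gamma=5):
    """MMI-antiLM-style score: log p(T|S) - lam * sum_k g(k) * log p(t_k | t_<k),
    where g(k) = 1 for the first `gamma` tokens and 0 afterwards, so the
    language-model penalty hits only the early (generic) tokens."""
    weights = np.array([1.0 if k < gamma else 0.0
                        for k in range(len(logp_t_tokens))])
    return logp_t_given_s - lam * np.dot(weights, logp_t_tokens)

# Toy candidates with made-up log-probabilities (illustration only):
generic = anti_lm_score(-4.0, [-0.2, -0.3, -0.1], lam=0.5)   # bland, LM-likely reply
specific = anti_lm_score(-4.5, [-2.0, -2.5, -3.0], lam=0.5)  # content-bearing reply
print(generic, specific)  # the bland reply is penalized more, so it ranks lower
```

The effect is that a reply the language model already finds very likely (a generic response) loses score relative to a lower-probability but content-bearing one.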

## Tensor2Tensor: One Model To Learn Them All

### Summary & Intuitions

• modality-specific subnets: typical pipelines
• modality-agnostic body: separable convolution (row conv + depth-wise conv, due to 1-d, 2-d inputs) and attention mechanism (self-attended + Query-presence-attended)
• joint training of tasks with deficient and sufficient data

### Contributions

• engineering considerations of modality subnets
• language input: linear mapping
• language output: linear mapping + softmax
• image input: 2 separable conv + 1 pooling + residual link
• categorical output: 3 separable conv + 1 pooling + residual link + 2 separable conv + GAP + linear
• audio input and output: wave (1-d) or spectrogram (2-d), handled the same as the aforementioned image input
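The separable convolution used throughout these subnets can be sketched in plain NumPy (a minimal, unoptimized illustration with assumed shapes, not the paper's implementation): a per-channel depthwise convolution followed by a 1×1 pointwise channel mix.

```python
import numpy as np

def depthwise_separable_conv(x, depth_k, point_w):
    """Minimal 2-D separable convolution (valid padding, stride 1):
    a per-channel depthwise pass, then a 1x1 pointwise channel mix."""
    H, W, C = x.shape
    kh, kw, _ = depth_k.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    # depthwise: each channel convolved with its own kh x kw filter
    dw = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]          # (kh, kw, C)
            dw[i, j, :] = np.sum(patch * depth_k, axis=(0, 1))
    # pointwise: 1x1 convolution mixes channels (C -> C_out)
    return dw @ point_w

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))        # toy 2-d input with 3 channels
depth_k = rng.standard_normal((3, 3, 3))  # one 3x3 filter per input channel
point_w = rng.standard_normal((3, 16))    # 1x1 mix to 16 output channels
y = depthwise_separable_conv(x, depth_k, point_w)
print(y.shape)  # (6, 6, 16)
```

The factorization is what makes it cheap: a full 3×3 convolution from 3 to 16 channels needs 3·3·3·16 weights, while the separable version needs only 3·3·3 + 3·16.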

## Overview

A general framework for the XX task consists of three phases: extraction, understanding, and reasoning.