LM Survey

Overview

My last post was mainly based on the question generation papers from ACL’20, but the recursive reading process pulled in quite a few papers from earlier years as well. Since I know little about the history of language models and typical NLP tasks, let’s step forward by defining some notation first.

The basic unit in NLP is the token (or token embedding). Note that a token is not necessarily a word, although a word can be regarded as a printable token. There are also non-printable tokens, e.g. the classifier tag, the separator tag, and the begin-of-sequence (start-of-sequence) and end-of-sequence tags. Printability and non-printability are notions borrowed from ASCII (32 control characters, 95 printable characters, and the DEL character). Obviously, ASCII characters, Unicode characters, or any other available single-width characters count as printable tokens.

Besides the token, the other core concept is the sequence. A single token scarcely occurs alone; it co-occurs with other tokens under an ordering or temporal dependency, which is also the key challenge of modeling natural language. Therefore sequence embeddings (the list of token embeddings in a sequence) are the content of the data flow in NLP tasks, while feature maps (the collection of multiple channels of image features) are the counterpart in CV tasks. Moreover, just as cropping and resizing produce well-shaped batch data in CV tasks, we need truncation and padding to handle sentences that are too long or too short. Padded positions, however, would otherwise contribute extra gradients to the input during back-propagation, so conventionally we feed a mask along with the input text to specify the gradient-free span within it; this mask goes by the ambiguous and confusing name attention_mask in Transformer-based architectures.
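As a rough, framework-free sketch (the pad id, maximum length, and token ids below are made-up values, not taken from any particular tokenizer), padding and the corresponding attention_mask might look like this:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Illustrative sketch: truncate/pad a batch of token-id sequences and build
// the matching attention mask (1 = real token, 0 = padding).
constexpr std::int64_t PAD_ID = 0;   // assumed pad id, purely for illustration

struct Batch {
    std::vector<std::vector<std::int64_t>> input_ids;
    std::vector<std::vector<std::int64_t>> attention_mask;
};

Batch pad_batch(const std::vector<std::vector<std::int64_t>>& seqs, std::size_t max_len) {
    Batch batch;
    for (const auto& seq : seqs) {
        // Truncate sequences that are too long ...
        const std::size_t keep = std::min(seq.size(), max_len);
        std::vector<std::int64_t> ids(seq.begin(),
                                      seq.begin() + static_cast<std::ptrdiff_t>(keep));
        std::vector<std::int64_t> mask(ids.size(), 1);
        // ... and pad sequences that are too short; padded positions get mask 0,
        // so they contribute neither to attention nor to gradients.
        ids.resize(max_len, PAD_ID);
        mask.resize(max_len, 0);
        batch.input_ids.push_back(std::move(ids));
        batch.attention_mask.push_back(std::move(mask));
    }
    return batch;
}

int main() {
    Batch batch = pad_batch({{101, 7592, 2088, 102}, {101, 7592, 102}}, 6);
    for (std::size_t i = 0; i < batch.input_ids.size(); ++i) {
        for (auto id : batch.input_ids[i]) std::cout << id << ' ';
        std::cout << "| mask: ";
        for (auto m : batch.attention_mask[i]) std::cout << m << ' ';
        std::cout << '\n';
    }
}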

Let’s go through the state-of-the-art models of the time and build a comprehensive understanding of transfer learning in NLP.

Click to read more ...

QG Survey

Overview

This post mainly consists of papers on Question Generation (QG) from ACL’20 and controllable diversity.

Learn from Your Neighbor: Learning Multimodal Mapping from Sparse Annotations

Summary & Intuitions

  • place belief, not penalization, on plausible predictions
  • inspired by semi-supervised learning: annotated neighbors are important signals for diversity
  • neighbor definition and neighbor penalization

Contributions

  • neighbor definition: semantic space and distance metrics (similarity)
  • neighbor penalization:
    • label-missing multi-label (or multi-way) classification: similarity as loss weight (see the sketch after this list)
    • sequence generation from image or text input:
      • overall weight: similarity from double inputs
      • the current token attends to the neighbor input (to address the issue of unrelated visual objects and semantic tokens) via:
        • token-wise weight
        • modulated sequence generation of neighbors (language attention)
        • choose image region or feature (visual attention)
      • note: non-trivial derivatives of the weights are necessary
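One plausible way to write the “similarity as loss weight” idea for the label-missing classification case (my own notation; the paper’s exact formulation may differ) is

\[\mathcal{L}(x) = \ell\bigl(\hat{y}(x), y(x)\bigr) + \lambda \sum_{x' \in \mathcal{N}(x)} \mathrm{sim}(x, x')\,\ell\bigl(\hat{y}(x), y(x')\bigr),\]

where $\mathcal{N}(x)$ denotes the annotated neighbors of $x$ in the chosen semantic space and $\mathrm{sim}(\cdot,\cdot)$ is the distance-based similarity used as the loss weight.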

Click to read more ...

GNN Survey

Overview

In my opinion, graph neural networks are worth exploring as a research topic rather than an engineering one, especially when it comes to large-scale applications. Anyway, this survey is a naive summary of my recent reading, and writing it in English is simply practice for my poor writing.

Without bells and whistles, there are a few equivalent concepts in graph terminology. First of all, the graph signal is the general representation: the collection of node embeddings (node signals), whichever domain we are talking about. The Graph Fourier Transform (GFT) is the graph analogue of the (discrete) Fourier Transform (FT). The FT converts a temporal signal into a frequency signal with the help of the operator $e^{-i\omega t}$ (the sine/cosine basis functions), and vice versa. In a similar way, the GFT converts the spatial domain into the spectral domain with the help of $\Phi^{T}$ (the eigenvector basis of the graph Laplacian), and back again via $\Phi$.
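Written compactly, with $L = \Phi \Lambda \Phi^{T}$ the eigendecomposition of the (assumed symmetric) graph Laplacian, the analogy reads

\[\hat{f}(\omega) = \int f(t)\,e^{-i\omega t}\,dt \;\Longleftrightarrow\; \hat{x} = \Phi^{T}x, \qquad f(t) = \frac{1}{2\pi}\int \hat{f}(\omega)\,e^{i\omega t}\,d\omega \;\Longleftrightarrow\; x = \Phi\,\hat{x},\]

so the Laplacian eigenvectors play the role of the Fourier basis and the eigenvalues play the role of frequencies.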

Note that the spatial domain is also known as the vertex domain, graph domain, or data-space domain. In contrast, the spectral domain is also referred to as the feature-space domain.

Click to read more ...

CPP-Exception

some morals

// Example 2(b): Very Buggy Class
//
class X : Y {
  T* t_;
  Z* z_;
public:
  X()
  try
    : Y(1)
    , t_( new T( static_cast<Y*>(this) ) )
    , z_( new Z( static_cast<Y*>(this), t_ ) )
  {
    /*...*/
  }
  catch(...)
  // Y::Y or T::T or Z::Z or X::X's body has thrown
  {
    // Q: should I delete t_ or z_? (note: not legal C++)
  }
};

Therefore the status quo can be summarized as follows:

Moral #1: Constructor function-try-block handlers have only one purpose – to translate an exception. (And maybe to do logging or some other side effects.) They are not useful for any other purpose.
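A minimal sketch of that one legitimate use (the class and exception names here are illustrative, not from the GotW example): the handler translates whatever a subobject throws into a domain-specific exception.

#include <cstddef>
#include <stdexcept>
#include <vector>

struct ConfigError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class Config {
    std::vector<int> values_;
public:
    explicit Config(std::size_t n)
    try : values_(n) {               // the vector constructor may throw std::bad_alloc
        // ...
    } catch (...) {
        // Translate the exception; if the handler did nothing, the original
        // exception would be rethrown automatically when the handler ends.
        throw ConfigError("failed to construct Config");
    }
};

int main() { Config c(8); }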

Moral #2: Since destructors should never emit an exception, destructor function-try-blocks have no practical use at all.[6] There should never be anything for them to detect, and even if there were something to detect because of evil code, the handler is not very useful for doing anything about it because it can not suppress the exception.

Moral #3: Always perform unmanaged resource acquisition in the constructor body, never in initializer lists. In other words, either use “resource acquisition is initialization” (thereby avoiding unmanaged resources entirely) or else perform the resource acquisition in the constructor body.

For example, building on Example 2(b), say T was char and t_ was a plain old char* that was new[]’d in the initializer-list; then in the handler there would be no way to delete[] it. The fix would be to instead either wrap the dynamically allocated memory resource (e.g., change char* to string) or new[] it in the constructor body where it can be safely cleaned up using a local try-block or otherwise.
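A sketch of the two fixes just described (the class and member names are illustrative, not taken from the GotW example):

#include <cstddef>
#include <string>

// Fix A: wrap the resource so nothing unmanaged ever exists (RAII).
class FixedA {
    std::string buf_;                       // owns and releases its own memory
public:
    explicit FixedA(std::size_t n) : buf_(n, '\0') {}   // nothing to clean up by hand
};

// Fix B: if a raw new[] is unavoidable, acquire it in the constructor body,
// where a local try-block can release it before letting the exception escape.
class FixedB {
    char* buf_;
public:
    explicit FixedB(std::size_t n) : buf_(nullptr) {
        buf_ = new char[n];
        try {
            // ... further initialization that might throw ...
        } catch (...) {
            delete[] buf_;                  // clean up the unmanaged resource locally
            throw;                          // then let the exception propagate
        }
    }
    ~FixedB() { delete[] buf_; }
    FixedB(const FixedB&) = delete;         // keep ownership simple for this sketch
    FixedB& operator=(const FixedB&) = delete;
};

int main() {}

In practice the first fix is almost always preferable, which is what Moral #7 below boils down to.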

Moral #4: Always clean up unmanaged resource acquisition in local try-block handlers within the constructor or destructor body, never in constructor or destructor function-try-block handlers.

Moral #5: If a constructor has an exception specification, that exception specification must allow for the union of all possible exceptions that could be thrown by base and member subobjects. As Holmes might add, “It really must, you know.” (Indeed, this is the way that the implicitly generated constructors are declared; see GotW #69.)

Moral #6: If a constructor of a member object can throw but you can get along without said member, hold it by pointer and use the pointer’s nullness to remember whether you’ve got one or not, as usual. Use the Pimpl idiom to group such “optional” members so you only have to allocate once.
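For illustration (names are hypothetical, not from the GotW example), holding such an “optional” member by pointer might look like this: the member is acquired in the constructor body, and nullness afterwards simply means “not available”.

#include <memory>
#include <stdexcept>

struct Cache {                       // a member whose constructor may throw
    Cache() { throw std::runtime_error("cache unavailable"); }
};

class Widget {
    std::unique_ptr<Cache> cache_;   // null <=> we are running without a cache
public:
    Widget() {
        try {
            cache_ = std::make_unique<Cache>();
        } catch (...) {
            // We can get along without the cache, so swallow the failure here.
        }
    }
    bool has_cache() const { return cache_ != nullptr; }
};

int main() { Widget w; }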

And finally, one last moral that overlaps with the above but is worth restating in its own right:

Moral #7: Prefer using “resource acquisition is initialization” to manage resources. Really, really, really. It will save you more headaches than you can probably imagine.

Click to read more ...

padding

  • ARIES algorithm

    aka /ˈeriz/, pronounced like Aries, the zodiac sign

    • A Transaction Recovery Method Supporting Fine Granularity Locking and Partial Rollback Using Write Ahead Logging

Click to read more ...

SVM

A Brief Analysis of SVM

Primal form

The primal problem is

\[\min_{\omega, b}\ \frac{1}{2}\|\omega\|^2 \quad \text{s.t.}\quad y_i(\omega^T x_i + b) - 1 \ge 0,\quad i = 1, \dots, m.\]

Introducing Lagrange multipliers $\alpha_i \ge 0$ (for this convex problem the KKT conditions are sufficient for optimality, which leads to the dual problem):

\[L(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_{i = 1}^{m}\alpha_i\bigl(y_i(\omega^T x_i + b) - 1\bigr).\]

Setting the derivatives with respect to $\omega$ and $b$ to zero gives

\[\omega = \sum_{i=1}^{m}\alpha_i y_i x_i, \qquad 0 = \sum_{i = 1}^{m}\alpha_i y_i.\]

Substituting these back and rearranging yields the dual problem

\[\max_{\alpha}\ \sum_{i = 1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j \quad \text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i = 1}^{m}\alpha_i y_i = 0.\]
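Explicitly, the substitution step (plugging $\omega = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$ into $L$) is

\[L = \frac{1}{2}\Bigl\|\sum_{i=1}^{m}\alpha_i y_i x_i\Bigr\|^2 - \sum_{i=1}^{m}\alpha_i y_i\Bigl(\sum_{j=1}^{m}\alpha_j y_j x_j\Bigr)^{T} x_i - b\sum_{i=1}^{m}\alpha_i y_i + \sum_{i=1}^{m}\alpha_i = \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j,\]

which is exactly the objective of the dual problem above.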

Click to read more ...