Cheatsheet for Backpropagation
Mind Map of Reinforcement Learning
Dec 2016/Jun 2017. Special Talks on Scientific Writing [slides]
27 Mar 2017
11 May/08 Jun
Deep learning goes far beyond CNNs, RNNs, etc. In these two seminars, Yunchuan and I introduced several recent techniques for sequence (sentence) generation, including sampling approaches, reinforcement learning, and variational autoencoding.
23/27 Apr
Trans* is a family of methods for learning vector representations of entities and relations in a knowledge base (or a knowledge graph---don't ask me the difference). It all started from ||h + r - t||.
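A minimal sketch of the TransE-style scoring that this family builds on (hypothetical NumPy code with toy embeddings, not the seminar's material):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings for illustration only (in practice they are learned).
dim = 4
h, r, t = (rng.standard_normal(dim) for _ in range(3))  # head, relation, tail

def transe_score(h, r, t, norm=2):
    """TransE dissimilarity: a true triple (h, r, t) should satisfy h + r ~ t,
    so a smaller ||h + r - t|| means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm)

print(transe_score(h, r, t))
```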
04 Apr 2016
(Courtesy of Yunchuan) A combination of a convolutional neural network (CNN) and Monte Carlo tree search (MCTS).
27 Mar 2016
A series of work from Noah's Ark Lab, Huawei. My understanding is that they design a (complicated) neural network to mimic human behaviors: modeling a sentence, querying a table/KB, selecting a field/column, selecting a row, copying something, etc. Challenges of end-to-end neuralized learning include differentiability, supervision, and scalability.
26 Mar 2016
(Courtesy of Yu Wu) Ankyrin G (AnkG) plays a critical role at the axon initial segment (AIS). AnkG downregulation impairs the selective filtering machinery at the AIS. Impaired AIS filtering might underlie functional defects in APP/PS1 neurons. Disclaimer: I am not an expert in neuroscience.
08 Mar 2016
A combination of neural networks and game theory. Imagine that we have two agents, a Generator and a Discriminator: G generates fake samples, while D tries to tell these fake samples from real ones. The objective is min_G max_D V(D, G).
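For reference, the standard minimax value function from the original GAN formulation (Goodfellow et al., 2014) is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```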
28 Oct 2015
Variational Autoencoders
(By Yunchuan) Variational autoencoders give a distribution over the hidden variables z, while traditional autoencoders compute z in a deterministic way. But why is it useful in practice?
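A minimal sketch of that difference, assuming a Gaussian q(z|x) and the reparameterization trick (hypothetical NumPy code, just to contrast stochastic and deterministic codes):

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_encode(x, W):
    # Traditional autoencoder: z is a fixed function of x.
    return np.tanh(W @ x)

def variational_encode(x, W_mu, W_logvar):
    # VAE: the encoder outputs a distribution q(z|x) = N(mu, sigma^2);
    # z is sampled via the reparameterization trick z = mu + sigma * eps.
    mu = W_mu @ x
    sigma = np.exp(0.5 * (W_logvar @ x))
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

x = rng.standard_normal(5)
W, W_mu, W_logvar = (rng.standard_normal((3, 5)) for _ in range(3))
print(deterministic_encode(x, W))             # the same z on every call
print(variational_encode(x, W_mu, W_logvar))  # a different z on each call
```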
28 Oct 2015
Including EasyAdapt, instance weighting, and structural correspondence learning. I am, in fact, curious about adaptation in neural network-based settings. However, NNs are adaptable by their incremental/multi-task training nature. Therefore, there is little point, as far as I can currently see, in NN adaptation. Nevertheless, I have conducted a series of comparative studies to shed more light on transferring knowledge in neural networks [pdf (EMNLP-16)].
21 Oct 2015
Let x be the visible variables and z the invisible (hidden) ones. Estimating p(x) is usually difficult because we have to sum/integrate over z. The variational lower bound peaks when the approximating distribution q(z) equals the true posterior p(z|x), which is oftentimes intractable. The mixture of Gaussians, for example, restricts the latent variables to a parametric form, i.e., Gaussian. In VI in general, we still have to restrict the form of q(z), but not in a parametric way. A typical approximation is factorization, that is, q(z) = ∏_i q_i(z_i).
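For reference, the standard decomposition behind this (as in PRML Ch. 10): for any distribution q(z),

```latex
\ln p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\ln \frac{p(x, z)}{q(z)}\right]}_{\mathcal{L}(q),\ \text{the variational lower bound}}
\;+\; \mathrm{KL}\!\bigl(q(z)\,\|\,p(z \mid x)\bigr)
```

Since the KL term is non-negative, L(q) is a lower bound on ln p(x) and is tight exactly when q(z) = p(z|x); mean-field VI instead restricts q to the factorized family q(z) = ∏_i q_i(z_i).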
14 Oct 2015
Attention-based Networks
(By Hao Peng) The encoder-decoder model opens a new era of sequence generation. It is unrealistic, however, to encode a very long input sequence into a fixed vector. The attention mechanism is designed to aggregate information over the input sequence by an adaptive weighted sum. Selected Papers: NIPS'14, pp. 3104--3112; ICLR'15; ICML'15; EMNLP'15, pp. 319--389; EMNLP'15, pp. 1412--1421.
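A minimal sketch of such an adaptive weighted sum, using an additive-style scoring function (hypothetical NumPy code; the matrices W, U and vector v are made-up parameters, not taken from the cited papers):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attend(decoder_state, encoder_states, W, U, v):
    # Score each encoder state against the current decoder state,
    # normalize the scores, and return the weighted sum (the context vector).
    scores = np.array([v @ np.tanh(W @ decoder_state + U @ h)
                       for h in encoder_states])
    weights = softmax(scores)                           # attention weights, sum to 1
    context = (weights[:, None] * encoder_states).sum(axis=0)
    return context, weights

rng = np.random.default_rng(0)
d = 4
encoder_states = rng.standard_normal((6, d))            # 6 input positions
decoder_state = rng.standard_normal(d)
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)
context, weights = attend(decoder_state, encoder_states, W, U, v)
print(weights, context)
```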
14 Oct 2015
We wrap up discourse analysis with PCFG-based discourse parsing, which relies on probabilistic context-free grammars.
23 Sep 2015
We shall also explore various NLP research topics, and discourse analysis, discussed in this seminar, precedes our horizon expansion. Note that the slides are nothing but snapshots of papers in the proceedings and, in fact, have little substance.
22 Jul 2015
I am a tyro in variational inference. Please refer to Ch. 10 of Pattern Recognition and Machine Learning.
Bad news: Thursday evening's seminars are suspended temporarily.
Ch 1: Losses, Risks and Decision Principles
Resources:
My textual digest, highlighting some meaningful philosophical discussion in the textbook.
Ch 3: Prior Information and Subjective Probability [digest, note, slide by Dr. Yu]
Frequentist vs Bayesian
30 Apr 2015
1. (By Yangyang Lu) A guided tour of selected papers.
2. Equipped with Bayesian logistic regression and GPs in general, we find that GP classification is easy, except for the seemingly overwhelming formulas.
29 Apr 2015
(Courtesy of Yunchuan Chen) God does not play dice, but we humans do. As inference in many machine learning models is intractable, we have to resort to approximations, among which are sampling methods. The idea of sampling is straightforward---if we want to estimate p(Head) of a coin, one approach is to work through all the mathematical and physical details, which does not seem to be a good idea; an alternative is to toss the coin many times, giving a fairly good estimate of p(Head). However, how to design efficient sampling algorithms is a $64,000,000 question.
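A minimal sketch of that coin-tossing estimate (a hypothetical Monte Carlo toy example, not the seminar's code):

```python
import random

def estimate_p_head(true_p=0.7, n_tosses=100_000, seed=0):
    """Estimate p(Head) by simulation: toss the coin many times and
    take the empirical frequency of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < true_p for _ in range(n_tosses))
    return heads / n_tosses

print(estimate_p_head())  # close to 0.7, and improves as n_tosses grows
```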
23 Apr 2015
Linear Classification
We first wrap up our discussion of Gaussian processes by introducing hyperparameter learning in kernels. Then we introduce linear classification models, including discriminant functions, probabilistic generative/discriminative models, and Bayesian logistic regression (of special interest). Linear classification is easy---my good old friend, logistic regression, always serves as a baseline method in various applications. Through a systematic study, however, we can grasp the main ideas behind a range of machine learning techniques. This seminar also precedes our future discussion on GP classification.
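As a concrete baseline, here is a minimal logistic regression trained by gradient descent (a hypothetical NumPy sketch on toy data, not the seminar's material):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: two Gaussian blobs in 2D.
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)            # predicted P(y = 1 | x)
    grad_w = X.T @ (p - y) / len(y)   # gradient of the average log loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(w, b, accuracy)
```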
17 Apr 2015
Sum Product Networks
(By Weizhuo Li) On some theoretical aspects of SPNs, e.g., normalization, decomposability, etc. Weizhuo also highlighted a NIPS '11 paper on deep architectures vis-a-vis shallow ones.
16 Apr 2015
(Courtesy of Yangyang Lu)
09 Apr 2015
Gaussian Processes
In this seminar, we introduce Gaussian process regression, which extends Bayesian linear regression with kernels. However, as far as I am concerned, the two models are not equivalent, even with finite basis functions. If I am wrong, please feel free to tell me.
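For reference, the standard GP regression predictive equations (a known result, e.g., PRML Ch. 6): with training inputs X, targets t, kernel matrix K = k(X, X), noise variance sigma^2, and k_* = k(X, x_*), the prediction at a new input x_* is Gaussian with

```latex
\mu_* = \mathbf{k}_*^{\top} \left(K + \sigma^2 I\right)^{-1} \mathbf{t}, \qquad
\sigma_*^2 = k(x_*, x_*) - \mathbf{k}_*^{\top} \left(K + \sigma^2 I\right)^{-1} \mathbf{k}_* + \sigma^2
```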
14 Jan 2015
(Courtesy of Weizhuo Li) Sum product networks (SPNs) are a way of decomposing joint distributions. Most inference is tractable w.r.t. the size of the SPN. However, it seems that graphical models, if converted to SPNs, may have exponentially many nodes. The story confirms the "no free lunch" theorem. As, in general, no perfect "I-map" exists for most real-world applications, what we have to do is capture the important aspects while ignoring the unimportant ones.
7 Jan 2015
One of the core ideas in deep learning is to "do things wrongly and hope they work." G. Hinton introduced the CD-k algorithm for fast training of restricted Boltzmann machines; he also introduced layer-wise RBM pretraining for neural networks, opening an era of deep learning.
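A minimal sketch of one CD-1 weight update for a binary RBM (hypothetical NumPy code; biases are omitted for brevity, and CD-k would simply repeat the Gibbs step k times):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(v0, W):
    # Positive phase: hidden activations given the data.
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visibles, then recompute hidden probabilities.
    pv1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # Approximate gradient: data correlations minus reconstruction correlations.
    return W + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

v0 = (rng.random(n_visible) < 0.5).astype(float)  # a toy binary visible vector
W = cd1_update(v0, W)
print(W)
```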
19 Dec 2014
Copulas
Given the marginal distributions, the joint distribution is not unique because of all the possible kinds of dependencies among the variables. A copula is defined as a joint distribution on the unit cube with uniform marginals. It can (just can) capture nontrivial dependencies and link marginals with joint distributions. Sklar's theorem says: Copula(Marginals) = Joint.
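Stated a bit more formally (a standard result): for a joint CDF F with marginal CDFs F_1, ..., F_d, there exists a copula C such that

```latex
F(x_1, \ldots, x_d) = C\bigl(F_1(x_1), \ldots, F_d(x_d)\bigr)
```

and C is unique whenever the marginals are continuous.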