July 17

Error Back Propagation in Practice & PRML 5.3

This post records a matrix-form implementation of BP. Since the GPU server hasn't arrived yet, I'm coding it in C++ for now; the matrix-level implementation is the same either way. If the formulas don't render properly, refresh the page a few times.

Back Propagation is simple, but every time I sit down to implement it the details have gone fuzzy again, so this time I'm writing them down for later review. The matrix form is the part that takes some thought. Since I need an Autoencoder anyway, I'll use it as the example; the network is shown below. Once it works, it can be used for Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions (Socher).

Conveniently, the derivation for this example is exactly the one in PRML 5.3, so I can follow PRML directly. Pulling one Autoencoder layer out of the figure above and relabeling it with the notation of PRML Figure 5.7 gives the figure below.

Main formula: $\frac{\partial E_n}{\partial w_{ji}}=\frac{\partial E_n}{\partial a_j}\frac{\partial a_j}{\partial w_{ji}}$, where the right-hand factor is $\frac{\partial a_j}{\partial w_{ji}}=z_i=x_i$.

The left-hand factor: $\delta_j=\frac{\partial E_n}{\partial a_j}=\sum\limits_k \frac{\partial E_n}{\partial a_k}\frac{\partial a_k}{\partial a_{j}}=f'(a_j)\sum\limits_k w_{kj}\delta_k$

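For this autoencoder the error is the squared reconstruction error $E_n=\frac{1}{2}\sum\limits_k(y_k-t_k)^2$, the output units are linear, and the hidden units use $\tanh$, so $f'(a_j)=1-\tanh^2(a_j)=1-z_j^2$. The general formulas above therefore reduce to the concrete versions used in the steps below (PRML 5.65 and 5.66):

$\delta_k=y_k-t_k, \qquad \delta_j=(1-z_j^2)\sum\limits_k w_{kj}\delta_k$
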
Now let's use the formulas above to work through the example in PRML 5.3.2. Setup:

Language: C++ (VS2012). Matrix library: MTL 4, which depends on boost. initializer_list causes a compile error under VS2012; either install the Microsoft Visual C++ Compiler Nov 2012 CTP, or simply download the file mentioned in this post into the include directory.

Notation:
WS = WORD_EMBEDDING_SIZE
$w^{(1)}$ = mtr_encode
$w^{(2)}$ = mtr_decode
$\delta_k$ = vec_delta_out
$\delta_j$ = vec_delta_hidden
$x_i$ = vec_input
$z_i$ = vec_rep
$y_k$ = vec_output
 

The diagram is below; all vectors are column vectors:

First, the forward pass:

mtl_col_vector vec_input(2*WORD_EMBEDDING_SIZE, 0.0);   // x
mtl_col_vector vec_output(2*WORD_EMBEDDING_SIZE, 0.0);  // y
mtl_col_vector vec_a(WORD_EMBEDDING_SIZE, 0.0);         // hidden pre-activations a_j
mtl_col_vector vec_rep(WORD_EMBEDDING_SIZE, 0.0);       // hidden representation z = tanh(a)

// row count = size of the result vector,
// column count = size of the input vector
mtl_row_matrix mtr_encode(WORD_EMBEDDING_SIZE, 2*WORD_EMBEDDING_SIZE, 0.0);
mtl_row_matrix mtr_decode(2*WORD_EMBEDDING_SIZE, WORD_EMBEDDING_SIZE, 0.0);

vec_a = mtr_encode * vec_input;

// hidden activation: z_j = tanh(a_j)
for(int j=0; j<WORD_EMBEDDING_SIZE; j++)
    vec_rep[j] = (exp(vec_a[j])-exp(-1.0*vec_a[j]))/(exp(vec_a[j])+exp(-1.0*vec_a[j]));

vec_output = mtr_decode * vec_rep;   // linear output layer
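
In matrix form this forward pass is simply $\mathbf{z}=\tanh(W^{(1)}\mathbf{x})$ followed by $\mathbf{y}=W^{(2)}\mathbf{z}$, with $\tanh$ applied element-wise; the explicit loop above is just computing $\tanh(a_j)$ component by component.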

Now Back Propagation. Step one:

$\delta_k=y_k-t_k$  (PRML 5.65)

mtl_col_vector vec_delta_out(2*WORD_EMBEDDING_SIZE, 0.0);
vec_delta_out = vec_output - vec_input;   // for the autoencoder the target is the input, t = x

Step two, the key step: computing $\delta_j$,

$\delta_j=(1-z_j^2)\sum\limits_{k=1}^{K}w_{kj}\delta_k$  (PRML 5.66)
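
In matrix form this is $\boldsymbol{\delta}^{\text{hidden}}=(\mathbf{1}-\mathbf{z}\circ\mathbf{z})\circ\big((W^{(2)})^T\boldsymbol{\delta}^{\text{out}}\big)$, where $\circ$ denotes the element-wise product; the code below computes the two factors separately and then multiplies them component by component.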

// left factor: 1 - z_j^2
mtl_col_vector vec_rep_square(WORD_EMBEDDING_SIZE, 0.0);
for(int j=0; j<WORD_EMBEDDING_SIZE; j++)
    vec_rep_square[j] = 1.0 - vec_rep[j]*vec_rep[j];

// right factor: sum_k w_kj * delta_k, i.e. trans(W2) * delta_out
mtl_col_vector vec_sigma_kj(WORD_EMBEDDING_SIZE, 0.0);
vec_sigma_kj = mtl::matrix::trans(mtr_decode)*vec_delta_out;

// element-wise product of the two factors (didn't look up the library call, so just multiply by hand)
mtl_col_vector vec_delta_hidden(WORD_EMBEDDING_SIZE, 0.0);
for(int j=0; j<WORD_EMBEDDING_SIZE; j++)
    vec_delta_hidden[j] = vec_rep_square[j] * vec_sigma_kj[j];

Step three: update the weights,

$\frac{\partial E_n}{\partial w_{ji}^{(1)}} = \delta_j x_i$ and $\frac{\partial E_n}{\partial w_{kj}^{(2)}} = \delta_k z_j$  (PRML 5.67)
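
Collected over all the elements these gradients are outer products, $\nabla_{W^{(1)}}E_n=\boldsymbol{\delta}^{\text{hidden}}\,\mathbf{x}^T$ and $\nabla_{W^{(2)}}E_n=\boldsymbol{\delta}^{\text{out}}\,\mathbf{z}^T$, which is exactly what the code below forms (scaled by the learning rate) before subtracting them from the weights.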

// LEARN_RATE is the (constant) learning rate eta
mtl_row_matrix mtr_adjust_encode(WORD_EMBEDDING_SIZE, 2*WORD_EMBEDDING_SIZE, 0.0);
mtl_row_matrix mtr_adjust_decode(2*WORD_EMBEDDING_SIZE, WORD_EMBEDDING_SIZE, 0.0);

// outer products: eta * delta_hidden * x^T and eta * delta_out * z^T
mtr_adjust_encode = (LEARN_RATE*vec_delta_hidden) * mtl::vector::trans(vec_input);
mtr_adjust_decode = (LEARN_RATE*vec_delta_out) * mtl::vector::trans(vec_rep);

// gradient descent step
mtr_encode -= mtr_adjust_encode;
mtr_decode -= mtr_adjust_decode;

 

Finally, we should check the gradient; see the Stanford UFLDL course.

// gradient checking
// NOTE: this check should be run before the weight update above is applied,
// so that the analytic gradient (w1) and the numeric difference use the same weights.
mtl_row_matrix w1, w2;
w1 = mtr_adjust_encode/LEARN_RATE;   // analytic gradient dE/dW1
w2 = mtr_adjust_decode/LEARN_RATE;   // analytic gradient dE/dW2
double epsilon = 1e-5;

// reconstruction error at the unperturbed weights
double derror = 0.0;
for(int k=0; k<2*WORD_EMBEDDING_SIZE; k++)
	derror += (vec_input[k]-vec_output[k])*(vec_input[k]-vec_output[k])/2.0;

for(int i=0; i<WORD_EMBEDDING_SIZE; i++)
{
	for(int j=0; j<WORD_EMBEDDING_SIZE*2; j++)
	{
		mtl_row_matrix w_offset = mtr_encode;
		w_offset[i][j] -= epsilon;

		// forward pass with the perturbed encoder weights
		mtl_col_vector vec_tmp_a(WORD_EMBEDDING_SIZE, 0.0);
		mtl_col_vector vec_tmp_rep(WORD_EMBEDDING_SIZE, 0.0);
		vec_tmp_a = w_offset * vec_input;

		for(int k=0; k<WORD_EMBEDDING_SIZE; k++)
			vec_tmp_rep[k] = (exp(vec_tmp_a[k])-exp(-1.0*vec_tmp_a[k]))/(exp(vec_tmp_a[k])+exp(-1.0*vec_tmp_a[k]));

		mtl_col_vector vec_tmp_out(2*WORD_EMBEDDING_SIZE, 0.0);
		vec_tmp_out = mtr_decode * vec_tmp_rep;

		// reconstruction error at the perturbed weights
		double tmp_error = 0.0;
		for(int k=0; k<2*WORD_EMBEDDING_SIZE; k++)
			tmp_error += (vec_input[k]-vec_tmp_out[k])*(vec_input[k]-vec_tmp_out[k])/2.0;

		// one-sided finite difference: (E(w) - E(w - eps)) / eps ~ dE/dw
		double gi = (derror - tmp_error) / epsilon;

		// should be close to the analytic gradient
		cout<<gi<<" "<<w1[i][j]<<endl;
	}
}
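
For comparison, the UFLDL notes recommend a centered difference, $\frac{\partial E}{\partial w_{ij}}\approx\frac{E(w_{ij}+\epsilon)-E(w_{ij}-\epsilon)}{2\epsilon}$, which is more accurate than the one-sided difference used above. Below is a small self-contained sketch of that check in plain C++ without MTL; the type names, helper functions, toy sizes, and initial values here are my own for illustration and are not taken from the code above.

// Self-contained centered-difference gradient check for the encoder weights.
// All names here (Vec, Mat, reconstruction_error, ...) are illustrative only.
#include <cmath>
#include <cstdio>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;   // row-major: M[row][col]

// y = M * x
static Vec matvec(const Mat& M, const Vec& x) {
    Vec y(M.size(), 0.0);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += M[i][j] * x[j];
    return y;
}

// E = 1/2 * sum_k (x_k - y_k)^2  with  y = W2 * tanh(W1 * x)
static double reconstruction_error(const Mat& W1, const Mat& W2, const Vec& x) {
    Vec a = matvec(W1, x);
    Vec z(a.size());
    for (size_t j = 0; j < a.size(); ++j) z[j] = std::tanh(a[j]);
    Vec y = matvec(W2, z);
    double e = 0.0;
    for (size_t k = 0; k < x.size(); ++k) e += 0.5 * (x[k] - y[k]) * (x[k] - y[k]);
    return e;
}

int main() {
    const int n = 4, h = 2;   // toy sizes: input 2*WS = 4, hidden WS = 2
    Vec x(n);
    for (int i = 0; i < n; ++i) x[i] = 0.1 * (i + 1);
    Mat W1(h, Vec(n)), W2(n, Vec(h));
    for (int i = 0; i < h; ++i) for (int j = 0; j < n; ++j) W1[i][j] = 0.01 * (i + j + 1);
    for (int i = 0; i < n; ++i) for (int j = 0; j < h; ++j) W2[i][j] = 0.02 * (i - j);

    // analytic gradient for W1 via PRML 5.65-5.67
    Vec a = matvec(W1, x);
    Vec z(h);
    for (int j = 0; j < h; ++j) z[j] = std::tanh(a[j]);
    Vec y = matvec(W2, z);
    Vec delta_out(n), delta_hidden(h);
    for (int k = 0; k < n; ++k) delta_out[k] = y[k] - x[k];          // 5.65
    for (int j = 0; j < h; ++j) {
        double s = 0.0;
        for (int k = 0; k < n; ++k) s += W2[k][j] * delta_out[k];
        delta_hidden[j] = (1.0 - z[j] * z[j]) * s;                   // 5.66
    }

    // centered finite difference, one weight at a time
    const double eps = 1e-5;
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < n; ++j) {
            Mat Wp = W1, Wm = W1;
            Wp[i][j] += eps;
            Wm[i][j] -= eps;
            double numeric = (reconstruction_error(Wp, W2, x)
                            - reconstruction_error(Wm, W2, x)) / (2.0 * eps);
            double analytic = delta_hidden[i] * x[j];                // 5.67
            std::printf("W1[%d][%d]  numeric=%.8f  analytic=%.8f\n", i, j, numeric, analytic);
        }
    }
    return 0;
}

Compiling and running this should print numeric and analytic values that agree to several decimal places; a large discrepancy points to a bug in the backward pass.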

TestDemo:

July 09

Deep Learning for NLP: Paper List

To be filled in gradually.
Most of the papers are from:
(plus related work found in those papers)
 
Word Embedding Learning
Antoine Bordes, et al. 【AAAI'11】Learning Structured Embeddings of Knowledge Bases
our model learns one embedding for each entity (i.e. one low dimensional vector) and one operator for each relation (i.e. a matrix).
Ronan Collobert, et al.【JMLR'12】Natural Language Processing (Almost) from Scratch
 
To-read list:
Semi-supervised learning of compact document representations with deep networks
【UAI'13】Modeling Documents with a Deep Boltzmann Machine
 
Language Model
PhD thesis: Statistical Language Models based on Neural Networks (the author also seems to have a paper at ICASSP)
 
Sentiment
 
Other NLP (the following are from Socher's homepage):
Parsing with Compositional Vector Grammars
Better Word Representations with Recursive Neural Networks for Morphology
Semantic Compositionality through Recursive Matrix-Vector Spaces
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks
Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing
 
Tutorials
Ronan Collobert and Jason Weston【NIPS'09】Deep Learning for Natural Language Processing
Richard Socher, et al.【NAACL'13】【ACL'12】Deep Learning for NLP
Yoshua Bengio【ICML'12】Representation Learning
Leon Bottou, Natural language processing and weak supervision
 
June 03

Deep Learning Study Materials

I posted a similar entry a year ago, but unfortunately never followed through on it. This time I really intend to study properly, and from now on related materials will be recorded here.
1. hinton的主页: http://www.cs.toronto.edu/~hinton/
2. Deep Learning 主页: http://deeplearning.net/
3. http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial — I had this gem bookmarked all along and somehow forgot about it; finally remembered. A resource list that left this out would be a joke.
 
Other odds and ends:
http://www.iro.umontreal.ca/~lisa/publications2/index.php/authors/show/1
 
Papers:
Hinton,【AI'89】Connectionist learning procedures
The dropout paper: Improving neural networks by preventing co-adaptation of feature detectors
【RNN】Distributed Representations, Simple Recurrent Networks, and Grammatical Structure
【RAE】Recursive distributed representations
【RAE】Linear Recursive Distributed Representations
 
【CNN】Gradient-Based Learning Applied to Document Recognition
【CNN】Notes on Convolutional Neural Network
【CNN】Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis
【CNN】【NIPS'12】Imagenet classification with deep convolutional neural networks
April 18

Deep Belief Networks Study Materials

Compared with LDA, DBN is still relatively new and relatively hot. LDA has been done to death, so I study it mainly to build a solid baseline; DBN is what I actually need to use, so I want to learn it well and hopefully find some inspiration. There is plenty of LDA study material online, but relatively little for DBN, so I'll record what I find here:

1. hinton的主页: http://www.cs.toronto.edu/~hinton/

2. Deep Learning 主页: http://deeplearning.net/