Update 20201130 An attempt at implementing char-rnn with PyTorch.md
temp work save
This commit is contained in:
parent 6a5ec16cb7
commit 6b60d003c8
@@ -7,28 +7,29 @@
[#]: via: (https://jvns.ca/blog/2020/11/30/implement-char-rnn-in-pytorch/)
[#]: author: (Julia Evans https://jvns.ca/)

An attempt at implementing char-rnn with PyTorch
======

Hello! I spent a bunch of time in the last couple of weeks implementing a version of [char-rnn][1] with PyTorch. I’d never trained a neural network before so this seemed like a fun way to start.

The idea here (from [The Unreasonable Effectiveness of Recurrent Neural Networks][1]) is that you can train a character-based recurrent neural network on some text and get surprisingly good results.

I didn’t quite get the results I was hoping for, but I wanted to share some example code & results in case it’s useful to anyone else getting started with PyTorch and RNNs.

Here’s the Jupyter notebook with the code: [char-rnn in PyTorch.ipynb][2]. If you click “Open in Colab” at the top, you can open it in Google’s Colab service, where at least right now you can get a free GPU to do training on. The whole thing is maybe 75 lines of code, which I’ll attempt to somewhat explain in this blog post.

### step 1: prepare the data

First up: we download the data! I used [Hans Christian Andersen’s fairy tales][3] from Project Gutenberg.

```
!wget -O fairy-tales.txt
```

Here’s the code to prepare the data. I’m using the `Vocab` class from fastai, which can turn a bunch of letters into a “vocabulary” and then use that vocabulary to turn letters into numbers.

Then we’re left with a big array of numbers (`training_set`) that we can use to train a model.

```
from fastai.text import *
@@ -39,14 +40,13 @@ num_letters = len(v.itos)
```

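The diff only shows a fragment of that cell. As a rough sketch of the same letter-to-number idea without depending on fastai, the variable names below mirror the fragment (`itos`, `num_letters`, `training_set`) but the code is my own approximation, not the notebook’s:

```
import torch

# Build a character-level "vocabulary": every distinct character gets an integer index.
text = open("fairy-tales.txt").read()
itos = sorted(set(text))                     # index -> character
stoi = {ch: i for i, ch in enumerate(itos)}  # character -> index
num_letters = len(itos)

# Turn the whole text into one long tensor of character indices.
training_set = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
print(training_set.shape, num_letters)
```
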
### step 2: define a model

This is a wrapper around PyTorch’s LSTM class. It does 3 main things in addition to just wrapping the LSTM class:

  1. one hot encode the input vectors, so that they’re the right dimension (see the small example after this list)
  2. add another linear transformation after the LSTM, because the LSTM outputs a vector with size `hidden_size`, and we need a vector that has size `input_size` so that we can turn it into a character
  3. save the LSTM hidden vector (which is actually 2 vectors) as an instance variable and run `.detach()` on it after every round. (I struggle to articulate what `.detach()` does, but my understanding is that it kind of “ends” the calculation of the derivative of the model. Translator’s note: `.detach()` returns a tensor that is cut off from the computation graph, with `requires_grad` set to False, so backpropagation stops at that tensor.)

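For instance, here is a tiny standalone example of item 1, with an assumed vocabulary of 5 characters; `torch.nn.functional.one_hot` turns character indices into vectors of the right dimension:

```
import torch
import torch.nn.functional as F

x = torch.tensor([2, 0, 3])                # three character indices
print(F.one_hot(x, num_classes=5).float())
# tensor([[0., 0., 1., 0., 0.],
#         [1., 0., 0., 0., 0.],
#         [0., 0., 0., 1., 0.]])
```
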
```
class MyLSTM(nn.Module):
@@ -68,23 +68,21 @@ class MyLSTM(nn.Module):
        return self.h2o(l_output)
```

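The class body is mostly elided in this diff, so here is a minimal sketch of a wrapper that does the three things listed above; the class name, layer sizes, and exact method layout are my assumptions, not the notebook’s code:

```
import torch
from torch import nn
import torch.nn.functional as F

class CharLSTM(nn.Module):
    """Sketch of an LSTM wrapper: one-hot input, linear output layer, stored hidden state."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.input_size = input_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.h2o = nn.Linear(hidden_size, input_size)  # hidden vector -> one score per character
        self.hidden = None                             # the LSTM state (h, c) from the last call

    def forward(self, x):
        # x: LongTensor of character indices, shape (batch, seq_len)
        one_hot = F.one_hot(x, num_classes=self.input_size).float()
        output, self.hidden = self.lstm(one_hot, self.hidden)
        # Detach the stored state so the next call doesn't backpropagate
        # through everything that came before.
        self.hidden = tuple(h.detach() for h in self.hidden)
        return self.h2o(output)
```
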
This code also does something kind of magical that isn’t obvious at all – if you pass it in a vector of inputs (like [1,2,3,4,5,6]), corresponding to 6 letters, my understanding is that `nn.LSTM` will internally update the hidden vector 6 times using [backpropagation through time][4].

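For example (my own toy illustration using the `CharLSTM` sketch above, with arbitrary sizes), passing a 6-character chunk in a single call makes the LSTM step through all 6 positions internally:

```
model = CharLSTM(input_size=100, hidden_size=128)
chunk = torch.tensor([[1, 2, 3, 4, 5, 6]])   # one batch of 6 character indices
out = model(chunk)
print(out.shape)   # torch.Size([1, 6, 100]): one prediction vector per input position
```
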
### step 3: write some training code

This model won’t just train itself!

I started out trying to use a training helper class from the `fastai` library (which is a wrapper around PyTorch). I found that kind of confusing because I didn’t understand what it was doing, so I ended up writing my own training code.

Here’s some code to show basically what 1 round of training (the `epoch()` method) looks like. It repeatedly does the following:

  1. Give the RNN a string like `and they ought not to teas` (as a vector of numbers, of course)
  2. Get the prediction for the next letter
  3. Compute the loss between what the RNN predicted and the real next letter (`e`, because “tease” ends in `e`)
  4. Calculate the gradient (`loss.backward()`)
  5. Change the weights in the model in the direction that reduces the loss (`self.optimizer.step()`)

```
class Trainer():
@@ -107,6 +105,9 @@ class Trainer():
```

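The body of `Trainer` is elided in this diff, so here is a minimal sketch of a training loop along the lines of the five steps above; the chunk size, optimizer, learning rate, and loss choice are my assumptions rather than the notebook’s actual code:

```
import torch
from torch import nn

class Trainer():
    """Sketch of a tiny training loop for the CharLSTM sketch above (not the post's actual Trainer)."""

    def __init__(self, model, training_set, chunk_size=40, lr=0.001):
        self.model = model
        self.training_set = training_set          # one long LongTensor of character indices
        self.chunk_size = chunk_size
        self.loss_fn = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    def epoch(self):
        for start in range(0, len(self.training_set) - self.chunk_size - 1, self.chunk_size):
            chunk = self.training_set[start : start + self.chunk_size]           # 1. the input string
            target = self.training_set[start + 1 : start + self.chunk_size + 1]  #    ...and the real "next" letters
            self.optimizer.zero_grad()
            prediction = self.model(chunk.unsqueeze(0))          # 2. predictions, shape (1, chunk_size, num_letters)
            loss = self.loss_fn(prediction.squeeze(0), target)   # 3. loss between predictions and real next letters
            loss.backward()                                      # 4. calculate the gradient
            self.optimizer.step()                                # 5. update the weights
```

Training is then just a matter of calling `trainer.epoch()` in a loop for however many epochs you want.
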
### let `nn.LSTM` do backpropagation through time, don’t do it myself

Originally I wrote my own code to pass in 1 letter at a time to the LSTM and then periodically compute the derivative, kind of like this:
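
The code that followed is elided in this diff; a hypothetical reconstruction of the pattern described (one character per call, calling `backward()` only every so often, reusing the `CharLSTM` sketch, `num_letters`, and `training_set` from earlier) might look roughly like:

```
import torch
from torch import nn

# Hypothetical sketch, not the post's actual code.
model = CharLSTM(input_size=num_letters, hidden_size=128)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

loss = 0
for i in range(len(training_set) - 1):
    letter = training_set[i].reshape(1, 1)    # a single character index
    target = training_set[i + 1].reshape(1)   # the character that should come next
    prediction = model(letter).squeeze(0)     # shape (1, num_letters)
    loss = loss + loss_fn(prediction, target)
    if (i + 1) % 40 == 0:                     # "periodically compute the derivative"
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss = 0
```
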
@@ -179,7 +180,8 @@ my results are nowhere near as good as Karpathy’s so far, maybe due to one of
But I got some vaguely coherent results! Hooray!

--------------------------------------------------------------------------------