Mirror of https://github.com/LCTT/TranslateProject.git, synced 2025-01-25 23:11:02 +08:00

Update 20201130 An attempt at implementing char-rnn with PyTorch.md

temp work

Parent 6b60d003c8, commit 326eac0656
@@ -108,8 +108,7 @@ class Trainer():

### use `nn.LSTM` for backpropagation through time, don't write the code yourself

Originally I wrote my own code to pass 1 letter at a time into the LSTM and then periodically compute the derivative, kind of like this:

```
for i in range(20):
@@ -122,19 +121,20 @@ loss.backward()
self.optimizer.step()
```

This passes in 20 letters (one at a time), and then takes a training step at the end. This is called [backpropagation through time][4], and Karpathy mentions using this method in his blog post.

This kind of worked, but my loss would go down for a while and then spike later in training. I still don’t understand why this happened, but when I switched to passing in 20 characters at a time to the LSTM (as the `seq_len` dimension) and letting it do the backpropagation itself, things got a lot better.
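Here is a minimal sketch of the second approach, assuming one-hot encoded input and an `nn.LSTM` followed by a linear layer. `TinyCharRNN`, the sizes, and the random data are made up for illustration; this is not the post's actual `Trainer` code. The whole 20-character chunk goes in as the `seq_len` dimension, and a single `loss.backward()` call backpropagates through all 20 steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes (assumptions, not values from the post)
vocab_size, hidden_size, seq_len, batch_size = 30, 64, 20, 1


class TinyCharRNN(nn.Module):  # hypothetical model for this sketch
    def __init__(self):
        super().__init__()
        # nn.LSTM expects input shaped (seq_len, batch, input_size) by default
        self.lstm = nn.LSTM(vocab_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)
        return self.decoder(out), hidden


model = TinyCharRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-in for text: 21 character indices.
# The inputs are characters 0..19, the targets are characters 1..20.
chars = torch.randint(0, vocab_size, (seq_len + 1, batch_size))
inputs = F.one_hot(chars[:-1], vocab_size).float()  # (20, 1, 30)
targets = chars[1:]                                 # (20, 1)

optimizer.zero_grad()
logits, hidden = model(inputs)                      # all 20 steps in one call
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()   # autograd unrolls the 20 time steps for us
optimizer.step()
```

The point of the sketch is that there is no Python loop over characters: the time dimension lives inside the tensor, so PyTorch handles the unrolling.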

### step 4: train the model!

I reran this training code over the same data maybe 300 times, until I got bored and it started outputting text that looked vaguely like English. This took about an hour or so.

In this case I didn’t really care too much about overfitting, but if you were doing this for a Real Reason it would be good to check the loss on a validation set.
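A rough sketch of what a validation check could look like, assuming the text is already encoded as a tensor of character indices. The untrained `lstm`/`decoder` pair, the 90/10 split, and the `chunk_loss` helper are all hypothetical stand-ins, not code from the post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, seq_len = 30, 20  # illustrative sizes

# Untrained stand-in model (an assumption for this sketch)
lstm = nn.LSTM(vocab_size, 32)
decoder = nn.Linear(32, vocab_size)


def chunk_loss(chars):
    """Average next-character cross-entropy over one chunk of character indices."""
    inputs = F.one_hot(chars[:-1], vocab_size).float().unsqueeze(1)  # (len, 1, vocab)
    out, _ = lstm(inputs)
    logits = decoder(out).squeeze(1)                                 # (len, vocab)
    return F.cross_entropy(logits, chars[1:])


# Random stand-in for encoded text; hold out the last 10% for validation.
data = torch.randint(0, vocab_size, (1000,))
split = int(0.9 * len(data))
train_data, val_data = data[:split], data[split:]

with torch.no_grad():  # only measuring the loss, so no gradients are needed
    val_losses = [
        chunk_loss(val_data[i:i + seq_len + 1])
        for i in range(0, len(val_data) - seq_len, seq_len)
    ]
val_loss = torch.stack(val_losses).mean().item()
```

If the training loss keeps falling while `val_loss` starts rising, that is the usual sign the model is overfitting.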

### step 5: generate some output!

The last thing we need to do is to generate some output from the model! I wrote some helper methods to generate text from the model (`make_preds` and `next_pred`). It’s mostly just trying to get the dimensions of things right, but here’s the important bit:

```
output = rnn(input)
@@ -142,13 +142,11 @@ prediction_vector = F.softmax(output/temperature)
letter = v.textify(torch.multinomial(prediction_vector, 1).flatten(), sep='').replace('_', ' ')
```

Basically what’s going on here is that:

1. The RNN outputs a vector of numbers (`output`), one for each letter/punctuation mark in our alphabet.
2. The `output` vector isn’t **yet** a vector of probabilities, so `F.softmax(output/temperature)` turns it into a bunch of probabilities (aka “numbers that add up to 1”). `temperature` controls how strongly the higher-scoring letters are favored: the lower the temperature, the more the probability concentrates on the top letter, and in the limit, if you set temperature=0.0000001, it’ll always pick the letter with the highest probability.
3. `torch.multinomial(prediction_vector, 1)` takes the vector of probabilities and uses those probabilities to pick an index in the vector (like 12).
4. `v.textify` turns “12” into a letter.
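The four steps above can be sketched with a tiny made-up example. The `alphabet` list and the scores in `output` are invented for illustration, and indexing into `alphabet` stands in for `v.textify`, whose exact behavior is assumed here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

alphabet = ["a", "b", "c", "d"]                # hypothetical 4-letter alphabet
output = torch.tensor([2.0, 1.0, 0.5, 0.1])    # made-up raw scores, one per letter


def sample_letter(output, temperature):
    # Step 2: divide by the temperature, then softmax into probabilities
    probs = F.softmax(output / temperature, dim=-1)
    # Step 3: draw one index according to those probabilities
    index = torch.multinomial(probs, 1).item()
    # Step 4: turn the index back into a letter (stand-in for v.textify)
    return alphabet[index], probs


warm_letter, warm_probs = sample_letter(output, temperature=1.0)
cold_letter, cold_probs = sample_letter(output, temperature=1e-7)
```

With `temperature=1.0` every letter keeps some chance of being picked. With `temperature=1e-7` essentially all of the probability lands on the highest-scoring letter, so sampling behaves like taking the argmax.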

If we want 300 characters worth of text, we just repeat this process 300 times.
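As a sketch of that loop, assuming each sampled character is fed back in as the next input and the LSTM's hidden state is carried along between steps (an assumption about how the helpers work, since their code isn't shown here). The `generate` function, the alphabet, and the untrained model are hypothetical, so the output is gibberish; only the shape of the loop matters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

alphabet = "abcdefghijklmnopqrstuvwxyz_"   # hypothetical; '_' stands for a space
vocab_size = len(alphabet)

# Untrained stand-in model (an assumption for this sketch)
lstm = nn.LSTM(vocab_size, 32)
decoder = nn.Linear(32, vocab_size)


def generate(n_chars, start="t", temperature=1.0):
    index = alphabet.index(start)
    hidden = None          # nn.LSTM starts from a zero state when given None
    letters = []
    with torch.no_grad():
        for _ in range(n_chars):
            # One character as a (seq_len=1, batch=1, vocab_size) one-hot tensor
            x = F.one_hot(torch.tensor([[index]]), vocab_size).float()
            out, hidden = lstm(x, hidden)
            logits = decoder(out)[0, 0]
            probs = F.softmax(logits / temperature, dim=-1)
            index = torch.multinomial(probs, 1).item()  # becomes the next input
            letters.append(alphabet[index])
    return "".join(letters).replace("_", " ")


text = generate(300)
```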