Update 20201130 An attempt at implementing char-rnn with PyTorch.md

temp work
zxp 2020-12-11 00:16:05 +08:00
parent 6b60d003c8
commit 326eac0656


@@ -108,8 +108,7 @@ class Trainer():
### use `nn.LSTM` to backpropagate through time, don't write the code yourself
Originally I wrote my own code to pass in 1 letter at a time to the LSTM and then periodically compute the derivative, kind of like this:
```
for i in range(20):
@@ -122,19 +121,20 @@ loss.backward()
self.optimizer.step()
```
This passes in 20 letters (one at a time), and then takes a training step at the end. This is called [backpropagation through time][4] and Karpathy mentions using this method in his blog post.
This kind of worked, but my loss would go down for a while and then kind of spike later in training. I still don't understand why this happened, but when I switched to instead just passing in 20 characters at a time to the LSTM (as the `seq_len` dimension) and letting it do the backpropagation itself, things got a lot better.
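For contrast, here's a minimal sketch of that second approach, where a whole 20-character window goes into `nn.LSTM` as the `seq_len` dimension and a single `backward()` call does the backpropagation through time. This isn't the post's actual model: the layer sizes, the `batch_first` layout, and the random batch are placeholders.
```
# a sketch of "pass in 20 characters at once and let nn.LSTM do BPTT"
# (vocab_size / emb_size / hidden_size and the random batch are made up)
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, emb_size, hidden_size, seq_len = 60, 64, 128, 20

embedding = nn.Embedding(vocab_size, emb_size)
lstm = nn.LSTM(emb_size, hidden_size, batch_first=True)
decoder = nn.Linear(hidden_size, vocab_size)
params = list(embedding.parameters()) + list(lstm.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

# fake batch: one sequence of 21 character indices; the target at each step is the next character
seq = torch.randint(0, vocab_size, (1, seq_len + 1))
inputs, targets = seq[:, :-1], seq[:, 1:]

optimizer.zero_grad()
output, _ = lstm(embedding(inputs))        # output: (batch, seq_len, hidden_size)
logits = decoder(output)                   # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                            # one call backpropagates through all 20 time steps
optimizer.step()
```
The per-timestep loop now lives inside `nn.LSTM`, so the gradient bookkeeping across the 20 steps is handled for you.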
### step 4: train the model!
I reran this training code over the same data maybe 300 times, until I got bored and it started outputting text that looked vaguely like English. This took about an hour or so.
In this case I didn't really care too much about overfitting, but if you were doing this for a Real Reason it would be good to run it on a validation set.
### step 5: generate some output!
The last thing we need to do is to generate some output from the model! I wrote some helper methods to generate text from the model (`make_preds` and `next_pred`). It's mostly just trying to get the dimensions of things right, but here's the main important bit:
```
output = rnn(input)
@@ -142,13 +142,11 @@ prediction_vector = F.softmax(output/temperature)
letter = v.textify(torch.multinomial(prediction_vector, 1).flatten(), sep='').replace('_', ' ')
```
Basically what's going on here is that:
1. the RNN outputs a vector of numbers (`output`), one for each letter/punctuation in our alphabet.
2. The `output` vector isn't **yet** a vector of probabilities, so `F.softmax(output/temperature)` turns it into a bunch of probabilities (aka “numbers that add up to 1”). `temperature` kind of controls how much to weight higher probabilities: in the limit, if you set temperature=0.0000001, it'll always pick the letter with the highest probability (there's a small demo of this right after the list).
3. `torch.multinomial(prediction_vector, 1)` takes the vector of probabilities and uses those probabilities to pick an index in the vector (like 12).
4. `v.textify` turns “12” into a letter.
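As a quick illustration of the temperature point in item 2 (the scores here are made up, not taken from the trained model):
```
import torch
import torch.nn.functional as F

output = torch.tensor([2.0, 1.0, 0.5])        # made-up scores for three letters
print(F.softmax(output / 1.0, dim=0))         # roughly tensor([0.63, 0.23, 0.14]): still spread out
print(F.softmax(output / 0.0000001, dim=0))   # tensor([1., 0., 0.]): always picks the top letter
```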
If we want 300 characters worth of text, we just repeat this process 300 times.
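Here's a self-contained sketch of that loop. The `fake_rnn_step` function and the toy alphabet below are stand-ins for the trained RNN and `v.textify`, which the real `make_preds`/`next_pred` helpers use:
```
import torch
import torch.nn.functional as F

alphabet = list("abcdefghijklmnopqrstuvwxyz_")    # '_' stands in for space, as in the snippet above
temperature = 0.5

def fake_rnn_step(letter_idx):
    # placeholder for "feed the current character into the trained RNN";
    # it just returns one random score per symbol in the alphabet
    return torch.randn(len(alphabet))

current = 0
generated = []
for _ in range(300):                              # 300 characters of output
    output = fake_rnn_step(current)
    prediction_vector = F.softmax(output / temperature, dim=0)
    current = torch.multinomial(prediction_vector, 1).item()
    generated.append(alphabet[current].replace('_', ' '))

print(''.join(generated))
```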