Update 20201130 An attempt at implementing char-rnn with PyTorch.md
temp work
### use `nn.LSTM` to do backpropagation through time, don't write the code yourself
Originally I wrote my own code to pass in 1 letter at a time to the LSTM and then periodically compute the derivative, kind of like this:
```
for i in range(20):
    # pass one letter at a time into the LSTM (rest of the loop body not shown in this hunk)
    ...
loss.backward()
self.optimizer.step()
```
This passes in 20 letters (one at a time), and then takes a training step at the end. This is called [backpropagation through time][4] and Karpathy mentions using this method in his blog post.
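For reference, here is a self-contained sketch of that one-letter-at-a-time pattern; the sizes, the `head` output layer, the fake data, and the optimizer are all assumptions for illustration, not the post's actual training code:

```
import torch
import torch.nn as nn

vocab_size, hidden_size = 64, 128                 # made-up sizes
lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size)
head = nn.Linear(hidden_size, vocab_size)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

# 20 random vectors standing in for 20 one-hot encoded characters, batch size 1
x = torch.randn(20, 1, vocab_size)
y = torch.randint(0, vocab_size, (20,))

hidden = None
loss = 0.0
for i in range(20):
    # pass in one character at a time, carrying the hidden state forward
    output, hidden = lstm(x[i:i+1], hidden)
    loss = loss + loss_fn(head(output[0]), y[i:i+1])

# take one training step at the end of the 20 characters
optimizer.zero_grad()
loss.backward()
optimizer.step()
```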
This kind of worked, but my loss would go down for a while and then kind of spike later in training. I still don’t understand why this happened, but when I switched to instead just passing in 20 characters at a time to the LSTM (as the `seq_len` dimension) and letting it do the backpropagation itself, things got a lot better.
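A minimal sketch of that second approach, with made-up sizes and `batch_first=True`: the whole 20-character window goes into `nn.LSTM` in a single call, and one `backward()` does the backpropagation through time internally.

```
import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len, batch_size = 64, 128, 20, 32   # made-up sizes

lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab_size)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

# fake inputs shaped (batch, seq_len, input_size) and integer targets
x = torch.randn(batch_size, seq_len, vocab_size)
y = torch.randint(0, vocab_size, (batch_size, seq_len))

output, _ = lstm(x)                               # (batch, seq_len, hidden_size)
logits = head(output)                             # (batch, seq_len, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), y.reshape(-1))

optimizer.zero_grad()
loss.backward()                                   # the LSTM handles the unrolling over seq_len
optimizer.step()
```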
### step 4: train the model!
I reran this training code over the same data maybe 300 times, until I got bored and it started outputting text that looked vaguely like English. This took about an hour or so.
In this case I didn’t really care too much about overfitting, but if you were doing this for a Real Reason it would be good to run it on a validation set.
### step 5: generate some output!
The last thing we need to do is to generate some output from the model! I wrote some helper methods to generate text from the model (`make_preds` and `next_pred`). It’s mostly just trying to get the dimensions of things right, but here’s the main important bit:
```
output = rnn(input)
prediction_vector = F.softmax(output/temperature)
letter = v.textify(torch.multinomial(prediction_vector, 1).flatten(), sep='').replace('_', ' ')
```
Basically what’s going on here is that
1. the RNN outputs a vector of numbers (`output`), one for each letter/punctuation in our alphabet.
2. The `output` vector isn’t **yet** a vector of probabilities, so `F.softmax(output/temperature)` turns it into a bunch of probabilities (aka “numbers that add up to 1”). `temperature` kind of controls how much to weight higher probabilities – in the limit, if you set temperature=0.0000001, it’ll always pick the letter with the highest probability (see the short sketch after this list).
3. `torch.multinomial(prediction_vector)` takes the vector of probabilities and uses those probabilities to pick an index in the vector (like 12).
4. `v.textify` turns “12” into a letter.
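To make the `temperature` behaviour concrete, here is a tiny self-contained demo (the sketch mentioned in point 2); the `logits` values are made up, and only the `F.softmax(.../temperature)` and `torch.multinomial` calls correspond to the snippet above:

```
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])   # made-up scores for a 3-letter alphabet

for temperature in (1.0, 0.5, 0.0001):
    probs = F.softmax(logits / temperature, dim=0)
    picks = torch.multinomial(probs, num_samples=20, replacement=True)
    print(temperature, [round(p, 3) for p in probs.tolist()], picks.tolist())
```

At temperature 1.0 the samples are fairly mixed; at 0.0001 essentially every sample is index 0, the highest-scoring letter.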
If we want 300 characters worth of text, we just repeat this process 300 times.