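This idea (from The Unreasonable Effectiveness of Recurrent Neural Networks) lets you train a character-based recurrent neural network on some text and get surprisingly good results.
Here’s the code as a Jupyter notebook: char-rnn in PyTorch.ipynb. If you click the “Open in Colab” button at the top of that page, it opens in Google’s Colab service, where you can train on a free GPU. The whole thing is about 75 lines of code, which I’ll explain in as much detail as I can in this blog post.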
step 1: prepare the data
First, we download the data. I used Hans Christian Andersen’s fairy tales from Project Gutenberg.
!wget -O fairy-tales.txt
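import torch
import unidecode
from fastai.text import *  # Vocab comes from fastai

# Transliterate to plain ASCII so the alphabet stays small
text = unidecode.unidecode(open('fairy-tales.txt').read())
# Build a character-level vocabulary and encode the text as character ids on the GPU
v = Vocab.create((x for x in text), max_vocab=400, min_freq=1)
training_set = torch.Tensor(v.numericalize([x for x in text])).type(torch.LongTensor).cuda()
num_letters = len(v.itos)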
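To make the numericalization step concrete, here’s a minimal sketch in plain Python of what a character vocabulary does. This is not fastai’s implementation: build_vocab, numericalize, and textify below are hypothetical stand-ins written for illustration, and the sample string is made up.

```python
# Illustrative stand-in for a character vocabulary (not fastai's Vocab).

def build_vocab(text):
    """Return (itos, stoi): an index-to-char list and a char-to-index dict."""
    itos = sorted(set(text))                    # one entry per distinct character
    stoi = {ch: i for i, ch in enumerate(itos)}
    return itos, stoi

def numericalize(text, stoi):
    """Turn a string into a list of integer character ids."""
    return [stoi[ch] for ch in text]

def textify(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)

sample = "the beetle"                           # made-up sample text
itos, stoi = build_vocab(sample)
ids = numericalize(sample, stoi)
print(itos)
print(ids)
print(textify(ids, itos))
```

The list of ids plays the role of training_set, and len(itos) plays the role of num_letters.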
step 2: define a model
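The model is a small wrapper class around nn.LSTM that does three extra things:
- One-hot encode the input vectors so that they have the right dimension.
- Add a linear transformation after the LSTM layer, because the LSTM outputs a vector of length hidden_size and we need a vector of length input_size so that it can be turned into a character.
- Save the LSTM’s hidden vector (which is actually 2 vectors) as an instance variable and call .detach() on it after every round.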
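class MyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.input_size = input_size  # needed by one_hot in forward()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        # Map the LSTM output (hidden_size) back to one score per character
        self.h2o = nn.Linear(hidden_size, input_size)
        self.hidden = None

    def forward(self, input):
        # One-hot encode the character ids and add a batch dimension of 1
        input = torch.nn.functional.one_hot(input, num_classes=self.input_size).type(torch.FloatTensor).cuda().unsqueeze(0)
        if self.hidden is None:
            l_output, self.hidden = self.lstm(input)
        else:
            l_output, self.hidden = self.lstm(input, self.hidden)
        # Detach so gradients don't flow back into earlier calls
        self.hidden = (self.hidden[0].detach(), self.hidden[1].detach())
        return self.h2o(l_output)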
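When it is given a sequence of 6 letters, nn.LSTM updates the hidden vector 6 times internally, and the gradient is later propagated back through all 6 steps (backpropagation through time).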
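One way to check that nn.LSTM really does step through a sequence internally is to compare it against stepping by hand. The sketch below runs on the CPU with made-up sizes and random inputs (none of these values come from the notebook): it passes 6 time steps at once, then passes the same 6 steps one at a time while carrying the hidden state forward, and compares the outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, seq_len = 5, 8, 6     # arbitrary illustrative sizes
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

# One batch holding a sequence of 6 random "letters" (CPU only, no .cuda()).
x = torch.randn(1, seq_len, input_size)

# (a) Pass the whole sequence at once.
out_all, (h_all, c_all) = lstm(x)

# (b) Pass one time step at a time, carrying the hidden state forward by hand.
hidden = None
steps = []
for t in range(seq_len):
    out_t, hidden = lstm(x[:, t:t + 1, :], hidden)
    steps.append(out_t)
out_steps = torch.cat(steps, dim=1)

print(out_all.shape)                            # (batch, seq_len, hidden_size)
print(torch.allclose(out_all, out_steps, atol=1e-5))
```

The forward results agree either way. The difference in practice is about who manages the loop and where the hidden state gets detached.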
step 3: write some training code
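Each pass through the loop in epoch() does the following:
- Give the RNN a string like “and they ought not to teas” (as a vector of numbers, of course).
- Get the prediction for the next letter.
- Compute the loss between what the RNN predicted and the real next letter (e, because “tease” ends in e).
- Calculate the gradient (with loss.backward()).
- Change the weights of the model’s parameters in the direction of gradient descent (with self.optimizer.step()).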
class Trainer():
    def __init__(self):
        # input_size, hidden_size and lr are assumed to be defined earlier in the notebook
        self.rnn = MyLSTM(input_size, hidden_size).cuda()
        self.optimizer = torch.optim.Adam(self.rnn.parameters(), amsgrad=True, lr=lr)

    def epoch(self):
        i = 0
        while i < len(training_set) - 40:
            seq_len = random.randint(10, 40)
            input, target = training_set[i:i+seq_len], training_set[i+1:i+1+seq_len]
            i += seq_len
            # forward pass
            output = self.rnn(input)
            # only the prediction for the final character contributes to the loss
            loss = F.cross_entropy(output.squeeze()[-1:], target[-1:])
            # compute gradients and take optimizer step
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
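To see what the slicing in epoch() produces, here’s a small sketch that applies the same indexing to a plain string instead of a tensor of character ids. The windows helper, the shorter window lengths, and the sample sentence are all made up for illustration.

```python
import random

random.seed(0)                                  # fixed seed so the run is repeatable
text = "and they ought not to tease the beetle"  # made-up stand-in for the corpus

def windows(data, min_len, max_len):
    """Yield (input, target) pairs using the same slicing as the training loop."""
    i = 0
    while i < len(data) - max_len:
        seq_len = random.randint(min_len, max_len)
        yield data[i:i + seq_len], data[i + 1:i + 1 + seq_len]
        i += seq_len

for inp, tgt in windows(text, 3, 8):
    # tgt is inp shifted left by one character; only tgt[-1] feeds the loss.
    print(repr(inp), "->", repr(tgt[-1]))
```

Each target is the input shifted by one character, and consecutive windows don’t overlap, so one epoch walks through the text once.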
let nn.LSTM do backpropagation through time, don’t do it myself
Originally I wrote my own code to pass in 1 letter at a time to the LSTM and then periodically compute the derivative, kind of like this:
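for i in range(20):
    input, target = next(iter)
    output, hidden = self.lstm(input, hidden)
# after 20 letters: compute the loss, cut the graph, take a training step
loss = F.cross_entropy(output, target)
hidden = tuple(h.detach() for h in hidden)  # hidden is a (h, c) pair
loss.backward()
self.optimizer.step()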
This passes in 20 letters (one at a time), and then takes a training step at the end. This is called backpropagation through time and Karpathy mentions using this method in his blog post.
This kind of worked, but my loss would go down for a while and then kind of spike later in training. I still don’t understand why this happened, but when I switched to instead just passing in 20 characters at a time to the LSTM (as the seq_len dimension) and letting it do the backpropagation itself, things got a lot better.
step 4: train the model!
I reran this training code over the same data maybe 300 times, until I got bored and it started outputting text that looked vaguely like English. This took about an hour or so.
In this case I didn’t really care too much about overfitting, but if you were doing this for a Real Reason it would be good to run it on a validation set.
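For what it’s worth, a validation set for this kind of data can be as simple as holding out the end of the text. Here’s a minimal sketch; split_train_valid and the 10% fraction are assumptions for illustration, not something from the notebook.

```python
def split_train_valid(data, valid_fraction=0.1):
    """Hold out the final part of a sequence for validation."""
    cut = len(data) - int(len(data) * valid_fraction)
    return data[:cut], data[cut:]

ids = list(range(1000))                         # stand-in for the numericalized text
train_ids, valid_ids = split_train_valid(ids)
print(len(train_ids), len(valid_ids))
```

Splitting at a single cut point, rather than shuffling characters, keeps both pieces as readable runs of text.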
step 5: generate some output!
The last thing we need to do is to generate some output from the model! I wrote some helper methods to generate text from the model (make_preds and next_pred). It’s mostly just trying to get the dimensions of things right, but here’s the main important bit:
output = rnn(input)
prediction_vector = F.softmax(output/temperature)
letter = v.textify(torch.multinomial(prediction_vector, 1).flatten(), sep='').replace('_', ' ')
Basically what’s going on here is that:
- The RNN outputs a vector of numbers (output), one for each letter/punctuation mark in our alphabet.
- The output vector isn’t yet a vector of probabilities, so F.softmax(output/temperature) turns it into a bunch of probabilities (aka “numbers that add up to 1”). temperature kind of controls how much to weight higher probabilities: in the limit, if you set temperature=0.0000001, it’ll always pick the letter with the highest probability.
- torch.multinomial(prediction_vector) takes the vector of probabilities and uses those probabilities to pick an index in the vector (like 12).
- v.textify turns “12” into a letter.
If we want 300 characters worth of text, we just repeat this process 300 times.
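Here’s a rough sketch of temperature sampling in plain Python, to show the effect of temperature without needing the trained model. The three-letter alphabet and the scores are made up, and softmax_with_temperature and sample_index are hypothetical helpers, not the notebook’s make_preds or next_pred. A real generation loop would also feed each sampled letter back into the RNN to get fresh scores; here the scores stay fixed.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)                             # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Pick one index at random, weighted by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

alphabet = ["a", "b", "c"]                      # made-up three-letter alphabet
scores = [2.0, 1.0, 0.1]                        # made-up raw model outputs

rng = random.Random(0)
for temperature in (1.0, 0.1):
    probs = softmax_with_temperature(scores, temperature)
    # Repeat the sampling step once per character we want to generate.
    letters = "".join(alphabet[sample_index(probs, rng)] for _ in range(20))
    print(temperature, [round(p, 3) for p in probs], letters)
```

At the lower temperature nearly all of the probability lands on the highest-scoring letter, which is why low-temperature output gets repetitive.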
the results!
Here’s some generated output from the model where I set temperature = 1 in the prediction function. It’s kind of like English, which is pretty impressive given that this model needed to “learn” English from scratch and is totally based on character sequences.
It doesn’t make any sense, but what did we expect really.
“An who was you colotal said that have to have been a little crimantable and beamed home the beetle. “I shall be in the head of the green for the sound of the wood. The pastor. “I child hand through the emperor’s sorthes, where the mother was a great deal down the conscious, which are all the gleam of the wood they saw the last great of the emperor’s forments, the house of a large gone there was nothing of the wonded the sound of which she saw in the converse of the beetle. “I shall know happy to him. This stories herself and the sound of the young mons feathery in the green safe.”
“That was the pastor. The some and hand on the water sound of the beauty be and home to have been consider and tree and the face. The some to the froghesses and stringing to the sea, and the yellow was too intention, he was not a warm to the pastor. The pastor which are the faten to go and the world from the bell, why really the laborer’s back of most handsome that she was a caperven and the confectioned and thoughts were seated to have great made
Here’s some more generated output at temperature=0.1, which weights its character choices closer to “just pick the highest probability character every time”. This makes the output a lot more repetitive:
ole the sound of the beauty of the beetle. “She was a great emperor of the sea, and the sun was so warm to the confectioned the beetle. “I shall be so many for the beetle. “I shall be so many for the beetle. “I shall be so standen for the world, and the sun was so warm to the sea, and the sun was so warm to the sea, and the sound of the world from the bell, where the beetle was the sea, and the sound of the world from the bell, where the beetle was the sea, and the sound of the wood flowers and the sound of the wood, and the sound of the world from the bell, where the world from the wood, and the sound of the
It’s weirdly obsessed with beetles and confectioners, and the sun, and the sea. Seems fine!
that’s all!
my results are nowhere near as good as Karpathy’s so far, maybe due to one of the following:
- not enough training data
- I got bored with training after an hour and didn’t have the patience to babysit the Colab notebook for longer
- he used a 2-layer LSTM with more hidden parameters than me, I have 1 layer
- something else entirely
via: https://jvns.ca/blog/2020/11/30/implement-char-rnn-in-pytorch/
Author: Julia Evans; topic selection: lujun9972; translator: zhangxiangping