(6) PyTorch Study Notes: NLP in Practice

1. One-hot encoding

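A word $w$ is represented by the following vector:

$$\overbrace{\left[ 0, 0, \dots, 1, \dots, 0, 0 \right]}^{|V| \text{ elements}}$$

where $V$ is our vocabulary and the 1 sits at the position unique to $w$. Every other word likewise has a 1 at some other position and 0 everywhere else.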
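As a minimal sketch of this encoding, the snippet below builds one-hot vectors for a small made-up vocabulary. The vocabulary `toy_vocab` and the helper `one_hot` are assumptions introduced for illustration; they are not part of the original notes.

```python
# Hypothetical toy vocabulary; any fixed ordering of the words works.
toy_vocab = ["hello", "world", "pytorch"]
toy_word_to_ix = {word: i for i, word in enumerate(toy_vocab)}


def one_hot(word, word_to_ix):
    """Return a list of |V| entries with a single 1 at the word's index."""
    vec = [0] * len(word_to_ix)
    vec[word_to_ix[word]] = 1
    return vec


hello_vec = one_hot("hello", toy_word_to_ix)
world_vec = one_hot("world", toy_word_to_ix)
print(hello_vec)  # [1, 0, 0]
print(world_vec)  # [0, 1, 0]

# Two different one-hot vectors never share a non-zero position,
# so their dot product is always 0.
dot = sum(a * b for a, b in zip(hello_vec, world_vec))
print(dot)  # 0
```

The zero dot product is why one-hot vectors carry no notion of similarity between different words.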

2. Getting Dense Word Embeddings

(1) Introduction

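The sparse one-hot vectors from the previous section can be seen as a special case of the vectors defined here, one in which any two different words have similarity 0. Instead, we can give each word a set of semantic attributes. The resulting vectors are dense, meaning their entries are typically non-zero. If each attribute is one dimension, a word can be represented by a vector like this: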


qmathematician=2.3can run,9.4likes coffee,5.5majored in Physics,

q

mathematician

=

[

2.3

can run

,

9.4

likes coffee

,

5.5

majored in Physics

,

]


qphysicist=2.5can run,9.1likes coffee,6.4majored in Physics,

q

physicist

=

[

2.5

can run

,

9.1

likes coffee

,

6.4

majored in Physics

,

]



The similarity between the two words is the cosine of the angle $\phi$ between their vectors:

$$\text{Similarity}(\text{physicist}, \text{mathematician}) = \frac{q_\text{physicist} \cdot q_\text{mathematician}}{\| q_\text{physicist} \| \| q_\text{mathematician} \|} = \cos(\phi)$$



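In summary, a word embedding is a representation of a word's semantics, efficiently encoding semantic information that may be relevant to the task at hand.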
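The similarity formula can be checked numerically. The sketch below uses only the three attribute values listed for each word (the remaining entries are elided in the notes and are not guessed here), and the helper name `cosine_similarity` is an assumption for illustration.

```python
import math

# Only the three attribute values shown in the notes are used; the
# original vectors continue with further entries that are not listed.
q_mathematician = [2.3, 9.4, 5.5]
q_physicist = [2.5, 9.1, 6.4]


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim = cosine_similarity(q_physicist, q_mathematician)
print(sim)  # a value just below 1 (about 0.997)
```

A value near 1 means the two vectors point in almost the same direction. Applied to two different one-hot vectors, the same function returns 0.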

(2) Word Embeddings in PyTorch

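To use word embeddings, we need to define a unique index for each word. These indices are the keys into a lookup table. That is, the embeddings are stored as a $|V| \times D$ matrix, where $D$ is the embedding dimension, so that the word assigned index $i$ has its embedding stored in row $i$ of the matrix. In all of the code, the mapping from words to indices is a dictionary named word_to_ix.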

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
word_to_ix = {"hello": 0, "world": 1}
embeds = nn.Embedding(2, 5)  # 2 words in word_to_ix, 5-dimensional embeddings
# Since PyTorch 0.4 a plain tensor can be passed in; no Variable wrapper is needed
lookup_tensor = torch.tensor([word_to_ix["hello"]], dtype=torch.long)
hello_embed = embeds(lookup_tensor)
print(hello_embed)
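To make the lookup-table description concrete, the following sketch checks that an `nn.Embedding` lookup returns row $i$ of the weight matrix, and that the same row comes out of multiplying a one-hot vector by that matrix. It is an illustration under the same two-word vocabulary, not a description of how PyTorch implements the lookup internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)
word_to_ix = {"hello": 0, "world": 1}
embeds = nn.Embedding(len(word_to_ix), 5)  # |V| x D weight matrix

ix = torch.tensor([word_to_ix["world"]], dtype=torch.long)
looked_up = embeds(ix)  # shape (1, 5)

# The same vector, read directly from row i of the weight matrix.
row = embeds.weight[word_to_ix["world"]]  # shape (5,)

# The same vector again, as a one-hot row vector times the weight matrix.
one_hot = F.one_hot(ix, num_classes=len(word_to_ix)).float()  # shape (1, 2)
via_matmul = one_hot @ embeds.weight  # shape (1, 5)

print(torch.allclose(looked_up[0], row))      # True
print(torch.allclose(looked_up, via_matmul))  # True
```

Seen this way, an embedding layer is a linear layer applied to one-hot inputs, with the multiplication replaced by an index lookup.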

3. N-Gram Language Modeling

In an n-gram language model, given a sequence of words $w$, we want to compute

$$P(w_i | w_{i-1}, w_{i-2}, \dots, w_{i-n+1})$$

where $w_i$ is the $i$-th word of the sequence.
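Training such a model needs (context, target) pairs, where the context is the $n-1$ words preceding the target. A minimal sketch of building them for any context size follows; the helper name `make_ngrams` and the one-line sample are assumptions for illustration.

```python
def make_ngrams(tokens, context_size):
    """Pair each run of `context_size` consecutive tokens with the token after it."""
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]


sample_tokens = "When forty winters shall besiege thy brow,".split()
sample_trigrams = make_ngrams(sample_tokens, context_size=2)
print(sample_trigrams[:3])
# [(['When', 'forty'], 'winters'), (['forty', 'winters'], 'shall'), (['winters', 'shall'], 'besiege')]
```

With `context_size=2` this yields trigrams: two context words followed by the word to predict.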

CONTEXT_SIZE = 2
EMBEDDING_DIM = 10
# We will use Shakespeare's Sonnet 2
test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()
# We should tokenize the input properly, but we skip that here
# Build a list of tuples. Each tuple is ([word_i-2, word_i-1], target word)
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]
# Print the first 3, just to show what they look like
print(trigrams[:3])

vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}


class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs


losses = []
loss_function = nn.NLLLoss()
model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
optimizer = optim.SGD(model.parameters(), lr=0.001)

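for epoch in range(10):
    total_loss = 0.0
    for context, target in trigrams:

        # Step 1. Prepare the inputs for the model (turn the words into
        # integer indices and wrap them in a tensor)
        context_idxs = torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)

        # Step 2. Recall that torch *accumulates* gradients. Before passing in a
        # new instance, zero out the gradients left over from the previous one
        model.zero_grad()

        # Step 3. Run the forward pass to get log probabilities over the next word
        log_probs = model(context_idxs)

        # Step 4. Compute the loss (the target word index must also be a tensor)
        loss = loss_function(log_probs, torch.tensor(
            [word_to_ix[target]], dtype=torch.long))

        # Step 5. Run the backward pass and update the parameters
        loss.backward()
        optimizer.step()

        # Accumulate a plain Python float rather than a tensor
        total_loss += loss.item()
    losses.append(total_loss)
print(losses)  # The loss should decrease with every pass over the training data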
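Once trained, such a model can be queried for the most likely next word, and the learned embedding of a word is simply a row of the embedding matrix. The sketch below shows both on a deliberately tiny corpus. The corpus, the smaller hidden size, the learning rate and every name prefixed with `toy_` are assumptions chosen so the example runs in well under a second. The predicted word depends on the random initialisation, so no output is shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

# Hypothetical toy corpus (an assumption for this sketch, not the sonnet).
toy_tokens = "the cat sat on the mat and the dog sat on the rug".split()
toy_vocab = sorted(set(toy_tokens))
toy_word_to_ix = {w: i for i, w in enumerate(toy_vocab)}
toy_trigrams = [(toy_tokens[i:i + 2], toy_tokens[i + 2])
                for i in range(len(toy_tokens) - 2)]


class TinyNGramModel(nn.Module):
    """Same layout as NGramLanguageModeler, with a smaller hidden layer."""

    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 32)
        self.linear2 = nn.Linear(32, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = self.linear2(F.relu(self.linear1(embeds)))
        return F.log_softmax(out, dim=1)


toy_model = TinyNGramModel(len(toy_vocab), embedding_dim=10, context_size=2)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)
toy_loss_fn = nn.NLLLoss()

toy_losses = []
for epoch in range(30):
    total = 0.0
    for context, target in toy_trigrams:
        idxs = torch.tensor([toy_word_to_ix[w] for w in context], dtype=torch.long)
        target_ix = torch.tensor([toy_word_to_ix[target]], dtype=torch.long)
        toy_model.zero_grad()
        loss = toy_loss_fn(toy_model(idxs), target_ix)
        loss.backward()
        toy_optimizer.step()
        total += loss.item()
    toy_losses.append(total)

# Query the trained model: most likely word after the context "cat sat".
with torch.no_grad():
    query = torch.tensor([toy_word_to_ix["cat"], toy_word_to_ix["sat"]],
                         dtype=torch.long)
    toy_log_probs = toy_model(query)
predicted = toy_vocab[toy_log_probs.argmax(dim=1).item()]
print(toy_losses[0], toy_losses[-1])
print(predicted)

# The learned vector for a word is a row of the embedding matrix.
cat_embedding = toy_model.embeddings.weight[toy_word_to_ix["cat"]]
print(cat_embedding)
```

Wrapping the query in `torch.no_grad()` skips gradient bookkeeping, which is unnecessary at inference time.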

4. Continuous Bag-of-Words (CBOW)

The CBOW model is as follows. Given a target word $w_i$ and a context window of $N$ words on each side, $w_{i-1}, \dots, w_{i-N}$ and $w_{i+1}, \dots, w_{i+N}$, and referring to all context words collectively as $C$, CBOW tries to minimize

$$-\log p(w_i | C) = -\log \text{Softmax}\left(A\left(\sum_{w \in C} q_w\right) + b\right)$$

where $q_w$ is the embedding for word $w$.

CONTEXT_SIZE = 2  # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

# Turning `raw_text` into a set removes duplicate words
vocab = set(raw_text)
vocab_size = len(vocab)

word_to_ix = {word: i for i, word in enumerate(vocab)}
data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
print(data[:5])


class CBOW(nn.Module):

    def __init__(self):
        pass

    def forward(self, inputs):
        pass

# Create your model and train it. Here are some functions to help you prepare the data before using the model


def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    # A plain LongTensor is enough; the Variable wrapper is no longer needed
    return torch.tensor(idxs, dtype=torch.long)


make_context_vector(data[0][0], word_to_ix)  # example
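The `CBOW` class above is left as a stub. One possible way to fill it in, following the CBOW objective directly, is sketched below. This is one reading of the formula rather than a reference solution: the class name `CBOWSketch`, the toy sentence, the embedding size and the learning rate are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)


class CBOWSketch(nn.Module):
    """Computes log Softmax(A * (sum of context embeddings) + b)."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)  # rows are q_w
        self.linear = nn.Linear(embedding_dim, vocab_size)         # holds A and b

    def forward(self, inputs):
        # inputs: 1-D LongTensor with the indices of the context words in C
        summed = self.embeddings(inputs).sum(dim=0, keepdim=True)  # shape (1, D)
        return F.log_softmax(self.linear(summed), dim=1)           # shape (1, |V|)


# Hypothetical toy data, built with 2 context words on each side of the target.
cbow_text = "we conjure the spirits of the computer with our spells".split()
cbow_vocab = sorted(set(cbow_text))
cbow_word_to_ix = {w: i for i, w in enumerate(cbow_vocab)}
cbow_data = [
    ([cbow_text[i - 2], cbow_text[i - 1], cbow_text[i + 1], cbow_text[i + 2]],
     cbow_text[i])
    for i in range(2, len(cbow_text) - 2)
]

cbow_model = CBOWSketch(len(cbow_vocab), embedding_dim=10)
cbow_optimizer = optim.SGD(cbow_model.parameters(), lr=0.02)
cbow_loss_fn = nn.NLLLoss()  # NLLLoss on log-probabilities gives -log p(w_i | C)

cbow_losses = []
for epoch in range(50):
    total = 0.0
    for context, target in cbow_data:
        idxs = torch.tensor([cbow_word_to_ix[w] for w in context], dtype=torch.long)
        target_ix = torch.tensor([cbow_word_to_ix[target]], dtype=torch.long)
        cbow_model.zero_grad()
        loss = cbow_loss_fn(cbow_model(idxs), target_ix)
        loss.backward()
        cbow_optimizer.step()
        total += loss.item()
    cbow_losses.append(total)

print(cbow_losses[0], cbow_losses[-1])
```

Because the context embeddings are summed, the order of the context words does not matter, which is the "bag-of-words" part of the name. This contrasts with the n-gram model, where the context embeddings are concatenated and order is preserved.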
