
Artificial Intelligence: Feed-Forward Networks for Natural Language Processing

Wayfair家居出海者 | General | 2025-05-05 | 240


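1. Experiment Introduction

1.1 Experiment Content

The perceptron is the simplest neural network in existence and the foundation of neural networks. One of its drawbacks is that it cannot learn some very important patterns present in data. When the data points follow an either-or (XOR) pattern, the classes are not linearly separable, so no linear decision boundary exists and the perceptron fails.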
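
The XOR limitation can be checked with a few lines of plain Python. The sketch below is illustrative only: the function name `train_perceptron`, the step activation, and the learning rate are assumptions, not code from this experiment. It trains a single perceptron on AND (linearly separable) and on XOR (not linearly separable) and reports whether a full pass without mistakes was ever reached.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a single perceptron; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            # Step activation: fire only if the weighted sum is positive.
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - pred
            if delta != 0:
                # Classic perceptron update rule.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True  # a full pass with no mistakes
    return w, b, False


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND_DATA)
_, _, xor_converged = train_perceptron(XOR_DATA)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

AND is solved after a handful of passes, while XOR can never reach an error-free pass, because an error-free pass would require a separating line that does not exist.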

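This experiment therefore explores feed-forward neural network models, and two kinds of feed-forward network in particular: the multilayer perceptron (MLP) and the convolutional neural network (CNN).

The multilayer perceptron structurally extends the simple perceptron by grouping many perceptrons in a single layer and stacking multiple layers together.

The convolutional neural network is inspired by windowed filters in digital signal processing. Through this windowing property, CNNs can learn localized patterns in their inputs, which has made them the workhorse of computer vision and also an ideal candidate for detecting substructures in sequential data such as words and sentences.

In this experiment, MLPs and CNNs are grouped together because they are both feed-forward neural networks, in contrast to a different family, recurrent neural networks (RNNs), which allow feedback (cycles) so that each computation can draw on information from previous computations.

1.2 Key Points

Through the example "Surname Classification with a Multilayer Perceptron", learn how the multilayer perceptron is applied to layered computation.

Understand how each type of neural network layer affects the size and shape of the data tensor it computes on.

1.3 Environment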

Python 3.6.7

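2. Multilayer Perceptron (MLP)

The multilayer perceptron (MLP) is considered one of the most basic neural network building blocks. The simplest MLP is an extension of the perceptron. The perceptron takes a data vector as input and computes a single output value. In an MLP, many perceptrons are grouped so that the output of a single layer is a new vector rather than a single value. In PyTorch, as you will see later, this is done simply by setting the number of output features of the Linear layer. The other aspect of an MLP is that it stacks multiple layers with a non-linearity between each pair of layers.

The simplest MLP, as shown in the figure, consists of three stages of representation and two linear layers. The first stage is the input vector, which is the vector given to the model. In "Example: Classifying Sentiment of Restaurant Reviews", the input vector was a collapsed one-hot representation of a Yelp review. Given the input vector, the first linear layer computes a hidden vector, the second stage of representation. The hidden vector is called this because it is the output of a layer that sits between the input and the output. What do we mean by "output of a layer"? One way to understand it is that the values in the hidden vector are the outputs of the different perceptrons that make up that layer. Using this hidden vector, the second linear layer computes an output vector. In a binary task like Yelp review classification, the output vector can still be of size 1. In a multiclass setting, as you will see in the section "Example: Surname Classification with a Multilayer Perceptron" later in this experiment, the output vector has one entry per class. Although this example shows only one hidden vector, there can be several intermediate stages, each producing its own hidden vector. The final hidden vector is always mapped to the output vector by a combination of a linear layer and a non-linearity.

The power of MLPs comes from adding the second linear layer and allowing the model to learn an intermediate representation that is linearly separable, meaning that a single straight line (or, more generally, a hyperplane) can be used to tell data points apart by which side of the line (or hyperplane) they fall on. Learning intermediate representations with specific properties, such as being linearly separable for a classification task, is one of the most profound consequences of using neural networks and is the essence of their modeling power.

Compared with the simple perceptron, the MLP has one additional layer of computation. This idea can be instantiated with two PyTorch Linear modules. The Linear objects are named fc1 and fc2, following the common convention of calling a Linear module a "fully connected layer", or "fc layer" for short. Besides the two linear layers, there is a rectified linear unit (ReLU) non-linearity, which is applied to the output of the first linear layer before it is fed into the second. Because the layers are sequential, the number of outputs of one layer must equal the number of inputs of the next. The non-linearity between the two linear layers is essential: without it, the two linear layers are mathematically equivalent to a single linear layer and therefore cannot model complex patterns. Our MLP implementation only implements the forward pass. This is because PyTorch automatically works out how to perform the backward pass and the gradient updates from the model definition and the forward-pass implementation.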

import torch.nn as nn
import torch.nn.functional as F

class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(MultilayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the MLP

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate = F.relu(self.fc1(x_in))
        output = self.fc2(intermediate)

        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output
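
The statement that two linear layers without a non-linearity collapse into one can be verified by hand. The following is a minimal sketch in plain Python with made-up integer weights (all values and helper names are illustrative assumptions): it compares the stacked computation W2(W1 x + b1) + b2 with the single layer (W2 W1) x + (W2 b1 + b2), and then shows that inserting a ReLU changes the result.

```python
def matvec(W, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def relu(v):
    return [max(0, a) for a in v]


# Illustrative integer weights: layer 1 maps 3 -> 2, layer 2 maps 2 -> 3.
W1, b1 = [[1, 2, 0], [0, -1, 3]], [1, -2]
W2, b2 = [[2, 1], [0, 4], [-1, 1]], [0, 1, 5]

x = [3, -1, 2]
stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Fold both layers into one equivalent linear layer.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
single = add(matvec(W, x), b)
print(stacked, single)  # identical vectors

# With a ReLU in between, the folded layer no longer reproduces the output.
x2 = [-3, 1, 0]
with_relu = add(matvec(W2, relu(add(matvec(W1, x2), b1))), b2)
without_relu = add(matvec(W, x2), b)
print(with_relu, without_relu)  # different vectors
```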

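Because the MLP implementation is general, it can model inputs of any size. To demonstrate, we use an input dimension of 3, an output dimension of 4, and a hidden dimension of 100. Note in the output of the print statement how the number of units in each layer lines up, so that an input of dimension 3 produces an output of dimension 4.

batch_size = 2  # number of samples input at once
input_dim = 3
hidden_dim = 100
output_dim = 4

# Initialize model
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)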
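
To make the layer sizes concrete, the number of learnable parameters can be counted by hand. This is a small sketch in plain Python; `linear_param_count` is a hypothetical helper, and it relies only on the fact that a fully connected layer stores an (out_features x in_features) weight matrix plus one bias per output.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Weights (out x in) plus optional biases (out) of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


input_dim, hidden_dim, output_dim = 3, 100, 4

fc1_params = linear_param_count(input_dim, hidden_dim)   # 3 * 100 + 100
fc2_params = linear_param_count(hidden_dim, output_dim)  # 100 * 4 + 4
total_params = fc1_params + fc2_params

print(fc1_params, fc2_params, total_params)
```

The shared value 100 (outputs of fc1, inputs of fc2) is what lets the two layers be chained.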

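We can quickly test the "wiring" of the model by passing in some random inputs, as shown below. Because the model has not been trained yet, the outputs are random. Doing this is a useful sanity check before spending time training a model. Note how PyTorch's interactivity lets us do all of this in real time during development, in a way that is not much different from using NumPy or pandas:

import torch

def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

x_input = torch.rand(batch_size, input_dim)
describe(x_input)

# Pass the random batch through the untrained model and inspect the output
y_output = mlp(x_input, apply_softmax=False)
describe(y_output)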

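In summary, MLPs are stacked linear layers that map tensors to other tensors. A non-linearity is used between each pair of linear layers to break the linear relationship and let the model warp the vector space. In a classification setting, this warping should result in linear separability between classes. In addition, the softmax function can be used to interpret MLP outputs as probabilities, but softmax should not be combined with certain loss functions (for example, the cross-entropy losses, which expect raw scores), because the underlying implementations exploit mathematical and computational shortcuts.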
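
One such shortcut is the log-sum-exp trick. The sketch below, in plain Python, is an illustration of the general idea and not PyTorch's actual implementation; both function names are hypothetical. It computes the cross-entropy of a single example once by explicitly forming softmax probabilities and once directly from the raw scores.

```python
import math


def naive_cross_entropy(logits, target):
    """Softmax first, then the negative log of the target probability."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return -math.log(exps[target] / total)


def stable_cross_entropy(logits, target):
    """Same quantity computed from raw scores with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


small = [2.0, 1.0, 0.1]
print(naive_cross_entropy(small, 0), stable_cross_entropy(small, 0))

large = [1000.0, 999.0, 0.0]
try:
    naive_cross_entropy(large, 0)
    naive_failed = False
except OverflowError:
    naive_failed = True  # exp(1000) does not fit in a float
print("naive version overflowed:", naive_failed)
print(stable_cross_entropy(large, 0))
```

For moderate scores the two versions agree; for large scores only the version that works on raw scores survives, which is why the model returns unnormalized outputs during training and applies softmax only for interpretation.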

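3. Convolutional Neural Network (CNN)

CNNs take their name and basic functionality from the classical mathematical operation of convolution. Convolution has been used in various engineering disciplines, including digital signal processing and computer graphics. Classically, convolutions use parameters specified by the programmer, chosen to match some functional design such as highlighting edges or dampening high-frequency sounds. In fact, many Photoshop filters are fixed convolution operations applied to an image. In deep learning and in this experiment, however, the parameters of the convolution filter are learned from data, so that they are tuned to the task at hand.

In digital image processing, convolution can be used to extract image features, with the grayscale matrix of the image as input. For surname classification, we can instead use the matrix of one-hot character vectors of a surname as the convolution input.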
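
The idea of sliding a filter over a one-hot character matrix can be sketched without any library. In the example below, the two-letter alphabet, the hand-written kernel, and both function names are assumptions made for illustration; in a real CNN the kernel values are learned. As in deep learning libraries, the "convolution" is computed as a sliding dot product (cross-correlation) with stride 1 and no padding.

```python
def one_hot_matrix(text, alphabet):
    """Rows are alphabet symbols (channels), columns are character positions."""
    return [[1 if ch == symbol else 0 for ch in text] for symbol in alphabet]


def conv1d(matrix, kernel):
    """Slide a (channels x width) kernel over a (channels x length) input."""
    length = len(matrix[0])
    width = len(kernel[0])
    out = []
    for start in range(length - width + 1):
        total = 0
        for in_row, k_row in zip(matrix, kernel):
            for offset in range(width):
                total += in_row[start + offset] * k_row[offset]
        out.append(total)
    return out


alphabet = "ab"
x = one_hot_matrix("abab", alphabet)

# Hand-written filter that responds to the bigram "ab".
ab_detector = [[1, 0],   # channel 'a': expects 'a' in the first slot
               [0, 1]]   # channel 'b': expects 'b' in the second slot

response = conv1d(x, ab_detector)
print(response)  # high values mark positions where "ab" starts
```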

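4. Experiment Steps

4.1 Surname Classification

In this section, we apply the MLP to the task of classifying surnames by their country of origin. Inferring demographic information (such as nationality) from publicly observable data has applications ranging from product recommendation to ensuring fair outcomes for users across different demographics. Demographic and other self-identifying information is collectively called "protected attributes". Care must be taken when using such attributes in modeling and in products. We begin by splitting each surname into its characters and treating them the same way we treated words in "Example: Classifying Sentiment of Restaurant Reviews". Apart from the difference in data, character-level models are mostly similar to word-based models in structure and implementation.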
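
Treating characters like words means that each surname becomes a collapsed one-hot vector over a character vocabulary. The vectorizer used in this experiment is not shown in the text, so the following is only a hypothetical sketch in plain Python; `build_char_vocab` and `vectorize` are illustrative names, and skipping unseen characters is an assumption.

```python
def build_char_vocab(surnames):
    """Map every character seen in the given surnames to an index."""
    chars = sorted({ch for name in surnames for ch in name})
    return {ch: i for i, ch in enumerate(chars)}


def vectorize(surname, char_to_index):
    """Collapsed one-hot: 1.0 at the index of every character that occurs."""
    vector = [0.0] * len(char_to_index)
    for ch in surname:
        if ch in char_to_index:  # assumption: unseen characters are ignored
            vector[char_to_index[ch]] = 1.0
    return vector


vocab = build_char_vocab(["Zhu", "Nagasawa"])
zhu_vector = vectorize("Zhu", vocab)
print(vocab)
print(zhu_vector)
```

Note that this representation discards character order: "Zhu" and "uhZ" map to the same vector, which is one motivation for the CNN variant later in the experiment.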

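An important lesson to take from this example is that the implementation and training of an MLP is a direct progression from the implementation and training of the perceptron we saw earlier. In fact, we pointed ahead to this example in Experiment 3 as the place to get a more complete picture of these components. In addition, we do not repeat the code shown in "Example: Classifying Sentiment of Restaurant Reviews".

The rest of this section begins with a description of the surnames dataset and its preprocessing steps. We then step through the pipeline from a surname string to a vectorized minibatch using the Vocabulary, Vectorizer, and DataLoader classes. If you have read through Experiment 3, you will recognize these classes; only small modifications have been made here.

We continue the section by describing the surname classifier model and the thought process behind its design. The MLP is similar to the perceptron example from Experiment 3, but in addition to the model change, this example introduces multiclass outputs and their corresponding loss function. After describing the model, we work through the training routine. The training routine is very similar to the one in "Example: Classifying Sentiment of Restaurant Reviews", so for brevity we do not go into as much depth here; refer back to that section for details.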

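4.2 The Surnames Dataset

The surnames dataset collects 10,000 surnames from 18 different nationalities, gathered by the authors from different name sources on the internet. The dataset is reused in several examples in this course's experiments and has a couple of properties that make it interesting. The first property is that it is fairly imbalanced. The top three classes account for more than 60% of the data: 27% are English, 21% are Russian, and 14% are Arabic. The remaining 15 nationalities have decreasing frequency, a property that is also characteristic of language. The second property is that there is a valid and intuitive relationship between nationality and surname orthography (spelling). Some spelling variations are very strongly tied to a country of origin (such as "O'Neill", "Antonopoulos", "Nagasawa", or "Zhu").

To create the final dataset, we began with a less-processed version than the one included in the course supplementary material and performed several dataset modification operations. The first was to reduce the imbalance: the original dataset was more than 70% Russian, perhaps due to sampling bias or a proliferation of unique Russian surnames. To do this, we subsampled this overrepresented class by selecting a random subset of the surnames labeled as Russian. Next, we grouped the dataset by nationality and split it into three parts: 70% for the training set, 15% for the validation set, and the last 15% for the test set, so that the class label distributions are comparable across the splits.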
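
The grouped 70/15/15 split described above can be sketched as follows. This is a minimal plain-Python illustration, not the preprocessing script actually used for the dataset; the function name, the fixed seed, and the synthetic placeholder surnames are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, seed=0):
    """rows: list of (surname, nationality). Returns (surname, nationality, split)."""
    by_nationality = defaultdict(list)
    for surname, nationality in rows:
        by_nationality[nationality].append(surname)

    rng = random.Random(seed)
    tagged = []
    for nationality, surnames in sorted(by_nationality.items()):
        rng.shuffle(surnames)
        n = len(surnames)
        n_train = n * 70 // 100  # integer arithmetic avoids float rounding
        n_val = n * 15 // 100
        for i, surname in enumerate(surnames):
            if i < n_train:
                split = "train"
            elif i < n_train + n_val:
                split = "val"
            else:
                split = "test"
            tagged.append((surname, nationality, split))
    return tagged


# Synthetic placeholder data: 40 "English" and 20 "Russian" entries.
rows = [("E%d" % i, "English") for i in range(40)]
rows += [("R%d" % i, "Russian") for i in range(20)]
tagged = stratified_split(rows)


def count(nationality, split):
    return sum(1 for _, n, s in tagged if n == nationality and s == split)


print(count("English", "train"), count("English", "val"), count("English", "test"))
print(count("Russian", "train"), count("Russian", "val"), count("Russian", "test"))
```

Because the split is performed inside each nationality, every split keeps roughly the same class proportions as the full dataset.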

import json

import pandas as pd
import torch
from torch.utils.data import Dataset

# SurnameVectorizer is defined elsewhere in the experiment (not shown here).

class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer

        self.train_df = self.surname_df[self.surname_df.split=='train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split=='val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split=='test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')

        # Class weights: inverse class frequency, ordered by class index
        class_counts = surname_df.nationality.value_counts().to_dict()
        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])
        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split=='train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used when the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's features (x_surname)
            and label (y_nationality)
        """
        row = self._target_df.iloc[index]

        surname_matrix = \
            self._vectorizer.vectorize(row.surname)

        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_surname': surname_matrix,
                'y_nationality': nationality_index}

    def get_num_batches(self, batch_size):
        """Number of full batches in the current split.

        Not part of the original listing; added because the training loop
        calls it. Counting only full batches is an assumption.
        """
        return len(self) // batch_size

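We first define a class named SurnameDataset, which inherits from PyTorch's Dataset class and is dedicated to handling the surname data. On initialization, the class receives a DataFrame and a vectorizer, and then uses the split label in the data to divide it into training, validation, and test sets, each with its own DataFrame and size. Initialization also computes a weight for every class (the inverse of its frequency) to help deal with class imbalance. The class provides further methods for loading the dataset and creating a vectorizer, loading a vectorizer from file, and saving the vectorizer. The set_split method selects which split is currently active, the __len__ method returns the size of the active split, and the __getitem__ method returns the data point at a given index, consisting of the vectorized surname and the corresponding nationality label. Together these make SurnameDataset convenient to use in PyTorch for training and evaluating models.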
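
The class-weight computation in the constructor amounts to taking the reciprocal of each class count, ordered by class index. A minimal plain-Python sketch with made-up counts (the function name, labels, and index mapping are illustrative assumptions):

```python
def inverse_frequency_weights(labels, label_to_index):
    """One weight per class, ordered by class index, equal to 1 / count."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    ordered = sorted(counts.items(), key=lambda item: label_to_index[item[0]])
    return [1.0 / count for _, count in ordered]


labels = ["English"] * 4 + ["Russian"] * 2 + ["Arabic"]
label_to_index = {"Arabic": 0, "English": 1, "Russian": 2}

weights = inverse_frequency_weights(labels, label_to_index)
print(weights)
```

Rare classes receive larger weights, so mistakes on them contribute more to a weighted loss. The ordering by index matters because the loss function matches weights to classes by position.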

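4.3 Model Construction (MLP)

SurnameClassifier is an implementation of the MLP. We define a simple MLP with only two linear layers. The first linear layer maps the input vector to an intermediate vector, to which a ReLU non-linearity is applied. The second linear layer maps the intermediate vector to the prediction vector. Softmax is applied to the prediction vector only when apply_softmax is set, to turn the output into probabilities.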

class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector

4.4 Model Construction (CNN)

Building the CNN model requires working out the sizes of the input and output tensors of each layer.

class SurnameClassifier(nn.Module):
    def __init__(self, initial_num_channels, num_classes, num_channels):
        """
        Args:
            initial_num_channels (int): size of the incoming feature vector
            num_classes (int): size of the output prediction vector
            num_channels (int): constant channel size to use throughout network
        """
        super(SurnameClassifier, self).__init__()

        self.convnet = nn.Sequential(
            nn.Conv1d(in_channels=initial_num_channels,
                      out_channels=num_channels, kernel_size=3),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3),
            nn.ELU()
        )
        self.fc = nn.Linear(num_channels, num_classes)

    def forward(self, x_surname, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_surname (torch.Tensor): an input data tensor.
                x_surname.shape should be (batch, initial_num_channels, max_surname_length)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, num_classes)
        """
        features = self.convnet(x_surname).squeeze(dim=2)

        prediction_vector = self.fc(features)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector

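In the forward pass, the input surname tensor first goes through the convolutional layers for feature extraction. The resulting features are then squeezed to remove the length dimension and passed through the fully connected layer to obtain the prediction. If the apply_softmax flag is set, the final output is passed through the softmax function and converted into a probability distribution. In short, the model takes the character-level representation of a surname, extracts features with several convolutional layers, and outputs a score (or, with softmax, a probability) for each class.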
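
Whether the squeeze step works depends on the sequence length being reduced to exactly 1 by the four convolutions. The length after each layer can be traced with the standard output-length formula for a 1D convolution. In the sketch below, the starting length of 17 is an assumed value for max_surname_length (the text does not state it), and `conv1d_out_length` is a hypothetical helper.

```python
def conv1d_out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution along the sequence dimension."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Kernel sizes and strides of the four Conv1d layers in the listing above.
layers = [
    dict(kernel_size=3, stride=1),
    dict(kernel_size=3, stride=2),
    dict(kernel_size=3, stride=2),
    dict(kernel_size=3, stride=1),
]

length = 17  # assumed max_surname_length
lengths = [length]
for layer in layers:
    length = conv1d_out_length(length, **layer)
    lengths.append(length)

print(lengths)
```

Under this assumption the length shrinks to 1, so squeeze(dim=2) leaves a (batch, num_channels) tensor that matches the input size of the final linear layer. For other input lengths, the kernel sizes, strides, or number of layers would need to be adjusted.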

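4.5 Model Training

Training steps:

Zero the gradients: at the start of each batch, clear the gradients held by the optimizer.

Compute the output: run the classifier's forward pass to obtain the prediction y_pred.

Compute the loss: compute the loss with the cross-entropy loss function and update the running loss.

Backpropagate: compute gradients from the loss.

Update the parameters: use the optimizer to take a gradient step.

Compute the accuracy: compute the accuracy of the current batch and update the running accuracy.

Update the progress bar: display the current running loss and accuracy.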

import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm_notebook

# args, classifier, dataset, make_train_state, generate_batches,
# compute_accuracy and update_train_state are defined elsewhere in the
# experiment (not shown here).

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)

loss_func = nn.CrossEntropyLoss(weight=dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        # lower the learning rate when the validation loss stops improving
        scheduler.step(train_state['val_loss'][-1])

        if train_state['stop_early']:
            break

        # reset the per-epoch progress bars
        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")
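
The update `running_loss += (loss_t - running_loss) / (batch_index + 1)` used in the loop is an incremental mean: after each batch, the running value equals the average of all batch losses seen so far, without storing them. A small plain-Python check with made-up loss values (the function name is illustrative):

```python
def running_mean(values):
    """Apply the incremental-mean update and record the value after each step."""
    running = 0.0
    history = []
    for batch_index, value in enumerate(values):
        running += (value - running) / (batch_index + 1)
        history.append(running)
    return history


batch_losses = [4.0, 2.0, 3.0]  # made-up per-batch losses
history = running_mean(batch_losses)

print(history)
print(sum(batch_losses) / len(batch_losses))  # ordinary mean for comparison
```

The same update is applied to the accuracy, so the values shown in the progress bar are averages over the batches of the current epoch rather than the metrics of the last batch alone.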

4.6 Model Testing

With the trained model we can perform simple nationality classification and, for a given surname, report the few most likely nationalities together with their probabilities.

import torch

def predict_topk_nationality(name, classifier, vectorizer, k=5):
    """Return the k most probable nationalities for a surname"""
    vectorized_name = vectorizer.vectorize(name)
    # add a batch dimension of size 1 (input layout used by the MLP classifier)
    vectorized_name = torch.tensor(vectorized_name).view(1, -1)
    prediction_vector = classifier(vectorized_name, apply_softmax=True)
    probability_values, indices = torch.topk(prediction_vector, k=k)

    # returned size is 1,k
    probability_values = probability_values.detach().numpy()[0]
    indices = indices.detach().numpy()[0]

    results = []
    for prob_value, index in zip(probability_values, indices):
        nationality = vectorizer.nationality_vocab.lookup_index(index)
        results.append({'nationality': nationality,
                        'probability': prob_value})

    return results


new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")
vectorizer = dataset.get_vectorizer()

k = int(input("How many of the top predictions to see? "))
if k > len(vectorizer.nationality_vocab):
    print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
    k = len(vectorizer.nationality_vocab)

predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

print("Top {} predictions:".format(k))
print("===================")
for prediction in predictions:
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],
                                        prediction['probability']))
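
What the softmax and top-k steps do to the raw model scores can be illustrated without PyTorch. In this plain-Python sketch, the scores and the four labels are made up for illustration, and both helper functions are hypothetical stand-ins for F.softmax and torch.topk.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, labels, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(probabilities, labels), key=lambda pair: pair[0], reverse=True)
    return [(label, prob) for prob, label in ranked[:k]]


labels = ["Arabic", "English", "Irish", "Russian"]
logits = [0.5, 2.0, 3.0, -1.0]  # made-up scores for one surname

probs = softmax(logits)
best = top_k(probs, labels, k=2)

for label, prob in best:
    print("{} (p={:0.2f})".format(label, prob))
```

Softmax preserves the ranking of the raw scores, so the top-k nationalities are the same with or without it; it only makes the values interpretable as probabilities.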

MLP prediction: [output figure not preserved]

CNN prediction: [output figure not preserved]

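5. Summary

In this experiment, we used a multilayer perceptron (MLP) and a convolutional neural network (CNN) to predict the nationality of a surname, training one model of each kind. The results show that although the MLP trains faster, the CNN achieves higher accuracy on the test set. Overall, the MLP is simple and efficient, which makes it suitable for quick experiments, while the CNN takes longer to train but handles more complex features better. The experiment helped us understand how the two models differ in performance on this task and where each is applicable.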



Article link: http://m.gantiao.com.cn/post/19213195.html