A Multilayer Perceptron with TensorFlow

bigegpt 2024-10-07 06:34

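In this article we use TensorFlow to build a neural network (a multilayer perceptron) and train it to recognize handwritten digits in images. TensorFlow is a very popular deep learning framework, and this note walks through building a neural network with it. If you want to know what a multilayer perceptron is, see the earlier article, which built one from scratch with NumPy.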

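Let's start by loading the data. Keras, a high-level deep learning library, ships the MNIST data as one of its built-in datasets, so we import the dataset from there and split it into training, validation, and test sets. The Python code is as follows: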

## Loading the MNIST dataset from Keras
import keras
from sklearn.preprocessing import LabelBinarizer
import matplotlib.pyplot as plt
## The next line is a Jupyter/IPython magic; remove it when running as a plain script
%matplotlib inline

def load_dataset(flatten=False):
    (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

    # scale pixel values from [0, 255] to [0, 1]
    X_train = X_train.astype(float) / 255.
    X_test = X_test.astype(float) / 255.

    # we reserve the last 10000 training examples for validation
    X_train, X_val = X_train[:-10000], X_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]

    if flatten:
        X_train = X_train.reshape([X_train.shape[0], -1])
        X_val = X_val.reshape([X_val.shape[0], -1])
        X_test = X_test.reshape([X_test.shape[0], -1])

    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
## Printing dimensions
print(X_train.shape, y_train.shape)
## Visualizing the first digit
plt.imshow(X_train[0], cmap="Greys");

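As we can see, the data currently has shape N×28×28. We first flatten each image so that the data has shape N×784, and then one-hot encode the target variable. The Python code is as follows: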

## Changing dimension of input images from N*28*28 to N*784
X_train = X_train.reshape((X_train.shape[0],X_train.shape[1]*X_train.shape[2]))
X_test = X_test.reshape((X_test.shape[0],X_test.shape[1]*X_test.shape[2]))
 
print('Train dimension:', X_train.shape)
print('Test dimension:', X_test.shape)
 
## Changing labels to one-hot encoded vector
lb = LabelBinarizer()
y_train = lb.fit_transform(y_train)
y_test = lb.transform(y_test)
print('Train labels dimension:', y_train.shape)
print('Test labels dimension:', y_test.shape)
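To make the two preprocessing steps above concrete, here is a minimal NumPy-only sketch on a toy array. The helper names flatten_images and one_hot and the 3×3 "images" are illustrative assumptions, not part of the original code. For integer labels with more than two classes, the result has the same layout as the LabelBinarizer output.

```python
import numpy as np

def flatten_images(images):
    """Reshape an (N, H, W) image array into (N, H*W) row vectors."""
    return images.reshape(images.shape[0], -1)

def one_hot(labels, num_classes):
    """Return an (N, num_classes) array with a single 1 per row."""
    encoded = np.zeros((labels.shape[0], num_classes), dtype=int)
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded

# Two hypothetical 3x3 "images" with class labels 2 and 0
toy_images = np.arange(2 * 3 * 3).reshape(2, 3, 3)
toy_labels = np.array([2, 0])

flat = flatten_images(toy_images)
encoded_labels = one_hot(toy_labels, num_classes=3)

print(flat.shape)       # (2, 9)
print(encoded_labels)   # rows: [0 0 1] and [1 0 0]
```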

Now that the data is prepared, let's start building our multilayer perceptron with TensorFlow. We begin by importing the required Python libraries.

## Importing required libraries
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score, accuracy_score
s = tf.InteractiveSession()

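tf.InteractiveSession() belongs to the TensorFlow 1.x graph API. It installs itself as the default session, so we can run parts of the graph directly without wrapping every call in a session block. We will build a 784 (input) - 512 (hidden layer 1) - 256 (hidden layer 2) - 10 (output) neural network. Let's start by defining the initialization parameters. The Python code is as follows: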

## Defining various initialization parameters for 784-512-256-10 MLP model
num_classes = y_train.shape[1]
num_features = X_train.shape[1]
num_output = y_train.shape[1]
num_layers_0 = 512
num_layers_1 = 256
starter_learning_rate = 0.001
regularizer_rate = 0.1

In TensorFlow 1.x we define placeholders for the input and output variables, and for any other value we want to feed in at run time.

# Placeholders for the input data
input_X = tf.placeholder('float32',shape =(None,num_features),name="input_X")
input_y = tf.placeholder('float32',shape = (None,num_classes),name='input_Y')
## for dropout layer
keep_prob = tf.placeholder(tf.float32)

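Each dense layer needs weights and biases. The weights are initialized from a normal distribution with zero mean and a small standard deviation, 1/sqrt(number of input features of the layer).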

## Weights initialized by random normal function with std_dev = 1/sqrt(number of input features)
weights_0 = tf.Variable(tf.random_normal([num_features,num_layers_0], stddev=(1/tf.sqrt(float(num_features)))))
bias_0 = tf.Variable(tf.random_normal([num_layers_0]))
 
weights_1 = tf.Variable(tf.random_normal([num_layers_0,num_layers_1], stddev=(1/tf.sqrt(float(num_layers_0)))))
bias_1 = tf.Variable(tf.random_normal([num_layers_1]))
 
weights_2 = tf.Variable(tf.random_normal([num_layers_1,num_output], stddev=(1/tf.sqrt(float(num_layers_1)))))
bias_2 = tf.Variable(tf.random_normal([num_output]))
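As a rough check of why the 1/sqrt(number of input features) scale is used, the NumPy sketch below compares the spread of a layer's pre-activations under the scaled initialization and under a plain unit-variance one. The unit-variance random inputs and the sample count are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_samples = 784, 512, 1000

# Hypothetical unit-variance inputs (real MNIST pixels are distributed differently)
x = rng.standard_normal((n_samples, n_in))

w_scaled = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)  # stddev = 1/sqrt(n_in)
w_unit = rng.standard_normal((n_in, n_out))                    # stddev = 1

std_scaled = (x @ w_scaled).std()
std_unit = (x @ w_unit).std()

print(std_scaled)   # close to 1
print(std_unit)     # close to sqrt(784) = 28
```

With the scaled weights the pre-activations stay on the same scale as the inputs, while unit-variance weights inflate them by roughly sqrt(n_in).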

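Now we write the graph computation for our 784 (input) - 512 (hidden layer 1) - 256 (hidden layer 2) - 10 (output) model. We multiply the input of each layer by its weights and add the bias term. After that we need an activation: we use ReLU for the hidden layers and softmax for the final output layer to obtain class probabilities. To guard against overfitting, we add dropout after each hidden layer. Dropout builds redundancy into the network, which can lead to better generalization.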
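To make the dropout step concrete, here is a minimal NumPy sketch of "inverted" dropout, which matches how tf.nn.dropout with a keep probability behaves: each unit is kept with probability keep_prob and the survivors are scaled by 1/keep_prob so the expected activation is unchanged. The function name dropout and the all-ones activations are illustrative assumptions.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((1000, 512))   # hypothetical hidden-layer activations

dropped = dropout(hidden, keep_prob=0.6, rng=rng)
kept_fraction = (dropped != 0).mean()
mean_after = dropped.mean()

# keep_prob = 1 disables dropout, which is what we want at evaluation time
unchanged = dropout(hidden, keep_prob=1.0, rng=rng)

print(kept_fraction)   # roughly 0.6
print(mean_after)      # roughly 1.0
```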

## Forward pass: dense layer -> ReLU -> dropout for each hidden layer
hidden_output_0 = tf.nn.relu(tf.matmul(input_X,weights_0)+bias_0)
hidden_output_0_0 = tf.nn.dropout(hidden_output_0, keep_prob)
 
hidden_output_1 = tf.nn.relu(tf.matmul(hidden_output_0_0,weights_1)+bias_1)
hidden_output_1_1 = tf.nn.dropout(hidden_output_1, keep_prob)
 
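## Raw class scores (logits). Softmax is applied inside the loss function below,
## and argmax over logits picks the same class as argmax over softmax probabilities.
predicted_y = tf.matmul(hidden_output_1_1,weights_2) + bias_2

Now we need a loss function to optimize the weights and biases. We use softmax cross-entropy with logits between the predictions and the correct labels. This function applies softmax internally, so it must receive the raw, unsquashed output of the last layer. We also add some L2 regularization to the network.

## Defining the loss function
## Note: the L2 penalty below is applied to the hidden-layer biases only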
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=predicted_y,labels=input_y)) \
 + regularizer_rate*(tf.reduce_sum(tf.square(bias_0)) + tf.reduce_sum(tf.square(bias_1)))
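The following NumPy sketch shows what a softmax cross-entropy computed from logits amounts to, and why the scores passed to it should be raw. It is an illustration under assumptions (the helper name softmax_cross_entropy and the toy logits are made up here), not TensorFlow's actual implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

labels = np.array([[0.0, 1.0, 0.0]])

confident = np.array([[0.0, 10.0, 0.0]])   # hypothetical raw logits
uniform = np.array([[0.0, 0.0, 0.0]])
squashed = np.array([[0.0, 1.0, 0.0]])     # best case if scores are limited to [0, 1]

loss_confident = softmax_cross_entropy(confident, labels)
loss_uniform = softmax_cross_entropy(uniform, labels)
loss_squashed = softmax_cross_entropy(squashed, labels)

print(loss_confident)   # close to 0
print(loss_uniform)     # log(3), about 1.0986
print(loss_squashed)    # log(1 + 2/e), about 0.5514
```

The last case shows that if the scores are first squashed into [0, 1], the loss can never drop below about 0.55 for three classes, even for a perfect prediction.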

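Now we need an optimizer and a learning rate to optimize the weights and biases against this loss function. We use exponential decay to reduce the learning rate by 15% every 5 epochs. For the optimizer we use Adam.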

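## Variable learning rate: multiply by 0.85 every 5 epochs.
## The decay is driven by a step counter that the optimizer increments on every batch;
## with a constant step of 0 the learning rate would never change.
batch_size = 128  # same value as in the training loop
steps_per_epoch = int(np.ceil(X_train.shape[0] / batch_size))
global_step = tf.Variable(0, trainable=False, name="global_step")
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 5 * steps_per_epoch, 0.85, staircase=True)
## Adam optimizer for finding the right weights
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step,
                                                           var_list=[weights_0,weights_1,weights_2,bias_0,bias_1,bias_2])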
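The schedule described above (15% less every 5 epochs) follows the staircase formula initial_rate * decay_rate ** floor(step / decay_steps). The plain-Python sketch below evaluates it per epoch, assuming 50000 training examples and a batch size of 128; the function name is an illustrative assumption.

```python
import math

def staircase_exponential_decay(initial_rate, step, decay_steps, decay_rate):
    """initial_rate * decay_rate ** floor(step / decay_steps)."""
    return initial_rate * decay_rate ** (step // decay_steps)

steps_per_epoch = math.ceil(50000 / 128)   # 391 batches per epoch
decay_steps = 5 * steps_per_epoch          # decay once every 5 epochs

rates = [staircase_exponential_decay(0.001, epoch * steps_per_epoch, decay_steps, 0.85)
         for epoch in range(14)]

for epoch, rate in enumerate(rates):
    print(epoch, rate)
```

Over 14 epochs the rate takes three values: 0.001 for epochs 0 to 4, 0.00085 for epochs 5 to 9, and 0.0007225 from epoch 10 on.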

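The model is now built. Because the value of the loss function is not very intuitive, let's define an accuracy metric to evaluate model performance.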

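## Metrics definition
## Compare against the labels fed through input_y, not the full y_train array,
## so the metric works for any batch or dataset
correct_prediction = tf.equal(tf.argmax(input_y,1), tf.argmax(predicted_y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))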
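The same metric can be written in a few lines of NumPy, which may help when checking results outside the graph. The scores and labels below are made-up toy values.

```python
import numpy as np

def accuracy(scores, one_hot_labels):
    """Fraction of rows where the highest score matches the labelled class."""
    predicted = scores.argmax(axis=1)
    actual = one_hot_labels.argmax(axis=1)
    return (predicted == actual).mean()

# Four hypothetical examples with three classes each
scores = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 0.9],
                   [0.4, 0.3, 0.2]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

acc = accuracy(scores, labels)
print(acc)   # 0.75, the third example is misclassified
```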

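We now train the network on the training data and evaluate it on the test set after every epoch. We use mini-batches of size 128 and train for 14 epochs to reach an accuracy above 98%.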

## Training parameters
batch_size = 128
epochs = 14
## Despite its name, this value is fed to keep_prob: the probability of keeping a unit
dropout_prob = 0.6

training_accuracy = []
training_loss = []
testing_accuracy = []

s.run(tf.global_variables_initializer())
for epoch in range(epochs):
    # visit the training examples in a new random order every epoch
    arr = np.arange(X_train.shape[0])
    np.random.shuffle(arr)
    for index in range(0, X_train.shape[0], batch_size):
        s.run(optimizer, {input_X: X_train[arr[index:index+batch_size]],
                          input_y: y_train[arr[index:index+batch_size]],
                          keep_prob: dropout_prob})
    # dropout is disabled (keep_prob = 1) when measuring loss and accuracy
    training_accuracy.append(s.run(accuracy, feed_dict={input_X: X_train,
                                                        input_y: y_train, keep_prob: 1}))
    training_loss.append(s.run(loss, {input_X: X_train,
                                      input_y: y_train, keep_prob: 1}))

    ## Evaluation of model
    testing_accuracy.append(accuracy_score(y_test.argmax(1),
                                           s.run(predicted_y, {input_X: X_test, keep_prob: 1}).argmax(1)))
    print("Epoch:{0}, Train loss: {1:.2f} Train acc: {2:.3f}, Test acc:{3:.3f}".format(epoch,
                                                                                      training_loss[epoch],
                                                                                      training_accuracy[epoch],
                                                                                      testing_accuracy[epoch]))
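The shuffling and slicing inside the epoch loop can be isolated into a small generator. The sketch below is one possible way to write it; the name iterate_minibatches is an assumption, and the sizes mirror the 50000 training examples and batch size of 128 used above.

```python
import numpy as np

def iterate_minibatches(num_examples, batch_size, rng):
    """Yield index arrays that cover a shuffled 0..num_examples-1 exactly once."""
    order = rng.permutation(num_examples)
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(num_examples=50000, batch_size=128, rng=rng))

print(len(batches))       # 391 batches per epoch
print(len(batches[-1]))   # 80, the last batch is smaller than the others
```

Each index array can be used to slice both the inputs and the labels, so every example is seen exactly once per epoch.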

Let's plot the training and test accuracy as a function of the number of epochs.

## Plotting training and testing accuracy as a function of the epoch
iterations = list(range(epochs))
plt.plot(iterations, training_accuracy, label='Train')
plt.plot(iterations, testing_accuracy, label='Test')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()  # without this call the 'Train'/'Test' labels are not displayed
plt.show()
print("Train Accuracy: {0:.2f}".format(training_accuracy[-1]))
print("Test Accuracy:{0:.2f}".format(testing_accuracy[-1]))

As we can see, we have successfully trained a multilayer perceptron written in TensorFlow, and it reaches a high test accuracy!
