
TensorFlow 1.x

[TOC]

TensorFlow / Keras version compatibility:

| TF      | Keras       | Python   |
| ------- | ----------- | -------- |
| TF 1.12 | Keras 2.2.4 |          |
| TF 1.13 | Keras 2.2.4 |          |
| TF 1.14 | Keras 2.2.5 | 3.6, 3.7 |
| TF 1.15 | Keras 2.2.5 |          |

Building a ResNet model

  • eager keras resnet (converting to a .pb graph is rather cumbersome)
  • tf slim resnet (DDZ)
  • tf pure resnet (DDZ v2)
  • keras

TF environment variables

TF_CPP_MIN_LOG_LEVEL = 0 : default; print all messages
TF_CPP_MIN_LOG_LEVEL = 1 : filter out INFO messages
TF_CPP_MIN_LOG_LEVEL = 2 : filter out INFO and WARNING messages
TF_CPP_MIN_LOG_LEVEL = 3 : filter out INFO, WARNING, and ERROR messages
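
A minimal sketch; the variable must be set before TensorFlow is imported for it to take effect:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide INFO and WARNING messages
import tensorflow as tf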

BN training tips

demo1:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope)
updates_op = tf.group(*update_ops)
with tf.control_dependencies([updates_op]):
    losses = tf.get_collection('losses', scope)
    # Calculate the total loss for the current tower.
    total_loss = tf.add_n(losses, name='total_loss')

# A simpler variant: depend on all UPDATE_OPS directly before the weight update
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.apply_gradients(mean_grads, global_step=global_step)

demo2:

from tensorflow.python.training import moving_averages

# Compute the per-channel mean and variance
mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')
# Create the moving mean/variance variables used at test time
moving_mean = tf.get_variable('moving_mean',
                              params_shape, tf.float32,
                              initializer=tf.constant_initializer(0.0, tf.float32),
                              trainable=False)
moving_variance = tf.get_variable('moving_variance',
                                  params_shape, tf.float32,
                                  initializer=tf.constant_initializer(1.0, tf.float32),
                                  trainable=False)
# Register the moving-average update ops:
# moving_mean = moving_mean * decay + mean * (1 - decay)
# moving_variance = moving_variance * decay + variance * (1 - decay)
self._extra_train_ops.append(moving_averages.assign_moving_average(
    moving_mean, mean, 0.9))
self._extra_train_ops.append(moving_averages.assign_moving_average(
    moving_variance, variance, 0.9))
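
The update ops collected in self._extra_train_ops still have to be attached to the training step. A minimal sketch, assuming apply_op is the optimizer's apply_gradients op:

# Bundle the weight update with the moving-average updates
train_ops = [apply_op] + self._extra_train_ops
train_op = tf.group(*train_ops)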

EMA

shadow_variable = decay * shadow_variable + (1 - decay) * variable

demo1:

# https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
# Track the moving averages of all trainable variables.
variable_averages = tf.train.ExponentialMovingAverage(train_config.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
train_op = tf.group(apply_gradient_op, variables_averages_op)
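
At evaluation time the shadow (averaged) values can be loaded in place of the raw weights. A minimal sketch using variables_to_restore(), reusing the same decay constant as above:

# Build a Saver that maps shadow variables onto the model variables
variable_averages = tf.train.ExponentialMovingAverage(train_config.MOVING_AVERAGE_DECAY)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)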

demo2:

# Set up the exponential moving average
variable_averages = tf.train.ExponentialMovingAverage(0.999, self.global_step, name='avg')
losses = tf.get_collection('cost')
variables_averages_op = variable_averages.apply(losses)
# Bundle all updates into a single training op
train_op = tf.group(apply_op, variables_averages_op)

Multi-GPU training

next_img, next_label = iterator.get_next()
# Split the batch evenly across the GPUs
image_splits = tf.split(next_img, num_gpus)
label_splits = tf.split(next_label, num_gpus)
tower_grads = []
tower_loss = []
counter = 0
for d in self.gpu_id:
    with tf.device('/gpu:%s' % d):
        with tf.name_scope('%s_%s' % ('tower', d)):
            cross_entropy = build_train_model(image_splits[counter], label_splits[counter], for_training=True)
            counter += 1
            with tf.variable_scope("loss"):
                grads = opt.compute_gradients(cross_entropy)
                tower_grads.append(grads)
                tower_loss.append(cross_entropy)
                # Share variables across all towers
                tf.get_variable_scope().reuse_variables()

mean_loss = tf.stack(axis=0, values=tower_loss)
mean_loss = tf.reduce_mean(mean_loss, 0)
mean_grads = util.average_gradients(tower_grads)
# Run the BN moving-average updates before applying the averaged gradients
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.apply_gradients(mean_grads, global_step=global_step)
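
util.average_gradients is not shown above; a minimal sketch following the CIFAR-10 multi-GPU tutorial, assuming each entry of tower_grads is the list of (gradient, variable) pairs from one tower:

def average_gradients(tower_grads):
    """Average the gradients for each shared variable across all towers."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # grad_and_vars is ((grad0_gpu0, var0), ..., (grad0_gpuN, var0))
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), 0)
        # Variables are shared across towers, so the first tower's pointer suffices
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads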

APIs

accuracy

accuracy, update_op = tf.metrics.accuracy(labels=x, predictions=y)

tf.metrics.accuracy **returns** two values: **accuracy** is the accuracy accumulated up to the previous batch, and **update_op** folds the current batch into the counters and returns the updated accuracy.
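
A minimal usage sketch: the metric's counters live in local variables, so they must be initialized, and update_op has to be run once per batch:

labels = tf.placeholder(tf.int64, [None])
predictions = tf.placeholder(tf.int64, [None])
accuracy, update_op = tf.metrics.accuracy(labels=labels, predictions=predictions)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # reset the metric's counters
    sess.run(update_op, feed_dict={labels: [1, 0, 1], predictions: [1, 1, 1]})
    print(sess.run(accuracy))  # accuracy accumulated over all batches so far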

BN:

  • tf.layers.BatchNormalization
  • tf.layers.batch_normalization
  • tf.keras.layers.BatchNormalization
  • tf.nn.batch_normalization

① tf.nn.moments outputs exactly the mean and variance that BN needs.
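
A minimal sketch wiring the two low-level ops together (NHWC layout and a 64-channel input are assumptions for illustration):

x = tf.placeholder(tf.float32, [None, 32, 32, 64])  # NHWC
# Per-channel batch statistics
mean, variance = tf.nn.moments(x, axes=[0, 1, 2])
beta = tf.get_variable('beta', [64], initializer=tf.zeros_initializer())   # learned offset
gamma = tf.get_variable('gamma', [64], initializer=tf.ones_initializer())  # learned scale
y = tf.nn.batch_normalization(x, mean, variance, beta, gamma, variance_epsilon=1e-5)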

The BN-related functions in TensorFlow:

tf.nn.batch_normalization() is a low-level op; the caller must compute and supply the tensor's mean and variance itself.

tf.nn.fused_batch_norm is another low-level op, very similar to the previous one. The difference is that it is optimized for 4-D input tensors, the usual case in convolutional networks, whereas tf.nn.batch_normalization accepts any tensor of rank greater than 1.

tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it creates and manages the running mean and variance tensors for you, and uses fast fused kernels where possible. This function should usually be your default choice.

tf.contrib.layers.batch_norm is an early implementation of batch norm; its upgraded core-API equivalent is tf.layers.batch_normalization. It is not recommended, since it may be dropped in a future release.

tf.nn.batch_norm_with_global_normalization is another deprecated op; it currently delegates to tf.nn.batch_normalization and will eventually be removed.

keras.layers.BatchNormalization is the Keras implementation of BN; on the TensorFlow backend it calls tf.nn.batch_normalization.


Source: CSDN blogger "wanghua609", licensed CC 4.0 BY-SA. Original: https://blog.csdn.net/weixin_38145317/article/details/96132250

ref:

Using batch normalization in TensorFlow (Slim BN): https://blog.csdn.net/jiruiyang/article/details/77202674

BN explained, with a TensorFlow implementation (parameter walkthrough): https://www.cnblogs.com/WSX1994/p/10949079.html

Training tips

  • BN: is_training, tf.GraphKeys.UPDATE_OPS
  • EMA
  • loss: l2_loss, tower_loss
  • grad: average_grad

ISSUEs

1. tf.gradients vs. tf.train.Optimizer.compute_gradients

tf.gradients does not let you compute a Jacobian: it sums the gradient over every output (roughly the column sums of the true Jacobian). In fact, TensorFlow has no "good" way to compute Jacobians; you essentially have to call tf.gradients once per output (see this issue).

As for tf.train.Optimizer.compute_gradients: yes, the result is essentially the same, but it handles some details automatically and returns a slightly more convenient output format (a list of (gradient, variable) pairs). If you look at the implementation, you will see that at its core it is a call to tf.gradients (aliased there as gradients.gradients), with surrounding logic that is useful for the optimizer implementations. Also, having it as a method allows for extensible behavior in subclasses, either to implement some optimization strategy (not very likely at the compute_gradients step) or for auxiliary purposes such as tracing or debugging.
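
A minimal sketch of the relationship (w and loss are hypothetical):

w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))

# tf.gradients returns a plain list of gradient tensors
grads = tf.gradients(loss, [w])

# compute_gradients returns (gradient, variable) pairs ready for apply_gradients
opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[w])
train_op = opt.apply_gradients(grads_and_vars)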

2. Gradient clipping

tf.clip_by_global_norm(gradients, max_gradient_norm)

  1. t_list is the list of gradient tensors
  2. clip_norm is the clipping threshold (the maximum allowed global norm)

global_norm = sqrt(sum(l2norm(t)**2 for t in t_list))

t_list[i] = t_list[i] * clip_norm / max(global_norm, clip_norm)

In other words, two steps:

  1. Compute the global norm of all gradients, global_norm (the square root of the summed squared L2 norms).
  2. If global_norm exceeds the specified clip_norm, rescale every gradient by clip_norm / global_norm; otherwise the max() in the formula makes the factor 1 and the gradients pass through unchanged.

https://bigquant.com/community/t/topic/121493
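
A minimal sketch combining clipping with an optimizer; max_gradient_norm is an assumed hyperparameter:

params = tf.trainable_variables()
gradients = tf.gradients(loss, params)
# Rescale all gradients whenever their global norm exceeds max_gradient_norm
clipped_gradients, global_norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
train_op = opt.apply_gradients(zip(clipped_gradients, params), global_step=global_step)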

Keras

In TF 1.x, tensorflow.keras and tensorflow.python.keras expose distinct class objects, so mixing them in one model can fail; replacing the former import with the latter is a common fix:

#from tensorflow.keras.layers import Layer
# replace with
from tensorflow.python.keras.layers import Layer