[TOC]
## tensorflow

### TF / Keras version compatibility

| TF     | Keras      | Python   |
|--------|------------|----------|
| TF1.12 | Keras2.2.4 |          |
| TF1.13 | Keras2.2.4 |          |
| TF1.14 | Keras2.2.5 | 3.6, 3.7 |
| TF1.15 | Keras2.2.5 |          |
### ResNet model construction
### TF cmd flags

`TF_CPP_MIN_LOG_LEVEL` values:

- `0` (default): show all messages
- `1`: filter out INFO messages
- `2`: filter out INFO and WARNING messages
- `3`: filter out INFO, WARNING, and ERROR messages
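The flag is normally set through the environment before the first `import tensorflow`, for example:

```python
import os

# Silence INFO and WARNING messages; must be set BEFORE importing tensorflow.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
```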
### BN training tricks

demo1:

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope)
updates_op = tf.group(*update_ops)
with tf.control_dependencies([updates_op]):
    losses = tf.get_collection('losses', scope)
    total_loss = tf.add_n(losses, name='total_loss')
```
```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.apply_gradients(mean_grads, global_step=global_step)
```
demo2:

```python
from tensorflow.python.training import moving_averages

mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')
moving_mean = tf.get_variable(
    'moving_mean', params_shape, tf.float32,
    initializer=tf.constant_initializer(0.0, tf.float32), trainable=False)
moving_variance = tf.get_variable(
    'moving_variance', params_shape, tf.float32,
    initializer=tf.constant_initializer(1.0, tf.float32), trainable=False)
self._extra_train_ops.append(moving_averages.assign_moving_average(
    moving_mean, mean, 0.9))
self._extra_train_ops.append(moving_averages.assign_moving_average(
    moving_variance, variance, 0.9))
```
### EMA

`shadow_variable = decay * shadow_variable + (1 - decay) * variable`
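A minimal plain-Python sketch of this update rule (the variable names and the initial value of 0 are illustrative):

```python
decay = 0.9
shadow = 0.0  # shadow variable, initialized to 0 here for illustration

# Apply the EMA update three times with a constant variable value of 1.0.
for variable in [1.0, 1.0, 1.0]:
    shadow = decay * shadow + (1 - decay) * variable

# The shadow value approaches the variable slowly: 0.1 -> 0.19 -> 0.271
```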
demo1:

```python
variable_averages = tf.train.ExponentialMovingAverage(
    train_config.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
train_op = tf.group(apply_gradient_op, variables_averages_op)
```
demo2:

```python
variable_averages = tf.train.ExponentialMovingAverage(
    0.999, self.global_step, name='avg')
losses = tf.get_collection('cost')
variables_averages_op = variable_averages.apply(losses)
train_op = tf.group(apply_op, variables_averages_op)
```
### Multi-GPU training

```python
next_img, next_label = iterator.get_next()
image_splits = tf.split(next_img, num_gpus)
label_splits = tf.split(next_label, num_gpus)
tower_grads = []
tower_loss = []
counter = 0
for d in self.gpu_id:
    with tf.device('/gpu:%s' % d):
        with tf.name_scope('%s_%s' % ('tower', d)):
            cross_entropy = build_train_model(
                image_splits[counter], label_splits[counter], for_training=True)
            counter += 1
            with tf.variable_scope("loss"):
                grads = opt.compute_gradients(cross_entropy)
                tower_grads.append(grads)
                tower_loss.append(cross_entropy)
                tf.get_variable_scope().reuse_variables()

mean_loss = tf.stack(axis=0, values=tower_loss)
mean_loss = tf.reduce_mean(mean_loss, 0)
mean_grads = util.average_gradients(tower_grads)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.apply_gradients(mean_grads, global_step=global_step)
```
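`util.average_gradients` is not shown above; in typical TF1.x multi-GPU examples it averages the per-tower `(grad, var)` pairs with `tf.stack`/`tf.reduce_mean`. A plain-Python sketch of the same averaging logic (gradients represented as flat lists of floats and variables as strings, purely for illustration):

```python
def average_gradients(tower_grads):
    """tower_grads: one list of (grad, var) pairs per GPU tower.

    Returns a single list of (averaged_grad, var) pairs.
    """
    averaged = []
    for grad_and_vars in zip(*tower_grads):
        # grad_and_vars pairs up the same variable across all towers.
        grads = [g for g, _ in grad_and_vars]
        mean_grad = [sum(vals) / len(vals) for vals in zip(*grads)]
        averaged.append((mean_grad, grad_and_vars[0][1]))
    return averaged

# Two towers, one shared variable 'w'; the result is the elementwise mean:
result = average_gradients([[([2.0, 4.0], 'w')],
                            [([4.0, 6.0], 'w')]])
```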
### APIs

#### accuracy

```python
accuracy, update_op = tf.metrics.accuracy(labels=x, predictions=y)
```

`tf.metrics.accuracy` **returns** two values: **accuracy** is the accumulated accuracy up to the previous batch, and **update_op** updates the metric with the current batch and returns the accuracy including it.
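The streaming behavior can be mimicked in plain Python (the class name and interface here are illustrative, not part of TF):

```python
class StreamingAccuracy:
    """Accumulates correct/total counts across batches, like tf.metrics.accuracy."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    @property
    def accuracy(self):
        # Accuracy over everything seen so far (0.0 before any update).
        return self.correct / self.total if self.total else 0.0

    def update(self, labels, predictions):
        # Fold the current batch into the running counts.
        self.correct += sum(int(l == p) for l, p in zip(labels, predictions))
        self.total += len(labels)
        return self.accuracy

m = StreamingAccuracy()
first = m.update([1, 0, 1, 1], [1, 0, 0, 1])   # 3/4 correct so far
second = m.update([0, 0], [0, 1])              # running total: 4/6
```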
#### BN

- `tf.layers.BatchNormalization`
- `tf.layers.batch_normalization`
- `tf.keras.layers.BatchNormalization`
- `tf.nn.batch_normalization`
① `tf.nn.moments` outputs exactly the `mean` and `variance` that BN needs.
The various functions implementing the BN algorithm in tensorflow:

- `tf.nn.batch_normalization()` is a low-level op; the caller must compute and pass in the tensor's mean and variance.
- `tf.nn.fused_batch_norm` is another low-level op, very similar to the former, but optimized for 4-D input tensors (the common case in convolutional networks), whereas `tf.nn.batch_normalization` accepts any tensor of rank greater than 1.
- `tf.layers.batch_normalization` is a high-level wrapper over the previous ops. Its biggest difference is that it creates and manages the running mean and variance tensors itself, and fuses the computation where possible. This function should usually be your default choice.
- `tf.contrib.layers.batch_norm` is an early implementation of batch norm; its upgraded core-API version is `tf.layers.batch_normalization`. It is not recommended, as it may be dropped in a future release.
- `tf.nn.batch_norm_with_global_normalization` is another deprecated op; it now delegates to `tf.nn.batch_normalization` and will be removed in the future.
- `keras.layers.BatchNormalization` is the Keras implementation of BN; on the TensorFlow backend it calls `tf.nn.batch_normalization`.

Source: CSDN blogger 「wanghua609」, CC 4.0 BY-SA — https://blog.csdn.net/weixin_38145317/article/details/96132250
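The core computation all of these ops share (normalize by mean/variance, then scale and shift) can be sketched in plain Python; `gamma`, `beta`, and `eps` follow the usual BN naming and are illustrative defaults:

```python
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize a batch of scalars to ~zero mean / unit variance, then scale+shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in x]

out = batch_norm([1.0, 2.0, 3.0])
# With gamma=1, beta=0 the output is centered around zero.
```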
ref:

- Using batch normalize in tensorflow (Slim.BN): https://blog.csdn.net/jiruiyang/article/details/77202674
- BN explained, with a Tensorflow implementation (understanding the parameters): https://www.cnblogs.com/WSX1994/p/10949079.html
### Training tricks

- BN: `is_training`
- `tf.GraphKeys.UPDATE_OPS`
- EMA
- loss: `l2_loss`
- `tower_loss`
- grad: `average_grad`
### ISSUEs

1. `tf.gradients` vs `tf.train.Optimizer.compute_gradients`

`tf.gradients` does not let you compute a Jacobian; it sums the gradients over every output (equivalent to summing each column of the actual Jacobian). In fact there is no "good" way to compute Jacobians in TensorFlow (basically you have to call `tf.gradients` once per output, see this issue).
As for `tf.train.Optimizer.compute_gradients`: yes, its result is essentially the same, but it handles some details automatically and returns a slightly more convenient output format. If you look at the implementation, you will see that at its core it is a call to `tf.gradients` (aliased there as `gradients.gradients`), but the surrounding logic it implements is very useful when building optimizers. Also, exposing it as a method allows extensible behavior in subclasses, either to implement some optimization strategy (unlikely inside the `compute_gradients` step itself) or for auxiliary purposes such as tracking or debugging.
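The "column sums of the Jacobian" behavior can be checked numerically with finite differences; the function `f` and the point `(2, 3)` below are made up for illustration:

```python
def f(x0, x1):
    # Two outputs; the Jacobian at (2, 3) is [[3, 2], [1, 1]].
    return [x0 * x1, x0 + x1]

def grad_of_sum(x0, x1, eps=1e-6):
    # What tf.gradients effectively returns for a multi-output y:
    # the gradient of sum(outputs), i.e. the column sums of the Jacobian.
    base = sum(f(x0, x1))
    return [(sum(f(x0 + eps, x1)) - base) / eps,
            (sum(f(x0, x1 + eps)) - base) / eps]

g = grad_of_sum(2.0, 3.0)  # column sums: [3 + 1, 2 + 1] = [4, 3]
```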
2. Gradient clipping

```python
tf.clip_by_global_norm(gradients, max_gradient_norm)
```

- `t_list` is the list of gradient tensors
- `clip_norm` is the clipping threshold

```
global_norm = sqrt(sum(l2norm(t)**2 for t in t_list))
t_list[i] = t_list[i] * clip_norm / max(global_norm, clip_norm)
```

That is, two steps:

1. Compute the global norm `global_norm` over all gradients.
2. If `global_norm` exceeds the specified `clip_norm`, scale all gradients down proportionally; otherwise leave them unchanged. This should be easy to follow.
https://bigquant.com/community/t/topic/121493
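The two formulas above can be sketched in plain Python (tensors represented as flat lists of floats for illustration):

```python
import math

def clip_by_global_norm(t_list, clip_norm):
    """Scale all gradients jointly so their global norm is at most clip_norm."""
    global_norm = math.sqrt(sum(sum(x * x for x in t) for t in t_list))
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[x * scale for x in t] for t in t_list]
    return clipped, global_norm

# global_norm = sqrt(3^2 + 4^2) = 5 > clip_norm = 2.5, so everything halves:
clipped, gn = clip_by_global_norm([[3.0], [4.0]], 2.5)
```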
### Keras

```python
from tensorflow.python.keras.layers import Layer
```