
tfprof: A Profiling Tool for TensorFlow Models

To improve a model's performance, we sometimes need to analyze the memory usage and execution time of each op that makes up the model. Recent versions of TensorFlow ship a tool for exactly this: tfprof.

The tool lives in the tensorflow/tools/tfprof directory. Its main features are:

  1. Measure model parameters, float operations, tensor shapes.
  2. Measure op execution times, requested memory size and device placement.
  3. Inspect checkpoint tensors’ shapes and their values.
  4. Explore model based on name scope or graph structure.
  5. Selectively group, filter, account, and order ops.

Below we walk through its Python API.

Inspecting variable shapes and sizes

``` python
import sys
import tensorflow as tf

# Print trainable variable parameter statistics to stdout.
param_stats = tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    tfprof_options=tf.contrib.tfprof.model_analyzer.
    TRAINABLE_VARS_PARAMS_STAT_OPTIONS)

# param_stats is a tensorflow.tfprof.TFProfNode proto. It organizes the
# statistics of each graph node in a tree structure. Let's print the root.
sys.stdout.write('total_params: %d\n' % param_stats.total_parameters)
```
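param_stats mirrors the graph's name-scope hierarchy: each node carries its own statistics plus a list of children, and the root aggregates the whole model. To illustrate how such a tree sums up, here is a toy stand-in (a hypothetical class for illustration only, not the real TFProfNode proto):

``` python
# Hypothetical stand-in for the tree shape of tfprof's TFProfNode proto.
class Node:
    def __init__(self, name, parameters=0, children=None):
        self.name = name
        self.parameters = parameters
        self.children = children or []

def total_parameters(node):
    # Sum a node's own parameters plus those of its entire subtree,
    # the way the root's total_parameters aggregates the model.
    return node.parameters + sum(total_parameters(c) for c in node.children)

model = Node('model', children=[
    Node('model/dense/kernel', parameters=784 * 10),
    Node('model/dense/bias', parameters=10),
])
print('total_params: %d' % total_parameters(model))  # total_params: 7850
```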

Inspecting the number of floating-point operations

``` python
import tensorflow as tf

# Print to stdout an analysis of the number of floating point operations in
# the model, broken down by individual operations.
#
# Note: only ops with RegisterStatistics('flops') defined have flop stats,
# and complete shape information is required. Shapes are often unknown
# statically; to complete them, provide run-time shape information with
# tf.RunMetadata (see the next example for how to provide RunMetadata).
tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    tfprof_options=tf.contrib.tfprof.model_analyzer.FLOAT_OPS_OPTIONS)
```
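As a sanity check for the flop counts tfprof reports, the cost of a single matmul can be estimated by hand: multiplying an [m, k] matrix by a [k, n] matrix takes m*n*k multiplications and about as many additions, i.e. roughly 2*m*k*n flops. A back-of-the-envelope helper (our own estimate, not tfprof output):

``` python
def matmul_flops(m, k, n):
    # [m, k] x [k, n]: each of the m*n output entries needs k multiplies
    # and ~k adds, so roughly 2*m*k*n flops in total.
    return 2 * m * k * n

# e.g. a batch of 32 vectors through a 784x10 dense layer:
print(matmul_flops(32, 784, 10))  # 501760
```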

Inspecting run time and memory usage

To inspect each op's run time and memory usage, we first need to collect run metadata as follows:

``` python
import tensorflow as tf

# Generate the meta information for the model that contains the memory
# usage and timing information. train_op is the training op defined
# elsewhere in your model.
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    _ = sess.run(train_op,
                 options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
                 run_metadata=run_metadata)
```

Finally, we can run:

``` python
# Print to stdout an analysis of the memory usage and the timing information
# from running the graph, broken down by operations.
tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_meta=run_metadata,
    tfprof_options=tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)
```

The tfprof_options argument is a dict. Above we set it to PRINT_ALL_TIMING_MEMORY, one of the predefined option dicts; its contents are:

``` python
PRINT_ALL_TIMING_MEMORY = {
    'max_depth': 10000,
    'min_bytes': 1,  # Only >=1
    'min_micros': 1,  # Only >=1
    'min_params': 0,
    'min_float_ops': 0,
    'device_regexes': ['.*'],
    'order_by': 'name',
    'account_type_regexes': ['.*'],
    'start_name_regexes': ['.*'],
    'trim_name_regexes': [],
    'show_name_regexes': ['.*'],
    'hide_name_regexes': [],
    'account_displayed_op_only': True,
    'select': ['micros', 'bytes'],
    'viz': False,
    'dump_to_file': ''
}
```

We can modify the values inside this tfprof_options dict as needed.
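For example, to sort ops by execution time instead of name and write the report to a file, one might copy the predefined dict and override a couple of keys. A sketch (the dict is reproduced from above so the snippet is self-contained; in real code you would copy tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY, and the output path is just an example):

``` python
import copy

PRINT_ALL_TIMING_MEMORY = {
    'max_depth': 10000,
    'min_bytes': 1,
    'min_micros': 1,
    'min_params': 0,
    'min_float_ops': 0,
    'device_regexes': ['.*'],
    'order_by': 'name',
    'account_type_regexes': ['.*'],
    'start_name_regexes': ['.*'],
    'trim_name_regexes': [],
    'show_name_regexes': ['.*'],
    'hide_name_regexes': [],
    'account_displayed_op_only': True,
    'select': ['micros', 'bytes'],
    'viz': False,
    'dump_to_file': ''
}

# Deep-copy so the predefined dict stays untouched, then override
# the entries we care about.
opts = copy.deepcopy(PRINT_ALL_TIMING_MEMORY)
opts['order_by'] = 'micros'               # sort ops by execution time
opts['dump_to_file'] = '/tmp/tfprof.txt'  # write the report to a file
```

The copy matters: mutating the predefined dict in place would silently change the behavior of every later call that uses it.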
