Predictions with TensorFlow Decision Forests (TF-DF)

In this article, you will learn different ways to generate predictions with a previously trained TF-DF model using the Python API.
Note: The Python API shown in this article is easy to use and well suited to experimentation. However, other APIs, such as TensorFlow Serving and the C++ API, are better suited to production systems because they are faster and more stable. An exhaustive list of all Serving APIs is available in the TF-DF documentation.
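In this article, you will:
- Use the model.predict() function on a TensorFlow Dataset created with pd_dataframe_to_tf_dataset.
- Use the model.predict() function on a TensorFlow Dataset created manually.
- Use the model.predict() function on NumPy arrays.
- Make predictions with the CLI API.
- Benchmark the inference speed of a model with the CLI API.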
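
Important remark

The dataset used for predictions should have the same feature names and types as the dataset used for training. Failing to do so will most likely raise an error.
For example, training a model with two features f1 and f2 and then trying to generate predictions on a dataset without f2 will fail. Note that setting some or all feature values to "missing" is fine. Similarly, training a model where f2 is a numerical feature (e.g. float32) and applying it to a dataset where f2 is a text feature (e.g. string) will fail.
Although the Keras API abstracts this away, a model instantiated in Python (e.g. with tfdf.keras.RandomForestModel()) and a model loaded from disk (e.g. with tf.keras.models.load_model()) can behave differently. Notably, a Python-instantiated model automatically applies the necessary type conversions. For example, if a float64 feature is fed to a model expecting a float32 feature, the conversion is performed implicitly. Such a conversion is not possible for a model loaded from disk. The training data and the inference data must therefore always have exactly the same types.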
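
One way to guard against such mismatches is to compare the serving data with the training data before calling the model. The sketch below only illustrates the idea: check_serving_schema is a hypothetical helper, not part of TF-DF, and it assumes both datasets are available as pandas DataFrames. The missing feature is filled with NaN on the assumption that NaN stands for a missing numerical value.

```python
import numpy as np
import pandas as pd


def check_serving_schema(train_df, serving_df, label="label"):
    """Return a list of problems found in serving_df (hypothetical helper, not a TF-DF API)."""
    problems = []
    for name, dtype in train_df.dtypes.items():
        if name == label:
            continue  # the label is not needed at inference time
        if name not in serving_df.columns:
            problems.append(f"missing feature: {name}")
        elif serving_df[name].dtype != dtype:
            problems.append(
                f"dtype mismatch for {name}: trained on {dtype}, got {serving_df[name].dtype}"
            )
    return problems


train_df = pd.DataFrame({
    "f1": np.array([1.0, 2.0], dtype=np.float32),
    "f2": np.array([0.5, 1.5], dtype=np.float32),
    "label": [0, 1],
})
# Serving data with a float64 f1 and no f2 column at all
serving_df = pd.DataFrame({"f1": np.array([3.0, 4.0], dtype=np.float64)})

problems = check_serving_schema(train_df, serving_df)
print(problems)

# Add f2 as an all-missing column and cast both features to the training dtypes
fixed_df = serving_df.assign(f2=np.nan).astype({"f1": np.float32, "f2": np.float32})
print(check_serving_schema(train_df, fixed_df))
```

Running such a check before inference matters most for models loaded from disk, where no implicit conversion takes place.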

Setup

First, we install TensorFlow Decision Forests...

# Install the tensorflow_decision_forests library
!pip install tensorflow_decision_forests

... and import the libraries used in this example.

# Import the required libraries
import tensorflow_decision_forests as tfdf  # decision forests library
import os  # operating system utilities
import numpy as np  # numerical computing
import pandas as pd  # data handling
import tensorflow as tf  # building and training models
import math  # math functions

model.predict(...) and the pd_dataframe_to_tf_dataset function

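TensorFlow Decision Forests implements the Keras model API. TF-DF models therefore have a predict function to make predictions. This function takes a TensorFlow Dataset as input and outputs an array of predictions. The simplest way to create a TensorFlow dataset is to use Pandas and the tfdf.keras.pd_dataframe_to_tf_dataset(...) function.
The next example shows how to create a TensorFlow dataset with pd_dataframe_to_tf_dataset.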

# Create a Pandas DataFrame named pd_dataset
pd_dataset = pd.DataFrame({
    "feature_1": [1, 2, 3],        # numerical column with the values 1, 2, 3
    "feature_2": ["a", "b", "c"],  # string column with the values a, b, c
    "label": [0, 1, 0],            # label column with the values 0, 1, 0
})
pd_dataset

   feature_1 feature_2  label
0          1         a      0
1          2         b      1
2          3         c      0

# Convert the Pandas DataFrame into a TensorFlow dataset
tf_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_dataset, label="label")

# Iterate over the batches of the TensorFlow dataset
for features, label in tf_dataset:
    print("Features:", features)  # print the features
    print("label:", label)        # print the labels

Features: {'feature_1': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([1, 2, 3])>, 'feature_2': <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'a', b'b', b'c'], dtype=object)>}
label: tf.Tensor([0 1 0], shape=(3,), dtype=int64)

Note: "pd_" stands for "pandas". "tf_" stands for "TensorFlow".
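A TensorFlow Dataset is a function that outputs a sequence of values. Those values can be simple arrays (called tensors) or arrays organized into a structure (for example, arrays organized in a dictionary).
The following example shows training and inference (using predict) on a toy dataset: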

# Create a Pandas training dataset
pd_train_dataset = pd.DataFrame({
    "feature_1": np.random.rand(1000),  # 1000 random values
    "feature_2": np.random.rand(1000),  # 1000 random values
})
# Add a boolean label column: whether feature_1 is greater than feature_2
pd_train_dataset["label"] = pd_train_dataset["feature_1"] > pd_train_dataset["feature_2"]

# Display the training dataset
pd_train_dataset

     feature_1  feature_2  label
0     0.683035   0.952359  False
1     0.486641   0.669202  False
2     0.685580   0.967570  False
3     0.233815   0.725952  False
4     0.250187   0.503956  False
..         ...        ...    ...
995   0.676669   0.043817   True
996   0.564827   0.605345  False
997   0.996968   0.488901   True
998   0.987390   0.097840   True
999   0.692132   0.738431  False

1000 rows × 3 columns

# Create a serving dataset with the same two features and no label
pd_serving_dataset = pd.DataFrame({
    "feature_1": np.random.rand(500),  # 500 random values
    "feature_2": np.random.rand(500),  # 500 random values
})

# Display the serving dataset
pd_serving_dataset

     feature_1  feature_2
0     0.326467   0.689151
1     0.807447   0.075198
2     0.095011   0.947676
3     0.851319   0.819100
4     0.488305   0.274047
..         ...        ...
495   0.480803   0.238047
496   0.633565   0.722966
497   0.945247   0.128379
498   0.267938   0.503427
499   0.185848   0.901847

500 rows × 2 columns

Let's convert the Pandas dataframes into TensorFlow datasets:

# Convert the Pandas training dataset into a TensorFlow dataset
tf_train_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_train_dataset, label="label")
# Convert the Pandas serving dataset into a TensorFlow dataset
tf_serving_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_serving_dataset)

We can now train a model on tf_train_dataset:

# Create a RandomForestModel; verbose=0 hides the training logs
model = tfdf.keras.RandomForestModel(verbose=0)
# Train the model on tf_train_dataset
model.fit(tf_train_dataset)

<keras.callbacks.History at 0x7f76701969d0>

Then, generate predictions on tf_serving_dataset:

# Predict on tf_serving_dataset and display the first 10 predictions
predictions = model.predict(tf_serving_dataset, verbose=0)[:10]
predictions

array([[0.        ],
       [0.99999917],
       [0.        ],
       [0.29666647],
       [0.99999917],
       [0.        ],
       [0.99999917],
       [0.99999917],
       [0.99999917],
       [0.        ]], dtype=float32)

model.predict(...) and a manual TF dataset

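In the previous section, we showed how to create a TF dataset with the pd_dataframe_to_tf_dataset function. This option is simple but poorly suited to large datasets. TensorFlow provides several other options to create a TensorFlow dataset. The next example shows how to create a dataset with the tf.data.Dataset.from_tensor_slices() function.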
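
# Build a dataset from the list [1, 2, 3, 4, 5] with tf.data.Dataset.from_tensor_slices()
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])

# Iterate over the elements of the dataset
for value in dataset:
    # value.numpy() converts the tensor into a NumPy value
    print("value:", value.numpy())

value: 1
value: 2
value: 3
value: 4
value: 5

TensorFlow models are trained with mini-batching: instead of being fed one at a time, examples are grouped into "batches". For neural networks, the batch size affects the quality of the model, and the optimal value has to be determined by the user during training. For decision forests, the batch size has no impact on the model. However, for compatibility reasons, TensorFlow Decision Forests expects the dataset to be batched. Batching is done with the batch() function.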
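
To picture what batching does independently of TensorFlow, here is a plain-Python sketch. The batch helper below is a hypothetical stand-in written for illustration; it is not how tf.data implements batch(), and it only mimics the grouping of consecutive examples, including the smaller final batch.

```python
def batch(values, batch_size):
    """Group consecutive values into lists of at most batch_size items."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


# Five examples with a batch size of 2: the last batch holds the single leftover example
batches = list(batch([1, 2, 3, 4, 5], 2))
print(batches)
```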
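
# Build a dataset from the list [1, 2, 3, 4, 5] with tf.data.Dataset.from_tensor_slices(),
# then group it into batches of 2 with batch()
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5]).batch(2)

# Iterate over the batches of the dataset
for value in dataset:
    # value.numpy() converts the batch tensor into a NumPy array
    print("value:", value.numpy())

value: [1 2]
value: [3 4]
value: [5]

TensorFlow Decision Forests expects the dataset to have one of the following two structures:
- features, label
- features, label, weights
The features can be a 2-dimensional array (where each column is a feature and each row is an example) or a dictionary of arrays.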
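
The two feature layouts can be illustrated with NumPy alone. This is a minimal sketch of the data shapes only: the names feature_1 and feature_2 are chosen for illustration, and nothing here claims that a model treats both layouts identically.

```python
import numpy as np

# Layout 1: a 2D array, one row per example and one column per feature
features_2d = np.array([[1, 4], [2, 5], [3, 6]])

# Layout 2: a dictionary of 1D arrays, one entry per feature
# (the feature names are assumptions made for this example)
features_dict = {
    f"feature_{i + 1}": features_2d[:, i]
    for i in range(features_2d.shape[1])
}

print(features_dict["feature_1"])
print(features_dict["feature_2"])
```

The dictionary layout keeps an explicit name for every feature, which makes it easier to verify that the serving data carries the same feature names as the training data.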
The following are examples of datasets compatible with TensorFlow Decision Forests:

# A dataset with a single 2D array of features
tf_dataset = tf.data.Dataset.from_tensor_slices((
    [[1, 2], [3, 4], [5, 6]],  # Features
    [0, 1, 0],                 # Label
)).batch(2)

# Iterate over the batches of the dataset
for features, label in tf_dataset:
    print("features:", features)  # print the features
    print("label:", label)        # print the labels

features: tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)
label: tf.Tensor([0 1], shape=(2,), dtype=int32)
features: tf.Tensor([[5 6]], shape=(1, 2), dtype=int32)
label: tf.Tensor([0], shape=(1,), dtype=int32)

# A dataset with a dictionary of features
tf_dataset = tf.data.Dataset.from_tensor_slices((
    {
        "feature_1": [1, 2, 3],  # feature 1
        "feature_2": [4, 5, 6],  # feature 2
    },
    [0, 1, 0],  # Label
)).batch(2)  # batch size of 2

# Iterate over the batches of the dataset
for features, label in tf_dataset:
    print("features:", features)  # print the dictionary of features
    print("label:", label)        # print the labels

features: {'feature_1': <tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>, 'feature_2': <tf.Tensor: shape=(2,), dtype=int32, numpy=array([4, 5], dtype=int32)>}
label: tf.Tensor([0 1], shape=(2,), dtype=int32)
features: {'feature_1': <tf.Tensor: shape=(1,), dtype=int32, numpy=array([3], dtype=int32)>, 'feature_2': <tf.Tensor: shape=(1,), dtype=int32, numpy=array([6], dtype=int32)>}
label: tf.Tensor([0], shape=(1,), dtype=int32)

Let's train a model with this second option.

# Build a dataset with two features and a label.
# feature_1 and feature_2 each hold 100 random values.
# The label is a boolean array of 100 elements: True where a random draw is >= 0.5, False otherwise.
tf_dataset = tf.data.Dataset.from_tensor_slices((
    {
        "feature_1": np.random.rand(100),
        "feature_2": np.random.rand(100),
    },
    np.random.rand(100) >= 0.5,  # Label
)).batch(2)

# Create a Random Forest model; verbose=0 hides the training logs
model = tfdf.keras.RandomForestModel(verbose=0)
# Train the model on the dataset
model.fit(tf_dataset)

<keras.callbacks.History at 0x7f75f016e220>

The predict function can be used directly on the training dataset:

# Predict on tf_dataset (verbose=0 hides the progress bar)
# and keep the first 10 predictions
model.predict(tf_dataset, verbose=0)[:10]

array([[0.43666634],
       [0.58999956],
       [0.42999968],
       [0.73333275],
       [0.75666606],
       [0.20666654],
       [0.67666614],
       [0.66666615],
       [0.82333267],
       [0.3999997 ]], dtype=float32)

model.predict(...) and model.predict_on_batch() on dictionaries

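In some cases, the predict function can be used with an array or a dictionary of arrays instead of a TensorFlow dataset.
The following example uses the previously trained model with a dictionary of NumPy arrays.

# Predict on the input data and keep the first 10 predictions
model.predict({
    "feature_1": np.random.rand(100),
    "feature_2": np.random.rand(100),
}, verbose=0)[:10]

array([[0.6533328 ],
       [0.5399996 ],
       [0.2133332 ],
       [0.22999986],
       [0.16333325],
       [0.18333323],
       [0.3766664 ],
       [0.5066663 ],
       [0.20333321],
       [0.8633326 ]], dtype=float32)

In the previous example, the arrays are batched automatically. Alternatively, the predict_on_batch function can be used to make sure that all the examples are run in the same batch.

# Keep the first 10 predictions
model.predict_on_batch({
    "feature_1": np.random.rand(100),
    "feature_2": np.random.rand(100),
})[:10]

array([[0.54666626],
       [0.21666653],
       [0.18333323],
       [0.5299996 ],
       [0.5499996 ],
       [0.12666662],
       [0.6299995 ],
       [0.06000001],
       [0.33999977],
       [0.08999998]], dtype=float32)

**Note:** If predict does not work on raw data, such as in the example above, try the predict_on_batch function or convert the raw data into a TensorFlow dataset.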
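
Inference with the YDF format

This example shows how to run a trained TF-DF model with the CLI API (one of the other Serving APIs). We will also use the benchmark tool to measure the inference speed of the model.
Let's start by training and saving a model: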

# Create a Gradient Boosted Trees model; verbose=0 hides the training logs
model = tfdf.keras.GradientBoostedTreesModel(verbose=0)
# Convert the Pandas training dataset into a TensorFlow dataset, using the "label" column as the label
train_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_train_dataset, label="label")
# Train the model on the converted dataset
model.fit(train_dataset)
# Save the trained model to disk
model.save("my_model")

INFO:tensorflow:Assets written to: my_model/assets

Let's also export the dataset to a csv file:

# Save pd_serving_dataset to the file dataset.csv
pd_serving_dataset.to_csv("dataset.csv")

Let's download and extract the Yggdrasil Decision Forests CLI tools.

# Download the Yggdrasil Decision Forests command-line tools
!wget https://github.com/google/yggdrasil-decision-forests/releases/download/1.0.0/cli_linux.zip
# Extract the downloaded archive
!unzip cli_linux.zip

Archive: cli_linux.zip
inflating: README
inflating: cli.txt
inflating: train
inflating: show_model
inflating: show_dataspec
inflating: predict
inflating: infer_dataspec
inflating: evaluate
inflating: convert_dataset
inflating: benchmark_inference
inflating: edit_model
inflating: synthetic_dataset
inflating: grpc_worker_main
inflating: LICENSE
inflating: CHANGELOG.md

Finally, let's make predictions:
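Notes:
- TensorFlow Decision Forests (TF-DF) is based on the Yggdrasil Decision Forests (YDF) library, and a TF-DF model always contains a YDF model internally. When a TF-DF model is saved to disk, the TF-DF model directory contains an assets sub-directory that holds the YDF model. This YDF model can be used with all the YDF tools. In the next example, we will use the predict and benchmark_inference tools. See the model format documentation for more details.
- YDF tools assume that the type of the dataset is specified with a prefix, e.g. csv:. See the YDF user manual for more details.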
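
The typed-path convention can be sketched in a few lines of Python. The helpers typed_path and predict_command are hypothetical names introduced for illustration; the flags --model, --dataset and --output are the ones used with the predict tool in this article, and the sketch only builds the command line without running it.

```python
import shlex


def typed_path(fmt, path):
    """Prefix a path with its dataset format, e.g. csv:dataset.csv."""
    return f"{fmt}:{path}"


def predict_command(model_dir, dataset, output):
    """Build the argument list for the ./predict tool (hypothetical helper)."""
    return [
        "./predict",
        f"--model={model_dir}",
        f"--dataset={typed_path('csv', dataset)}",
        f"--output={typed_path('csv', output)}",
    ]


cmd = predict_command("my_model/assets", "dataset.csv", "predictions.csv")
print(shlex.join(cmd))
```

Outside of a notebook, a list like cmd could be passed to subprocess.run(cmd, check=True) instead of using the "!" shell escape.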

# Run the predictions with the ./predict tool
# --model=my_model/assets gives the path to the model
# --dataset=csv:dataset.csv gives the path and format of the dataset
# --output=csv:predictions.csv gives the path and format of the predictions to write
!./predict --model=my_model/assets --dataset=csv:dataset.csv --output=csv:predictions.csv

[INFO abstract_model.cc:1296] Engine GradientBoostedTreesQuickScorerExtended built
[INFO predict.cc:133] Run predictions with semi-fast engine

We can now look at the predictions:

# Read predictions.csv into a DataFrame and display it
data = pd.read_csv("predictions.csv")
data

            1         2
0    0.966779  0.033221
1    0.031773  0.968227
2    0.966779  0.033221
3    0.600073  0.399927
4    0.030885  0.969115
..        ...       ...
495  0.030885  0.969115
496  0.948252  0.051748
497  0.031773  0.968227
498  0.966996  0.033004
499  0.966779  0.033221

500 rows × 2 columns

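The inference speed of the model can be measured with the benchmark_inference tool.
**Note:** Before YDF version 1.1.0, the dataset used by benchmark_inference needed to have a __LABEL column.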

# Create an empty label column
pd_serving_dataset["__LABEL"] = 0
# Save the dataset as a csv file
pd_serving_dataset.to_csv("dataset.csv")

# Run the benchmark_inference tool to measure the inference speed. Arguments:
# --model: path to the model, here my_model/assets
# --dataset: path and format of the dataset, here csv:dataset.csv (a csv file named dataset.csv)
# --batch_size: number of examples per inference batch, here 100
# --warmup_runs: number of warm-up runs, used to remove cold-start effects, here 10
# --num_runs: number of measured runs, used to average the inference speed, here 50
!./benchmark_inference \
  --model=my_model/assets \
  --dataset=csv:dataset.csv \
  --batch_size=100 \
  --warmup_runs=10 \
  --num_runs=50

[INFO benchmark_inference.cc:245] Loading model
[INFO benchmark_inference.cc:248] The model is of type: GRADIENT_BOOSTED_TREES
[INFO benchmark_inference.cc:250] Loading dataset
[INFO benchmark_inference.cc:259] Found 3 compatible fast engines.
[INFO benchmark_inference.cc:262] Running GradientBoostedTreesGeneric
[INFO decision_forest.cc:639] Model loaded with 27 root(s), 1471 node(s), and 2 input feature(s).
[INFO benchmark_inference.cc:262] Running GradientBoostedTreesQuickScorerExtended
[INFO benchmark_inference.cc:262] Running GradientBoostedTreesOptPred
[INFO decision_forest.cc:639] Model loaded with 27 root(s), 1471 node(s), and 2 input feature(s).
[INFO benchmark_inference.cc:268] Running the slow generic engine
batch_size : 100  num_runs : 50
time/example(us)  time/batch(us)  method
----------------------------------------
         0.22425          22.425  GradientBoostedTreesOptPred [virtual interface]
          0.2465           24.65  GradientBoostedTreesQuickScorerExtended [virtual interface]
          0.6875           68.75  GradientBoostedTreesGeneric [virtual interface]
           1.825           182.5  Generic slow engine
----------------------------------------

This benchmark shows the inference speed of the different inference engines. For example, "time/example(us) = 0.6315" (the value can change from one run to the next) means that the inference of one example takes about 0.63 microseconds. In other words, the model can be run about 1.6 million times per second.
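
The conversion from a per-example latency to a throughput is simple arithmetic. The sketch below uses a hypothetical helper name, examples_per_second, and takes its input values from the text and the benchmark table; actual timings vary from run to run.

```python
def examples_per_second(time_per_example_us):
    """Convert a per-example latency in microseconds into examples per second."""
    return 1_000_000 / time_per_example_us


# 0.6315 us per example: roughly 1.58 million examples per second
print(f"{examples_per_second(0.6315):,.0f}")

# 0.22425 us per example (fastest engine in the benchmark table): roughly 4.46 million
print(f"{examples_per_second(0.22425):,.0f}")

# time/batch is time/example multiplied by the batch size (100 in this benchmark)
print(round(0.22425 * 100, 3))
```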
**Note:** TF-DF and the other APIs always select the fastest available inference engine automatically.