

news 2025/11/14 22:27:09

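Contents: Preface, Training the old model, Training the new model (Inspecting the parameters, Building the parameter ranges, Running the search), Model evaluation

Preface

In the earlier article "机器学习-随机森林算法预测温度" (Machine Learning: Predicting Temperature with the Random Forest Algorithm), the model was improved by enlarging the training data set and by adding training features. This article records a third optimization approach: tuning the parameters used to construct the random forest model. Tuning here differs somewhat from adjusting the hyperparameters of a neural network against a validation set, so no separate validation set is held out. Instead, RandomizedSearchCV cross-validates models built from different parameter combinations on the training data and selects the best-performing combination.

Training the old model

To shorten the search time, the old model used as the baseline is trained on a subset of the data: the 2016 data without the three features ws_1, prcp_1 and snwd_1 (see the article 机器学习-随机森林算法预测温度). Its evaluation results are:

    Mean error: 4.16
    score 0.843355562598595
    MAE: 4.16409589041096
    MSE: 26.98129152054795
    RMSE: 5.194351886477075

Training the new model

The data set and the selected features are the same as for the old model; only the model construction parameters are tuned. The exploratory data analysis and the preprocessing are also unchanged. The differences start at the point where the model is built.

Inspecting the parameters

    # Build the random forest model
    from pprint import pprint  # pretty-prints the parameter dict

    from sklearn.ensemble import RandomForestRegressor

    # Create the prediction model with default parameters
    rf = RandomForestRegressor(random_state=42)
    pprint(rf.get_params())

The output is:

    {'bootstrap': True,
     'ccp_alpha': 0.0,
     'criterion': 'squared_error',
     'max_depth': None,
     'max_features': 1.0,
     'max_leaf_nodes': None,
     'max_samples': None,
     'min_impurity_decrease': 0.0,
     'min_samples_leaf': 1,
     'min_samples_split': 2,
     'min_weight_fraction_leaf': 0.0,
     'monotonic_cst': None,
     'n_estimators': 100,
     'n_jobs': None,
     'oob_score': False,
     'random_state': 42,
     'verbose': 0,
     'warm_start': False}

According to the official API documentation, every one of these parameters can be set explicitly, and different values naturally give different results. The next step defines a range of values for several parameters and lets the search train and evaluate random forests to find the best combination.

Building the parameter ranges

    import numpy as np
    from sklearn.model_selection import RandomizedSearchCV

    # Number of trees: 100, 200, ..., 1000
    n_estimators = [int(x) for x in np.linspace(start=100, stop=1000, num=10)]
    # Number of features considered at each split
    max_features = [1.0, 'sqrt', 'log2']
    # Maximum tree depth: 10 evenly spaced values from 10 to 200, plus no limit
    max_depth = [int(x) for x in np.linspace(10, 200, 10)]
    max_depth.append(None)
    min_samples_split = [2, 5, 10]
    min_samples_leaf = [1, 2, 4]
    bootstrap = [True, False]

    random_param = {'bootstrap': bootstrap,
                    'max_depth': max_depth,
                    'max_features': max_features,
                    'min_samples_leaf': min_samples_leaf,
                    'min_samples_split': min_samples_split,
                    'n_estimators': n_estimators}

This is only one possible set of ranges, enumerated in a simple way with the API documentation as a guide.

Running the search

    # train_features and train_labels come from the preprocessing in the earlier article
    rf_random = RandomizedSearchCV(estimator=rf,
                                   param_distributions=random_param,
                                   n_iter=100,
                                   scoring='neg_mean_absolute_error',
                                   cv=3,
                                   random_state=42)
    rf_random.fit(train_features, train_labels)

The search now starts training. With n_iter=100 and cv=3 it fits 100 sampled parameter combinations on 3 folds each, so it takes a while. Once it has finished, print the best parameters it found:

    pprint(rf_random.best_params_)

The result is:

    {'bootstrap': True,
     'max_depth': 73,
     'max_features': 1.0,
     'min_samples_leaf': 2,
     'min_samples_split': 10,
     'n_estimators': 600}

Model evaluation

Because the evaluation code is used repeatedly, it is wrapped in a function:

    import numpy as np
    import sklearn.metrics as sm

    def evaluate(model, test_features, test_labels):
        pre = model.predict(test_features)
        # Mean absolute difference between predictions and true values
        errors = abs(pre - test_labels)
        print('Mean error:', round(np.mean(errors), 2))
        # For a regressor, score() returns the coefficient of determination (R^2)
        score = model.score(test_features, test_labels)
        print('score', score)
        # sklearn metrics take the true values first, then the predictions
        print('MAE:', sm.mean_absolute_error(test_labels, pre))
        print('MSE:', sm.mean_squared_error(test_labels, pre))
        print('RMSE:', np.sqrt(sm.mean_squared_error(test_labels, pre)))

Run the evaluation:

    best_model = rf_random.best_estimator_
    evaluate(best_model, test_features, test_labels)

The results are:

    Mean error: 4.06
    score 0.852906033295568
    MAE: 4.061986168567313
    MSE: 25.336266403102137
    RMSE: 5.033514319350064

Compared with the old model evaluated at the start, performance has improved modestly: the MAE drops from 4.16 to 4.06 (about 2.5%), the RMSE from 5.19 to 5.03, and the score rises from 0.843 to 0.853.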
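To give a feel for how much of the search space n_iter=100 actually covers, here is a minimal sketch that rebuilds the parameter ranges above using only the Python standard library, counts the possible combinations, and draws 100 of them at random without repetition. The helpers `linspace_int` and `sample_candidates` are hypothetical names introduced for illustration and are not part of scikit-learn. The standard-library random generator is only a stand-in, so the candidates drawn here will not be the same ones RandomizedSearchCV picks with random_state=42.

```python
import itertools
import math
import random


def linspace_int(start, stop, num):
    """Hypothetical helper: num evenly spaced integers from start to stop
    (inclusive), truncated like int(x) for x in np.linspace(start, stop, num).
    Integer arithmetic avoids floating-point rounding at the endpoints."""
    return [(start * (num - 1) + i * (stop - start)) // (num - 1)
            for i in range(num)]


# The same ranges as in the article, rebuilt without NumPy
param_grid = {
    "n_estimators": linspace_int(100, 1000, 10),
    "max_features": [1.0, "sqrt", "log2"],
    "max_depth": linspace_int(10, 200, 10) + [None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

# Size of the full grid: 10 * 3 * 11 * 3 * 3 * 2 = 5940 combinations
total_combinations = math.prod(len(values) for values in param_grid.values())


def sample_candidates(grid, n_iter, seed):
    """Hypothetical helper: draw n_iter distinct combinations from the grid."""
    rng = random.Random(seed)
    keys = sorted(grid)
    all_combos = list(itertools.product(*(grid[key] for key in keys)))
    picked = rng.sample(all_combos, n_iter)  # sampling without replacement
    return [dict(zip(keys, combo)) for combo in picked]


n_iter, cv = 100, 3
candidates = sample_candidates(param_grid, n_iter, seed=42)

# Each candidate is fitted once per fold during the search
n_search_fits = n_iter * cv

print("combinations in the full grid:", total_combinations)
print("fraction of the grid tried:", round(n_iter / total_combinations, 4))
print("model fits during the search:", n_search_fits)
print("one sampled candidate:", candidates[0])
```

The search therefore tries well under 2% of the grid, which is why it finishes in reasonable time and also why the reported best parameters are the best among those sampled, not necessarily the best in the whole grid. With its default refit=True, RandomizedSearchCV then fits the winning combination once more on the full training set to produce best_estimator_.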
