**Contents**

1. Logistic regression for binary classification
2. Spam filtering
   - 2.1 Performance metrics
   - 2.2 Accuracy
   - 2.3 Precision and recall
   - 2.4 F1 score
   - 2.5 ROC and AUC
3. Grid search for hyperparameter tuning
4. Multi-class classification
5. Multi-label classification
   - 5.1 Multi-label performance metrics

These are study notes for *scikit-learn Machine Learning (2nd Edition)*. Logistic regression is commonly used for classification tasks.

## 1. Logistic regression for binary classification

From *Statistical Learning Methods* (《统计学习方法》): the logistic regression (LR) model is built on the logistic distribution. Let $X$ be a continuous random variable; $X$ follows a logistic distribution if it has the distribution function and density function

$$F(x) = P(X \leq x) = \frac{1}{1 + e^{-(x-\mu)/\gamma}}$$

$$f(x) = F'(x) = \frac{e^{-(x-\mu)/\gamma}}{\gamma \left(1 + e^{-(x-\mu)/\gamma}\right)^2}$$

The binomial logistic regression model applies this form to a linear function of the features, $P(Y=1 \mid x) = \frac{1}{1 + e^{-(w \cdot x + b)}}$. In logistic regression, an instance is predicted as the positive class when the predicted probability exceeds the threshold, and as the negative class otherwise.

## 2. Spam filtering

Extract TF-IDF features from the messages and classify them with logistic regression.

```python
import pandas as pd

data = pd.read_csv('SMSSpamCollection', delimiter='\t', header=None)
data
data[data[0] == 'ham'][0].count()   # 4825 ham (normal) messages
data[data[0] == 'spam'][0].count()  # 747 spam messages

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X = data[1].values
y = data[0].values

from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
y = lb.fit_transform(y)  # 'ham' -> 0, 'spam' -> 1

X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, random_state=520)

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

pred = classifier.predict(X_test)
for i, pred_i in enumerate(pred[:5]):
    print('预测为%s, 信息为%s,真实为%s' % (pred_i, X_test_raw[i], y_test[i]))
```

```
预测为0, 信息为Aww thats the first time u said u missed me without asking if I missed u first. You DO love me! :),真实为[0]
预测为0, 信息为Poor girl cant go one day lmao,真实为[0]
预测为0, 信息为Also remember the beads dont come off. Ever.,真实为[0]
预测为0, 信息为I see the letter B on my car,真实为[0]
预测为0, 信息为My love ! How come it took you so long to leave for Zahers? I got your words on ym and was happy to see them but was sad you had left. I miss you,真实为[0]
```

### 2.1 Performance metrics

Confusion matrix:

```python
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

cm = confusion_matrix(y_test, pred)  # renamed to avoid shadowing the imported function
plt.matshow(cm)
plt.rcParams['font.sans-serif'] = 'SimHei'  # avoid garbled Chinese characters in the plot
plt.title('混淆矩阵')
plt.ylabel('真实')
plt.xlabel('预测')
plt.colorbar()
```

### 2.2 Accuracy

```python
scores = cross_val_score(classifier, X_train, y_train, cv=5)
print('Accuracies: %s' % scores)
print('Mean accuracy: %s' % np.mean(scores))
```

```
Accuracies: [0.94976077 0.95933014 0.96650718 0.95215311 0.95688623]
Mean accuracy: 0.9569274847434318
```

Accuracy is not a very suitable metric here: it cannot distinguish between the two kinds of error, a positive instance predicted as negative and a negative instance predicted as positive.
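Accuracy is also easy to inflate on imbalanced data like this. As a quick check (not from the book; a minimal sketch that assumes the `X_train` and `y_train` built above are still in scope), compare against a majority-class baseline that always predicts "ham":

```python
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

# Majority-class baseline: always predict the most frequent class ('ham').
# Assumes X_train / y_train from the spam-filtering code above are in scope;
# .ravel() flattens the column vector produced by LabelBinarizer.
baseline = DummyClassifier(strategy='most_frequent')

acc = cross_val_score(baseline, X_train, y_train.ravel(), cv=5, scoring='accuracy')
rec = cross_val_score(baseline, X_train, y_train.ravel(), cv=5, scoring='recall')

print('Baseline accuracy: %s' % np.mean(acc))  # high, roughly the share of ham messages
print('Baseline recall:   %s' % np.mean(rec))  # 0.0, the baseline never flags any spam
```

A model that never catches a single spam message still scores roughly 87% accuracy (the share of ham messages), which is why the metrics below look at each kind of error separately.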
### 2.3 Precision and recall

See also [Hands On ML] 3. Classification: MNIST handwritten digit prediction. Looking at precision or recall in isolation is not meaningful.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

precisions = precision_score(y_test, pred)
print('Precision: %s' % precisions)
recalls = recall_score(y_test, pred)
print('Recall: %s' % recalls)
```

```
Precision: 0.9852941176470589
Recall: 0.6979166666666666
```

A precision of 0.985 means that messages flagged as spam are almost always genuinely spam; a recall of 0.698 means that about 30% of the spam messages were predicted as non-spam.

### 2.4 F1 score

The F1 score balances precision and recall.

```python
f1s = f1_score(y_test, pred)
print('F1 score: %s' % f1s)  # F1 score: 0.8170731707317074
```

### 2.5 ROC and AUC

The closer a classifier's AUC is to 1, the better; a random classifier has an AUC of 0.5.

```python
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

false_positive_rate, recall, thresholds = roc_curve(y_test, pred)
auc = roc_auc_score(y_test, pred)  # renamed to avoid shadowing the imported function

plt.title('受试者工作特性')
plt.plot(false_positive_rate, recall, 'b', label='AUC = %0.2f' % auc)
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.ylabel('Recall')
plt.xlabel('Fall-out')
plt.show()
```

## 3. Grid search for hyperparameter tuning

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])
parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),  # <step name>__<parameter name>
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10),
}

if __name__ == '__main__':
    df = pd.read_csv('./SMSSpamCollection', delimiter='\t', header=None)
    X = df[1].values
    y = df[0].values
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(y)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1,
                               scoring='accuracy', cv=3)
    grid_search.fit(X_train, y_train)

    print('Best score: %0.3f' % grid_search.best_score_)
    print('Best parameters set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print('\t%s: %r' % (param_name, best_parameters[param_name]))

    predictions = grid_search.predict(X_test)
    print('Accuracy: %s' % accuracy_score(y_test, predictions))
    print('Precision: %s' % precision_score(y_test, predictions))
    print('Recall: %s' % recall_score(y_test, predictions))
```

```
Best score: 0.985
Best parameters set:
	clf__C: 10
	clf__penalty: 'l2'
	vect__max_df: 0.5
	vect__max_features: 5000
	vect__ngram_range: (1, 2)
	vect__stop_words: None
	vect__use_idf: True
Accuracy: 0.9791816223977028
Precision: 1.0
Recall: 0.8605769230769231
```

Compared with the default hyperparameters, tuning improved recall.
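One caveat about the grid above (my note, not from the book): on scikit-learn 0.22 and later, `LogisticRegression` defaults to the `'lbfgs'` solver, which does not support the `'l1'` penalty, so those parameter combinations will fail to fit. A minimal adjustment is to pin a solver that handles both penalties:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# 'liblinear' supports both the 'l1' and 'l2' penalties used in the parameter grid
pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression(solver='liblinear'))
])
```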
## 4. Multi-class classification

Predicting the sentiment of movie reviews.

```python
data = pd.read_csv('./chapter5_movie_train.csv', header=0, delimiter='\t')
data
data['Sentiment'].describe()
```

```
count    156060.000000
mean          2.063578
std           0.893832
min           0.000000
25%           2.000000
50%           2.000000
75%           3.000000
max           4.000000
Name: Sentiment, dtype: float64
```

On average the sentiment is fairly neutral.

```python
data['Sentiment'].value_counts() / data['Sentiment'].count()
```

```
2    0.509945
3    0.210989
1    0.174760
4    0.058990
0    0.045316
Name: Sentiment, dtype: float64
```

About 50% of the examples carry a neutral sentiment.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('./chapter5_movie_train.csv', header=0, delimiter='\t')
X, y = df['Phrase'], df['Sentiment'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5)

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])
parameters = {
    'vect__max_df': (0.25, 0.5),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'clf__C': (0.1, 1, 10),
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy')
grid_search.fit(X_train, y_train)

print('Best score: %0.3f' % grid_search.best_score_)
print('Best parameters set:')
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print('\t%s: %r' % (param_name, best_parameters[param_name]))
```

```
Best score: 0.619
Best parameters set:
	clf__C: 10
	vect__max_df: 0.25
	vect__ngram_range: (1, 2)
	vect__use_idf: False
```

Performance metrics:

```python
predictions = grid_search.predict(X_test)

print('Accuracy: %s' % accuracy_score(y_test, predictions))
print('Confusion Matrix:')
print(confusion_matrix(y_test, predictions))
print('Classification Report:')
print(classification_report(y_test, predictions))
```

```
Accuracy: 0.6292323465333846
Confusion Matrix:
[[ 1013  1742   682   106    11]
 [  794  5914  6275   637    49]
 [  196  3207 32397  3686   222]
 [   28   488  6513  8131  1299]
 [    1    59   548  2388  1644]]
Classification Report:
              precision    recall  f1-score   support

           0       0.50      0.29      0.36      3554
           1       0.52      0.43      0.47     13669
           2       0.70      0.82      0.75     39708
           3       0.54      0.49      0.52     16459
           4       0.51      0.35      0.42      4640

    accuracy                           0.63     78030
   macro avg       0.55      0.48      0.50     78030
weighted avg       0.61      0.63      0.62     78030
```

## 5. Multi-label classification

In multi-label classification, a single instance can be assigned several labels. Two common problem-transformation strategies:

- Convert each observed label set into one combined label (an instance tagged L1 and L2 gets the new label "L1 and L2", and so on). Drawback: this creates a large number of label combinations, and the model can only predict combinations that appear in the training data, so many possible label sets are never covered.
- Train one binary classifier per label ("does this instance have L1?", "does it have L2?", and so on). Drawback: the relationships between labels are ignored. (A minimal sketch of this approach follows the metrics below.)

### 5.1 Multi-label performance metrics

- Hamming loss: the average fraction of incorrectly predicted labels; 0 is best.
- Jaccard similarity: the size of the intersection of the predicted and true label sets divided by the size of their union; 1 is best.

```python
from sklearn.metrics import hamming_loss, jaccard_score
# help(jaccard_score)

print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [1.0, 1.0]])))
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]])))
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [0.0, 1.0]])))

print(jaccard_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [1.0, 1.0]]), average=None))
print(jaccard_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]]), average=None))
print(jaccard_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [0.0, 1.0]]), average=None))
```

```
0.0
0.25
0.5
[1. 1.]
[0.5 1. ]
[0. 1.]
```
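To tie §5 and §5.1 together, here is a minimal sketch (my addition, with hypothetical toy data) of the one-binary-classifier-per-label strategy, using `OneVsRestClassifier` to produce the label-indicator matrices these metrics expect:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import hamming_loss, jaccard_score

# Hypothetical toy data: 6 instances, 2 features, 3 possible labels each
X = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0],
              [0.5, 0.5], [0.2, 0.9], [0.9, 0.1]])
Y = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1],
              [1, 1, 0], [1, 0, 0], [0, 1, 1]])  # label-indicator matrix

# Binary relevance: one logistic regression fitted per label column
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)
pred = clf.predict(X)  # also a label-indicator matrix

print(hamming_loss(Y, pred))                      # fraction of wrongly predicted labels
print(jaccard_score(Y, pred, average='samples'))  # per-instance Jaccard, averaged
```

The key point is that both the predictions and the ground truth are indicator matrices, which is exactly the format `hamming_loss` and `jaccard_score` consume.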