

news 2025/11/15 10:36:46

We have already covered quite a few supervised learning algorithms. Today's algorithm is the support vector machine (SVM). Compared with logistic regression and neural networks, it offers a cleaner and more powerful way to learn complex nonlinear functions.

Support Vector Machines

SVM hypothesis

Example Dataset 1

```python
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from nltk.stem.porter import PorterStemmer
from scipy.io import loadmat
from sklearn import svm

mat = loadmat('ex6data1.mat')
print(mat.keys())
X = mat['X']
y = mat['y']


def plot_data(X, y):
    # Scatter plot of the two features, colored by class label
    plt.figure(figsize=(6, 4))
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
    plt.xlabel('X1')
    plt.ylabel('X2')


plot_data(X, y)
plt.show()


def plot_boundary(clf, X):
    # Predict on a dense grid and draw the contour between the two classes
    x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
    y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500),
                         np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)


models = [svm.SVC(C=C, kernel='linear') for C in [1, 100]]
clfs = [model.fit(X, y.ravel()) for model in models]
titles = ['SVM Decision Boundary with C = {} (Example Dataset 1)'.format(C)
          for C in [1, 100]]
for model, title in zip(clfs, titles):
    plot_data(X, y)  # plot_data opens its own figure
    plot_boundary(model, X)
    plt.title(title)
    plt.show()
```

SVM with Gaussian Kernels

Gaussian Kernel

```python
def gauss_kernel(x1, x2, sigma):
    return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))
```

Example Dataset 2

```python
mat = loadmat('ex6data2.mat')
X2 = mat['X']
y2 = mat['y']
plot_data(X2, y2)
plt.show()

sigma = 0.1
gamma = np.power(sigma, -2.) / 2  # sklearn's gamma equals 1 / (2 * sigma^2)
clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
model = clf.fit(X2, y2.flatten())
plot_data(X2, y2)
plot_boundary(model, X2)
plt.show()
```

Example Dataset 3

```python
mat3 = loadmat('ex6data3.mat')
X3, y3 = mat3['X'], mat3['y']
Xval, yval = mat3['Xval'], mat3['yval']
plot_data(X3, y3)
plt.show()
```

Spam Classification

Preprocessing Emails

```python
with open('emailSample1.txt', 'r') as f:
    email = f.read()
print(email)


# Apply every preprocessing step except word stemming and removal of non-words
def process_email(email):
    email = email.lower()
    # Strip HTML tags: "<", then anything that is not "<" or ">", then ">"
    email = re.sub(r'<[^<>]+>', ' ', email)
    # Replace URLs: everything after "//" up to the next whitespace character
    email = re.sub(r'(http|https)://[^\s]*', 'httpaddr', email)
    # Replace email addresses
    email = re.sub(r'[^\s]+@[^\s]+', 'emailaddr', email)
    # Replace dollar signs
    email = re.sub(r'[\$]+', 'dollar', email)
    # Replace numbers
    email = re.sub(r'[\d]+', 'number', email)
    return email


# Preprocess the email and return a clean list of words
def email2TokenList(email):
    # I'll use the NLTK stemmer because it more accurately duplicates the
    # performance of the OCTAVE implementation in the assignment
    stemmer = PorterStemmer()
    email = process_email(email)
    # Split the email into single words on spaces, line breaks, and punctuation;
    # re.split() accepts several delimiters at once
    tokens = re.split(r'''[ @$/#.\-:&*+=\[\]?!(){},'">_<;%\n\r]''', email)
    tokenlist = []
    for token in tokens:
        # Remove any non-alphanumeric characters
        token = re.sub(r'[^a-zA-Z0-9]', '', token)
        # Skip empty strings
        if not token:
            continue
        # Use the Porter stemmer to extract the word stem
        tokenlist.append(stemmer.stem(token))
    return tokenlist
```

Vocabulary List

```python
# Return the vocabulary indices of the words that occur in the email
def email2VocabIndices(email, vocab):
    tokens = set(email2TokenList(email))
    index = [i for i in range(len(vocab)) if vocab[i] in tokens]
    return index
```

Extracting Features from Emails

```python
# Convert the email into a feature vector of length n, where n is the size of
# the vocabulary: 1 where the vocabulary word occurs in the email, 0 elsewhere
def email2FeatureVector(email):
    df = pd.read_table('data/vocab.txt', names=['words'])
    vocab = df['words'].to_numpy()  # array of vocabulary words
    vector = np.zeros(len(vocab))  # init vector
    vocab_indices = email2VocabIndices(email, vocab)  # indices of words present
    # Set the entries of the words that are present to 1
    for i in vocab_indices:
        vector[i] = 1
    return vector
```

Training SVM for Spam Classification

```python
vector = email2FeatureVector(email)
print('length of vector = {}\nnum of non-zero = {}'.format(len(vector), int(vector.sum())))

# 2.3 Training SVM for Spam Classification
# Training set
mat1 = loadmat('spamTrain.mat')
X, y = mat1['X'], mat1['y']

# Test set
mat2 = loadmat('spamTest.mat')
Xtest, ytest = mat2['Xtest'], mat2['ytest']

clf = svm.SVC(C=0.1, kernel='linear')
clf.fit(X, y.ravel())
```

Top Predictors for Spam

```python
predTrain = clf.score(X, y)
predTest = clf.score(Xtest, ytest)
print(predTrain, predTest)
```

Effect of the parameters

C = 1/λ
Large C: low bias, high variance (corresponds to a small λ).
Small C: high bias, low variance (corresponds to a large λ).
Large σ^2: the kernel falls off more smoothly; high bias, low variance.
Small σ^2: the kernel is more sharply peaked; low bias, high variance.

Steps for using an SVM

Use an SVM software library to solve for the parameters θ. You need to specify:
the choice of the parameter C
the choice of kernel (similarity function), e.g. no kernel ("linear kernel"), or a Gaussian kernel, for which you also need to choose σ^2

Logistic regression vs SVM

Here n is the number of features and m is the number of training examples.

(1) If n is much larger than m, i.e. the training set is too small to support a complex nonlinear model, use logistic regression or an SVM without a kernel.

(2) If n is small and m is of medium size, for example n between 1 and 1000 and m between 10 and 10000, use an SVM with a Gaussian kernel.

(3) If n is small and m is large, for example n between 1 and 1000 and m greater than 50000, an SVM with a Gaussian kernel will be very slow. The fix is to create or add more features, then use logistic regression or an SVM without a kernel.

It is worth mentioning that a neural network is likely to perform well in all three cases, but it may be very slow to train. The main reason to choose an SVM is that its cost function is convex, so there are no local minima to get stuck in.
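The three cases of the logistic regression vs SVM rule of thumb can be written down as a small lookup. This is only a sketch: `suggest_model` is a hypothetical helper, not part of any library, and the `ratio` threshold for "n much larger than m" is an illustrative assumption, since the text does not quantify it. Sizes that fall between the stated ranges are reported as not covered rather than guessed.

```python
def suggest_model(n, m, ratio=10):
    """Hypothetical helper mapping (n features, m examples) to the rule of thumb."""
    # Case 1: n much larger than m. "Much larger" is not quantified in the
    # text; ratio=10 is an assumption made for illustration only.
    if n >= ratio * m:
        return "logistic regression or SVM without a kernel"
    # Case 2: small n (1-1000) and medium m (10-10000).
    if 1 <= n <= 1000 and 10 <= m <= 10000:
        return "SVM with Gaussian kernel"
    # Case 3: small n (1-1000) and large m (more than 50000).
    if 1 <= n <= 1000 and m > 50000:
        return "add features, then logistic regression or SVM without a kernel"
    # Anything else is outside the ranges given in the rule of thumb.
    return "not covered by the rule of thumb"


for n, m in [(10000, 100), (500, 5000), (500, 100000), (500, 30000)]:
    print(n, m, "->", suggest_model(n, m))
```

The last pair shows the gap between the second and third case: the rule says nothing about m between 10000 and 50000, so the helper does not pick a side.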
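Going back to the Gaussian kernel section: the post defines the kernel in terms of sigma, while scikit-learn's RBF kernel is parameterized as exp(-gamma * ||x1 - x2||^2), which is why Example Dataset 2 converts sigma into gamma before building the classifier. The sketch below checks that conversion numerically using only the standard library. The names `rbf_kernel` and `sigma_to_gamma` and the sample vectors are chosen here for illustration.

```python
import math


def gauss_kernel(x1, x2, sigma):
    # Gaussian kernel as defined in the post: exp(-||x1 - x2||^2 / (2 * sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def rbf_kernel(x1, x2, gamma):
    # Same kernel in the gamma parameterization: exp(-gamma * ||x1 - x2||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)


def sigma_to_gamma(sigma):
    # Matching the two exponents gives gamma = 1 / (2 * sigma^2)
    return 1.0 / (2 * sigma ** 2)


x1 = [1.0, 2.0, 1.0]
x2 = [0.0, 4.0, -1.0]
sigma = 2.0

k_sigma = gauss_kernel(x1, x2, sigma)
k_gamma = rbf_kernel(x1, x2, sigma_to_gamma(sigma))
print(round(k_sigma, 6), round(k_gamma, 6))  # 0.324652 0.324652
```

With sigma = 0.1, as used for Example Dataset 2, the same conversion gives a gamma of about 50.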
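For the feature-extraction step of the spam classifier, a toy example may make the idea of the 0/1 vector more concrete. It assumes the email has already been tokenized and stemmed. The six-word vocabulary and the token list are made up for illustration, and `tokens_to_feature_vector` is a hypothetical stand-in that mirrors what email2FeatureVector does once the tokens are available.

```python
def tokens_to_feature_vector(tokens, vocab):
    # One entry per vocabulary word: 1 if the word occurs in the email, else 0
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


# Toy vocabulary and an already stemmed token list (both invented for this sketch)
vocab = ["anyon", "dollar", "httpaddr", "know", "number", "emailaddr"]
tokens = ["anyon", "know", "how", "much", "it", "cost", "dollar", "number"]

vector = tokens_to_feature_vector(tokens, vocab)
print(vector)       # [1, 1, 0, 1, 1, 0]
print(sum(vector))  # 4 vocabulary words are present
```

Words that are not in the vocabulary, such as "how" or "cost" here, simply do not contribute to the vector, and repeated words still yield a single 1.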


http://www.zqtcl.cn/news/977701/