

news 2025/11/14 22:17:59

So here we are, diving into the world of data mining this time. Let's begin with a small but informative definition.

What is data mining?

Technically, it is a deep dive into datasets in search of correlations, rules, anomalies and more. It is a way to do simple but effective machine learning without going the hard way, such as using regular neural networks or their more complex variants, convolutional and recurrent neural networks (we will go through those thoroughly in future articles).

Data mining algorithms differ from one another, and each has its own advantages and disadvantages. I will not go through them in this article, but the first one you should focus on is the classic Apriori algorithm, as it is the gateway to the data mining world.

But before going any further, there is some special data mining vocabulary that we need to get familiar with:

k-itemset: an itemset is just a set of items; k refers to its order (length), that is, the number of items it contains.
Transaction: a captured record, for example the items purchased together in a store. Note that the Apriori algorithm operates on datasets containing thousands or even millions of transactions.
Association rule: an antecedent → consequent relationship between two itemsets, written X → Y. It implies the presence of the itemset Y (consequent) in the considered transaction given the itemset X (antecedent).
Support: represents the popularity (frequency) of an itemset: support(X) = (number of transactions containing X) / (total number of transactions).
Confidence (X → Y): shows how reliable a rule is, in other words the likelihood of finding the consequent in a transaction that contains the antecedent: confidence(X → Y) = support(X ∪ Y) / support(X). A rule whose confidence equals 1 always holds in the data; more commonly, a rule is called strong when it meets both the MinSup and MinConf thresholds.
Lift (X → Y): a measure of the quality of an association rule: lift(X → Y) = support(X ∪ Y) / (support(X) × support(Y)). A lift above 1 means that X and Y occur together more often than they would if they were independent.
MinSup: a user-specified variable which stands for the minimum support threshold for itemsets.
MinConf: a user-specified variable which stands for the minimum confidence threshold for rules.
Frequent itemset: an itemset whose support is equal to or higher than the chosen MinSup.
Infrequent itemset: an itemset whose support is lower than the chosen MinSup.

So... how does Apriori work?

Starting with a historical glimpse: the algorithm was first proposed by the computer scientists Agrawal and Srikant in 1994. It proceeds this way:

Generate the candidate k-itemsets (starting with k = 1).
Calculate the support of each itemset.
Eliminate the infrequent itemsets.
Increment k and repeat the process.

Now, how are those itemsets generated?

For itemsets of length k = 2, every possible combination of two items has to be considered (combinations, not permutations, since order does not matter).
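As a rough sketch of the first two passes just described (not the author's implementation, which the article does not show), the snippet below counts support on a tiny made-up transaction list, drops the infrequent items, and builds the 2-itemset candidates with `itertools.combinations`. The transactions, the `min_sup` value and the helper name `support` are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"Almonds", "Milk", "Sugar"},
    {"Almonds", "Milk"},
    {"Milk", "Sugar"},
    {"Almonds", "Sugar", "Eggs"},
]
min_sup = 0.5  # assumed minimum support threshold


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# k = 1: every distinct item is a candidate; keep only the frequent ones.
items = sorted(set().union(*transactions))
frequent_1 = [i for i in items if support({i}, transactions) >= min_sup]

# k = 2: every unordered pair of frequent items is a candidate.
candidates_2 = list(combinations(frequent_1, 2))
frequent_2 = [c for c in candidates_2 if support(c, transactions) >= min_sup]

print(frequent_1)
print(frequent_2)
```

Using combinations rather than permutations reflects the point above: {Almonds, Milk} and {Milk, Almonds} are the same itemset, so each pair is counted once.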
For k > 2, two conditions must be satisfied first:

The combined itemset must be formed from two frequent itemsets of length k-1; let's call them subsets.
Both subsets must share the same prefix of length k-2.

If you think about it, these steps just extend the previously found frequent itemsets; this is called the "bottom-up" approach. It relies on the fact that support is anti-monotone, which can be stated in two equivalent ways:

All subsets of a frequent itemset must also be frequent.
All supersets of an infrequent itemset must also be infrequent.

Okay, but wait a minute, this seems infinite!

No, luckily it is not. The algorithm stops at a certain order k if:

All the generated itemsets of length k are infrequent.
No two frequent itemsets share a prefix of length k-2, which makes it impossible to generate new itemsets of length k.

Sure... it's not rocket science! But how about an example to make this clearer?

Here is a small transaction table in binary format: the value of an item is 1 if it is present in the considered transaction, otherwise 0.

[Table: example transactions in binary format]

Great... it's time for some association rule mining!

Once you reach this part, all that is left to do is to take one frequent k-itemset at a time and generate all of its possible rules using binary partitioning, that is, by splitting the itemset into a non-empty antecedent and a non-empty consequent.

If the 3-itemset {Almonds, Sugar, Milk} from the previous example were frequent, it would yield six candidate rules: {Almonds} → {Sugar, Milk}, {Sugar} → {Almonds, Milk}, {Milk} → {Almonds, Sugar}, {Almonds, Sugar} → {Milk}, {Almonds, Milk} → {Sugar} and {Sugar, Milk} → {Almonds}.

An overview of my Apriori simulation, using Python

Dataset

The dataset is a CSV (comma-separated values) file containing 7501 transactions of items purchased in a supermarket. Restructuring it with the TransactionEncoder class from the mlxtend library made it much easier to use and manipulate. The resulting structure occupies 871.8 KB, with 119 columns indexed by food name, from "Almonds" to "Zucchini".

Here is an overview of the transaction table before and after:

[Figure: transaction table before and after encoding]

Implementing the algorithm

I will not be posting any code fragments, as the approach was straightforward: the procedure is recursive and calls the functions responsible for itemset generation, support calculation, elimination and association rule mining, in that order.

The execution took 177 seconds, which seemed efficient thanks to the ability of Pandas and NumPy to perform quick element-wise operations. All the association rules found were saved in an HTML file for later use.

Now, how about a tour of the supermarket? Using Dash by Plotly

Finally, I got to use the previously saved rules to suggest food items based on what my basket contains. Here is a quick preview:

[Figure: preview of the Dash application]

Feel free to check my source code here.

Translated from: https://medium.com/the-coded-theory/data-mining-a-focus-on-apriori-algorithm-b201d756c7ff
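Since the article itself shows no code, here is a minimal sketch of the whole level-wise procedure in plain Python, under stated assumptions. It is iterative rather than recursive and uses built-in sets instead of Pandas and NumPy, so it should not be read as the author's implementation. The function names `apriori` and `generate_rules`, the toy transactions and the thresholds are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_sup):
    """Return {itemset as sorted tuple: support} for all frequent itemsets."""
    items = sorted(set().union(*transactions))
    level = [(i,) for i in items]  # candidate 1-itemsets
    frequent = {}
    while level:  # stops when no candidate of the next length can be built
        # Support calculation, then elimination of infrequent candidates.
        supports = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in supports.items() if s >= min_sup}
        frequent.update(kept)
        # Join step: two frequent itemsets of the current length that share
        # all items but the last one are merged into one longer candidate.
        prev = sorted(kept)
        level = [a + (b[-1],) for a, b in combinations(prev, 2)
                 if a[:-1] == b[:-1]]
    return frequent


def generate_rules(frequent, min_conf):
    """Binary-partition each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Subsets of a frequent itemset are frequent, so both
                # lookups below are guaranteed to succeed.
                conf = sup / frequent[antecedent]
                lift = conf / frequent[consequent]
                if conf >= min_conf:
                    rules.append((antecedent, consequent, conf, lift))
    return rules


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"Almonds", "Milk", "Sugar"},
    {"Almonds", "Milk"},
    {"Milk", "Sugar"},
    {"Almonds", "Sugar", "Eggs"},
]

frequent = apriori(transactions, min_sup=0.5)
for antecedent, consequent, conf, lift in generate_rules(frequent, min_conf=0.6):
    print(antecedent, "->", consequent, f"conf={conf:.2f}", f"lift={lift:.2f}")
```

The loop ends in exactly the two situations listed earlier: either every candidate of the current length is infrequent, or no two frequent itemsets share a prefix, so the next candidate list comes out empty.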


http://www.zqtcl.cn/news/809951/