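I have a pandas DataFrame with several speed values. The values change continuously, but because this is sensor data, erroneous readings often show up in the middle of the series. A moving average does not seem to help either, so what methods can I use to remove these outliers or spikes from the data?
Example:
data points = {0.5,0.5,0.7,0.6,0.5,0.7,0.5,0.4,0.6,4,0.5,0.5,4,5,6,0.4,0.7,0.8,0.9}
In this data, the points 4, 4, 5, 6 are clearly outliers. I previously smoothed the values with a rolling mean over a 5-minute window, but I still get these kinds of spikes and I want to remove them. Can anyone suggest a technique for getting rid of them?
I have an image that shows the data more clearly.
As you can see here, the data shows some outlier points that have to be removed. Any ideas on possible ways to get rid of them?
Solution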
I really think a z-score using scipy.stats.zscore() is the way to go here. Have a look at the related issue in this post. There they are focusing on which method to use before removing potential outliers. As I see it, your challenge is a bit simpler, since judging by the data provided, it would be pretty straightforward to identify potential outliers without having to transform the data. Below is a code snippet that does just that. Just remember, though, that what does and does not look like an outlier will depend entirely on your dataset. And after removing some outliers, a point that did not look like an outlier before may suddenly look like one. Have a look:
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    from scipy import stats

    # your data (as a list)
    data = [0.5,0.5,0.7,0.6,0.5,0.7,0.5,0.4,0.6,4,0.5,0.5,4,5,6,0.4,0.7,0.8,0.9]

    # initial plot
    df1 = pd.DataFrame(data=data)
    df1.columns = ['data']
    df1.plot(style='o')

    # Function to identify and remove outliers
    def outliers(df, level):

        # 1. temporary dataframe (copy the argument, not the global df1)
        df = df.copy(deep=True)

        # 2. Select a level for a Z-score to identify and remove outliers
        df_Z = df[(np.abs(stats.zscore(df)) …
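The snippet above breaks off partway through step 2. Here is a minimal sketch of how such a filter could be completed, assuming the intent is to keep only the rows whose absolute z-score is below `level`. The `< level` comparison, the return value, and the function name `remove_outliers` are assumptions for illustration, not the original author's code. Plotting is left out so the sketch runs anywhere.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample data from the question
data = [0.5, 0.5, 0.7, 0.6, 0.5, 0.7, 0.5, 0.4, 0.6, 4,
        0.5, 0.5, 4, 5, 6, 0.4, 0.7, 0.8, 0.9]
df1 = pd.DataFrame({"data": data})


def remove_outliers(df, level):
    """Return the rows of df whose |z-score| is below `level` in every column.

    Assumption: the truncated comparison in the source was "< level".
    """
    # z-scores are computed per column (axis=0) on the raw values
    z = np.abs(stats.zscore(df.to_numpy(), axis=0))
    keep = (z < level).all(axis=1)
    return df[keep]


# Try the three thresholds discussed in the answer
for level in (4, 2, 1.2):
    kept = remove_outliers(df1, level)
    print(f"level={level}: kept {len(kept)} of {len(df1)} points")
```

On the sample data this keeps 19, 17, and 15 points for levels 4, 2, and 1.2 respectively. Calling `kept.plot(style='o')` afterwards would reproduce the kind of scatter plot the answer describes.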
Original data:
Test run 1: Z-score 4:
As you can see, no data has been removed because the level was set too high.
Test run 2: Z-score 2:
Now we're getting somewhere. Two outliers have been removed, but there is still some dubious data left.
Test run 3: Z-score 1.2:
This is looking really good. The remaining data now seems to be a bit more evenly distributed than before. But now the data point highlighted by the original datapoint is starting to look a bit like a potential outlier. So where to stop? That's going to be entirely up to you!
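The "where to stop" question comes from the fact that z-scores are relative to whatever data is left. A small sketch of that effect on the sample data, assuming the default population standard deviation used by scipy.stats.zscore (the variable names are illustrative only):

```python
import numpy as np
from scipy import stats

data = np.array([0.5, 0.5, 0.7, 0.6, 0.5, 0.7, 0.5, 0.4, 0.6, 4,
                 0.5, 0.5, 4, 5, 6, 0.4, 0.7, 0.8, 0.9])

# z-scores on the full data: the first 4.0 reading sits at index 9
z_before = stats.zscore(data)
print("z of first 4.0 before filtering:", round(float(z_before[9]), 2))

# One pass at level 2 drops only the 5 and the 6 (indices 13 and 14)
kept = data[np.abs(z_before) < 2]

# Recompute on what is left: mean and std both shrink,
# so the same 4.0 reading now scores much higher
z_after = stats.zscore(kept)
print("z of first 4.0 after one pass: ", round(float(z_after[9]), 2))
```

The 4.0 reading goes from a z-score of about 1.45 to about 2.72, so a second pass at the same level 2 would now remove it. Repeating the filter until nothing changes is one option, but it can also start eating into valid data, which is why the threshold remains a judgment call.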