成都企业网站营销设计,临沂建站程序,零售网站开发,宁波seo排名优化价格前面的博客讲了具体实现#xff0c;现在深究算法代码实现细节#xff01;#xff01;#xff01;
1.CO
(1)关于train
从以下代码可知#xff0c;CO首先是对各个电器的功率数据做了train#xff0c;为了了解其原生实现对代码进行了深究#xff1a;
classifiers {CO:…前面的博客讲了具体实现现在深究算法代码实现细节
1.CO
(1)关于train
从以下代码可知CO首先是对各个电器的功率数据做了train为了了解其原生实现对代码进行了深究
classifiers {CO:CombinatorialOptimisation()}
predictions {}
sample_period 120 ## 采样周期是两分钟
for clf_name, clf in classifiers.items():print(**20)print(clf_name)print(* *20)clf.train(top_5_train_elec, sample_periodsample_period) ### 训练部分
进入代码可知
def train(self, metergroup, num_states_dictNone, **load_kwargs):Train using 1D CO. Places the learnt model in the model attribute.Parameters----------metergroup : a nilmtk.MeterGroup objectnum_states_dict : dict**load_kwargs : keyword arguments passed to meter.power_series()Notes-----* only uses first chunk for each meter (TODO: handle all chunks).if num_states_dict is None:num_states_dict {}if self.model:raise RuntimeError(This implementation of Combinatorial Optimisation does not support multiple calls to train.)num_meters len(metergroup.meters)if num_meters 12:max_num_clusters 2else:max_num_clusters 3
# print(max_num_clusters)for i, meter in enumerate(metergroup.submeters().meters):print(Training model for submeter {}.format(meter))power_series meter.power_series(**load_kwargs)chunk next(power_series)num_total_states num_states_dict.get(meter)if num_total_states is not None:num_on_states num_total_states - 1else:num_on_states Noneself.train_on_chunk(chunk, meter, max_num_clusters, num_on_states)# Check to see if there are any more chunks.# TODO handle multiple chunks per appliance.try:next(power_series)except StopIteration:passelse:warn(The current implementation of CombinatorialOptimisation can only handle a single chunk. But there are multiple chunks available. So have only trained on the first chunk!)print(Done training!)
简单来看传入参数有metergroup-- top_5_train_elec (五种用电量较高的电器) num_states_dictNone,建立一个字典后期没找到用途 **load_kwargs --sample_periodsample_period采样周期是120秒也就是2两分钟
设置参数有如果训练的电器数量大于12将聚类的数量置为2其他则为3我们此处为5种电器因此max_num_clusters 3
总体过程遍历这5种电器数据每类电器进行单独聚类返回每个电器的聚类结果此结果为不同电器在不同状态下的功率数值。
每种电器训练过程
如果model存在则提示已经训练好无需二次训练否则就进行电器的聚类输入参数为chunk--每个电器的功率数据max_num_clusters 聚类数。并将每个电器的聚类结果记录statestraining_metadata参数保存成model。
def train_on_chunk(self, chunk, meter, max_num_clusters, num_on_states):# Check if weve already trained on this metermeters_in_model [d[training_metadata] for d in self.model]if meter in meters_in_model:raise RuntimeError(Meter {} is already in model! Cant train twice on the same meter!.format(meter))states cluster(chunk, max_num_clusters, num_on_states)
# print(states,states)self.model.append({states: states,training_metadata: meter})
结果为 聚类详解
通过_transform_data(X)进行数据格式转换主要是将pd.Series或者单列的pd.DataFrame转成ndarray数据然后再对数据进行聚类得到每个类别的质心值然后增加设备off状态的功率数据按理说聚类传入的参数是3在增加一个状态应该是四个状态值事实上只有3个状态继续深究可知
def cluster(X, max_num_clusters3, exact_num_clustersNone):Applies clustering on reduced data,i.e. data where power is greater than threshold.Parameters----------X : pd.Series or single-column pd.DataFramemax_num_clusters : intReturns-------centroids : ndarray of int32sPower in different states of an appliance, sorteda# Find where power consumption is greater than 10data _transform_data(X)# Find clusterscentroids _apply_clustering(data, max_num_clusters, exact_num_clusters)
# print(centroids,centroids)centroids np.append(centroids, 0) # add off statecentroids np.round(centroids).astype(np.int32)centroids np.unique(centroids) # np.unique also sorts# TODO: Merge similar clustersreturn centroids
电器状态值缺少的原因
由_apply_clustering函数的for n_clusters in range(1, max_num_clusters)可知在max_num_clusters取值为3的情况下n_clusters 的取值为1,2因此少了一个状态额外的在进行聚类的时候将每个电器的数据聚类成1簇2簇并采用了聚类算法的轮廓系数sklearn.metrics.silhouette_score选取了最好的n_clusters,即为n_clusters2.
def _apply_clustering(X, max_num_clusters, exact_num_clustersNone):Parameters----------X : ndarraymax_num_clusters : intReturns-------centroids : list of numbersList of power in different states of an appliance# If we import sklearn at the top of the file then it makes autodoc failfrom sklearn import metrics# Finds whether 2 or 3 gives better Silhouellete coefficient# Whichever is higher serves as the number of clusters for that# appliancenum_clus -1sh -1k_means_labels {}k_means_cluster_centers {}k_means_labels_unique {}# If the exact number of clusters are specified, then use thatif exact_num_clusters is not None:labels, centers _apply_clustering_n_clusters(X, exact_num_clusters)return centers.flatten()# Exact number of clusters are not specified, use the cluster validity measures# to find the optimal numberfor n_clusters in range(1, max_num_clusters):try:labels, centers _apply_clustering_n_clusters(X, n_clusters)
# print(labels, centers,labels, centers)k_means_labels[n_clusters] labelsk_means_cluster_centers[n_clusters] centersk_means_labels_unique[n_clusters] np.unique(labels)try:sh_n metrics.silhouette_score(X, k_means_labels[n_clusters], metriceuclidean)if sh_n sh:sh sh_nnum_clus n_clustersexcept Exception:num_clus n_clustersexcept Exception:if num_clus -1:return k_means_cluster_centers[num_clus]else:return np.array([0])print(k_means_cluster_centers[num_clus].flatten())return k_means_cluster_centers[num_clus].flatten()
选取聚类算法为Kmeans
def _apply_clustering_n_clusters(X, n_clusters)::param X: ndarray:param n_clusters: exact number of clusters to use:return:from sklearn.cluster import KMeansk_means KMeans(initk-means, n_clustersn_clusters)k_means.fit(X)return k_means.labels_, k_means.cluster_centers_
此为训练结果的全过程
2disaggregate_chunk
分解时候的函数是用的disaggregate_chunk得到房间的总功率曲线并对5种电器进行分解。
首先是将之前train()过程的质心提取出来并做一个枚举操作cartesian函数是做枚举操作由5个模型每个模型3个状态则可得3*3*3*3*3243行5列的状态组合数据。 def _set_state_combinations_if_necessary(self):Get centroids# If we import sklearn at the top of the file then auto doc fails.#枚举所有可能性if (self.state_combinations is None orself.state_combinations.shape[1] ! len(self.model)):from sklearn.utils.extmath import cartesiancentroids [model[states] for model in self.model]
# print(len(centroids),len(centroids[0]))self.state_combinations cartesian(centroids)
接下来对状态数据进行按列累加然后调用find_nearest函数求得负荷数据和状态数据的残差和具体索引值。find_nearest的传入参数有按列累加之后的状态数据用户的总功率数据。
def find_nearest(known_array, test_array):Find closest value in known_array for each element in test_array.Parameters----------known_array : numpy arrayconsisting of scalar values only; shape: (m, 1)test_array : numpy arrayconsisting of scalar values only; shape: (n, 1)Returns-------indices : numpy array; shape: (n, 1)For each value in test_array finds the index of the closest valuein known_array.residuals : numpy array; shape: (n, 1)For each value in test_array finds the difference from the closestvalue in known_array.# from http://stackoverflow.com/a/20785149/732596#将x中的元素从小到大排列提取其对应的index(索引),从小到大排序index_sorted np.argsort(known_array)known_array_sorted known_array[index_sorted]#将功率值插入到known_array_sorted对应的位置并返回indexidx1 np.searchsorted(known_array_sorted, test_array)idx2 np.clip(idx1 - 1, 0, len(known_array_sorted)-1)idx3 np.clip(idx1, 0, len(known_array_sorted)-1)#上限-实际值diff1 known_array_sorted[idx3] - test_array#实际值-下限diff2 test_array - known_array_sorted[idx2]#找到差据最小的indexindices index_sorted[np.where(diff1 diff2, idx3, idx2)]#残差residuals test_array - known_array[indices]return indices, residuals
得到索引值和残差之后就可通过每个model来分别进行计算了此时indices_of_state_combinations与indices等价。 appliance_powers_dict {}for i, model in enumerate(self.model):print(Estimating power demand for {}.format(model[training_metadata]))predicted_power state_combinations[indices_of_state_combinations, i].flatten()column pd.Series(predicted_power, indexmains.index, namei)
# print(column)appliance_powers_dict[self.model[i][training_metadata]] columnappliance_powers pd.DataFrame(appliance_powers_dict, dtypefloat32)
主要的分解核心代码在于
state_combinations[ indices_of_state_combinations, i].flatten()
下面看一个小例子 由此可知分别按列遍历然后选取indices_of_state_combinations对应位置的值为当前电器的分解结果。
2.FHMM
(1)关于train
在train函数里边频繁使用了好几次GaussianHMM没有理解其中深意。先看代码吧首先传入meters检查内存是否能支撑状态转换概率矩阵的计算。
def _check_memory(num_appliances):Checks if the maximum resident memory is enough to handle the combined matrix of transition probabilities# Each transmat is small (usually 2x2 or 3x3) but the combined# matrix is dense, using much more memory# Get the approximate memory in MBtry:# If psutil is installed, we can get the correct total # physical memory of the systemimport psutilavailable_memory psutil.virtual_memory().total 20except ImportError:# Otherwise use a crude approximationavailable_memory 16 10# We use (num_appliances 1) here to get a pessimistic approximation:# 8 bytes * (2 ** (num_appliances 1)) ** 2required_memory ((1 (2 * (num_appliances 1))) 3) 20if required_memory available_memory:warn(The required memory for the model may be more than the total system memory! Try using fewer appliances if the training fails.)
然后遍历每个电器获取每个电器的状态数据经过一系列判断语句最终发现他用了聚类来确定每个电器的状态。 if num_total_states is None:states cluster(meter_data, max_num_clusters)num_total_states len(states)
然后调用hmmlearn的GaussianHMM来进行模型训练。
print(Training model for submeter {} with {} states.format(meter, num_total_states))
learnt_model[meter] hmm.GaussianHMM(num_total_states, full)
# Fit
learnt_model[meter].fit(X)
对GaussianHMM计算的means_结果进行排序然后根据means_索引值得到相应的startprob,covars,transmat等然后在进行一次GaussianHMM并对参数赋值。
self.meters []
new_learnt_models OrderedDict()
for meter in learnt_model:startprob, means, covars, transmat sort_learnt_parameters(learnt_model[meter].startprob_, learnt_model[meter].means_,learnt_model[meter].covars_, learnt_model[meter].transmat_)new_learnt_models[meter] hmm.GaussianHMM(startprob.size, full)new_learnt_models[meter].startprob_ startprobnew_learnt_models[meter].transmat_ transmatnew_learnt_models[meter].means_ meansnew_learnt_models[meter].covars_ covars# UGLY! But works.self.meters.append(meter)
均值排序计算的代码如下
def return_sorting_mapping(means):means_copy deepcopy(means)means_copy np.sort(means_copy, axis0)# Finding mappingmapping {}for i, val in enumerate(means_copy):mapping[i] np.where(val means)[0][0]return mapping
例子设定a[1,5,3,4,2],return_sorting_mapping返回的是均值从小到大的索引值。 其余参数计算皆以均值有序值进行计算代码如下
def sort_startprob(mapping, startprob): Sort the startprob according to power means; as returned by mappingnum_elements len(startprob)new_startprob np.zeros(num_elements)for i in range(len(startprob)):new_startprob[i] startprob[mapping[i]]return new_startprobdef sort_covars(mapping, covars):new_covars np.zeros_like(covars)for i in range(len(covars)):new_covars[i] covars[mapping[i]]return new_covarsdef sort_transition_matrix(mapping, A):Sorts the transition matrix according to increasing order ofpower means; as returned by mappingParameters----------mapping :A : numpy.array of shape (k, k)transition matrixnum_elements len(A)A_new np.zeros((num_elements, num_elements))for i in range(num_elements):for j in range(num_elements):A_new[i, j] A[mapping[i], mapping[j]]return A_new
然后又对结果值做了一个GaussianHMM代码如下
def create_combined_hmm(model):list_pi [model[appliance].startprob_ for appliance in model]list_A [model[appliance].transmat_ for appliance in model]list_means [model[appliance].means_.flatten().tolist()for appliance in model]pi_combined compute_pi_fhmm(list_pi)A_combined compute_A_fhmm(list_A)[mean_combined, cov_combined] compute_means_fhmm(list_means)combined_model hmm.GaussianHMM(n_componentslen(pi_combined), covariance_typefull)combined_model.startprob_ pi_combinedcombined_model.transmat_ A_combinedcombined_model.covars_ cov_combinedcombined_model.means_ mean_combined
又对meanstransmat,startprob三个状态数据做了转换代码如下
def compute_A_fhmm(list_A):Parameters-----------list_pi : List of PIs of individual learnt HMMsReturns--------result : Combined Pi for the FHMMresult list_A[0]for i in range(len(list_A) - 1):result np.kron(result, list_A[i 1])return resultdef compute_means_fhmm(list_means):Returns-------[mu, cov]states_combination list(itertools.product(*list_means))num_combinations len(states_combination)means_stacked np.array([sum(x) for x in states_combination])means np.reshape(means_stacked, (num_combinations, 1))cov np.tile(5 * np.identity(1), (num_combinations, 1, 1))return [means, cov]def compute_pi_fhmm(list_pi):Parameters-----------list_pi : List of PIs of individual learnt HMMsReturns-------result : Combined Pi for the FHMMresult list_pi[0]for i in range(len(list_pi) - 1):result np.kron(result, list_pi[i 1])return result
然后得到了模型 2disaggregate_chunk
首先是获取总表的功率数据然后通过GaussianHMM的predict函数进行预测,然后在进行decode_hmm函数进行解码一顿操作猛如虎还是没看懂。 def disaggregate_chunk(self, test_mains):Disaggregate the test data according to the model learnt previouslyPerforms 1D FHMM disaggregation.For now assuming there is no missing data at this stage.# See v0.1 code# for ideas of how to handle missing data in this code if needs be.# Array of learnt stateslearnt_states_array []test_mains test_mains.dropna()length len(test_mains.index)temp test_mains.values.reshape(length, 1)learnt_states_array.append(self.model.predict(temp))print(learnt_states_array[0].shape)# Modelmeans OrderedDict()for elec_meter, model in self.individual.items():means[elec_meter] (model.means_.round().astype(int).flatten().tolist())means[elec_meter].sort()decoded_power_array []decoded_states_array []print(means.keys())for learnt_states in learnt_states_array:[decoded_states, decoded_power] decode_hmm(len(learnt_states), means, means.keys(), learnt_states)decoded_states_array.append(decoded_states)decoded_power_array.append(decoded_power)prediction pd.DataFrame(decoded_power_array[0], indextest_mains.index)return prediction
解码函数
def decode_hmm(length_sequence, centroids, appliance_list, states):Decodes the HMM state sequencehmm_states {}hmm_power {}total_num_combinations 1for appliance in appliance_list:print(centroids[appliance])total_num_combinations * len(centroids[appliance])print(total_num_combinations)for appliance in appliance_list:hmm_states[appliance] np.zeros(length_sequence, dtypenp.int)hmm_power[appliance] np.zeros(length_sequence)for i in range(length_sequence):factor total_num_combinationsfor appliance in appliance_list:# assuming integer division (will cause errors in Python 3x)factor factor // len(centroids[appliance])temp int(states[i]) / factorhmm_states[appliance][i] temp % len(centroids[appliance])hmm_power[appliance][i] centroids[appliance][hmm_states[appliance][i]]return [hmm_states, hmm_power]每个电器有3个状态均值共有243种组合方式然后进行了除和取余操作实现对数据的分解。
粗略解析只能到这里了具体还要看HMM等相关理论才能想明白那些操作吧
代码源于nilmtk包的源码文件.