Abnormal user identification based on XGBoost algorithm

SONG Xiao-yu, SUN Xiang-yang, ZHAO Yang


(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)


Abstract: The electric power industry is a basic industry of the national economy, and increasingly serious abnormal electricity consumption behavior causes large economic losses. The eXtreme gradient boosting (XGBoost) algorithm is used to identify abnormal users. First, the raw data are cleaned. Then, user electricity consumption characteristics are extracted from different aspects. Finally, an XGBoost classifier is used to identify abnormal users in a balanced sample set and in an imbalanced sample set. For comparison, k-nearest neighbor (KNN), back-propagation (BP) neural network and random forest classifiers are applied to the same characteristics on both sample sets. The experimental results show that the XGBoost classifier achieves a higher recognition rate and runs faster, and that the improvement is most pronounced on the imbalanced data set.


Key words: user identification; electricity characteristics; eXtreme gradient boosting (XGBoost); random forest


 

CLC number: TP181             Document code: A


Article ID: 1674-8042(2018)04-0339-08            doi: 10.3969/j.issn.1674-8042.2018.04.006
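The abstract names the method but gives no implementation details. As a rough illustration of the second-order boosting idea behind XGBoost applied to an imbalanced sample set, the following is a minimal pure-Python sketch. It uses depth-1 trees, omits XGBoost's complexity penalty, and runs on synthetic data: the two features, the 950/50 class split, and the names `fit_stump`, `train`, `make_users`, `lam` and `eta` are all assumptions made for this example, not the authors' features, data or settings.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, g, h, lam):
    """Pick the one-feature split with the largest second-order gain.

    The constant factor 1/2 and the complexity penalty of the full
    XGBoost gain are dropped; they do not change which split wins here.
    """
    G, H = sum(g), sum(h)
    best_gain, best = float("-inf"), None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            GL += g[i]
            HL += h[i]
            if X[i][j] == X[nxt][j]:
                continue  # no split point between equal feature values
            GR, HR = G - GL, H - HL
            gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - G * G / (H + lam)
            if gain > best_gain:
                best_gain = gain
                # leaf weight is -G / (H + lambda) on each side of the split
                best = (j, X[i][j], -GL / (HL + lam), -GR / (HR + lam))
    return best

def train(X, y, rounds=20, eta=0.3, lam=1.0):
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        g = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        h = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        j, thr, wl, wr = fit_stump(X, g, h, lam)
        stumps.append((j, thr, eta * wl, eta * wr))
        scores = [s + (eta * wl if x[j] <= thr else eta * wr)
                  for s, x in zip(scores, X)]
    return stumps

def predict(stumps, x):
    score = sum(wl if x[j] <= thr else wr for j, thr, wl, wr in stumps)
    return 1 if score > 0 else 0

def make_users(rng, n_normal, n_abnormal):
    """Synthetic users: feature 0 separates the classes, feature 1 is noise."""
    X, y = [], []
    for label, n, centre in ((0, n_normal, 0.0), (1, n_abnormal, 4.0)):
        for _ in range(n):
            X.append([rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_users(rng, 950, 50)   # imbalanced: 5% abnormal users
X_test, y_test = make_users(rng, 950, 50)
model = train(X_train, y_train)
pred = [predict(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
recall = sum(p == 1 and t == 1 for p, t in zip(pred, y_test)) / sum(y_test)
print(f"accuracy={accuracy:.3f}, recall on abnormal users={recall:.3f}")
```

Recall on the abnormal class is reported next to accuracy because, with 5% abnormal users, a classifier that labels everyone as normal already reaches 95% accuracy. A real experiment would more likely use the xgboost library than a hand-written booster like this one.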


 



 



 

Citation format: SONG Xiao-yu, SUN Xiang-yang, ZHAO Yang. Abnormal user identification based on XGBoost algorithm. Journal of Measurement Science and Instrumentation, 2018, 9(4): 339-346. [doi: 10.3969/j.issn.1674-8042.2018.04.006]

