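Comparison of Imputation Methods Based on Missing Value Detection for Multidimensional Feature Data
Authors:

QIAO Fei, ZHAI Xiaodong, WANG Qiaoling

Affiliation:

College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

Author biography:

QIAO Fei (born 1967), female, professor, doctoral supervisor, Doctor of Engineering. Main research interest: intelligent production systems. E-mail: fqiao@tongji.edu.cn

Corresponding author:

ZHAI Xiaodong (born 1993), male, PhD candidate. Main research interest: big data processing and analysis. E-mail: xdzhai@tongji.edu.cn

CLC number:

TP311.1

Funding:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0101704); National Natural Science Foundation of China (62133011, 61973237, 61873191)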

    Abstract:

    Traditional missing value detection methods do not analyze multidimensional feature data comprehensively, and it is difficult to choose a suitable method from the many available missing value imputation algorithms. To address these problems, this paper designs a missing value detection method that, building on the commonly used data point missing degree, introduces for the first time the concepts of overall data missing degree and weighted overall data missing degree, giving a comprehensive measure of how incomplete a dataset is. On this basis, the performance of different missing value imputation methods is compared experimentally. The results show that the imputation algorithms perform differently under different missing degrees, and that the proposed detection method provides an effective basis for selecting a missing value imputation algorithm.
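
    The abstract names the missing degree concepts but gives no formulas for them. The following is only an illustrative sketch under assumed definitions, not the authors' method: the data point missing degree is taken as the fraction of missing features in one record, the overall missing degree as the fraction of missing cells in the whole dataset, and the weighted overall missing degree as a weighted average of per-feature missing fractions. All function names and the feature weights are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption about the encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def point_missing_degree(row):
    """Assumed definition: share of missing features in a single record."""
    return sum(is_missing(v) for v in row) / len(row)


def overall_missing_degree(rows):
    """Assumed definition: share of missing cells over the whole dataset."""
    total = sum(len(row) for row in rows)
    missing = sum(is_missing(v) for row in rows for v in row)
    return missing / total


def weighted_overall_missing_degree(rows, weights):
    """Assumed definition: per-feature missing shares averaged with
    hypothetical feature weights (normalized here to sum to 1)."""
    weight_sum = sum(weights)
    degree = 0.0
    for j, w in enumerate(weights):
        column_share = sum(is_missing(row[j]) for row in rows) / len(rows)
        degree += (w / weight_sum) * column_share
    return degree


# Toy dataset: 4 records, 3 features, missing cells marked as None or NaN.
data = [
    [1.0, None, 3.0],
    [float("nan"), None, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, None],
]

per_point = [point_missing_degree(row) for row in data]
overall = overall_missing_degree(data)
weighted = weighted_overall_missing_degree(data, [0.5, 0.3, 0.2])
print(per_point, overall, weighted)
```

    Under these assumptions, the weighted variant differs from the plain overall value whenever missing cells are concentrated in features with high or low weight, which is one way a dataset-level measure could inform the choice of imputation algorithm.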

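Cite this article

QIAO Fei, ZHAI Xiaodong, WANG Qiaoling. Comparison of imputation methods based on missing value detection for multidimensional feature data[J]. Journal of Tongji University (Natural Science), 2023, 51(12): 1972-1982.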

History
  • Received: 2022-04-11
  • Published online: 2023-12-29