Credit Scoring of Small and Micro Enterprises Based on Sample-Dependent Cost Matrix

CLC number: TP391

Funding: National Natural Science Foundation of China (61572140); Shanghai Science and Technology Commission "Science and Technology Innovation Action Plan" project (17DZ1100504)

    Abstract:

    Credit history data for small and micro enterprises are limited in scale and severely class-imbalanced. To address this, the paper proposes a Smote XGboost-Bayes Minimum Risk (SXG-BMR) model based on a sample-dependent cost matrix. The full sample is oversampled at a low rate, which mitigates class imbalance while limiting the risk of overfitting. The model combines an ensemble learning model with the Bayes minimum risk decision rule to achieve cost sensitivity. It also introduces a sample-dependent cost matrix, which depends not only on the class but also on the attributes of each sample, and can therefore characterize cost more accurately. The empirical study uses a standard credit dataset and a real credit dataset of small and micro enterprises in Shanghai to compare and analyze several algorithms. The results show that the proposed SXG-BMR model performs well.
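
    The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes default probabilities are already available from some classifier (the paper uses XGBoost; none is trained here), and the function names, the loss-given-default figure, and the interest-rate cost formula are hypothetical choices made only for illustration.

```python
import math
import random


def smote_oversample(minority, rate, k=3, seed=0):
    """Return round(rate * len(minority)) synthetic minority points.

    Each point is interpolated between a randomly chosen minority sample
    and one of its k nearest minority neighbours (the SMOTE idea).
    A small `rate` corresponds to low-rate oversampling.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(round(rate * len(minority))):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def sample_costs(loan_amount, interest_rate, loss_given_default=0.75):
    """Hypothetical sample-dependent costs (an assumption, not from the paper).

    cost_fn: loss when a defaulting applicant is accepted.
    cost_fp: interest forgone when a good applicant is rejected.
    """
    cost_fn = loss_given_default * loan_amount
    cost_fp = interest_rate * loan_amount
    return cost_fn, cost_fp


def bmr_decide(p_default, cost_fn, cost_fp, cost_tp=0.0, cost_tn=0.0):
    """Bayes minimum risk: return 1 (reject) if rejecting has lower expected cost."""
    risk_reject = p_default * cost_tp + (1.0 - p_default) * cost_fp
    risk_accept = p_default * cost_fn + (1.0 - p_default) * cost_tn
    return 1 if risk_reject < risk_accept else 0


if __name__ == "__main__":
    minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
    print(smote_oversample(minority, rate=0.5))
    # Same predicted probability, different per-sample costs.
    for rate in (0.05, 0.12):
        print(rate, bmr_decide(0.10, *sample_costs(100_000, rate)))
```

    In this sketch two applicants with the same predicted default probability can receive different decisions, because the cost of a wrong rejection differs between them. This is the effect a cost matrix that depends only on the class cannot capture.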

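Cite this article

Zhang Tao (张涛), Wang Yuhan (汪御寒), Li Kai (李凯), Zhang Yuejie (张玥杰). Credit Scoring of Small and Micro Enterprises Based on Sample-Dependent Cost Matrix [J]. Journal of Tongji University (Natural Science), 2020, 48(01): 149~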

History
  • Received: 2019-01-16
  • Last revised: 2019-11-01
  • Accepted: 2019-09-27
  • Published online: 2020-01-20