A Fast Similarity Calculation Method Based on Cotangent Similarity and BP Neural Network

Affiliation:

College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

About the author:

QIAO Fei (born 1967), female, professor, doctoral supervisor, Doctor of Engineering; main research interest: intelligent production systems. E-mail: fqiao@tongji.edu.cn

CLC number:

TP311.1

Funding:

National Natural Science Foundation of China (71690230/71690234, 61973237, 61873191)

    Abstract:

    Similarity measurement is of great significance in big-data applications. However, the traditional traversal calculation based on cosine similarity has poor accuracy and timeliness, which severely limits it and means it cannot provide an effective basis for the quality assessment of massive high-dimensional data. To address these problems, two cotangent similarity formulas are constructed from the cotangent trigonometric function and the differences between data dimensions, improving the accuracy of similarity calculation. In addition, a back-propagation (BP) neural network model that approximates the similarity mapping of a dataset is established to reduce the time complexity of similarity calculation. Experimental results show that the improved fast similarity calculation method has good accuracy and timeliness, and that the performance gain is more pronounced on large-scale datasets.
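
    To make the baseline concrete, the following is a minimal sketch of the traditional approach the abstract refers to: cosine similarity evaluated by traversing every pair of samples. It is not the authors' code. The function names and the toy data are hypothetical, and the cotangent formulas and the BP network are not reproduced here because this page does not state them.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def traversal_similarity(samples):
    """Compare every unordered pair of samples: n * (n - 1) / 2 evaluations."""
    return {
        (i, j): cosine_similarity(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }


# Hypothetical toy data, chosen only for illustration.
samples = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
pairwise = traversal_similarity(samples)
```

    The number of evaluations grows quadratically with the number of samples, which is the cost a learned approximation would aim to avoid. The sketch also shows a well-known property of cosine similarity: it depends only on direction, so the first two toy vectors are scored as identical although their magnitudes differ.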

    Table 5 Basic information of UCI datasets
    Fig.1 Schematic diagram of the relationship between two-dimensional vectors
    Fig.2 Flowchart of fast similarity calculation based on cotangent similarity and BP neural network
    Fig.3 Pseudocode of fast similarity calculation based on cotangent similarity and BP neural network
    Fig.4 Similarity calculation error of the neural network and traversal calculation (CWRU subdatasets)
    Fig.5 Comparison of running time of similarity calculation based on different calculation formulas
    Fig.6 Comparison of running time of similarity calculation based on different calculation methods
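Cite this article

QIAO Fei, GUAN Liuen, WANG Qiaoling. A fast similarity calculation method based on cotangent similarity and BP neural network[J]. Journal of Tongji University (Natural Science), 2021, 49(1): 153-162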

History
  • Received: 2020-08-27
  • Published online: 2021-02-26