Abstract: To discover topics in stock bar forums more effectively, this paper presents a short text clustering framework based on frequent itemsets and latent semantics (STC_FL). Important frequent itemsets are first acquired using a concept vector space built on HowNet; a combination of statistical and latent semantic information is then used to cluster these itemsets self-adaptively. Finally, an algorithm for text soft classifying based on similarity threshold and non-overlapping (TSCSN) is proposed, in which text soft clustering is selected and controlled through parameter optimization. An empirical analysis of real stock bar forum data shows that the STC_FL framework and the TSCSN algorithm can exploit the latent semantic information of the text and reduce the dimensionality of the feature space, enabling deep information mining and topic classification of short texts.