Abstract: The main problem with text clustering algorithms based on the vector space model (VSM) is that they ignore the semantic relationships between words and the correlations between the vector dimensions, which makes the computed text similarity inaccurate. To improve text clustering, a method is proposed that computes text similarity from semantic distance and clusters documents in two phases. First, each text is analyzed semantically, and a nearest neighbor algorithm performs the first clustering. Next, some feature words are selected according to their similarity weights to represent each cluster, with the remaining feature words being similar to the cluster's main themes, and similar clusters are then merged. Finally, a second clustering pass is performed to compensate for the sensitivity of nearest neighbor clustering to the input order of documents. Simulation experiments indicate that the proposed algorithm addresses these problems and achieves higher clustering precision and recall than the VSM-based text clustering algorithm.
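The two-phase idea can be illustrated with a minimal sketch, under several assumptions that the abstract does not specify. Cosine similarity over term-frequency vectors stands in for the paper's semantic distance. The threshold value is arbitrary. The hypothetical second phase, `reassign`, moves each document to the cluster whose centroid it most resembles. The feature-word selection and cluster-merging step is omitted. Phase 1 shows why nearest neighbor clustering depends on input order: each document only sees the clusters formed before it.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors.
    (Assumed stand-in for the paper's semantic-distance similarity.)"""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbor_clustering(docs, sim, threshold):
    """Phase 1: single-pass nearest neighbor clustering.

    Each document joins the cluster holding its most similar
    already-clustered document if that similarity reaches `threshold`;
    otherwise it starts a new cluster. The outcome depends on input order.
    """
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        best_c, best_s = None, -1.0
        for c, members in enumerate(clusters):
            s = max(sim(doc, docs[j]) for j in members)
            if s > best_s:
                best_c, best_s = c, s
        if best_c is not None and best_s >= threshold:
            clusters[best_c].append(i)
        else:
            clusters.append([i])
    return clusters


def centroid(docs, members):
    """Sum of member vectors; cosine ignores scale, so this acts as the mean."""
    total = Counter()
    for j in members:
        total.update(docs[j])
    return total


def reassign(docs, clusters, sim):
    """Hypothetical phase 2: reassign every document to the cluster whose
    centroid it is most similar to, reducing dependence on input order."""
    cents = [centroid(docs, m) for m in clusters]
    new = [[] for _ in clusters]
    for i, doc in enumerate(docs):
        best = max(range(len(cents)), key=lambda c: sim(doc, cents[c]))
        new[best].append(i)
    return [m for m in new if m]  # drop clusters that became empty


texts = ["apple banana fruit", "banana fruit salad",
         "car engine wheel", "engine wheel tire"]
docs = [Counter(t.split()) for t in texts]
phase1 = nearest_neighbor_clustering(docs, cosine, threshold=0.3)
final = reassign(docs, phase1, cosine)
print(phase1, final)
```

On this toy input both phases group the two fruit texts and the two vehicle texts together. The reassignment pass matters more when an early document seeds a cluster that later, better-matching documents would not have formed.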