Abstract: Centralized clustering algorithms cannot handle big data sets. To address this, a distributed K-Means clustering algorithm was proposed. It uses a confidence radius derived in local nodes from the Fisher discriminant ratio. The computing and storage capacities and the bandwidth of each node in the P2P network were used to spread the time and space costs across the nodes. In local nodes, the Fisher discriminant ratio was applied to measure the difference between the dense and sparse distributions within the same cluster. This ratio was then used to deduce the confidence radius for the next round of clustering. This maintains clustering accuracy and speeds up the distributed clustering at the same time. Numerical simulations of the algorithm and experiments on real data were carried out. The results show that the algorithm strikes a good balance between accuracy and speed that adapts to the data distribution. The proposed algorithm is also more robust than the DFEKM algorithm.
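The abstract does not say how a cluster is split into dense and sparse parts, or how the ratio is turned into a confidence radius. The sketch below only illustrates the Fisher discriminant ratio step, under loudly stated assumptions. Each point is reduced to its Euclidean distance from the cluster centroid. The cluster is then split at the median distance into a "dense" core and a "sparse" rim, which is a hypothetical rule. The standard one-dimensional ratio (m_a - m_b)^2 / (s_a^2 + s_b^2) is computed between the two groups. The function names `fisher_ratio` and `dense_sparse_fdr` are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, median, pvariance


def fisher_ratio(group_a, group_b):
    """1-D Fisher discriminant ratio: (m_a - m_b)^2 / (s_a^2 + s_b^2)."""
    denom = pvariance(group_a) + pvariance(group_b)
    diff = mean(group_a) - mean(group_b)
    if denom == 0:
        # Both groups have zero spread: perfectly separated unless means coincide.
        return math.inf if diff != 0 else 0.0
    return diff ** 2 / denom


def dense_sparse_fdr(points):
    """Fisher ratio between the dense core and sparse rim of one cluster.

    Assumption (hypothetical, not from the paper): points are split at the
    median distance to the cluster centroid.
    """
    centroid = [mean(coord) for coord in zip(*points)]
    dists = [math.dist(p, centroid) for p in points]
    cut = median(dists)
    dense = [d for d in dists if d <= cut]
    sparse = [d for d in dists if d > cut]
    if not sparse:
        # All points are equidistant from the centroid: no dense/sparse contrast.
        return 0.0
    return fisher_ratio(dense, sparse)


if __name__ == "__main__":
    # Tight core at distance 1 and a rim at distance 5 around the origin.
    cluster = [(1, 0), (-1, 0), (0, 1), (0, -1), (5, 0), (-5, 0), (0, 5), (0, -5)]
    print(dense_sparse_fdr(cluster))
```

A large ratio indicates a compact core that is well separated from a sparse rim. Presumably this is the situation in which a confidence radius could safely cover more points. The mapping from ratio to radius is left out here because the abstract does not give it.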