1. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China; 2. Computer Technology Institute, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
Abstract: Microblogs are becoming a main medium for spreading public information, and analyzing microblog data helps researchers follow public information in a timely manner. Collecting microblog data effectively is therefore important. To address two problems, that traditional web crawlers cannot retrieve complete information and that the API imposes many restrictions, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user location information so that data can be collected efficiently and continuously. Comparison with other architectures shows that the proposed system performs well and provides adequate data.
卢杨, 李华康, 孙国梓. 一种基于P2P技术的分布式微博爬虫系统[J]. 江苏大学学报(自然科学版), 2016, 37(3): 296-301.
LU Yang, LI Hua-Kang, SUN Guo-Zi. Distributed microblog crawler system based on P2P[J]. Journal of Jiangsu University (Natural Science Edition), 2016, 37(3): 296-301.