Distributed microblog crawler system based on P2P
1. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China; 2. Computer Technology Institute, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
Abstract Microblogs are becoming a main medium for spreading public information. Analyzing microblog data helps researchers learn of public information in a timely way, so collecting that data effectively is important. Traditional web crawlers cannot retrieve complete information, and the API imposes many restrictions. To address these problems, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user position information, so that data can be collected efficiently and continuously. Comparison with other structures shows that the proposed system performs well and provides adequate data.
Received: 30 September 2015