Characterizing Twitter with Respondent-Driven Sampling
Twitter, one of the most important microblogging online social networks, has attracted more than 200 million users in recent years. Although there have been several attempts to characterize Twitter using incomplete sampled data, they have not been very successful in estimating the characteristics of the whole network. In this paper, we characterize Twitter by sampling its social graph and user behaviors with a random-walk-based sampling technique called Respondent-Driven Sampling (RDS). To the best of our knowledge, this is the first time the RDS method and its estimator have been used to obtain uniform, unbiased estimates of several key structural and behavioral properties of Twitter. We compare the performance of the proposed method with that of other sampling methods, namely Metropolis-Hastings Random Walk (MHRW) and sampling from active users (Timeline), using uniform sampling (UNI) as the reference. To gather the required data, we implemented four independent crawlers. Our experimental results indicate that RDS exhibits lower estimation errors than MHRW and Timeline when estimating the in- and out-degree distributions. We also show that RDS is better suited to estimating the followers-to-followings ratio and the correlation between followers/followings and tweets.
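As a rough illustration of the idea behind RDS-style estimation, the sketch below runs a simple random walk on a toy graph and then re-weights each sampled node by the inverse of its degree. The abstract does not state the exact form of the estimator, so the inverse-degree weighting used here, the undirected toy star graph, and the names `random_walk` and `rds_estimate` are all assumptions made for illustration, not the crawlers or estimator of this paper.

```python
import random


def random_walk(graph, start, steps, rng):
    """Simple random walk: return the nodes visited after each of `steps` moves."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(graph[node])  # move to a uniformly chosen neighbor
        visited.append(node)
    return visited


def rds_estimate(values, degrees):
    """Inverse-degree re-weighted mean (an assumed RDS-style estimator).

    A random walk visits a node with probability proportional to its degree,
    so weighting each sample by 1/degree offsets that bias.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical toy graph: an undirected star with hub 0 and leaves 1..10.
graph = {0: list(range(1, 11))}
graph.update({leaf: [0] for leaf in range(1, 11)})

rng = random.Random(0)
sample = random_walk(graph, start=0, steps=1000, rng=rng)
degrees = [len(graph[v]) for v in sample]

naive = sum(degrees) / len(degrees)        # biased toward the high-degree hub
corrected = rds_estimate(degrees, degrees)  # re-weighted mean degree
true_mean = sum(len(nbrs) for nbrs in graph.values()) / len(graph)

print(f"naive={naive:.3f} corrected={corrected:.3f} true={true_mean:.3f}")
```

On this toy graph the walk alternates between the hub and a leaf, so the naive sample mean overstates the average degree while the re-weighted mean recovers it. Real follower graphs are directed and far less regular, so convergence there would be slower and would depend on the walk length and starting point.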