Diffusion-Aware Sampling and Estimation in Information Diffusion Networks
Partially-observed data collected by sampling methods is often being studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of diffusion process. In this paper, we propose a novel two-step (sampling/estimation) measurement framework by utilizing the diffusion process characteristics. To this end, we propose a link-tracing based sampling design which uses the infection times as local information without any knowledge about the latent structure of diffusion network. To correct the bias of sampled data, we introduce three estimators for different categories; link-based, node-based, and cascade-based. To the best of our knowledge, this is the first attempt to introduce a complete measurement framework for diffusion networks. We also show that the estimator plays an important role in correcting the bias of sampling from diffusion networks. Our comprehensive empirical analysis over large synthetic and real datasets demonstrates that in average, the proposed framework outperforms the common BFS and RW sampling methods in terms of link-based characteristics by about 37% and 35%, respectively.