The Perils of Using Social Media Data to Predict the Spread of Diseases

Abstract

The data produced by social media engagement is of interest to various organizations and has been used in different applications like marketing, finance and healthcare. Though the potential of mining this data is high, standard data mining processes do not address the peculiarities of social media data. In this paper, we explore the perils of using social media data in predicting the spread of an infectious disease; perils that are mostly related to data quality, textual analysis and location information. We synthesize findings from a literature review and a data mining exercise to develop an adapted data mining process. This process has been designed to minimize the effects of the perils identified and is thus more aligned with the requirements of predicting disease spread using social media data. The process should be useful to data miners and health institutions.
