Web Data Mining and Social Media Analysis for better Communication in Food Safety Crises

Available online at Int. J. Food System Dynamics 6 (3), 2015

Christian H. Meyer*, Martin Hamer*, Wiltrud Terlau*, Johannes Raithel#, and Patrick Pongratz#

* International Center for Sustainable Development, University of Applied Sciences Bonn-Rhein-Sieg, Germany
# European IT Consultancy, EITCO GmbH, Bonn, Germany

Received February 2015, accepted June 2015, available online July 2015

ABSTRACT

Although much effort is made to prevent risks arising from food, food-borne diseases are an ever-present threat to consumers' health. The consumption of fresh food that is contaminated with pathogens like fungi, viruses or bacteria can cause food poisoning that leads to severe health damage or even death. The outbreak of Shiga toxin-producing enterohemorrhagic E. coli (EHEC) in Germany and neighbouring countries in 2011 has shown this dramatically. Nearly people were reported as being affected and more than 50 people died during the so-called EHEC crisis. As a result, consumers' trust in the safety of fruits and vegetables decreased sharply. In situations like that, quick decisions and reactions from public authorities as well as from privately owned companies are important: food crisis managers have to identify and trace back contaminated products and withdraw them from the market. At the same time they have to inform the stakeholders about potential threats and recent developments. This is a particularly challenging task, because when an outbreak has only just been detected, information about its actual scope is sparse and the demand for information is high. Thus, ineffective communication among crisis managers and towards the public can result in inefficient crisis management, health damage and a major loss of trust in the food system.
This is why crisis communication is a crucial part of successful crisis management, and the quality of crisis communication largely depends on the availability of and the access to relevant information. In order to improve the availability of information, we have explored how information from publicly accessible internet sources like Twitter or Wikipedia can be harnessed for food crisis communication. In this paper we report some initial insights from a web mining and social media analysis approach to monitoring health and food related issues that can develop into a potential crisis. We have chosen Twitter and Wikipedia as data sources for our study since they are publicly accessible and reveal what people state about certain topics and what they are looking for in order to answer their questions.

Keywords: Crisis Communication; Web Data Mining; Social Media Analysis; Issues Monitoring

1 Introduction

The consumption of fresh food that is contaminated with pathogens like fungi, viruses or bacteria can cause food poisoning that leads to severe health damage or even death. As a consequence, actors in the agri-food system are challenged to design quality and safety assurance schemes that are efficient, reliable and internationally compatible. This requires organisational structures, appropriate action plans and technical solutions that guarantee an efficient collection and distribution of information in order to improve communication procedures and to shorten the time needed for decision making. But barriers and borders for information sharing still exist, not only between different regions or different countries but also between public and private organisations, especially during food crises. Although public and private data sets could provide useful knowledge and insight, there are no sufficient concepts for a routine information exchange within a short time (Hamer et al., 2014). Consequently, there is a need for alternative ways of gathering information.
For this reason, we aim to develop an online food safety issue monitoring system that integrates publicly accessible internet sources like Wikipedia and Twitter in order to detect potential threats as soon as possible and to make predictions about the information needs and sentiments of the public, so as to improve crisis management and crisis communication. Issues monitoring in this respect is the task of keeping an open eye on trends and topics that can have an impact on an organisation or on the trust in safe food in general. Issues can be seen and interpreted differently by different stakeholders. Usually they are of public interest and have a high potential for conflict (Ingenhoff and Röttger, 2008). This makes it important to have immediately available knowledge about what is actually going on and how a crisis is perceived by the public in order to tailor appropriate measures and communication plans to manage a crisis (Hamer et al., 2014).

2 The EHEC outbreak 2011

The ever-present threat of food-borne diseases is an issue that needs permanent attention. The outbreak of the Shiga toxin-producing Escherichia coli O104:H4 (EHEC) in Germany and neighbouring countries in 2011 has shown this vividly. According to the Robert Koch Institute (RKI), the central federal institution responsible for disease control and prevention in Germany, it was the largest outbreak ever recorded in Germany, and with respect to haemolytic-uremic syndrome (HUS) it was the largest known outbreak worldwide (RKI, 2011). Based on the onset of the disease, the outbreak lasted from early May 2011 until 26 July 2011, when the RKI officially declared its end. However, although there is an official surveillance system in Germany in accordance with the Protection against Infection Act (IfSG), the RKI was not notified before 19 May, when a cluster of three paediatric HUS cases in Hamburg was reported. Before this day it was not apparent that there already were higher numbers of infections (Figure 1).
In its evaluation report of the outbreak, the RKI states that in practice the period from the onset of the disease until the notification of the RKI ranged from a few days to several weeks (RKI, 2011). After more infections became known, the public was informed by local authorities. The first online articles about the EHEC outbreak that we could trace appeared on 20 May 2011 (Phase II, Figure 1). In the course of time some more newspapers in Northern Germany picked up the topic and published articles about it online. According to the BfR's media monitoring, the first articles printed in newspapers appeared on 21 May (BfR, 2011). Around the same day the outbreak reached its peak in terms of infections in Northern Germany. Media coverage of the EHEC topic in terms of printed articles peaked with a delay of some days at the beginning of June (Linge et al., 2012). In retrospect it can be shown that the demand for information rose sharply (Phase III) after the outbreak was detected and officially notified, since Wikipedia page view numbers started to soar. Surprisingly enough, Wikipedia page views started to increase from 18 May 2011, which is one day prior to the official announcement.

Figure 1. Epidemiological curve of HUS and EHEC adapted from RKI (2011) (left axis, daily count); Wikipedia page views of the articles "Enterohämorrhagische Escherichia coli" (EHEC) and "Hämolytisch-urämisches Syndrom" (HUS), and articles related to the EHEC outbreak that were printed in a selected sample of newspapers, adapted from BfR (2011) (right axis, relative frequency)

On 3 June 2011 a task force, composed of experts from different public authorities, was set up in order to deal with the outbreak and to assess its impact (BVL, 2011).
In the task force's final report it is stressed that the source of the outbreak could be identified thanks to cooperation between governmental and local authorities and between public health and food safety authorities. In addition, it is emphasized that information from multiple sources was collected and analysed, which helped to discover sprouted seeds as the source of the infections. It became apparent that timely, sound and robust information was crucial for the coordination task of authorities and the communication among all involved parties and towards the public (European Commission, 2011). However, improvements can still be made by harnessing the internet for communication and for the analysis of web data.

3 Web data mining and social media analysis

The internet is a constantly growing source of data that is publicly accessible and can be mined to discover useful information or knowledge by analysing the Web's hyperlink structure, page content and usage data. This process is referred to as web data mining (Liu, 2011). It is based on traditional data mining techniques, but it takes the particularities of the internet into account, like the various types of data that range from structured tables and semi-structured websites to unstructured texts and multimedia files. Other technical terms can be found depending on the mining tasks that have to be fulfilled. Extracting knowledge from texts is referred to as text mining (Weiss et al., 2005); if text messages retrieved from Twitter are analysed, the term Tweet content mining can be found (Yoon et al., 2013) as well as Twitter mining (Velardi et al., 2014). The analysis of social media data is a growing field of research. Data from social media like blogs, online forums or social networks like Twitter in particular attract a lot of attention from researchers.
Much of the analysis is done in order to predict stock prices and volatility, product sales, the outcome of elections or the outbreak of diseases (Kalampokis, 2013; Schoen et al., 2013). Twitter, for example, was used to predict flu trends (Achrekar et al., 2011) or the status of hay fever among people in Japan (Takahashi, 2011). Another aspect of analysis is to classify Tweets according to the emotional content that they convey, especially in crisis situations (Brynielsson et al., 2013; Gaspar, 2013; Torkildson, 2014).

Recent research suggests that risk and crisis communication could benefit from the integration of social media for various reasons. On the one hand, the use of social media for communication opens additional ways of entering into dialogue with an organisation's stakeholders and the public (Artman et al., 2011) or with journalists as representatives of traditional media (Veil et al., 2011). On the other hand, it has been observed that people in times of crisis make statements about their coping strategies using social media like Twitter. This information can be harvested for further analysis in order to detect upcoming issues (Gaspar et al., 2013) and ongoing developments (Palen et al., 2007; Nilsson et al., 2012). Because people use social media to express their opinions about certain topics, social media are also a valuable source for improving early warning routines as a complement to traditional reporting mechanisms (e.g. Diaz-Aviles et al., 2012). But Twitter is not the only valuable source of information. Wikipedia can also be harnessed for monitoring tasks (Generous et al., 2014; McIver and Brownstein, 2014) since people use Wikipedia as a prominent source of health related information (Laurent and Vickers, 2009).

4 Methodology

As the EHEC outbreak has shown, food-borne diseases can affect people's health badly.
This is why we put the focus of our research on EHEC related web mining questions and social media analysis, in order to improve issue monitoring as an essential prerequisite of successful crisis communication. Nevertheless, mining useful information and knowledge from internet data requires deep insight into patterns and exceptions. Much of the related research has dealt with the English language. We take a closer look at the German language by gradually digging deeper into the informational haystack. This is why our research is qualitative rather than quantitative, with the aim of raising further research questions. For data retrieval from Twitter we used the ubermetrics Delta application. This is a fee-based media monitoring service offered by ubermetrics Technologies GmbH (Germany) that allows web content to be saved for further processing. To learn about the development of Wikipedia page views we used R, which is a free software environment for statistical computing and graphics (see references). Furthermore, we used the Twitter search engine to detect single Tweets.

5 Core challenge: Dealing with the noise

Much of the data that can be found on the internet is noisy, which means that much of it is highly redundant and irrelevant for particular mining tasks. This is why it is often necessary to remove the noise from the data by pre-processing it prior to analysis (e.g. Liu, 2011; Yoon et al., 2013; Saif et al., 2014). Taking into account that millions of Tweets are produced every day, it is impossible to go through them by hand. This is why automated solutions must be applied. One approach to reducing the degree of noise is to apply rule and keyword based filters that bring up only material containing selected keywords. But even filtered material can still contain a lot of noise, as the following two Tweets taken from Twitter illustrate:

Tweet 1: "Aktualisierung: Im kath. Kindergarten St. Rupert in Gerolfing sind drei Fälle von EHEC aufgetreten. Hotline ab , 7:30 Uhr" (Tweeted on ) ("Update: In the Catholic kindergarten St. Rupert in Gerolfing three cases of EHEC occurred. Hotline from: , 7:30 o'clock")

Tweet 2: "An dieser Stelle sollten wir kurz innehalten und an die Gurken denken, die zu Unrecht unter EHEC-Verdacht gestanden haben." (Tweeted on ) ("At this point we should pause briefly and think of the cucumbers that were wrongfully under EHEC suspicion")

Both Tweets contain the keyword EHEC, but whereas Tweet 1 refers to an actual EHEC case in Bavaria, the author of the second Tweet is joking about cucumbers (Gurken) that came under suspicion of being the source of the EHEC outbreak in 2011.

6 Wikipedia page views as an early warning signal

Inspired by the work of McIver and Brownstein (2014) and Generous et al. (2014), we took a second look at the Wikipedia page view statistics of the articles "Enterohämorrhagische Escherichia coli" (EHEC) and "Hämolytisch-urämisches Syndrom" (HUS) and added the Wikipedia page views of the article "Escherichia coli" (E. coli) for a four-month period from 1 September until 31 December 2014 (Figure 2). Plotting the data on a time line reveals that page views drop towards the end of a week and rise at the beginning of a week. Towards the end of the year, between Christmas and New Year's Eve, a lower level of page views can be observed, too. In addition, on some days there are exceptionally more page views than on other days. Hence, we looked for an explanation and possibly found it on Twitter. For some cases we found that page view peaks could be matched with Tweets on the same topic that link to online news sources or online articles published around the same time (I-V) (Figure 2). These reports were mainly published by local newspapers or local branches of national newspapers.

Figure 2. Wikipedia page views from 1 September to 31 December 2014 relating to the articles "Enterohämorrhagische Escherichia coli" (EHEC), "Hämolytisch-urämisches Syndrom" (HUS) and "Escherichia coli" (E. coli) (relative frequency) and Tweets that report a related topic at the time of the peak

Furthermore, Peak II represents an increase in page view numbers for the term EHEC, but the related online news article is about E. coli in drinking water found in North Rhine-Westphalia, whereas page view Peaks IV and V represent the term EHEC and can be related to the discovery of EHEC infections in Schleswig-Holstein and Bavaria (Table 1). However, the peak of page views related to the term HUS (Peak III) was related to the Czech priest and philosopher Jan Hus (see Table 1). In this case Tweets relating to the HUS disease could not be found.

Table 1. Wikipedia page view peaks and related Twitter content
Number | Date of peak | Event | Date and source of first related online article
I | | Detection of E. coli in public water pipes in Rhineland-Palatinate (Germany) | (regional newspaper)
II | | Detection of E. coli in public water pipes in North Rhine-Westphalia | (regional news source)
III | | Online article about the Czech priest Jan Hus | (national magazine)
IV | | Man with EHEC was brought to hospital in Schleswig-Holstein | (link to regional newspaper)
V | | Three children infected with EHEC in a public kindergarten in Bavaria | (announcement via Twitter)

7 Tweets that count and Tweets that don't

When a particular topic or food safety issue is named, it is easier to count and analyse Tweets that relate to the named topic (e.g. Diaz-Aviles et al., 2012). Unfortunately, due to restrictions and limitations of the Twitter application programming interface (Twitter API), it was not possible for us to retrieve enough Tweets published before the 2011 EHEC outbreak for further in-depth analysis.
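The weekly rhythm and the exceptional days described in Section 6 suggest a simple screening step before peaks are matched with Tweets by hand. The following Python sketch is not the procedure used in the study (the page views there were examined in R): it assumes daily page view counts are already available as a mapping from date to count, and it flags a day when its count exceeds a multiple of the median for the same weekday. The function name `find_peaks`, the threshold `factor` and the sample numbers are illustrative assumptions.

```python
from datetime import date, timedelta
from statistics import median


def find_peaks(views, factor=2.0):
    """Return the dates whose page views exceed `factor` times the
    median of all days that fall on the same weekday.

    Comparing each day with its own weekday keeps the regular
    drop towards the weekend from hiding or faking a peak.
    `factor` is an arbitrary illustrative threshold.
    """
    by_weekday = {}
    for day, count in views.items():
        by_weekday.setdefault(day.weekday(), []).append(count)
    baseline = {wd: median(counts) for wd, counts in by_weekday.items()}
    return sorted(
        day for day, count in views.items()
        if count > factor * baseline[day.weekday()]
    )


if __name__ == "__main__":
    # Invented sample data: four weeks with fewer views at weekends
    # and one artificial spike. These are not the study's figures.
    start = date(2014, 9, 1)
    sample = {}
    for offset in range(28):
        day = start + timedelta(days=offset)
        sample[day] = 60 if day.weekday() >= 5 else 100
    sample[date(2014, 9, 17)] = 350
    for peak in find_peaks(sample):
        print(peak.isoformat(), sample[peak])
```

A flagged date is only a candidate: as the Jan Hus example shows, a peak for an ambiguous term still has to be checked against the related Tweets and news articles.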
This is why we looked ahead and focused our analysis on Tweets that were published between 1 December and 31 December. The aim of our Twitter analysis was to find patterns and co-occurrences of words in order to optimise keyword and rule based search filters for noise reduction in health related Tweets. This is necessary to reduce the amount of collected web content that has to be explored by hand until more sophisticated and automated sorting procedures exist. But what content do Tweets have before an ongoing food safety issue is officially notified?

To answer this question, we explored a selection of health related Tweets containing keywords that name symptoms of food-borne diseases (Matzik, 2014), like Durchfall (diarrhoea), Bauchschmerzen (abdominal pain) and Übelkeit (sickness), with the intention of understanding how and what people tweet when they use these terms. The keywords were chosen on the assumption that people tweet about their symptoms rather than about the medical name of the disease (Velardi et al., 2014).

Table 2. Number of frequent bigrams occurring in a set of Tweets containing the keywords Durchfall (diarrhoea), Bauchschmerzen (abdominal pain) and Übelkeit (sickness)
Durchfall (n = 1,129) | Count | Bauchschmerzen (n = 1,929) | Count | Übelkeit (n = 873) | Count
durchfall und *) | 99 | ich hab | 248 | und übelkeit **) | 229
bei durchfall | 40 | bauchschmerzen und | 147 | die übelkeit | 58
wie durchfall | 39 | hab bauchschmerzen | 145 | gegen übelkeit | 41
durchfall hat | 38 | mit bauchschmerzen | 116 | kopfschmerzen und | 26
mit durchfall | 31 | ich habe | 110 | mit übelkeit | 26
ich habe | 25 | ich bauchschmerzen | 84 | übelkeit ist | 21
ich hab | 23 | die bauchschmerzen | 76 | bei mir | 20
gegen durchfall | 21 | hab ich | 68 | kopfschmerzen übelkeit | 19
in der | 21 | bauchschmerzen vor | 67 | ich bin | 17
hat durchfall | 20 | habe bauchschmerzen | 66 | der übelkeit | 16
*) including the bigram "und durchfall"
**) including the bigram "übelkeit und"
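Bigram counts of the kind shown in Table 2 can be obtained with a few lines of code. The Python sketch below is an illustration under stated assumptions, not the tooling used in the study: it assumes the Tweets are available as plain strings, treats any text starting with "RT @" as a Re-tweet, regards Tweets as duplicates when they match after lowercasing and whitespace normalisation, and keeps stopwords. The function names and the sample Tweets are invented for the example.

```python
import re
from collections import Counter

# Word tokens; \w is Unicode-aware for str patterns, so umlauts are kept.
TOKEN = re.compile(r"\w+")


def unique_tweets(tweets, keyword):
    """Keep each Tweet containing `keyword` once, skipping Re-tweets.

    Assumption: a Re-tweet starts with "RT @", and two Tweets are
    duplicates if they are equal after lowercasing and collapsing
    whitespace.
    """
    seen = set()
    kept = []
    for text in tweets:
        normalised = " ".join(text.lower().split())
        if keyword.lower() not in normalised:
            continue
        if normalised.startswith("rt @") or normalised in seen:
            continue
        seen.add(normalised)
        kept.append(normalised)
    return kept


def bigram_counts(tweets):
    """Count pairs of adjacent words; stopwords are deliberately kept."""
    counts = Counter()
    for text in tweets:
        tokens = TOKEN.findall(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    # Invented sample Tweets, not taken from the study's data set.
    sample = [
        "Ich hab Bauchschmerzen und Übelkeit",
        "ich hab  bauchschmerzen und übelkeit",
        "RT @someone: Ich hab Bauchschmerzen",
        "Ich hab Bauchschmerzen vor der Prüfung",
        "Heute scheint die Sonne",
    ]
    kept = unique_tweets(sample, "Bauchschmerzen")
    for (first, second), n in bigram_counts(kept).most_common(3):
        print(first, second, n)
```

Unlike Table 2, this sketch counts "und durchfall" and "durchfall und" separately; merging such mirrored pairs would be an additional step.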
For our analysis we prepared the data by removing duplicate Tweets and Re-tweets from the data sets in order to create an unbiased list of unique Tweets, which we used to analyse word frequencies and frequently used bigrams (pairs of terms that occur next to each other) (Table 2). In order to keep maximum information, we did not remove stopwords. Stopwords are terms that carry little or no information, such as "what", "the", "it" and the like. It is reported that removing stopwords can be problematic if sentiments are to be analysed (Saif et al., 2014). We found that Twitter users who tweet about abdominal pain ("Bauchschmerzen") often state that they are actually suffering from abdominal pain, whereas Tweets containing the keyword diarrhoea ("Durchfall") more often make statements about the symptom itself (Table 2). Furthermore, in many cases people state that they suffer from more than one symptom, as the frequent use of the word "und" (and) indicates. People who are tweeting about Übelkeit (sickness) apparently o