Design and Implementation of a Secure Social Network System

Ryan Layfield, Bhavani Thuraisingham, Latifur Khan, Murat Kantarcioglu, Jyothsna Rachapalli
The University of Texas at Dallas

Abstract

Context-based anomaly tracking represents a new approach to enhancing the security of communication streams. By creating a system that develops an understanding of what is normal and abnormal based on communication history, it is possible to detect fluctuations in an evolving social network. Although more research is necessary to overcome current obstacles, the combination of social network analysis and anomaly detection techniques yields a promising set of applications for enhancing communication security. In this paper we describe a system for context-based anomaly detection and then describe experiments for a message surveillance application.

I. INTRODUCTION

Social networks are essentially networks formed by individuals, groups and organizations. Social network analysis is about analyzing the behaviors of individuals, groups and organizations and determining their behavior patterns. Social network analysis is becoming an important tool for counterterrorism applications. For example, with social network analysis one can perhaps determine whether individuals, groups or organizations are involved in terrorist activities. An example of a social network is illustrated in Figure 1.

Monitoring a continuous stream of data in the interest of security is not a trivial problem. In order to properly classify a single message as normal or suspicious, one must parse the contents, determine the origin, identify the recipients, and determine how prior communication traffic affects the context. Whether or not a message is suspect, it can theoretically affect the semantic meaning of future communications. This implies that some method of storing prior messages is necessary for a perfect detection system.
While tools exist for message classification with a variety of intents, there are no known systems which establish a localized context for each node in the interest of security. While individuals being monitored may share a similar context in a common environment, there are several scenarios in which the same message passed by two different users does not have the same meaning. Hence, there is a need for a system that personalizes context data to properly ascertain security threats.

One of the biggest challenges in automated message surveillance is the recognition of messages containing suspicious content. A classic approach to this problem is constructing a set of keywords (e.g., bomb, nuclear). In the event that a communiqué contains one or more of these words, the message is flagged as suspicious for further review. However, there are two drawbacks to this particular approach. First, it is reasonable to assume that such relatively static keywords will not always be present in messages that would otherwise warrant suspicion. Second, there is little guarantee that a sufficiently intelligent individual will not recognize that such surveillance is in place and use substitute words in place of known keywords.

David Skillicorn of Queen's University has suggested a different approach in his work on the Enron dataset [SKIL05]. In his work, he outlines a method for using singular value decomposition (SVD) in the interest of recognizing trends in such topics as e-mail and social networks. We believe that this work can be expanded upon by extrapolating the techniques he used and applying them to a real-time message monitoring system. We wish to create algorithms that are designed to handle streaming data and that make use of the techniques outlined in Skillicorn's work. Using the Enron dataset, we will take our existing threat identification techniques and apply singular value decomposition to discover correlations in message content.
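
As a rough illustration of the contrast drawn above, the following sketch places a fixed keyword list next to a word-frequency comparison. It is a minimal sketch under stated assumptions, not the system described in this paper: the watch list, the function names, and the sample messages are invented for illustration, and it uses plain cosine similarity over word counts rather than the SVD technique of [SKIL05].

```python
import math
import re
from collections import Counter

# Hypothetical watch list; "bomb" and "nuclear" are the examples named in the text.
KEYWORDS = {"bomb", "nuclear"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def flag_by_keyword(text, keywords=KEYWORDS):
    """Classic approach: flag the message if any watch-list word appears."""
    return any(word in keywords for word in tokenize(text))


def cosine_similarity(text_a, text_b):
    """Rate two texts by the cosine of the angle between their word-count vectors."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty message is similar to nothing
    return dot / (norm_a * norm_b)


plain = "the bomb is ready for delivery on friday"
coded = "the package is ready for delivery on friday"

print(flag_by_keyword(plain))   # True: a watch-list word is present
print(flag_by_keyword(coded))   # False: the substitute word evades the list
print(cosine_similarity(plain, coded))  # high, since seven of eight words are shared
```

The second message slips past the keyword list, yet its word frequencies remain close to those of the first, which is the kind of signal a frequency-based method can use.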
Through word frequency analysis, we can rate the similarity of two separate data streams that may or may not deal with the same topic. Such an algorithm would allow us to recognize subjects, trends, and even conversations.

The organization of this paper is as follows. In section 2 we discuss the design of our social network system and context-based anomaly tracking. In section 3 we discuss social network analysis for message detection. We provide some background information on the techniques utilized and discuss our system. We also discuss security and privacy considerations. The paper is concluded in the final section.

IEEE ISI 2009, June 8-11, 2009, Richardson, TX, USA

Figure 1. A Typical Social Network

II. INTEGRATING SOCIAL NETWORK ANALYSIS WITH CONTEXT-BASED ANOMALY TRACKING

A. Background

Since its inception, e-mail has become an increasingly popular form of communication. According to a study performed by research group IDC in 2002, e-mail traffic will increase from 31 billion messages a day to 60 billion by 2006 [MINI05]. As of the second quarter of 2005, there are roughly 900 million known users of the internet (Global Reach). Assuming that mail traffic increases linearly, each user receives an average of 60 e-mails a day. Manually sifting through the traffic of a group of a hundred people would require an individual to read 6,000 e-mails a day. Clearly, automated methods are necessary to deal with such increasing volumes of data.

The combination of text analysis and link mining concepts is not itself a new avenue of research. The work of Ben-Dov et al. demonstrates the ability to enhance link mining of news sites by using available tools to semantically comprehend the contents of a document. One experiment performed by the group successfully discovered correlations between two individuals based simply on their presence within the same sentence. Successful examples can also be found in the field of semantic web analysis [HORR03].

E-mail anomaly detection is not a new concept. One existing topic of interest is the identification and filtering of spam. Using a set of desirable message attributes, a spam removal system is responsible for removing all unwanted e-mail from a user's inbox. This frequently includes advertisements, fraudulent topics, general bulk mail, and any other messages that do not appear relevant. Ideally, this results in a set of messages consisting only of what the user desires [GOLD92-61]. While not always providing enhanced security directly, spam filtering represents a well-defined area to build from.

The monitoring of e-mail and other point-to-point contact services can be used to build a relationship-oriented web. Instead of simply looking at each message as an isolated event, such a web allows the complex relationships between individuals to be mapped and further analyzed. Formally, this approach is known as constructing a social network. Within a social network, each individual represents a node. One or more messages passed from one node to another form a basic unidirectional link. Over time, as more messages are passed, it becomes possible to determine common lines of communication among people. By weighting links based on message frequency and whether or not replies are given in a timely manner, the strength of social ties between individuals can be realized.

B. Our Approach Overview

The system we propose is an active monitoring agent that resides at a major message communication hub. Each message that passes through the hub is deconstructed to acquire basic information, such as source and destination addresses. These in turn are used in the construction of an evolving graph of communication patterns.

Figure 2. Original System Architecture

The fundamental properties of this design can be found in how the elements within a monitored group are represented. First, each individual that uses the hub is kept as a user node.
As in social networking, each node represents an endpoint of communication. Basic contact information is kept at the node to identify when future messages are arriving at or departing from the node. In the case of e-mail, the address is all the contact information necessary to uniquely identify the user. It is assumed that these identifiers do not change over time.

Each message passed represents a conversational link between two users. The direction of the link is determined by the source and destination of the message. In the event a message is passed back, the link automatically becomes bi-directional. The strength of a link depends on the number of messages passed in either direction.

The attributes of the message itself are stored within the link. This allows the system to retrieve historical data between two nodes without needing to visit each node and search for the messages relayed between them. This is contrary to how messages are normally stored within most message services, but it is necessary to form context. Since a single message can be sent to multiple parties, a single instance of the message is often shared by multiple links.

In the event the message passed has unusual properties, the anomalous characteristics are noted and recorded within a unique token. Attributes of a message ideally include unusual keywords, communication pattern deviations, and any other clues that may be necessary to identify future messages with similar traits. Other attributes of the token include an atomic identifier and a pointer to the originating token, if any. The token itself is considered part of the established context, and it is stored within node endpoints. Currently, the only characteristic of a message that the system tracks is a fixed set of keywords within the body of an e-mail. While marginally effective in generating reasonable result data, the technique is far from sufficient. The future plans section of this document describes the techniques which will eventually be implemented.

The reason a node, rather than a link, is responsible for storing tokens is that messages convey information that the user records for future reference when communicating with other users. These tokens represent unusual information that can propagate through a network. When new messages are passed, a check is performed to determine whether any tokens exist at the originating node that match the attributes of the message itself. If a match is found, a child token is created and passed along with the message. This child contains a link back to the original, creating a semantic trail that can be traced through a network.

Such a trace can be useful in several security scenarios. For example, consider an intelligence agency concerned that there have been leaks of information within the organization to the media. A security manager using this system could begin by flagging certain keywords found only in a top-secret report recently given to suspects. In the event that messages sent from these suspects begin to use these keywords, a context token is created and passed along to each individual that received the information. In turn, should any of the recipients disseminate the classified information to other recipients, additional child tokens will be created. Ultimately, a localized web of suspicious parties is created and tracked for future investigations.

When the system is deployed, it is the responsibility of an agent observing the results to take action. Ideally, the agent will be a human responsible for the security of the group being monitored. The system itself is only an observational tool. This approach was chosen to maximize potential uses for the system. For example, the responses chosen by an intelligence organization would vary widely from those of an internet service provider.

Analysis

In this section we discuss the strengths and weaknesses of our approach. Future directions will be discussed in Section 5.

--Strengths

In theory, the deployment of this system offers a number of benefits. First, all analysis is performed in real-time. This means that, once deployed, the system is actively monitoring the available text stream for any and all communication activity. In the event that a malicious situation is identified, the observer of the system can either respond immediately or await further messages to decide whether a security issue exists.

Second, the system indirectly models the complex social interactions of individuals. Hence, as messages are passed, it is possible to identify groups of people with malicious intent and how they collaborate. This is especially crucial to the recognition of social sub-networks, in which normal keyword testing could be insufficient for identifying individuals with malicious intent.

For example, consider the deployment of this system in the interest of catching a group of criminals involved in smuggling stolen works of art across international borders. Assume that they are using a message passing network to remain in constant contact, along with a multitude of innocent people. Using keywords involving the stolen works, simple text filtering could create a number of false positives from people simply discussing the crimes mentioned in the news. By overlaying detected keyword uses with social network graphs, we could detect a group of individuals using these words among themselves. Once the group is properly identified, the entire set of connected individuals could be captured and questioned. Extending this scenario, should any of these individuals be held responsible, the system has already generated a set of conversations shared among the guilty parties.
These exchanges could easily translate into an evidence exhibit to be used during prosecution. While such an exhibit could certainly be built with existing text mining tools, the ready availability of this data is invaluable in situations where time is a factor.

--Weaknesses

Unfortunately, as promising as such a system may be, its proper operation depends on a number of factors. First, the system requires a roughly omniscient view of communication among individuals. For example, it assumes that users of an e-mail server will not use any other server to communicate, nor any other form of communication that falls outside the bounds of what can be observed. Given that groups of individuals will likely communicate in person at some point, one or more semantic gaps could be created. Such gaps would prohibit token passing among nodes and create inaccuracies within the perceived social network, reducing the overall effectiveness of the system.

Second, there are serious ethical implications for a system with such far-reaching observational capability. Regardless of whether or not individuals are engaging in suspicious activity, social models are being created for future reference. Essentially, the data generated can be used to identify how close two individuals are, what they have been talking about, the common points of contact among them, etc. If an individual uses the monitored text stream exclusively for communication, a fairly accurate model of their relationships can be generated.

To fully understand how such data can be used against an individual, consider an employer with access to a system that has been observing an individual applying for a position. During the evaluation process, an employer could analyze the social net around the applicant and determine the people they are closest to.
These individuals could then be contacted and asked a series of questions about the applicant, their habits, prior employment history, etc. While the employer would benefit greatly from having such data, the potential employee would undoubtedly feel their private life had been violated.

Another weakness of the system is the lack of training methods to teach it when certain messages are false positives or false negatives. Although it is assumed that the observing agent can distinguish between results, it is much more convenient to filter out the noise and focus on issues that require more attention. Additionally, given the system's use of tokens, a large number of false contexts could be created that would cause complications across the entire social network. In theory, the impact of false tokens could be eliminated by giving an agent the option to delete specific tokens, but this is only a temporary solution.

Regardless of how effective the system is, the ultimate weakness it faces is how much data must be stored. Traditionally, in most message passing networks, messages are stored at the user's terminal, removing the burden from the server. However, in order for the system to properly determine previous context, all messages passed must be stored in an archive after processing. Coupled with the data storage of links among users and the presence of tokens, it is possible that the data requirements of the system could multiply exponentially as more users join a network and average traffic flow increases.

As of the creation of this document, there are no known systems that are implemented with these characteristics. Presumably, such systems may fall under classified government security methods. The fabled Echelon system, for example, is rumored to have similar capabilities. However, the lack of documentation to support this claim leads us to believe that this system simply represents a relatively unexplored area of research.
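
The overlay of keyword hits onto the social graph, described in the art-smuggling scenario under Strengths, can be sketched as follows. This is only an illustrative sketch: the function name `suspicious_groups`, the `min_size` threshold, the tuple layout of a message, and the sample data are all assumptions, and connected components are used here as one simple stand-in for "a group of individuals using these words among themselves".

```python
from collections import defaultdict


def suspicious_groups(messages, keywords, min_size=3):
    """Return groups of users connected by messages that mention a watched keyword.

    `messages` is an iterable of (sender, recipient, body) tuples (assumed layout).
    Groups smaller than `min_size` are dropped as likely incidental mentions.
    """
    # Keep only the links whose message body mentions a watched keyword.
    neighbors = defaultdict(set)
    for sender, recipient, body in messages:
        if set(body.lower().split()) & keywords:
            neighbors[sender].add(recipient)
            neighbors[recipient].add(sender)

    # Collect connected components of the keyword-only graph with a depth-first walk.
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            user = stack.pop()
            if user in group:
                continue
            group.add(user)
            stack.extend(neighbors[user] - group)
        seen |= group
        if len(group) >= min_size:
            groups.append(group)
    return groups


# Hypothetical traffic: three users trade the keyword among themselves,
# while two others mention it once after reading the news.
messages = [
    ("ann", "bob", "the vermeer ships tonight"),
    ("bob", "cat", "vermeer buyer confirmed"),
    ("cat", "ann", "move the vermeer friday"),
    ("dan", "eve", "did you read about the stolen vermeer"),
    ("eve", "fay", "lunch tomorrow"),
]
groups = suspicious_groups(messages, {"vermeer"})
print(groups)  # a single group containing ann, bob and cat
```

The one-off mention between `dan` and `eve` forms a component of only two users and is filtered out, which mirrors the false positives that plain text filtering would raise.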
Prototype Implementation

I Design of the System

The objective of this system is to combine social network analysis with text anomaly detection to enhance detection of unusual and undesirable activity. By combining these techniques and applying them to a continuous stream of messages, we believe it is possible to build a more secure communication system by identifying unusual behavior in normal channels of contact. Ultimately, this system will ideally be deployed on either a corporate or public network, provided that adequate authority exists.

There are two primary parts necessary for the successful operation of this system: the organizational analysis system and the real-time results viewer. The former is responsible for building an understanding of the system from the text stream being observed. The latter translates the output of the former into a graphical representation viewable by a human security agent. This split is necessary due to the intense amount of processing each part requires to perform its responsibilities in real-time.

The architecture of the system is largely decentralized and heavily object-oriented. The major aspects of functionality and purpose are encapsulated in appropriately named objects in the interest of keeping data and state information categorized appropriately. However, the system is currently tightly coupled, as many objects are heavily dependent on the functionality of others, often in both directions.

Organizational Analysis System

This subsystem is responsible for the actual processing required to parse, analyze, and derive information from a text stream. Hence, it is broken down into three main pieces: the Delivery Agent, Detection Agent, and Mailroom Agent, each responsible for one of these three tasks, respectively. Basic information is represented as a series of user nodes and conversational links, while tokens represent anomalous behavior.
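
A minimal sketch of how user nodes, conversational links, and tokens might be represented is given below. The class and method names (`Token`, `UserNode`, `Link`, `Network`, `deliver`) are hypothetical and do not come from the prototype. The sketch also assumes that a new token is stored at both endpoints while a child token is stored only at the recipient, and it matches on a fixed keyword set, which is the one characteristic the text says is currently tracked.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_token_ids = count(1)  # assumed source of unique token identifiers


@dataclass
class Token:
    keywords: frozenset
    parent: Optional["Token"] = None  # pointer to the originating token, if any
    token_id: int = field(default_factory=lambda: next(_token_ids))

    def trail(self):
        """Follow parent pointers back to the originating token."""
        node, path = self, []
        while node is not None:
            path.append(node.token_id)
            node = node.parent
        return path


@dataclass
class UserNode:
    address: str                                 # the address identifies the user
    tokens: list = field(default_factory=list)   # tokens live at nodes, not links


@dataclass
class Link:
    messages: list = field(default_factory=list)   # message data is kept on the link
    directions: set = field(default_factory=set)

    @property
    def strength(self):
        return len(self.messages)  # messages passed in either direction

    @property
    def bidirectional(self):
        return len(self.directions) == 2


class Network:
    def __init__(self, watched):
        self.watched = frozenset(watched)
        self.nodes, self.links = {}, {}

    def deliver(self, sender, recipient, body):
        """Record one message; return the token it produced, or None."""
        src = self.nodes.setdefault(sender, UserNode(sender))
        dst = self.nodes.setdefault(recipient, UserNode(recipient))
        link = self.links.setdefault(frozenset((sender, recipient)), Link())
        link.messages.append(body)
        link.directions.add((sender, recipient))

        hits = self.watched & set(body.lower().split())
        if not hits:
            return None
        # A matching token at the originating node makes the new token a child.
        parent = next((t for t in src.tokens if t.keywords & hits), None)
        token = Token(frozenset(hits), parent)
        if parent is None:
            src.tokens.append(token)  # assumption: a new token is kept at both ends
        dst.tokens.append(token)
        return token


net = Network({"orchid"})  # hypothetical keyword taken from a restricted report
net.deliver("alice", "bob", "project orchid budget attached")
leak = net.deliver("bob", "carol", "fyi orchid details")
net.deliver("carol", "bob", "thanks")
print(leak.trail())  # token ids from the forwarded message back to the original
```

Keeping the parent pointer on each token is what yields the traceable trail, while storing message data on the link lets the history between two users be read without visiting either node.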
The Delivery Agent object represents the entry point for messages into the system. In the current design, it is a passive agent that, on each iteration, reads another e-mail from the stream. It then parses that e-mail for its origin, destination, content, etc. Ultimately, a message object is created to embody this e-mail. No tokens are created at this point, as the agent does not keep track of context internally. This message is then passed to the Mailroom Agent. Neither messages nor e-mails are kept on record here. When a message first arrives, the Mailroom Agent queries the Detection Agent to determine if any suspicious activ