TechPush IEEE Project www.techpush.in: Opinion Mining and Social Networks: a Promising Match Project

Opinion Mining and Social Networks: a Promising Match

Abstract: In this paper we discuss the role and importance of social networks as preferred environments for opinion mining, and for sentiment analysis in particular. We begin by briefly describing selected properties of social networks that are relevant to opinion mining, and we outline the general relationships between the two disciplines. We present the related work and provide basic definitions used in opinion mining. Then we introduce our original method of opinion classification and test the presented algorithm on real-world datasets acquired from popular Polish social networks, reporting on the results. The results are promising and support the main thesis of the paper, namely, that social networks exhibit properties that make them very suitable for opinion mining activities.

Keywords: opinion mining, sentiment analysis, social computing, social networks

I. INTRODUCTION

Graphs and networks rank among the most popular data representation models because they apply to a wide range of domains. The need to analyze and mine interesting knowledge from graph and network structures has long been recognized, but only recently have advances in information systems enabled the analysis of graph structures at huge scales. The analysis of graph and network structures gained new momentum with the advent of social networks. While the analysis of social networks has been a field of intensive research, particularly in the domains of social sciences and psychology, economy or chemistry, it is the emergence of huge social networking services on the Web that spawned the research into large-scale structural properties of social networks.

Social networks exhibit a very clear community structure. This structure partially stems from objective limitations (e.g., the internal organizational structure of a company can be closely represented by the ties within a particular social network) and, to some extent, may result from subjective user actions and activities (e.g., bonding with other people who share one’s interests and hobbies). Unveiling the true structure of a social network and understanding the communities that form within it is the key factor in understanding what the future structure of the network will be.

The main goal of social network analysis is the study of the structural properties of networks. Structural analysis investigates the properties of individual vertices and the global properties of the network as a whole. It answers two basic classes of questions: what is the structural position of any given node, and what can be said about the groups (communities) forming within the network. The main measure of a node’s social power (also called the member’s prestige) is centrality, which determines the node’s relative and absolute importance in the network. There are several ways to measure centrality, such as degree centrality (the number of links that connect to a given node), betweenness centrality (the number of shortest paths between any pair of nodes in the network that traverse a given node), and closeness centrality (the mean length of the shortest paths to the other nodes in the network). From the point of view of opinion mining, the ability to assess a node’s prestige is essential because it makes it possible to differentiate between the opinions of different individuals. More specifically, a node’s prestige makes it possible to assign different weights to opinions and to attach more importance to opinions expressed by prominent individuals.

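
The three centrality measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the paper: the five-member network, the member names, the opinion values and the helper `weighted_opinion` are hypothetical, and betweenness is computed with Brandes' algorithm, which counts shortest paths fractionally when a pair of nodes has more than one.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency lists from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph

def degree_centrality(graph):
    """Number of links that connect to each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def closeness(graph):
    """Mean shortest-path length to the other reachable nodes (lower is more central)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        others = [d for node, d in dist.items() if node != source]
        result[source] = sum(others) / len(others) if others else float("inf")
    return result

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                # nodes in BFS visiting order
        preds = {node: [] for node in graph}      # predecessors on shortest paths
        sigma = {node: 0 for node in graph}       # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):                 # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

def weighted_opinion(opinions, weights):
    """Hypothetical helper: prestige-weighted mean of per-member opinion scores."""
    total = sum(weights[member] for member in opinions)
    return sum(opinions[m] * weights[m] for m in opinions) / total

# Hypothetical network: a triangle (ann, bob, cat) with a chain cat-dan-eve.
graph = build_graph([("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
                     ("cat", "dan"), ("dan", "eve")])
opinions = {"ann": 1, "bob": 1, "cat": -1, "dan": -1, "eve": 1}

print(degree_centrality(graph))
print(betweenness(graph))
print(closeness(graph))
print(weighted_opinion(opinions, degree_centrality(graph)))
```

In this toy network cat has the highest degree (3) and betweenness (4 shortest paths pass through her), while dan has a low degree (2) but still lies on 3 shortest paths. The unweighted mean opinion is 0.2, whereas weighting by degree gives 0.0, which shows how prestige can shift an aggregated opinion.
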
Another factor that is often considered in opinion mining is the identification of influential individuals. An influential individual does not necessarily need a high degree centrality to influence the average opinion within the network. Usually, such individuals are characterized by high betweenness centrality, and they affect the dissemination of an opinion rather than its formation. For instance, an individual with high betweenness centrality can stop a negative opinion from spreading through the network or, on the other hand, can amplify it. For psychological reasons, humans tend to form their opinions so that they conform to the norm established within a given social group. Thus, when mining opinions one has to take into consideration the context in which the opinion is formed, i.e., the social milieu of an individual. Social networks are highly effective in bolstering group formation.

RELATED WORK

Literature related to social network analysis is extremely abundant and rich. The first proposals to perform social network analysis originated in the domains of social sciences and psychology [12] or economy [13]. Interestingly, much of this research rephrased what had previously been discussed in physics within the context of complex systems [14]. The most thorough summary of social network analysis topics, models and algorithms can be found in [17].

Opinion mining is a relatively new domain spanning the fields of data mining, machine learning and natural language processing. Sentiment analysis methods can be regarded as supervised learning methods [1][5], as unsupervised learning methods [6][15], or as information retrieval methods [16][18]. Many works on opinion mining model text documents as sets of words [1] or as vectors, where dimensions represent words and values are the weights of words in the document [2]. In the vast majority of sentiment analysis methods, information about the association of a word with a positive or a negative class is used to calculate the document’s semantic orientation γ:

$$\gamma(d) = \frac{1}{|d|} \sum_{i=1}^{|d|} \mathrm{score}(t_i, C_P, C_N)$$

where $t_i$ is the i-th term of the document $d$, $|d|$ is the number of terms appearing in the document $d$, $C_P$ and $C_N$ are the positive and negative classes, respectively, and score() is a function that assigns positive or negative values to terms, depending on their relationship with the respective class. Semantic orientations of individual terms are aggregated using a dictionary method [5]. This method uses two small sets of manually identified positive and negative adjectives, which serve as seed sets. New terms are subsequently added to these sets if they are linked by semantically loaded conjunctions such as “and”, “but”, “however”, etc. Some opinion mining algorithms use the pointwise mutual information measure to determine the semantic orientation of a term [3][4][6]. In this case the semantic orientation of a term is inferred from the association between the term and a word (or a set of words) assigned unambiguously to only one class (positive or negative), e.g., excellent and poor. The pointwise mutual information of the term t and the word w is defined as
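
For reference, the standard textbook form of pointwise mutual information is given below. The notation $p(\cdot)$ and the base-2 logarithm are assumptions; the surrounding text does not fix them.

```latex
\mathrm{PMI}(t, w) = \log_2 \frac{p(t, w)}{p(t)\, p(w)}
```

Here $p(t, w)$ is the probability that t and w co-occur, and $p(t)$ and $p(w)$ are their individual probabilities. A positive value means that t and w co-occur more often than independence would predict. With the two reference words mentioned above, the orientation of a term is commonly taken as the difference $\mathrm{PMI}(t, \text{excellent}) - \mathrm{PMI}(t, \text{poor})$.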

OUR APPROACH

The method proposed in this paper for determining a term’s semantic orientation is a variant of the method used in [1]. The drawback of the original method is that it assigns the maximum or minimum value to every term that occurs in only one class, regardless of the number of occurrences. Therefore, we propose an alternative way of calculating the semantic orientation of a term. Our method is based on the ratio of the term’s occurrence frequencies in documents assigned to the positive and negative classes. According to our approach, the scoring function for assigning positive and negative scores to terms becomes

Example: Let us evaluate token polarity in the way presented above. Assume the training set contains 1000 positive and 200 negative examples, and that token T occurred 9 times in positive examples and 3 times in negative examples.
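
The arithmetic behind these numbers can be sketched as follows. The sketch is an illustration under stated assumptions and does not rely on the authors' exact scoring function: `relative_frequency` and `frequency_ratio` are hypothetical helpers that normalize the raw counts by class size and then compare them, which is one plausible reading of a ratio of occurrence frequencies.

```python
from fractions import Fraction

def relative_frequency(occurrences: int, class_size: int) -> Fraction:
    """Occurrences of a token per training example of one class."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return Fraction(occurrences, class_size)

def frequency_ratio(pos_occ: int, pos_size: int,
                    neg_occ: int, neg_size: int) -> Fraction:
    """Hypothetical polarity indicator (an assumption, not the paper's formula).

    Ratio of the positive to the negative relative frequency:
    values above 1 lean positive, values below 1 lean negative.
    """
    neg = relative_frequency(neg_occ, neg_size)
    if neg == 0:
        raise ZeroDivisionError("token never occurs in the negative class")
    return relative_frequency(pos_occ, pos_size) / neg

# Numbers from the example: 1000 positive and 200 negative training
# examples; token T occurs 9 times in positive and 3 times in negative ones.
pos_freq = relative_frequency(9, 1000)
neg_freq = relative_frequency(3, 200)
ratio = frequency_ratio(9, 1000, 3, 200)

print(float(pos_freq), float(neg_freq), float(ratio))
```

Although T occurs three times as often in positive examples in absolute terms, its relative frequency is higher in the negative class (0.015 versus 0.009). Under this assumption the ratio of 0.6 marks T as leaning negative, which is why class sizes matter when the training set is unbalanced.
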

Software and hardware requirements

4.2.2.1 Development Environment

· Operating System: Windows 2000 Pro/NT/98/XP/7

The system will be built on a Windows-compatible environment. The application will be web based using ASP.NET technology.

· Web Server:

IIS – Internet Information Services

· Server side Application Software: Active Server Pages.NET (ASP.NET)

· Client Side Application Software: JavaScript, HTML

· Data Base: SQL Server 2000/2005

The system requires SQL Server as a database; however, the system will be ODBC compliant so that it can work with any standard database.

· Client Browsers:

Internet Explorer 5.0 or Netscape Navigator 4.7

The system requires the Internet Explorer or Netscape Navigator browser on the client side.

· Hardware: Pentium PCs with 128 MB RAM/ 20 GB HDD.

4.2.2.2 Production Environment

· Operating System: Windows 2000 Pro/NT/98/XP/7

The system will be built on a Windows-compatible environment. The application will be web based using ASP.NET technology.

· Web Server:

IIS – Internet Information Services.

· Server side Application Software: ASP.NET

· Client Side Application Software: JavaScript, HTML

· Data Base: SQL Server 2000/2005

The system requires SQL Server as a database; however, the system will be ODBC compliant so that it can work with any standard database.

· Client Browsers:

Internet Explorer 4.0 and above

Netscape Navigator 4.0 and above

The system requires the Internet Explorer or Netscape Navigator browser on the client side.

· Hardware: Pentium PCs with 128 MB RAM/ 20 GB HDD.