Opinion Mining and Social Networks: a Promising Match
Abstract—In this paper we discuss the role and importance of
social networks as preferred environments for opinion mining, and for sentiment
analysis in particular. We begin by briefly describing selected properties of
social networks that are relevant to opinion mining and outline
the general relationship between the two disciplines. We present related work
and provide basic definitions used in opinion mining. Then we introduce our original
method of opinion classification and test it on real-world
datasets acquired from popular Polish social networks. The results are promising and
support the main thesis of the paper, namely that social networks
exhibit properties that make them very suitable for opinion mining.
Keywords: opinion mining, sentiment analysis, social
computing, social networks
I. INTRODUCTION
Graphs and networks rank among the most popular data representation
models due to their universal applicability to various application domains. The
need to analyze and mine interesting knowledge from graph and network
structures has long been recognized, but only recently have advances in information
systems enabled the analysis of graph structures at huge scales. Analysis
of graph and network structures gained new momentum with the advent of social
networks. While the analysis of social networks has long been a field of intensive research,
particularly in the social sciences, psychology, economics, and
chemistry, it is the emergence of huge social networking services on the Web
that spawned research into the large-scale structural properties of social networks. Social
networks exhibit a very clear community structure. Such community structure
partially stems from objective constraints (e.g., the internal organizational
structure of a company can be closely reflected in the ties within a
particular social network) and, to some extent, results from subjective user
actions and activities (e.g., bonding with other people who share one’s
interests and hobbies). Unveiling the true structure of a social network and
understanding the communities that form within it are key to understanding
what the future structure of the network will be. The main goal of social network
analysis is the study of structural properties of networks. Structural analysis
of the social network investigates the properties of individual vertices and
the global properties of the network as a whole. It answers two basic classes
of questions about the network: what is the structural position of any given
node, and what can be said about the groups (communities) forming within the
network. The main measure of a node’s social power (also called the member’s
prestige) is centrality, which determines the node’s relative and absolute
importance in the network. There are several ways to measure a node’s
centrality, such as degree centrality (the number of links that connect to
a given node), betweenness centrality (the number of shortest paths between
any pair of nodes in the network that traverse a given node), and closeness centrality
(the mean length of the shortest paths to the other nodes in the network). From the
point of view of opinion mining, the ability to assess a node’s prestige is
essential because it allows one to differentiate between the opinions of different
individuals. More specifically, a node’s prestige makes it possible to assign different
weights to opinions and to attach more importance to opinions expressed by
prominent individuals. Another factor that is often considered in opinion
mining is the identification of influential individuals. An influential
individual does not necessarily need a high degree
centrality to influence the average opinion within the network. Usually, such
individuals are characterized by high betweenness centrality,
affecting the dissemination of an opinion rather than its formation. For
instance, an individual with high betweenness centrality can stop a negative
opinion from spreading through the network or, conversely, can
amplify it. For psychological reasons, humans tend to form their opinions
so that they conform to the norm established within a given
social group. Thus, when
mining opinions one has to take into account the context
in which the opinion is formed, i.e., the social milieu of the individual.
Social networks are highly effective in bolstering group formation.
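
The centrality measures described in this section can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy graph (two four-person cliques joined by a single bridging member `x`), the opinion values, and the function names are hypothetical and do not come from the paper's datasets. Betweenness is computed with Brandes' algorithm, which counts the fraction of shortest paths through a node when several shortest paths exist, and closeness is reported as the mean shortest-path length, following the wording used above (many libraries report its reciprocal instead).

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Number of links attached to each node."""
    return {v: len(neighbors) for v, neighbors in graph.items()}


def mean_distance(graph):
    """Mean shortest-path length (in hops) from each node to the nodes it can reach."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        others = [d for node, d in dist.items() if node != source]
        result[source] = sum(others) / len(others) if others else float("inf")
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: b / 2 for v, b in score.items()}


def prestige_weighted_opinion(opinions, prestige):
    """Average of opinion values, weighted by each author's prestige."""
    total = sum(prestige[v] for v in opinions)
    return sum(opinions[v] * prestige[v] for v in opinions) / total


# Hypothetical network: cliques {a, b, c, d} and {e, f, g, h} joined through x.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h"),
         ("a", "x"), ("x", "e")]
graph = build_graph(edges)
degree = degree_centrality(graph)
betweenness = betweenness_centrality(graph)
distance = mean_distance(graph)

# Hypothetical opinions in [-1, 1], weighted by degree as a stand-in for prestige.
weighted = prestige_weighted_opinion({"a": 1.0, "x": -1.0}, degree)
```

In this toy network the bridging member `x` has the lowest degree yet the highest betweenness, which matches the observation that an influential individual need not be well connected to control how an opinion spreads.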
II. RELATED WORK
The literature on social network analysis is extremely rich. The first
proposals to perform social network analysis originated in the social
sciences and psychology [12] and in economics [13]. Interestingly, much of this
research rephrased what had previously been discussed in physics in the
context of complex systems [14]. The most thorough summary of social network
analysis topics, models and algorithms can be found in [17]. Opinion mining is
a relatively new domain spanning the fields of data mining, machine
learning and natural language processing. Sentiment analysis methods can be
regarded as supervised [1][5] or unsupervised [6][15] learning methods,
or as information retrieval methods [16][18]. Many works on
opinion mining model text documents as
sets of words [1] or as vectors whose dimensions represent words and whose values are the
weights of words in the document [2]. In the vast majority of sentiment
analysis methods, information about connotations of a word with a positive or a
negative class is used to calculate the document’s semantic orientation $\gamma$:
$$\gamma(d) = \sum_{i=1}^{|d|} \mathrm{score}(t_i, c_p, c_n)$$
where $t_i$ is the i-th term of the document $d$, $|d|$ is the number of terms
appearing in the document $d$, $c_p$ and $c_n$ are the positive and negative
classes, respectively, and score() is a function that assigns positive or negative
values to terms, depending on their relationship with the
respective class. Semantic orientations of individual terms are aggregated
using a dictionary method [5]. This method uses two small sets of manually
identified positive and negative adjectives, which serve as seed sets. New
terms are subsequently added to these sets if they are linked by semantically
loaded conjunctions such as “and”, “but”, “however”, etc. Some opinion mining
algorithms use the pointwise mutual information measure to determine semantic orientation
of a term [3][4][6]. In this case semantic orientation of a term is inferred
from the association between the term and a word (or a set of words) assigned unambiguously
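to only one class (positive or negative), e.g., “excellent” and “poor”. The pointwise mutual information
of the term $t$ and the word $w$ is defined as
$$\mathrm{PMI}(t, w) = \log_2 \frac{p(t, w)}{p(t)\,p(w)}$$
where $p(t, w)$ is the probability that $t$ and $w$ co-occur, and $p(t)$ and $p(w)$ are the probabilities of their individual occurrences.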
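
The pointwise mutual information of a term and a reference word, in its standard form PMI(t, w) = log2(p(t, w) / (p(t) p(w))), can be estimated from occurrence counts. The sketch below is illustrative only. The counts are invented, the function names are hypothetical, and taking the difference between the PMI with a positive reference word and the PMI with a negative one is one common way to turn PMI into a semantic orientation; it is an assumption here, not necessarily the variant used in [3][4][6].

```python
import math


def pmi(count_tw, count_t, count_w, total):
    """Estimate PMI(t, w) in bits from counts over `total` contexts.

    count_tw: contexts containing both t and w
    count_t, count_w: contexts containing t and w, respectively
    """
    if count_tw == 0:
        # No co-occurrence at all; real systems usually smooth the counts instead.
        return float("-inf")
    # p(t, w) / (p(t) * p(w)) simplifies to count_tw * total / (count_t * count_w).
    return math.log2(count_tw * total / (count_t * count_w))


def semantic_orientation(co_pos, co_neg, count_t, count_pos, count_neg, total):
    """PMI with a positive reference word minus PMI with a negative one (assumed form)."""
    return (pmi(co_pos, count_t, count_pos, total)
            - pmi(co_neg, count_t, count_neg, total))


# Invented counts: 1000 contexts, term t appears in 100 of them,
# "excellent" in 50 (20 together with t), "poor" in 50 (5 together with t).
pmi_excellent = pmi(20, 100, 50, 1000)
pmi_poor = pmi(5, 100, 50, 1000)
orientation = semantic_orientation(20, 5, 100, 50, 50, 1000)
```

With these counts the term co-occurs with “excellent” four times more often than independence would predict and with “poor” exactly as often as independence would predict, so the resulting orientation is positive.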
III. OUR APPROACH
The method proposed in this paper for determining a term’s semantic orientation is a
variant of the method used in [1]. The drawback of the original method is that
it assigns the maximum or minimum value to every term that occurs in only one
class, regardless of the number of occurrences. We therefore propose an
alternative way of calculating the semantic orientation of a term. Our method
is based on the ratio of the term’s occurrence frequencies in documents assigned to the
positive and negative classes. According to our approach, the scoring function
for assigning positive and negative scores to terms becomes
Example: Let us compute the polarity of a token in the
way presented above. Assume that the training set contains
1000 positive and 200 negative examples, and that token T occurred
9 times in positive examples and 3 times in negative examples.
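
The scoring formula itself is not reproduced in this text, so the following sketch covers only the quantities the method is stated to be based on: the term’s occurrence frequency in each class and the ratio of the two. The function name is hypothetical, and normalizing each occurrence count by the number of examples in its class is an assumption about what "occurrence frequency" means here. How the ratio is mapped to the final positive and negative scores is not shown.

```python
def class_frequencies(pos_occurrences, neg_occurrences, n_pos, n_neg):
    """Occurrences of a token per example, computed separately for each class."""
    return pos_occurrences / n_pos, neg_occurrences / n_neg


# Figures from the example: 1000 positive and 200 negative training examples,
# token T seen 9 times in positive and 3 times in negative examples.
pos_freq, neg_freq = class_frequencies(9, 3, 1000, 200)
ratio = pos_freq / neg_freq

print(f"positive frequency: {pos_freq:.3f}")
print(f"negative frequency: {neg_freq:.3f}")
print(f"positive/negative ratio: {ratio:.2f}")
```

The raw counts (9 versus 3) suggest a positive token, but once the class sizes are taken into account the token is relatively more frequent in negative examples (9/1000 = 0.009 against 3/200 = 0.015), which is why a frequency ratio is more informative than raw counts on an unbalanced training set.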
Software and hardware requirements
· Operating System: Windows 2000 Pro/NT/98/XP/7
The system will be built on a Windows-compatible environment. The application will be web based, using ASP.NET technology.
· Web Server: IIS – Internet Information Services
· Server-side Application Software: Active Server Pages .NET (ASP.NET)
· Client-side Application Software: JavaScript, HTML
· Database: SQL Server 2000/2005
The system requires SQL Server as a database; however, the system will be ODBC compliant so that it can work with any standard database.
· Client Browsers: Internet Explorer 5.0 or Netscape Navigator 4.7
The system requires the Internet Explorer or Netscape Navigator browser on the client side.
· Hardware: Pentium PCs with 128 MB RAM / 20 GB HDD