An Automatic Answering System with Template
Matching for Natural Language Questions
Abstract
Using computers to answer natural language questions is
an interesting and challenging problem. Such problems are generally handled
under two categories: open-domain problems and closed-domain problems. This
paper presents a system that attempts to solve closed-domain problems.
Typically, in a closed domain, answers to questions are
not publicly available and therefore cannot be found using a search engine.
Hence, answers have to be stored in a database by a domain expert. The
challenge is then to understand the natural language question so that it can
be matched to the respective answer in the database. We use a template matching
technique to perform this matching. In addition, given that our target users
are non-native English speakers, we developed a method to overcome the
mismatches caused by spelling mistakes. The system is developed so that
questions can be asked using short messages (SMS) from a mobile phone, and it
is therefore designed to understand SMS language in addition to English. One of
the main contributions of this paper is the presentation of results from a
deployment of this system in a real environment.
Keywords—FAQ, Answering System, SMS,
Template Matching
Introduction
DEVELOPING mechanisms for using computers to answer user
questions is becoming an interesting problem with the increased use of
computers. Such mechanisms allow users to ask questions in a natural language
and receive a concise and accurate answer.
Understanding user questions in natural language requires Natural Language
Processing (NLP). As an active area of research, NLP plays a major role in
ICT and in Question Answering (QA) systems.
Natural language processing is the computerized approach
to analyzing text based on both a set of theories and a set of technologies. It
will become important to be able to ask queries and obtain answers using
natural language (NL) expressions rather than keyword-based retrieval
mechanisms. A QA system can better satisfy the needs of users because it
provides an accurate, quick, convenient and effective way of answering
user questions. The approach we have adopted in this project is an automated
FAQ (Frequently Asked Questions) answering system that replies with pre-stored
answers to user questions asked in ordinary English, rather than relying on
keyword- or syntax-based retrieval mechanisms. This is achieved using a
template matching technique together with supporting mechanisms such as
disemvoweling and synonym matching.
Related Work
• Q&A system research has received considerable attention from the research
community through the Text Retrieval Conference (TREC) Q&A track since 1999.
• The original aim of the track was to systematically evaluate both academic
and commercial Q&A systems. Maybury has discussed the characteristics of
Q&A systems and the resources needed to develop and evaluate such systems.
• The main approaches to Q&A systems are surveyed in the literature, where
the template-based approach is discussed in detail.
• Although most Q&A systems are based on Web environments, SMS has also been
used as an environment in contexts such as learning and agriculture.
Our Approach
Main modules:
• Pre-processing
• Question template matching
• Answering
• SMS Abbreviation
• Stop Word
• Ward Parser
• Synonyms Matcher
• Security
• Disemvoweling
Architecture
In this section we describe the architecture of our
system. The overall architecture of the system can be subdivided into three
main modules:
(1) pre-processing,
(2) question template matching, and
(3) answering.
A. Pre-Processing Module
The pre-processing module consists mainly of three
operations: (1) converting SMS abbreviations into general English words, (2)
removing stop words, and (3) removing vowels. Since the system is expected to
process text written in both natural and SMS language, it is necessary to
replace SMS abbreviations with the corresponding English words before
processing user questions further. This is done by referring to a pre-stored
list of frequently used SMS abbreviations. Stop words are words that can be
removed without affecting the meaning of a sentence. Removing them improves
the efficiency of the system by saving processing time and disk space.
Examples of stop words are "the", "a" and "and". The next step in
this module is to remove vowels from the text in order to handle spelling
mistakes. This process is called disemvoweling and is discussed in
detail in later sections.
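The three pre-processing operations can be sketched as follows. This is a minimal illustration, not the deployed implementation: the abbreviation table, the stop word list and the function names (`expand_sms`, `remove_stop_words`, `disemvowel`, `preprocess`) are all assumptions made for the example, and the rule of keeping vowel-only words unchanged is our own choice.

```python
import re

# Hypothetical lookup tables; the system's actual lists are not given in the text.
SMS_ABBREVIATIONS = {"u": "you", "r": "are", "pls": "please", "wat": "what"}
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of"}
VOWELS = set("aeiou")


def expand_sms(tokens):
    """Replace known SMS abbreviations with the corresponding English words."""
    return [SMS_ABBREVIATIONS.get(t, t) for t in tokens]


def remove_stop_words(tokens):
    """Drop words that carry no meaning for matching."""
    return [t for t in tokens if t not in STOP_WORDS]


def disemvowel(word):
    """Remove vowels; a word made only of vowels is kept as-is (assumption)."""
    stripped = "".join(c for c in word if c not in VOWELS)
    return stripped or word


def preprocess(question):
    """Lowercase, tokenize, expand SMS forms, drop stop words, disemvowel."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    tokens = expand_sms(tokens)
    tokens = remove_stop_words(tokens)
    return [disemvowel(t) for t in tokens]
```

With these tables, `preprocess("wat is the registraton fee")` and `preprocess("What is the registration fee?")` both yield `['wht', 'rgstrtn', 'f']`, which shows how disemvoweling can hide a vowel-level spelling mistake. It does not help when a consonant is misspelled.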
B. Question-Template Matching Module
The pre-processed text is matched against every pre-stored
template to find the template that best matches the received text. To
do this, templates are created according to a specific syntax, the
details of which are described in Section IV. In this module, words that are
considered to have synonyms are looked up in a synonym file. This synonym file
can be modified according to the relevant domain and is updated from a
standard database such as WordNet [6]. It is worth noting that the templates
here are for questions and not for answers. The main target of this system is
to identify the closest template that matches the question received
from the user.
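One way to picture best-template selection with synonyms is the sketch below. The template representation used here (a list of keywords per template), the scoring rule (fraction of keywords found) and the `threshold` parameter are assumptions for illustration only; they are not the template syntax of Section IV. The synonym groups stand in for the synonym file.

```python
VOWELS = set("aeiou")


def disemvowel(word):
    """Remove vowels so templates compare against pre-processed tokens."""
    stripped = "".join(c for c in word if c not in VOWELS)
    return stripped or word


# Hypothetical synonym groups standing in for the synonym file.
SYNONYMS = {
    "fee": {"fee", "cost", "price", "charge"},
    "deadline": {"deadline", "closing"},
}

# Hypothetical question templates: template id -> keywords that must appear.
TEMPLATES = {
    "registration_fee": ["registration", "fee"],
    "registration_deadline": ["registration", "deadline"],
}


def keyword_forms(keyword):
    """All disemvoweled forms that satisfy one template keyword."""
    return {disemvowel(w) for w in SYNONYMS.get(keyword, {keyword})}


def score(keywords, tokens):
    """Fraction of template keywords matched by the question tokens."""
    token_set = set(tokens)
    hits = sum(1 for k in keywords if keyword_forms(k) & token_set)
    return hits / len(keywords)


def best_template(tokens, threshold=0.6):
    """Return the id of the best-scoring template, or None below threshold."""
    best_id, best_score = None, 0.0
    for template_id, keywords in TEMPLATES.items():
        s = score(keywords, tokens)
        if s > best_score:
            best_id, best_score = template_id, s
    return best_id if best_score >= threshold else None
```

Here the tokens are expected to be already pre-processed, so a question containing "cost" (token `cst`) can still reach the `registration_fee` template through the synonym group for "fee". A threshold keeps partially matching questions from being answered with an unrelated template.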
C. Answering Module
Since every template representing a question is pre-stored in a
database together with its answer, as soon as the best-matching template for
the question is found, the corresponding answer is returned to the end user.
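The lookup step can be sketched as below. An in-memory SQLite database is used here only as a stand-in so the example runs on its own; the table name `faq`, its columns, the placeholder answers and the fallback message returned when no template matches are all assumptions, not details taken from the system.

```python
import sqlite3

# Hypothetical reply when no template matched the question.
FALLBACK = "Sorry, no matching question was found."


def build_store():
    """Create a small in-memory store mapping template ids to answers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE faq (template_id TEXT PRIMARY KEY, answer TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO faq (template_id, answer) VALUES (?, ?)",
        [
            ("registration_fee", "Placeholder answer about the registration fee."),
            ("registration_deadline", "Placeholder answer about the deadline."),
        ],
    )
    return conn


def answer_for(conn, template_id):
    """Return the stored answer for a matched template, else the fallback."""
    if template_id is None:
        return FALLBACK
    row = conn.execute(
        "SELECT answer FROM faq WHERE template_id = ?", (template_id,)
    ).fetchone()
    return row[0] if row else FALLBACK
```

Keeping the answer keyed by template id means the domain expert only has to maintain one row per question template, and the matching logic never needs to inspect answer text.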
Algorithms
• Disemvoweling
• SMS Abbreviation Replacement
• Stop Words
• Template Matching
• MD5
• Top-down parser
Software and hardware requirements
• Hardware:
▫ GSM Modem
• Software:
▫ Java JDK 1.6
▫ Apache Tomcat 6
▫ MySQL 5
▫ NetBeans 7.0