An Automatic Answering System with Template Matching for Natural Language Questions Project



Abstract
Using computers to answer natural language questions is an interesting and challenging problem. Such problems are generally divided into two categories: open domain problems and closed domain problems. This paper presents a system that attempts to solve closed domain problems.
Typically, in a closed domain, answers to questions are not publicly available and therefore cannot be found using a search engine. Hence the answers have to be stored in a database by a domain expert. The challenge is then to understand the natural language question well enough to match it to the corresponding answer in the database. We use a template matching technique to perform this matching. In addition, because the system is intended for non-native English speakers, we developed a method to overcome the mismatches caused by spelling mistakes. Questions can be asked by short message (SMS) from a mobile phone, so the system is designed to understand SMS language in addition to standard English. One of the main contributions of this paper is a report on the outcome of deploying the system in a real environment.

Keywords: FAQ, Answering System, SMS, Template Matching

Introduction
Developing mechanisms that let computers answer user questions is becoming an increasingly interesting problem as the use of computers grows. Such mechanisms allow users to ask questions in a natural language and receive a concise and accurate answer. Understanding user questions in natural language requires Natural Language Processing (NLP). As an active area of research, NLP plays a major role in ICT and in Question Answering (QA) systems.
Natural language processing is a computerized approach to analyzing text that draws on both a set of theories and a set of technologies. Being able to pose queries and obtain answers using natural language (NL) expressions, rather than keyword-based retrieval mechanisms, is becoming increasingly important. QA systems can better satisfy the needs of users because they provide an accurate, quick, convenient and effective way of answering user questions. The approach adopted in this project is an automated FAQ (Frequently Asked Questions) answering system that replies with pre-stored answers to user questions asked in ordinary English, rather than relying on keyword- or syntax-based retrieval. This is achieved using a template matching technique, supported by other mechanisms such as disemvoweling and synonym matching.


Related Work

         Q&A system research has received considerable attention from the research community through the Text Retrieval Conference (TREC) Q&A track since 1999.
         The original aim of the track was to systematically evaluate both academic and commercial Q&A systems. Maybury has discussed the characteristics of Q&A systems and the resources needed to develop and evaluate them.
         The main approaches to Q&A systems have been described in the literature, where the template-based approach is discussed in detail.
         Although most Q&A systems are built for Web environments, SMS has also been used as an environment in contexts such as learning and agriculture.


Our Approach
Main modules:
         Pre-processing
         Question template matching
         Answering
         SMS abbreviation
         Stop word
         Word parser
         Synonyms matcher
         Security
         Disemvoweling

Architecture

In this section we describe the architecture of our system. The overall architecture of the system can be subdivided into three main modules:
(1) pre-processing,
(2) question template matching, and
(3) answering.


A. Pre-Processing Module
The pre-processing module consists of three main operations: (1) converting SMS abbreviations into standard English words, (2) removing stop words, and (3) removing vowels. Since the system is expected to process text written in both standard English and SMS language, SMS abbreviations must be replaced with the corresponding English words before a user question is processed further. This is done by looking them up in a pre-stored list of frequently used SMS abbreviations. Stop words are words that can be removed from a sentence without changing its meaning; examples are "the", "a" and "and".
Removing stop words makes the system more efficient by saving processing time and disk space.
The final step in this module is to remove vowels from the text in order to tolerate spelling mistakes. This process is called disemvoweling and is discussed in detail in later sections.
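The three pre-processing operations can be sketched in Java as follows. This is a minimal illustration and not the system's actual code: the class name `PreProcessor`, the sample abbreviation and stop word entries, and the tokenization rule are all assumptions made for the example, whereas the real system reads its abbreviations from a pre-stored list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PreProcessor {
    // Hypothetical sample entries; the deployed system would load these from storage.
    private static final Map<String, String> SMS_ABBREVIATIONS = new HashMap<>();
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "is", "are"));

    static {
        SMS_ABBREVIATIONS.put("wat", "what");
        SMS_ABBREVIATIONS.put("r", "are");
        SMS_ABBREVIATIONS.put("u", "you");
        SMS_ABBREVIATIONS.put("plz", "please");
    }

    /** Removes all vowels, so words that differ only in their vowels map to the same key. */
    static String disemvowel(String word) {
        return word.replaceAll("[aeiou]", "");
    }

    /** Applies abbreviation expansion, stop word removal and disemvoweling, in that order. */
    static List<String> preprocess(String question) {
        List<String> result = new ArrayList<>();
        // Assumed tokenization: lower-case, split on anything that is not a letter or digit.
        for (String token : question.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String word = SMS_ABBREVIATIONS.containsKey(token)
                    ? SMS_ABBREVIATIONS.get(token) : token;
            if (STOP_WORDS.contains(word)) {
                continue;
            }
            String key = disemvowel(word);
            if (!key.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "registretion" is misspelled on purpose; prints [wht, rgstrtn, f]
        System.out.println(preprocess("Wat is the registretion fee?"));
    }
}
```

In this sketch the misspelled "registretion" and the correct "registration" both reduce to `rgstrtn`, which is the effect disemvoweling is meant to achieve. Misspellings that change a consonant would not be caught by this step alone.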


B. Question-Template Matching Module
The pre-processed text is compared against every pre-stored template to find the template that best matches the received text. To make this possible, templates are written according to a specific syntax, which is described in Section IV. In this module, words that are considered to have synonyms are also looked up in a synonym file. The synonym file can be tailored to the relevant domain and updated from a standard lexical database such as WordNet [6]. Note that the templates represent questions, not answers. The main goal of the system is to identify the template that most closely matches the question received from the user.
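The idea of choosing the closest template can be illustrated with a small Java sketch. The template syntax of Section IV is not reproduced here, so the representation below is an assumption: each hypothetical `Template` is a list of slots, each slot is a set of interchangeable words (standing in for the synonym file), and the score is the fraction of slots found in the question. The threshold value is also illustrative. Plain words are used for readability; in the full pipeline both sides would be pre-processed first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TemplateMatcher {
    /** Hypothetical template: an id plus slots, each slot holding synonymous words. */
    static final class Template {
        final String id;
        final List<Set<String>> slots;

        Template(String id, List<Set<String>> slots) {
            this.id = id;
            this.slots = slots;
        }
    }

    static Set<String> slot(String... words) {
        return new HashSet<>(Arrays.asList(words));
    }

    /** Two made-up templates used only for demonstration. */
    static List<Template> sampleTemplates() {
        List<Template> templates = new ArrayList<>();
        templates.add(new Template("fee", Arrays.asList(
                slot("fee", "cost", "price"), slot("registration", "enrollment"))));
        templates.add(new Template("deadline", Arrays.asList(
                slot("deadline", "closing"), slot("registration", "enrollment"))));
        return templates;
    }

    /** Fraction of slots for which at least one alternative occurs in the question. */
    static double score(Template template, Set<String> questionWords) {
        if (template.slots.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (Set<String> alternatives : template.slots) {
            for (String word : alternatives) {
                if (questionWords.contains(word)) {
                    matched++;
                    break;
                }
            }
        }
        return (double) matched / template.slots.size();
    }

    /** Id of the highest-scoring template, or null if none reaches the threshold. */
    static String bestMatch(List<Template> templates, Set<String> questionWords, double threshold) {
        String bestId = null;
        double bestScore = 0.0;
        for (Template t : templates) {
            double s = score(t, questionWords);
            if (s >= threshold && s > bestScore) {
                bestScore = s;
                bestId = t.id;
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        // Prints fee: both slots of the "fee" template match through synonyms.
        System.out.println(bestMatch(sampleTemplates(), slot("cost", "enrollment"), 0.6));
    }
}
```

Scoring every template and keeping the maximum mirrors the description of comparing the text against each stored template; the real syntax may weight or order words differently.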


C. Answering Module
Since every question template is stored in a database together with its answer, once the best-matching template for a question has been found, the corresponding answer is returned to the end user.
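The lookup step can be sketched as below. An in-memory map stands in for the database table so that the example runs on its own; the class name `AnswerStore`, the template ids, and the fallback reply for unmatched questions are assumptions, since the text does not say how unmatched questions are handled.

```java
import java.util.HashMap;
import java.util.Map;

class AnswerStore {
    // Assumed fallback text; the behaviour for unmatched questions is not specified.
    static final String FALLBACK = "Sorry, no answer was found for your question.";

    // Stand-in for the database table that pairs each question template with its answer.
    private final Map<String, String> answersByTemplateId = new HashMap<>();

    void put(String templateId, String answer) {
        answersByTemplateId.put(templateId, answer);
    }

    /** Returns the stored answer for the matched template, or the fallback reply. */
    String answerFor(String templateId) {
        if (templateId == null) {
            return FALLBACK;
        }
        String answer = answersByTemplateId.get(templateId);
        return answer != null ? answer : FALLBACK;
    }

    public static void main(String[] args) {
        AnswerStore store = new AnswerStore();
        store.put("fee", "The registration fee is listed on the course page.");
        System.out.println(store.answerFor("fee"));
        System.out.println(store.answerFor(null));
    }
}
```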

Algorithms
         Disemvoweling
         SMS abbreviation replacement
         Stop word removal
         Template matching
         MD5
         Top-down parser
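The list names MD5 without saying what is hashed. Purely as an illustration of the algorithm itself, the sketch below computes an MD5 digest of a string with the standard `java.security.MessageDigest` class; the class name `Md5Example` and the choice of UTF-8 input encoding are assumptions, and how the Security module actually uses the digest is not stated here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Example {
    /** Returns the MD5 digest of the input as a 32-character lower-case hex string. */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex("abc"));
    }
}
```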


Software and hardware requirements

         Hardware :
         GSM Modem

         Software :
         Java JDK 1.6
         Apache Tomcat 6
         MySQL 5
         NetBeans 7.0