# Build a QA System using BERT and Hugging Face

Artificial-Intelligence Mar 1, 2021

A chatbot is AI software that can simulate a conversation (or chat) with a user through messaging applications, websites, mobile apps, or the telephone.
A chatbot is often described as one of the most advanced and promising forms of interaction between humans and machines. However, from a technological point of view, a chatbot is only the natural evolution of a Question Answering system that leverages Natural Language Processing (NLP). Formulating responses to questions in natural language is one of the most typical examples of NLP applied in enterprise end-use applications.

BERT (Bidirectional Encoder Representations from Transformers) started a revolution in NLP with state-of-the-art results on various tasks, including Question Answering and the GLUE benchmark. Some people have even referred to this as the ImageNet moment of NLP.

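Question Answering systems are built on pairs of questions and contexts: given a question and a passage of text (the context), a system like the one built here looks for the answer inside that passage.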
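To make the idea of a question/context pair concrete, here is a minimal sketch. The `QAExample` class and its field names are hypothetical and chosen only for illustration; the assumption is an extractive setup of the SQuAD kind, where the answer is a span of the context located by a character offset.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """One extractive QA example: the answer is a substring of the context."""
    question: str
    context: str
    answer_start: int  # character offset of the answer inside the context
    answer_text: str

    def is_consistent(self) -> bool:
        # The stored answer must match the context at the stored offset.
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end] == self.answer_text


context = "A pentagon is a polygon with five sides."
example = QAExample(
    question="How many sides does a pentagon have?",
    context=context,
    answer_start=context.index("five"),
    answer_text="five",
)
print(example.is_consistent())
```

Storing the offset rather than only the answer text matters when the same word appears more than once in the context.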

You can check out an example hosted version here.

### The Tutorial:

In this tutorial, we will use a pre-trained, modified version of BERT from Hugging Face that was fine-tuned on the SQuAD 2.0 dataset. We will provide the questions, and for context we will use the first matching Wikipedia article, retrieved through the wikipedia package in Python. We will then tokenize the question and the article with AutoTokenizer so that the AutoModelForQuestionAnswering model can predict the sequence of words that will be our answer.
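Before the question and the article reach the model, they are packed into a single sequence. The sketch below illustrates the BERT-style layout, `[CLS] question [SEP] context [SEP]`, together with the segment ids that tell the model which part is which. The `pack_pair` helper is hypothetical, and the whitespace split is only a stand-in for the real tokenizer, which splits text into WordPiece sub-words and maps them to integer ids.

```python
def pack_pair(question_tokens, context_tokens):
    """Pack a question and a context as [CLS] question [SEP] context [SEP].

    Returns the token list and the segment ids
    (0 for [CLS] + question + first [SEP], 1 for context + final [SEP]).
    """
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    n_question = len(question_tokens) + 2  # [CLS] + question + first [SEP]
    token_type_ids = [0] * n_question + [1] * (len(tokens) - n_question)
    return tokens, token_type_ids


# Toy whitespace "tokenizer", for illustration only.
question = "how many sides does a pentagon have ?".split()
context = "a pentagon has five sides .".split()

tokens, token_type_ids = pack_pair(question, context)
for token, segment in zip(tokens, token_type_ids):
    print(segment, token)
```

The segment ids are what later make it possible to separate the question from the article when a long article has to be split into chunks.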

### A little background:

The model we are using was originally pre-trained with masked language modeling: the researchers masked a random subset of words in a huge corpus, and the task for the model was to predict each masked word. The QA system builds on that pre-training: the model is fine-tuned to predict where in the context the answer starts and where it ends.
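For question answering, a BERT-style model fine-tuned on SQuAD outputs two scores per token: one for being the start of the answer and one for being the end. A simple decoder takes the argmax of each score independently. The sketch below is a slightly stricter variant, not necessarily what this tutorial's class does: it picks the best-scoring pair with the end at or after the start. The `best_span` helper, the `max_answer_len` parameter, and the scores are all made up for illustration.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end), end inclusive, maximizing
    start_scores[start] + end_scores[end] with start <= end < start + max_answer_len."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_scores):
        stop = min(len(end_scores), start + max_answer_len)
        for end in range(start, stop):
            score = start_score + end_scores[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


tokens = ["[CLS]", "how", "many", "?", "[SEP]", "it", "has", "five", "sides", "[SEP]"]
# Made-up scores; a real model would produce one start and one end logit per token.
start_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.2, 0.3, 4.0, 0.5, 0.0]
end_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 3.5, 1.0, 0.0]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

With independent argmax, the predicted end can land before the predicted start, which yields an empty answer; scoring pairs jointly avoids that case at the cost of a nested loop.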

Now, let's get into the tutorial.

First, we will create a class that bundles loading the model, tokenizing the question and the matched Wikipedia article, and extracting the answer.

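    from collections import OrderedDict

    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer


    class QASystemWithBERT:
        def __init__(self, pretrained_model_name_or_path='bert-large-uncased'):
            # Assumption: the two loading calls are truncated in the original
            # listing; these are the standard Auto* loaders named in the text.
            self.tokenizer = AutoTokenizer.from_pretrained(
                pretrained_model_name_or_path
            )
            self.model = AutoModelForQuestionAnswering.from_pretrained(
                pretrained_model_name_or_path
            )
            self.max_len = self.model.config.max_position_embeddings
            self.chunked = False

        def tokenize(self, question, text):
            # Reset the flag so a short article after a long one is not treated as chunked.
            self.chunked = False
            self.inputs = self.tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="pt", return_token_type_ids=True)
            self.input_ids = self.inputs["input_ids"].tolist()[0]

            if len(self.input_ids) > self.max_len:
                self.inputs = self.chunkify()
                self.chunked = True

        def chunkify(self):
            """
            Break up a long article into chunks that fit within the max token
            requirement for that Transformer model.
            """
            # Assumption: the lines defining `qt`, `q`, and `c` are missing from
            # the original listing. token_type_ids is 0 for [CLS] + question +
            # [SEP] and 1 for the context, so it serves as a question mask.
            qmask = self.inputs['token_type_ids'].lt(1)
            qt = torch.masked_select(self.inputs['input_ids'], qmask)
            # The "- 1" leaves room for the closing [SEP] appended to each chunk.
            chunk_size = self.max_len - qt.size()[0] - 1

            # One dict per chunk, each shaped like the un-chunked model input.
            chunked_input = OrderedDict()
            for k, v in self.inputs.items():
                q = torch.masked_select(v, qmask)   # question part of this tensor
                c = torch.masked_select(v, ~qmask)  # context part of this tensor
                chunks = torch.split(c, chunk_size)

                for i, chunk in enumerate(chunks):
                    if i not in chunked_input:
                        chunked_input[i] = {}

                    thing = torch.cat((q, chunk))
                    if i != len(chunks)-1:
                        if k == 'input_ids':
                            # 102 is the [SEP] token id in BERT vocabularies.
                            thing = torch.cat((thing, torch.tensor([102])))
                        else:
                            thing = torch.cat((thing, torch.tensor([1])))

                    chunked_input[i][k] = torch.unsqueeze(thing, dim=0)
            return chunked_input

        def get_answer(self):
            # Assumption: the method name and the model calls are missing from
            # the original listing; the answer span is taken from the argmax of
            # the start and end logits.
            if self.chunked:
                answer = ''
                for k, chunk in self.inputs.items():
                    with torch.no_grad():
                        outputs = self.model(**chunk)
                    answer_start = torch.argmax(outputs.start_logits)
                    answer_end = torch.argmax(outputs.end_logits) + 1
                    ans = self.convert_ids_to_string(chunk['input_ids'][0][answer_start:answer_end])
                    # A span that is just [CLS] means "no answer in this chunk".
                    if ans != '[CLS]':
                        answer += ans + " / "
                return answer
            else:
                with torch.no_grad():
                    outputs = self.model(**self.inputs)
                answer_start = torch.argmax(outputs.start_logits)
                answer_end = torch.argmax(outputs.end_logits) + 1
                return self.convert_ids_to_string(self.inputs['input_ids'][0][answer_start:answer_end])

        def convert_ids_to_string(self, input_ids):
            return self.tokenizer.convert_tokens_to_string(self.tokenizer.convert_ids_to_tokens(input_ids))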
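The chunking step in the class above is the least obvious part, so here is the same idea on plain Python lists, without tensors. This is a simplified sketch and not the class's exact code: `chunk_pair` is a hypothetical helper, the token ids are illustrative placeholders, and it assumes `question_ids` already contains `[CLS]` and the first `[SEP]` while `context_ids` does not yet contain the final `[SEP]`.

```python
def chunk_pair(question_ids, context_ids, max_len, sep_id=102):
    """Split context_ids so that every chunk, prefixed with question_ids and
    closed with sep_id, is at most max_len tokens long."""
    chunk_size = max_len - len(question_ids) - 1  # reserve one slot for [SEP]
    if chunk_size <= 0:
        raise ValueError("the question leaves no room for any context tokens")
    return [
        question_ids + context_ids[i:i + chunk_size] + [sep_id]
        for i in range(0, len(context_ids), chunk_size)
    ]


question_ids = [101, 2129, 2116, 102]  # illustrative ids: [CLS] q q [SEP]
context_ids = list(range(1000, 1010))  # ten placeholder context token ids

chunks = chunk_pair(question_ids, context_ids, max_len=8)
for chunk in chunks:
    print(len(chunk), chunk)
```

Note that the chunks do not overlap, so an answer that straddles a chunk boundary can be missed; overlapping windows are a common remedy but are not part of this tutorial.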

Now we are going to pass a list of questions and get the answers:

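    import wikipedia as wiki

    # Assumption: the line creating `qas` is missing from the original listing.
    # Any checkpoint fine-tuned on SQuAD 2.0 should work; this name is only an example.
    qas = QASystemWithBERT("deepset/bert-base-cased-squad2")

    questions = [
        'Who is the President of the United States of America?',
        'How many sides does a pentagon have?'
    ]

    for question in questions:
        print(f"Question: {question}")
        results = wiki.search(question)

        # Use the first search hit as the context article.
        page = wiki.page(results[0])
        print(f"Top wiki result: {page}")

        text = page.content

        qas.tokenize(question, text)
        # Assumption: the original print() call is empty; `get_answer` is the
        # name assumed here for the class's answer-extraction method.
        print(f"Answer: {qas.get_answer()}")
        print()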

Well, it answered every question correctly (after 10 retries on each question and rewording the questions numerous times). Have a look:

### [Optional]

Check out the public notebook below:

### [Bonus]

Check out the post below to better understand BERT and the impact it has had on NLP research.

Cheers!

#### Jay Sinha
