Build a QA System using BERT and Hugging Face

Artificial-Intelligence Feb 28, 2021

A chatbot is AI software that can simulate a conversation (or a chat) with a user through messaging applications, websites, mobile apps, or the telephone.
A chatbot is often described as one of the most advanced and promising expressions of interaction between humans and machines. However, from a technological point of view, a chatbot is simply the natural evolution of a Question Answering system that leverages Natural Language Processing (NLP). Formulating responses to questions in natural language is one of the most typical examples of NLP applied in enterprise end-use applications.

BERT (Bidirectional Encoder Representations from Transformers) started a revolution in NLP with state-of-the-art results on a range of tasks and benchmarks, including question answering and GLUE. Some have even called this the ImageNet moment of NLP.

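Question Answering systems of this kind work on pairs of a question and a context: the model reads the context passage and selects the span of text that answers the question.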
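
To make the pairing concrete, here is a small sketch of a single question/context pair, loosely in the style of SQuAD records. The field names, the example text, and the helper `locate_answer` are illustrative assumptions, not taken from the dataset or from any library.

```python
# A hypothetical SQuAD-style record: the answer is a span inside the context.
example = {
    "question": "How many sides does a hexagon have?",
    "context": "In geometry, a hexagon is a polygon with six sides and six angles.",
    "answer_text": "six",
}

def locate_answer(record):
    """Return (start, end) character offsets of the answer, or None if absent."""
    start = record["context"].find(record["answer_text"])
    if start == -1:
        # SQuAD 2.0 also contains questions that the context cannot answer.
        return None
    return start, start + len(record["answer_text"])

span = locate_answer(example)
if span is not None:
    start, end = span
    print(example["context"][start:end])  # the answer is a slice of the context
```

The point of the sketch is that the answer is never generated freely: it is always a slice of the context, which is why the choice of context article matters so much later on.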

You can check out an example hosted version here.

The Tutorial:

In this tutorial, we will use a BERT model from Hugging Face that has been fine-tuned on the SQuAD 2.0 dataset (deepset/bert-base-cased-squad2). We supply the questions, and for context we use the top-matching Wikipedia article, fetched with the wikipedia package for Python. We then tokenize the question and the article with AutoTokenizer so that AutoModelForQuestionAnswering can predict the span of tokens that forms our answer.

A little background:

BERT was originally pre-trained with a masked language modeling objective: words in a huge corpus were hidden at random, and the model's task was to predict the hidden words. The QA model builds on that pre-training and is fine-tuned to predict where an answer starts and ends in a passage.
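
As a rough illustration of the masking idea, the sketch below hides a fraction of the tokens in a sentence and records the original words as prediction targets. It is a simplification under stated assumptions: the function name `mask_tokens`, the whitespace tokenization, and the fixed seed are illustrative. Real BERT pre-training works on WordPiece tokens and sometimes substitutes a random or unchanged token instead of [MASK].

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns (masked_tokens, targets), where targets maps position -> original token.
    """
    rng = random.Random(seed)
    # Mask at least one token, but never more than the sentence contains.
    n_to_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = "[MASK]"
    return masked, targets

sentence = "the model learns to predict words that were hidden from it".split()
masked, targets = mask_tokens(sentence)
print(" ".join(masked))
print(targets)  # the words the model would be trained to recover
```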

Now, let's get into the tutorial.

First, we will create a class that loads the model and tokenizer, tokenizes the question together with the matched Wikipedia article, and extracts the answer.

from collections import OrderedDict

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

class QASystemWithBERT:
    def __init__(self, pretrained_model_name_or_path='bert-large-uncased'):
        self.READER_PATH = pretrained_model_name_or_path
        self.tokenizer = AutoTokenizer.from_pretrained(self.READER_PATH)
        self.model = AutoModelForQuestionAnswering.from_pretrained(self.READER_PATH)
        # Longest input (in tokens) that the model accepts
        self.max_len = self.model.config.max_position_embeddings
        self.chunked = False

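    def tokenize(self, question, text):
        # Encode question and article as one pair: [CLS] question [SEP] article [SEP]
        self.inputs = self.tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="pt", return_token_type_ids=True)
        self.input_ids = self.inputs["input_ids"].tolist()[0]

        # Reset the flag so a short input after a long one is not treated as chunked
        self.chunked = False
        if len(self.input_ids) > self.max_len:
            self.inputs = self.chunkify()
            self.chunked = True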

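    def chunkify(self):
        """
        Break up a long article into chunks that fit within the max token
        requirement for that Transformer model.
        """
        # token_type_ids is 0 for question tokens and 1 for context tokens
        qmask = self.inputs['token_type_ids'].lt(1)
        qt = torch.masked_select(self.inputs['input_ids'], qmask)
        # The "- 1" leaves room for the [SEP] token that closes each chunk
        chunk_size = self.max_len - qt.size()[0] - 1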
        chunked_input = OrderedDict()
        for k,v in self.inputs.items():
            q = torch.masked_select(v, qmask)
            c = torch.masked_select(v, ~qmask)
            chunks = torch.split(c, chunk_size)
            for i, chunk in enumerate(chunks):
                if i not in chunked_input:
                    chunked_input[i] = {}

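                # Prepend the question part to every context chunk
                thing = torch.cat((q, chunk))
                if i != len(chunks)-1:
                    # All but the last chunk need a closing [SEP] (id 102 in BERT),
                    # or a matching 1 in token_type_ids / attention_mask
                    if k == 'input_ids':
                        thing = torch.cat((thing, torch.tensor([102])))
                    else:
                        thing = torch.cat((thing, torch.tensor([1])))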

                chunked_input[i][k] = torch.unsqueeze(thing, dim=0)
        return chunked_input

    def get_answer(self):
        if self.chunked:
            answer = ''
            for k, chunk in self.inputs.items():
                answer_start_scores, answer_end_scores = self.model(**chunk)[:2]

                answer_start = torch.argmax(answer_start_scores)
                answer_end = torch.argmax(answer_end_scores) + 1

                ans = self.convert_ids_to_string(chunk['input_ids'][0][answer_start:answer_end])
                if ans != '[CLS]':
                    answer += ans + " / "
            return answer
        else:
            answer_start_scores, answer_end_scores = self.model(**self.inputs)[:2]

            # Most likely start and end of the answer are the argmax of each score
            answer_start = torch.argmax(answer_start_scores)
            answer_end = torch.argmax(answer_end_scores) + 1
            return self.convert_ids_to_string(self.inputs['input_ids'][0][answer_start:answer_end])

    def convert_ids_to_string(self, input_ids):
        return self.tokenizer.convert_tokens_to_string(self.tokenizer.convert_ids_to_tokens(input_ids))
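
The chunking step is easier to follow on plain Python lists. The sketch below mirrors the idea behind chunkify() without tensors: the question tokens are repeated at the front of every chunk, and the context is split so that each chunk fits the length limit. The function name `chunk_context`, the placeholder token strings, and the tiny `max_len` are assumptions chosen for illustration.

```python
def chunk_context(question_tokens, context_tokens, max_len):
    """Split context_tokens so that question + chunk + [SEP] fits in max_len.

    question_tokens is assumed to already include [CLS] and its trailing [SEP].
    """
    # Reserve room for the question and for the closing [SEP] of each chunk.
    chunk_size = max_len - len(question_tokens) - 1
    if chunk_size < 1:
        raise ValueError("max_len is too small for this question")
    chunks = []
    for start in range(0, len(context_tokens), chunk_size):
        piece = context_tokens[start:start + chunk_size]
        chunks.append(question_tokens + piece + ["[SEP]"])
    return chunks

question = ["[CLS]", "who", "wrote", "it", "?", "[SEP]"]
context = ["w%d" % i for i in range(10)]  # ten placeholder context tokens
for chunk in chunk_context(question, context, max_len=10):
    print(len(chunk), chunk)
```

One difference from the class above: chunkify() splits a context that already ends in [SEP], so it appends [SEP] to all chunks except the last, while this sketch starts from a bare context and appends [SEP] to every chunk.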

Now we pass a list of questions to the system and print the answer to each:

import wikipedia

questions = [
    'Where is Microsoft Headquarters located?',
    'Who is the President of the United States of America?',
    'How many sides does a hexagon have?'
]

qas = QASystemWithBERT("deepset/bert-base-cased-squad2")

for question in questions:
    print(f"Question: {question}")
    # Search Wikipedia and take the first matching article as context
    results = wikipedia.search(question)

    page = wikipedia.page(results[0])
    print(f"Top wiki result: {page}")

    text = page.content

    qas.tokenize(question, text)
    print(f"Answer: {qas.get_answer()}")
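
For readers who want to see what get_answer() does with the model output, here is a tensor-free sketch of the span selection: take the position with the highest start score, the position with the highest end score, and keep the tokens in between. The scores and tokens below are made-up values, and `pick_span` is a hypothetical helper, not part of the transformers API.

```python
def pick_span(tokens, start_scores, end_scores):
    """Return the tokens between the best-scoring start and end positions."""
    start = max(range(len(start_scores)), key=start_scores.__getitem__)
    end = max(range(len(end_scores)), key=end_scores.__getitem__) + 1
    # If end <= start the slice is empty: the two argmaxes are taken
    # independently, so they are not guaranteed to form a valid span.
    return tokens[start:end]

tokens = ["[CLS]", "how", "many", "sides", "?", "[SEP]",
          "a", "hexagon", "has", "six", "sides", "[SEP]"]
start_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.3, 0.1, 4.0, 0.5, 0.0]
end_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 3.5, 1.0, 0.0]
print(" ".join(pick_span(tokens, start_scores, end_scores)))
```

When both positions land on [CLS], a model fine-tuned on SQuAD 2.0 is signalling that it found no answer in that passage, which is why the class skips '[CLS]' results when combining chunks.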

Well, it answered all three questions correctly, though only after 10 retries on each question and numerous rewordings of the questions.


Check out the public notebook below:

QA System with BERT


Check out the post below to better understand BERT and the impact it has had on NLP research.

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)