How to Build a Natural Language Processing Question Answering System

In this article, we are going to create a Natural Language Processing question answering system with TensorFlow.js and JavaScript, using a pre-trained BERT model fine-tuned on the SQuAD 2.0 dataset.

TensorFlow: “BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing tasks.

This app uses a compressed version of BERT, MobileBERT, that runs 4x faster and has a 4x smaller model size.

SQuAD, or Stanford Question Answering Dataset, is a reading comprehension dataset consisting of articles from Wikipedia and a set of question-answer pairs for each article.

The model takes a passage and a question as input then returns a segment of the passage that most likely answers the question. It requires semi-complex pre-processing including tokenization and post-processing steps that are described in the BERT paper and implemented in the sample app.”
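To make the last step more concrete: a BERT-style reader typically produces a start score and an end score for every passage token, and post-processing picks the span whose combined score is highest. The following is a toy sketch of that idea, not the sample app’s actual code. The function name bestSpan, the maxLen limit, and the example scores are all illustrative assumptions.

```javascript
// Toy sketch of span selection; NOT the actual MobileBERT post-processing.
// startScores[i] / endScores[j] are hypothetical per-token scores for the
// answer starting at token i and ending at token j.
function bestSpan(tokens, startScores, endScores, maxLen = 30) {
  let best = null;
  for (let i = 0; i < tokens.length; i++) {
    // Only consider spans that end at or after the start and
    // contain at most maxLen tokens.
    const lastEnd = Math.min(tokens.length - 1, i + maxLen - 1);
    for (let j = i; j <= lastEnd; j++) {
      const score = startScores[i] + endScores[j];
      if (best === null || score > best.score) {
        best = { start: i, end: j, score, text: tokens.slice(i, j + 1).join(' ') };
      }
    }
  }
  return best; // null when tokens is empty
}

// Made-up tokens and scores, for illustration only.
const tokens = ['Sundar', 'Pichai', 'was', 'appointed', 'CEO'];
const span = bestSpan(tokens, [5, 1, 0, 0, 0], [1, 6, 0, 0, 2]);
console.log(span.text); // "Sundar Pichai"
```

The real model works on sub-word tokens and maps the chosen span back to characters in the passage, which is the part the quote calls semi-complex.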

TensorFlow.js has grown in popularity recently because it lets you run Deep Learning models in the browser. Paired with Node.js, it brings Deep Learning to modern web application development, desktop application development, and more.

If you are new to TensorFlow.js, please check out our article The Future of Artificial Intelligence: Deep Learning with JavaScript, Node.js, and TensorFlow.

Building a Natural Language Processing Question Answering System

The first step is to create a Node.js project. If you don’t know how to do that, check our previous article, where we explain it thoroughly.

Next, create an .html file (we’ve named ours questionanswering.html) and a .js file (we’ve named ours nlpqna.js).

After you’ve done this, add the code to your .html file that loads TensorFlow.js, the question answering model (available to our script as qna), and nlpqna.js.


Next, we add our code to the .js file. Its first step is to load the model with qna.load(), as shown in the full listing further down.

Once the model is loaded, the next step is to ask a question and see what answer we get.

To do that, we use the findAnswers(question: string, passage: string) function, which takes two arguments: question, the question we want answered, and passage, the text from which the answers are extracted.

Add the following lines to your .js file.

The last line in the code below adds an event listener that calls our answer_questions() function once the page’s HTML has been loaded and parsed, so it runs on every page load.


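async function answer_questions() {
    // Load the pre-trained question answering model.
    const model = await qna.load();
    // Alternatively, load the model from your own server by passing a config object:
    // const config = {modelUrl: 'https://yourown-server/qna/model.json'};
    // const customModel = await qna.load(config);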
    const passage = "Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware. It is considered one of the Big Four technology companies, alongside Amazon, Apple, and Facebook. Google was founded in September 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14 percent of its shares and control 56 percent of the stockholder voting power through supervoting stock. They incorporated Google as a California privately held company on September 4, 1998, in California. Google was then reincorporated in Delaware on October 22, 2002. An initial public offering (IPO) took place on August 19, 2004, and Google moved to its headquarters in Mountain View, California, nicknamed the Googleplex. In August 2015, Google announced plans to reorganize its various interests as a conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and will continue to be the umbrella company for Alphabet's Internet interests. Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the CEO of Alphabet."
    const question = "Who is the CEO of Google?";
    // findAnswers resolves to an array of candidate answers found in the passage.
    const answers = await model.findAnswers(question, passage);
    console.log(answers);
}

document.addEventListener('DOMContentLoaded', answer_questions);
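The function above loads the model every time it runs. If you plan to ask more than one question, it may be worth loading the model once and reusing it. Below is a minimal sketch of that idea. createAnswerer is a hypothetical helper (it is not part of the qna package), and it assumes only that the loader you pass in resolves to an object with a findAnswers(question, passage) method, as qna.load() does in the code above.

```javascript
// Hypothetical helper: wraps a model loader so the model is loaded only once.
// In the browser you would pass `() => qna.load()` as loadModel.
function createAnswerer(loadModel) {
  let modelPromise = null; // cached promise, shared by all calls

  return async function answer(question, passage) {
    if (modelPromise === null) {
      modelPromise = Promise.resolve()
        .then(loadModel)
        .catch((err) => {
          // Forget a failed load so that a later call can retry.
          modelPromise = null;
          throw err;
        });
    }
    const model = await modelPromise;
    return model.findAnswers(question, passage);
  };
}

// Usage in the page (assumes the qna global and a passage string):
// const answer = createAnswerer(() => qna.load());
// answer('Who is the CEO of Google?', passage).then(console.log);
// answer('Who founded Google?', passage).then(console.log);
```

Caching the promise rather than the model itself means that two questions asked before the download finishes still trigger only one load.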

Now, to test our work, open the .html file in the browser, open the developer console (right-click, Inspect, then the Console tab), and see what we get back.

It might take some time (a minute or less) before the answers shown in Image 1 appear. In the meantime, you can print messages with console.log() to make sure your code is running properly.
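One way to follow that advice is to log when each slow step starts and how long it took. Here is a small sketch. The timed helper and its labels are hypothetical, and it relies only on Date.now() and console.log().

```javascript
// Hypothetical helper: runs an async step, logs its duration, and returns its result.
async function timed(label, step) {
  console.log(label + ': started');
  const t0 = Date.now();
  try {
    return await step();
  } finally {
    // Runs whether the step succeeded or threw.
    console.log(label + ': finished after ' + (Date.now() - t0) + ' ms');
  }
}

// Usage inside answer_questions() (assumes the qna global and the
// question and passage variables from the code above):
// const model = await timed('load model', () => qna.load());
// const answers = await timed('find answers', () => model.findAnswers(question, passage));
```

If "load model: started" shows up in the console but nothing follows for a while, the page is most likely still loading the model rather than stuck.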


Image 1: The model’s output

In Image 1 you can see the output of the model. The output is an array of JavaScript objects, each with four attributes. The text is a string containing the answer itself, and the score is a number indicating the model’s confidence in that answer. The startIndex is the index of the answer’s first character in the passage, and the endIndex is the index of its last character.
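As a small illustration of working with this output, the sketch below picks the highest-scoring answer from the array. pickBestAnswer and the sample data are made up for the example. Only the text and score attributes described above are used, and nothing is assumed about the order in which the model returns its answers.

```javascript
// Hypothetical helper: returns the answer with the highest score,
// or null if there are no answers.
function pickBestAnswer(answers) {
  if (!Array.isArray(answers) || answers.length === 0) {
    return null;
  }
  return answers.reduce((best, current) => (current.score > best.score ? current : best));
}

// Made-up sample shaped like the output described above (not real model output).
const sample = [
  { text: 'Larry Page', score: 2.5 },
  { text: 'Sundar Pichai', score: 9.1 },
];
console.log(pickBestAnswer(sample).text); // "Sundar Pichai"
```

Returning null for an empty array matters in practice, since the model may find no answer in the passage for some questions.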


Conclusion

So, this is how you can build a Natural Language Processing (NLP) question answering system in less than 20 lines of code using TensorFlow.js and a model tuned on Stanford University’s dataset. If you want to give it a shot, you can find the project in our GitHub repository.

The inspiration for this post can be found on TensorFlow’s GitHub page. You will notice that we took a different approach, since working with Node.js is a bit tricky and the given solution did not quite work for us and can be confusing for beginners.

Check our older articles below; you might find them helpful.

Like with every post we do, we encourage you to continue learning, trying and creating.
