How do we build a conversational chatbot using neural networks and deep learning?


Does building a conversation always take human interaction? Or just human-like thinking, rather than an actual human?

Neural networks are famous for mimicking the way humans think, and building a meaningful chatbot calls for a deep dive into them. Chatbots can be classified by purpose as general-conversation or goal-oriented. Goal-oriented bots are meant to respond to a query or exchange information, which is why businesses are interested in them. Identifying your bot's function is important for designing your architecture and selecting your "neural network model".

General bots need generative models. These are crude: they train on a large data set of conversations (user inputs and replies) and, once trained, generate a reply based on the input. An RNN is the core of these models. Goal-oriented bots, by contrast, use a selective neural conversational model, also called a deep semantic similarity model. Instead of estimating a probability and "generating" an output, selective models learn similarity or context, and the reply comes from a predefined pool of possible answers.
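The selective approach can be sketched in a few lines. This is a toy illustration, not the model described here: it hard-codes a bag-of-words overlap score where a real selective model would learn the similarity function, and the reply pool and inputs are invented for the example.

```python
def bag_of_words(text):
    """Lowercase a sentence and split it into a set of word tokens."""
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard overlap between two token sets (a stand-in for a learned score)."""
    ta, tb = bag_of_words(a), bag_of_words(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Predefined pool of possible answers, as used by goal-oriented bots.
REPLY_POOL = [
    "Your order has been shipped.",
    "Our support team is available 9am-5pm.",
    "You can reset your password from the settings page.",
]

def select_reply(user_input):
    """Pick the pooled reply most similar to the user input."""
    return max(REPLY_POOL, key=lambda reply: similarity(user_input, reply))
```

The key difference from a generative model is visible in `select_reply`: the bot never produces new text, it only ranks and returns an existing answer.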

Lisa is meant to be a goal-oriented, conversational bot. We built a large corpus of conversational data: many possible user inputs, a large data set of responses, and sample conversations to train the neural networks on. The sample conversations consist of a sequence of labelled user inputs, each paired with a legitimate reply.

Making a chatbot requires natural language processing to understand the user input first; replying and maintaining the conversation is a different game. Understanding the user input requires:

1. Classification of the intent of the input

2. NLP techniques: tokenisation, lemmatisation, and vectorisation (TF-IDF and TensorFlow word embeddings)
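The preprocessing steps in point 2 can be sketched end to end. This is a minimal pure-Python illustration: the lemma table is a toy stand-in for a real lemmatiser (such as NLTK's WordNetLemmatizer), and the smoothed IDF formula is one common variant, not necessarily the one used here.

```python
import math
import re

def tokenise(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Toy lemma table; a real pipeline uses a trained lemmatiser.
LEMMAS = {"orders": "order", "shipped": "ship", "shipping": "ship"}

def lemmatise(tokens):
    """Reduce each token to its base form where known."""
    return [LEMMAS.get(t, t) for t in tokens]

def tfidf_vectors(docs):
    """Map each document to a sparse {term: tf-idf weight} dict."""
    tokenised = [lemmatise(tokenise(d)) for d in docs]
    n = len(tokenised)
    doc_freq = {}
    for toks in tokenised:
        for t in set(toks):
            doc_freq[t] = doc_freq.get(t, 0) + 1
    vectors = []
    for toks in tokenised:
        vec = {}
        for t in set(toks):
            tf = toks.count(t) / len(toks)          # term frequency
            idf = math.log(n / doc_freq[t]) + 1     # smoothed inverse doc freq
            vec[t] = tf * idf
        vectors.append(vec)
    return vectors
```

Terms that appear in every document get a low IDF, so shared filler words contribute little to the similarity comparison that follows.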

NLP techniques are at the core of finding the semantic similarity of the user input with the intents we have or what we expect.

This gives us the context. Word embedding projects each word in the vocabulary into a vector space, where similar words land close to each other. The user input is embedded into this semantic vector space, and the similarity between the context vector and each candidate reply vector is measured with cosine similarity.

With the context established, identifying specific entities, or just the important parts of the conversation, requires chunking and named entity recognition. The convolutional neural network (CNN) is one of the best text-classification algorithms for entity recognition and intent extraction.
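To make "entity recognition" concrete, here is a deliberately simple pattern-based extractor. The entity types and patterns are hypothetical examples for a recruiting bot; a production system would use a trained tagger (such as the CNN-based approach mentioned above) rather than regular expressions.

```python
import re

# Hypothetical entity patterns for illustration.
ENTITY_PATTERNS = {
    "email": r"[\w.]+@[\w.]+",
    "years_experience": r"\d+\s+years?",
}

def extract_entities(text):
    """Return {entity_type: matched_text} for every pattern found."""
    found = {}
    for label, pattern in ENTITY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[label] = match.group(0)
    return found
```

The output of this step, together with the classified intent, is what the conversation engine in the next section consumes.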

CNN vs. RNN is a very common question. A CNN is better with spatial data, or data without a state, while an RNN is better for sequence data, where the state is dynamic and needs to be remembered. RNNs have "internal memory" because their nodes form a directed graph along a sequence. RNNs excel at sequential interpretation and predicting the next state of the system, but they fall short at classification and analysis, making the CNN the better choice for identifying user intent.

Once the user input is classified, the machine needs to predict or decide the best response for the context. This is driven by the state of the conversation, which must be remembered throughout. The core of this conversation engine is an LSTM, a type of RNN, the class of neural networks that can remember the state of a conversation. An LSTM selectively remembers patterns over long durations, maintaining the context of the conversation and predicting the next action. We implemented supervised learning on the LSTM model in Keras, training on the corpus of sample conversations alongside the intent classification described above. The response is always based on the conversation history, which determines the state of the conversation, and LSTMs are capable of remembering such semantic details over a longer stretch of context.

Lisa transforms the way companies discover, qualify and recruit top candidates. Using proprietary artificial intelligence algorithms, it helps companies pick the best candidates in less than 90% of the time taken today, by conducting the audio/video interviews that SMEs conduct today.
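The state-tracking behaviour described above can be made visible with a small sketch. An LSTM learns this from the training conversations; here a plain dictionary stands in for the hidden state so the flow is easy to follow, and the intent names and replies are illustrative assumptions, not Lisa's actual dialogue policy.

```python
def respond(state, intent):
    """Update the conversation state and pick the next action for an intent."""
    state.setdefault("history", []).append(intent)
    if intent == "greet":
        return "Hello! Are you here to schedule an interview?"
    if intent == "schedule" and "greet" in state["history"]:
        state["pending"] = "confirm_slot"   # remember what we are waiting for
        return "Sure - which day works for you?"
    if intent == "give_day" and state.get("pending") == "confirm_slot":
        state["pending"] = None
        return "Booked. You'll get a confirmation email."
    return "Sorry, could you rephrase that?"
```

Note that "give_day" only makes sense because the state recorded the earlier "schedule" turn; that dependence on history is exactly what the LSTM's memory cells provide in the real engine.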
