Diverse Responses and Evaluation: Assignment 4

The point of the fourth assignment is to fix some issues from homework 3 and also to evaluate your chatbot! In particular, there were two issues: (1) the responses seemed bland, and (2) evaluation is difficult for social bots. The first step is to train either a language model or a backward model, P(S|T). Then you should implement Maximum Mutual Information (MMI) from "A Diversity-Promoting Objective Function for Neural Conversation Models".
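To make the role of the backward model concrete: the MMI-bidi variant scores each candidate response T for a source S by interpolating the forward and backward log-probabilities, roughly (1 - λ) log P(T|S) + λ log P(S|T). The sketch below reranks an N-best list under that score. The function names, the `lam` parameter, and the callable interfaces are assumptions made for illustration, not part of any provided starter code; you would plug in your own trained models.

```python
from typing import Callable, List, Tuple

# Both scorers take (source, target) and return a log-probability.
# forward_logprob(source, target)  -> log P(target | source)
# backward_logprob(source, target) -> log P(source | target)
Scorer = Callable[[str, str], float]


def mmi_bidi_rerank(
    source: str,
    candidates: List[str],
    forward_logprob: Scorer,
    backward_logprob: Scorer,
    lam: float = 0.5,  # hypothetical default; tune on held-out data
) -> List[Tuple[float, str]]:
    """Return (score, candidate) pairs sorted from best to worst."""
    scored = []
    for target in candidates:
        score = (1.0 - lam) * forward_logprob(source, target) \
            + lam * backward_logprob(source, target)
        scored.append((score, target))
    # Higher log-probability score is better.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

The candidates can come from beam search or from sampling; the reranker does not care how they were generated. With `lam=0.0` this reduces to ordinary forward-model ranking, which is a handy sanity check.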

Be creative and have fun.

Re-implement the data filtering described in the DialoGPT paper OR in "Improving Neural Conversational Models with Entropy-Based Data Filtering".
Using the filtered data, train P(T|S) and P(S|T) models on OpenSubtitles with the filters applied, and also for the case where the last character is a question mark (“?”).
Train P(T|S) transformer models on the PersonaChat or DailyDialog dataset using either the model from (part 2) or a Reddit pre-trained model for P(T|S).
Implement decoding with the MMI-bidi objective (this can be beam search reranking or sampling [nucleus or top-k]).
Evaluate your chatbot on the Mechanical Turk SANDBOX using ParlAI’s model evaluator ([code](https://github.com/facebookresearch/ParlAI/tree/master/parlai/mturk/tasks/model_evaluator)).
Identify issues with your chatbot and suggest possible solutions and future work.
Submit homework on Gradescope.
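For the data-filtering step in the list above, here is a minimal sketch of one entropy-based idea: a response that follows many different sources (like "i don't know") has high entropy over its sources, so pairs containing it can be dropped. This is a simplified, exact-match version written under my own assumptions; the function names and the threshold are hypothetical, and the paper describes further variants (for example, clustering utterances instead of matching them exactly) that are not shown here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source, target)


def source_entropy_per_target(pairs: List[Pair]) -> Dict[str, float]:
    """Entropy (in bits) of the empirical distribution P(source | target)."""
    sources_by_target = defaultdict(Counter)
    for source, target in pairs:
        sources_by_target[target][source] += 1

    entropies = {}
    for target, counts in sources_by_target.items():
        total = sum(counts.values())
        entropies[target] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropies


def filter_generic_targets(pairs: List[Pair], threshold: float) -> List[Pair]:
    """Keep only pairs whose target has source entropy <= threshold."""
    entropies = source_entropy_per_target(pairs)
    return [(s, t) for s, t in pairs if entropies[t] <= threshold]
```

The same counting trick works in the other direction (entropy of responses per source) by swapping the roles of source and target. The threshold is a free parameter that you would need to choose by inspecting what gets removed.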

Please remember to include a README. If there are too many files, simply point to your GitHub … sorry about Gradescope.