At Automaise, we strive for stress-free interactions for your clients by keeping up to date with the latest developments and building better digital assistants.
State-of-the-art language models are typically trained and evaluated on short conversations with little to no context. Recent improvements continue to disregard the length and character of real human dialogues, and models often perform poorly on long open-domain conversations.
How can we tackle the problem? 🤔
Facebook AI Research recently tackled long-term open-domain conversation in their work “Beyond Goldfish Memory: Long-Term Open-Domain Conversation”¹. They also collected an English dataset, Multi-Session Chat (MSC), consisting of human crowd-worker chats spanning five sessions, each with up to 14 utterances. Each session additionally contains annotations of the essential topics discussed in previous exchanges, which ground the conversations that follow.
To collect the dataset, they employed crowd workers to play the roles of speakers, each defined by a few sentences describing a persona, and to replicate an online chat in which users often pause the conversation only to resume it after some time.
For modeling multi-session chat, the authors adopted a standard large language model (an encoder-decoder Transformer) while also studying two augmentation techniques:
- A Retrieval-Augmentation method that uses a retrieval system to find and select which part of the context to include in the encoding.
- A Summarization Memory-Augmentation method that summarizes the knowledge from previous dialogues and stores only that condensed information, making it more efficient than the retrieval approach.
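The two techniques can be sketched in a few lines of Python. This is a toy illustration of the ideas, not the authors' implementation: the paper uses learned dense retrievers and an abstractive summarizer, whereas here a bag-of-words cosine similarity stands in for retrieval and summaries are simply stored strings.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; the paper uses learned dense encoders.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(history, query, k=2):
    """Retrieval augmentation: select the k past utterances most similar
    to the current query and include only those in the encoder's context."""
    q = embed(query)
    return sorted(history, key=lambda u: cosine(embed(u), q), reverse=True)[:k]

class SummaryMemory:
    """Summarization memory: after each session, keep a short summary
    instead of the full dialogue history."""
    def __init__(self):
        self.summaries = []

    def end_session(self, summary):
        self.summaries.append(summary)

    def context(self):
        return " ".join(self.summaries)

history = [
    "I adopted a golden retriever last month.",
    "The weather has been terrible lately.",
    "My dog loves playing fetch in the park.",
]
print(retrieve_context(history, "How is your dog doing?", k=1))

memory = SummaryMemory()
memory.end_session("Speaker adopted a golden retriever.")
print(memory.context())
```

Note the efficiency trade-off the paper studies: retrieval must keep (and search over) the full history, while the summary memory stores only a few sentences per session.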
Throughout their experiments, the authors observed an improvement in perplexity (the exponentiated average negative log-likelihood of a sequence) when the dialogue history was added, compared to a no-context baseline. Performance improved further when using the session summaries annotated by crowd workers, which are potentially more informative than the raw dialogue history. The gain is even more noticeable when evaluating the opening responses of a session.
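To make the metric concrete, here is a minimal sketch of that definition of perplexity, computed from per-token log-probabilities (a toy illustration, not the paper's evaluation code):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token.
    Lower is better: the model is less 'surprised' by the sequence."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns every token probability 0.25 has perplexity ≈ 4:
# it is as uncertain as a uniform choice among 4 tokens.
print(perplexity([math.log(0.25)] * 10))
```

This is why adding dialogue history lowers perplexity: the extra context raises the probability the model assigns to the true next tokens.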
In addition to the computed metrics, the authors performed a human evaluation with crowd workers. Two personas are randomly chosen from the validation set, and one is assigned to the crowd worker, who then converses with a model playing the other persona and rates its replies, including whether they refer to information learned in previous sessions. The authors concluded that their models were significantly better at mentioning previous topics, introducing new ones, and producing engaging responses.
Overall, the work focuses on different model architectures to help conduct long-term conversations more effectively.
In customer care, the techniques described above allow us to build better conversational agents: better at engaging users, handling complex user responses, and ensuring customized care based on previous interactions and requests.