What are Query, Key, and Value in the Transformer Architecture and Why Are They Used?

An analysis of the intuition behind the notion of Key, Query, and Value in the Transformer architecture and why they are used.

Ebrahim Pichka
Towards Data Science

Recent years have seen the Transformer architecture make waves in the field of natural language processing (NLP), achieving state-of-the-art results in a variety of tasks including machine translation, language modeling, and text summarization, as well as in other domains of AI such as vision, speech, and reinforcement learning.

Vaswani et al. (2017) first introduced the Transformer in their paper “Attention Is All You Need”, in which they used the self-attention mechanism without incorporating recurrent connections, allowing the model to focus selectively on specific portions of input sequences.

The Transformer model architecture — Image from the Vaswani et al. (2017) paper (Source: arXiv:1706.03762v7)

In particular, previous sequence models, such as recurrent encoder-decoder models, were limited in their ability to capture long-term dependencies and to parallelize computation. In fact, right before the paper came out in 2017, state-of-the-art performance in most NLP tasks was obtained by using RNNs with an attention mechanism on top, so attention existed before Transformers. By using the multi-head attention mechanism on its own and dropping the RNN part, the Transformer architecture resolves these issues, allowing multiple independent attention heads to process the whole sequence in parallel.
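To make the mechanism the rest of this post builds on concrete, here is a minimal NumPy sketch of the scaled dot-product attention defined in the paper, Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V. The shapes, the toy input, and the function names are illustrative assumptions for this sketch, not code from the original article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (seq_len_q, d_k), K: (seq_len_k, d_k), V: (seq_len_k, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query against every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len_q, seq_len_k)
    weights = softmax(scores, axis=-1)   # attention weights, each row sums to 1
    return weights @ V                   # weighted sum of the value vectors

# Toy self-attention example: 4 tokens, d_k = d_v = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)                             # (4, 8)
```

Note that nothing in this sketch is recurrent: every token attends to every other token in one matrix multiplication, which is what makes the computation parallelizable.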

In this post, we will go over one of the key details of this architecture, namely the Query, Key, and Value vectors, and try to make sense of the intuition behind them.
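As a preview of where we are heading, the sketch below shows how the same input embeddings are turned into queries, keys, and values by three separate learned projection matrices. The dimensions are assumed toy values, and the weight matrices are random placeholders here; in a real model they are trained parameters.

```python
import numpy as np

# Hypothetical dimensions for illustration: 4 tokens, model width 8, head width 8
seq_len, d_model, d_head = 4, 8, 8
rng = np.random.default_rng(1)

X = rng.normal(size=(seq_len, d_model))   # token embeddings for one sequence

# Three separate projection matrices (random stand-ins for learned weights)
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

# Each token is projected into three different "roles" of the same content
Q = X @ W_q   # intuitively: what each token is looking for
K = X @ W_k   # intuitively: what each token offers to be matched against
V = X @ W_v   # intuitively: the information a token passes on if attended to
print(Q.shape, K.shape, V.shape)   # (4, 8) (4, 8) (4, 8)
```

The rest of the post unpacks why splitting the input into these three roles is useful in the first place.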

Note that this post assumes you are already familiar with some basic concepts in NLP and deep learning, such as embeddings, linear (dense) layers, and, in general, how a simple neural network works.

First, let’s start by understanding what the attention mechanism is trying to achieve. For the sake of simplicity, let’s begin with a simple case of sequential data to understand what problem exactly…
