A model architecture defines how a machine learning model is structured—how data flows through it, how different components interact, and how it makes predictions.
There have been quite a few model architectures over the years. Right now, the Transformer architecture dominates the field.
But before the Transformer, Seq2Seq (Sequence-to-Sequence) was the big thing.
How does Seq2Seq work?
Seq2Seq is built with two main components:
• Encoder: Processes the input.
• Decoder: Generates the output.
Both work with sequences of tokens and, in the classic approach, use Recurrent Neural Networks (RNNs) or their more powerful versions—LSTMs and GRUs.
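To make the two components concrete, here is a minimal sketch. It assumes PyTorch and GRU layers; the vocabulary sizes and dimensions are placeholders, and a real model would add batching details, padding, and training code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Reads the input sequence and compresses it into a final hidden state."""
    def __init__(self, vocab_size: int, emb_dim: int, hidden_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        # src: (batch, src_len) token ids
        embedded = self.embedding(src)        # (batch, src_len, emb_dim)
        _, hidden = self.rnn(embedded)        # hidden: (1, batch, hidden_dim)
        return hidden                         # the "summary" of the whole input

class Decoder(nn.Module):
    """Generates the output sequence one token at a time."""
    def __init__(self, vocab_size: int, emb_dim: int, hidden_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token: torch.Tensor, hidden: torch.Tensor):
        # token: (batch, 1) the previously generated token id
        embedded = self.embedding(token)      # (batch, 1, emb_dim)
        output, hidden = self.rnn(embedded, hidden)
        logits = self.out(output.squeeze(1))  # (batch, vocab_size)
        return logits, hidden
```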
The encoder reads the input sequence step by step, updating its hidden state at each time step.
The final hidden state after processing the last input token represents the entire input sequence.
The decoder receives this final hidden state as its initial state and starts generating the output sequence, token by token.
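The hand-off between the two is easiest to see as two loops. The sketch below, again assuming PyTorch, uses GRU cells directly; all sizes and the SOS/EOS token ids are made up for illustration, and the weights are untrained, so the output is meaningless, but the data flow is the point.

```python
import torch
import torch.nn as nn

# Toy sizes and token ids, made up for illustration.
VOCAB, EMB, HID = 1000, 32, 64
SOS, EOS = 1, 2  # hypothetical start/end-of-sequence token ids

embed_in  = nn.Embedding(VOCAB, EMB)
embed_out = nn.Embedding(VOCAB, EMB)
enc_cell  = nn.GRUCell(EMB, HID)
dec_cell  = nn.GRUCell(EMB, HID)
project   = nn.Linear(HID, VOCAB)

src = torch.randint(0, VOCAB, (1, 7))   # one input sequence of 7 token ids

# --- Encoder: update the hidden state one time step at a time ---
hidden = torch.zeros(1, HID)
for t in range(src.size(1)):
    hidden = enc_cell(embed_in(src[:, t]), hidden)
# `hidden` is now the final hidden state: the whole input squeezed into one vector.

# --- Decoder: start from that state and generate token by token ---
token = torch.tensor([SOS])
generated = []
for _ in range(20):                       # cap the output length
    hidden = dec_cell(embed_out(token), hidden)
    token = project(hidden).argmax(dim=-1)  # greedy pick of the next token
    if token.item() == EOS:
        break
    generated.append(token.item())

print(generated)
```

Note how the decoder never sees the input tokens again: everything it knows about the input has to fit into that single hidden-state vector.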
The metaphor used in the book "AI Engineering" is apt: working with the final hidden state is like answering questions about a book based only on its summary. The final hidden state tries to capture everything from the input, but some details inevitably get lost.
Since RNNs work sequentially, we must process the entire input before generating even the first output token. The longer the input, the longer we wait before we get anything back. That doesn’t make for the best chatbot UX.
Later, the problem of relying only on the final hidden state, instead of looking back at the original input tokens, was solved by the attention mechanism. But that is a story for another time.
Sources:
• AI Engineering by Chip Huyen (O’Reilly, 2025), ISBN 978-1-098-16630-4
• https://d2l.ai/chapter_recurrent-modern/seq2seq.html
• https://arxiv.org/abs/1409.3215v3