Generative Pre-Trained Transformer (GPT)

GPT is a type of large language model.

Let’s break this down piece-by-piece in reverse order:

Language Models
Large
Transformer
Pre-Trained
Generative

A Language Model (LM) is a model that assigns probabilities to sequences of words. The probabilities are estimated (that is, the model is trained) from a collection of naturally occurring text, called a corpus. You can use an LM to select the most probable next word given a context of preceding words.
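As a rough illustration of that definition, here is a minimal sketch of a bigram model, which estimates the probability of the next word from counts of adjacent word pairs in a corpus. The corpus and the function names (`train_bigram`, `next_word_probs`, `most_probable_next`) are made up for this example, and a bigram model only looks at one preceding word, so it is far simpler than GPT.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each other word in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(next word | previous word) by relative frequency."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the context word never appeared in the corpus
    return {word: n / total for word, n in followers.items()}


def most_probable_next(counts, prev):
    """Pick the most probable next word, or None for an unseen context."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None


# Hypothetical toy corpus; real models are trained on billions of tokens.
toy_corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

model = train_bigram(toy_corpus)
print(next_word_probs(model, "the"))
print(most_probable_next(model, "the"))
```

In this toy corpus "the" is followed by "cat" twice out of six occurrences, so "cat" is selected as the most probable next word. Modern language models replace the count table with a neural network and condition on a much longer context, but the task is the same.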

Animated Transformer

_images/lm-hist.png — Fig. 1 Evolution of language models over time

Large

We call a language model large when:

The model has billions of parameters
The model has been trained on billions of words/tokens

Why the focus on large language models (LLMs)? As scale has increased, new capabilities such as complex reasoning have emerged. The behavior of smaller language models gave no indication that this would happen.

_images/wikipedia-list.png — Fig. 2 LLM List

Transformer

Transformers were the key innovation that allowed language models to get large. They are a deep learning architecture that allows massive parallelization of training and inference on GPUs.

_images/attention-is-all-you-need.png — Fig. 3 Attention Paper Annotated Version

Pre-trained

Pre-trained language models have been trained via self-supervision on vast quantities of text. These are also called foundation models. They are not typically useful until…

Generative

Generative models are foundation models that have been further trained via supervised fine-tuning and reinforcement learning from human feedback (RLHF) to behave in a useful and safe manner, for example by responding to questions with answers like a chat assistant.

OpenAI:

OpenAI is an American artificial intelligence (AI) organization consisting of the non-profit OpenAI, Inc.[4] registered in Delaware and its for-profit subsidiary corporation OpenAI Global, LLC.[5] OpenAI researches artificial intelligence with the declared intention of developing “safe and beneficial” artificial general intelligence, which it defines as “highly autonomous systems that outperform humans at most economically valuable work”.[6]

OpenAI was founded in 2015 by Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Jessica Livingston, John Schulman, Pamela Vagata, and Wojciech Zaremba, with Sam Altman and Elon Musk serving as the initial board members. Microsoft provided OpenAI Global, LLC with a $1 billion investment in 2019 and a $10 billion investment in 2023.

_images/model-zoo.png — Fig. 4 Model Zoo

_images/openai-llm-history.png — Fig. 5 GPT Generations