State of GPT - 2023 by Karpathy
- Dennis Kuriakose
- Aug 15, 2024
- 2 min read
An excellent talk from Karpathy defining the moment in LLM training, applications, tools/frameworks, and use cases.
LLM training pipeline - Karpathy explains the significant engineering effort involved in building LLMs. The numbers in the slide below speak for themselves.

Reward Modeling - This is a small LLM training exercise in itself: a large set of prompts is collected, the LLM produces responses for them, and human raters score or rank each response. A reward model is then trained to predict those human preferences.
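A minimal sketch of the idea, assuming a PyTorch setup in which the reward model is an LLM backbone with a scalar head; the pairwise ranking loss below (the standard InstructGPT-style objective) pushes the score of the human-preferred response above the rejected one. The names (`backbone`, `hidden_size`) are illustrative, not from the talk.

```python
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Illustrative reward model: an LLM backbone plus a scalar value head."""
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                      # any module returning hidden states
        self.value_head = nn.Linear(hidden_size, 1)   # maps final hidden state to a scalar

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids, attention_mask)   # (batch, seq, hidden)
        last_hidden = hidden[:, -1, :]                       # state of the final token
        return self.value_head(last_hidden).squeeze(-1)      # one scalar reward per sequence

def pairwise_reward_loss(reward_chosen, reward_rejected):
    """Standard pairwise ranking loss: the human-preferred ('chosen') response
    should receive a higher reward than the 'rejected' one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```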
RLHF - Reinforcement Learning from Human Feedback is the magic that turns a base model into a human-like assistant with amazing abilities in language tasks and multi-modality. The reward model created earlier scores each response the LLM produces, and higher-rewarded responses are reinforced so the model learns to produce them more often.
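Full RLHF uses an RL algorithm such as PPO, but the core loop described here - the reward model scoring candidates and the better-scored ones winning - can be illustrated with a simple best-of-n reranking sketch. `generate` and `reward_model` are hypothetical stand-ins for a trained LLM and the reward model above.

```python
def best_of_n(prompt, generate, reward_model, n=8):
    """Sample n candidate responses and return the one the reward model scores highest.
    A simplified stand-in for the RL step: higher-rewarded responses 'go up'."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]
```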
System 1 vs System 2 thinking - An RLHF-trained model is still raw power: it simply imitates what it saw during training (System 1 thinking). Most tasks we encounter in the world, however, need planning, reflection, correction, and reformulation (System 2 thinking). How do we achieve this?
The answer lies in prompting - instructing the LLM to produce its responses in a certain way. Techniques range from simple prompts to more involved augmented prompting:
- Chain of Thought - ask the model to work through a step-by-step sequence of reasoning (see the sketch after this list)
- Ensembling - sample multiple attempts and choose the best response
- Ask for reflection - a second-round evaluation prompting the LLM to double-check its answer
- ReAct - a prompting pattern that interleaves the model's thoughts with actions, mirroring the sequence of human thinking
- AutoGPT - the LLM itself breaks a prompt down into multiple subtasks
- Condition on good performance - explicitly encourage and instruct the LLM to produce its best answer
- Retrieval Augmented Generation - prompts are enhanced with additional documents/context, which forces the LLM to work within that context and produce a better response (see the sketch after this list)
- LLM plugins - which talk to external APIs (e.g. a weather API) and tools (e.g. a calculator)
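To make the first two techniques concrete, here is a minimal sketch of chain-of-thought prompting combined with self-consistency (sample several reasoned answers and majority-vote on the final line). `ask_llm` is a hypothetical wrapper around whatever chat API is in use, and the prompt wording is only illustrative.

```python
from collections import Counter

COT_TEMPLATE = (
    "Q: {question}\n"
    "Let's think step by step, and give the final answer alone on the last line."
)

def cot_self_consistency(question, ask_llm, n_samples=5):
    """Sample several step-by-step solutions and majority-vote on the final answer."""
    finals = []
    for _ in range(n_samples):
        reasoning = ask_llm(COT_TEMPLATE.format(question=question), temperature=0.8)
        finals.append(reasoning.strip().splitlines()[-1])  # crude: last line holds the answer
    return Counter(finals).most_common(1)[0][0]
```

And a sketch of the retrieval-augmented pattern: fetch the documents most relevant to the query and prepend them to the prompt so the LLM answers within that context. The word-overlap retriever below is a toy stand-in for a real embedding-based search.

```python
def retrieve(query, documents, k=3):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def rag_prompt(query, documents):
    """Build a prompt that confines the LLM to the retrieved context."""
    context = "\n\n".join(retrieve(query, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```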
Finally, fine-tuning - traditionally a highly involved exercise, but adapting a model with additional training data for a specific context has become a lot more accessible with new techniques such as PEFT (Parameter-Efficient Fine-Tuning).
Karpathy closes with the critical limitations of current models and then some important recommendations.
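As an illustration of how lightweight this has become, here is a minimal LoRA setup with Hugging Face's `peft` library - a sketch assuming a small causal-LM checkpoint such as `gpt2`, with placeholder hyperparameters rather than recommendations from the talk.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base causal LM ("gpt2" is just a small, convenient example).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA: train small low-rank adapter matrices instead of all model weights.
lora_config = LoraConfig(
    r=8,              # rank of the adapter matrices
    lora_alpha=16,    # scaling factor for the adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```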
