Excellent talk from Karpathy defining the moment in LLM training, applications, tools/frameworks, and use cases.
LLM Training pipeline - Karpathy explains the significant engineering effort involved in building LLMs. The numbers in the slide below speak for themselves.
![](https://static.wixstatic.com/media/1dc0ca_3556e69f7cb44e4d974a76f8681f15e8~mv2.jpg/v1/fill/w_980,h_522,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/1dc0ca_3556e69f7cb44e4d974a76f8681f15e8~mv2.jpg)
Reward Modeling - A small LLM training exercise in itself: a large set of prompts is assembled, the LLM produces responses, humans rate each response, and a reward model is trained to predict those ratings.
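For intuition, here is a minimal PyTorch sketch of the pairwise ranking loss often used to train reward models on human preference comparisons; the reward values below are made-up toy numbers, not real model outputs:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: push the scalar reward of the human-preferred
    response above the reward of the rejected response."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy values: scalar rewards the model assigned to two responses per prompt.
r_chosen = torch.tensor([1.2, 0.3, 0.9])
r_rejected = torch.tensor([0.4, 0.5, -0.1])
print(reward_ranking_loss(r_chosen, r_rejected))  # lower is better
```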
RLHF - Reinforcement Learning from Human Feedback is the magic that turns a base model into a human-like assistant with impressive abilities in language tasks and multi-modality. The reward model created earlier scores each response the LLM produces, and training pushes the model toward the higher-rewarded responses.
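To make "pushing toward higher-rewarded responses" concrete, here is a toy REINFORCE update over three canned responses with assumed reward-model scores. Real RLHF uses PPO over token sequences with a KL penalty to the base model, so treat this purely as a sketch of the update direction:

```python
import torch

# Toy "policy" over three canned responses; rewards are assumed reward-model scores.
logits = torch.zeros(3, requires_grad=True)              # policy parameters
rewards = torch.tensor([0.1, 0.9, 0.4])                  # assumed reward-model scores
opt = torch.optim.SGD([logits], lr=0.5)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    response = dist.sample()                              # sample a response
    loss = -dist.log_prob(response) * rewards[response]   # reinforce it by its reward
    opt.zero_grad(); loss.backward(); opt.step()

print(logits.softmax(dim=0))  # mass typically concentrates on the 0.9-reward response
```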
System 1 vs System 2 thinking - An RLHF-trained model is still raw power: it imitates what it saw during training (System 1 thinking). However, most tasks we encounter in the world need planning, reflection, correction, and reformulation (System 2 thinking). How do we achieve this?
The answer lies in prompting - instructing the LLM to produce its responses in a certain way. Techniques range from simple prompting to more involved augmented prompting:
Chain of Thought - Ask the model to reason through the problem step by step before giving the answer
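In its simplest form this is just a suffix on the prompt. Here, `llm` is a hypothetical prompt-to-string completion call, not a real API:

```python
# `llm` is a hypothetical prompt -> str completion call.
question = "A jug holds 4 liters. How many jugs do I need for 18 liters?"
cot_prompt = f"{question}\nLet's think step by step."
# answer = llm(cot_prompt)  # the model now spells out its reasoning before answering
```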
Self-consistency - Ensemble multiple attempts and choose the best response
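A common way to do this is to sample several chains of thought and majority-vote the final answers. A sketch, again assuming a hypothetical `llm` callable:

```python
from collections import Counter

def self_consistency(question: str, llm, n: int = 10) -> str:
    """Sample n chain-of-thought completions and return the majority answer.
    `llm` is a hypothetical prompt -> str callable."""
    finals = []
    for _ in range(n):
        out = llm(f"{question}\nLet's think step by step, then end with 'Answer: <x>'.")
        finals.append(out.rsplit("Answer:", 1)[-1].strip())
    return Counter(finals).most_common(1)[0][0]
```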
Ask for reflection - A second-round evaluation that prompts the LLM to double-check its answer
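A sketch of the two-round pattern, with the same hypothetical `llm` callable:

```python
def answer_with_reflection(question: str, llm) -> str:
    """Round 1 drafts an answer; round 2 asks the model to double-check it.
    `llm` is a hypothetical prompt -> str callable."""
    draft = llm(question)
    return llm(
        f"Question: {question}\nDraft answer: {draft}\n"
        "Check the draft for mistakes and give a corrected final answer."
    )
```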
ReAct (Thought/Action/Observation) - A prompting pattern that mirrors the sequence of human thinking: reason about what to do, take an action, observe the result, and repeat
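A sketch of what such a prompt can look like; the tool names `search` and `calculate` are placeholders to be wired up by the surrounding harness:

```python
# ReAct-style prompt template; `search`/`calculate` are placeholder tool names.
REACT_TEMPLATE = """Answer the question by interleaving these steps:
Thought: reason about what to do next
Action: search[query] or calculate[expression]
Observation: the result of the action (filled in by the harness)
(repeat Thought/Action/Observation as needed)
Final Answer: the answer

Question: {question}"""

print(REACT_TEMPLATE.format(question="Who was born first, Euler or Gauss?"))
```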
AutoGPT - The LLM itself breaks a prompt down into multiple subtasks and works through them
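A heavily simplified decomposition loop, again with a hypothetical `llm` callable (the real AutoGPT adds memory, tool use, and self-critique on top of this):

```python
def decompose_and_run(goal: str, llm) -> list[str]:
    """Ask the model for a numbered plan, then execute each subtask in turn.
    `llm` is a hypothetical prompt -> str callable."""
    plan = llm(f"Break this goal into short numbered subtasks, one per line:\n{goal}")
    subtasks = [ln.split(".", 1)[-1].strip() for ln in plan.splitlines() if ln.strip()]
    return [llm(f"Goal: {goal}\nSubtask: {t}\nComplete this subtask.") for t in subtasks]
```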
Condition for good performance - Explicitly encourage the LLM and assign it a competent persona so it produces its best answer (e.g., "You are a leading expert on this topic; take your time and answer correctly")
Retrieval Augmented Generation - Prompts are enriched with additional documents/context, which grounds the LLM in that context and yields a better response
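A minimal RAG sketch with cosine-similarity retrieval; `embed` (text to vector) and `llm` are hypothetical callables standing in for an embedding model and a chat model:

```python
import numpy as np

def rag_answer(question: str, docs: list[str], embed, llm, k: int = 3) -> str:
    """Embed the docs and the question, take the top-k docs by cosine
    similarity, and stuff them into the prompt as context.
    `embed` (str -> np.ndarray) and `llm` (str -> str) are hypothetical."""
    doc_vecs = np.stack([embed(d) for d in docs])
    q = embed(question)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(docs[i] for i in np.argsort(-sims)[:k])
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```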
LLM plugins - Let the model talk to external APIs (e.g., a weather API) and tools (e.g., a calculator)
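A toy dispatch loop showing the idea: the model asks for a tool, the harness runs it and feeds the result back. The tool names and the `CALL` convention are made up for illustration:

```python
import re

# Toy tools; the weather function stands in for a real weather API call.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
    "weather": lambda city: f"Sunny in {city}",
}

def run_with_tools(prompt: str, llm) -> str:
    """If the model replies `CALL tool(args)`, run the tool and re-prompt.
    `llm` is a hypothetical prompt -> str callable; `CALL` is a made-up convention."""
    reply = llm(prompt + "\nYou may reply with CALL tool(args) to use a tool.")
    m = re.match(r"CALL (\w+)\((.*)\)", reply.strip())
    if m and m.group(1) in TOOLS:
        result = TOOLS[m.group(1)](m.group(2))
        reply = llm(f"{prompt}\nTool result: {result}\nNow answer the question.")
    return reply
```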
Finally, finetuning - traditionally a highly involved exercise, but adapting a model to a specific context with additional training data has become much more accessible with new techniques such as PEFT (Parameter-Efficient Fine-Tuning)
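LoRA is the best-known PEFT technique: freeze the pretrained weights and train only a low-rank update. A minimal sketch in PyTorch:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze a pretrained linear layer and learn a low-rank update B @ A,
    so only r * (in_features + out_features) parameters are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable vs 262656 frozen parameters
```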
Karpathy closes with critical limitations and then important recommendations
![](https://static.wixstatic.com/media/1dc0ca_9c3c37d2a3bf4776a4e91c3cf9d76375~mv2.jpeg/v1/fill/w_980,h_505,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/1dc0ca_9c3c37d2a3bf4776a4e91c3cf9d76375~mv2.jpeg)