
State of GPT - 2023 by Karpathy

Dennis Kuriakose

An excellent talk from Karpathy capturing the moment in LLM training, applications, tools/frameworks and use cases.



LLM Training pipeline - Karpathy walks through the significant engineering execution involved in building LLMs: pretraining on internet-scale text, supervised finetuning on curated prompt-response pairs, reward modeling, and reinforcement learning from human feedback. The numbers in his pipeline slide speak for themselves.

Reward Modeling - a smaller supervised training run in its own right: a large set of prompts is collected, the model generates multiple candidate responses for each, and human labelers rank them; a reward model is then trained to predict which response a human would prefer.
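A rough sketch of that training step (my own illustration, not from the talk): reward models are commonly trained with a pairwise ranking loss that pushes the score of the human-preferred response above the rejected one.

```python
# Minimal sketch of a reward model's pairwise ranking loss (assumption:
# the reward model emits one scalar score per prompt/response pair).
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(reward_chosen: torch.Tensor,
                          reward_rejected: torch.Tensor) -> torch.Tensor:
    # -log(sigmoid(r_chosen - r_rejected)): the loss shrinks as the
    # preferred response's score pulls ahead of the rejected one's.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Scores for a toy batch of three human-ranked preference pairs.
chosen = torch.tensor([1.2, 0.7, 2.1])
rejected = torch.tensor([0.3, 0.9, 1.0])
print(pairwise_ranking_loss(chosen, rejected).item())
```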


RLHF - Reinforcement Learning from Human Feedback is the magic that turns a base model into a human-like assistant with amazing abilities in language tasks and multi-modality. The reward model built earlier scores each response the LLM produces, and reinforcement learning nudges the model so that higher-rewarded responses become more likely.
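To make the mechanism concrete, here is a drastically simplified sketch. Production RLHF uses PPO with a KL penalty against the SFT model; this is a bare REINFORCE-style update, and `policy`, `reward_model` and their methods are hypothetical stand-ins.

```python
# Simplified REINFORCE-style RLHF update: responses the reward model
# scores highly get their log-probabilities pushed up.
import torch

def rlhf_step(policy, reward_model, prompt_ids, optimizer):
    # Sample a response and its token log-probs from the current policy.
    response_ids, log_probs = policy.sample(prompt_ids)  # hypothetical API
    # Score the (prompt, response) pair with the frozen reward model.
    with torch.no_grad():
        reward = reward_model(prompt_ids, response_ids)
    # Weight the log-likelihood by the reward: higher-rewarded responses
    # become more likely under the updated policy.
    loss = -(reward * log_probs.sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()
```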


System 1 vs System 2 thinking - An RLHF-trained model is still raw pattern-matching power: it imitates what it saw during training, producing each token in a single pass (System 1 thinking). Most tasks we encounter in the world, however, need planning, reflection, corrections and reformulations (System 2 thinking). How do we achieve this?


The answer lies in prompting - instructing the LLM to produce its responses in a certain way. The techniques range from simple prompting to more involved augmented prompting; a few of them are sketched in code after the list.


  1. Chain of Thought - ask the model to work through a step-by-step sequence of reasoning before answering

  2. Ensemble multiple attempts - sample several responses and pick the best (or majority) answer

  3. Ask for reflection - a second-round evaluation prompting the LLM to double-check its own answer

  4. ReAct (Reason + Act) - a prompting pattern that interleaves thoughts, actions and observations, mimicking the sequence of human deliberation

  5. AutoGPT - the LLM itself breaks a prompt down into multiple subtasks and works through them

  6. Condition on good performance - explicitly encouraging and instructing the LLM to produce its best answer

  7. Retrieval Augmented Generation - prompts are enriched with additional documents/context, which constrains the LLM to work within that context and produce a better-grounded response (see the RAG sketch below)

  8. LLM plugins - letting the model talk to external APIs (e.g. a weather API) and tools (e.g. a calculator)

  9. Finally, finetuning - a more involved exercise, though adding training data for a specific domain has become far more accessible with techniques such as PEFT (Parameter-Efficient Fine-Tuning); a toy LoRA layer is sketched below
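To make items 1 and 2 concrete, here is a minimal sketch; `complete` is a hypothetical wrapper around whatever LLM API you use, not a real library call.

```python
# Chain-of-thought prompting plus self-consistency ensembling.
from collections import Counter

def complete(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def chain_of_thought(question: str) -> str:
    # Item 1: ask for step-by-step reasoning before the final answer.
    return complete(
        f"{question}\nLet's think step by step, then give the final answer "
        f"on a line starting with 'Answer:'."
    )

def self_consistent_answer(question: str, n: int = 5) -> str:
    # Item 2: sample several reasoning paths and keep the majority answer.
    answers = []
    for _ in range(n):
        for line in chain_of_thought(question).splitlines():
            if line.startswith("Answer:"):
                answers.append(line.removeprefix("Answer:").strip())
    return Counter(answers).most_common(1)[0][0]
```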

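Retrieval Augmented Generation (item 7) follows the same pattern: retrieve relevant text, then fold it into the prompt. The sketch below uses toy word-overlap retrieval (real systems use vector embeddings and a vector store) and reuses the `complete` placeholder from the previous sketch.

```python
# Minimal retrieval-augmented generation: retrieve, then constrain the
# prompt to the retrieved context.

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    # Toy relevance score: count words shared with the query.
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n\n".join(retrieve(query, documents))
    # The instruction forces the LLM to answer from the supplied context.
    return complete(
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```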

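And for item 9, the idea behind PEFT methods such as LoRA is to freeze the pretrained weights and train only a small low-rank update. A toy PyTorch layer, as my own sketch rather than a production implementation:

```python
# LoRA in miniature: output = frozen_base(x) + scale * (x @ A^T @ B^T),
# where only the low-rank matrices A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B starts at zero, so training begins exactly at the base model.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(16, 4))
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 4])
```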
Karpathy closes with the models' critical limitations (biases, hallucination, knowledge cutoffs) and then important practical recommendations for using them.


