This year saw key developments from OpenAI, Stable Diffusion, Runway, InVideo, Google, Microsoft, Meta, Adobe and others, all racing ahead. What is striking is the number of ideas that got tested - ChatGPT plugins, AutoGPT, Mixtral and more. The next obvious observation is how far each product line from the leading AI players improved in just 9-10 months.
This year saw:
The introduction of multimodality in OpenAI and Google products
Significant improvements in creative video generation, with large-scale gains in creativity and productivity
Images are obviously part of the game, but they were already proven by the end of last year, even before ChatGPT made the headlines
January
Microsoft invested $10 billion in OpenAI for a stake in its growth and preferential access to its technology
ElevenLabs introduced a new text-to-speech product with significant improvements over previous versions
InstructPix2Pix introduced image editing with prompts, in addition to generating images from prompts
February
Bard was introduced, but still much behind GPT-4. In the same month, Bing brought GPT-4 into its search
ControlNet arrived, providing the ability to create images from scribbles, generate different poses from an image of a person or object, and use pictures as the basis for animation-like sequences
March
GPT-4 was introduced, with the ability to create a website from scribbled instructions written on a notepad as the prompt - nothing the world had seen until that point.
Midjourney v5 was introduced, creating photorealistic images - and five-fingered hands! - a step up from its earlier animation-like images
Adobe Firefly brought AI art generation to the Adobe platform. Packaged with Adobe's existing products, this was a powerful addition to their toolset
ChatGPT plugins were introduced. Plugins are tools designed for LLMs to access up-to-date information, run computations, or use 3rd-party services - Expedia, OpenTable etc
Runway Gen-1: upload an existing video to recreate its sequence of action in a completely different context (a man running on the street can be transposed to a Neanderthal running on Arctic ice).
InVideo introduced the capability to create a video from a text prompt, including via a mobile app
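Of the March items above, the plugin mechanism is worth a quick sketch: the model emits a tool name plus arguments, and a dispatcher runs the matching function. This is a minimal, hypothetical version of the pattern - the function names and the call format here are made up for illustration, not OpenAI's actual plugin API (which is described via OpenAPI manifests):

```python
def search_restaurants(city: str) -> list[str]:
    # Stand-in for a real 3rd-party service call (e.g. OpenTable).
    return [f"Bistro in {city}", f"Trattoria in {city}"]

def get_weather(city: str) -> str:
    # Stand-in for an up-to-date information lookup.
    return f"Sunny in {city}"

# Registry of tools the model is allowed to call.
TOOLS = {
    "search_restaurants": search_restaurants,
    "get_weather": get_weather,
}

def dispatch(tool_call: dict) -> object:
    """Route a model-issued tool call to the registered function."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# A model response asking to use a plugin might look like this:
result = dispatch({"name": "get_weather", "arguments": {"city": "Paris"}})
print(result)  # prints: Sunny in Paris
```

The design choice to keep a whitelist registry (rather than letting the model name arbitrary functions) is what makes the pattern safe to expose to an LLM.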
April
Meta introduced the ability to segment images and even move objects within an image to a different position
Wonder Dynamics: replace actors in footage with a computer-generated character
AutoGPT (and the similar BabyAGI) arrived: provide an end goal and the agent keeps generating its own prompts to pursue it
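The AutoGPT idea - one end goal, with the agent generating its own follow-up tasks until it decides it is done - can be sketched as a loop. The `fake_llm` planner below is a toy stand-in for a real model call; the task list is invented for illustration:

```python
def fake_llm(goal: str, done_tasks: list[str]) -> str:
    """Toy planner: returns the next task for the goal, or DONE."""
    plan = ["research topic", "draft outline", "write summary"]
    for task in plan:
        if task not in done_tasks:
            return task
    return "DONE"

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    """AutoGPT-style loop: keep asking the model for its own next task."""
    done: list[str] = []
    for _ in range(max_steps):
        task = fake_llm(goal, done)
        if task == "DONE":
            break
        done.append(task)  # a real agent would execute the task here
    return done

print(run_agent("write a report"))
# prints: ['research topic', 'draft outline', 'write summary']
```

The `max_steps` cap matters: without it, an agent whose planner never says DONE loops forever - a failure mode early AutoGPT users hit regularly.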
May
Geoffrey Hinton, considered a pioneer of AI, left Google to help humanity battle the dangers of AI such as misinformation
Google announced it would put AI into all Google products
A Senate hearing on regulating AI was held, attended by Sam Altman among others
OpenAI introduced text-to-3D generative models
Adobe Photoshop added generative AI tools - change an image with prompts directly inside the editor
June
Apple Vision Pro was introduced
Runway Gen-2: a text prompt or image prompt creates a video, at better quality than Gen-1
July
ChatGPT introduced Code Interpreter: upload PDF files and have it read them, or upload an Excel sheet to analyse the data inside
Anthropic's 100K-token model: add a 75K-word document and it summarises it for you. ChatGPT can't handle that many tokens as input, limiting its text summarisation capabilities
Llama 2 came on board as an open-source model - a ramp-up of open-source models
ChatGPT can now be primed with information before we start prompting, providing additional context to the model
Stable Diffusion SDXL 1.0 was introduced, with more flexibility to use your own images to give styles to the images it generates
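The 100K-token / 75K-word pairing mentioned above follows from the common rule of thumb of roughly 0.75 words per token. This is an approximation only - actual counts depend on the tokenizer and the language of the text:

```python
# Rule-of-thumb conversion, not an exact tokenizer count.
WORDS_PER_TOKEN = 0.75

def words_that_fit(context_tokens: int) -> int:
    """Approximate how many English words fit in a given context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(words_that_fit(100_000))  # prints: 75000
```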
August
Midjourney introduced Vary Region - it can re-render a selected section of an image with prompts
September
ChatGPT introduced multimodality: the ability to provide audio input and get audio output back, to include images as part of the prompt, and to prompt about a specific part of an image to ask questions about it
October
DALL-E 3 from OpenAI - built inside ChatGPT, it became contextual
Adobe introduced text-to-vector inside Illustrator
November
Elon Musk announced Grok, using all available Twitter data plus real-time data from Twitter
OpenAI held its developer day and introduced GPTs. GPTs are custom bots you can build inside ChatGPT, tailored to specific roles - 'Tech Advisor', 'Creative Writing Coach' etc. Assistants were added alongside - the API equivalent of GPTs
Latent Consistency Models, which allow actions from one image to be transferred to modify the behaviour of another image
The Sam Altman & Greg Brockman saga, and how it all played out
December
Gemini came along - Gemini inside Bard
Mixtral, a mixture-of-experts model - a new way of building LLMs, where a router sends each part of your question to more specialised expert sub-models
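A minimal sketch of the mixture-of-experts idea, with toy numeric functions standing in for the feed-forward expert sub-networks a real model like Mixtral uses: a gate scores every expert, only the top-k actually run, and their outputs are mixed by the normalised gate weights. All numbers here are illustrative:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy "experts": in a real model each is a specialised feed-forward network.
EXPERTS = [
    lambda x: x * 2.0,
    lambda x: x + 1.0,
    lambda x: x * -1.0,
    lambda x: x / 2.0,
]

def moe_forward(x: float, gate_scores: list[float], k: int = 2) -> float:
    """Route x to the top-k experts and mix their outputs by gate weight."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return sum(w * EXPERTS[i](x) for w, i in zip(weights, top))

print(moe_forward(3.0, gate_scores=[0.9, 0.8, 0.1, 0.2]))
```

The efficiency win is that only k of the experts run per input (here 2 of 4), so the model has more total parameters than it pays for on any single forward pass.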
Comments