
Generative AI in 2023 - Headlines and breakthroughs - Matt Wolfe

Dennis Kuriakose

The key developments at OpenAI, Stable Diffusion, Runway, Invideo, Google, Microsoft, Meta, Adobe and others made headlines throughout the year. What is striking is the number of ideas that got tested - ChatGPT plugins, AutoGPT, Mixtral etc. The next obvious observation is how far each product line from the leading AI players improved in just 9-10 months.


This year saw:


  • The introduction of multimodality in OpenAI and Google products

  • Significant improvements in creative video generation, along with large-scale creativity and productivity gains

  • Images are obviously part of the game, but image generation was already proven by the end of the previous year, even before ChatGPT made the headlines



January

  • Microsoft invested $10 billion in OpenAI for a stake in its growth and preferential access to its technology

  • ElevenLabs introduced a new text-to-speech product with significant improvements over previous versions

  • InstructPix2Pix - introduced image editing with prompts, in addition to generating images from prompts

February

  • Bard was introduced, but it was still far behind GPT-4. In the same month Bing introduced an OpenAI-powered chat inside its search (later confirmed to be running GPT-4)

  • ControlNet - provided the ability to create images from scribbles, to create different poses from an image of a person or object, and to use pictures as the basis for animation-like sequences


March

  • GPT-4 was introduced, with the ability to create a website from scribbled instructions written on a notepad as the prompt, something the world had not seen until that point.

  • Midjourney v5 was introduced, creating photorealistic images and five-fingered hands! (a step up from the animation-like images of earlier versions)

  • Adobe Firefly - AI art images from the Adobe platform. Packaged with Adobe's existing platform, this was a powerful addition to their toolset

  • ChatGPT plugins were introduced. Plugins are tools that let LLMs access up-to-date information, run computations or use third-party services - Expedia, OpenTable etc

  • Runway's Gen-1 - upload an existing video and have its sequence of action set in a completely different context (a man running on the street can be transposed to a Neanderthal man running on Arctic ice).

  • Invideo introduced the capability to create a video from a text prompt, including in a mobile app


April

  • Meta introduced the ability to segment images (Segment Anything) and even move objects to a different position within an image

  • Wonder Dynamics - replace actors with a computer-generated character

  • AutoGPT and BabyAGI - provide an end goal and the agent keeps generating its own prompts
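
The "keeps generating its own prompts" idea can be sketched as a small loop. This is only a toy illustration, not AutoGPT's actual code: `run_agent`, `toy_planner` and the "DONE" marker are hypothetical names, and the planner is a hard-coded stand-in for a real LLM call.

```python
from typing import Callable, List


def run_agent(
    goal: str,
    propose_next: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> List[str]:
    """Repeatedly ask the planner for the next prompt until it says DONE."""
    history: List[str] = []
    for _ in range(max_steps):
        # The next prompt depends on the goal and on everything done so far.
        step = propose_next(goal, history)
        if step == "DONE":
            break
        history.append(step)
    return history


def toy_planner(goal: str, history: List[str]) -> str:
    """Hypothetical stand-in for an LLM: walks through a fixed three-step plan."""
    plan = [f"research: {goal}", f"draft: {goal}", f"review: {goal}"]
    return plan[len(history)] if len(history) < len(plan) else "DONE"


if __name__ == "__main__":
    for prompt in run_agent("write a blog post", toy_planner):
        print(prompt)
```

The `max_steps` cap is a design choice of this sketch: a loop that writes its own prompts needs some stopping rule besides the planner's own judgement.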


May

  • Geoffrey Hinton, who is considered a pioneer in AI, left Google to help humanity battle the dangers of AI, including misinformation

  • Google announced it would put AI in all Google products

  • A Senate hearing on regulating AI was held, attended by Sam Altman amongst others

  • OpenAI introduced a text-to-3D generative model

  • Adobe Photoshop added generative AI tools - change the image with prompts inside the application


June

  • Apple Vision Pro was introduced

  • Runway Gen-2 - a text prompt or an image prompt creates a video, with better quality


July

  • ChatGPT introduced Code Interpreter - upload PDF files and have it read them, or upload an Excel sheet and have it analyse the data inside

  • Anthropic's 100K token model - add a 75K word document and it summarises it for you. ChatGPT can't handle that many tokens as input, which limits its text summarisation capabilities

  • Llama 2 arrived as an open source model, and open source models ramped up

  • ChatGPT can now be given standing information (custom instructions) before we start prompting the model. This provides additional context

  • Stable Diffusion SDXL 1.0 was introduced, with more flexibility to use your own images to style the images it generates


August

  • Midjourney introduced Vary Region - it can modify a selected section of an image with prompts


September

  • ChatGPT introduced multimodality - the ability to provide audio input and get audio output back. It also accepts images as part of the prompt, and you can point to a part of the image to ask specific questions


October

  • DALL-E 3 from OpenAI - inside ChatGPT, it became contextual

  • Introduction of text to vector in Adobe Illustrator

November

  • Elon Musk announced Grok, which uses the data available on Twitter, including real-time data

  • OpenAI held its developer day and introduced GPTs. GPTs are custom bots you can set up inside ChatGPT, tailored to specific roles - 'Tech Advisor', 'Creative writing coach' etc. Assistants were also added, which are the equivalent of GPTs but available through the API

  • Latent Consistency Models - allow actions in one image to be matched and used to modify the behaviour of another image

  • The Sam Altman & Greg Brockman saga and how it all played out


December

  • Gemini came along, with Gemini running inside Bard

  • Mixtral, a mixture-of-experts model - a new way of building LLMs. Your question is routed to more specialised expert sub-models
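
To make the mixture-of-experts idea a little more concrete, here is a minimal sketch of top-2 routing. It is a toy under stated assumptions and not Mixtral's implementation: the function names, the hand-written gate scores and the four tiny "experts" are all hypothetical. In a real model the router and the experts are learned neural network layers, and routing happens per token rather than per question.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalise their weights."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_output(
    x: float,
    gate_scores: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Weighted sum of the chosen experts' outputs; the other experts never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))


if __name__ == "__main__":
    # Hypothetical experts and gate scores, for illustration only.
    experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
    gate_scores = [0.1, 2.0, -1.0, 2.0]
    print(route_top_k(gate_scores))
    print(moe_output(3.0, gate_scores, experts))
```

The point of the design is that only the selected experts do any work for a given input, so the model can hold many specialised parts while using just a few of them at a time.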







©2024 Collationist.
