Introduction to Generative AI
A new step towards creativity
A powerful new class of foundation AI models is making it possible for machines to write, code, draw, animate, compose, and create with credible and sometimes superhuman results.
AI has long been, and still is, very good at analyzing things, sometimes even better than humans. AI models excel at analyzing huge sets of data and finding patterns in them for a multitude of use cases:
Classifying or predicting values from input data of all kinds
Structured/tabular data
Unstructured data (text, signals, etc.)
Images, video, sound, perceptual signals
This could be classified as Analytical AI. But humans are also good at creating: literature, art, music, and more.
Up until recently, machines had no chance of competing with humans at creative work—they were relegated to analysis and rote cognitive labor. But machines are just starting to get good at creating sensical and beautiful things. This new category is called “Generative AI,” meaning the machine is generating something new rather than analyzing something that already exists.
Every industry that requires humans to create original work—from social media to gaming, advertising to architecture, coding to graphic design, product design to law, marketing to sales—is up for reinvention.
Certain functions may be completely replaced by generative AI, while others are more likely to thrive from a tight iterative creative cycle between human and machine—but generative AI should unlock better, faster and cheaper creation across a wide range of end markets.
Neural networks are not new at all. Their foundations date back to the late 1940s, and many of their challenges (notably training) were gradually resolved.
A first turning point came in 2012, when the AlexNet network, trained on GPUs, far outperformed other approaches on the hard problem of image classification.
The move of these networks from analytical uses such as classification toward creative tasks began early, with architectures such as autoencoders and the possibility of generating new images and content by injecting random numbers into their latent space, as sketched below.
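As a minimal illustration, consider the decoder half of an autoencoder in PyTorch (the 32-dimensional latent space and 28×28 image size are arbitrary choices for this sketch). Once the autoencoder has been trained, sampling random latent vectors and decoding them yields new, never-seen images:

```python
import torch
import torch.nn as nn

# Hypothetical decoder of a trained autoencoder:
# maps a 32-dim latent vector to a 28x28 grayscale image.
decoder = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Sigmoid(),
)

# "Injecting random numbers into the latent space":
# draw random latent codes and decode them into novel images.
z = torch.randn(16, 32)                   # 16 random latent vectors
images = decoder(z).view(16, 1, 28, 28)   # 16 generated images
```

(With an untrained decoder this produces noise; the point is the mechanism: random latent code in, new content out.)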
Another turning point was the GAN (Generative Adversarial Network), introduced by Ian Goodfellow in 2014. Two networks compete: a generator creates new content from random numbers, while a discriminator judges whether the generated content is adequate and "punishes" (retrains) the generator. After this adversarial process, the generator can produce new images, styles, and other types of content from totally random inputs, or assisted by a creator.
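One adversarial training step can be sketched in a few lines of PyTorch (the tiny fully connected networks and the random stand-in for a batch of real images are illustrative only, not the original paper's setup):

```python
import torch
import torch.nn as nn

latent_dim = 64
# Toy generator (noise -> flattened 28x28 image) and discriminator (image -> real/fake score)
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, 784)  # stand-in for a batch of real training images

# Discriminator step: learn to score real images as 1 and generated ones as 0
fake = G(torch.randn(32, latent_dim)).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: adjust G so the discriminator scores its output as real
g_loss = bce(D(G(torch.randn(32, latent_dim))), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Repeating these two steps over real data is the "competition" that eventually lets the generator produce convincing content.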
The appearance of the transformer architecture, introduced by Google in 2017, took text generation and its creativity to new heights. The technique was soon applied to domains beyond text.
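To get a feel for transformer-based text generation, a small open model such as GPT-2 can be run in a few lines with the Hugging Face transformers library (the model and prompt here are illustrative choices):

```python
from transformers import pipeline

# GPT-2 is a small, openly available decoder-only transformer
generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])
```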
The combination of techniques such as transformers and diffusion models led to a real game changer in 2022, with the development of powerful foundation models for AI content generation in almost any field: image, video, animation, text, music, sound, etc.
OpenAI created and introduced GPT-3 (Generative Pre-trained Transformer 3) in 2020. This LLM leverages deep learning to generate human-like text, code, and more.
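Calling GPT-3 took only a few lines of Python with the OpenAI client; the sketch below assumes the legacy (pre-1.0) openai library and a GPT-3-family model name, with a placeholder API key:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

# Legacy completions call against a GPT-3-family model
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a haiku about generative AI.",
    max_tokens=64,
)
print(response.choices[0].text)
```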
DALL-E, an AI image generator by OpenAI, is a “neural network that creates images from text captions for a wide range of concepts expressible in natural language. DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.”
Microsoft invested $1B in OpenAI, and its new Microsoft Designer uses AI-generated images from DALL-E 2.
GitHub Copilot, essentially autocomplete for coders, is another application of this technology. GitHub Copilot uses the “OpenAI Codex to suggest code and entire functions in real-time, right from your editor.”
Stable Diffusion is a “latent text-to-image diffusion model capable of generating photo-realistic images given any text input” that, in its creators’ words, “cultivates autonomous freedom to produce incredible imagery, empowers billions of people to create stunning art within seconds.”
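Because the model weights are openly released, Stable Diffusion can be run locally with the Hugging Face diffusers library; this sketch assumes a CUDA GPU and uses one of the public v1.5 checkpoints (model id and prompt are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a public Stable Diffusion checkpoint (downloads weights on first run)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```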
Stability AI just raised a $101M funding round and has over 5,000 A100 GPUs, making its cluster one of the biggest supercomputers in the world; Stable Diffusion was trained on roughly 2B images. Stability AI’s work is open source and backed by a developer community “with over 200,000 members who are building AI for the future.”
“DreamStudio by Stability AI is a new AI system powered by Stable Diffusion that can create realistic images, art and animation from a description in natural language.”
Other tech giants like Google and Meta have their own generative AI models:
Google has Parti (Pathways Autoregressive Text-to-Image model), “an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.”
Meta has ‘Make-A-Scene’, which takes not only text prompts but also sketches to create high-definition visual masterpieces on a digital canvas.
I am excited about the future of content that can be personalized and contextualized. I believe this technology will not replace but rather enhance human creativity, and I look forward to seeing many different and unique use cases of generative AI.
Here are a couple of projects that I’ve come across that are using this technology:
Lex, a word processor with artificial intelligence baked in, so you can write faster.
Chula, an AI assistant that creates graphics for presentations.
Metaphor, a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3.
Runway, next-generation content creation with artificial intelligence.
For more, MagicTools has a collection of 200+ AI tools that you can explore!
Source: Sequoia Capital, “Generative AI: A Creative New World,” https://www.sequoiacap.com/article/generative-ai-a-creative-new-world/