text to video—
frames from imagination
nothing is real
earlier this year i wrote about five artificial intelligence megatrends to watch. one of those was text-to-video, and this week, openai broke the internet by releasing sora.
the release of the sora model went viral, with openai ceo sam altman soliciting text prompts on X and replying with realistic-looking videos. competing models from pika and runwayml suddenly look extinct.
the model generates videos up to a minute long while adhering closely to the user’s prompt, and wow, these videos look realistic.
tim brooks and bill peebles are the lead authors of the technical report, along with more than ten others. “we take inspiration from large language models which acquire generalist capabilities by training on internet-scale data,” they say in the paper. “given a compressed input video, we extract a sequence of spacetime patches which act as transformer tokens.”
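to make “spacetime patches” a little more concrete, here’s a rough sketch of what carving a compressed video into transformer tokens might look like. the tensor shapes, patch sizes, and the spacetime_patches helper are my own illustrative assumptions, not openai’s actual implementation:

```python
import torch

# hypothetical latent video from a visual encoder: (time, channels, height, width).
# all shapes and patch sizes here are illustrative guesses, not sora's real numbers.
latent = torch.randn(16, 4, 32, 32)  # 16 compressed frames, 4 latent channels, 32x32 spatial

def spacetime_patches(x, pt=2, ph=4, pw=4):
    """carve a latent video into non-overlapping (pt, ph, pw) spacetime bricks,
    flattening each brick into a single token vector for a transformer."""
    t, c, h, w = x.shape
    # split time, height, and width into (count, patch_size) pairs
    x = x.reshape(t // pt, pt, c, h // ph, ph, w // pw, pw)
    # bring the three "count" axes to the front, keeping each brick's contents together
    x = x.permute(0, 3, 5, 2, 1, 4, 6)      # (T', H', W', c, pt, ph, pw)
    return x.reshape(-1, c * pt * ph * pw)  # (num_tokens, token_dim)

tokens = spacetime_patches(latent)
print(tokens.shape)  # torch.Size([512, 128]): 512 tokens, each a 128-dim vector
```

in the report’s framing, those token vectors play the same role for video that word tokens play for a language model.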
the model is trained on videos at their native aspect ratios, which the authors say improves composition. it can animate still images, extend generated videos forward or backward in time, perform video-to-video editing, stitch videos together, generate images, and even simulate digital worlds like minecraft.
basically, it’s light-years ahead of any other text-to-video model.
“these capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the authors say.
so what does this all mean for the world?
one change i’ve noticed after writing this substack for more than a year, starting before chatgpt was released, is that i increasingly feel as though none of my thoughts are intelligent enough to be worth writing down. i’ve begun to trust chatgpt as an extension and enhancer of my own intelligence. i’m not a scientist or a scholar, and for me, most of what chatgpt creates is frankly better and faster than what i could produce. even my own opinions don’t feel as meaningful.
moreover, i feel less creative. generative models are often far more creatively skilled than i am, and my own creativity is limited by what i’ve experienced.
and now video is in the mix. i won’t just trust openai and models like it for intelligence; i’ll trust them for creative output and learning, too.
i’m buckling my seatbelt and hanging on for dear life, expecting all of this to accelerate fast in the months ahead. my guess is that, given enough gpus, a model that can produce a minute-long video today will be producing two-hour movies by the end of the year.
i’m preparing to sit down to a netflix show limited only by my imagination. this is a world where we’ll all write the reality around us. i’ll rewrite the endings of seinfeld and game of thrones. shit, i’ll write new shows, with new characters from my own life.
openai and companies like it will become the new netflix, the new cnn, the new youtube.
meanwhile, human imagination will become even more precious. it will be our last modicum of control, and it’s what will save us while the world burns around us from a lack of trust in anything.
then again, at least we’ll be able to reshape our realities with sora videos of bunnies and rainbows.
damn, it feels good to be a monkey.