I imagined it would require technical skill-like some sort of advanced prompt engineering where I'd need to specify exactly how each file interacted with every other file. I thought I'd need to understand the "rules" of combining images with audio, or know the exact syntax for referencing multiple inputs. The reality was much simpler. Multi-modal input just means you can throw different types of files at Seedance 2.0 and tell the model
In a new viral AI video, Brad Pitt and Tom Cruise pummel each other on a rooftop in a cinematic action sequence. It's not a trailer for a new blockbuster, and it's not actually Pitt and Cruise, though it looks a lot like them. The video is so realistic, in fact, that the clearest sign it's made with AI is the dialogue.