TL;DR
Veo 3 can be bossed around using JSON.
You can literally lay out your entire scene, character, camera movement, and audio plan down to the second.
I also built a custom GPT that turns any idea into a working JSON prompt for Veo 3.
Why I Use JSON with Veo 3
Natural language is fine when you're riffing ideas, but when you're building video content that has to look the same every time, you need structure.
That’s where JSON comes in.
What I like about it:
It forces you to be precise.
It keeps stuff consistent across shots.
You can reuse templates and tweak as needed instead of starting from scratch every time.
Text prompting is nice when you’re starting out, but JSON is what locks things in place when it's time to get serious, especially if you're into AI filmmaking.
JSON Prompting vs Just Typing Prompts
Here’s how they compare based on my own trial.
Best flow: Write freely to brainstorm → switch to JSON when you’re ready to lock it in.
A Simple JSON Example
Here’s a basic structure I used to test a noir chase scene:
{
  "scene": {
    "background": "rainy city street",
    "style": "neo-noir"
  },
  "characters": [
    {
      "name": "Detective",
      "appearance": "long coat and hat",
      "action": "running",
      "expression": "focused"
    }
  ],
  "camera": { "angle": "wide", "motion": "tracking" },
  "timeline": { "duration": "8s" },
  "audio": { "effects": ["rain", "footsteps"] }
}
That gave me: a wide shot of the detective running through a moody street, with rain and footstep sounds synced to the pacing. It looked pretty slick.
Breaking Down the JSON Pieces
To keep things organized (and my brain from short-circuiting), here’s how I think about JSON blocks:
Scene: Location, time, weather, overall look.
Characters: Names, outfits, what they’re doing, and how they look.
Camera: Angle, movement, transitions, per-shot timing.
Timeline: Total length, what happens when.
Audio: Voice, music, SFX, timing for each.
Once you get used to it, it’s like filling out a checklist.
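If you want the checklist to check itself, here's a minimal Python sketch. It assumes the five block names used in this post (`scene`, `characters`, `camera`, `timeline`, `audio`); the `missing_blocks` helper is just an illustrative name, not part of any Veo tooling.

```python
# The five top-level blocks described above.
BLOCKS = ["scene", "characters", "camera", "timeline", "audio"]

def missing_blocks(prompt):
    """List the top-level blocks that are absent from a prompt dict."""
    return [name for name in BLOCKS if name not in prompt]

draft = {
    "scene": {"background": "rainy city street", "style": "neo-noir"},
    "camera": {"angle": "wide", "motion": "tracking"},
    "timeline": {"duration": "8s"},
}

print(missing_blocks(draft))  # ['characters', 'audio']
```

An intentionally empty block, such as `"characters": []` in a product shot, still counts as present here.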
My Go-To Process (7-Step JSON Flow)
Start with the vibe — what’s the point of the scene? (e.g., “cold open chase with tension”)
Scene it up — think: where are we? What's the look?
Add characters — name, outfit, motion, mood.
Camera stuff — angle, moves, maybe a transition.
Timeline beats — map out moments (keep total under 8s for now).
Layer in audio — optional but worth it for polish.
Run a JSON validator — syntax issues can wreck your output.
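For the validator step, any online checker works, but a few lines of Python do the job too. This is a rough sketch using only the standard `json` module; `check_prompt` is a made-up helper name, and it only checks syntax, not whether Veo will like the content.

```python
import json

def check_prompt(text):
    """Return (ok, message) for a JSON prompt string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the spot where the parser gave up
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "valid JSON"

good = '{"timeline": {"duration": "8s"}}'
bad = '{"timeline": {"duration": "8s",}}'  # trailing comma, a classic slip

print(check_prompt(good))
print(check_prompt(bad))
```

Trailing commas and missing quotes are the usual culprits, and the line/column in the message gets you to them quickly.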
Veo model options
Veo 3 – Fast (Beta Audio)
Use Case: Rapid prototyping or iterating visual ideas quickly with audio support.
Best For: Quick concept drafts, social media snippets, internal reviews.
Why Choose It: Fast rendering + audio = great for fast turnaround on dialogue or narration scenes.
Veo 2 – Fast (No Audio)
Use Case: Speedy generation where audio is not required.
Best For: Silent visual storytelling, animated GIFs, moodboards.
Why Choose It: Light on compute, fast results for test renders.
Veo 3 – Quality (Beta Audio)
Use Case: High-end visuals with voiceovers or sound design.
Best For: Final deliverables, cinematic trailers, ad spots.
Why Choose It: Combines stunning visuals and audio for top-tier production value.
Veo 2 – Quality (No Audio)
Use Case: Beautiful high-resolution visuals without sound.
Best For: Visual-only films, storyboard animations, B-roll content.
Why Choose It: When audio isn’t needed but you still want excellent visual fidelity.
Quick Templates I Use All the Time
1. Product Ad (Clean and Quick)
{
  "scene": { "background": "<studio_or_location>", "style": "clean" },
  "characters": [],
  "camera": { "angle": "close-up", "motion": "slow_pan" },
  "timeline": { "duration": "6s", "sequence": ["reveal", "tagline"] },
  "audio": { "music": { "track": "<brand_theme>", "volume": "medium" } }
}
2. Explainer (Voiceover Driven)
{
  "scene": { "background": "<contextual_space>", "style": "minimal" },
  "characters": [],
  "camera": { "angle": "medium", "motion": "static" },
  "timeline": {
    "duration": "8s",
    "markers": { "vo_start": "0.5s", "cutaway": "5s" }
  },
  "audio": { "voiceover": "narrator" }
}
3. Short Film Vibe (Cinematic Flow)
{
  "scene": { "background": "<location>", "time": "dusk", "style": "cinematic" },
  "characters": [{ "name": "<protagonist>", "action": "<action>", "expression": "<mood>" }],
  "camera": { "angle": "wide", "motion": "tracking", "transition": "cut" },
  "timeline": { "duration": "8s", "sequence": ["establish", "beat", "reveal"] },
  "lighting": { "temperature": "warm", "shadows": "soft", "intensity": "low" }
}
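The `<...>` placeholders in these templates are easy to fill by hand, but a small script keeps you from forgetting one. Here's a sketch under my own assumptions: placeholders are lowercase names in angle brackets, and `fill` is a hypothetical helper. The example values are borrowed from the noir scene earlier.

```python
import json
import re

# Matches placeholders such as <location> or <brand_theme>.
PLACEHOLDER = re.compile(r"<([a-z_]+)>")

def fill(node, values):
    """Recursively replace <name> placeholders in every string of a parsed template."""
    if isinstance(node, dict):
        return {key: fill(val, values) for key, val in node.items()}
    if isinstance(node, list):
        return [fill(item, values) for item in node]
    if isinstance(node, str):
        # A missing value raises KeyError, so a forgotten placeholder fails loudly.
        return PLACEHOLDER.sub(lambda m: values[m.group(1)], node)
    return node

template = json.loads("""
{
  "scene": { "background": "<location>", "time": "dusk", "style": "cinematic" },
  "characters": [{ "name": "<protagonist>", "action": "<action>", "expression": "<mood>" }]
}
""")

shot = fill(template, {
    "location": "rainy city street",
    "protagonist": "Detective",
    "action": "running",
    "mood": "focused",
})
print(json.dumps(shot, indent=2))
```

Because the template is parsed before filling, the output is always valid JSON even if a value contains quotes.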
Keeping Characters Consistent
I ran into this early—my character kept changing from shot to shot. You can fix that by locking identity like this:
{
  "characters": [
    {
      "name": "Detective",
      "appearance": "trench coat, hat",
      "expression": "serious",
      "voice": "deep",
      "identity_lock": true
    }
  ]
}
Reuse this exact block in every clip and your “Detective” stays far more consistent: same voice, same look, fewer surprises.
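One way to make sure the character block really is identical in every clip is to define it once and build each shot from a copy. A minimal sketch: `make_shot`, the second location, and its action are illustrative assumptions, and `identity_lock` is treated as just another prompt key from the block above, not a documented setting.

```python
import copy
import json

# Single source of truth for the character.
DETECTIVE = {
    "name": "Detective",
    "appearance": "trench coat, hat",
    "expression": "serious",
    "voice": "deep",
    "identity_lock": True,
}

def make_shot(background, action, camera):
    """Build one clip prompt around a copy of the shared character block."""
    character = copy.deepcopy(DETECTIVE)
    character["action"] = action  # only the action changes per shot
    return {
        "scene": {"background": background, "style": "neo-noir"},
        "characters": [character],
        "camera": camera,
        "timeline": {"duration": "8s"},
    }

shots = [
    make_shot("rainy city street", "running", {"angle": "wide", "motion": "tracking"}),
    make_shot("dim stairwell", "climbing stairs", {"angle": "close-up", "motion": "static"}),
]
for shot in shots:
    print(json.dumps(shot, indent=2))
```

Copying instead of sharing the dict means a per-shot tweak can't leak back into the master block.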
Quick Tips
Did you set a "background" and "duration"?
Are your key names consistent? (Don't switch between "length" and "duration" for example.)
Keep it short: 8s max, 2 characters tops. Veo 3 clips top out at about 8 seconds, so a longer duration won't buy you anything.
Replace vague stuff like "cool lighting" with something you’d actually say to a DP (like: "low intensity, warm shadows").
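Most of these tips can be checked automatically before you hit generate. Here's a rough lint sketch, assuming durations are strings like "8s" as in the examples in this post. The `lint` function and the `ALIASES` table are my own hypothetical names, and the limits are simply the ones from the tips.

```python
import json

MAX_SECONDS = 8                   # clip length cap from the tips
MAX_CHARACTERS = 2                # character cap from the tips
ALIASES = {"length": "duration"}  # assumed slips mapped to the preferred key

def find_keys(node, found=None):
    """Collect every key name used anywhere in the prompt."""
    found = set() if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            find_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            find_keys(item, found)
    return found

def lint(prompt):
    """Return a list of human-readable problems (empty means all good)."""
    problems = []
    if "background" not in prompt.get("scene", {}):
        problems.append("scene.background is missing")
    duration = prompt.get("timeline", {}).get("duration")
    if duration is None:
        problems.append("timeline.duration is missing")
    elif float(duration.rstrip("s")) > MAX_SECONDS:
        problems.append(f"duration {duration} is over {MAX_SECONDS}s")
    if len(prompt.get("characters", [])) > MAX_CHARACTERS:
        problems.append(f"more than {MAX_CHARACTERS} characters")
    used = find_keys(prompt)
    for wrong, right in ALIASES.items():
        if wrong in used:
            problems.append(f'use "{right}" instead of "{wrong}"')
    return problems

draft = json.loads('{"scene": {"style": "clean"}, "timeline": {"length": "10s"}}')
for problem in lint(draft):
    print("-", problem)
```

It can't judge whether "cool lighting" is too vague, so that last tip is still on you.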
Leveling Up with JSON Extras
Need more control? Here’s an example of adding timed events:
{
  "timeline": {
    "duration": "8s",
    "events": [{ "time": "3s", "camera": { "angle": "close-up", "motion": "zoom" } }]
  }
}
And you can keep reusable blocks for characters, camera setups, lighting, etc. I store them in Notion and copy-paste as needed.
Also, adding a "note" is a sneaky move when you want to keep the tech side structured but toss in a creative nudge:
"note": "make it peaceful and cinematic"
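If Notion copy-paste gets tedious, the same reusable blocks, timed events, and note can be assembled in code. This is a sketch, not a required workflow: the block constants and the `with_events` helper are names I made up, and times are assumed to be strings like "3s".

```python
import json

# Reusable blocks, kept as plain dicts (values taken from the templates above).
CAMERA_WIDE_TRACKING = {"angle": "wide", "motion": "tracking"}
LIGHTING_WARM = {"temperature": "warm", "shadows": "soft", "intensity": "low"}

def seconds(value):
    """Convert a string such as "5.5s" to the float 5.5."""
    return float(value.rstrip("s"))

def with_events(duration, events):
    """Build a timeline block, rejecting events that fall outside the clip."""
    limit = seconds(duration)
    for event in events:
        if not 0 <= seconds(event["time"]) <= limit:
            raise ValueError(f'event at {event["time"]} is outside a {duration} clip')
    # Keep events in chronological order so the JSON reads like a shot list.
    ordered = sorted(events, key=lambda e: seconds(e["time"]))
    return {"duration": duration, "events": ordered}

prompt = {
    "scene": {"background": "rainy city street", "style": "neo-noir"},
    "camera": CAMERA_WIDE_TRACKING,
    "lighting": LIGHTING_WARM,
    "timeline": with_events("8s", [
        {"time": "3s", "camera": {"angle": "close-up", "motion": "zoom"}},
    ]),
    "note": "make it peaceful and cinematic",
}
print(json.dumps(prompt, indent=2))
```

The range check catches the easy mistake of scheduling a cut at 9s in an 8s clip.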
Use Cases I Actually Use This For
Brand openers (short, polished, consistent)
Product shots (reveal → detail → logo)
Narrative shorts (with a turn or mini twist)
Explainers (voice-led with cuts at key times)
Want to Skip JSON Wrangling?
I built a tool that writes the whole thing for you. Describe what you want in plain English, and it spits out:
A storyboard you can read and share.
A working, valid JSON file ready to drop into Veo 3.
Give it a go here:
Here’s a sample prompt I’ve used:
“Create a 6-second cinematic shot of a skateboarder at sunset, wide tracking, upbeat music. Add a close-up at 4s, fade music at 5.5s.”
And boom — storyboard + JSON in under 10 seconds.
Copy + Paste Starter JSON
Use this as your first test scene:
{
  "scene": { "background": "sunset skate park", "style": "cinematic" },
  "characters": [{ "name": "Skater", "appearance": "hoodie, sneakers", "action": "kickflip", "expression": "confident" }],
  "camera": { "angle": "wide", "motion": "tracking", "transition": "cut" },
  "timeline": {
    "duration": "6s",
    "events": [{ "time": "4s", "camera": { "angle": "close-up", "motion": "slow_zoom" } }]
  },
  "audio": { "music": { "track": "upbeat_rock", "volume": "medium", "fade_out": "5.5s" } }
}
Final Thoughts
If you’re serious about making Veo 3 work like a real production tool, JSON is kind of non-negotiable. It gives you control, repeatability, and keeps everything clean once you're collaborating.
But if you just want to start fast, try the GPT tool. It'll save you a bunch of time, and you can always dig into the JSON once you see how it's structured.
If this helped or sparked questions, reach out — or try making a clip and tag me.