How to Create Professional AI Videos with Google Gemini’s Cinematic Templates

You’ve probably typed a long, hopeful paragraph into an AI video tool — and gotten back something that looked nothing like what you imagined. That guessing game is over.

Google just rolled out cinematic templates inside Gemini, powered by the Veo 3.1 model. Instead of starting from a blank prompt, you now pick a visual style, drop in your own photo, and direct the action. It’s the difference between staring at a blank canvas and working from a professional storyboard.

This tutorial walks you through every step — from your first template to a polished, brand-ready video clip. Follow along and you’ll have a finished piece in minutes, not hours.

What You’ll Need Before You Start

A few quick prerequisites. You’ll need a Google account with an active AI subscription — Plus, Pro, or Ultra. The free tier doesn’t include video generation in Gemini. Your daily video limit depends on your plan: two clips per day on Plus, three on Pro, and five on Ultra.
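
For planning a batch of clips, the daily limits above reduce to simple division. The sketch below is only illustrative: DAILY_CLIP_LIMITS and days_needed are hypothetical names, and the numbers are the ones quoted in this article, so confirm them against your own plan.

```python
import math

# Daily video limits as quoted in this article (assumption: they may change).
DAILY_CLIP_LIMITS = {"Plus": 2, "Pro": 3, "Ultra": 5}


def days_needed(plan: str, clips_wanted: int) -> int:
    """Return how many days it takes to generate `clips_wanted` clips on `plan`."""
    if clips_wanted < 0:
        raise ValueError("clips_wanted must not be negative")
    per_day = DAILY_CLIP_LIMITS[plan]
    # Round up: a partially used day still counts as a day.
    return math.ceil(clips_wanted / per_day)
```

On these figures, ten clips take five days on Plus, four on Pro, and two on Ultra.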

You’ll also want a high-quality reference image ready. This is the photo that tells Veo 3.1 what your video should look like. Think of it as your visual brief. The better your image, the more consistent your output.

Templates launched on February 23, 2026, and are available on both gemini.google.com and the Gemini mobile app. If you don’t see the video option yet, make sure your app is updated.

Step 1: Open the Video Creator and Pick a Template

Head to gemini.google.com or open the Gemini app on your phone. Tap the Tools menu, then select “Create videos.”

You’ll see a gallery of cinematic templates — each one a pre-built visual rhythm designed for a specific mood or use case. The current lineup includes styles like Glam, Cyberpunk, Cosmos, Action Hero, Stardust, Jellytoon, ASMR Apple, Red Carpet, and more.

Each template is a creative scaffold. It handles the structure — the pacing, transitions, and visual tone. Your job is to fill it with your own subject matter and direction.

How to choose: Match the template to your goal. Launching a product? Go with Glam or Red Carpet. Building a character-driven story? Try Stardust or Cosmos. Creating something playful for social media? Jellytoon or ASMR Apple will give you that energy.

Step 2: Upload Your Reference Image

This step is more important than most people realize. Your reference photo isn’t a nice-to-have — it’s the anchor of everything.

Veo 3.1 uses your image to lock in the visual identity of the entire clip. That means color palette, lighting behavior, texture, clothing, art style — all of it flows from this single photo. If your reference is messy or stylistically confused, your video will reflect that.

What makes a strong reference image:

  • Clear, high-resolution, well-lit
  • Consistent art style (don’t mix photorealism with cartoon elements)
  • The subject is clearly visible — no cluttered backgrounds fighting for attention
  • The mood of the photo matches the mood you want in the video

Think of it this way: the image is your Art Director. It defines the visual language. The text prompt you write next is your Choreographer — it directs movement and timing. The template is your Producer — it sets structure and rhythm.

Veo 3.1 now supports up to three reference images per generation. This lets you guide characters, objects, and style elements separately for more complex scenes.
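
If you plan to use several references, it can help to note what each image is meant to guide before you upload. The sketch below is a planning aid only, not part of Gemini: the ReferenceImage class, the role labels, and the file names are all hypothetical, and the limit of three is the figure stated above.

```python
from dataclasses import dataclass

MAX_REFERENCE_IMAGES = 3  # limit quoted in this article
ALLOWED_ROLES = {"character", "object", "style"}  # hypothetical planning labels


@dataclass(frozen=True)
class ReferenceImage:
    path: str  # local file you intend to upload
    role: str  # what this image should guide


def check_references(images: list[ReferenceImage]) -> list[str]:
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    if not images:
        problems.append("no reference image provided")
    if len(images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(images)} images given, limit is {MAX_REFERENCE_IMAGES}"
        )
    for image in images:
        if image.role not in ALLOWED_ROLES:
            problems.append(f"{image.path}: unknown role {image.role!r}")
    return problems
```

An empty result means the set fits within the three-image limit and every image has a clear job.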

Step 3: Write a Motion-First Prompt (Stop Describing, Start Directing)

Here’s where most people go wrong. They upload a beautiful reference photo — and then describe the photo all over again in the prompt. That’s wasted space.

Veo 3.1 already “sees” your image. It knows the colors, the hair, the clothing, the lighting. You don’t need to repeat any of that.

Instead, use your text space exclusively to describe how things move.

❌ Weak prompt: “A girl with pink hair in pastel colors walking toward the camera.”

✅ Strong prompt: “The character walks toward the camera with a joyful bounce, her hair fluttering in a light breeze.”

See the difference? The strong version directs energy, rhythm, and emotion — not static attributes the AI already knows.

Your motion prompt checklist:

  • Body language: glances, gestures, posture shifts
  • Emotional pacing: hesitant, confident, playful, dramatic
  • Environmental interaction: fabric swaying, dust lifting, light flickering
  • Micro-movements: slow blink, subtle smile, head tilt
  • Momentum cues: sudden turn, accelerating step, soft pause

Advanced technique: Layer movement with intent. Don’t just say “she turns.” Say why she turns. Example: “The character pauses mid-step, glances back over her shoulder with a knowing smile, then continues forward as the wind catches her ribbon.” That single sentence gives Veo direction, emotion, and physical detail to work with.
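
One rough way to catch the "describing instead of directing" habit is to scan a draft for appearance words before you paste it in. This is a heuristic sketch, not anything Veo does: the STATIC_WORDS list is a hypothetical starting point that you would tune to your own reference image.

```python
import re

# Hypothetical list of appearance words the reference image already carries.
STATIC_WORDS = {"pink", "pastel", "red", "blue", "colors", "wearing", "dress"}


def static_words_in(prompt: str) -> list[str]:
    """Return the appearance words found in the prompt, sorted alphabetically."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return sorted({word for word in words if word in STATIC_WORDS})


weak = "A girl with pink hair in pastel colors walking toward the camera."
strong = (
    "The character walks toward the camera with a joyful bounce, "
    "her hair fluttering in a light breeze."
)
```

Here static_words_in(weak) flags "colors", "pastel", and "pink", while static_words_in(strong) returns an empty list. A flagged word is a prompt to reconsider, not a hard rule.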

Step 4: Add Cinematic Camera Directions

Static shots are fine. But if you want your AI video to feel like it was produced by a professional crew, you need to direct the camera — not just the character.

Veo 3.1 understands professional cinematography terms. Drop them directly into your prompt and the model will respond. Here are eight camera movements you can use right now:

  1. Dolly In/Out — Moves the camera closer or further. Perfect for emotional beats. A slow dolly-in builds intimacy; a dolly-out creates distance or revelation.
  2. Whip Pan — A fast, blurry horizontal turn. Great for high-energy transitions between subjects or moments.
  3. Drone Shot — A high-angle, sweeping view. Use this to establish a setting or show scale.
  4. Crane Shot — A vertical rise or descent. Ideal for dramatic reveals and emotional endings.
  5. Steadicam Shot — A smooth, floating follow. Creates immersive, natural tracking behind or beside your subject.
  6. Push-In — A slow forward move toward the subject. Builds tension or draws the viewer into a moment.
  7. Over-the-Shoulder (OTS) — Frames one character from behind another. Adds depth and perspective in dialogue or reveal scenes.
  8. Orbit Shot — A 360° circular movement around the subject. Powerful for introductions, transformations, or hero moments.

Pro tip: Match your camera movement to your audio energy. A slow dolly-in pairs beautifully with soft ambient music. A whip pan works better with sharp percussive beats or a whoosh transition sound.

Step 5: Direct the Soundscape and Dialogue

Veo 3.1 doesn’t just generate visuals — it generates synchronized audio. You can shape what the viewer hears by adding audio cues and dialogue instructions at the end of your prompt.

Think in three layers when designing your soundscape:

  • Ambient environment: birds, city hum, wind, café chatter, rain
  • Musical tone: lo-fi beats, orchestral swell, upbeat synth-pop, soft piano
  • Accent sounds: camera shutter, bell chime, footsteps, fabric rustle

How to format audio direction:

Add a line at the end of your prompt like this:

Audio: soft lo-fi beats, distant birds chirping, subtle wind ambience, and a gentle shop bell ‘ding’.

For dialogue, keep it simple. Use a colon format:

The character says: “Welcome to my shop!”
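
If you reuse the same audio layers across clips, the two formats above are easy to generate from lists. This is a minimal sketch of plain string formatting. The audio_line and dialogue_line helpers are hypothetical, and the output is just text to paste at the end of your prompt.

```python
def audio_line(music=(), ambient=(), accents=()) -> str:
    """Join the three sound layers into a single 'Audio: ...' line."""
    parts = [*music, *ambient, *accents]
    if not parts:
        return ""
    if len(parts) == 1:
        joined = parts[0]
    elif len(parts) == 2:
        joined = " and ".join(parts)
    else:
        # Three or more cues: comma-separated, with "and" before the last one.
        joined = ", ".join(parts[:-1]) + ", and " + parts[-1]
    return f"Audio: {joined}."


def dialogue_line(text: str, speaker: str = "The character") -> str:
    """Format a spoken line in the colon style shown above."""
    return f'{speaker} says: "{text}"'
```

Calling audio_line with music=["soft lo-fi beats"], ambient=["distant birds chirping", "subtle wind ambience"], and accents=["a gentle shop bell 'ding'"] reproduces the example audio line above.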

Bonus — Lyria 3 music direction: Google also launched Lyria 3, a full music generation engine, inside Gemini in February 2026. You can specify genre, tempo, instrumentation, and vocal style to create custom 30-second soundtracks. While Lyria 3 runs as a separate tool (under “Create music” in the Tools menu), you can generate a track and pair it with your video for a fully custom result. It supports eight languages and is available to all users 18 and older.

Step 6: Set Intentional Lighting and Pacing

If you write “daytime” in your prompt, you’ll get a technically correct but emotionally flat result. Lighting isn’t about time of day. It’s about mood.

Veo 3.1 responds extremely well to precise, mood-driven lighting instructions. Here are eight lighting styles you can use, each tied to a specific emotional effect:

  • Golden Hour Rim Light: Warm backlight outlining the subject. Use for romance, nostalgia, or beauty content.
  • Soft Overcast Diffusion: Flat, shadowless light. Perfect for calm, slice-of-life scenes.
  • Volumetric Fog with Light Rays: Dramatic beams cutting through mist. Use for mystery or epic reveals.
  • Neon Edge Lighting: Cyberpunk glow separating the subject from a dark background. High-energy, futuristic.
  • High-Contrast Noir Lighting: Sharp shadows. Tension, suspense, thriller vibes.
  • Dreamy Lens Flare + Bloom: Ethereal, pastel glow. Fantasy, fairy tale, or soft beauty aesthetics.
  • Cool Blue Moonlight: Cinematic night tone with subtle highlights. Moody, contemplative.
  • Studio Spotlight with Falloff: Controlled, premium look. Ideal for product reveals and hero shots.

Add your lighting direction after your motion prompt: “Lighting: golden hour rim light with dreamy lens flare and soft bloom highlights.”

For pacing, tell Veo exactly what you want. “Pacing: slow-motion elegance with a gentle pause before the product reveal” gives the model a clear rhythm to follow. Without pacing cues, you’ll get default timing — functional, but not cinematic.

Step 7: Put It All Together — A Complete Prompt Example

Let’s assemble everything into a single real-world example. Imagine you’re a beauty creator launching a pastel luxury fragrance brand on TikTok.

Template: Social Media Reveal (or Glam)

Reference image: A high-resolution beauty portrait — soft pastel florals, romantic lace dress, dreamy morning light, and a crystal perfume bottle in the foreground.

The prompt:

“The character rests her chin gently on her folded arms. A subtle whip pan transitions to a close-up of her holding the crystal perfume bottle, opening the bottle and spraying as petals softly fall around her. The camera performs a slow dolly-in as she lifts her gaze toward the viewer with a soft, confident smile. Lighting: golden hour rim light with dreamy lens flare and soft bloom highlights. Pacing: slow-motion elegance with a gentle pause before the product reveal. Audio: airy synth-pop melody layered with a soft glitter accent.”

Why this works:

  • The reference image locks in the romantic luxury identity — no need to describe colors or clothing in the text.
  • The action sequence (hold → open → spray) creates a clear product narrative with progression.
  • The whip pan adds modern social-media energy and keeps the elegant pacing from feeling sluggish.
  • The slow dolly-in paired with eye contact builds intimacy — a key trigger in beauty marketing.
  • Golden hour rim light plus soft bloom enhances perceived luxury and premium softness.
  • The slow-motion pacing elevates the spray moment, making the fragrance feel sensual rather than rushed.
  • The airy synth-pop with a glitter accent reinforces a light, feminine brand tone.

The output is a branded 9:16 clip that feels like a high-end fragrance campaign, generated in minutes rather than hours.
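
The prompt in this step follows a fixed order: motion and camera first, then the labeled Lighting, Pacing, and Audio lines. A minimal sketch of that assembly is below. PromptParts is a hypothetical helper that only builds text, and it assumes you still paste the result into Gemini yourself.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    motion: str          # character action and camera direction
    lighting: str = ""   # mood-driven lighting cue
    pacing: str = ""     # rhythm and tempo cue
    audio: str = ""      # soundscape cue

    def build(self) -> str:
        """Join the parts in the order used in this article, skipping empty ones."""
        sections = [self.motion.strip()]
        labeled = (
            ("Lighting", self.lighting),
            ("Pacing", self.pacing),
            ("Audio", self.audio),
        )
        for label, value in labeled:
            value = value.strip().rstrip(".")  # avoid a doubled full stop
            if value:
                sections.append(f"{label}: {value}.")
        return " ".join(sections)
```

Keeping the parts separate makes it easy to swap only the lighting or only the audio between takes while the motion direction stays fixed.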

Quick-Reference Checklist

Print this out or screenshot it. Run through it every time before you hit “Generate.”

  • Template selected — matches the mood and purpose of your video
  • Reference image uploaded — high-res, consistent style, clear subject
  • Prompt describes motion only — no repeated visual attributes from the image
  • Camera direction included — at least one cinematic movement specified
  • Lighting is mood-driven — not just “daytime” or “night”
  • Pacing cue added — tells Veo the rhythm and tempo of the scene
  • Audio direction layered — ambient + musical tone + accent sounds
  • Dialogue formatted simply — character says: “line” (if applicable)
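
Part of this checklist can be run automatically on a draft prompt. The sketch below covers only the four items that leave a trace in the text (camera, lighting, pacing, audio). The missing_items function and its keyword list are hypothetical, and a simple text search cannot judge whether a cue is any good.

```python
# Keywords taken from the camera movements described in this article.
CAMERA_TERMS = (
    "dolly", "whip pan", "drone shot", "crane shot",
    "steadicam", "push-in", "over-the-shoulder", "orbit",
)


def missing_items(prompt: str) -> list[str]:
    """Return the checklist items that do not appear in the prompt text."""
    text = prompt.lower()
    missing = []
    if not any(term in text for term in CAMERA_TERMS):
        missing.append("camera direction")
    if "lighting:" not in text:
        missing.append("lighting")
    if "pacing:" not in text:
        missing.append("pacing cue")
    if "audio:" not in text:
        missing.append("audio direction")
    return missing
```

An empty list means all four labeled cues are present. The template, the reference image, and the motion-only rule still need a human check.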

Common Mistakes to Avoid

Over-describing the subject. If you uploaded a reference photo of a character in a red dress, don’t write “a woman in a red dress” in the prompt. Veo already knows. You’re burning prompt space on redundant information.

Ignoring the reference image quality. A blurry, poorly lit, or stylistically mixed reference will produce a blurry, poorly lit, stylistically mixed video. Garbage in, garbage out. Invest a few minutes in picking the right anchor image.

Writing generic lighting. “Bright” and “dark” aren’t lighting directions — they’re guesses. Use specific, mood-driven terms from the lighting list above. The difference between “daytime” and “golden hour rim light with soft bloom” is the difference between a snapshot and a campaign.

Forgetting audio entirely. Veo 3.1 generates synchronized sound by default. If you don’t direct it, you’ll get generic ambient audio. Take 10 seconds to add an audio line. It dramatically changes the feel of the final clip.

Treating templates as final products. Templates are starting points, not finished pieces. The magic happens when you customize — swap the placeholders, upload your own images, and rewrite the motion cues to match your brand.

What’s Next

Once you’re comfortable with single clips, explore Veo 3.1’s scene extension feature. It lets you generate new clips that connect to your previous video, maintaining visual continuity. Each new segment picks up from the final second of the last clip. This means you can chain together multiple 8-second generations into longer sequences — potentially building scenes that run a minute or more.
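
To estimate how many generations a longer sequence takes, divide the target length by the clip length and round up. The sketch below assumes each generation contributes a full 8 seconds of new footage with no overlap, which this article does not confirm. The generations_needed function is a hypothetical name.

```python
import math

CLIP_SECONDS = 8  # clip length quoted in this article


def generations_needed(target_seconds: float, clip_seconds: int = CLIP_SECONDS) -> int:
    """Count the generations (first clip plus extensions) needed to reach the target."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / clip_seconds)
```

Under that assumption, a one-minute scene takes eight generations (64 seconds of footage), so plan it against your daily limit.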

You can also experiment with frame-specific generation: provide a starting image and an ending image, and let Veo create the transition between them, complete with audio.

The shift here is fundamental. You’re not typing wishes into a box and hoping for the best. You’re directing — choosing visual identity, choreographing movement, setting camera angles, designing soundscapes, and controlling mood through light.

Every template is leverage. Now go make something worth watching.