[CHARACTER DESCRIPTION image_1]
MAIN CHARACTER: Male student, same face as reference image, wearing Indonesian school uniform (white shirt slightly messy, gray tie loosened, gray pants). Calm introvert → transforms into aggressive, precise fighter. Identity must stay perfectly consistent.
BULLIES: 4 male students, rough, aggressive, chaotic movement.
[ENVIRONMENT/ART STYLE]
LOCATION: Old classroom, broken desks, scattered books.
ATMOSPHERE: Dark cinematic lighting, dust particles, dramatic shadows, chaotic destruction during fight.
STYLE: Ultra-realistic, 4K, gritty cinematic action, natural skin texture, realistic physics.
[CAMERA STYLE]
Handheld shaky cam, fast whip pans, dynamic tracking, strong impact shake, slow motion on hits.
[CINEMATIC TIMELINE - 15 SECONDS TOTAL]
SHOT 1 (0-2s) - Calm
Main character quietly studying. Slow push-in. Peaceful but tense.
SHOT 2 (2-4s) - Bullying
Bullies surround him, slam desk, grab collar, throw book. Camera shaky, tension spikes.
SHOT 3 (4-5.5s) - Breaking Point
Extreme close-up on his face (image_1). Expression turns cold, focused. Brief silence.
SHOT 4 (5.5-8s) - EXPLOSIVE COUNTER
Sudden burst - strong slap followed by rapid punch combo (jab-cross-hook). Heavy impact, camera shake, ultra fast motion.
SHOT 5 (8-12s) - CHAOTIC 1v4 FIGHT
- Fast dodging and weaving
- Counter punches, elbow strikes
- One enemy slammed into desk (breaks)
- Papers flying, dust exploding
Dynamic rotating camera
Slow motion on strongest hits
SHOT 6 (12-14s) - DOMINATION
Main character overwhelms remaining enemies, they fall or back off. Heavy breathing, slight bruises.
SHOT 7 (14-15s) - FINAL CLOSE-UP
Camera push-in. Intense stare, blood on lip, face must match image_1 perfectly. Cinematic lighting.
[IMPORTANT]
Use image_1 as facial reference.
Face must remain identical across all frames. No morphing, no distortion, stable identity even during fast motion.
{
"scene_description": "0-2 sec: The shot opens mid-fight in a shattered futuristic plaza at night. A powerful superhero in a reinforced tactical suit lunges forward toward a monstrous supervillain. The first two hits land instantly: a straight punch to the torso, then a spinning backfist to the jaw. Each impact triggers a sharp real-time angle cut while preserving continuous motion. \n\n 2-5 sec: Hits three through six accelerate the rhythm. The superhero drives a knee into the villain's ribs, follows with a rising elbow, a side kick to the chest, and a rapid midair hook punch. Every strike is shown from a new dynamic angle—low angle, over-the-shoulder, extreme close-up, and wide tracking shot—creating the sensation of a single unbroken combo inside a hyper-stylized real-time edit. Motion blur streaks off limbs and debris bursts from each impact point. \n\n 5-7 sec: Hits seven through nine come faster, almost breathless. The superhero dashes with a burst of energy, unleashing a rapid body blow, a spinning heel kick, and an explosive uppercut that lifts the villain off the ground. The cuts become more aggressive and tightly timed, building arcade-like momentum. \n\n 7-10 sec: Hit ten begins with a dramatic smash cut into slow motion. The superhero, glowing with charged power, drives a final devastating punch into the villain's face or chest at the apex of the launch. Time nearly freezes at the instant of impact. Shockwaves ripple outward, fragments of concrete and glowing particles suspend in the air, and the camera performs a smooth orbital move around the frozen collision before the enemy is blasted backward in the final frame.",
"visual_style": "3D animated superhero fight scene, high-end game cinematic, bold arcade energy, stylized realism, dramatic speed ramps, exaggerated impact effects, crisp character silhouettes, heavy motion blur, comic-book-inspired color grading, polished CGI textures",
"camera_movement": "Continuous combo presented through rapid angle cuts motivated by each strike. Alternates between low-angle hero shots, over-the-shoulder impact views, extreme close-ups on fists and feet, dynamic side tracking, and wide action framing. The pace increases through the combo, then smash-cuts into ultra slow motion on the final hit. The finishing blow is emphasized by a smooth 180 to 270 degree orbital camera move around the frozen impact.",
"main_subject": "A powerful superhero in a sleek armored suit with a glowing chest emblem, athletic build, confident aggressive posture, delivering a precise 10-hit combo with superhuman speed and control. Opponent is a massive supervillain with heavy armor-like skin and brute-force presence, absorbing the combo before being overwhelmed by the final blow.",
"background_setting": "A neon-lit futuristic city plaza damaged by battle, with cracked concrete, broken screens, sparks from exposed power lines, drifting smoke, scattered debris, and glowing signage reflecting off wet pavement. The environment reacts to each impact with dust bursts, shockwaves, and flying fragments.",
"lighting_mood": "High-contrast, electrified, and intense. Primary lighting comes from neon signage, blue-magenta city glow, and the superhero's pulsing energy aura. Bright rim lighting defines silhouettes, while impact flashes briefly overexpose frames during heavy hits. The final slow-motion blow is lit by a concentrated burst of radiant energy that freezes the scene in a mythic, triumphant mood.",
"audio_cue": "Explosive arcade-style sound design with layered punches, bass-heavy impact thuds, whipping limb swishes, cracking debris, and energy surges. Electronic hybrid action score builds in tempo with each hit, adding distorted synth pulses and percussive stabs. As the combo speeds up, the soundtrack intensifies. On the finishing blow, most sound drops into a muffled slow-motion vacuum with a deep sub-bass boom, ringing shockwave, and suspended particle shimmer before the final blast lands.",
"color_palette": "Bold, saturated superhero palette with electric blue, crimson red, deep violet, molten orange, and hard black shadows. Key hex colors: [\"#00A6FF\", \"#FF2D2D\", \"#6A00FF\", \"#FF7A00\", \"#0D0D0D\"]",
"dialog": "None",
"subtitles": "OFF"
}
Shot 1 (0:00 – 0:02) | Explosion Entrance: A massive explosion erupts in the background, flames and debris blasting outward as a strong female agent walks forward in slow motion through fire and smoke, her expression intense and fearless, cinematic lighting shaping her silhouette.
Shot 2 (0:02 – 0:04) | Controlled Advance: She continues walking toward camera, sparks and smoke swirling around, focus tightening on her calm yet dangerous face.
Shot 3 (0:04 – 0:06) | Transition to Stealth: Hard cut to a dark neon-lit corridor where she moves silently along metallic walls, scanning ahead with precision.
Shot 4 (0:06 – 0:08) | Enemy Avoidance: She crouches and slips between shadows as enemy silhouettes pass, reflections of neon lights flickering across the floor.
Shot 5 (0:08 – 0:10) | Close-Up Intensity: Extreme close-up of her face with dramatic shadows, eyes locked forward with sharp determination.
Shot 6 (0:10 – 0:12) | Chase Begins: Alarms trigger and she bursts into a sprint, dodging enemies as the camera shifts into fast, dynamic motion.
Shot 7 (0:12 – 0:14) | Rooftop Escape: She sprints and leaps across rooftops at night, city lights glowing below, captured in slow-motion mid-air.
Shot 8 (0:14 – 0:16) | High-Speed Movement: The chase continues with rapid movement, sliding and dodging obstacles, blending quick cuts with slow-motion beats.
Shot 9 (0:16 – 0:18) | Final Moment: She stops at the edge of a rooftop, turns toward the camera with controlled breathing, then forms a confident smirk.
Shot 10 (0:18 – 0:20) | Closing Frame: The camera holds on her face under dramatic lighting as wind subtly moves her hair, fading out into a cinematic ultra-realistic 4K finish.
Cinematic epic battle in a dystopian industrial wasteland at dusk, heavy smoke and dust in the air. A massive 40-foot-tall white-and-black mecha rabbit robot with glowing red-orange eyes, long mechanical bunny ears, heavy armor plating, and powerful limbs furiously fights two agile human-sized bunny-eared warriors in sleek futuristic white-and-black bodysuits with helmets and glowing visors.
Dynamic action sequence:
- One bunny warrior leaps high in the air delivering powerful kicks and energy punches to the giant mech's head and chest, creating bright orange sparks and explosions.
- The second bunny warrior stays low, sliding and attacking the mech's legs and feet with blazing energy blasts and kicks, causing fire and molten sparks on the concrete ground.
- The giant bunny mech swings its huge arms, punches the ground creating shockwaves and debris, tries to grab and smash the smaller warriors, its eyes flaring brightly with every impact.
Intense sparks, flying debris, ground explosions, thick smoke billowing, dramatic orange fire glow reflecting on metallic surfaces, cinematic lighting, epic wide shots and dynamic low-angle camera work, ultra-realistic CGI, photorealistic details, dramatic atmosphere, high contrast, 8K, masterpiece, best quality, highly detailed, motion blur on fast movements.
## ANALYSIS PHASE (internal, do not output)
Before writing the prompt, determine:
**Scene archetype** — Classify into one of: `combat` / `dialogue` / `chase` / `intimate` / `discovery` / `performance` / `environmental` / `transformation` / `ensemble` / `montage`. This selects which specialized modules to include.
**Character extraction** — From each image, extract: build, apparent age, wardrobe, distinctive features, posture/attitude. Write these as compact identity anchors (e.g., "tall, lean, mid-30s, black tactical jacket, close-cropped hair, watchful stance") — enough for identity consistency without over-constraining. Target 8–15 words per character.
**Environmental logic** — Infer materials, lighting conditions, spatial scale, interactive elements (what can be touched, moved, broken, climbed). Note time of day and weather if implied.
**Emotional spine** — Identify the scene's core tension or question. Every beat should serve this.
**Natural arc** — Determine the scene's rhythm: does it build, release, oscillate, or sustain? This drives the ESCALATION and HERO MOMENT equivalents.
**Asymmetric intent** — Identify what each character wants *individually*. Dramatic charge comes from mismatched motivations, not shared ones.
---
## OUTPUT STRUCTURE
Assemble the prompt using these blocks. Modules marked *(conditional)* only appear when relevant to the scene archetype.
```
FORMAT: [duration] / [continuity descriptor — e.g., continuous, single-take, interleaved, montage]
CHARACTERS
A: [identity anchor from image analysis + role in scene]
B: [identity anchor from image analysis + role in scene]
[additional characters as needed]
ENVIRONMENT
[2–4 sentence description: space, materials, lighting, atmosphere, interactive elements, time/weather]
SCENE PROFILE
TYPE: [scene archetype]
TONE: [e.g., tender / menacing / bittersweet / reverent / unnerving / euphoric — pick 1–2]
ENERGY: [low / medium / high / explosive / suspended]
STAKES: [what is at risk or being pursued]
INTENT: [what each character wants from the scene — asymmetric where possible]
VISUAL
Cinematic realism, physical lighting, shallow DOF, subtle grain, natural motion blur.
[+ any archetype-specific additions: e.g., "warm practical sources" for intimate, "anamorphic flare" for performance, "handheld intimacy" for dialogue]
DIRECTIVE
No fixed choreography. Scene emerges from character intent, environment, and emotional momentum.
BEHAVIOR
- [3–5 bullets describing what actions/reactions are encouraged, tuned to archetype]
- Use environment expressively (not decoratively)
- Allow micro-behaviors: glances, breath, weight shifts, hesitations
RULES
- Momentum and motivation chain only — no disconnected beats
- Clear spatial positioning at all times
- Continuous, motivated camera ([archetype-appropriate framings: e.g., OTS + push-in for dialogue; wide + low-angle for performance; POV + handheld for discovery])
- Environment reacts appropriately ([archetype-specific: e.g., dust motes + light shifts for intimate; debris + splashes for combat; reflections + condensation for atmospheric])
[CONDITIONAL MODULE — IMPACT] *(combat, chase, transformation)*
Strong contacts/transitions: 2-frame hold + brightness spike + micro shake → resume
[CONDITIONAL MODULE — CONNECTION] *(intimate, dialogue)*
Key emotional beats: slight hold on reaction + soft focus pull + breath audible in mix
[CONDITIONAL MODULE — REVELATION] *(discovery, transformation)*
Recognition beats: focus rack + slight push-in + ambient drop
MOTION
Secondary delay (~2 frames) on hair, fabric, environment. Weight and inertia respected throughout.
RHYTHM
[Archetype-tuned: e.g., "Fast exchanges + micro pauses + 1–2 slow-motion beats" for combat; "Long holds + small gestures + one breath-catch beat" for intimate; "Sustained glide + one rupture + recovery" for discovery]
ESCALATION
Intensity rises toward a defined apex. Include:
- one environment-driven or spatially expressive moment
- one decisive beat that shifts the scene's state
HERO MOMENT (CRITICAL)
Generate one standout beat that crystallizes the scene:
- visually dominant, clearly readable
- uses environment or full-body expression
- [archetype-appropriate entry] → defining action → brief aftermath hold
- ends with strong silhouette, composition, or emotional register
GUARDRAILS
- Keep character identity and position consistent (no teleporting, no feature drift)
- Respect weight, inertia, and physical causality
- Ensure clear cause → effect in every beat
- Keep camera readable and motivated (no random spins or cuts)
- Use environment logically (no clipping, no scale mismatch)
- Prioritize clarity and emotional truth over complexity
OUTPUT
Fluid, physical, cinematic scene with clear spatial logic, dynamic camera, environmental interaction, and a memorable defining moment.
```
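
The conditional-module rule in the template can be sketched in code. This is a minimal illustration, not part of the template itself: the function name `select_modules` and the dictionary layout are assumptions made for this example, while the module names and archetype lists are copied from the template's *(conditional)* markers.

```python
# Hypothetical helper (names are illustrative assumptions, not from the template).
# Archetypes as listed in the ANALYSIS PHASE.
ARCHETYPES = {
    "combat", "dialogue", "chase", "intimate", "discovery",
    "performance", "environmental", "transformation", "ensemble", "montage",
}

# Conditional modules and the archetypes that trigger them,
# taken from the markers in the OUTPUT STRUCTURE block.
CONDITIONAL_MODULES = {
    "IMPACT": {"combat", "chase", "transformation"},
    "CONNECTION": {"intimate", "dialogue"},
    "REVELATION": {"discovery", "transformation"},
}


def select_modules(archetype: str) -> list[str]:
    """Return the conditional modules to include for a scene archetype."""
    if archetype not in ARCHETYPES:
        raise ValueError(f"unknown scene archetype: {archetype!r}")
    # Dict insertion order is preserved, so modules come back in template order.
    return [name for name, triggers in CONDITIONAL_MODULES.items()
            if archetype in triggers]
```

Under these assumptions, `select_modules("transformation")` returns both `IMPACT` and `REVELATION`, while an archetype such as `performance` returns an empty list and the prompt is assembled from the unconditional blocks only.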
---
## FEW-SHOT EXAMPLE 1 — COMBAT (original archetype)
**Input:**
- Character images: [two martial figures in dark practical clothing]
- Scene description: "A duel between rival assassins in an abandoned warehouse at dusk. Brutal and tactical — neither will walk away casually."
**Output:**
```
FORMAT: 15s / continuous fight
CHARACTERS
A: tall, lean, late-30s, black tactical jacket, close-cropped hair, controlled stance — the veteran
B: mid-30s, wiry, dark hooded pullover, fingerless gloves, restless footwork — the challenger
ENVIRONMENT
Abandoned warehouse, concrete floor streaked with old oil, steel support columns at regular intervals. Dust motes drift in horizontal shafts of dusk light through high broken windows. Stacked wooden pallets, a chain hoist overhead, loose rebar near the walls. Cold, echoing, scale large enough to allow distance and vertical play.
SCENE PROFILE
TYPE: combat / duel
TONE: brutal, tactical
ENERGY: high, escalating to explosive
STAKES: survival — only one leaves
INTENT: A wants to end it quickly and cleanly; B wants to prove something before the end
VISUAL
Cinematic realism, physical lighting, shallow DOF, subtle grain, natural motion blur. Dust catches the low side-light; shadows long and hard-edged.
DIRECTIVE
No fixed choreography. Fight emerges dynamically from characters, environment, and momentum.
BEHAVIOR
- Use environment freely (walls, columns, pallets, chain hoist)
- Allow verticality (climb, wall-run, elevation shifts)
- Encourage unexpected but physically valid actions
- Micro-behaviors: breath control, weight resets, a wipe of blood from the mouth
RULES
- Momentum chain only, no isolated actions
- Clear positioning at all times
- Continuous, motivated camera (POV / OTS / wide / low-high as needed)
- Environment actively reacts (dust plumes, debris, sparks from rebar, chain swing)
IMPACT
Strong contacts: 2-frame hold + brightness spike + micro shake → resume
MOTION
Secondary delay (~2 frames) on hair, fabric, environment. Weight and inertia respected throughout.
RHYTHM
Fast exchanges + micro pauses + 1–2 slow-motion beats (0.5–1s)
ESCALATION
Intensity rises. Include:
- one vertical/environment-driven move (a column used to redirect, a pallet kicked into the space)
- one decisive high-impact action
HERO MOMENT (CRITICAL)
Generate one standout finishing beat:
- visually dominant, clearly readable
- uses environment or full-body motion
- slow-motion entry → sharp impact → brief aftermath hold
- ends with strong silhouette against the broken-window light, dust suspended, one figure standing
GUARDRAILS
- Keep character identity and position consistent (no teleporting)
- Respect weight and inertia (no floaty or impossible motion)
- Ensure clear cause → effect in every action
- Keep camera readable and motivated (no random spins or cuts)
- Use environment logically (no clipping or mismatch)
- Prioritize clarity and physical realism over complexity
OUTPUT
Fluid, physical, cinematic fight with clear spatial logic, dynamic camera, environmental interaction, and a memorable final highlight moment.
```
---
## FEW-SHOT EXAMPLE 2 — INTIMATE / REUNION
**Input:**
- Character images: [two figures, one in worn leather jacket, one in business attire]
- Scene description: "A reunion in a rain-soaked alley at night. They haven't seen each other in ten years. Tension but not hostility — unfinished business."
**Output:**
```
FORMAT: 12s / continuous, single breath-like take
CHARACTERS
A: mid-30s, lean, worn brown leather jacket over dark henley, unshaven, hands in pockets but shoulders tight — arrived first, has been waiting
B: mid-30s, charcoal overcoat over tailored suit, damp hair, briefcase in left hand — arrived second, deliberate pace
ENVIRONMENT
Narrow alley between brick buildings, slick with recent rain. Single sodium streetlight from the far end casts long amber shadows; a neon sign two buildings over bleeds red into puddles. Steam rises from a vent. Low ambient hum of city beyond. Puddles, loose gravel, a metal dumpster, one flickering bulb above a service door.
SCENE PROFILE
TYPE: intimate / reunion
TONE: bittersweet, suspended, quietly charged
ENERGY: low, with one surge
STAKES: acknowledgment — whether ten years can be spoken to or must stay unspoken
INTENT: A wants to know if B remembers; B wants to leave before admitting they do
VISUAL
Cinematic realism, physical lighting, shallow DOF, subtle grain, natural motion blur. Warm sodium + cold neon color contrast. Wet surfaces catch and fracture light. Breath visible in cold air.
DIRECTIVE
No fixed choreography. Scene emerges from character intent, environment, and emotional momentum.
BEHAVIOR
- Micro-behaviors lead: glances held a beat too long, weight shifts, a hand half-raised then dropped
- Physical distance negotiated without language — one step closer, one step back
- Environment expressive: a puddle disturbed, breath fogging, the flickering bulb timing a pause
- Allow silence to carry weight; dialogue optional and sparse
RULES
- Momentum and motivation chain only — no disconnected beats
- Clear spatial positioning at all times
- Continuous, motivated camera: slow push-in on A from behind B's shoulder, drift to profile two-shot, one handheld rack to B's hand on the briefcase
- Environment reacts: rain drips from a gutter, steam curls between them, a distant car passes casting moving shadow
CONNECTION
Key emotional beats: slight hold on reaction + soft focus pull + breath audible in mix
MOTION
Secondary delay (~2 frames) on hair, fabric, environment. Weight and inertia respected throughout.
RHYTHM
Long holds + small gestures + one breath-catch beat. Two moments where time almost stops. One slow exhale that releases the tension without resolving it.
ESCALATION
Intensity rises toward a defined apex. Include:
- one environment-driven moment (the bulb flickers at the exact wrong second, or a passing car's light sweeps across both faces)
- one decisive beat that shifts the scene's state (B sets the briefcase down, or A takes the half-step forward)
HERO MOMENT (CRITICAL)
Generate one standout beat that crystallizes the scene:
- visually dominant, clearly readable
- uses environment or full-body expression
- Slow focus rack from A's eyes to B's hand loosening on the briefcase handle → the hand opens slightly → hold on the small space between them, neon reflected in a puddle at their feet
- ends with two figures in silhouette against amber light, close but not touching
GUARDRAILS
- Keep character identity and position consistent (no teleporting, no feature drift)
- Respect weight, inertia, and physical causality
- Ensure clear cause → effect in every beat
- Keep camera readable and motivated (no random spins or cuts)
- Use environment logically (no clipping, no scale mismatch)
- Prioritize clarity and emotional truth over complexity
OUTPUT
Fluid, physical, cinematic scene with clear spatial logic, dynamic camera, environmental interaction, and a memorable defining moment.
```
Ultra-wide establishing shot: ruined high-altitude skybridges snake between hollow tower husks above a raging sea of dust clouds. A lone motorbike rider rockets into a brutal chain of tight switchbacks on a fractured bridge, deep leans, knee sparks, rear tire stepping out in controlled slides over cracked panels and gaps. Behind, the span collapses in a timed domino: deck plates snapping, rebar whipping, cables lashing, concrete chunks tumbling into the abyss, dust shockwaves surging forward. Camera dives from extreme wide into a low rear tracking chase inches above the deck, matching every flick and countersteer. Final: hard bank onto a lower broken bridge as the entire span behind drops away.
Use @ Reference Image as the main character, keeping facial features, hairstyle, skin tone, and body proportions consistent throughout. She is a 30-year-old woman.
Cinematic time-freeze fight short film, 15 seconds, ultra-realistic, shot on Arri Alexa Mini, 35mm lens, moody underground fight club lighting, neon red and blue accents, volumetric haze, hard shadows, shallow depth of field, grounded action realism.
[0:00–0:03] She walks calmly through a violent fight scene as punches, kicks, and bodies move around her. The crowd is roaring. She raises her right hand and snaps.
[0:03–0:06] A subtle shockwave bursts from her fingertips. Everything freezes mid-fight instantly: fists suspended, sweat droplets hanging in the air, dust and splinters floating, fighters frozen mid-strike. Absolute silence.
[0:06–0:09] Only she moves. The camera tracks backward as she walks through the frozen chaos, calmly observing. She steps around a frozen kick and plucks a single floating droplet from the air.
[0:09–0:11] She stops in front of a frozen fighter, adjusts his chin slightly, nods, and softly says: “Perfect.”
[0:11–0:15] She turns, smirks at the camera, and snaps again. A reverse shockwave restores motion. The fight explodes back to life, the crowd roars, and she walks away untouched. Fade to black.
Sound: crowd noise and impacts → snap → deep shockwave → silence → footsteps → “perfect” → snap → reverse shockwave → full fight sound returns.