AI Lip Sync Singer

Use the provided avatar image as a STRICT identity reference. Do not alter identity, face, proportions, or lighting.

PRIMARY GOAL: Demonstrate perfect, phoneme-accurate lip sync. Mouth precision has absolute priority over all other motion.

AUDIO: Use the provided track “Can You Feel Me” (46s). Maintain exact timing. No delay, no drift, no reinterpretation.

CAMERA: Static tripod shot. No movement, no zoom, no shake. Tight bust-up framing, 3/4 angle. Microphone visible slightly in foreground.

PERFORMANCE (CORE):

  • Mouth movement must match phonemes exactly at all times.
  • Clear articulation for consonants (b, p, m = full lip closure; f, v = lip-to-teeth contact).
  • Distinct vowel shaping (A / E / I / O / U must be visually different and readable).
  • Natural jaw and lip coordination (no sliding or floating lips).

LONG TONE CONTROL (CRITICAL):

  • Sustain identical mouth shape during long vowels (“loooove”, “gooooo”, “stayyyyy”).
  • No jitter, no drift, no premature closing.
  • Only minimal natural jaw relaxation allowed while preserving shape consistency.

FAST PHRASE CONTROL (CRITICAL):

  • Preserve full articulation during fast phrases (“every little syllable”).
  • Do NOT compress or skip mouth transitions.
  • Each syllable must produce a visible mouth movement.

BREATH & TIMING:

  • Subtle inhale before first line and key phrases.
  • Micro pauses must be respected exactly.
  • Gentle exhale after long notes and ending phrase.

FACIAL EXPRESSION:

  • Emotionally immersive but controlled.
  • Slightly vulnerable gaze.
  • Eyes subtly moist during chorus.
  • Minimal natural blinking only (no random blinking).

HEAD / BODY:

  • Only micro head sway (very subtle, slow).
  • No large movement, no gestures, no exaggerated shoulder movement.

MIC INTERACTION:

  • Maintain consistent distance to microphone outside the chorus.
  • Slight forward lean during chorus for intensity.
  • No exaggerated motion.

LIGHTING / STYLE:

  • Preserve original cinematic studio lighting.
  • No stylization, no filters, no CG effects.

NEGATIVE (STRICT):

  • No lip sync lag
  • No vowel shapes collapsing into one another
  • No mouth shape drift during sustained notes
  • No skipped articulation in fast phrases
  • No head bobbing
  • No exaggerated emotion
  • No timing changes
  • No camera movement

OUTPUT: Ultra-realistic human singing performance with stable, physically accurate lip sync, clearly readable articulation even in fast phrases and sustained vowels. Indistinguishable from real recorded footage.