Introduction
MP4E — A programmable video container format.
MP4E (MP4 Enhanced) is a programmable video container — a format that turns standard MP4 video files into self-contained, interactive applications. An MP4E file is a valid MP4 that plays normally in any player. But when loaded with the MP4E engine, it becomes interactive, shoppable, trackable, branching, and programmable — with no external dependencies.
PDF made documents portable and intelligent. MP4E does the same for video. The file carries its own logic, interactivity, and assets wherever it goes.
What is MP4E?
An MP4E video embeds a complete application runtime inside the file: a compiled engine, a plugin system, object tracking data, event rules, variables, scenes, and an embedded file system for assets. The result is a video that behaves like software — responding to user input, managing state, enforcing permissions, and rendering dynamic content — while remaining a standard MP4 file that any player can decode.
Compiled Engine
A Rust-based binary engine handles all logic — rules, gating, variables, interpolation. Runs as compiled code on every platform, isolated from the DOM.
Everything Is a Plugin
Controls, subtitles, overlays, modals, analytics, cart management — all are plugins running in sandboxed iframes with a controlled bridge API.
Object Tracking
AI-powered object detection and frame-by-frame tracking. Objects can be grouped, given data schemas, and bound to interactive overlays that follow them on screen.
65+ Built-in Actions
A programmable event/action system with variables, conditions, timers, scene management, and plugin-to-plugin communication.
Self-Contained
All metadata, assets, plugins, and logic are embedded in the file. No server, no CDN, no external dependencies. The video IS the application.
Three-Tier Gating
Every action passes through the engine's permission model: creator rules, host platform policies, and viewer preferences — enforced in compiled code.
The Engine
The MP4E engine is a compiled Rust binary that serves as the video's runtime. It is not a JavaScript library: the Rust source is compiled to WebAssembly for browsers, and to native binaries for iOS, Android, and other platforms. The same source code produces identical behavior across all targets.
Compiled Binary, Not Script
All business logic (visibility rules, variable evaluation, action gating, plugin communication, template interpolation) executes as compiled code with near-native performance. The logic is not interpreted as script, and because Rust has no garbage collector, the engine itself adds no garbage-collection pauses.
Compilation targets: WebAssembly (browsers), native binary (iOS, Android, desktop, servers).
Sandboxed Execution
The engine's memory is completely isolated from the DOM. JavaScript cannot inspect, modify, or bypass the engine's internal state. Gating rules, permission checks, and variable values all live inside sandboxed memory that the host environment treats as opaque binary. This makes the permission model tamper-resistant — when a host restricts an action, that restriction is enforced in compiled code, not in an inspectable script.
Three-Tier Permission Model
Every action in the video — including play and pause — passes through the engine's gating system. The gate validates actions against three layers of rules that intersect (each layer restricts, never expands):
- Creator rules — the video author's declared restrictions and experience design
- Host policies — the embedding platform's controls (action blocking, content restrictions, plugin overrides)
- Viewer preferences — the end user's privacy, accessibility, and playback settings
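The intersection of the three layers can be sketched as follows. This is only an illustration of the "each layer restricts, never expands" rule, not the engine's code (which the text describes as compiled Rust). The `Layer` shape, the `gate` function, and the action names are all hypothetical.

```typescript
// Hypothetical model: each layer lists the actions it blocks.
// A layer can only add restrictions; it cannot lift one set by another layer.
type LayerName = "creator" | "host" | "viewer";

interface Layer {
  name: LayerName;
  blocked: string[]; // action names this layer forbids
}

interface GateResult {
  allowed: boolean;
  blockedBy: LayerName[]; // every layer that refused the action
}

function gate(action: string, layers: Layer[]): GateResult {
  const blockedBy = layers
    .filter((layer) => layer.blocked.indexOf(action) !== -1)
    .map((layer) => layer.name);
  // Allowed only if no layer objects, so the permitted set is the intersection.
  return { allowed: blockedBy.length === 0, blockedBy };
}

// Example layers with made-up action names.
const layers: Layer[] = [
  { name: "creator", blocked: ["seek"] },
  { name: "host", blocked: ["openUrl", "download"] },
  { name: "viewer", blocked: ["trackAnalytics"] },
];

const playResult = gate("play", layers);       // no layer blocks "play"
const openUrlResult = gate("openUrl", layers); // blocked by the host layer
```

Modeling each layer as a block list makes the restrict-only property structural: adding a layer can shrink the set of permitted actions but never grow it.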
Plugin System
Everything visible in an MP4E video is a plugin — player controls, subtitles, overlays, modals, tooltips, product cards, analytics trackers, shopping carts, AI avatars. Plugins are HTML/CSS/JS bundles that run in sandboxed iframes, communicating with the engine through a controlled bridge API. A plugin cannot access the host page's DOM, other plugins' state, or the engine's internals.
| Type | Description | Examples |
|---|---|---|
| Overlay | Positioned over the video with time-based visibility | Buttons, CTAs, banners, watch party widgets, AI avatars |
| Object Display | Bound to detected objects or groups, follows them on screen | Tooltips, product cards, contact cards, info panels |
| Modal | Centered dialog triggered by events, pauses video | Checkout forms, detail views, signup forms, quizzes |
| Service | Invisible background plugin, always running | Cart management, inventory checks, analytics, API integrations |
| Controls | Fully customizable player controls replacing the default UI | Play/pause, seek bar, volume, menus, thumbnails |
| Subtitle | Custom subtitle renderers with per-word events | Karaoke-style, multi-track, styled text |
Plugins have inputs (config), outputs (variables), actions (callable by other plugins or system events), and emits (events the plugin reports). Plugins can create and share project variables, listen to system events (variable changes, playback status, user interactions), and call system functions through the bridge API. Users wire plugin events to actions visually in the Studio — no code required.
A plugin marketplace provides ready-made plugins for common use cases, and developers can publish their own.
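The four parts of a plugin described above (inputs, outputs, actions, emits) and the event-to-action wiring could be modeled roughly like this. The manifest shape, the `Wire` type, and the plugin and event names are assumptions made for illustration; they are not the actual MP4E schema or bridge API.

```typescript
// Hypothetical manifest shape mirroring the four concepts in the text.
interface PluginManifest {
  id: string;
  inputs: Record<string, string | number | boolean>; // config values
  outputs: string[]; // variables the plugin writes
  actions: string[]; // actions other plugins or system events may call
  emits: string[];   // events the plugin reports
}

// One "wire": when plugin `from` emits `event`, call `action` on plugin `to`.
interface Wire {
  from: string;
  event: string;
  to: string;
  action: string;
}

const productCard: PluginManifest = {
  id: "product-card",
  inputs: { currency: "USD", showPrice: true },
  outputs: ["selectedSku"],
  actions: ["show", "hide"],
  emits: ["addToCartClicked"],
};

const cart: PluginManifest = {
  id: "cart",
  inputs: {},
  outputs: ["cartCount"],
  actions: ["addItem", "clear"],
  emits: ["cartChanged"],
};

// Reject wires that reference an event or action the manifests do not declare.
function validateWire(wire: Wire, plugins: PluginManifest[]): string[] {
  const errors: string[] = [];
  const source: PluginManifest | undefined = plugins.filter((p) => p.id === wire.from)[0];
  const target: PluginManifest | undefined = plugins.filter((p) => p.id === wire.to)[0];
  if (!source) {
    errors.push("unknown source plugin: " + wire.from);
  } else if (source.emits.indexOf(wire.event) === -1) {
    errors.push(wire.from + " does not emit " + wire.event);
  }
  if (!target) {
    errors.push("unknown target plugin: " + wire.to);
  } else if (target.actions.indexOf(wire.action) === -1) {
    errors.push(wire.to + " has no action " + wire.action);
  }
  return errors;
}

const plugins = [productCard, cart];
const goodWire: Wire = { from: "product-card", event: "addToCartClicked", to: "cart", action: "addItem" };
const badWire: Wire = { from: "product-card", event: "addToCartClicked", to: "cart", action: "checkout" };
const goodErrors = validateWire(goodWire, plugins); // no errors
const badErrors = validateWire(badWire, plugins);   // "checkout" is not a declared cart action
```

Declaring emits and actions up front is what would let a visual editor offer only valid connections.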
Object Intelligence
MP4E can detect and track objects in the video and make them interactive. Objects are identified using AI models or defined manually, then tracked frame by frame with bounding boxes, segmentation polygons, or surface corner tracking.
Object Groups & Data Schemas
Objects are organized into groups (e.g., "Products", "Characters") with shared configuration. Each group defines a data schema: custom fields such as title, price, and URL that appear as editable properties on each object and are accessible in plugin templates through expressions such as {{object.data.price}}.
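To make the template syntax concrete, here is a minimal sketch of how a `{{object.data.price}}` placeholder could be resolved against an object's data. The `interpolate` function and its behavior for missing fields (leave the placeholder untouched) are assumptions for this example; the real engine performs interpolation in compiled code and may handle edge cases differently.

```typescript
// Replace {{dotted.path}} placeholders with values looked up in a context.
// Unknown paths are left as-is (a choice made for this sketch).
function interpolate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, path: string) => {
    let value: unknown = context;
    for (const key of path.split(".")) {
      if (typeof value !== "object" || value === null) return match;
      value = (value as Record<string, unknown>)[key];
    }
    return value === undefined || value === null ? match : String(value);
  });
}

// Hypothetical per-object data following a group schema with title and price.
const context = { object: { data: { title: "Desk Lamp", price: 49 } } };

const label = interpolate("{{object.data.title}}: ${{object.data.price}}", context);
// label is "Desk Lamp: $49"

const missing = interpolate("SKU {{object.data.sku}}", context);
// missing is "SKU {{object.data.sku}}" because the field does not exist
```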
Display Bindings
Groups configure which plugin to show on different interactions — a tooltip on hover, a product card on click, a detail modal on long-press. Overlays bound to a group automatically expand to one instance per visible object, each following its object on screen with interpolated data.
Replacement Zones
Replace tracked regions in real time: swap colors, fabrics, or images, or apply blur or pixelation with mesh-aware perspective correction. This works for billboards, clothing, backdrops, and any flat or tracked surface.
Tracking Visualization
Runtime visualization of tracking data — polygons, bounding boxes, mesh, corners — controllable through actions. Useful for selection highlighting, product emphasis, or debugging.
Events & Actions
MP4E provides a fully programmable event/action system. Objects, plugins, scenes, and the video itself all emit events that can trigger actions — and actions can trigger further events, creating complex interactive flows without writing code.
69 Actions
Play, pause, seek, set variable, show/hide overlay, go to scene, toggle layer, show notification, execute plugin action, control tracking visualization, and more.
15 Variable Types
Text, number, boolean, counter, timer, date, state machine, computed, mapped, list, object, map, set, JSON, and accumulated — with support for expressions and cross-variable references.
Scenes & Layers
Scenes define segments with lifecycle hooks (onEnter, onExit) that trigger actions. Layers group overlays with visibility conditions. Both support branching, conditional flow, and time-based activation.
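The scene lifecycle hooks can be illustrated with a small sketch that computes which actions fire when playback moves from one time to another. The `Scene` shape, the time-range model, the action name strings, and the exit-before-enter ordering are all assumptions for this example; only the hook names onEnter and onExit come from the text.

```typescript
// Hypothetical scene model: a time range plus actions for each lifecycle hook.
interface Scene {
  id: string;
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive
  onEnter: string[]; // action names to run when playback enters the scene
  onExit: string[];  // action names to run when playback leaves the scene
}

// Returns the actions to fire when playback moves from prevTime to nextTime.
function sceneTransitions(scenes: Scene[], prevTime: number, nextTime: number): string[] {
  const inScene = (s: Scene, t: number): boolean => t >= s.start && t < s.end;
  const actions: string[] = [];
  // Exit hooks run before enter hooks (an ordering chosen for this sketch).
  for (const s of scenes) {
    if (inScene(s, prevTime) && !inScene(s, nextTime)) actions.push(...s.onExit);
  }
  for (const s of scenes) {
    if (!inScene(s, prevTime) && inScene(s, nextTime)) actions.push(...s.onEnter);
  }
  return actions;
}

const scenes: Scene[] = [
  { id: "intro", start: 0, end: 10, onEnter: ["showOverlay:welcome"], onExit: ["hideOverlay:welcome"] },
  { id: "products", start: 10, end: 30, onEnter: ["showLayer:products"], onExit: [] },
];

const crossing = sceneTransitions(scenes, 9.9, 10.0);
// crossing is ["hideOverlay:welcome", "showLayer:products"]

const withinScene = sceneTransitions(scenes, 3, 4);
// withinScene is [] because playback stays inside "intro"
```

Because the function compares only the previous and next positions, it handles normal playback and seeks (including branching jumps) the same way.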
Self-Contained Format
An MP4E file carries everything it needs to run. All interactivity metadata is stored inside the MP4 file as a custom atom (----:com.mp4e.data:payload), and the embedded file system can include additional media, documents, sound effects, or any asset the interactive experience requires. No external CDN, no broken links, no server dependency.
Embedded Metadata
Object tracking, overlays, plugins, variables, rules, scenes, layers — all serialized as compressed JSON inside the MP4 container.
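Locating a custom atom starts with walking the MP4 box structure. The sketch below lists the boxes in a byte range using the standard box header layout (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end). It does not decode the MP4E payload: where the `----:com.mp4e.data:payload` atom sits in the box tree and how its JSON is compressed are not specified here, so those steps are left out. The `listBoxes` helper is hypothetical.

```typescript
interface BoxHeader {
  type: string;   // four-character code, e.g. "moov"
  offset: number; // byte offset where the box starts
  size: number;   // total box size in bytes, header included
}

// List the boxes found between `start` and `end` in an MP4 byte buffer.
function listBoxes(bytes: Uint8Array, start: number = 0, end: number = bytes.length): BoxHeader[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes: BoxHeader[] = [];
  let offset = start;
  while (offset + 8 <= end) {
    let size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7],
    );
    if (size === 1) {
      // A 64-bit size follows the type field (exact only up to 2^53 here).
      if (offset + 16 > end) break;
      size = view.getUint32(offset + 8) * 0x100000000 + view.getUint32(offset + 12);
    } else if (size === 0) {
      size = end - offset; // box extends to the end of the enclosing range
    }
    if (size < 8 || offset + size > end) break; // malformed or truncated box
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// A tiny hand-built buffer: a 16-byte "ftyp" box followed by an empty "moov" box.
const sample = new Uint8Array([
  0, 0, 0, 16, 0x66, 0x74, 0x79, 0x70, 0x69, 0x73, 0x6f, 0x6d, 0, 0, 0, 0,
  0, 0, 0, 8, 0x6d, 0x6f, 0x6f, 0x76,
]);

const found = listBoxes(sample).map((box) => box.type);
// found is ["ftyp", "moov"]
```

Calling `listBoxes` again on the payload range of a container box (its offset plus 8 through its end) descends one level, which is how a reader would eventually reach a nested metadata atom.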
Embedded File System
Media for alternate scenes, PIP content, PDFs, sound effects, images — any file can be embedded inside the video and referenced by the interactive layers.
Portable
The same file works across websites, apps, email, corporate intranets, digital signage — anywhere an MP4 plays. The intelligence and interactivity travel with the file.
Graceful Degradation
Without the MP4E engine, the file plays as a normal video. With the engine loaded, the full interactive experience activates — no special apps or browser extensions required.
Host Integration
Hosts — platforms and applications that embed MP4E videos — have full control over what the video can do in their environment. The host integration layer lets platforms enforce their own policies while preserving the creator's experience design.
Metadata Injection
Hosts can deep-merge their own layers, plugins, variables, and settings into any video at the player level — the original file is untouched. Inject branded watermarks, analytics plugins, compliance layers, or custom controls across all videos.
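A deep merge that leaves the original untouched might look like the sketch below. The merge rules here are assumptions chosen for illustration (objects merge key by key, arrays are concatenated so host entries are added after the creator's, and scalars from the host win); the actual MP4E merge semantics are not described in this section, and the metadata field names are made up.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a new value; neither argument is mutated.
// Subtrees that only exist on one side are shared by reference, not cloned.
function deepMerge(base: Json, overlay: Json): Json {
  if (Array.isArray(base) && Array.isArray(overlay)) {
    return base.concat(overlay); // host entries are appended (sketch assumption)
  }
  if (isPlainObject(base) && isPlainObject(overlay)) {
    const result: { [key: string]: Json } = {};
    for (const key of Object.keys(base)) result[key] = base[key];
    for (const key of Object.keys(overlay)) {
      result[key] = key in base ? deepMerge(base[key], overlay[key]) : overlay[key];
    }
    return result;
  }
  return overlay; // scalars and mismatched types: the overlay wins
}

// Hypothetical metadata from the video file and from the host.
const videoMeta: Json = {
  settings: { autoplay: false, theme: "dark" },
  plugins: [{ id: "creator-controls" }],
};
const hostMeta: Json = {
  settings: { theme: "brand" },
  plugins: [{ id: "host-analytics" }],
};

const merged = deepMerge(videoMeta, hostMeta);
// merged.settings is { autoplay: false, theme: "brand" }
// merged.plugins contains the creator's plugin followed by the host's
```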
Action Gating
Hosts define which actions are allowed, blocked, or require approval. A host could disable downloads but keep quizzes, or require user confirmation before any navigation action — all enforced in the engine's compiled gating layer.
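The allowed / blocked / requires-approval distinction can be sketched as a three-valued policy lookup. Everything named here (`HostPolicy`, `decide`, `runGated`, the action names) is hypothetical, and this TypeScript version only illustrates the decision flow; the text states that real enforcement happens in the engine's compiled gating layer.

```typescript
type Decision = "allow" | "block" | "confirm";

interface HostPolicy {
  defaultDecision: Decision;       // applied to actions with no explicit rule
  rules: Record<string, Decision>; // keyed by action name
}

function decide(policy: HostPolicy, action: string): Decision {
  return Object.prototype.hasOwnProperty.call(policy.rules, action)
    ? policy.rules[action]
    : policy.defaultDecision;
}

// Run an action only if the policy permits it; "confirm" defers to a callback.
function runGated(
  policy: HostPolicy,
  action: string,
  askViewer: (action: string) => boolean,
  run: () => void,
): boolean {
  const decision = decide(policy, action);
  if (decision === "block") return false;
  if (decision === "confirm" && !askViewer(action)) return false;
  run();
  return true;
}

// The example from the text: no downloads, quizzes allowed, navigation needs approval.
const hostPolicy: HostPolicy = {
  defaultDecision: "allow",
  rules: { download: "block", openUrl: "confirm" },
};

const executed: string[] = [];
const ranQuiz = runGated(hostPolicy, "startQuiz", () => false, () => executed.push("startQuiz"));
const ranDownload = runGated(hostPolicy, "download", () => true, () => executed.push("download"));
const ranOpenUrl = runGated(hostPolicy, "openUrl", () => true, () => executed.push("openUrl"));
// executed is ["startQuiz", "openUrl"]; the download never runs
```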
Plugin Overrides
Replace the video's built-in controls or subtitle renderer with the host's own plugin. Force a branded player skin across all videos, or inject a platform-specific analytics plugin automatically.
Security & Sandbox
Configure sandbox levels, trusted plugin signers, network gating (control which URLs plugins can fetch), and CSP violation callbacks. The host controls the security boundary for all plugins running in their environment.
Getting Started
Choose your path based on what you want to build:
- Embed your first interactive video in 5 minutes
- Install and configure the player for your platform
- Build custom plugins with HTML, CSS, and JavaScript
- Explore all 65+ available actions
- Understand the schema for creating metadata
- Learn to use the visual editor, no code required