Introduction

MP4E — A programmable video container format.

MP4E (MP4 Enhanced) is a programmable video container — a format that turns standard MP4 video files into self-contained, interactive applications. An MP4E file is a valid MP4 that plays normally in any player. But when loaded with the MP4E engine, it becomes interactive, shoppable, trackable, branching, and programmable — with no external dependencies.

PDF made documents portable and intelligent. MP4E does the same for video. The file carries its own logic, interactivity, and assets wherever it goes.

What is MP4E?

An MP4E video embeds a complete application runtime inside the file: a compiled engine, a plugin system, object tracking data, event rules, variables, scenes, and an embedded file system for assets. The result is a video that behaves like software — responding to user input, managing state, enforcing permissions, and rendering dynamic content — while remaining a standard MP4 file that any player can decode.

Compiled Engine

A Rust-based binary engine handles all logic — rules, gating, variables, interpolation. Runs as compiled code on every platform, isolated from the DOM.

Everything Is a Plugin

Controls, subtitles, overlays, modals, analytics, cart management — all are plugins running in sandboxed iframes with a controlled bridge API.

Object Tracking

AI-powered object detection and frame-by-frame tracking. Objects can be grouped, given data schemas, and bound to interactive overlays that follow them on screen.

65+ Built-in Actions

A programmable event/action system with variables, conditions, timers, scene management, and plugin-to-plugin communication.

Self-Contained

All metadata, assets, plugins, and logic are embedded in the file. No server, no CDN, no external dependencies. The video IS the application.

Three-Tier Gating

Every action passes through the engine's permission model: creator rules, host platform policies, and viewer preferences — enforced in compiled code.

The Engine

The MP4E engine is a compiled Rust binary that serves as the video's runtime. It is not a JavaScript library: it is compiled from Rust to WebAssembly for browsers (which the browser then compiles to machine code) and to native binaries for iOS, Android, and other platforms. The same source code produces identical behavior across all targets.

Compiled Binary, Not Script

All business logic (visibility rules, variable evaluation, action gating, plugin communication, template interpolation) executes as compiled code with near-native performance. There is no script interpreter, no garbage-collector pauses, and no dependence on a JavaScript JIT warming up.

Compilation targets: WebAssembly (browsers), native binary (iOS, Android, desktop, servers).

Sandboxed Execution

The engine's memory is completely isolated from the DOM. JavaScript cannot inspect, modify, or bypass the engine's internal state. Gating rules, permission checks, and variable values all live inside sandboxed memory that the host environment treats as opaque binary. This makes the permission model tamper-resistant — when a host restricts an action, that restriction is enforced in compiled code, not in an inspectable script.

Three-Tier Permission Model

Every action in the video — including play and pause — passes through the engine's gating system. The gate validates actions against three layers of rules that intersect (each layer restricts, never expands):

  • Creator rules — the video author's declared restrictions and experience design
  • Host policies — the embedding platform's controls (action blocking, content restrictions, plugin overrides)
  • Viewer preferences — the end user's privacy, accessibility, and playback settings
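
Because each tier can only restrict, the gate amounts to an intersection: an action runs only if every tier permits it. The TypeScript sketch that follows illustrates that rule; it is not the engine's Rust code, and the `TierRules` shape (an optional allowlist plus a blocklist per tier) and the action names are assumptions.

```typescript
// Hypothetical sketch of the three-tier gate. Each tier may declare an
// allowlist and/or a blocklist; an action passes only if every tier permits it.
// The data shapes and action names are illustrative, not MP4E's schema.
type ActionId = string;

interface TierRules {
  allowed?: ReadonlySet<ActionId>; // undefined: this tier places no allowlist
  blocked?: ReadonlySet<ActionId>;
}

function tierPermits(rules: TierRules, action: ActionId): boolean {
  if (rules.blocked?.has(action)) return false;
  if (rules.allowed && !rules.allowed.has(action)) return false;
  return true;
}

// Intersection: no tier can re-grant an action another tier refused.
function gate(action: ActionId, creator: TierRules, host: TierRules, viewer: TierRules): boolean {
  return [creator, host, viewer].every((tier) => tierPermits(tier, action));
}

const creator: TierRules = { allowed: new Set(["play", "pause", "seek", "openUrl"]) };
const host: TierRules = { blocked: new Set(["openUrl"]) };
const viewer: TierRules = {};

console.log(gate("play", creator, host, viewer)); // true
console.log(gate("openUrl", creator, host, viewer)); // false: blocked by the host
console.log(gate("download", creator, host, viewer)); // false: the creator never allowed it
```

Note that `download` is refused even though neither the host nor the viewer mentions it: a later tier cannot restore what an earlier one did not allow.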

Engine = Brain, Player = Body
The engine never touches the DOM or video element directly. It evaluates state, processes rules, and emits instructions. The player (a thin platform-specific bridge) renders visuals and captures interactions. This separation is what makes the same engine work across web, iOS, Android, and any future platform.
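
One way to picture this split is an engine written as a pure function from state and input to new state plus a list of instructions, with the player reduced to applying those instructions. The sketch assumes a made-up instruction set (`play`, `pause`, `showOverlay`, `hideOverlay`) and a single time-based rule; it illustrates the pattern, not MP4E's actual interface.

```typescript
// Hypothetical engine/player split. The engine maps (state, input) to
// (new state, instructions) and never touches a video element.
type Instruction =
  | { kind: "play" }
  | { kind: "pause" }
  | { kind: "showOverlay"; id: string }
  | { kind: "hideOverlay"; id: string };

type Input = { kind: "click"; target: string } | { kind: "time"; seconds: number };

interface EngineState { playing: boolean; ctaVisible: boolean }

// Engine ("brain"): evaluates state and rules, emits instructions.
function step(state: EngineState, input: Input): [EngineState, Instruction[]] {
  if (input.kind === "click" && input.target === "playButton") {
    const playing = !state.playing;
    const out: Instruction = playing ? { kind: "play" } : { kind: "pause" };
    return [{ ...state, playing }, [out]];
  }
  if (input.kind === "time") {
    // Example rule: a CTA overlay is visible from 5 s (inclusive) to 10 s (exclusive).
    const shouldShow = input.seconds >= 5 && input.seconds < 10;
    if (shouldShow !== state.ctaVisible) {
      const out: Instruction = shouldShow ? { kind: "showOverlay", id: "cta" } : { kind: "hideOverlay", id: "cta" };
      return [{ ...state, ctaVisible: shouldShow }, [out]];
    }
  }
  return [state, []];
}

// Player ("body"): only renders. Here it returns strings; a web player would
// call video.play(), show or hide an iframe, and so on.
function render(instructions: Instruction[]): string[] {
  return instructions.map((i) => (i.kind === "showOverlay" || i.kind === "hideOverlay" ? `${i.kind}:${i.id}` : i.kind));
}

let state: EngineState = { playing: false, ctaVisible: false };
const inputs: Input[] = [
  { kind: "click", target: "playButton" },
  { kind: "time", seconds: 6 },
  { kind: "time", seconds: 7 },
  { kind: "time", seconds: 11 },
];
const rendered: string[] = [];
for (const input of inputs) {
  const [next, out] = step(state, input);
  state = next;
  rendered.push(...render(out));
}
console.log(rendered); // play, showOverlay:cta, hideOverlay:cta
```

Because `step` never references a video element, the same logic could be driven by a web player, a native shell, or a test harness.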

Plugin System

Everything visible in an MP4E video is a plugin — player controls, subtitles, overlays, modals, tooltips, product cards, analytics trackers, shopping carts, AI avatars. Plugins are HTML/CSS/JS bundles that run in sandboxed iframes, communicating with the engine through a controlled bridge API. A plugin cannot access the host page's DOM, other plugins' state, or the engine's internals.

Type | Description | Examples
Overlay | Positioned over the video with time-based visibility | Buttons, CTAs, banners, watch party widgets, AI avatars
Object Display | Bound to detected objects or groups, follows them on screen | Tooltips, product cards, contact cards, info panels
Modal | Centered dialog triggered by events, pauses video | Checkout forms, detail views, signup forms, quizzes
Service | Invisible background plugin, always running | Cart management, inventory checks, analytics, API integrations
Controls | Fully customizable player controls replacing the default UI | Play/pause, seek bar, volume, menus, thumbnails
Subtitle | Custom subtitle renderers with per-word events | Karaoke-style, multi-track, styled text

Plugins have inputs (config), outputs (variables), actions (callable by other plugins or system events), and emits (events the plugin reports). Plugins can create and share project variables, listen to system events (variable changes, playback status, user interactions), and call system functions through the bridge API. Users wire plugin events to actions visually in the Studio — no code required.
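
A plugin's declared surface can be modeled as a manifest, and the bridge can refuse any message that falls outside it. In this hypothetical sketch, `PluginManifest`, `BridgeMessage`, and the `Wire` record standing in for a Studio connection are illustrative names, not the published MP4E schema.

```typescript
// Hypothetical plugin manifest and bridge check. The four sections mirror the
// text (inputs, outputs, actions, emits); names and message format are assumed.
interface PluginManifest {
  id: string;
  inputs: Record<string, "string" | "number" | "boolean">; // config fields
  outputs: string[]; // project variables the plugin may write
  actions: string[]; // callable by other plugins or system events
  emits: string[]; // events the plugin may report
}

type BridgeMessage =
  | { type: "emit"; event: string }
  | { type: "setVariable"; variable: string; value: unknown };

// The bridge forwards only what the manifest declares; anything else is dropped.
function acceptMessage(manifest: PluginManifest, msg: BridgeMessage): boolean {
  if (msg.type === "emit") return manifest.emits.includes(msg.event);
  return manifest.outputs.includes(msg.variable);
}

// A Studio-style wire: "when plugin A emits X, call action Y on plugin B".
interface Wire { fromPlugin: string; event: string; toPlugin: string; action: string }

function actionsFor(wires: Wire[], fromPlugin: string, event: string): string[] {
  return wires
    .filter((w) => w.fromPlugin === fromPlugin && w.event === event)
    .map((w) => `${w.toPlugin}.${w.action}`);
}

const cart: PluginManifest = {
  id: "cart",
  inputs: { currency: "string" },
  outputs: ["cartCount"],
  actions: ["addItem", "clear"],
  emits: ["itemAdded"],
};
const wires: Wire[] = [{ fromPlugin: "cart", event: "itemAdded", toPlugin: "toast", action: "show" }];

console.log(acceptMessage(cart, { type: "emit", event: "itemAdded" })); // true
console.log(acceptMessage(cart, { type: "setVariable", variable: "userEmail", value: "a@b.c" })); // false
console.log(actionsFor(wires, "cart", "itemAdded")); // toast.show
```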

A plugin marketplace provides ready-made plugins for common use cases, and developers can publish their own.

Object Intelligence

MP4E can detect objects in the video, track them, and make them interactive. Objects are identified using AI models or defined manually, then tracked frame-by-frame with bounding boxes, segmentation polygons, or surface corner tracking.

Object Groups & Data Schemas

Objects are organized into groups (e.g., "Products", "Characters") with shared configuration. Each group defines a data schema — custom fields like title, price, URL — that appear as editable properties per object and are accessible in plugin templates via {{object.data.price}}.
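
Template placeholders like {{object.data.price}} can be resolved by walking the dotted path through a context object. This minimal sketch assumes that the overlay's bound object is exposed as `object` in the context and that unresolved paths render as empty strings; the engine's real interpolation rules may differ.

```typescript
// Minimal {{path}} interpolation sketch. Dotted paths are resolved against a
// context object; missing values render as "" (an assumed policy).
function interpolate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, path: string) => {
    let value: unknown = context;
    for (const key of path.split(".")) {
      if (value === null || typeof value !== "object") return "";
      value = (value as Record<string, unknown>)[key];
    }
    return value === undefined || value === null ? "" : String(value);
  });
}

// Hypothetical object bound to a product-card overlay.
const context = {
  object: { id: "obj-7", data: { title: "Denim Jacket", price: 89.5, url: "https://example.com/jacket" } },
};

console.log(interpolate("{{object.data.title}}: ${{object.data.price}}", context));
// Denim Jacket: $89.5
```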

Display Bindings

Groups configure which plugin to show on different interactions — a tooltip on hover, a product card on click, a detail modal on long-press. Overlays bound to a group automatically expand to one instance per visible object, each following its object on screen with interpolated data.
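
The per-object expansion can be sketched as: filter tracked objects by group, compute each one's position at the current time, and emit one overlay instance per object that is on screen. The keyframe format and the use of linear interpolation between keyframes are assumptions made for this illustration.

```typescript
// Hypothetical sketch of a group-bound overlay expanding to one instance per
// visible object. Boxes are linearly interpolated between tracking keyframes.
interface Box { x: number; y: number; w: number; h: number }
interface TrackKeyframe { t: number; box: Box }
interface TrackedObject {
  id: string;
  group: string;
  data: Record<string, unknown>;
  keyframes: TrackKeyframe[]; // sorted by time t
}
interface OverlayInstance { overlayId: string; objectId: string; box: Box; data: Record<string, unknown> }

// Position at time t, or null if t lies outside the object's tracked range.
function boxAt(kfs: TrackKeyframe[], t: number): Box | null {
  if (kfs.length === 0 || t < kfs[0].t || t > kfs[kfs.length - 1].t) return null;
  for (let i = 0; i < kfs.length - 1; i++) {
    const a = kfs[i];
    const b = kfs[i + 1];
    if (t <= b.t) {
      const u = b.t === a.t ? 0 : (t - a.t) / (b.t - a.t);
      const lerp = (p: number, q: number) => p + (q - p) * u;
      return { x: lerp(a.box.x, b.box.x), y: lerp(a.box.y, b.box.y), w: lerp(a.box.w, b.box.w), h: lerp(a.box.h, b.box.h) };
    }
  }
  return kfs[0].box; // single keyframe, and t equals its time
}

function expandBinding(overlayId: string, group: string, objects: TrackedObject[], t: number): OverlayInstance[] {
  const out: OverlayInstance[] = [];
  for (const obj of objects) {
    if (obj.group !== group) continue;
    const box = boxAt(obj.keyframes, t);
    if (box) out.push({ overlayId, objectId: obj.id, box, data: obj.data });
  }
  return out;
}

const objects: TrackedObject[] = [
  { id: "jacket", group: "Products", data: { price: 89 }, keyframes: [
    { t: 0, box: { x: 0, y: 10, w: 20, h: 30 } },
    { t: 2, box: { x: 20, y: 10, w: 20, h: 30 } },
  ] },
  { id: "shoes", group: "Products", data: { price: 120 }, keyframes: [
    { t: 3, box: { x: 50, y: 60, w: 10, h: 10 } },
    { t: 4, box: { x: 55, y: 60, w: 10, h: 10 } },
  ] },
  { id: "narrator", group: "Characters", data: {}, keyframes: [
    { t: 0, box: { x: 70, y: 0, w: 30, h: 80 } },
    { t: 10, box: { x: 70, y: 0, w: 30, h: 80 } },
  ] },
];

const instances = expandBinding("productCard", "Products", objects, 1.5);
console.log(instances.map((i) => `${i.objectId}@${i.box.x}`)); // jacket@15
```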

Replacement Zones

Replace tracked regions in real time: swap colors, fabrics, or images, or apply blur or pixelation with mesh-aware perspective correction. This works for billboards, clothing, backdrops, and any other flat or tracked surface.

Tracking Visualization

Runtime visualization of tracking data — polygons, bounding boxes, mesh, corners — controllable through actions. Useful for selection highlighting, product emphasis, or debugging.

Events & Actions

MP4E provides a fully programmable event/action system. Objects, plugins, scenes, and the video itself all emit events that can trigger actions — and actions can trigger further events, creating complex interactive flows without writing code.

69 Actions

Play, pause, seek, set variable, show/hide overlay, go to scene, toggle layer, show notification, execute plugin action, control tracking visualization, and more.

15 Variable Types

Text, number, boolean, counter, timer, date, state machine, computed, mapped, list, object, map, set, JSON, and accumulated — with support for expressions and cross-variable references.
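
As an example of one of these types, a state machine variable can be modeled as a value that only moves along declared transitions. The `StateMachineVar` class is hypothetical; MP4E's variable schema is not described in this section.

```typescript
// Hypothetical "state machine" variable: it holds one state and changes only
// along transitions declared for the current state.
class StateMachineVar {
  private current: string;
  private readonly transitions: Record<string, string[]>;

  constructor(initial: string, transitions: Record<string, string[]>) {
    this.current = initial;
    this.transitions = transitions;
  }

  get value(): string {
    return this.current;
  }

  // Applies the transition if it is declared for the current state; returns whether it did.
  transition(to: string): boolean {
    const allowed = this.transitions[this.current] ?? [];
    if (!allowed.includes(to)) return false;
    this.current = to;
    return true;
  }
}

const quiz = new StateMachineVar("intro", {
  intro: ["question"],
  question: ["correct", "wrong"],
  wrong: ["question"],
  correct: [],
});

console.log(quiz.transition("correct")); // false: intro cannot jump to correct
console.log(quiz.transition("question")); // true
console.log(quiz.transition("wrong")); // true
console.log(quiz.value); // wrong
```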

Scenes & Layers

Scenes define segments with lifecycle hooks (onEnter, onExit) that trigger actions. Layers group overlays with visibility conditions. Both support branching, conditional flow, and time-based activation.
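
Scene lifecycle hooks can be driven from playback time: when the active scene changes, the previous scene's onExit actions run before the new scene's onEnter actions. This sketch assumes scenes are half-open time ranges [start, end) and that hooks are lists of action names; both are illustrative.

```typescript
// Hypothetical scene tracker. On each time update it returns the onExit
// actions of the scene being left, then the onEnter actions of the scene entered.
interface Scene { id: string; start: number; end: number; onEnter: string[]; onExit: string[] }

class SceneTracker {
  private currentId: string | undefined;
  private readonly scenes: Scene[];

  constructor(scenes: Scene[]) {
    this.scenes = scenes;
    this.currentId = undefined;
  }

  // Call on every time update (including seeks); returns hook actions in order.
  update(t: number): string[] {
    const next = this.scenes.find((s) => t >= s.start && t < s.end);
    if (next?.id === this.currentId) return [];
    const prev = this.scenes.find((s) => s.id === this.currentId);
    this.currentId = next?.id;
    return [...(prev?.onExit ?? []), ...(next?.onEnter ?? [])];
  }
}

const tracker = new SceneTracker([
  { id: "intro", start: 0, end: 10, onEnter: ["showTitle"], onExit: ["hideTitle"] },
  { id: "shop", start: 10, end: 30, onEnter: ["showProducts"], onExit: ["hideProducts"] },
]);

console.log(tracker.update(0)); // showTitle
console.log(tracker.update(5)); // (nothing: still in intro)
console.log(tracker.update(12)); // hideTitle, showProducts
console.log(tracker.update(2)); // hideProducts, showTitle (seek back)
```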

Variables + Events + Actions + Service Plugins
These four systems compose together into reactive chains: a plugin emits an event, the event triggers an action that sets a variable, the variable change fires an onChange rule that evaluates conditions and triggers further actions, which may set more variables or emit more events — creating cascading feedback loops. A single user click can ripple through multiple layers of logic before settling. This is how complex interactive experiences are built — declaratively, without code.
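
The cascading behavior can be modeled with a queue that drains until no work remains. In this hypothetical sketch, a variable's onChange rule fires only when its value actually changes, and a step cap stops feedback loops that never settle; the cap and all names are assumptions, not documented MP4E behavior.

```typescript
// Hypothetical reactive chain. Events produce steps, "set" steps change
// variables, and a real change fires that variable's onChange rule, which may
// emit further events. A FIFO queue drains until nothing is left.
type Step =
  | { kind: "event"; event: string }
  | { kind: "set"; variable: string; value: number };

interface Rules {
  onEvent: Record<string, (vars: Map<string, number>) => Step[]>;
  onChange: Record<string, (value: number) => Step[]>;
}

function run(initial: Step, rules: Rules, vars: Map<string, number>, maxSteps = 100): string[] {
  const log: string[] = [];
  const queue: Step[] = [initial];
  let processed = 0;
  while (queue.length > 0) {
    // Guard against feedback loops that never settle (the cap is an assumption).
    if (++processed > maxSteps) throw new Error("chain did not settle");
    const step = queue.shift()!;
    if (step.kind === "event") {
      log.push(`event ${step.event}`);
      const handler = rules.onEvent[step.event];
      if (handler) queue.push(...handler(vars));
    } else {
      if (vars.get(step.variable) === step.value) continue; // unchanged: no onChange
      vars.set(step.variable, step.value);
      log.push(`set ${step.variable}=${step.value}`);
      const rule = rules.onChange[step.variable];
      if (rule) queue.push(...rule(step.value));
    }
  }
  return log;
}

const rules: Rules = {
  onEvent: {
    addToCart: (v) => [{ kind: "set", variable: "cartCount", value: (v.get("cartCount") ?? 0) + 1 }],
    freeShippingUnlocked: () => [{ kind: "set", variable: "showBanner", value: 1 }],
  },
  onChange: {
    cartCount: (n) => (n >= 3 ? [{ kind: "event", event: "freeShippingUnlocked" }] : []),
  },
};

const vars = new Map<string, number>([["cartCount", 2]]);
console.log(run({ kind: "event", event: "addToCart" }, rules, vars));
// event addToCart, set cartCount=3, event freeShippingUnlocked, set showBanner=1
```

One click ripples through two events and two variable changes before the queue empties, which is the "settling" the paragraph describes.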

Self-Contained Format

An MP4E file carries everything it needs to run. All interactivity metadata is stored inside the MP4 file as a custom atom (----:com.mp4e.data:payload), and the embedded file system can include additional media, documents, sound effects, or any asset the interactive experience requires. No external CDN, no broken links, no server dependency.
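
MP4 files are sequences of boxes (atoms), each starting with a 32-bit big-endian size and a four-character type, as defined by the ISO base media file format. The minimal walker that follows lists the boxes at one nesting level. Freeform metadata items with a `----:mean:name` key are typically stored under moov > udta > meta > ilst, with `mean` and `name` sub-boxes identifying the item; descending that path is left out, and this is not MP4E's own parser.

```typescript
// Minimal ISO BMFF (MP4) box walker for one nesting level. Each header is a
// 32-bit big-endian size plus a four-character type; size 1 means a 64-bit
// size follows, and size 0 means the box runs to the end of the range.
interface BoxInfo { type: string; start: number; size: number; headerSize: number }

function listBoxes(buf: Uint8Array, start = 0, end = buf.length): BoxInfo[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const boxes: BoxInfo[] = [];
  let pos = start;
  while (pos + 8 <= end) {
    let size = view.getUint32(pos); // DataView reads big-endian by default
    const type = String.fromCharCode(buf[pos + 4], buf[pos + 5], buf[pos + 6], buf[pos + 7]);
    let headerSize = 8;
    if (size === 1) {
      if (pos + 16 > end) break;
      size = view.getUint32(pos + 8) * 0x100000000 + view.getUint32(pos + 12);
      headerSize = 16;
    } else if (size === 0) {
      size = end - pos;
    }
    if (size < headerSize || pos + size > end) break; // malformed or truncated
    boxes.push({ type, start: pos, size, headerSize });
    pos += size;
  }
  return boxes;
}

// Test helper: builds a box with a 32-bit size header.
function box(type: string, payload: number[]): number[] {
  const size = 8 + payload.length;
  const header = [(size >>> 24) & 0xff, (size >>> 16) & 0xff, (size >>> 8) & 0xff, size & 0xff];
  return [...header, ...Array.from(type, (c) => c.charCodeAt(0)), ...payload];
}

const file = new Uint8Array([
  ...box("ftyp", [0x69, 0x73, 0x6f, 0x6d, 0, 0, 0, 0]), // major brand "isom", minor version 0
  ...box("free", []),
  ...box("mdat", [1, 2, 3, 4]),
]);

console.log(listBoxes(file).map((b) => `${b.type}:${b.size}`)); // ftyp:16, free:8, mdat:12
```

To go deeper, call `listBoxes` again on a container's body (from `start + headerSize` to `start + size`). Some containers, such as `meta` in ISO files, carry version and flags bytes before their children, which this sketch does not handle.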

Embedded Metadata

Object tracking, overlays, plugins, variables, rules, scenes, layers — all serialized as compressed JSON inside the MP4 container.

Embedded File System

Media for alternate scenes, PIP content, PDFs, sound effects, images — any file can be embedded inside the video and referenced by the interactive layers.

Portable

The same file works across websites, apps, email, corporate intranets, digital signage — anywhere an MP4 plays. The intelligence and interactivity travel with the file.

Graceful Degradation

Without the MP4E engine, the file plays as a normal video. With the engine loaded, the full interactive experience activates — no special apps or browser extensions required.

Host Integration

Hosts — platforms and applications that embed MP4E videos — have full control over what the video can do in their environment. The host integration layer lets platforms enforce their own policies while preserving the creator's experience design.

Metadata Injection

Hosts can deep-merge their own layers, plugins, variables, and settings into any video at the player level — the original file is untouched. Inject branded watermarks, analytics plugins, compliance layers, or custom controls across all videos.
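
A deep merge of host metadata over the video's metadata can be sketched over plain JSON values. The merge policy here is an assumption: nested objects merge key by key, host values win on conflicts, and arrays (for example, plugin lists) are concatenated. Because it returns a new value, the video's own metadata is never modified.

```typescript
// Minimal deep-merge sketch for host metadata injection over plain JSON.
// Policy (assumed): objects merge recursively, arrays concatenate, host wins.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a new value; neither input is mutated.
function deepMerge(base: Json, overlay: Json): Json {
  if (Array.isArray(base) && Array.isArray(overlay)) return [...base, ...overlay];
  if (isObject(base) && isObject(overlay)) {
    const out: JsonObject = { ...base };
    for (const key of Object.keys(overlay)) {
      out[key] = Object.prototype.hasOwnProperty.call(base, key)
        ? deepMerge(base[key], overlay[key])
        : overlay[key];
    }
    return out;
  }
  return overlay; // scalars or mismatched types: the host value wins
}

const videoMeta: Json = { plugins: [{ id: "controls" }], settings: { autoplay: false, volume: 0.8 } };
const hostMeta: Json = { plugins: [{ id: "host-analytics" }], settings: { autoplay: true } };

const merged = deepMerge(videoMeta, hostMeta);
console.log(JSON.stringify(merged));
// {"plugins":[{"id":"controls"},{"id":"host-analytics"}],"settings":{"autoplay":true,"volume":0.8}}
```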

Action Gating

Hosts define which actions are allowed, blocked, or require approval. A host could disable downloads but keep quizzes, or require user confirmation before any navigation action — all enforced in the engine's compiled gating layer.
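
A host policy with allow, block, and approve outcomes might look like the following. The rule keys (exact action ids or a `category.*` wildcard), the action names, and the confirmation callback are hypothetical; in the real engine this decision is made inside the compiled gating layer.

```typescript
// Hypothetical host action policy with three outcomes per action.
type Decision = "allow" | "block" | "approve";

interface HostPolicy {
  rules: Record<string, Decision>; // exact action id, or "category.*"
  defaultDecision: Decision;
}

function decide(policy: HostPolicy, action: string): Decision {
  const has = (key: string) => Object.prototype.hasOwnProperty.call(policy.rules, key);
  if (has(action)) return policy.rules[action];
  const wildcard = action.split(".")[0] + ".*";
  return has(wildcard) ? policy.rules[wildcard] : policy.defaultDecision;
}

// Runs the action only if the policy permits it. "approve" defers to a
// confirmation callback (synchronous here for brevity; a real prompt would
// likely be asynchronous).
function runGated(policy: HostPolicy, action: string, confirm: (action: string) => boolean, execute: () => void): boolean {
  const decision = decide(policy, action);
  if (decision === "block") return false;
  if (decision === "approve" && !confirm(action)) return false;
  execute();
  return true;
}

// "Disable downloads but keep quizzes, and confirm any navigation."
const policy: HostPolicy = {
  rules: { download: "block", "navigate.*": "approve", "quiz.answer": "allow" },
  defaultDecision: "allow",
};

console.log(decide(policy, "download")); // block
console.log(decide(policy, "navigate.openUrl")); // approve
console.log(decide(policy, "quiz.answer")); // allow
```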

Plugin Overrides

Replace the video's built-in controls or subtitle renderer with the host's own plugin. Force a branded player skin across all videos, or inject a platform-specific analytics plugin automatically.

Security & Sandbox

Configure sandbox levels, trusted plugin signers, network gating (control which URLs plugins can fetch), and CSP violation callbacks. The host controls the security boundary for all plugins running in their environment.
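
Network gating can be reduced to a check on each URL a plugin tries to fetch. This sketch assumes an allowlist of exact hostnames plus `*.domain` entries for subdomains; the policy format is illustrative, not MP4E's configuration.

```typescript
// Hypothetical network-gating check for plugin fetches.
function parseUrl(url: string): URL | null {
  try {
    return new URL(url);
  } catch {
    return null; // unparseable input
  }
}

// Allowlist entries are exact hostnames, or "*.domain" for any subdomain of domain.
function isFetchAllowed(url: string, allowlist: string[]): boolean {
  const parsed = parseUrl(url);
  if (parsed === null) return false;
  const hostname = parsed.hostname;
  return allowlist.some((entry) =>
    entry.startsWith("*.") ? hostname.endsWith(entry.slice(1)) : hostname === entry,
  );
}

const allow = ["api.shop.example", "*.cdn.example"];

console.log(isFetchAllowed("https://api.shop.example/cart", allow)); // true
console.log(isFetchAllowed("https://img.cdn.example/a.png", allow)); // true
console.log(isFetchAllowed("https://tracker.other.example/p", allow)); // false
console.log(isFetchAllowed("not a url", allow)); // false
```

A `*.cdn.example` entry matches `img.cdn.example` but not `cdn.example` itself, so an apex domain has to be listed separately.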

Format, Not Service
MP4E provides a portable video format and creation tools — not a hosted service. Plugins requiring server-side functionality (payments, databases, webhooks) are implemented by the host application, not MP4E. The video carries the client-side experience; the host provides the backend.

Getting Started

Choose your path based on what you want to build: