Core Concepts
The building blocks of MP4E and how they relate to each other.
This page introduces the key concepts you'll encounter when working with MP4E. Whether you're embedding a player, building a plugin, or creating interactive content, understanding these building blocks will help you work effectively with the platform.
The Engine
The engine is the core runtime of MP4E — a compiled Rust binary that handles all logic, state, and decisions. It compiles to WebAssembly for browsers and native binaries for iOS and Android.
Engine (Brain)
Compiled binary. Runs in a sandbox isolated from the DOM.
- Visibility & position calculations
- Variable interpolation
- Rule evaluation & action execution
- Timer, counter & state machine logic
- Plugin content generation
- Permission enforcement
Bridge (Body)
Thin platform adapter. Connects the engine to native APIs.
- Video playback (play, pause, seek)
- DOM/UIKit rendering
- User interaction capture
- Plugin iframe/WebView management
- Browser/OS API access
The engine makes all decisions — "is this overlay visible?", "what position should it be at?", "does this rule match?" — and the bridge simply executes the result on the platform. This means the same logic runs identically whether the video plays in a browser, a native iOS app, or an Android app.
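This engine-decides / bridge-executes split can be sketched as two functions: a pure "engine" that turns state into render decisions, and a "bridge" that applies them to the platform. The shapes below are illustrative only; the real engine is a compiled Rust binary running as WebAssembly, not JavaScript.

```javascript
// "Engine": a pure function from playback state to render decisions.
function evaluateFrame(state) {
  return state.overlays.map((overlay) => ({
    id: overlay.id,
    // The engine decides visibility from the overlay's time range.
    visible:
      state.currentTime >= overlay.showAt && state.currentTime < overlay.hideAt,
    // The engine decides position; an object-bound overlay would read
    // tracking data here instead of a fixed position.
    position: overlay.position,
  }));
}

// "Bridge": applies the decisions to the platform (a DOM stand-in here).
function applyDecisions(decisions, dom) {
  for (const d of decisions) {
    dom[d.id] = { hidden: !d.visible, style: d.position };
  }
}

const state = {
  currentTime: 12,
  overlays: [
    { id: "cta", showAt: 10, hideAt: 20, position: { x: 40, y: 80 } },
    { id: "banner", showAt: 30, hideAt: 45, position: { x: 0, y: 0 } },
  ],
};
const dom = {};
applyDecisions(evaluateFrame(state), dom);
// dom.cta is shown (10 ≤ 12 < 20); dom.banner is hidden.
```

Because `evaluateFrame` is pure, the same decisions come out regardless of which bridge (DOM, UIKit, Android) consumes them — which is the point of the split.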
Layers & Overlays
Layers are organizational containers. Overlays are the individual interactive elements positioned on top of the video. Every overlay belongs to a layer.
Overlays can be fixed (static position on screen), object-bound (follow a specific object), or group-bound (automatically expand to create one instance per object in a group). Each overlay has visibility rules, z-index ordering, and time-based visibility tied to scenes.
Layers can be activated and deactivated at runtime. This lets you organize overlays into logical groups — "product hotspots", "educational annotations", "host branding" — and toggle entire groups on or off.
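Runtime layer toggling can be pictured as a one-flag switch over a container of overlays. The field names below are hypothetical, not the actual MP4E data model:

```javascript
// A layer groups overlays; deactivating it hides them all in one step.
const layers = {
  "product-hotspots": { active: true, overlays: ["hotspot-1", "hotspot-2"] },
};

function isOverlayVisible(layers, overlayId) {
  // An overlay is visible only while some active layer contains it.
  return Object.values(layers).some(
    (l) => l.active && l.overlays.includes(overlayId)
  );
}

function setLayerActive(layers, name, active) {
  layers[name].active = active;
}

setLayerActive(layers, "product-hotspots", false);
// isOverlayVisible(layers, "hotspot-1") is now false for both hotspots.
```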
Plugins
Plugins provide the content inside overlays. An overlay is a positioned container; a plugin is what renders inside it. Plugins are built with HTML, CSS, and JavaScript, and run inside sandboxed iframes with a controlled bridge API.
| Type | Description | Examples |
|---|---|---|
| Overlay | Fixed position UI element on the video | CTA buttons, banners, countdowns |
| Object Display | Appears near an object on interaction (hover, click, touch, remote select) | Tooltips, product cards, info panels |
| Modal | Centered dialog that pauses video | Quizzes, forms, checkout flows |
| Service | Background plugin with no visible UI | Shopping cart, analytics, API connectors |
| Controls | Custom video player controls | Play/pause, seek bar, volume |
| Subtitle | Custom subtitle rendering | Karaoke, highlighted words, styled captions |
All plugins — including core ones — use the same JSON-based format. This means a tooltip from the MP4E marketplace, a custom quiz you built, and a third-party checkout widget all work identically. Plugins communicate with each other through variables (shared state) and events (notifications), keeping them decoupled and reusable.
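The variables-and-events pattern that keeps plugins decoupled can be sketched with a tiny shared bus. The method names (`setVariable`, `on`, `emit`) are illustrative stand-ins, not the actual mp4e bridge API:

```javascript
// Minimal shared-state bus: variables carry state, events carry notifications.
function createBus() {
  const vars = {};
  const listeners = {};
  return {
    setVariable(name, value) {
      vars[name] = value;
      // Variable writes notify onChange-style listeners.
      (listeners["var:" + name] || []).forEach((fn) => fn(value));
    },
    getVariable(name) {
      return vars[name];
    },
    on(event, fn) {
      (listeners[event] = listeners[event] || []).push(fn);
    },
    emit(event, data) {
      (listeners[event] || []).forEach((fn) => fn(data));
    },
  };
}

// A "quiz" plugin bumps a shared score; a "scoreboard" plugin reacts.
// Neither plugin knows the other exists.
const bus = createBus();
let rendered = "";
bus.on("var:score", (v) => { rendered = "Score: " + v; });
bus.setVariable("score", (bus.getVariable("score") || 0) + 1);
// rendered === "Score: 1"
```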
Plugins run fully sandboxed and communicate with the engine only through the mp4e bridge API. This protects both the host application and the video content from untrusted plugin code.
Objects & Groups
Objects represent things detected or defined in the video — products, people, furniture, text, surfaces, or any region of interest. Each object has:
- Identity — A unique identifier and label (AI-detected or user-defined)
- Tracking Data — Position (bounding box, polygon, mesh) at each frame
- Custom Data — Arbitrary fields (price, SKU, description, URL) defined by the group's data schema
Groups organize objects by category (e.g., "Shoppable Products", "People", "Furniture"). Each group defines a data schema (the custom fields its objects have) and display bindings (what plugin to show when a user interacts with an object — hover, click, touch, remote select, etc.).
Example: A group called "Shoppable Products" has a data schema with title, price, and image fields. Its display bindings say: "on hover or focus, show a tooltip with {{object.data.title}}; on click or tap, show a product card." Every object in the group automatically gets these behaviors across all input methods — mouse, touch, keyboard, and remote control.
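The "Shoppable Products" example above can be sketched as a group definition plus a lookup from interaction to plugin. The field names (`dataSchema`, `displayBindings`, `on`) are hypothetical and may differ from the real format:

```javascript
// Hypothetical shape of a group: a data schema plus display bindings.
const shoppableProducts = {
  name: "Shoppable Products",
  dataSchema: ["title", "price", "image"],
  displayBindings: [
    { on: ["hover", "focus"], plugin: "tooltip" },      // pointer + keyboard
    { on: ["click", "tap"], plugin: "product-card" },   // mouse + touch
  ],
};

// Every object in the group inherits these bindings: resolve which
// plugin to show for a given interaction type.
function resolvePlugin(group, interaction) {
  const binding = group.displayBindings.find((b) => b.on.includes(interaction));
  return binding ? binding.plugin : null;
}

resolvePlugin(shoppableProducts, "hover"); // "tooltip"
resolvePlugin(shoppableProducts, "tap");   // "product-card"
```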
Variables
Variables store state that persists throughout the video experience. They are the shared memory that connects plugins, rules, and actions together.
- Text, number, boolean, counter
- Timer (count up/down), date, accumulated values
- State machines, expressions, mapped values
- Lists, objects, maps, sets, JSON
Variables can be referenced anywhere using {{variableName}} syntax — in plugin content, overlay configs, rule conditions, and action parameters. The engine handles all interpolation, including array indexing ({{items[0]}}), dynamic indexing ({{items[currentIndex]}}), timer decomposition ({{timer.minutes}}), and fallback values ({{name || 'Guest'}}).
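The interpolation behavior can be illustrated with a deliberately simplified resolver covering only literal array indexing and fallback values; the real engine handles much more, including dynamic indexing and timer decomposition:

```javascript
// Simplified {{...}} interpolation: dotted paths, literal array
// indices, and || fallbacks. Not the engine's implementation.
function interpolate(template, vars) {
  return template.replace(/\{\{(.*?)\}\}/g, (_, expr) => {
    // Split "name || 'Guest'" into the lookup path and a fallback literal.
    const [path, fallback] = expr.split("||").map((s) => s.trim());
    // Resolve "items[0]" or "user.name" by walking the segments.
    const value = path
      .split(/[.\[\]]/)
      .filter(Boolean)
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), vars);
    if (value != null) return String(value);
    return fallback ? fallback.replace(/^'|'$/g, "") : "";
  });
}

interpolate("Hello {{name || 'Guest'}}", {});              // "Hello Guest"
interpolate("First: {{items[0]}}", { items: ["a", "b"] }); // "First: a"
```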
Setting a variable fires any onChange rules attached to that variable, which can trigger further actions.
Rules & Actions
Rules define interactive behaviors using a trigger → condition → action pattern. They are the primary way to make videos respond to user interaction.
- Trigger — plugin event, variable change
- Condition — time ranges, event data checks
- Action — seek, navigate, plugin action
MP4E supports 69 action types spanning playback control, navigation, UI changes, variable operations, layer management, plugin communication, and more. Actions compose into reactive chains: a rule can set a variable, which triggers an onChange rule, which evaluates conditions and fires more actions — creating cascading feedback loops from a single user interaction.
Rules can be attached at multiple levels: project-wide (global), per-scene, per-group, per-overlay, or triggered by plugin events. The engine evaluates all matching rules in priority order and executes their actions — all within the sandboxed runtime.
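The trigger → condition → action pattern can be sketched as a toy evaluator. Real MP4E rules are declarative and evaluated inside the sandboxed engine; the functions-as-conditions shape below is purely illustrative:

```javascript
// Toy rule evaluation: match the trigger, check the condition,
// run the actions — in priority order, as the engine does.
function runRules(rules, event, state) {
  [...rules]
    .sort((a, b) => a.priority - b.priority)
    .filter((r) => r.trigger === event.type)
    .forEach((r) => {
      if (r.condition(event, state)) r.actions.forEach((act) => act(state));
    });
  return state;
}

const rules = [
  {
    trigger: "plugin:answer",              // a plugin event
    priority: 1,
    condition: (e) => e.correct === true,  // an event-data check
    actions: [(s) => { s.vars.score += 10; }], // a variable operation
  },
];

const state = runRules(
  rules,
  { type: "plugin:answer", correct: true },
  { vars: { score: 0 } }
);
// state.vars.score === 10; the score change would in turn fire any
// onChange rules, forming the cascading chains described above.
```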
Scenes
Scenes are time-based markers that divide the video into logical segments. Each scene has a time range and fires lifecycle events when playback crosses its boundaries: onEnter, onExit, onStart, onEnd, and onLoop. These events can trigger rules and actions — showing overlays, setting variables, changing layers, or anything else the rules engine supports.
Example: An e-commerce video has three scenes: "Product Showcase" (0:00–0:15), "Feature Comparison" (0:15–0:35), and "Call to Action" (0:35–0:45). Entering "Call to Action" fires an onEnter event that triggers a rule to show a purchase button overlay.
Use goToScene actions to create non-linear video flows, branching narratives, or conditional story paths.
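Using the scene timings from the e-commerce example above, boundary-event detection can be sketched as comparing the active scene before and after a playback tick. This covers only onEnter/onExit, a simplification of the full lifecycle:

```javascript
const scenes = [
  { name: "Product Showcase", start: 0, end: 15 },
  { name: "Feature Comparison", start: 15, end: 35 },
  { name: "Call to Action", start: 35, end: 45 },
];

// Which scene contains a given playback time (end-exclusive)?
function sceneAt(time) {
  return scenes.find((s) => time >= s.start && time < s.end) || null;
}

// Compare scenes across a tick to fire onExit then onEnter.
function crossingEvents(prevTime, nextTime) {
  const prev = sceneAt(prevTime);
  const next = sceneAt(nextTime);
  if (prev === next) return [];
  const events = [];
  if (prev) events.push({ type: "onExit", scene: prev.name });
  if (next) events.push({ type: "onEnter", scene: next.name });
  return events;
}

crossingEvents(34, 36);
// → [{ type: "onExit", scene: "Feature Comparison" },
//    { type: "onEnter", scene: "Call to Action" }]
```

The onEnter event here is what would trigger the rule showing the purchase button overlay.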
How It Fits Together
All these concepts connect in a straightforward way:
Layers contain overlays. Scenes mark time segments and fire events when playback enters or exits them.
Overlays render plugins — the interactive content the viewer sees and interacts with.
Overlays can stand alone, or optionally bind to objects and groups to follow tracked objects in the video.
User interactions (clicks, taps, hovers, remote select, keyboard focus), plugin events, scene transitions, and variable changes fire rules. Rules check conditions and execute actions.
Actions modify variables, control playback, show/hide overlays, navigate between scenes, or communicate with plugins — which can trigger more rules, creating reactive chains.
The engine orchestrates everything — evaluating rules, tracking state, interpolating variables, enforcing permissions — while the bridge renders the result on whatever platform the video plays on.
Ready to build?
Now that you understand the building blocks, dive into the specific documentation: