Core Concepts

The building blocks of MP4E and how they relate to each other.

This page introduces the key concepts you'll encounter when working with MP4E. Whether you're embedding a player, building a plugin, or creating interactive content, understanding these building blocks will help you work effectively with the platform.

The Engine

The engine is the core runtime of MP4E: a Rust program that handles all logic, state, and decisions. It compiles to WebAssembly for browsers and to native binaries for iOS and Android.

🧠 Engine (Brain)

Compiled binary. Runs in a sandbox isolated from the DOM.

  • Visibility & position calculations
  • Variable interpolation
  • Rule evaluation & action execution
  • Timer, counter & state machine logic
  • Plugin content generation
  • Permission enforcement

🎬 Bridge (Body)

Thin platform adapter. Connects the engine to native APIs.

  • Video playback (play, pause, seek)
  • DOM/UIKit rendering
  • User interaction capture
  • Plugin iframe/WebView management
  • Browser/OS API access

The engine makes all decisions — "is this overlay visible?", "what position should it be at?", "does this rule match?" — and the bridge simply executes the result on the platform. This means the same logic runs identically whether the video plays in a browser, a native iOS app, or an Android app.

Sandboxed Execution
The engine runs inside its own memory sandbox, completely isolated from the page's DOM. Host code cannot reach into the engine, and the engine cannot directly manipulate the page. All communication flows through a narrow, well-defined event interface. This makes the runtime tamper-resistant and predictable.
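
The split between deciding and executing can be illustrated with a small sketch. This is not MP4E's actual interface: the names EngineEvent, BridgeCommand, decide, and applyCommands, and the shape of the events and commands, are assumptions made for illustration only. The point is that the "engine" side is a pure function from events to commands, while the "bridge" side only carries those commands out.

```typescript
// Hypothetical event and command shapes; MP4E's real interface is not shown on this page.
type EngineEvent =
  | { kind: "timeUpdate"; seconds: number }
  | { kind: "click"; overlayId: string };

type BridgeCommand =
  | { kind: "showOverlay"; overlayId: string }
  | { kind: "hideOverlay"; overlayId: string }
  | { kind: "pause" };

interface OverlayWindow {
  overlayId: string;
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive
}

// "Engine" side: decides what should happen. It never touches the platform.
function decide(overlays: OverlayWindow[], ev: EngineEvent): BridgeCommand[] {
  if (ev.kind === "click") {
    return [{ kind: "pause" }];
  }
  const t = ev.seconds;
  return overlays.map((o): BridgeCommand =>
    t >= o.start && t < o.end
      ? { kind: "showOverlay", overlayId: o.overlayId }
      : { kind: "hideOverlay", overlayId: o.overlayId }
  );
}

// "Bridge" side: executes commands. Here the platform is just a log,
// so the sketch runs anywhere.
function applyCommands(commands: BridgeCommand[], log: string[]): void {
  for (const c of commands) {
    log.push(c.kind === "pause" ? "pause" : `${c.kind}:${c.overlayId}`);
  }
}

const overlayWindows: OverlayWindow[] = [{ overlayId: "buy", start: 35, end: 45 }];
const bridgeLog: string[] = [];
applyCommands(decide(overlayWindows, { kind: "timeUpdate", seconds: 40 }), bridgeLog);
applyCommands(decide(overlayWindows, { kind: "timeUpdate", seconds: 10 }), bridgeLog);
applyCommands(decide(overlayWindows, { kind: "click", overlayId: "buy" }), bridgeLog);
```

Because decide has no platform dependencies, the same function could in principle sit behind a browser bridge or a native one, which is the property the engine/bridge split is meant to provide.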

Layers & Overlays

Layers are organizational containers. Overlays are the individual interactive elements positioned on top of the video. Every overlay belongs to a layer.

Layer: "Product Hotspots"
├─ Overlay: "Buy Button" (fixed position, bottom-right)
├─ Overlay: "Price Tag" (follows object "sneaker")
└─ Overlay: "Hotspot" (group-bound to "Shoppable Items")
Layer: "Captions"
└─ Overlay: "Subtitle Track"

Overlays can be fixed (static position on screen), object-bound (follow a specific object), or group-bound (automatically expand to create one instance per object in a group). Each overlay has visibility rules, z-index ordering, and time-based visibility tied to scenes.

Layers can be activated and deactivated at runtime. This lets you organize overlays into logical groups — "product hotspots", "educational annotations", "host branding" — and toggle entire groups on or off.
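
As a rough sketch of how the three binding kinds and layer toggling might fit together, consider the following TypeScript model. The type and field names (Binding, OverlayDef, LayerDef, GroupIndex, expandInstances) are hypothetical and do not reflect MP4E's actual schema. The sketch shows a group-bound overlay expanding into one instance per object, and a deactivated layer contributing nothing.

```typescript
// Hypothetical data model; field names are illustrative, not MP4E's schema.
type Binding =
  | { kind: "fixed"; x: number; y: number }
  | { kind: "object"; objectId: string }
  | { kind: "group"; groupId: string };

interface OverlayDef { id: string; binding: Binding; }
interface LayerDef { name: string; active: boolean; overlays: OverlayDef[]; }

// Maps a group ID to the IDs of the objects it contains.
type GroupIndex = Record<string, string[]>;

// Expand the overlays of all active layers into concrete instances.
// A group-bound overlay yields one instance per object in its group.
function expandInstances(layers: LayerDef[], groups: GroupIndex): string[] {
  const out: string[] = [];
  for (const layer of layers) {
    if (!layer.active) continue; // deactivated layers contribute nothing
    for (const o of layer.overlays) {
      const b = o.binding;
      if (b.kind === "group") {
        const members = groups[b.groupId] || [];
        for (const objectId of members) out.push(`${o.id}@${objectId}`);
      } else if (b.kind === "object") {
        out.push(`${o.id}@${b.objectId}`);
      } else {
        out.push(o.id);
      }
    }
  }
  return out;
}

const sampleLayers: LayerDef[] = [
  {
    name: "Product Hotspots",
    active: true,
    overlays: [
      { id: "buy-button", binding: { kind: "fixed", x: 0.9, y: 0.9 } },
      { id: "price-tag", binding: { kind: "object", objectId: "sneaker" } },
      { id: "hotspot", binding: { kind: "group", groupId: "shoppable" } },
    ],
  },
  {
    name: "Captions",
    active: false,
    overlays: [{ id: "subtitles", binding: { kind: "fixed", x: 0.5, y: 0.95 } }],
  },
];
const instances = expandInstances(sampleLayers, { shoppable: ["sneaker", "bag"] });
```

With the "Captions" layer inactive, instances contains the buy button, the price tag for "sneaker", and one hotspot for each of the two objects in the group.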

Plugins

Plugins provide the content inside overlays. An overlay is a positioned container; a plugin is what renders inside it. Plugins are built with HTML, CSS, and JavaScript, and run inside sandboxed iframes with a controlled bridge API.

Type           | Description                                                                 | Examples
Overlay        | Fixed position UI element on the video                                      | CTA buttons, banners, countdowns
Object Display | Appears near an object on interaction (hover, click, touch, remote select) | Tooltips, product cards, info panels
Modal          | Centered dialog that pauses video                                           | Quizzes, forms, checkout flows
Service        | Background plugin with no visible UI                                        | Shopping cart, analytics, API connectors
Controls       | Custom video player controls                                                | Play/pause, seek bar, volume
Subtitle       | Custom subtitle rendering                                                   | Karaoke, highlighted words, styled captions

All plugins — including core ones — use the same JSON-based format. This means a tooltip from the MP4E marketplace, a custom quiz you built, and a third-party checkout widget all work identically. Plugins communicate with each other through variables (shared state) and events (notifications), keeping them decoupled and reusable.

Plugin Sandbox
Every plugin runs in its own sandboxed iframe. It cannot access the host page's DOM, cookies, or JavaScript — only the controlled mp4e bridge API. This protects both the host application and the video content from untrusted plugin code.
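
To make the idea of a controlled bridge concrete, here is a minimal host-side sketch in TypeScript. It assumes a per-plugin allowlist of bridge methods; the method names getVariable and setVariable, the message shapes, and the handleRequest function are all hypothetical, not the real mp4e bridge API. In a browser, requests like these would typically travel between the iframe and its host via window.postMessage, which is left out here so the sketch stays runnable outside a browser.

```typescript
// Hypothetical message shapes for a plugin-to-host bridge.
interface PluginRequest { pluginId: string; method: string; args: string[]; }
type BridgeReply = { ok: true; value: string } | { ok: false; error: string };

// Assumed permission model: each plugin has an allowlist of bridge methods.
type Permissions = Record<string, string[]>;

function handleRequest(
  perms: Permissions,
  vars: Record<string, string>,
  req: PluginRequest
): BridgeReply {
  const allowed = perms[req.pluginId] || [];
  if (allowed.indexOf(req.method) === -1) {
    return { ok: false, error: `not permitted: ${req.method}` };
  }
  if (req.method === "getVariable" && req.args.length === 1) {
    const key = req.args[0];
    return Object.prototype.hasOwnProperty.call(vars, key)
      ? { ok: true, value: vars[key] }
      : { ok: false, error: `unknown variable: ${key}` };
  }
  if (req.method === "setVariable" && req.args.length === 2) {
    vars[req.args[0]] = req.args[1];
    return { ok: true, value: req.args[1] };
  }
  return { ok: false, error: `bad request: ${req.method}` };
}

const pluginPerms: Permissions = {
  tooltip: ["getVariable"],
  cart: ["getVariable", "setVariable"],
};
const sharedVars: Record<string, string> = { cartCount: "0" };

// The read-only tooltip plugin is refused; the cart plugin may write.
const denied = handleRequest(pluginPerms, sharedVars,
  { pluginId: "tooltip", method: "setVariable", args: ["cartCount", "5"] });
const written = handleRequest(pluginPerms, sharedVars,
  { pluginId: "cart", method: "setVariable", args: ["cartCount", "5"] });
const readBack = handleRequest(pluginPerms, sharedVars,
  { pluginId: "tooltip", method: "getVariable", args: ["cartCount"] });
```

The plugin never receives a reference to the host's state. It can only send a request and receive a reply, so every access passes through the permission check.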

Objects & Groups

Objects represent things detected or defined in the video — products, people, furniture, text, surfaces, or any region of interest. Each object has:

  • Identity: A unique identifier and label (AI-detected or user-defined)
  • Tracking Data: Position (bounding box, polygon, mesh) at each frame
  • Custom Data: Arbitrary fields (price, SKU, description, URL) defined by the group's data schema

Groups organize objects by category (e.g., "Shoppable Products", "People", "Furniture"). Each group defines a data schema (the custom fields its objects have) and display bindings (which plugin to show when a user interacts with an object via hover, click, touch, remote select, and so on).

Example: A group called "Shoppable Products" has a data schema with title, price, and image fields. Its display bindings say: "on hover or focus, show a tooltip with {{object.data.title}}; on click or tap, show a product card." Every object in the group automatically gets these behaviors across all input methods — mouse, touch, keyboard, and remote control.
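
The "Shoppable Products" example can be sketched as data plus two small lookups. The names below (GroupDef, ObjectDef, missingFields, pluginFor, and the plugin IDs "tooltip" and "product-card") are assumptions for illustration; the actual group format is not shown on this page.

```typescript
type Interaction = "hover" | "focus" | "click" | "tap";

interface GroupDef {
  name: string;
  schema: string[];                                // custom field names
  bindings: Partial<Record<Interaction, string>>;  // interaction -> plugin ID
}

interface ObjectDef {
  id: string;
  data: Record<string, string>;
}

// Which schema fields does this object not define yet?
function missingFields(group: GroupDef, obj: ObjectDef): string[] {
  return group.schema.filter(
    (field) => !Object.prototype.hasOwnProperty.call(obj.data, field)
  );
}

// Which plugin should display for this interaction, if any?
function pluginFor(group: GroupDef, interaction: Interaction): string | undefined {
  return group.bindings[interaction];
}

const shoppable: GroupDef = {
  name: "Shoppable Products",
  schema: ["title", "price", "image"],
  bindings: {
    hover: "tooltip",
    focus: "tooltip",
    click: "product-card",
    tap: "product-card",
  },
};

const sneaker: ObjectDef = {
  id: "sneaker",
  data: { title: "Trail Runner", price: "89.00" },
};

const onTap = pluginFor(shoppable, "tap");
const incomplete = missingFields(shoppable, sneaker);
```

Because the bindings live on the group rather than on each object, adding another object to the group gives it the same behavior without any further configuration.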

Variables

Variables store state that persists throughout the video experience. They are the shared memory that connects plugins, rules, and actions together.

  • Basic Types: text, number, boolean, counter
  • Temporal Types: timer (count up/down), date, accumulated values
  • Computed Types: state machines, expressions, mapped values
  • Collection Types: lists, objects, maps, sets, JSON

Variables can be referenced anywhere using {{variableName}} syntax — in plugin content, overlay configs, rule conditions, and action parameters. The engine handles all interpolation, including array indexing ({{items[0]}}), dynamic indexing ({{items[currentIndex]}}), timer decomposition ({{timer.minutes}}), and fallback values ({{name || 'Guest'}}).
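
The following is a minimal sketch of how such interpolation could work, covering the four forms listed above. It is not the engine's implementation (which runs inside the compiled runtime). The function names, the rule that an unresolved reference without a fallback becomes an empty string, and the rule that an empty string triggers the fallback are all assumptions.

```typescript
type Value = string | number | boolean | Value[] | { [key: string]: Value };
type Vars = { [key: string]: Value };

// Resolve a path such as "items[0]", "items[currentIndex]" or "timer.minutes".
function resolvePath(path: string, vars: Vars): Value | undefined {
  const token = /\.?([A-Za-z_]\w*)|\[(\d+)\]|\[([A-Za-z_]\w*)\]/g;
  let current: Value | undefined = vars;
  let m: RegExpExecArray | null;
  while ((m = token.exec(path)) !== null) {
    let key: string | number;
    if (m[1] !== undefined) {
      key = m[1];                 // plain or dotted name
    } else if (m[2] !== undefined) {
      key = Number(m[2]);         // literal index, e.g. [0]
    } else {
      const idx = vars[m[3]];     // dynamic index, e.g. [currentIndex]
      if (typeof idx !== "number" && typeof idx !== "string") return undefined;
      key = idx;
    }
    if (typeof current !== "object") return undefined;
    current = Array.isArray(current) ? current[Number(key)] : current[String(key)];
  }
  return current;
}

// Replace every {{expression}} in the template.
function interpolate(template: string, vars: Vars): string {
  return template.replace(/\{\{\s*(.+?)\s*\}\}/g, (_match: string, expr: string): string => {
    const parts = expr.split("||");
    const value = resolvePath(parts[0].trim(), vars);
    if (value !== undefined && value !== "") return String(value);
    // Fallback: a quoted literal after "||", e.g. {{userName || 'Guest'}}.
    const fallback = parts.length > 1 ? parts[1].trim() : "";
    return fallback.replace(/^['"]|['"]$/g, "");
  });
}

const sampleVars: Vars = {
  items: ["a", "b", "c"],
  currentIndex: 2,
  timer: { minutes: 4, seconds: 5 },
};
const rendered = interpolate(
  "{{items[0]}}/{{items[currentIndex]}} at {{timer.minutes}}m, hi {{userName || 'Guest'}}",
  sampleVars
);
```

Here rendered is "a/c at 4m, hi Guest": the literal index picks the first item, the dynamic index reads currentIndex first, the timer is accessed by field, and the missing userName falls back to the quoted literal.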

Variable Change Propagation
When a variable changes, the engine automatically re-interpolates all content that references it and pushes updates to affected plugins. It also evaluates any onChange rules attached to that variable, which can trigger further actions.
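
Change propagation of this kind resembles a small observable store. The sketch below is an assumption-laden illustration rather than the engine's code: VariableStore, its methods, and the restriction to numeric values are all hypothetical. It shows one listener re-rendering content and another acting like an onChange rule that sets a second variable, which in turn notifies its own listener.

```typescript
type Listener = (value: number) => void;

class VariableStore {
  private values: Record<string, number> = {};
  private listeners: Record<string, Listener[]> = {};

  onChange(key: string, fn: Listener): void {
    (this.listeners[key] = this.listeners[key] || []).push(fn);
  }

  get(key: string): number {
    return this.values[key] || 0;
  }

  set(key: string, value: number): void {
    if (this.values[key] === value) return; // unchanged: nothing to propagate
    this.values[key] = value;
    for (const fn of this.listeners[key] || []) fn(value);
  }
}

const store = new VariableStore();
const updates: string[] = [];

// Stand-in for re-interpolating plugin content that references cartCount.
store.onChange("cartCount", (v) => updates.push(`Cart: ${v}`));
// Stand-in for an onChange rule that sets another variable.
store.onChange("cartCount", (v) => store.set("badgeVisible", v > 0 ? 1 : 0));
store.onChange("badgeVisible", (v) => updates.push(`badge:${v}`));

store.set("cartCount", 2);
store.set("cartCount", 2); // same value again, so no further updates
```

A real implementation would also need a guard against rules that keep triggering each other indefinitely. That is omitted here to keep the sketch short.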

Rules & Actions

Rules define interactive behaviors using a trigger → condition → action pattern. They are the primary way to make videos respond to user interaction.

  • Trigger (When does this fire?): object interaction, scene enter, plugin event, variable change
  • Conditions (Should it execute?): variable comparisons, time ranges, event data checks
  • Actions (What happens?): set variable, show overlay, seek, navigate, plugin action

MP4E supports 69 action types spanning playback control, navigation, UI changes, variable operations, layer management, plugin communication, and more. Actions compose into reactive chains: a rule can set a variable, which triggers an onChange rule, which evaluates conditions and fires more actions, so a single user interaction can cascade through several rules.

Rules can be attached at multiple levels: project-wide (global), per-scene, per-group, per-overlay, or triggered by plugin events. The engine evaluates all matching rules in priority order and executes their actions — all within the sandboxed runtime.
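
The trigger, condition, action pattern and priority ordering can be sketched as follows. Everything here is illustrative: RuleDef, fire, the trigger name "object.click", and the convention that a higher priority number runs first are assumptions, not MP4E's rule format.

```typescript
type State = Record<string, number>;

interface RuleDef {
  trigger: string;                         // e.g. "object.click" (hypothetical name)
  priority: number;                        // assumed: higher runs first
  condition: (s: State) => boolean;
  actions: Array<(s: State) => void>;
}

// Evaluate all rules for a trigger in priority order and run the actions of
// those whose condition holds. Returns how many rules executed.
function fire(rules: RuleDef[], trigger: string, state: State): number {
  const matching = rules
    .filter((r) => r.trigger === trigger)
    .sort((a, b) => b.priority - a.priority);
  let executed = 0;
  for (const rule of matching) {
    if (!rule.condition(state)) continue;
    for (const act of rule.actions) act(state);
    executed++;
  }
  return executed;
}

const sampleRules: RuleDef[] = [
  {
    // Show an offer once the viewer has clicked at least three times.
    trigger: "object.click",
    priority: 1,
    condition: (s) => (s["clicks"] || 0) >= 3,
    actions: [(s) => { s["showOffer"] = 1; }],
  },
  {
    // Always count the click; the higher priority makes this run first.
    trigger: "object.click",
    priority: 10,
    condition: () => true,
    actions: [(s) => { s["clicks"] = (s["clicks"] || 0) + 1; }],
  },
];

const ruleState: State = { clicks: 2 };
const executedCount = fire(sampleRules, "object.click", ruleState);
```

In this sketch each condition is checked just before its rule runs, so the counting rule's update is visible to the offer rule within the same interaction. Whether the engine evaluates conditions this way or up front is not specified on this page.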

Scenes

Scenes are time-based markers that divide the video into logical segments. Each scene has a time range and fires lifecycle events when playback crosses its boundaries: onEnter, onExit, onStart, onEnd, and onLoop. These events can trigger rules and actions — showing overlays, setting variables, changing layers, or anything else the rules engine supports.

Example: An e-commerce video has three scenes: "Product Showcase" (0:00–0:15), "Feature Comparison" (0:15–0:35), and "Call to Action" (0:35–0:45). Entering "Call to Action" fires an onEnter event that triggers a rule to show a purchase button overlay.

Use goToScene actions to create non-linear video flows, branching narratives, or conditional story paths.
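
Detecting boundary crossings can be sketched by comparing the scene that contains the previous playback time with the scene that contains the new one. The sketch covers only onEnter and onExit, since the exact semantics of onStart, onEnd, and onLoop are not described here. SceneDef, sceneAt, sceneEvents, and the half-open time ranges are assumptions for illustration.

```typescript
interface SceneDef {
  id: string;
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive
}

// Find the scene containing time t, if any.
function sceneAt(scenes: SceneDef[], t: number): SceneDef | undefined {
  for (const s of scenes) {
    if (t >= s.start && t < s.end) return s;
  }
  return undefined;
}

// Report the boundary events caused by moving from prevTime to nextTime.
function sceneEvents(scenes: SceneDef[], prevTime: number, nextTime: number): string[] {
  const before = sceneAt(scenes, prevTime);
  const after = sceneAt(scenes, nextTime);
  const events: string[] = [];
  if (before && (!after || after.id !== before.id)) events.push(`onExit:${before.id}`);
  if (after && (!before || before.id !== after.id)) events.push(`onEnter:${after.id}`);
  return events;
}

// The three scenes from the e-commerce example above.
const sampleScenes: SceneDef[] = [
  { id: "showcase", start: 0, end: 15 },
  { id: "comparison", start: 15, end: 35 },
  { id: "cta", start: 35, end: 45 },
];

const crossing = sceneEvents(sampleScenes, 34.9, 35.1);
const withinScene = sceneEvents(sampleScenes, 20, 21);
```

Because the function only compares two time points, a seek or a goToScene jump would produce the same exit and enter events as normal playback crossing the boundary.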

How It Fits Together

All these concepts connect in a straightforward way:

1. Layers contain overlays. Scenes mark time segments and fire events when playback enters or exits them.

2. Overlays render plugins: the interactive content the viewer sees and interacts with.

3. Overlays can stand alone, or optionally bind to objects and groups to follow tracked objects in the video.

4. User interactions (clicks, taps, hovers, remote select, keyboard focus), plugin events, scene transitions, and variable changes fire rules. Rules check conditions and execute actions.

5. Actions modify variables, control playback, show/hide overlays, navigate between scenes, or communicate with plugins, which can trigger more rules, creating reactive chains.

6. The engine orchestrates everything (evaluating rules, tracking state, interpolating variables, enforcing permissions) while the bridge renders the result on whatever platform the video plays on.

Self-Contained
All of this (objects, layers, overlays, plugins, variables, rules, scenes) is embedded directly in the MP4 file. The video carries its entire interactive application with it, so no external dependencies are required.
