Camera bots

A kid points the iPad at the world. Your skill scores or rewards what the camera sees.

Anatomy of a camera bot

The canvas owns the camera. Your agent owns the model. Sprout owns the gem.

Canvas - a <video> element plus getUserMedia() opens the camera; the canvas captures a frame and sends it base64-encoded to the host with SproutBridge.postMessage (or hands it to your skill body via completionSchema).
Skill body - calls your vision model with the frame and the rubric ("does this plate include a vegetable?").
Tool calls - on a pass, call gems_adjust with the parent's verbatim reason. On a miss, call sprout.signal('hint-requested') and let the kid try again.
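The tool-call step reduces to a small routing decision. The sketch below is illustrative only: `routeGrade`, the grade shape `{ pass, reason }`, and the returned action objects are hypothetical; only `gems_adjust`, its `childId`/`delta`/`reason` fields, and the `'hint-requested'` signal come from this page.

```javascript
// Hypothetical helper: maps a vision grade to the skill's next step.
// Assumed grade shape: { pass: boolean, reason?: string }.
function routeGrade(grade, childId, delta) {
  if (grade && grade.pass === true) {
    if (!grade.reason) {
      // Parents read the reason later in the audit trail, so refuse to award without one.
      throw new Error('a pass needs a human-readable reason for the audit trail');
    }
    return {
      kind: 'tool',
      tool: 'gems_adjust',
      args: { childId, delta, reason: grade.reason },
    };
  }
  // Anything that is not an explicit pass (a miss, a malformed grade) is treated as a miss.
  return { kind: 'signal', signal: 'hint-requested' };
}
```

Treating anything other than an explicit `pass: true` as a miss means a malformed or empty model response never awards gems.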

Four bot patterns

Nutrition bot

Point at a meal; earn a gem if there are vegetables on the plate. The canvas captures a frame, your agent runs a vision rubric, and on a pass the skill calls gems_adjust with reason: "lunch had a vegetable". Later, when parents wonder why Jay's balance jumped, the audit trail shows them.
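If your vision model returns a list of detected food labels (an assumption; many models return free text instead), the rubric itself can be a small, testable check. The label vocabulary, the detection shape, and `gradePlate` below are hypothetical.

```javascript
// Hypothetical vocabulary; tune it to whatever labels your model actually emits.
const VEGETABLES = new Set(['broccoli', 'carrot', 'peas', 'spinach', 'lettuce', 'cucumber', 'green beans']);

// Assumed model output: [{ label: 'carrot', confidence: 0.91 }, ...]
function gradePlate(detections, minConfidence = 0.6) {
  const found = detections
    .filter((d) => d.confidence >= minConfidence) // ignore low-confidence guesses
    .map((d) => d.label.trim().toLowerCase())     // normalize before lookup
    .filter((label) => VEGETABLES.has(label));
  return found.length > 0
    ? { pass: true, reason: 'lunch had a vegetable', found }
    : { pass: false, found };
}
```

The confidence threshold (0.6 here, an arbitrary default) keeps a low-confidence guess from earning a gem; the returned `reason` is the string the parent will later see in the audit trail.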

Take-medicine bot

Verify the pill was taken before screen time unlocks. The canvas walks the kid through showing the pill, taking it, and showing the empty hand. Your agent grades each frame, and the skill posts a screentime_review_request for approval only if the whole sequence passes. A failed or refused check means no request: never auto-approve.
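The "only on a successful sequence" rule is easy to get subtly wrong (a skipped step, steps out of order). A minimal sketch, assuming your agent returns one `{ step, pass }` grade per captured frame; the step names and `sequencePassed` are hypothetical:

```javascript
// The three steps the canvas walks the kid through, in order (names are illustrative).
const MEDICINE_STEPS = ['show-pill', 'take-pill', 'show-empty-hand'];

// Assumed per-frame grade shape: { step: string, pass: boolean }.
// True only if every expected step passed, in order, with nothing missing or extra.
function sequencePassed(grades, steps = MEDICINE_STEPS) {
  if (!Array.isArray(grades) || grades.length !== steps.length) return false;
  return steps.every((step, i) => grades[i].step === step && grades[i].pass === true);
}
```

The skill would post the screentime_review_request only when this returns `true`; every other outcome posts nothing.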

Soccer drill bot

Form check on a free kick. The canvas records a short clip; your agent runs a pose model and returns a critique ("plant foot too close, try again"). Call sprout.signal('attempt-successful') on the third clean rep, and celebrate a streak.
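One reading of "third clean rep" is three clean reps in a row, with a miss resetting the count. Under that assumption, a hypothetical tracker (`createRepTracker` is not a Sprout API) looks like this:

```javascript
// Hypothetical rep tracker for the free-kick drill.
// Assumes the target means consecutive clean reps; any miss resets the streak.
function createRepTracker(target = 3) {
  let streak = 0;
  return {
    // Records one graded rep; returns the signal to send, or null if nothing should fire.
    record(clean) {
      if (!clean) {
        streak = 0;
        return null;
      }
      streak += 1;
      return streak === target ? 'attempt-successful' : null;
    },
    get streak() {
      return streak;
    },
  };
}
```

The signal fires once per streak, on the rep that reaches the target; if you want it on every third rep instead, reset `streak` after firing.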

Yoga pose bot

Hold the pose, hold the gem. The canvas opens the camera, your agent samples a frame every second, and the gem count ticks up while the kid holds the pose. Call sprout.timed when the hold completes. This works for piano scales, planks, and breathing exercises: anything that rewards duration.
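The hold logic can be written against the per-second samples your agent produces. A sketch, assuming one boolean per sample (`true` while the model sees the pose); `scoreHold` is hypothetical, and a real canvas would feed it from a timer:

```javascript
// Hypothetical: one boolean per 1-second sample, true while the pose is held.
// Counts seconds in the current unbroken hold; any break restarts the count.
function scoreHold(samples, targetSeconds) {
  let held = 0;
  for (const inPose of samples) {
    held = inPose ? held + 1 : 0;
    if (held >= targetSeconds) {
      // This is the point where the skill would call sprout.timed.
      return { completed: true, heldSeconds: held };
    }
  }
  return { completed: false, heldSeconds: held };
}
```

Strict reset on any missed sample is the simplest rule; a noisy pose model may warrant a one-sample grace period before the count restarts.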

What ships today

A canvas can open the camera. The CSP allows <video> and getUserMedia inside the iframe; the canvas can capture frames, encode them, and post them to the host via SproutBridge.postMessage. From there your skill body (running on your agent) calls any vision model you like and decides what to do with the result.
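`toDataURL` returns the frame as a `data:` URL, while many vision APIs expect the bare base64 payload and a separate MIME type. A small helper (hypothetical, not part of Sprout) that splits the two and reports the decoded size, so you can check it before posting the frame across the bridge:

```javascript
// Splits "data:image/jpeg;base64,/9j/..." into its MIME type and raw base64 payload.
// Check what your vision API expects; some accept the full data URL as-is.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error('expected a base64 data URL');
  const [, mimeType, base64] = match;
  // Every 4 base64 characters encode 3 bytes; subtract the padding to get the decoded size.
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  const bytes = (base64.length / 4) * 3 - padding;
  return { mimeType, base64, bytes };
}
```

An empty capture (a 0x0 canvas yields `"data:,"`) fails the pattern and throws, which is usually what you want before calling a model.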

The pattern works end-to-end if you wire up your own vision step. The kid app handles the camera; you handle the model; Sprout handles the gem.

Roadmap

First-class vision tools on the partner MCP (a hand the skill declares and the platform runs, with no model wiring needed) are planned but not yet shipped. When they land, the same bots get simpler: drop your bespoke vision call, declare vision:read at authoring time, and get back a structured grade.

Try it

HTML

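<!-- Inside the canvas HTML -->
<video id="cam" autoplay playsinline muted></video>
<button class="btn btn-primary btn-lg" onclick="check()">Check my plate</button>
<script>
// Open the camera. facingMode 'environment' asks for the rear camera,
// so the kid can point the iPad at the plate.
async function init() {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { facingMode: 'environment' },
      audio: false
    });
    document.getElementById('cam').srcObject = stream;
  } catch (err) {
    // Permission denied or no camera available: log it instead of failing silently.
    console.error('Camera unavailable:', err);
  }
}

// Capture the current frame as a JPEG data URL and hand it off.
function check() {
  const video = document.getElementById('cam');
  if (!video.videoWidth) return; // camera not ready yet (no frame size known)
  const c = document.createElement('canvas');
  c.width = video.videoWidth;
  c.height = video.videoHeight;
  c.getContext('2d').drawImage(video, 0, 0);
  const dataUrl = c.toDataURL('image/jpeg', 0.8);
  // Hand the frame to your skill (via completion or bridge)
  sprout.complete({frame: dataUrl});
  window.SproutBridge.postMessage(JSON.stringify({
    type: 'scored', score: 1, total: 1
  }));
}
init();
</script>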

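Pseudocode (skill body)

// 1. Read the completion result from task_review or your bridge listener
// 2. Call your vision model with the frame + rubric
// 3. On pass, call the gems_adjust tool:
gems_adjust({
  childId: "<kid>",
  delta: 3,
  reason: "lunch had a vegetable (auto-checked)"
})
// On a miss, award nothing and call sprout.signal('hint-requested') instead.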