Camera bots
Vision-grounded coaching. Kid points the iPad at the world. Your skill scores or rewards what it sees.
Anatomy of a camera bot
The canvas owns the camera. Your agent owns the model. Sprout owns the gem.
- Canvas: <video> + getUserMedia(), capture a frame, SproutBridge.postMessage the base64 to the host (or hand it to your skill body via completionSchema).
- Skill body: call your vision model with the frame + the rubric ("does this plate include a vegetable?").
- Tool calls: on a pass, gems_adjust with the parent's verbatim reason. On a miss, sprout.signal('hint-requested') and let the kid try again.
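One practical detail in that hand-off: canvas.toDataURL() produces a data URL ("data:image/jpeg;base64,..."), not bare base64, and most vision APIs want the MIME type and the payload separately. A minimal sketch, assuming a hypothetical helper named dataUrlToBase64 in your skill body:

```javascript
// Hypothetical helper: split a canvas data URL ("data:image/jpeg;base64,...")
// into the MIME type and the raw base64 payload a vision API usually expects.
function dataUrlToBase64(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    // A zero-sized canvas (video not ready yet) yields "data:," and lands here.
    throw new Error('Expected a base64 data URL, e.g. from canvas.toDataURL()');
  }
  return { mimeType: match[1], base64: match[2] };
}

// Example with a tiny fake payload ("ABC" encoded as base64).
const { mimeType, base64 } = dataUrlToBase64('data:image/jpeg;base64,QUJD');
console.log(mimeType, base64); // image/jpeg QUJD
```

Rejecting "data:," early keeps an empty frame from ever reaching the model and being graded as a miss.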
Four bot patterns
Nutrition bot
Point at a meal, gem if veggies present. Canvas captures a frame; your agent runs a vision rubric; on pass, gems_adjust with reason: "lunch had a vegetable". Parents see the audit trail later when they wonder why Jay's balance jumped.
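The rubric is a yes/no question, but a vision model answers in free text. A minimal sketch of turning that reply into pass/fail, assuming a hypothetical prompt string and a helper named rubricPassed; the choice to count only an unambiguous "yes" is an assumption that errs toward not minting gems:

```javascript
// Hypothetical rubric prompt sent alongside the frame.
const RUBRIC = 'Does this plate include a vegetable? Answer only "yes" or "no".';

// Only an unambiguous "yes" counts as a pass. Anything else (no, maybe,
// a hedged sentence, an empty reply) is treated as a miss, so an unclear
// model answer never mints gems.
function rubricPassed(modelReply) {
  const answer = String(modelReply ?? '')
    .trim()
    .toLowerCase()
    .replace(/[.!]+$/, ''); // accept "Yes." or "yes!"
  return answer === 'yes';
}

console.log(rubricPassed('Yes.'));                // true
console.log(rubricPassed('Maybe, hard to tell')); // false
```

A miss here routes to the hint path rather than the gem path, so the kid gets another try instead of a wrong reward.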
Take-medicine bot
Verify the pill was taken before screen-time unlocks. The canvas walks the kid through showing the pill, taking it, and showing the empty hand. Your agent grades each frame; the skill posts a screentime_review_request approval only on a successful sequence. Refusal is a refusal: never auto-approve.
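"Successful sequence" needs a precise definition. A minimal sketch, assuming hypothetical step names and that each graded frame arrives as { step, passed }. It tolerates retries (a blurry frame can be re-shot) but only counts steps passed in order. A true result means "post the review request"; it approves nothing by itself:

```javascript
// Hypothetical step names, in the order the canvas walks the kid through them.
const STEPS = ['show-pill', 'take-pill', 'show-empty-hand'];

// gradedFrames: [{ step, passed }] in capture order.
// Returns true only if every step passed, in order. Failed attempts and
// out-of-order passes are ignored, so retries are allowed but skips are not.
function sequenceComplete(gradedFrames) {
  let next = 0; // index of the step we are still waiting for
  for (const { step, passed } of gradedFrames) {
    if (next < STEPS.length && step === STEPS[next] && passed) next += 1;
  }
  return next === STEPS.length;
}

const frames = [
  { step: 'show-pill', passed: true },
  { step: 'take-pill', passed: false }, // blurry frame, kid retried
  { step: 'take-pill', passed: true },
  { step: 'show-empty-hand', passed: true },
];
console.log(sequenceComplete(frames)); // true
```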
Soccer drill bot
Form check on a free kick. Canvas records a short clip, your agent runs a pose model and returns a critique ("plant foot too close, try again"). Call sprout.signal('attempt-successful') on the third clean rep, and celebrate a streak.
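The rep bookkeeping can live in the skill. A minimal sketch, assuming your pose rubric returns a boolean "clean" per clip and a hypothetical RepTracker class; it fires the signal once, on the third clean rep overall, and exposes the current consecutive streak so you can pick your own celebration threshold:

```javascript
// Hypothetical tracker: counts clean reps and reports which signal, if any,
// to send after each rep.
class RepTracker {
  constructor(cleanRepsNeeded = 3) {
    this.cleanRepsNeeded = cleanRepsNeeded;
    this.cleanReps = 0; // total clean reps this session
    this.streak = 0;    // consecutive clean reps
  }

  // Returns the signal name to send for this rep, or null for none.
  record(clean) {
    if (!clean) {
      this.streak = 0; // a sloppy rep breaks the streak but keeps the total
      return null;
    }
    this.cleanReps += 1;
    this.streak += 1;
    return this.cleanReps === this.cleanRepsNeeded ? 'attempt-successful' : null;
  }
}

const tracker = new RepTracker();
const results = [true, false, true, true].map((clean) => tracker.record(clean));
console.log(results);        // [ null, null, null, 'attempt-successful' ]
console.log(tracker.streak); // 2
```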
Yoga pose bot
Hold the pose, hold the gem. Canvas opens the camera, your agent samples frames every second, the gem ticks up while the kid holds. sprout.timed on completion. Works for piano scales, plank, breathing: anything that rewards duration.
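With one sample per second, "the gem ticks up while the kid holds" reduces to counting runs of held seconds. A minimal sketch, assuming a hypothetical tallyHold function and a hypothetical secondsPerGem rate (5 here); breaking the pose restarts the current run:

```javascript
// samples: one boolean per second, true if the pose rubric says "still holding".
// A gem accrues for every full `secondsPerGem` of continuous hold.
function tallyHold(samples, secondsPerGem = 5) {
  let run = 0;     // seconds held in the current run
  let gems = 0;
  let longest = 0; // longest continuous hold, in seconds
  for (const holding of samples) {
    if (holding) {
      run += 1;
      longest = Math.max(longest, run);
      if (run % secondsPerGem === 0) gems += 1; // tick the gem up mid-hold
    } else {
      run = 0; // pose broken: start counting again
    }
  }
  return { gems, longestHoldSeconds: longest };
}

// A 6 s hold, a wobble, then a 10 s hold.
const samples = [...Array(6).fill(true), false, ...Array(10).fill(true)];
console.log(tallyHold(samples)); // { gems: 3, longestHoldSeconds: 10 }
```

The same counter works unchanged for plank or breathing; only the per-second rubric changes.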
What ships today
A canvas can open the camera. The CSP allows <video> and getUserMedia inside the iframe; the canvas can capture frames, encode them, and post them to the host via SproutBridge.postMessage. From there your skill body (running on your agent) calls any vision model you like and decides what to do with the result.
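A minimal sketch of that skill body for the plate check. Every name here (handleFrame, gradeFrame, gemsAdjust, signal) is hypothetical: the vision call and the tool-call client are injected, because which model you use and how your agent invokes gems_adjust are up to you.

```javascript
// Injected dependencies (all hypothetical):
//   gradeFrame(base64, rubric) -> Promise<boolean>  your vision model call
//   gemsAdjust({ childId, delta, reason })          your gems_adjust tool call
//   signal(name)                                    e.g. 'hint-requested'
async function handleFrame({ childId, frame }, { gradeFrame, gemsAdjust, signal }) {
  // The canvas sends a data URL; keep only the base64 after the comma.
  const base64 = frame.slice(frame.indexOf(',') + 1);
  const passed = await gradeFrame(base64, 'Does this plate include a vegetable?');
  if (passed) {
    // The reason is what a parent sees later in the audit trail.
    await gemsAdjust({ childId, delta: 3, reason: 'lunch had a vegetable (auto-checked)' });
    return 'rewarded';
  }
  await signal('hint-requested'); // no gem; let the kid try again
  return 'hint';
}

// Usage with stand-in dependencies that just record what was called.
const calls = [];
handleFrame(
  { childId: 'kid-1', frame: 'data:image/jpeg;base64,QUJD' },
  {
    gradeFrame: async () => true,
    gemsAdjust: async (args) => { calls.push(['gems_adjust', args.delta]); },
    signal: async (name) => { calls.push(['signal', name]); },
  },
).then((outcome) => console.log(outcome, calls)); // rewarded [ [ 'gems_adjust', 3 ] ]
```

Injecting the dependencies keeps the pass/miss logic testable with fakes, and swapping vision providers touches only gradeFrame.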
Roadmap
First-class vision tools on the partner MCP (tools the skill declares and the platform runs, with no model wiring needed) are planned but not yet shipped. When they land, the same bots get simpler: drop your bespoke vision call, declare vision:read at authoring time, and get back a structured grade.
Try it
<!-- Inside the canvas HTML -->
<video id="cam" autoplay playsinline muted></video>
<button class="btn btn-primary btn-lg" onclick="check()">Check my plate</button>
<script>
// Open the rear camera (facingMode 'environment') and show the live feed.
async function init() {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { facingMode: 'environment' }
    });
    document.getElementById('cam').srcObject = stream;
  } catch (err) {
    // Permission denied or no camera. Log it; a real canvas should show a fallback.
    console.error('Camera unavailable:', err);
  }
}

// Capture the current frame as a JPEG data URL and hand it to your skill.
function check() {
  const video = document.getElementById('cam');
  if (!video.videoWidth) return; // stream not ready yet; nothing to capture
  const c = document.createElement('canvas');
  c.width = video.videoWidth;
  c.height = video.videoHeight;
  c.getContext('2d').drawImage(video, 0, 0);
  const dataUrl = c.toDataURL('image/jpeg', 0.8);
  // Hand the frame to your skill (via completion or bridge)
  sprout.complete({frame: dataUrl});
  window.SproutBridge.postMessage(JSON.stringify({
    type: 'scored', score: 1, total: 1
  }));
}

init();
</script>
# 1. Read completion result from task_review or your bridge listener
# 2. Call your vision model with the frame + rubric
# 3. On pass:
gems_adjust({
childId: "<kid>",
delta: 3,
reason: "lunch had a vegetable (auto-checked)"
})
# 4. On a miss: sprout.signal('hint-requested') and let the kid try again
Further reading
- Canvas guts
- Author a canvas
- Tip gems
- Sensitivity tiers (composite safety)