Voice + vision.

Multi-modal in, structured out. Talk to @Atlas, drop a screenshot, or paste a Shopify admin URL. The copilot resolves the surface and answers with structured tool calls.

What it is

Voice + vision is the input layer of the BoostEcom workspace. It turns spoken instructions and visual references into the same structured context the rest of the system uses, so the model behind @Atlas sees the same shape regardless of how you supplied the data.
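
A minimal sketch of what "the same shape" could look like, in Python. The `ContextItem` class, its field names, and the three helper functions are hypothetical and only illustrate the idea; the actual BoostEcom schema is not described here, and the URL in the usage note is an example.

```python
from dataclasses import dataclass, field
from typing import Any
from urllib.parse import urlparse


@dataclass
class ContextItem:
    """Hypothetical common shape shared by every input type."""
    source: str  # "voice", "screenshot", or "admin_url"
    text: str    # transcript, extracted text, or the raw URL
    metadata: dict[str, Any] = field(default_factory=dict)


def from_voice(transcript: str, mode: str = "push_to_talk") -> ContextItem:
    # Assumes the transcript was already produced upstream.
    return ContextItem(source="voice", text=transcript.strip(), metadata={"mode": mode})


def from_screenshot(filename: str, extracted_text: str) -> ContextItem:
    # Assumes text extraction from the image already happened upstream.
    return ContextItem(
        source="screenshot",
        text=extracted_text.strip(),
        metadata={"filename": filename},
    )


def from_admin_url(url: str) -> ContextItem:
    parsed = urlparse(url)
    # Non-empty path segments hint at which admin surface the URL points to.
    segments = [s for s in parsed.path.split("/") if s]
    return ContextItem(
        source="admin_url",
        text=url,
        metadata={"host": parsed.netloc, "path_segments": segments},
    )
```

Whichever helper runs, downstream code receives a `ContextItem`, for example `from_admin_url("https://admin.shopify.com/store/demo/products/123")` or `from_voice("summarise yesterday's drop")`.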

How it works

Voice turn detection via Hume.ai, in push-to-talk and continuous modes.
Vision via uploaded screenshots, drag-and-drop, or admin URL resolution.
Output is always structured tool-call JSON so workflows can consume it.
Same model surface: voice and vision route through AI Gateway, just like text.
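
As a rough illustration of how a workflow might consume structured tool-call JSON, here is a small Python sketch. The payload layout (`tool` and `arguments` keys), the `summarise_orders` tool name, and the `HANDLERS` registry are assumptions made for the example, not the documented @Atlas output format.

```python
import json

# Hypothetical payload; the real tool-call schema may differ.
RAW = '{"tool": "summarise_orders", "arguments": {"date_range": "yesterday"}}'


def parse_tool_call(payload: str) -> tuple[str, dict]:
    """Parse a tool call and check the minimal shape this sketch assumes."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("tool call must be a JSON object")
    tool = data.get("tool")
    args = data.get("arguments", {})
    if not isinstance(tool, str) or not tool:
        raise ValueError("missing tool name")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return tool, args


# Hypothetical registry mapping tool names to workflow steps.
HANDLERS = {
    "summarise_orders": lambda args: f"Summarising orders for {args['date_range']}",
}


def dispatch(payload: str) -> str:
    tool, args = parse_tool_call(payload)
    handler = HANDLERS.get(tool)
    if handler is None:
        raise KeyError(f"no handler registered for tool {tool!r}")
    return handler(args)


print(dispatch(RAW))
```

Because the output is always JSON, the workflow side never needs to know whether the request started as speech, a screenshot, or a URL.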

When to use it

Use voice for hands-free requests such as "while I drive home, summarise yesterday's drop", or for quick brainstorms. Use vision for requests such as "this product page looks off, fix the spacing", or when pasting a third-party site to extract layout intent.

Limitations

Voice latency depends on your network. Hume.ai is global, but mobile networks vary.
Vision OCR accuracy degrades on low-resolution or heavily compressed images.
Real-time voice is metered separately from text credits.

Bring your own context.

Talk, paste, drop. @Atlas makes sense of it.
