Connect your Android or iOS device and let AI agents operate any mobile app through the native accessibility layer. No APIs, no screenshots, no root. Real apps, real actions, real results.
Pilot reads the native UI tree of any app — the same structure screen readers use — and operates it with surgical precision.
Other tools take screenshots and guess where to click. Pilot reads the actual accessibility tree — the same semantic structure that screen readers use. Every button, every list item, every text field is a structured node with role, label, and coordinates. No guessing. No pixel matching. No hallucinated clicks.
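To make "structured node" concrete, here is a minimal sketch of what working against a semantic tree looks like. The field names (`role`, `label`, `bounds`) and the `Node` shape are illustrative assumptions, not Pilot's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Illustrative accessibility node -- field names are assumptions,
    # not Pilot's real schema.
    role: str                       # e.g. "button", "text_field"
    label: str                      # the accessible name a screen reader announces
    bounds: tuple                   # (left, top, right, bottom) in screen pixels

def find_by_label(nodes, role, label):
    """Exact match on role + label -- no pixel matching, no guessing."""
    return next((n for n in nodes if n.role == role and n.label == label), None)

tree = [
    Node("button", "Pay now", (40, 900, 680, 980)),
    Node("text_field", "Amount", (40, 700, 680, 780)),
]

pay = find_by_label(tree, "button", "Pay now")
# Tap target comes straight from the node's bounds, not from a screenshot.
center = ((pay.bounds[0] + pay.bounds[2]) // 2,
          (pay.bounds[1] + pay.bounds[3]) // 2)
```

Because every node carries its own coordinates, the tap point is derived, never estimated.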
iFood, Uber, WhatsApp, banking apps, government services — if it runs on your phone, Pilot can operate it. No APIs to configure. No OAuth flows. No developer access. The app doesn't know it's being automated. It just works.
Each tool does exactly one thing well: `press` finds a button by text and taps it; `toggle` sets a switch to the desired state idempotently; `fill_form` fills multiple fields at once; `wait_for` polls until a condition is met. The AI agent orchestrates them — you just describe the goal.
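The composition idea can be sketched against a toy in-memory screen. The tool names come from the text above; the screen model and implementations below are illustrative stand-ins, not Pilot's code:

```python
# Toy screen state standing in for a real device -- purely illustrative.
screen = {
    "fields": {"Email": ""},
    "switches": {"Notifications": False},
    "buttons": ["Save"],
    "pressed": [],
}

def press(text):
    """Find a button by text and tap it."""
    assert text in screen["buttons"], f"no button labeled {text!r}"
    screen["pressed"].append(text)

def toggle(label, desired):
    """Set a switch to the desired state (idempotent by design)."""
    screen["switches"][label] = desired

def fill_form(values):
    """Fill several fields in one call."""
    for label, value in values.items():
        screen["fields"][label] = value

# The agent sequences single-purpose tools toward the stated goal:
fill_form({"Email": "ana@example.com"})
toggle("Notifications", True)
press("Save")
```

Each call is small and verifiable on its own, which is what lets an agent recover mid-task instead of replaying a monolithic script.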
Connect your Android or iOS device over your local network. Push-based routing — no polling, no port forwarding required.
Reads the native UI tree via Android AccessibilityService. Every node is a structured object with role, text, bounds, and actions.
Find the search field, type the query, submit, and return results — all in a single tool call. Works across any app.
Pass a dictionary of label-value pairs and fill multiple form fields at once. Labels resolved by hint text, accessibility label, or position.
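The resolution order above (hint text, then accessibility label, then position) can be sketched as a simple lookup. The field dictionaries and key names here are assumptions for illustration:

```python
def resolve_field(fields, label, position=None):
    """Resolve a form field by hint text first, then accessibility
    label, then position. Field dicts are illustrative, not Pilot's."""
    for f in fields:
        if f.get("hint") == label:
            return f
    for f in fields:
        if f.get("a11y_label") == label:
            return f
    if position is not None and 0 <= position < len(fields):
        return fields[position]
    return None

fields = [
    {"hint": "Email", "a11y_label": "email_input"},
    {"hint": None,    "a11y_label": "Password"},
    {"hint": None,    "a11y_label": None},       # unlabeled -- position only
]
```

Falling back through progressively weaker signals is what lets one `fill_form` call work across apps with inconsistent labeling.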
Set a checkbox or switch to the desired state. Calling it twice does the same thing. No accidental toggles.
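The idempotence guarantee boils down to "read state, tap only on mismatch." A minimal sketch (the switch model and `tap` are hypothetical):

```python
taps = []  # record of physical taps, to show the second call is a no-op

def tap(switch):
    """Simulated physical tap: flips the switch."""
    switch["on"] = not switch["on"]
    taps.append(switch["label"])

def toggle(switch, desired):
    """Only tap when current state differs from the desired state,
    so calling twice with the same argument never flips back."""
    if switch["on"] != desired:
        tap(switch)

wifi = {"label": "Wi-Fi", "on": False}
toggle(wifi, True)   # state differs -> one tap
toggle(wifi, True)   # state matches -> no tap, no accidental flip
```

A blind "tap the switch" tool would leave the device in the opposite state on retries; setting a target state makes retries safe.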
Detects and closes overlays, popups, and dialogs automatically. Finds the dismiss button in the topmost window layer.
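One plausible way to find the dismiss control in the topmost layer is to scan windows in descending layer order. The label set and window shape below are assumptions, not Pilot's heuristics:

```python
DISMISS_LABELS = {"Close", "Dismiss", "Not now", "X"}  # illustrative set

def find_dismiss(windows):
    """Search windows from topmost layer down for a dismiss-style button,
    so a popup's Close wins over the app's own Close underneath it."""
    for win in sorted(windows, key=lambda w: w["layer"], reverse=True):
        for node in win["nodes"]:
            if node["role"] == "button" and node["label"] in DISMISS_LABELS:
                return win["layer"], node
    return None

windows = [
    {"layer": 0, "nodes": [{"role": "button", "label": "Close"}]},    # the app
    {"layer": 2, "nodes": [{"role": "button", "label": "Not now"}]},  # popup on top
]
```

Sorting by layer first is the key detail: it guarantees the overlay, not the app beneath it, is what gets dismissed.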
Take a screenshot when you need visual context. Returned as base64 for LLM analysis alongside the structured tree data.
Poll until a condition is met — element appears, loading finishes, or screen changes. Configurable timeout.
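Poll-until-deadline is a standard pattern; a minimal sketch of the loop, with a simulated "loading finished" condition (the function signature is hypothetical, not Pilot's API):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it is truthy or the timeout elapses.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Simulated condition: "loading finished" becomes true on the third poll.
state = {"polls": 0}
def loaded():
    state["polls"] += 1
    return state["polls"] >= 3

ok = wait_for(loaded, timeout=1.0)
```

Using a monotonic clock keeps the timeout correct even if the wall clock changes mid-wait.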
Verify the current screen matches expectations before acting. Check app name, visible text, or absence of elements.
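Checking expectations before acting can be sketched as a precondition function that reports every mismatch instead of failing on the first. The screen model and parameter names are illustrative assumptions:

```python
def assert_screen(screen, app=None, has_text=(), lacks_text=()):
    """Check app name, required text, and forbidden text.
    Returns a list of failure messages; empty means safe to proceed."""
    failures = []
    if app is not None and screen["app"] != app:
        failures.append(f"expected app {app!r}, got {screen['app']!r}")
    for t in has_text:
        if t not in screen["texts"]:
            failures.append(f"missing text {t!r}")
    for t in lacks_text:
        if t in screen["texts"]:
            failures.append(f"unexpected text {t!r}")
    return failures

screen = {"app": "Banking", "texts": ["Transfer", "Balance: $120"]}
```

Returning all failures at once gives the agent enough context to recover (wrong app vs. stale screen) rather than just "check failed."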
Read or write the device clipboard. Copy confirmation codes, share text between desktop and phone.
Check battery, WiFi status, screen state, and device model. Know the state of the device before acting.
Commands travel through an encrypted WebSocket relay. Token rotation, keepalive, and bounded executors. Audited and hardened.
| Feature | Free | Budget — $9/mo | Pro — $29/mo |
|---|---|---|---|
| Device routing | — | — | Yes |
| Connected devices | — | — | Up to 3 |
| MCP tools | — | — | All 24 |
| Screen read / elements | — | — | Unlimited |
| Screenshot capture | — | — | Yes |
| Encrypted relay | — | — | Yes |
Download HuGR Wallet. Connect your device. Let agents handle the rest.