Connect your Android or iOS device and let AI agents operate any mobile app through the native accessibility layer. No APIs, no screenshots, no root. Real apps, real actions, real results.
Pilot reads the native UI tree of any app — the same structure screen readers use — and operates it with surgical precision.
Other tools take screenshots and guess where to click. Pilot reads the actual accessibility tree — the same semantic structure that screen readers use. Every button, every list item, every text field is a structured node with role, label, and coordinates. No guessing. No pixel matching. No hallucinated clicks.
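To make "structured node" concrete, here is a minimal sketch of what working against a semantic tree looks like. The field names (`role`, `label`, `bounds`) and the `Node` shape are illustrative assumptions, not Pilot's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Illustrative accessibility node -- field names are assumptions,
    # not Pilot's real schema.
    role: str                       # e.g. "button", "text_field"
    label: str                      # the accessible name a screen reader announces
    bounds: tuple                   # (left, top, right, bottom) in screen pixels

def find_by_label(nodes, role, label):
    """Exact match on role + label -- no pixel matching, no guessing."""
    return next((n for n in nodes if n.role == role and n.label == label), None)

tree = [
    Node("button", "Pay now", (40, 900, 680, 980)),
    Node("text_field", "Amount", (40, 700, 680, 780)),
]

pay = find_by_label(tree, "button", "Pay now")
# Tap target comes straight from the node's bounds, not from a screenshot.
center = ((pay.bounds[0] + pay.bounds[2]) // 2,
          (pay.bounds[1] + pay.bounds[3]) // 2)
```

Because every node carries its own coordinates, the tap point is derived, never estimated.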
iFood, Uber, WhatsApp, banking apps, government services — if it runs on your phone, Pilot can operate it. No APIs to configure. No OAuth flows. No developer access. The app doesn't know it's being automated. It just works.
Each tool does exactly one thing well: `press` finds a button by text and taps it; `toggle` sets a switch to the desired state idempotently; `fill_form` fills multiple fields at once; `wait_for` polls until a condition is met. The AI agent orchestrates them — you just describe the goal.
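The composition idea can be sketched against a toy in-memory screen. The tool names come from the text above; the screen model and implementations below are illustrative stand-ins, not Pilot's code:

```python
# Toy screen state standing in for a real device -- purely illustrative.
screen = {
    "fields": {"Email": ""},
    "switches": {"Notifications": False},
    "buttons": ["Save"],
    "pressed": [],
}

def press(text):
    """Find a button by text and tap it."""
    assert text in screen["buttons"], f"no button labeled {text!r}"
    screen["pressed"].append(text)

def toggle(label, desired):
    """Set a switch to the desired state (idempotent by design)."""
    screen["switches"][label] = desired

def fill_form(values):
    """Fill several fields in one call."""
    for label, value in values.items():
        screen["fields"][label] = value

# The agent sequences single-purpose tools toward the stated goal:
fill_form({"Email": "ana@example.com"})
toggle("Notifications", True)
press("Save")
```

Each call is small and verifiable on its own, which is what lets an agent recover mid-task instead of replaying a monolithic script.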
Connect your Android or iOS device over your local network. Push-based routing — no polling, no port forwarding required.
Reads the native UI tree via Android AccessibilityService. Every node is a structured object with role, text, bounds, and actions.
Find the search field, type the query, submit, and return results — all in a single tool call. Works across any app.
Pass a dictionary of label-value pairs and fill multiple form fields at once. Labels resolved by hint text, accessibility label, or position.
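The resolution order above (hint text, then accessibility label, then position) can be sketched as a simple lookup. The field dictionaries and key names here are assumptions for illustration:

```python
def resolve_field(fields, label, position=None):
    """Resolve a form field by hint text first, then accessibility
    label, then position. Field dicts are illustrative, not Pilot's."""
    for f in fields:
        if f.get("hint") == label:
            return f
    for f in fields:
        if f.get("a11y_label") == label:
            return f
    if position is not None and 0 <= position < len(fields):
        return fields[position]
    return None

fields = [
    {"hint": "Email", "a11y_label": "email_input"},
    {"hint": None,    "a11y_label": "Password"},
    {"hint": None,    "a11y_label": None},       # unlabeled -- position only
]
```

Falling back through progressively weaker signals is what lets one `fill_form` call work across apps with inconsistent labeling.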
Set a checkbox or switch to the desired state. Calling it twice does the same thing. No accidental toggles.
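The idempotence guarantee boils down to "read state, tap only on mismatch." A minimal sketch (the switch model and `tap` are hypothetical):

```python
taps = []  # record of physical taps, to show the second call is a no-op

def tap(switch):
    """Simulated physical tap: flips the switch."""
    switch["on"] = not switch["on"]
    taps.append(switch["label"])

def toggle(switch, desired):
    """Only tap when current state differs from the desired state,
    so calling twice with the same argument never flips back."""
    if switch["on"] != desired:
        tap(switch)

wifi = {"label": "Wi-Fi", "on": False}
toggle(wifi, True)   # state differs -> one tap
toggle(wifi, True)   # state matches -> no tap, no accidental flip
```

A blind "tap the switch" tool would leave the device in the opposite state on retries; setting a target state makes retries safe.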
Detects and closes overlays, popups, and dialogs automatically. Finds the dismiss button in the topmost window layer.
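One plausible way to find the dismiss control in the topmost layer is to scan windows in descending layer order. The label set and window shape below are assumptions, not Pilot's heuristics:

```python
DISMISS_LABELS = {"Close", "Dismiss", "Not now", "X"}  # illustrative set

def find_dismiss(windows):
    """Search windows from topmost layer down for a dismiss-style button,
    so a popup's Close wins over the app's own Close underneath it."""
    for win in sorted(windows, key=lambda w: w["layer"], reverse=True):
        for node in win["nodes"]:
            if node["role"] == "button" and node["label"] in DISMISS_LABELS:
                return win["layer"], node
    return None

windows = [
    {"layer": 0, "nodes": [{"role": "button", "label": "Close"}]},    # the app
    {"layer": 2, "nodes": [{"role": "button", "label": "Not now"}]},  # popup on top
]
```

Sorting by layer first is the key detail: it guarantees the overlay, not the app beneath it, is what gets dismissed.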
Take a screenshot when you need visual context. Returned as base64 for LLM analysis alongside the structured tree data.
Poll until a condition is met — element appears, loading finishes, or screen changes. Configurable timeout.
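Poll-until-deadline is a standard pattern; a minimal sketch of the loop, with a simulated "loading finished" condition (the function signature is hypothetical, not Pilot's API):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it is truthy or the timeout elapses.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Simulated condition: "loading finished" becomes true on the third poll.
state = {"polls": 0}
def loaded():
    state["polls"] += 1
    return state["polls"] >= 3

ok = wait_for(loaded, timeout=1.0)
```

Using a monotonic clock keeps the timeout correct even if the wall clock changes mid-wait.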
Verify the current screen matches expectations before acting. Check app name, visible text, or absence of elements.
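Checking expectations before acting can be sketched as a precondition function that reports every mismatch instead of failing on the first. The screen model and parameter names are illustrative assumptions:

```python
def assert_screen(screen, app=None, has_text=(), lacks_text=()):
    """Check app name, required text, and forbidden text.
    Returns a list of failure messages; empty means safe to proceed."""
    failures = []
    if app is not None and screen["app"] != app:
        failures.append(f"expected app {app!r}, got {screen['app']!r}")
    for t in has_text:
        if t not in screen["texts"]:
            failures.append(f"missing text {t!r}")
    for t in lacks_text:
        if t in screen["texts"]:
            failures.append(f"unexpected text {t!r}")
    return failures

screen = {"app": "Banking", "texts": ["Transfer", "Balance: $120"]}
```

Returning all failures at once gives the agent enough context to recover (wrong app vs. stale screen) rather than just "check failed."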
Read or write the device clipboard. Copy confirmation codes, share text between desktop and phone.
Check battery, WiFi status, screen state, and device model. Know the state of the device before acting.
Commands travel through an encrypted WebSocket relay. Token rotation, keepalive, and bounded executors. Audited and hardened.
| Feature | Free | Budget — $9/mo | Pro — $29/mo |
|---|---|---|---|
| Device routing | — | — | Yes |
| Connected devices | — | — | Up to 3 |
| MCP tools | — | — | All 24 |
| Screen read / elements | — | — | Unlimited |
| Screenshot capture | — | — | Yes |
| Encrypted relay | — | — | Yes |
Download HuGR Wallet. Connect your device. Let agents handle the rest.