惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

DEV Community

Why We Deliberately Crush Lithium Batteries (UN38.3 Crush Testing Explained) Command History & Completion The Three-Body Problem: AI Code, Supply Chain Attacks, and the Talent Exodus 로컬 LLM 셋업 가이드 (v27) Building Better .NET Worker Services with Cursor Rules Generate Professional PDF Invoices via REST API — JSON In, PDF Out Redis: Big Keys Destroem o Desempenho Compartilhado Agentic AI for Cybersecurity: Autonomous Threat Detection and Response Cron vs systemd daemon: which one for Node.js? Designing XSLT transforms with parameters and multiple inputs I Downloaded Gemma4:e2b On My Macbook in 2 steps Building an Autonomous SRE Agent: From Raw Telemetry to Safe, AI-Driven Remediation The EU AI Act in 2026: Reading the Law After the Omnibus I had zero coding knowledge. Here is "RetroTube", a 2010 YouTube sandbox prototype I built using AI! How to Validate Environment Variables in TypeScript (and Why You Should) I Built a CLI Tool That Writes Better Git Commits Than I Do Transfer Fees, Metadata, and Soulbound Tokens: My First Real Token Experiments on Solana Stop Using Fetch() in React: A Better Way To Call Your Backend Creando un Tetris con JavaScript VI: Complicando el juego. DeepSeek's API Price Cut Changed My Claude Code and ChatGPT Math [Boost] Perl 🐪 Weekly #774 - Perl is too HOT How to Track AI Usage Without Losing Revenue (Complete Guide) 77 Rules Later: What Graduating Our First Stack Actually Looked Like RAG 시스템 실전 구축 (v26) When Premature Scaling Leads to Operator Burnout Multi-Repo Microservice Changes Are a Coordination Problem. I Solved It With AI Agent Teams. The Next Frontier: How Multi-Agent Systems are Redefining Productivity The Kimwolf Bust Just Outed Android Webcams as Botnet Fodder — Here's the Question Every Repurposed-Phone Camera Setup Has to Answer I'm an autonomous AI agent. I shipped 18 fixes to myself in one session. Building a Secure Future with Zero Trust Security Architecture Asynchronous Functions in Dart How I migrated magic-link login from Resend to AWS SES + Lambda five days before launch Edge Computing He creado una empresa ficticia IT/OT para poder encontrar sus vulnerabilidades y reforzar su seguridad en sus activos críticos Why I Built @editora/react I built a tiny UGC script generator because hooks are the hardest part The Phone Is Becoming the New Terminal Why Most AI Music Tools Feel Wrong to Developers Goroutines vs. Promises: Why Go and JavaScript Look at Concurrency Completely Differently How I Use Antigravity 2.0 to Navigate Open-Source Codebases and Make Better Technical Decisions Understanding Basic HTML & CSS Concepts for Beginners Go Error Handling: Annoying or Awesome? Your To-Do List Doesn't Know You — So I Gave Mine Three Brains Shell Basics (Bash, Zsh, Sh) Free MongoDB GUI Tool for Developers, Students, and Teams Designing High-Performance Blockchain Indexers Choosing Models for an Agentic Chat App on Amazon Bedrock How Smart Growth Teams Automate Their Marketing Stack in 2026 (Without Hiring More People) What I Learned About Memory-Augmented AI Agents Seven Docker Tips Every Engineer Should Know (from Docker Captains) Welcome to the Fast-Food Era of Testing: Over-Weight by Tests How to use Claude in vscode? Prompt Engineering for Automated Evaluation: Making LLMs the Judge in AI Builder Solutions Full Stack Projects Are Not Enough Anymore Virtualization & Cloud Basics Orakle: Turning Raw Blockchain Data into Intelligence with Gemma 4 Building an Autoposting Pipeline with Hermes Agent: Why Waterfall Beats Parallel, and the Edge Cases Nobody Talks About OpenShift Virtualization Migration Advisor — Local-First, Powered by Gemma 4 26B MoE WebMCP is coming — so I’m building webmcp.js I Disappeared for 4 Months After Launch - Here's What Brought Me Back Jira Is Turing-Complete (And You've Been Coding in It) NyayAI: Building an AI Legal Assistant for 1.4 Billion People — A Technical Deep Dive E-commerce Order Automation: Stripe + Invoice + Shipping Workflow How to Evaluate AI Agents: LLM-as-Judge Tutorial The Interview Prep Stack I Used as a Senior Software Engineer Targeting Big Tech Gemma4 Challenge OptiLearn - Powered by Google Gemma 4 Aura — The Gemma 4 Powered Agentic Web Copilot & Self-Healing Accessibility Engine I built a tool that catches misleading charts using Gemma 4 running locally Worklog companion with Gemma4 GBase: Building LLM Agents That Actually Learn from Their Mistakes Blossom — a small step toward student mental wellbeing WordPress Performance Monitoring: A Complete Guide Principal Components in TypeScript (Part 4) When three sharp wallets agree: what consensus signals on Polymarket actually mean I Built a Fail-Fast Rust Scheduler with Background OAuth Auto-Refresh (Part 2) Sharing is caring How Putting Faces (Literally) to My AI Garden Images Gave It a Personality Sofi Log #001: Thailand's Tourism Tax & the 180-Day AI Surveillance Wall Sofi Log #006: Decentralized IP-Address Obfuscation Specs Sofi Log #008: Bypassing Legacy Cross-Border Bank Fee Traps Secret Rotation Automation: The Operational Cost of Security Sofi Log #009: Portable Identity & DID Passport Framework Sofi Log #011: Autonomous Smart Treasury Repatriation Specs History of Linux & Unix I asked Claude if my plan was on track for the goal — and got an honest 'No' PHPStan 'expects X, Y given' — the trace it doesn't give you Using Gemma4 2B to Assist Community Health Workers Open-source Playwright wrapper that passes bot.sannysoft.com, pixelscan, and CreepJS in headless mode Policy Storyteller: Turning Nepali Bills into Human Stories with Gemma 4 Avoid Cross Module Dependencies with Dependency Cruiser Invariant-Driven Architecture: 20M transactions on a €80/mo Cloud VM. Stop using external npm packages just to generate a UUID v4 Choosing the Right Gemma 4 Model Matters More Than Choosing the Best One Your LLM Is Not an Agent. Your Framework Is Not Enough. You Need a Harness. From HTTPS to UCP: Shopping Is About to Stop Being Your Problem From Creation to Consumption: How Antigravity 2.0 and Gemini Spark Are Defining the Agentic Era 10 Mistakes I Wish I Knew Before Taking the CKA Exam AI That Actually Does Stuff: Autonomous Agents Explained
How to Automate Android Without Appium
Elliot Gao · 2026-05-25 · via DEV Community

Elliot Gao

You do not need Appium for every Android automation task.

Appium is the right tool when you need a full WebDriver-based mobile testing framework. But many Android workflows are smaller than that. You may only need to open an app, tap visible buttons, type into fields, wait for a result, and collect a screenshot on failure.

For those jobs, a CLI can be enough.

Handsets lets you automate Android from the terminal without root and without installing a visible helper app on the phone.

Quick answer

The fastest way to automate Android without Appium is:

hs use
hs tap "Continue"
hs fill "Email" "you@example.com"
hs wait "Dashboard"

Enter fullscreen mode Exit fullscreen mode

That gives you the core automation loop: connect to the device, act on visible UI labels, and wait for the next state. You still use normal Android debugging access. You do not need root, WebDriver, or an Appium server.

What you need

You need:

  • An Android phone or emulator.
  • USB debugging enabled.
  • adb on your path.
  • Handsets installed on your host machine.

Install:

curl -fsSL https://raw.githubusercontent.com/elliotgao2/handsets/main/install.sh | bash

Enter fullscreen mode Exit fullscreen mode

Connect:

hs use

Enter fullscreen mode Exit fullscreen mode

Now you can control the device with commands.

Tap by text

Raw adb can tap coordinates:

adb shell input tap 540 860

Enter fullscreen mode Exit fullscreen mode

That is brittle. It depends on screen size, density, orientation, and layout.

With Handsets, tap the visible label:

hs tap "Continue"

Enter fullscreen mode Exit fullscreen mode

Use --visible and --unique when a script should fail rather than guess:

hs tap "Continue" --visible --unique --timeout 5s

Enter fullscreen mode Exit fullscreen mode

This is the main difference from raw adb shell input tap. The script says what it means. It does not encode where a button happened to be on one device.

Fill fields

Use fill for text fields:

hs fill "Email" "you@example.com"
hs fill "Password" "$PASSWORD"
hs tap "Sign in"

Enter fullscreen mode Exit fullscreen mode

If labels are repeated, use selectors:

hs fill 'EditText:below(TextView[text=Email])' "you@example.com"
hs fill 'EditText:below(TextView[text=Password])' "$PASSWORD"

Enter fullscreen mode Exit fullscreen mode

The selector syntax is CSS-like and built for real Android UI trees.

Wait for the next screen

Do not write sleeps unless you truly need a fixed delay.

Instead of:

hs tap "Continue"
sleep 5

Enter fullscreen mode Exit fullscreen mode

Use:

hs tap "Continue"
hs wait "Dashboard" --timeout 15s

Enter fullscreen mode Exit fullscreen mode

This makes the script faster on fast devices and more reliable on slow devices.

A complete script

Here is a no-Appium Android login script:

#!/usr/bin/env bash
set -euo pipefail

hs use
hs go com.example.app
hs wait "Sign in" --timeout 15s

hs fill "Email" "$APP_EMAIL"
hs fill "Password" "$APP_PASSWORD"
hs tap "Continue" --visible --unique
hs wait "Dashboard" --timeout 20s

Enter fullscreen mode Exit fullscreen mode

Add failure artifacts:

ARTIFACTS="/tmp/android-run-$$"
mkdir -p "$ARTIFACTS"
trap 'hs ui > "$ARTIFACTS/ui.txt"; hs see --size 768 "$ARTIFACTS/screen.jpg"; hs logs --tail 200 > "$ARTIFACTS/logs.txt"; echo "$ARTIFACTS"' ERR

Enter fullscreen mode Exit fullscreen mode

Now the script leaves behind the UI, screenshot, and logs when something breaks.

Can I just use adb?

Sometimes, yes.

Raw adb is great for low-level device commands:

adb shell am start -n com.example/.MainActivity
adb shell input keyevent BACK
adb shell wm size

Enter fullscreen mode Exit fullscreen mode

It becomes awkward when you need semantic UI automation:

  • Tap the button labeled "Continue".
  • Fill the password field below "Password".
  • Wait until "Dashboard" appears.
  • Fail if there are two matching buttons.

That is where a higher-level tool helps. Handsets still uses adb underneath, but it gives you label-based actions and structured failure modes.

When Appium is still better

Use Appium if you need:

  • iOS support.
  • WebDriver compatibility.
  • A full test framework.
  • Cloud device farm integrations.
  • Rich reports and recorders.
  • A large QA ecosystem.

Those are real strengths.

But if your goal is Android-only CLI automation, Appium may be more stack than you need.

Why this matters for LLM agents

LLM-driven Android automation benefits from a small text interface.

Handsets can print the current screen as an action table:

fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860

Enter fullscreen mode Exit fullscreen mode

That is easier for a model to consume than a full XML tree. It also reduces prompt size for long trajectories.

Summary

You can automate Android without Appium if your workflow is:

  • Android only.
  • CLI-first.
  • Label-based.
  • No-root.
  • Scripted from shell or Python.

Start with:

hs use
hs ui
hs tap "Continue"
hs wait "Welcome"

Enter fullscreen mode Exit fullscreen mode

That covers more Android automation work than you might expect.

FAQ

Do I need Appium to automate Android?

No. Appium is useful for full mobile test frameworks, especially cross-platform suites, but Android can also be automated from the command line with adb and tools like Handsets.

Can I automate Android without root?

Yes. For normal UI automation, root is not required. You can tap, type, swipe, inspect visible UI, wait for text, and capture screenshots when the current app allows it.

Is this better than Appium?

It depends. Handsets is better for Android-only CLI scripts, LLM agents, and fast tap-heavy flows. Appium is better for cross-platform QA infrastructure and WebDriver-based test suites.

Can I run this in CI?

Yes, as long as your CI runner can access an Android emulator or connected device with adb. The commands are shell-friendly and return normal exit codes.

Related guides


Originally published at https://handsets.dev/blog/how-to-automate-android-without-appium/.