Implementation of AI in mobile applications: Comparative analysis of On-Device and On-Server approaches on Native Android and Flutter

Ratratatyu · 2026-05-23 · via DEV Community

Hi everyone! Today I want to share practical experience in integrating machine learning models into mobile applications. I recently completed the research and development of two MVP applications (on Native Android and Flutter) and defended the project at an international conference. Here is what I learned along the way.

In this article, we will analyze in detail the difference between local and server AI computing, compare the implementation features of the native layer on Kotlin and the cross-platform layer on Dart, and also analyze non-obvious bugs that you may encounter when working with the camera and file system.

If you are interested in a specific platform, you can skip directly to the relevant section using the navigation below. The source code of both projects is open under the MIT license; links to the repositories (GitHub/GitLab) are at the end of the article.

Navigation:

Analysis of approaches (On-Device / On-Server)
Implementation on Native Android (Kotlin)
Implementation in Flutter (Dart)
Conclusions and links to projects

Part 1. On-Device vs On-Server: Architectural Choice

Choosing where to deploy a mobile model is always a trade-off between device resources, computational accuracy, and security requirements.

Local approach (On-Device)

The model is deployed directly in the application sandbox and runs inference locally, using the smartphone's CPU, GPU, or specialized neural processors (NPU/TPU). As an On-Device solution, I used Google ML Kit (Image Labeling SDK).

Pros:

Minimal latency: There are no network delays for transmitting heavy media data.
Autonomy: Complete independence from the availability and quality of the Internet connection.
Confidentiality (Data Privacy): User data does not leave the device and is not transferred to third parties.

Cons:

Resource limitation: To keep the application from draining the battery and taking up gigabytes of memory, models are heavily quantized and trimmed down.
Reduced accuracy: Compression lowers the accuracy of lightweight models (on average, the accuracy ceiling in basic classification tasks is about 61-65%).
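
To make the accuracy cost of quantization concrete, here is a minimal sketch of 8-bit affine quantization in plain Kotlin. It is not how ML Kit prepares its models internally (that is not described in this article); the QuantizedTensor type and the function names are hypothetical. It only shows that storing a weight in one byte instead of a four-byte float introduces a small rounding error per weight.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical container for an 8-bit quantized tensor (not an ML Kit type).
class QuantizedTensor(val values: ByteArray, val scale: Float, val zeroPoint: Int)

// Affine quantization: q = round(w / scale) + zeroPoint, clamped to the int8 range.
fun quantize(weights: FloatArray): QuantizedTensor {
    // The representable range is widened to include 0.
    val min = minOf(weights.minOrNull() ?: 0f, 0f)
    val max = maxOf(weights.maxOrNull() ?: 0f, 0f)
    val scale = if (max > min) (max - min) / 255f else 1f
    val zeroPoint = (-128f - min / scale).roundToInt()
    val values = ByteArray(weights.size) { i ->
        ((weights[i] / scale).roundToInt() + zeroPoint).coerceIn(-128, 127).toByte()
    }
    return QuantizedTensor(values, scale, zeroPoint)
}

// Inverse mapping: w is approximately (q - zeroPoint) * scale.
fun dequantize(q: QuantizedTensor): FloatArray =
    FloatArray(q.values.size) { i -> (q.values[i] - q.zeroPoint) * q.scale }

// Largest absolute difference between the original and round-tripped weights.
fun maxRoundTripError(weights: FloatArray): Float {
    val restored = dequantize(quantize(weights))
    return weights.indices.maxOf { abs(weights[it] - restored[it]) }
}
```

Calling maxRoundTripError(floatArrayOf(-1f, -0.3f, 0f, 0.42f, 1f)) returns a small non-zero value bounded by the quantization step. Across millions of weights, errors of this kind are one reason compressed models lose some accuracy.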

Server approach (On-Server)

The model lives on a remote server and is accessible via the API. In my research, I used the Hugging Face Inference API (model google/vit-base-patch16-224).

Pros:

High accuracy: The server can host heavy State-of-the-Art (SOTA) models, LLMs, or large ensembles of neural networks covering a very large set of classes.
Client offloading: The smartphone only makes the network request; it does not heat up or spend power on heavy computation.

Cons:

Network dependence: No Internet - no AI.
Infrastructure and security costs: You have to encrypt communication channels (TLS/SSL), protect API keys from reverse engineering, and pay for server capacity.

Part 2. Native implementation: Android

In a native application, the key task is to keep heavy operations off the main UI thread (Main Thread) to avoid UI freezes and Application Not Responding (ANR) errors.

Working with API (On-Server)

To send POST requests (transmitting the image as a byte array), a combination of OkHttp and Retrofit was used. The converter factory turns the server's JSON response into strictly typed Kotlin data classes automatically. The network call is wrapped in a suspend function, so the calling code can handle asynchrony in a sequential style.

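import android.graphics.Bitmap
import android.util.Log
import kotlinx.coroutines.CancellationException
import okhttp3.MediaType.Companion.toMediaTypeOrNull
import okhttp3.RequestBody
import okhttp3.RequestBody.Companion.toRequestBody
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory
import retrofit2.http.Body
import retrofit2.http.Header
import retrofit2.http.POST
import retrofit2.http.Url

// Hugging Face access token. Do not hard-code a real token in production builds.
private const val TOKEN = "..."

// One entry of the JSON array returned by the image classification endpoint.
data class PredictionResponse(val label: String, val score: Float)

interface HuggingFaceApi {
    // The model path is passed via @Url and resolved against the base URL.
    @POST
    suspend fun postImage(
        @Url url: String,
        @Header("Authorization") token: String,
        @Body body: RequestBody
    ): List<PredictionResponse>
}

class ApiModel {
    private val retrofit = Retrofit.Builder()
        .baseUrl("https://router.huggingface.co/hf-inference/")
        .addConverterFactory(GsonConverterFactory.create())
        .build()

    val service: HuggingFaceApi = retrofit.create(HuggingFaceApi::class.java)

    // Returns the top label and its score, or a fallback pair on failure.
    suspend fun classifyImage(imageBitmap: Bitmap): Pair<String, Float> {
        return try {
            val imageBytes = compressBitmap(imageBitmap)
            val requestBody = imageBytes.toRequestBody("image/jpeg".toMediaTypeOrNull())

            val result = service.postImage(
                "models/google/vit-base-patch16-224",
                "Bearer $TOKEN",
                requestBody
            )
            val firstLabel = result.firstOrNull()
                ?: return "Not recognized" to 0f

            firstLabel.label to firstLabel.score
        } catch (e: CancellationException) {
            throw e // keep coroutine cancellation working
        } catch (e: Exception) {
            Log.e("network", "Request failed", e)
            "Server error" to 0f
        }
    }
}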

Please note that before sending the image to the server we compress it, so that we do not send too many bytes over the network.

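import android.graphics.Bitmap
import java.io.ByteArrayOutputStream
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Encodes the bitmap as JPEG at quality 80, off the main thread.
suspend fun compressBitmap(bitmap: Bitmap): ByteArray = withContext(Dispatchers.IO) {
    val stream = ByteArrayOutputStream()
    bitmap.compress(Bitmap.CompressFormat.JPEG, 80, stream)
    stream.toByteArray()
}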

Local inference (On-Device via ML Kit)

To work with the local ML Kit model, the library is configured via ImageLabelerOptions. We explicitly set setConfidenceThreshold(0.4f), the model's confidence threshold. Raising this threshold filters out low-confidence labels and so cuts false positives, at the cost of sometimes returning fewer labels or none at all.
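
The effect of a confidence threshold can be sketched without ML Kit at all. The Label type and applyThreshold function below are hypothetical stand-ins, and the sketch assumes that labels at or above the threshold are kept; it only illustrates the filtering behaviour, not the library's internals.

```kotlin
// Hypothetical stand-in for a label returned by an image labeler.
data class Label(val text: String, val confidence: Float)

// Keep only labels whose confidence is at or above the threshold.
fun applyThreshold(labels: List<Label>, threshold: Float): List<Label> =
    labels.filter { it.confidence >= threshold }

// Sample raw output for one photo (made-up values for illustration).
fun sampleLabels(): List<Label> = listOf(
    Label("Cat", 0.82f),
    Label("Pet", 0.55f),
    Label("Sofa", 0.31f)
)
```

With a threshold of 0.4f the sample keeps "Cat" and "Pet" and drops "Sofa"; with 0.9f nothing survives, which is why the calling code should handle an empty result.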

To ensure stability and save RAM, the labeler object is initialized through the Kotlin delegate mechanism (by lazy):

private val labeler by lazy {
    val options = ImageLabelerOptions.Builder()
        .setConfidenceThreshold(0.4f)
        .build()
    ImageLabeling.getClient(options)
}

Why is by lazy here?

Saving resources: The heavy ML Kit client is created not when the Activity is launched, but only on first use (when the user takes a photo).
Safe initialization: The client is created after the Activity and its applicationContext already exist, which avoids a NullPointerException from touching it too early during startup.
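
The deferred creation described above can be observed in plain Kotlin. HeavyClient and Screen are hypothetical stand-ins for the ML Kit client and the class that owns it; the counter exists only to make the moment of creation visible.

```kotlin
// Hypothetical stand-in for an expensive client such as an image labeler.
class HeavyClient {
    init { createdCount++ }

    companion object {
        // Counts how many instances have been constructed so far.
        var createdCount = 0
    }
}

class Screen {
    // Not created when Screen is constructed; created once, on first access.
    val client: HeavyClient by lazy { HeavyClient() }
}
```

Constructing Screen() leaves HeavyClient.createdCount at 0; reading screen.client any number of times creates exactly one instance. By default, by lazy is also thread-safe (LazyThreadSafetyMode.SYNCHRONIZED), so two threads racing on the first access still share one client.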

The call to labeler.process(image) is asynchronous in nature (runs on Google's Task API). To make it linear and MVVM-friendly, we wrap it with coroutines and wait for the execution result.
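
One possible shape for such a wrapper is sketched below. This is an assumption about how the repository's analyze method might look, not a copy of it: the function signature, the choice of the highest-confidence label, and the "Not recognized" fallback are all illustrative.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeler
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlinx.coroutines.suspendCancellableCoroutine

// Hypothetical helper: suspends until ML Kit's Task completes, then returns the
// best label with its confidence, or a fallback pair if no label was returned.
suspend fun analyze(labeler: ImageLabeler, bitmap: Bitmap): Pair<String, Float> =
    suspendCancellableCoroutine { cont ->
        val image = InputImage.fromBitmap(bitmap, 0) // 0 = no extra rotation
        labeler.process(image)
            .addOnSuccessListener { labels ->
                val best = labels.maxByOrNull { it.confidence }
                cont.resume(
                    if (best != null) best.text to best.confidence
                    else "Not recognized" to 0f
                )
            }
            .addOnFailureListener { e -> cont.resumeWithException(e) }
    }
```

If the project already depends on kotlinx-coroutines-play-services, its Task.await() extension is a shorter alternative to writing the listeners by hand.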

Architectural layer and flow control

In MainViewModel, all calls are wrapped in viewModelScope.launch. Depending on the position of the state switch (On-Device / On-Server), the required method is launched:

private fun classifyImage(bitmap: Bitmap?) {
    if (bitmap == null) return
    _uiState.update { it.copy(isLoading = true) }

    viewModelScope.launch {
        val startTime = System.currentTimeMillis()

        try {
            // Pick the inference path from the current switch position.
            val (label, confidence) = if (_uiState.value.isOnDevice) {
                mlKit.analyze(bitmap)
            } else {
                apiModel.classifyImage(bitmap)
            }

            val duration = System.currentTimeMillis() - startTime

            _uiState.update {
                it.copy(
                    classificationText = label,
                    confidenceValue = confidence,
                    timeTakenDuration = duration,
                    isLoading = false
                )
            }
        } catch (e: Exception) {
            // Reset the loading flag on failure too, so the UI does not stay stuck.
            _uiState.update {
                it.copy(
                    classificationText = "Error",
                    confidenceValue = 0f,
                    timeTakenDuration = 0L,
                    isLoading = false
                )
            }
        }
    }
}
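
For readers following along without the repository open, the state object updated above could look roughly like this. The field names come from the copy(...) calls in the ViewModel code; the types, default values, and the withResult helper are assumptions made for illustration.

```kotlin
// Field names mirror the ViewModel code; types and defaults are assumptions.
data class UiState(
    val isOnDevice: Boolean = true,
    val isLoading: Boolean = false,
    val classificationText: String = "",
    val confidenceValue: Float = 0f,
    val timeTakenDuration: Long = 0L
)

// Hypothetical helper: the pure state transition applied when a classification finishes.
fun UiState.withResult(label: String, confidence: Float, durationMs: Long): UiState =
    copy(
        classificationText = label,
        confidenceValue = confidence,
        timeTakenDuration = durationMs,
        isLoading = false
    )
```

Keeping the state immutable and replacing it through copy means each UI update is a single atomic swap, which is what makes the update { ... } pattern safe to call from coroutines.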

Working with the camera on Native Android

On the Native Android side, working with the camera is concise thanks to the modern CameraX library. CameraX is lifecycle-aware: it observes when an Activity is stopped (onStop) or destroyed (onDestroy) and automatically releases camera resources and shuts down its use cases (ImageAnalysis / ImageCapture). We do not need to write disposal logic by hand, and a successful capture can be delivered as a ready-made Bitmap held in RAM, which eliminates unnecessary disk reads and writes.
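
A minimal sketch of an in-memory capture is shown below. It assumes that imageCapture has already been bound to a lifecycle through ProcessCameraProvider and that the project uses CameraX 1.3 or newer (for ImageProxy.toBitmap()). The captureToBitmap helper and its callback parameters are hypothetical names, not taken from the repository.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import androidx.camera.core.ImageCapture
import androidx.camera.core.ImageCaptureException
import androidx.camera.core.ImageProxy
import androidx.core.content.ContextCompat

// Hypothetical helper: takes one in-memory photo and hands it over as a Bitmap.
fun captureToBitmap(
    context: Context,
    imageCapture: ImageCapture,
    onResult: (Bitmap) -> Unit,
    onFailure: (ImageCaptureException) -> Unit
) {
    imageCapture.takePicture(
        ContextCompat.getMainExecutor(context), // callbacks run on the main thread
        object : ImageCapture.OnImageCapturedCallback() {
            override fun onCaptureSuccess(image: ImageProxy) {
                // toBitmap() does not apply the image's rotation.
                val bitmap = image.toBitmap()
                image.close() // release the buffer back to CameraX
                onResult(bitmap)
            }

            override fun onError(exception: ImageCaptureException) {
                onFailure(exception)
            }
        }
    )
}
```

Because no file is written, this path avoids the file-timing issue described later for the Flutter version.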

Part 3. Cross-platform implementation: Flutter (Dart, Dio, Method Channels)

The Flutter application conceptually solves the same problems, but faces the specifics of Dart’s single-threaded architecture (Event Loop).

Network Inference (Dio + Futures)

To communicate with Hugging Face from Flutter, we used the Dio package. Dart runs UI code on a single thread, so to keep a heavy request from blocking the rendering of UI frames, the call is written with async/await and Futures. While the request is in flight, the Event Loop keeps rendering the interface.

import 'dart:typed_data';

import 'package:dio/dio.dart';
import 'package:flutter/foundation.dart';

final dio = Dio();

// Sends the compressed image to the Hugging Face endpoint and returns the
// decoded JSON list of predictions, or null on failure.
Future<List<dynamic>?> apiModel(String path) async {
  final Uint8List? imageBytes = await compressImage(path);
  if (imageBytes == null) {
    return null;
  }

  try {
    final response = await dio.post(
      "https://router.huggingface.co/hf-inference/models/google/vit-base-patch16-224",
      data: imageBytes,
      options: Options(
        headers: {
          "Authorization": "Bearer 'your_token'", // put your token from hugging face here
          "Content-Type": "image/jpeg",
        },
      ),
    );
    return response.data;
  } on DioException catch (e) {
    debugPrint("Error: $e");
  }
  return null;
}

As on Android, we compress the image before sending it to the server, so that we do not send too many bytes over the network.

Future<Uint8List?> compressImage(String path) async {
  final Uint8List? result = await FlutterImageCompress.compressWithFile(
    path,
    quality: 80,
    format: CompressFormat.jpeg,
  );

  return result;
}

Native bridge: MethodChannel for ML Kit

Since there is no full-fledged direct SDK for ML Kit Image Labeling on Dart that provides the required level of customization, we use an approach common in production apps: a MethodChannel (native bridge).

The Dart code acts as a client: it invokes the imageLabeling method and passes the path to the saved photo through the channel.

class NativeMlService {
  static const MethodChannel _channel = MethodChannel("mlkit_photo_analyze");

  static Future<Map> onDeviceMethod(String imagePath) async {
    final result = await _channel.invokeMethod(
      'imageLabeling',
      {'imagePath': imagePath},
    );
    return Map.from(result);
  }
}

On the Android side (MainActivity.kt) we catch this call through setMethodCallHandler. The same rules apply here: we launch a coroutine on a background thread and process the image via ML Kit, but we call result.success() only after switching back to the Main Thread, since the Flutter engine does not accept channel replies from an Android background thread.

 override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
        super.configureFlutterEngine(flutterEngine)

        MethodChannel(flutterEngine.dartExecutor.binaryMessenger, CHANNEL).setMethodCallHandler { call, result ->
            if(call.method == "imageLabeling"){
                val imagePath = call.argument<String>("imagePath")
                if (imagePath == null) {
                    result.error("ArgError", "Image path is null", null)
                    return@setMethodCallHandler
                }
                // Run ML inference on background thread to avoid blocking UI
                CoroutineScope(Dispatchers.IO).launch {
                    try {
                          // image processing and model calling....

                        // Return result on main thread
                        withContext(Dispatchers.Main){
                            result.success(response)
                        }
                    }....
//rest of the code on GitHub/ GitLab....

Camera in Flutter and Data Race (Race Condition)

The most difficult and interesting stage of developing the Flutter version was integrating the camera plugin and debugging how the two sides interact through the file system. Two important differences from the native version came up:

Manual Lifecycle Management: In Flutter, the developer must manually initialize the CameraController, pick a camera from the available ones (here, the one with CameraLensDirection.back) and, most importantly, call _controller?.dispose() in the widget's dispose() method.

If you forget, the camera stays locked by the operating system, and other applications cannot open it.

Ghost File Problem (Race Condition):
The _controller?.takePicture() method in Flutter returns an XFile whose path (image.path) points to the snapshot stored in the device cache directory. This is where a classic race condition comes into play.

When Flutter reports that the photo has been taken and passes the path to the native code via MethodChannel, the native part (Kotlin) immediately tries to execute BitmapFactory.decodeFile(imagePath). But at the level of the Android operating system, the file in the cache may not be ready yet: the stream writing data from the camera buffer to disk has not yet been closed.

This showed up in the logs as an error:
E/ple.flutter_mvp: FrameInsert open fail: No such file or directory
The native call failed: the Bitmap came back as null, and Flutter received a null reference instead of a data structure.

The same problem appears when sending the picture to the server: the request effectively carries an empty image.

Solution to the problem:
To eliminate this data race, two-way protection was applied:

On the Dart (Provider) side: Before calling the native method or sending to the server, we give the system a moment to finish writing by adding a short delay:

await Future.delayed(const Duration(milliseconds: 200));

This delay proved sufficient for the OS to complete its disk operations, though a fixed delay is a heuristic rather than a guarantee.
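
On the native side, a complementary check could poll the file instead of relying on timing alone. The sketch below is an assumption about what such a guard might look like, not the code from the repository; waitForStableFile, its parameters, and its defaults are hypothetical. It treats the file as ready once it exists, is non-empty, and reports the same size on two consecutive checks, which is itself a heuristic.

```kotlin
import java.io.File

// Polls until the file exists, is non-empty, and its size is unchanged between
// two consecutive checks. Returns false if that does not happen before the timeout.
fun waitForStableFile(
    file: File,
    timeoutMs: Long = 1_000,
    pollIntervalMs: Long = 50
): Boolean {
    val deadline = System.currentTimeMillis() + timeoutMs
    var lastSize = -1L
    while (System.currentTimeMillis() < deadline) {
        val size = if (file.exists()) file.length() else -1L
        if (size > 0 && size == lastSize) return true
        lastSize = size
        Thread.sleep(pollIntervalMs) // blocking: call this from a background thread only
    }
    return false
}
```

Such a check would run before BitmapFactory.decodeFile(imagePath) inside the background coroutine, returning result.error(...) to Flutter when it times out instead of passing a null Bitmap further down the pipeline.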

Conclusions

The MVP study shows that On-Device and On-Server approaches do not compete but complement each other.

On-Server is indispensable for heavy computing (LLM, GPT, high-definition video processing).
On-Device is ideal for utilitarian tasks (scanning documents, recognizing simple objects, working in strict offline conditions).

In modern Production applications, the best practice is a hybrid approach: a fast initial result is produced locally, and deeper validation is delegated to the backend server.
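
The hybrid idea can be sketched as a small decision function. Everything here is illustrative: the Prediction type, the classifyHybrid name, and the 0.7 confidence cut-off are assumptions rather than part of either MVP, and in a real app the two model calls would be suspend functions.

```kotlin
// Hypothetical result type; source records which model produced the answer.
data class Prediction(val label: String, val confidence: Float, val source: String)

// Try the local model first; ask the server only when the local result is
// below the cut-off and the device is online.
fun classifyHybrid(
    local: () -> Prediction,
    remote: () -> Prediction,
    isOnline: Boolean,
    minLocalConfidence: Float = 0.7f
): Prediction {
    val onDevice = local()
    if (onDevice.confidence >= minLocalConfidence || !isOnline) return onDevice
    return try {
        remote()
    } catch (e: Exception) {
        onDevice // network failure: keep the local answer
    }
}
```

With this shape, a confident local answer never touches the network, a weak one is upgraded by the server when possible, and offline use degrades to the on-device result instead of failing.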

Regarding the choice of platform: Native Android gives direct control over resources, hardware, and threads out of the box. Flutter, despite Dart's single-threaded model, still lets you build responsive, performant AI applications, provided you use MethodChannel properly, dispatch coroutines correctly in the native layer, and account for file system timings.

GitHub

RatRatatyu / mobile-ai-mvp

Two MVP applications demonstrating on-device and on-server AI model integration in Jetpack Compose (Android) and Flutter.

Mobile AI Integration: On-Device vs On-Server MVP Comparison

This repository contains two MVP applications developed for the International Scientific and Practical Conference
"Student Research: Challenges and Development Trends".

🏫 Conference Information

Event: International Scientific and Practical Conference
"Student Research: Challenges and Development Trends"
Organizers:
Ministry of Education of the Republic of Kazakhstan,
Department of Education of Aktobe Region,
Aktobe Higher Humanitarian College,
National Centre for Professional Development "Orleu"
Section:
Science, Technology, and Digital Innovations
Date:
May 22, 2026

📱 Project Overview

The project explores the architectural choice between running AI models directly on a smartphone (On-Device) versus processing them on a remote server (On-Server)

For this research, image classification was chosen as the primary use case to demonstrate the differences in performance and accuracy

🤖 Applied Models

On-Device: Powered by the ML Kit Image Labeling API from Google for local, real-time inference

On-Server: Powered by the Hugging Face google/vit-base-patch16-224…

GitLab

RatRatatyu / mobile-ai-mvp · GitLab

I should note that I am still learning in this area, so some of my conclusions or descriptions may be inaccurate. I will be grateful if you point out my mistakes in the comments, and glad if you star the projects on GitHub and GitLab if you find them useful.
