Hi everyone! Today I want to share practical experience of integrating machine learning models into mobile applications. I recently finished researching and developing two MVP applications (one on native Android, one on Flutter) and defended the project at an international conference.
In this article, we will look at the difference between on-device and on-server AI inference, compare the native implementation in Kotlin with the cross-platform implementation in Dart, and go through some non-obvious bugs that you may encounter when working with the camera and the file system.
If you are interested in a specific platform, you can skip directly to the relevant section using the navigation below. The source code of both projects is open under the MIT license; links to the repositories (GitHub/GitLab) are at the end of the article.
Navigation:
- Analysis of approaches (On-Device / On-Server)
- Implementation on Native Android (Kotlin)
- Implementation in Flutter (Dart)
- Conclusions and links to projects
Part 1. On-Device vs On-Server: Architectural Choice
Choosing where to deploy a mobile model is always a trade-off between device resources, inference accuracy, and security requirements.
Local approach (On-Device)
The model is deployed directly in the application sandbox and performs inference locally, using the computing power of the smartphone's CPU, GPU or specialized neural processors (NPU/TPU). As the On-Device solution, I used Google ML Kit (Image Labeling SDK).
Pros:
- Low latency: There is no network round trip for transmitting heavy media data.
- Autonomy: Complete independence from the availability and quality of the Internet connection.
- Confidentiality (Data Privacy): User data does not leave the device and is not transferred to third parties.
Cons:
- Resource limitation: To keep the application from draining the battery and taking up gigabytes of memory, models are heavily quantized and pruned.
- Reduced accuracy: Compression lowers the accuracy of lightweight models (on average, the upper bound of accuracy in basic classification tasks is about 61-65%).
Server approach (On-Server)
The model lives on a remote server and is accessible via the API. In my research, I used the Hugging Face Inference API (model google/vit-base-patch16-224).
Pros:
- High accuracy: You can deploy heavy State-of-the-Art (SOTA) models, LLMs, or huge ensembles of neural networks with a colossal class base on the server.
- Client offloading: The smartphone only performs the network request, does not heat up and does not spend power on heavy computation.
Cons:
- Network dependence: No Internet - no AI.
- Infrastructure and security costs: You have to encrypt the communication channel (TLS/SSL), protect API keys from reverse engineering and pay for server capacity.
Part 2. Native implementation: Android
In a native application, the key task is to keep heavy operations off the main UI thread (Main Thread) to avoid UI freezes and Application Not Responding (ANR) errors.
Working with API (On-Server)
To send POST requests (transmitting the image as a byte array), a combination of OkHttp and Retrofit was used. The server's JSON response is converted into strictly typed Kotlin data classes automatically by the Gson converter. The network call is declared as a suspend function, so the caller can write asynchronous code in a sequential style.
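import android.graphics.Bitmap
import android.util.Log
import okhttp3.MediaType.Companion.toMediaTypeOrNull
import okhttp3.RequestBody
import okhttp3.RequestBody.Companion.toRequestBody
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory
import retrofit2.http.Body
import retrofit2.http.Header
import retrofit2.http.POST
import retrofit2.http.Url

// Placeholder: do not commit a real token to source control.
private const val TOKEN = "..."

data class PredictionResponse(val label: String, val score: Float)

interface HuggingFaceApi {
    @POST
    suspend fun postImage(
        @Url url: String,
        @Header("Authorization") token: String,
        @Body body: RequestBody
    ): List<PredictionResponse>
}

class ApiModel {
    private val retrofit = Retrofit.Builder()
        .baseUrl("https://router.huggingface.co/hf-inference/")
        .addConverterFactory(GsonConverterFactory.create())
        .build()

    val service: HuggingFaceApi = retrofit.create(HuggingFaceApi::class.java)

    suspend fun classifyImage(imageBitmap: Bitmap): Pair<String, Float> {
        return try {
            // Compress to JPEG bytes before sending (see compressBitmap).
            val imageBytes = compressBitmap(imageBitmap)
            val requestBody = imageBytes.toRequestBody("image/jpeg".toMediaTypeOrNull())
            val result = service.postImage(
                "models/google/vit-base-patch16-224",
                "Bearer $TOKEN",
                requestBody
            )
            // The first element of the response is used as the prediction.
            val firstLabel = result.firstOrNull()
                ?: return "Not recognized" to 0f
            firstLabel.label to firstLabel.score
        } catch (e: Exception) {
            Log.e("network", "Request failed", e)
            "Server error" to 0f
        }
    }
}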
Note that before sending the image to the server we compress it, so that we do not send too many bytes over the network:
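import android.graphics.Bitmap
import java.io.ByteArrayOutputStream
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Encodes the bitmap as JPEG (quality 80) off the main thread.
suspend fun compressBitmap(bitmap: Bitmap): ByteArray = withContext(Dispatchers.IO) {
    val stream = ByteArrayOutputStream()
    bitmap.compress(
        Bitmap.CompressFormat.JPEG,
        80,
        stream
    )
    stream.toByteArray()
}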
Local inference (On-Device via ML Kit)
To work with the local ML Kit model, the labeler is configured via ImageLabelerOptions. We explicitly set setConfidenceThreshold(0.4f), the minimum confidence a label must have to be returned. Raising this threshold cuts off more false positives, but it also means that for difficult images the labeler may return fewer labels or none at all.
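To make the effect of the threshold concrete, here is a minimal sketch in plain Kotlin. The Label class, the filterByThreshold function and the sample scores are hypothetical stand-ins, not ML Kit types or real model output; the sketch only illustrates how a confidence threshold filters a list of labels.

```kotlin
// Hypothetical stand-in for a label returned by a classifier (not an ML Kit type).
data class Label(val text: String, val confidence: Float)

// Keeps only labels at or above the threshold, most confident first.
fun filterByThreshold(labels: List<Label>, threshold: Float): List<Label> =
    labels.filter { it.confidence >= threshold }
        .sortedByDescending { it.confidence }

fun main() {
    // Made-up scores, for illustration only.
    val raw = listOf(
        Label("Cat", 0.82f),
        Label("Dog", 0.45f),
        Label("Sofa", 0.12f)
    )
    println(filterByThreshold(raw, 0.4f).map { it.text }) // [Cat, Dog]
    println(filterByThreshold(raw, 0.9f).map { it.text }) // []
}
```

With a threshold of 0.9 nothing survives the filter, which is why the calling code has to handle an empty result.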
To ensure stability and save RAM, the labeler object is initialized through the Kotlin delegate mechanism (by lazy):
private val labeler by lazy {
    val options = ImageLabelerOptions.Builder()
        .setConfidenceThreshold(0.4f)
        .build()
    ImageLabeling.getClient(options)
}
Why by lazy here?
- Saving resources: The heavy ML Kit client is created not when the Activity is launched, but only on the first request (when the user takes a photo).
- Initialization safety: The client is created only after the application and its applicationContext have been fully set up by the operating system, which avoids initialization-order errors such as a NullPointerException.
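The deferred creation described above can be observed in a small plain-Kotlin sketch. HeavyClient and Classifier are hypothetical names standing in for the ML Kit client and the class that owns it; the counter exists only to make the moment of creation visible.

```kotlin
// Counts how many HeavyClient instances were created (for demonstration only).
var createdCount = 0

// Hypothetical heavy object standing in for the ML Kit client.
class HeavyClient {
    init { createdCount++ }
    fun label(input: String): String = "label for $input"
}

class Classifier {
    // Not constructed until the first access to `client`.
    private val client by lazy { HeavyClient() }

    fun classify(input: String): String = client.label(input)
}

fun main() {
    val classifier = Classifier()
    println(createdCount)                   // 0: nothing created yet
    println(classifier.classify("photo1"))  // label for photo1
    println(classifier.classify("photo2"))  // label for photo2
    println(createdCount)                   // 1: created once, then reused
}
```

By default, by lazy is also synchronized, so concurrent first accesses still create a single instance.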
The call to labeler.process(image) is asynchronous in nature (runs on Google's Task API). To make it linear and MVVM-friendly, we wrap it with coroutines and wait for the execution result.
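One common way to bridge a Task into a coroutine is suspendCancellableCoroutine. The sketch below is an assumption about how such a wrapper could look, not the code from the repository: the function name analyze, its signature and the "Not recognized" fallback are chosen here for illustration, and the rotation is assumed to be 0 degrees.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeler
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlinx.coroutines.suspendCancellableCoroutine

// Suspends until the ML Kit Task completes and returns the most confident label.
suspend fun analyze(labeler: ImageLabeler, bitmap: Bitmap): Pair<String, Float> =
    suspendCancellableCoroutine { continuation ->
        // Assumption: the bitmap is already upright, so rotation is 0.
        val image = InputImage.fromBitmap(bitmap, 0)
        labeler.process(image)
            .addOnSuccessListener { labels ->
                val best = labels.maxByOrNull { it.confidence }
                val result = if (best != null) {
                    best.text to best.confidence
                } else {
                    "Not recognized" to 0f
                }
                continuation.resume(result)
            }
            .addOnFailureListener { error ->
                // Surfaces the failure as an exception at the call site.
                continuation.resumeWithException(error)
            }
    }
```

The caller can then write val (label, confidence) = analyze(labeler, bitmap) inside any coroutine, with failures arriving as ordinary exceptions.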
Architectural layer and flow control
In MainViewModel, all calls are wrapped in viewModelScope.launch. Depending on the position of the state switch (On-Device / On-Server), the required method is launched:
private fun classifyImage(bitmap: Bitmap?) {
    if (bitmap == null) return
    _uiState.update {
        it.copy(isLoading = true)
    }
    viewModelScope.launch {
        val startTime = System.currentTimeMillis()
        try {
            // bitmap is smart-cast to non-null after the check above
            val (label, confidence) = if (_uiState.value.isOnDevice) {
                mlKit.analyze(bitmap)
            } else {
                apiModel.classifyImage(bitmap)
            }
            val duration = System.currentTimeMillis() - startTime
            _uiState.update {
                it.copy(
                    classificationText = label,
                    confidenceValue = confidence,
                    timeTakenDuration = duration,
                    isLoading = false
                )
            }
        } catch (e: Exception) {
            _uiState.update {
                it.copy(
                    classificationText = "Error",
                    confidenceValue = 0f,
                    timeTakenDuration = 0L,
                    // reset the flag so the loading indicator does not stay on
                    isLoading = false
                )
            }
        }
    }
}
Working with the camera on Native Android
On the native Android side, working with the camera is concise thanks to the CameraX library. It is lifecycle-aware: it is bound to a LifecycleOwner, knows when an Activity is stopped or destroyed, and automatically releases camera resources and shuts down the bound use cases (ImageAnalysis / ImageCapture). We do not need to write the cleanup logic by hand, and the result of a successful capture can be a ready-made Bitmap object held in RAM, which eliminates unnecessary disk reads and writes.
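As an illustration of the in-memory capture mentioned above, here is a hedged sketch. It assumes an ImageCapture use case that has already been bound to the lifecycle, and CameraX 1.3 or newer for ImageProxy.toBitmap(); the function name captureToBitmap and the onResult callback are hypothetical and not taken from the repository.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import androidx.camera.core.ImageCapture
import androidx.camera.core.ImageCaptureException
import androidx.camera.core.ImageProxy
import androidx.core.content.ContextCompat

// Takes a picture and hands it over as an in-memory Bitmap (null on failure).
fun captureToBitmap(
    context: Context,
    imageCapture: ImageCapture,
    onResult: (Bitmap?) -> Unit
) {
    imageCapture.takePicture(
        ContextCompat.getMainExecutor(context),
        object : ImageCapture.OnImageCapturedCallback() {
            override fun onCaptureSuccess(image: ImageProxy) {
                // The bitmap is not rotated here; the required rotation is
                // available as image.imageInfo.rotationDegrees.
                val bitmap = image.toBitmap()
                image.close() // release the buffer back to CameraX
                onResult(bitmap)
            }

            override fun onError(exception: ImageCaptureException) {
                onResult(null)
            }
        }
    )
}
```

No file is written in this path, so the file-timing problem described later for Flutter does not arise here.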
Part 3. Cross-platform implementation: Flutter (Dart, Dio, Method Channels)
The Flutter application conceptually solves the same problems, but faces the specifics of Dart’s single-threaded architecture (Event Loop).
Network Inference (Dio + Futures)
To communicate with Hugging Face from Flutter, we used the Dio package. Dart code in an isolate runs on a single thread, so to keep the request from blocking the rendering of UI frames, the call is written with async/await and returns a Future. While the bytes travel over the network, the event loop keeps rendering the interface.
import 'dart:typed_data';

import 'package:dio/dio.dart';
import 'package:flutter/foundation.dart';

const String hfToken = 'your_token'; // put your token from Hugging Face here

final dio = Dio();

Future<List<dynamic>?> apiModel(String path) async {
  final Uint8List? imageBytes = await compressImage(path);
  if (imageBytes == null) {
    return null;
  }
  try {
    final response = await dio.post(
      "https://router.huggingface.co/hf-inference/models/google/vit-base-patch16-224",
      data: imageBytes,
      options: Options(
        headers: {
          "Authorization": "Bearer $hfToken",
          "Content-Type": "image/jpeg",
        },
      ),
    );
    return response.data;
  } on DioException catch (e) {
    debugPrint("Error: $e");
  }
  return null;
}
As in the native version, the image is compressed before it is sent, so that we do not send too many bytes over the network:
import 'dart:typed_data';

import 'package:flutter_image_compress/flutter_image_compress.dart';

Future<Uint8List?> compressImage(String path) async {
  final Uint8List? result = await FlutterImageCompress.compressWithFile(
    path,
    quality: 80,
    format: CompressFormat.jpeg,
  );
  return result;
}
Native bridge: MethodChannel for ML Kit
Since there is no full-fledged direct SDK for ML Kit Image Labeling in Dart that provides the required level of customization, a production-style approach is used: a MethodChannel (native bridge).
The Dart code acts as a client: it invokes the imageLabeling method and passes the path to the saved photo through the channel.
import 'package:flutter/services.dart';

class NativeMlService {
  static const MethodChannel _channel = MethodChannel("mlkit_photo_analyze");

  static Future<Map> onDeviceMethod(String imagePath) async {
    final result = await _channel.invokeMethod(
      'imageLabeling',
      {'imagePath': imagePath},
    );
    return Map.from(result);
  }
}
On the Android side (MainActivity.kt) we catch this call in setMethodCallHandler. The same rules apply here: we launch a coroutine on a background dispatcher and process the image via ML Kit, but we call result.success() only after switching back to the Main Thread, because the result of a platform channel call must be delivered on the platform's main thread.
override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
    super.configureFlutterEngine(flutterEngine)
    MethodChannel(flutterEngine.dartExecutor.binaryMessenger, CHANNEL).setMethodCallHandler { call, result ->
        if (call.method == "imageLabeling") {
            val imagePath = call.argument<String>("imagePath")
            if (imagePath == null) {
                result.error("ArgError", "Image path is null", null)
                return@setMethodCallHandler
            }
            // Run ML inference on background thread to avoid blocking UI
            CoroutineScope(Dispatchers.IO).launch {
                try {
                    // image processing and model calling....
                    // Return result on main thread
                    withContext(Dispatchers.Main) {
                        result.success(response)
                    }
                }....
//rest of the code on GitHub/ GitLab....
Camera in Flutter and Data Race (Race Condition)
The most difficult and interesting stage of developing the Flutter version was integrating the camera plugin and debugging the interaction with the file system. Two important differences from the native implementation showed up here:
Manual lifecycle management: In Flutter, the developer must manually initialize the CameraController, pick a camera from the available ones (here the one with CameraLensDirection.back) and, most importantly, call _controller?.dispose() in the widget's dispose() method.
If you forget, the camera remains locked at the operating system level, and other applications will not be able to open it.
Ghost file problem (race condition):
The _controller?.takePicture() method in Flutter returns an XFile object that points to the snapshot stored in the device cache directory (image.path). This is where a classic race condition comes into play.
When Flutter reports that the photo has been taken and passes the path to the native code via MethodChannel, the native part (Kotlin) immediately tries to execute BitmapFactory.decodeFile(imagePath). But at the level of the Android operating system, the file in the cache may not be ready yet: the stream that writes data from the camera buffer to disk has not yet been closed.
This showed up in the logs as an error:
E/ple.flutter_mvp: FrameInsert open fail: No such file or directory
The native code failed, BitmapFactory.decodeFile returned null, and Flutter received null instead of a data structure.
We get a similar error when we send the picture to the server, because we are effectively sending an empty image.
Solution to the problem:
To eliminate this data race, two-way protection was applied:
On the Dart (Provider) side: Before calling the native method or sending the image to the server, we give the system a moment to finish by adding a short delay:
await Future.delayed(const Duration(milliseconds: 200));
This proved to be enough time for the OS to complete the disk operations. Keep in mind that a fixed delay is a heuristic, not a guarantee: on a slower device the write may take longer.
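On the native side, a possible complement to the fixed delay is to poll the file before decoding it. The sketch below is not the code from the repository: waitForStableFile, its parameters and the default timings are assumptions made for illustration. It treats the file as ready when it exists and its non-zero size is unchanged between two consecutive checks, which is still a heuristic rather than proof that the writer has closed the stream.

```kotlin
import java.io.File

/**
 * Polls until the file exists and its non-zero size is the same on two
 * consecutive checks, or until the timeout expires.
 * Blocking: call it from a background thread, never from the main thread.
 */
fun waitForStableFile(
    path: String,
    timeoutMs: Long = 1_000,
    pollIntervalMs: Long = 50
): Boolean {
    val file = File(path)
    val deadline = System.currentTimeMillis() + timeoutMs
    var lastSize = -1L
    while (System.currentTimeMillis() < deadline) {
        val size = if (file.exists()) file.length() else -1L
        if (size > 0 && size == lastSize) return true
        lastSize = size
        Thread.sleep(pollIntervalMs)
    }
    return false
}

fun main() {
    val photo = File.createTempFile("photo", ".jpg")
    photo.writeBytes(byteArrayOf(1, 2, 3))
    println(waitForStableFile(photo.path))                   // true
    println(waitForStableFile(photo.path + ".missing", 200)) // false
    photo.delete()
}
```

Inside a coroutine, Thread.sleep would normally be replaced with delay from kotlinx.coroutines, and BitmapFactory.decodeFile would be called only when the function returns true.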
Conclusions
The MVP study shows that On-Device and On-Server approaches do not compete but complement each other.
- On-Server is indispensable for heavy computing (LLM, GPT, high-definition video processing).
- On-Device is ideal for utilitarian tasks (scanning documents, recognizing simple objects, working in strict offline conditions).
In modern production applications, the best practice is a hybrid approach: a fast preliminary result is produced locally, and deeper validation of the data is delegated to the backend server.
Regarding the choice of platform: native Android gives fine-grained control over resources, hardware and threads out of the box. Flutter, despite Dart's single-threaded model, lets you build responsive and performant AI applications, provided you use MethodChannel properly, follow the rules for dispatching coroutines in the native layer, and account for file system timing.
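One possible shape of such a hybrid policy is sketched below in plain Kotlin. Everything here is an assumption for illustration: the Prediction class, the classifyHybrid function, the 0.6 cut-off and the sample values are hypothetical, and the two classifiers are passed in as plain lambdas where a real application would use suspend functions.

```kotlin
// Hypothetical result type shared by both classifiers.
data class Prediction(val label: String, val confidence: Float)

/**
 * Runs the local model first and asks the server only when the local
 * result is not confident enough. A null server result (for example,
 * no network) falls back to the local prediction.
 */
fun classifyHybrid(
    onDevice: () -> Prediction,
    onServer: () -> Prediction?,
    minLocalConfidence: Float = 0.6f
): Prediction {
    val local = onDevice()
    if (local.confidence >= minLocalConfidence) return local
    return onServer() ?: local
}

fun main() {
    // Confident local result: the server is never called.
    val fast = classifyHybrid(
        onDevice = { Prediction("Cat", 0.9f) },
        onServer = { error("server should not be called") }
    )
    println(fast.label) // Cat

    // Weak local result: the server answer is used.
    val refined = classifyHybrid(
        onDevice = { Prediction("Cat", 0.3f) },
        onServer = { Prediction("Egyptian cat", 0.95f) }
    )
    println(refined.label) // Egyptian cat

    // Weak local result and no network: keep the local answer.
    val offline = classifyHybrid(
        onDevice = { Prediction("Cat", 0.3f) },
        onServer = { null }
    )
    println(offline.label) // Cat
}
```

The cut-off value decides how often the network is used, so it would have to be tuned against real measurements.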
GitHub: RatRatatyu/mobile-ai-mvp
Two MVP applications demonstrating on-device and on-server AI model integration in Jetpack Compose (Android) and Flutter.
Mobile AI Integration: On-Device vs On-Server MVP Comparison
This repository contains two MVP applications developed for the International Scientific and Practical Conference "Student Research: Challenges and Development Trends".
🏫 Conference Information
- Event: International Scientific and Practical Conference "Student Research: Challenges and Development Trends"
- Organizers: Ministry of Education of the Republic of Kazakhstan, Department of Education of Aktobe Region, Aktobe Higher Humanitarian College, National Centre for Professional Development "Orleu"
- Section: Science, Technology, and Digital Innovations
- Date: May 22, 2026
📱 Project Overview
The project explores the architectural choice between running AI models directly on a smartphone (On-Device) versus processing them on a remote server (On-Server).
For this research, image classification was chosen as the primary use case to demonstrate the differences in performance and accuracy.
🤖 Applied Models
- On-Device: Powered by the ML Kit Image Labeling API from Google for local, real-time inference
- On-Server: Powered by the Hugging Face google/vit-base-patch16-224…
GitLab: RatRatatyu/mobile-ai-mvp (gitlab.com)
I should note that I am still learning in this area, so some of my conclusions may be inaccurate and some descriptions may not be entirely correct. I will be grateful if you point out my mistakes in the comments, and I will be glad if you star the projects on GitHub and GitLab if you find them useful.