This is a submission for the Gemma 4 Challenge: Write About Gemma 4
Problem Statement
Nigeria faces a shortage of health workers, with the ratio of doctors to patients standing at 1:9000. The shortage is felt most in rural and semi-urban areas.
Community Health Extension Workers (CHEWs) help bridge this gap in rural and underserved communities.
These workers undergo 2-3 years of training and can provide care for select conditions, referring patients to secondary and tertiary health facilities if needed.
In 2022, the Nigerian Federal Ministry of Health published the latest Standard Treatment Guidelines for Nigeria (STG) to provide evidence-based clinical protocols for the diagnosis, prevention, and management of common diseases and clinical conditions within the Nigerian context.
AI as a Solution
Health workers have to make decisions relying on the STG. What if we had a way to give health workers detailed treatment information grounded in the STG?
This article draws inspiration from existing work on similar solutions. In all those cases, however, the solutions are web-based and rely on internet access and large LLMs.
We propose using the Gemma 4 2B model to power a retrieval-augmented generation (RAG) solution that delivers treatment guidance to health workers.
Why Gemma 4 2B?
It is a lightweight model that can run on mobile phones. At about 7 GB, it can be loaded on a high-end phone, which means we can deploy a full RAG solution on mobile phones or tablets.
This is significant because health workers can still use the solution in places with no internet access, a common problem in rural and underserved communities.
Proof of Concept
To demonstrate the idea, we will build a simple RAG solution on a Windows system. This solution will use models hosted locally on Windows.
Requirements
- Ollama
- Visual Studio
- .NET AI Framework Libraries.
Setting Up
We use Ollama to download and run models locally. We start by signing up on Ollama and downloading the Ollama Windows application.
After installing the Ollama application, we check if it's properly installed by running ollama in PowerShell:

We also pull some models using Ollama:
> ollama pull embeddinggemma
> ollama pull gemma4:e2b
Here we pull two models: embeddinggemma, which will generate embeddings for our RAG solution, and gemma4:e2b, which will be used as the reasoning LLM:

Developing the RAG Solution
The source code can be found here.
First, we create a Console project in Visual Studio and add some important libraries:
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package OllamaSharp
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.SemanticKernel.Connectors.InMemory --prerelease
We have added the AI Extensions and Vector Store libraries to interact with our models and to store embeddings. We also added OllamaSharp to enable interaction with our Ollama-hosted models, locally or in the cloud.
Dataset
The STG has been extracted and formatted into JSON to help developers. We can download it from https://github.com/chisomrutherford/nigeria-clinical-guidelines-dataset/tree/main.
Implementation
Our implementation steps are straightforward:
- Load dataset
- Use an embedding model to generate embeddings for the dataset
- Save all embeddings in an in-memory vector store
- When a user queries, generate an embedding for the user's query
- Do a vector search against the in-memory vector store
- Use the retrieved data to enhance the prompt
- Send the prompt to the Gemma 4 2B model and show the response
- Save the response and allow the user to keep chatting with the model
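The vector search step in the list above can be illustrated without any library. The following is a minimal sketch, not the code used in this project: GuidelineEntry and RetrievalSketch are hypothetical names introduced for illustration, and the embeddings are plain float arrays. It ranks stored entries by cosine similarity to the query embedding and keeps the best matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored guideline: a name plus its embedding.
public record GuidelineEntry(string ConditionName, float[] Embedding);

public static class RetrievalSketch
{
    // Cosine similarity between two vectors of equal length.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as having no similarity to anything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the `top` entries most similar to the query, best match first.
    public static List<GuidelineEntry> Search(
        IEnumerable<GuidelineEntry> store, float[] queryEmbedding, int top)
    {
        return store
            .OrderByDescending(e => CosineSimilarity(e.Embedding, queryEmbedding))
            .Take(top)
            .ToList();
    }
}
```

Calling RetrievalSketch.Search(store, queryEmbedding, 3) mirrors the "top 3" retrieval used in this article. A brute-force scan like this is reasonable for a dataset of a few hundred guideline entries; larger collections would normally use an indexed vector store.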
First, we declare our model and chat client:
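// Read the model names and the Ollama endpoint from configuration.
string? model = _configuration["Ollama:ModelName"];
string? embedModel = _configuration["Ollama:EmbedModelName"];
string? url = _configuration["Ollama:BaseUrl"];

// Point an HttpClient at the local Ollama server.
var client = new HttpClient();
client.BaseAddress = new Uri(url!);

// Client used for generating embeddings with the embedding model.
var ollamaGen = new OllamaApiClient(client, embedModel!);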
Next, we define our vector store. Before we do, let's define what our data structure will look like:
public class ClinicalGuidelineVector
{
    [VectorStoreKey]
    public int Id { get; set; }

    [VectorStoreData]
    public string ConditionName { get; set; } = "";

    [VectorStoreData]
    public string RawJson { get; set; } = "";

    [VectorStoreVector(Dimensions: 768, DistanceFunction = DistanceFunction.CosineSimilarity)] // depends on embedding model
    public ReadOnlyMemory<float> Embedding { get; set; } = new ReadOnlyMemory<float>();
}
It's important to use the exact dimensions for the embedding model. To check the exact dimension, run this in PowerShell:
ollama show embeddinggemma:latest
We can then see the model information:

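Because a mismatch between the declared dimension and the actual embedding length can be hard to spot, a small guard can fail early. This is a hedged sketch rather than part of the project: EmbeddingGuard and EnsureDimensions are hypothetical names, and 768 is taken from the vector attribute shown above.

```csharp
using System;

public static class EmbeddingGuard
{
    // Assumed to match the Dimensions value declared on the vector property.
    public const int ExpectedDimensions = 768;

    // Throws if the embedding length differs from the expected dimension.
    public static void EnsureDimensions(
        ReadOnlyMemory<float> embedding, int expected = ExpectedDimensions)
    {
        if (embedding.Length != expected)
        {
            throw new InvalidOperationException(
                $"Embedding has {embedding.Length} dimensions, expected {expected}.");
        }
    }
}
```

The guard would be called on each embedding before it is stored, so that a change of embedding model surfaces as a clear error instead of poor search results.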
Next, we create and initialize our vector store using an in-memory vector store:
var vectorStore = new InMemoryVectorStore();
VectorStoreCollection<int, ClinicalGuidelineVector> collection =
    vectorStore.GetCollection<int, ClinicalGuidelineVector>("clinical_guidelines");
await collection.EnsureCollectionExistsAsync();
After this, we load the data and generate our embeddings:
var data = await LoadData();
if (data.Count == 0)
{
    return;
}

int id = 0;
foreach (var item in data)
{
    // Build the text to embed, then request its embedding from Ollama.
    var text = BuildSearchText(item);
    var embeddingResponse = await ollamaGen.EmbedAsync(text);
    var embedding = embeddingResponse.Embeddings[0];

    var vector = new ClinicalGuidelineVector
    {
        Id = id++,
        ConditionName = item.ConditionName!,
        RawJson = JsonSerializer.Serialize(item),
        Embedding = embedding
    };

    await collection.UpsertAsync(vector);
    // Log the ID that was actually stored (id has already been incremented).
    Console.WriteLine("Inserted vector ID {0}", vector.Id);
}
Here, we first build a string of the search text and then generate an embedding for the text. Finally, we save the embedding, a serialized string of that data, and the condition name.
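The BuildSearchText helper is not shown in this article. One possible shape, offered only as a sketch, is to join the most descriptive fields of a record into a single labeled string. GuidelineRecord, Overview, and Symptoms are assumptions made for illustration; only ConditionName appears in the code above, and the real dataset fields may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one guideline entry. Only ConditionName appears in
// the article; Overview and Symptoms are assumed fields for illustration.
public class GuidelineRecord
{
    public string? ConditionName { get; set; }
    public string? Overview { get; set; }
    public List<string> Symptoms { get; set; } = new();
}

public static class SearchTextSketch
{
    // Joins the non-empty fields into one labeled block of text to embed.
    public static string BuildSearchText(GuidelineRecord item)
    {
        var parts = new List<string>();

        if (!string.IsNullOrWhiteSpace(item.ConditionName))
            parts.Add($"Condition: {item.ConditionName}");

        if (!string.IsNullOrWhiteSpace(item.Overview))
            parts.Add($"Overview: {item.Overview}");

        if (item.Symptoms.Count > 0)
            parts.Add($"Symptoms: {string.Join(", ", item.Symptoms)}");

        return string.Join("\n", parts);
    }
}
```

Embedding a short, labeled summary instead of the full JSON keeps the search text focused on the terms a health worker is likely to type, while the full record is still available through RawJson once a match is found.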
When a user asks a question, we first convert the query into an embedding and then search the vector store for the entries most relevant to it:
Console.Write("Your prompt: ");
Console.WriteLine(Environment.NewLine);
var query = Console.ReadLine();

// Embed the query with the same embedding model used for the dataset.
var queryEmbed = await ollamaGen.EmbedAsync(query);
var queryVector = queryEmbed.Embeddings[0];

// Retrieve the three closest guideline entries.
IAsyncEnumerable<VectorSearchResult<ClinicalGuidelineVector>> results =
    collection.SearchAsync(queryVector, top: 3);

List<ClinicalGuidelineVector> clinicalGuidelines = [];
await foreach (var result in results)
{
    clinicalGuidelines.Add(result.Record);
}
Then we flatten the retrieved guidelines and add them to the user's prompt before sending it to Gemma 4 2B:
model = "gemma4:e2b";
IChatClient chatClient = new OllamaApiClient(client, model);

List<ChatMessage> messages = [];
messages.Add(new ChatMessage(ChatRole.System, Constants.SystemPrompt));

// Flatten the retrieved guidelines into a single context string.
var context = string.Join("\n\n",
    clinicalGuidelines.Select(m => m.RawJson));
var prompt = string.Format(Constants.PromptTemplate, context, query);
messages.Add(new ChatMessage(ChatRole.User, prompt));

// Stream the response to the console while accumulating the full text.
var chatResponse = chatClient.GetStreamingResponseAsync(messages);
string fullResponse = "";
await foreach (var response in chatResponse)
{
    fullResponse += response.Text;
    Console.Write(response.Text);
}

// Keep the assistant's reply in the history for follow-up questions.
messages.Add(new ChatMessage(ChatRole.Assistant, fullResponse));
Here, we first set the reasoning model to gemma4:e2b. Next, we set the system prompt. After this, we flatten the retrieved guidelines and use them to augment the user's prompt. The combined prompt is then passed to the gemma4:e2b model.
The AI-generated response is also saved as history to keep the conversation going.
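The contents of Constants.SystemPrompt and Constants.PromptTemplate are not shown in this article. As a rough illustration only, a template compatible with the string.Format call above could look like the sketch below. PromptSketch and the prompt wording are hypothetical; the only detail taken from the code is that placeholder {0} receives the context and {1} receives the query.

```csharp
using System;

public static class PromptSketch
{
    // Hypothetical wording; the project's actual system prompt may differ.
    public const string SystemPrompt =
        "You are an assistant for community health workers. " +
        "Answer only from the guideline context provided. " +
        "If the context does not cover the question, say so.";

    // {0} is the retrieved guideline context, {1} is the user's question.
    public const string PromptTemplate =
        "Context from the Standard Treatment Guidelines:\n{0}\n\nQuestion: {1}";

    // Fills the template. Braces inside the arguments (such as JSON) are safe,
    // because string.Format only interprets braces in the template itself.
    public static string BuildPrompt(string context, string query) =>
        string.Format(PromptTemplate, context, query);
}
```

One design point worth noting: any literal brace written in the template itself would need to be doubled ({{ or }}), otherwise string.Format throws a FormatException.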
Challenges
We have seen how to implement a RAG solution that helps health workers use the STG. However, this comes with one challenge. Although the Gemma 4 2B model is small enough to be specified for mobile use, in practice it only runs on high-end mobile phones. These may not be affordable for the target health workers (CHEWs), many of whom rely on more common, lower-cost phones as their work tools.
Conclusion
Using Gemma 4 2B can be a game-changer in the health sector, as it enhances the capability of health workers by grounding answers in the government-approved STG. The solution can be deployed to assist community health workers in underserved communities, although the cost of high-end mobile phones can be a limiting factor.