Monday, September 30, 2024

Using Semantic Kernel with SLM AI models downloaded to your computer

In this walkthrough, I will demonstrate how you can download the Phi-3 AI SLM (small language model) from Hugging Face and use it in a C# application. 

What is Hugging Face?

Hugging Face provides AI/ML researchers and developers with access to thousands of curated datasets, machine learning models, and AI-powered demo apps. We will download the Phi-3 SLM in ONNX format from Hugging Face onto our computers.

What is ONNX?

ONNX (Open Neural Network Exchange) is an open format built to represent machine learning models. See the ONNX project website for more information.

Getting Started

We will download the Phi-3 Mini SLM for the ONNX runtime from Hugging Face. Run the following command in a terminal window, setting the destination to a location of your choice. In the example below, the destination is a folder named phi-3-mini on a Windows C: drive.

git clone <Hugging Face model repository URL> C:/phi-3-mini

Be patient, as the download could take some time. On my Windows computer, the download is 30.1 GB, comprising 97 files and 48 folders.
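To see how much disk space the clone takes on your own machine, a short C# sketch can walk the destination folder and report the file count, folder count, and total size. MeasureFolder is a hypothetical helper name, and the path assumes the C:\phi-3-mini destination from the example; adjust both as needed. The totals include Git's own .git folder, so they may not match the figures quoted here.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: counts every file and subfolder under a root folder
// (recursively) and sums the file sizes in bytes.
static (int Files, int Folders, long Bytes) MeasureFolder(string root)
{
    var dir = new DirectoryInfo(root);
    var allFiles = dir.GetFiles("*", SearchOption.AllDirectories);
    var allFolders = dir.GetDirectories("*", SearchOption.AllDirectories);
    return (allFiles.Length, allFolders.Length, allFiles.Sum(f => f.Length));
}

// Assumed clone destination; change this to wherever you cloned the model.
var cloneRoot = @"C:\phi-3-mini";

if (Directory.Exists(cloneRoot))
{
    var measured = MeasureFolder(cloneRoot);
    var gigabytes = measured.Bytes / (1024.0 * 1024 * 1024);
    Console.WriteLine($"{measured.Files} files, {measured.Folders} folders, {gigabytes:F1} GB");
}
else
{
    Console.WriteLine($"Folder not found: {cloneRoot}");
}
```

Running this after the clone finishes is a quick way to confirm that the download completed and to decide whether the drive has room for additional models.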

We will be using the files in the cpu_and_mobile folder. Inside that folder, navigate into the cpu-int4-rtn-block-32 folder, where you will find the pair of files that contain the ONNX model.


In a working directory, create a C# console app named LocalAiModelSK by running the following command in a terminal window:

dotnet new console -n LocalAiModelSK 

Change into the newly created directory LocalAiModelSK with:

cd LocalAiModelSK

Next, let's add two packages to our console application with:

dotnet add package Microsoft.SemanticKernel -v 1.16.2

dotnet add package Microsoft.SemanticKernel.Connectors.Onnx -v 1.16.2-alpha

Open the project in VS Code and add this directive to the .csproj file, right below <Nullable>enable</Nullable>:


Replace the contents of Program.cs with the following C# code:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// PHI-3 local model location
var modelPath = @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32";

// Load the model and services
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion("phi-3", modelPath);

// Build Kernel
var kernel = builder.Build();

// Resolve the chat completion service registered above
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

// Start the conversation
while (true) {
    // Get user input
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.Write("User : ");
    var question = Console.ReadLine()!;

    // Request at most 200 tokens per reply
    OpenAIPromptExecutionSettings openAIPromptExecutionSettings = new() {
        MaxTokens = 200
    };

    // Send the question to the model and stream the reply
    var response = kernel.InvokePromptStreamingAsync(
        promptTemplate: @"{{$input}}",
        arguments: new KernelArguments(openAIPromptExecutionSettings) {
            { "input", question }
        });

    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("\nAssistant : ");
    string combinedResponse = string.Empty;
    await foreach (var message in response) {
        // Write the response to the console
        Console.Write(message);
        combinedResponse += message;
    }
    Console.WriteLine();
}
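The ONNX connector is a prerelease package, and calls such as AddOnnxRuntimeGenAIChatCompletion are marked experimental, so the build can stop with an experimental-API diagnostic unless that diagnostic is suppressed. Besides a project-level setting in the .csproj file, one option is a pragma at the top of Program.cs. The sketch below assumes the diagnostic ID is SKEXP0070; treat that as an assumption and use whatever ID your own build output reports.

```csharp
// Assumption: SKEXP0070 is the experimental-API diagnostic raised by the ONNX connector.
// Replace it with the ID from your build output if it differs.
#pragma warning disable SKEXP0070

using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();

// Same example path as above; point this at your own model folder.
builder.AddOnnxRuntimeGenAIChatCompletion(
    "phi-3",
    @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32");
```

A pragma only affects the file it appears in, which keeps the suppression visible next to the experimental call. A project-level setting applies to every file in the project.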

In the above code, make sure that modelPath points to the proper location of the model on your computer.
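A wrong path usually only shows up as a runtime failure when the model is loaded, so it can help to check the folder first. The standalone sketch below lists the files in the model folder whose names contain ".onnx". FindOnnxFiles is a hypothetical helper, and the path is the example path from this walkthrough.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: returns the names of files directly inside the folder
// whose names contain ".onnx", sorted by name. Returns an empty array when
// the folder does not exist.
static string[] FindOnnxFiles(string folder)
{
    var dir = new DirectoryInfo(folder);
    if (!dir.Exists) return Array.Empty<string>();

    return dir.GetFiles()
        .Select(file => file.Name)
        .Where(name => name.Contains(".onnx", StringComparison.OrdinalIgnoreCase))
        .OrderBy(name => name, StringComparer.Ordinal)
        .ToArray();
}

// Example path from this walkthrough; change it to match your machine.
var modelFolder = @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32";
var onnxFiles = FindOnnxFiles(modelFolder);

if (onnxFiles.Length == 0)
    Console.WriteLine($"No ONNX files found in '{modelFolder}'. Check the path.");
else
    Console.WriteLine($"Found: {string.Join(", ", onnxFiles)}");
```

If the list comes back empty, the most likely causes are a typo in the path or a clone that did not finish.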

I asked the question: How long do mosquito live?

This is the response I received:


You can choose from a variety of SLMs at Hugging Face. The trade-off is that the ONNX model files are large, so in some circumstances it is more practical to use a model that is hosted online.

