Thursday, September 19, 2024

Phi-3 Small Language Model (SLM) in a C# console application with Ollama and Semantic Kernel

What is a small language model (SLM)?

A small language model (SLM) is a machine learning model typically derived from a large language model (LLM) but of greatly reduced size. An SLM retains much of the functionality of the LLM from which it is built, but with far less complexity and far lower computing resource demand.

What is Ollama?

Ollama is an application you can download onto your computer or server to run open source generative AI large-language-models (LLMs) such as Meta's Llama 3 and Microsoft's Phi-3. You can see the many models available at https://www.ollama.com/library.

What is Semantic Kernel?

Semantic Kernel is a lightweight, open-source development kit that lets you easily build AI agents and integrate the latest AI models into your C#, Python, or Java codebase.

Overview

In this tutorial, we will see how easy it is to use the Phi-3 small language model in a C# application. The best part is that it is free and runs entirely on your local device. Ollama will serve the Phi-3 small language model, and Semantic Kernel will be the C# development kit we use to build the app.

Getting Started

Download the Ollama installer from https://www.ollama.com/download.

Once you have installed Ollama, run these commands from a terminal window:

ollama pull phi3:latest
ollama list
ollama show phi3:latest
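
Optionally, you can sanity-check the model straight from the terminal before writing any C# code (omit the quoted prompt to get an interactive session instead):

ollama run phi3 "In one sentence, what is a small language model?"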

In a working directory, create a C# console app named SlmSK inside a terminal window with the following command:

dotnet new console -n SlmSK

Change into the newly created directory SlmSK with:

cd SlmSK

Next, let's add the Semantic Kernel package to our console application with:

dotnet add package Microsoft.SemanticKernel -v 1.19.0

Open the project in VS Code and add this directive to the .csproj file, right below <Nullable>enable</Nullable>. It suppresses the warning for the experimental Semantic Kernel API we will use to point the OpenAI connector at a custom endpoint:

<NoWarn>SKEXP0010</NoWarn> 
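
For context, here is roughly what the relevant part of the .csproj should look like afterwards (your TargetFramework may differ):

<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFramework>net8.0</TargetFramework>
  <ImplicitUsings>enable</ImplicitUsings>
  <Nullable>enable</Nullable>
  <!-- Suppress the experimental-API warning triggered by the custom-endpoint overload -->
  <NoWarn>SKEXP0010</NoWarn>
</PropertyGroup>

<ItemGroup>
  <PackageReference Include="Microsoft.SemanticKernel" Version="1.19.0" />
</ItemGroup>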

The Code

Replace contents of Program.cs with the following code:

using Microsoft.SemanticKernel;
using System.Text;
using Microsoft.SemanticKernel.ChatCompletion;

// Initialize the Semantic Kernel builder
var builder = Kernel.CreateBuilder();

// Use Semantic Kernel's OpenAI connector, pointed at the
// local Ollama server, which exposes an OpenAI-compatible API
builder.Services.AddOpenAIChatCompletion(
        modelId: "phi3",
        apiKey: null,
        endpoint: new Uri("http://localhost:11434"));

var kernel = builder.Build();

// get a chat completion service
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>(); 

// Create a new chat by specifying the assistant
ChatHistory chat = new(@"
    You are an AI assistant that helps people find information. 
    The response must be brief and should not exceed one paragraph.
    If you do not know the answer then simply say 'I do not know the answer'."
);
 
// Instantiate a StringBuilder
StringBuilder strBuilder = new();

// User question & answer loop
while (true)
{
    Console.Write("Q: ");

    // Get the user's question
    chat.AddUserMessage(Console.ReadLine()!);

    // Clear contents of the StringBuilder
    strBuilder.Clear();

    // Get the AI response streamed back to the console
    await foreach (var message in chatCompletionService.GetStreamingChatMessageContentsAsync(chat, kernel: kernel))
    {
        Console.Write(message);
        strBuilder.Append(message.Content);
    }
    Console.WriteLine();
    chat.AddAssistantMessage(strBuilder.ToString());

    Console.WriteLine();

}
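
As an aside, if you do not need the answer streamed token by token, the same service can return the reply in one piece. A minimal sketch, assuming the same chat and chatCompletionService as above:

// Non-streaming variant: wait for the full answer, then print it
var reply = await chatCompletionService.GetChatMessageContentAsync(chat, kernel: kernel);
Console.WriteLine(reply.Content);
chat.AddAssistantMessage(reply.Content ?? string.Empty);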

Running the app

Run the application with:

dotnet run

I entered this question: How long does a direct flight take from Los Angeles to Frankfurt?

Then, I got the following response:

Use another SLM

How about we try another SLM? Let's use llama3.1 (https://www.ollama.com/library/llama3.1). Pull the image with:

ollama pull llama3.1:latest

In the code, replace modelId: "phi3" with modelId: "llama3.1". Run your application with the same question. This is what I got:


The response is quite similar to that received from Phi-3, although it reports a longer flight time to Frankfurt.
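
Rather than editing Program.cs each time you switch models, you could read the model name from the command line. A small sketch (hypothetical; args is available in top-level statements):

// Pick the model from the command line, e.g. `dotnet run -- llama3.1`
var modelId = args.Length > 0 ? args[0] : "phi3";

builder.Services.AddOpenAIChatCompletion(
        modelId: modelId,
        apiKey: null,
        endpoint: new Uri("http://localhost:11434"));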

Conclusion

We can package our applications with a local SLM. This makes our applications cheaper, faster, connection-free, and self-contained.

