Monday, September 30, 2024

Using Semantic Kernel with SLM AI models downloaded to your computer

In this walkthrough, I will demonstrate how you can download the Phi-3 small language model (SLM) from Hugging Face and use it in a C# application.

What is Hugging Face?

Hugging Face (https://huggingface.co/) provides AI/ML researchers and developers with access to thousands of curated datasets, machine learning models, and AI-powered demo apps. We will download the Phi-3 SLM in ONNX format onto our computers from https://huggingface.co/models.

What is ONNX?

ONNX is an open format built to represent machine learning models. Visit https://onnx.ai/ for more information.

Getting Started

We will download the Phi-3 Mini SLM for the ONNX runtime from Hugging Face. Run the following command in a terminal window, replacing the destination with a location of your choice. In the example below, the destination is a folder named phi-3-mini on the Windows C: drive.

git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx C:/phi-3-mini
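
The model weights in this repository are stored with Git LFS, so make sure Git LFS is installed (run git lfs install once) before cloning; otherwise the clone may contain small pointer files instead of the actual model files.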


   NOTE:

   This example only works on the Windows Operating System.

Be patient, as the download could take some time. On my Windows computer, the download was 30.1 GB, comprising 97 files and 48 folders.

We will be using the files in the cpu_and_mobile folder. Inside that folder, navigate into the cpu-int4-rtn-block-32 folder, where you will find the pair of files that make up the ONNX model:

phi3-mini-4k-instruct-cpu-int4-rtn-block-32.onnx

phi3-mini-4k-instruct-cpu-int4-rtn-block-32.onnx.data

Let's use Semantic Kernel

In a working directory, create a C# console app named LocalAiModelSK inside a terminal window with the following command:

dotnet new console -n LocalAiModelSK 

Change into the newly created directory LocalAiModelSK with:

cd LocalAiModelSK

Next, let's add two packages to our console application with:

dotnet add package Microsoft.SemanticKernel -v 1.16.2

dotnet add package Microsoft.SemanticKernel.Connectors.Onnx -v 1.16.2-alpha

Open the project in VS Code and add this directive to the .csproj file, right below <Nullable>enable</Nullable>:

<NoWarn>SKEXP0070</NoWarn>
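
For reference, the relevant PropertyGroup in the .csproj file would then look something like this (your TargetFramework may differ; the NoWarn entry suppresses the warning emitted because the ONNX connector is still experimental):

<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFramework>net8.0</TargetFramework>
  <ImplicitUsings>enable</ImplicitUsings>
  <Nullable>enable</Nullable>
  <NoWarn>SKEXP0070</NoWarn>
</PropertyGroup>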

Replace the contents of Program.cs with the following C# code:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI; 
 
// PHI-3 local model location 
var modelPath = @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32"; 
 
// Load the model and services
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion("phi-3", modelPath); 
 
// Build Kernel
var kernel = builder.Build(); 
 
// Get the chat completion service registered by the ONNX connector
// (InvokePromptStreamingAsync below resolves this service internally)
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
 
// Start the conversation
while (true) {
    // Get user input
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.Write("User : ");
    var question = Console.ReadLine()!; 
    
    OpenAIPromptExecutionSettings openAIPromptExecutionSettings = new() {
        MaxTokens = 200
    }; 
 
    var response = kernel.InvokePromptStreamingAsync(
        promptTemplate: @"{{$input}}",
        arguments: new KernelArguments(openAIPromptExecutionSettings){
            { "input", question }
        });
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("\nAssistant : ");
    string combinedResponse = string.Empty;
    await foreach (var message in response) {
        // Write the response to the console
        Console.Write(message);
        combinedResponse += message;
    }
    Console.WriteLine();
}

In the above code, make sure that modelPath points to the proper location of the model on your computer.
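
Run the application with:

dotnet run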

I asked the question: How long do mosquitoes live?

This is the response I received:


Conclusion

You can choose from a variety of SLMs at Hugging Face. The penalty, of course, is that the ONNX model sizes are significant, which can make it more desirable in some circumstances to use a model that resides online.


Saturday, September 28, 2024

Using Semantic Kernel with AI models hosted on GitHub

Overview

In this article I will show you how you can experiment with AI models hosted on GitHub. GitHub AI Models are intended for learning, experimentation and proof-of-concept activities. The feature is subject to various limits (including requests per minute, requests per day, tokens per request, and concurrent requests) and is not designed for production use cases.

Getting Started

There are many AI models from a variety of vendors that you can choose from. The starting point is to visit https://github.com/marketplace/models. At the time of writing, these are a subset of the models available:


For this article, I will use the "Phi-3.5-mini instruct (128k)" model highlighted above. If you click on that model you will be taken to the model's landing page:


Click on the green "Get started" button.


The first thing we need to do is get a 'personal access token' by clicking on the indicated button above.


Choose 'Generate new token', which happens to be in beta at the time of writing.


Give your token a name, set the expiration, and optionally describe the purpose of the token. Thereafter, click on the green 'Generate token' button at the bottom of the page.


Copy the newly generated token and store it in a safe place, because you cannot view this token again once you leave the above page.

Let's use Semantic Kernel

In a working directory, create a C# console app named GitHubAiModelSK inside a terminal window with the following command:

dotnet new console -n GitHubAiModelSK

Change into the newly created directory GitHubAiModelSK with:

cd GitHubAiModelSK

Next, let's add two packages to our console application with:

dotnet add package Microsoft.SemanticKernel -v 1.25.0

dotnet add package Microsoft.Extensions.Configuration.Json

Open the project in VS Code and add this directive to the .csproj file, right below <Nullable>enable</Nullable>:

<NoWarn>SKEXP0010</NoWarn>

Create a file named appsettings.json. Add this to appsettings.json:

{
    "AI": {
      "Endpoint": "https://models.inference.ai.azure.com",
      "Model": "Phi-3.5-mini-instruct",
      "PAT": "fake-token"
    }
}

Replace "fake-token" with the personal access token that you got from GitHub.

Next, open Program.cs in an editor and delete all contents of the file. Add this code to Program.cs:

using Microsoft.SemanticKernel;
using System.Text;
using Microsoft.SemanticKernel.ChatCompletion;
using OpenAI;
using System.ClientModel;
using Microsoft.Extensions.Configuration;

var config = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true)
    .Build();

var modelId = config["AI:Model"]!;
var uri = config["AI:Endpoint"]!;
var githubPAT = config["AI:PAT"]!;

var client = new OpenAIClient(new ApiKeyCredential(githubPAT), new OpenAIClientOptions { Endpoint = new Uri(uri) });

// Initialize the Semantic kernel
var builder = Kernel.CreateBuilder();

builder.AddOpenAIChatCompletion(modelId, client);
var kernel = builder.Build();

// get a chat completion service
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

// Create a new chat by specifying the assistant
ChatHistory chat = new(@"
    You are an AI assistant that helps people find information. 
    The response must be brief and should not exceed one paragraph.
    If you do not know the answer then simply say 'I do not know the answer'."
);

// Instantiate a StringBuilder
StringBuilder strBuilder = new();

// User question & answer loop
while (true)
{
    // Get the user's question
    Console.Write("Q: ");
    chat.AddUserMessage(Console.ReadLine()!);

    // Clear contents of the StringBuilder
    strBuilder.Clear();

    // Get the AI response streamed back to the console
    await foreach (var message in chatCompletionService.GetStreamingChatMessageContentsAsync(chat, kernel: kernel))
    {
        Console.Write(message);
        strBuilder.Append(message.Content);
    }
    Console.WriteLine();
    chat.AddAssistantMessage(strBuilder.ToString());

    Console.WriteLine();
}

Run the application:
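
dotnet run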


I asked the question "How many pyramids are there in Egypt?" and the AI answered as shown above. 

Using a different model

How about we use a different AI model? For example, I will try the 'Meta-Llama-3.1-405B-Instruct' model. We need to get the model ID, so click on the model on the https://github.com/marketplace/models page.

Change Model in appsettings.json to "Meta-Llama-3.1-405B-Instruct".
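
The updated appsettings.json would then look like this (with your personal access token in place of fake-token):

{
    "AI": {
      "Endpoint": "https://models.inference.ai.azure.com",
      "Model": "Meta-Llama-3.1-405B-Instruct",
      "PAT": "fake-token"
    }
}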

Run the application again. This is what I experienced with the AI model meta-llama-3.1-405b-instruct:


Conclusion

GitHub AI models are easy to access. I hope you come up with great AI-driven applications that make a difference to our world.


Thursday, September 19, 2024

Phi-3 Small Language Model (SLM) in a C# console application with Ollama and Semantic Kernel

What is a small language model (SLM)?

A small language model (SLM) is a machine learning model typically based on a large language model (LLM) but of greatly reduced size. An SLM retains much of the functionality of the LLM from which it is built, but with far less complexity and computing resource demand.

What is Ollama?

Ollama is an application you can download onto your computer or server to run open-source generative AI small language models (SLMs) such as Meta's Llama 3 and Microsoft's Phi-3. You can see the many models available at https://www.ollama.com/library.

What is Semantic Kernel?

Semantic Kernel is a lightweight, open-source development kit that lets you easily build AI agents and integrate the latest AI models into your C#, Python, or Java codebase.

Overview

In this tutorial, we will see how easy it is to use the Phi-3 small language model in a C# application. The best part is that it is free and runs entirely on your local device. Ollama will be used to serve the Phi-3 small language model and Semantic Kernel will be the C# development kit we will use for code development. 

Companion Video: https://youtu.be/TMmOgGjmxA8

Getting Started

Download Ollama installer from https://www.ollama.com/download.

Once you have installed Ollama, run these commands from a terminal window to download the Phi-3 model, list the models installed locally, and display details about the Phi-3 model:

ollama pull phi3:latest
ollama list
ollama show phi3:latest

In a working directory, create a C# console app named SlmSK inside a terminal window with the following command:

dotnet new console -n SlmSK

Change into the newly created directory SlmSK with:

cd SlmSK

Next, let's add the Semantic Kernel package to our console application with:

dotnet add package Microsoft.SemanticKernel -v 1.19.0

Open the project in VS Code and add this directive to the .csproj file, right below <Nullable>enable</Nullable> (it suppresses the warning for the experimental custom-endpoint overload used below):

<NoWarn>SKEXP0010</NoWarn> 

The Code

Replace the contents of Program.cs with the following code:

using Microsoft.SemanticKernel;
using System.Text;
using Microsoft.SemanticKernel.ChatCompletion;

// Initialize the Semantic kernel
var builder = Kernel.CreateBuilder();

// Use the Semantic Kernel OpenAI connector, pointing it at Ollama's OpenAI-compatible endpoint
builder.Services.AddOpenAIChatCompletion(
        modelId: "phi3",
        apiKey: null,
        endpoint: new Uri("http://localhost:11434"));

var kernel = builder.Build();

// get a chat completion service
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>(); 

// Create a new chat by specifying the assistant
ChatHistory chat = new(@"
    You are an AI assistant that helps people find information. 
    The response must be brief and should not exceed one paragraph.
    If you do not know the answer then simply say 'I do not know the answer'."
);
 
// Instantiate a StringBuilder
StringBuilder strBuilder = new();

// User question & answer loop
while (true)
{
    // Get the user's question
    Console.Write("Q: ");
    chat.AddUserMessage(Console.ReadLine()!);

    // Clear contents of the StringBuilder
    strBuilder.Clear();

    // Get the AI response streamed back to the console
    await foreach (var message in chatCompletionService.GetStreamingChatMessageContentsAsync(chat, kernel: kernel))
    {
        Console.Write(message);
        strBuilder.Append(message.Content);
    }
    Console.WriteLine();
    chat.AddAssistantMessage(strBuilder.ToString());

    Console.WriteLine();
}

Running the app

Run the application with:

dotnet run
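
Note that Ollama must be running for the endpoint at http://localhost:11434 to respond. The Ollama desktop application starts a local server automatically; alternatively, you can start one manually with:

ollama serve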

I entered this question: How long does a direct flight take from Los Angeles to Frankfurt?

Then, I got the following response:

Use another SLM

How about we use another SLM? Let's try llama3.1 (https://www.ollama.com/library/llama3.1). Pull the model with:

ollama pull llama3.1:latest

In the code, replace modelId: "phi3" with modelId: "llama3.1". The registration then becomes:
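
builder.Services.AddOpenAIChatCompletion(
        modelId: "llama3.1",
        apiKey: null,
        endpoint: new Uri("http://localhost:11434"));

Run your application with the same question. This is what I got: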


The response is quite similar to that received from Phi-3, even though it shows that the flight to Frankfurt takes longer.

Conclusion

We can package our applications with a local SLM. This makes our applications cheaper, faster, connection-free, and self-contained.