In this walkthrough, I will demonstrate how you can download the Phi-3 AI SLM (small language model) from Hugging Face and use it in a C# application.
What is Hugging Face?
Hugging Face (https://huggingface.co/) provides AI/ML researchers and developers with access to thousands of curated datasets, machine learning models, and AI-powered demo apps. We will download the Phi-3 SLM in ONNX format onto our computer from https://huggingface.co/models.
What is ONNX?
ONNX is an open format built to represent machine learning models. Visit https://onnx.ai/ for more information.
Getting Started
We will download the Phi-3 Mini SLM for the ONNX runtime from Hugging Face. Run the following command in a terminal window, replacing the destination with a location of your choice. In the example below, the destination is a folder named phi-3-mini on the Windows C: drive.
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx C:/phi-3-mini
NOTE:
This example only works on the Windows Operating System.
Be patient as the download could take some time. On my Windows computer the size of the download is 30.1 GB comprising 97 files and 48 folders.
phi3-mini-4k-instruct-cpu-int4-rtn-block-32.onnx
phi3-mini-4k-instruct-cpu-int4-rtn-block-32.onnx.data
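Before moving on, it can help to confirm that the two model files listed above actually landed where the application will look for them. The following is a minimal sketch rather than part of the original walkthrough. It assumes the files sit in the cpu_and_mobile\cpu-int4-rtn-block-32 sub-folder of the clone destination, and MissingModelFiles is a hypothetical helper name. Run it as its own small console app:

```csharp
using System;
using System.IO;
using System.Linq;

// Assumption: the clone destination used in this walkthrough, plus the CPU sub-folder.
var modelPath = @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32";

// Hypothetical helper: returns the expected file names that are NOT present in the folder.
static string[] MissingModelFiles(string folder, params string[] expected) =>
    expected.Where(name => !File.Exists(Path.Combine(folder, name))).ToArray();

var missing = MissingModelFiles(
    modelPath,
    "phi3-mini-4k-instruct-cpu-int4-rtn-block-32.onnx",
    "phi3-mini-4k-instruct-cpu-int4-rtn-block-32.onnx.data");

// Report either success or the names that could not be found.
Console.WriteLine(missing.Length == 0
    ? "Model files found."
    : "Missing: " + string.Join(", ", missing));
```

If anything is reported as missing, the clone may not have finished, or the destination folder differs from the one assumed here.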
Getting Started
In a working directory, create a C# console app named LocalAiModelSK inside a terminal window with the following command:
dotnet new console -n LocalAiModelSK
Change into the newly created directory LocalAiModelSK with:
cd LocalAiModelSK
Next, let's add two packages to our console application with:
dotnet add package Microsoft.SemanticKernel -v 1.16.2
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx -v 1.16.2-alpha
Open the project in VS Code and add this property to the .csproj file, right below the <Nullable>enable</Nullable> line:
<NoWarn>SKEXP0070</NoWarn>
Replace the contents of Program.cs with the following C# code:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// PHI-3 local model location
var modelPath = @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32";

// Load the model and services
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion("phi-3", modelPath);

// Build Kernel
var kernel = builder.Build();

// Create services such as chatCompletionService and embeddingGeneration
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

// Start the conversation
while (true) {
    // Get user input
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.Write("User : ");
    var question = Console.ReadLine()!;

    OpenAIPromptExecutionSettings openAIPromptExecutionSettings = new() {
        MaxTokens = 200
    };

    var response = kernel.InvokePromptStreamingAsync(
        promptTemplate: @"{{$input}}",
        arguments: new KernelArguments(openAIPromptExecutionSettings) {
            { "input", question }
        });

    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("\nAssistant : ");
    string combinedResponse = string.Empty;
    await foreach (var message in response) {
        // Write the response to the console
        Console.Write(message);
        combinedResponse += message;
    }
    Console.WriteLine();
}
In the above code, make sure that modelPath points to the proper location of the model on your computer.
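One way to avoid editing the source whenever the model folder moves is to resolve modelPath at startup. This is only a sketch: the environment variable name PHI3_MODEL_PATH and the helper ResolveModelPath are hypothetical and are not part of Semantic Kernel or the original program.

```csharp
using System;
using System.IO;

// Hypothetical helper: prefer the first command-line argument, then an
// environment variable value, then the default used in this walkthrough.
static string ResolveModelPath(string[] cliArgs, string? envValue, string fallback)
{
    if (cliArgs.Length > 0 && !string.IsNullOrWhiteSpace(cliArgs[0])) return cliArgs[0];
    if (!string.IsNullOrWhiteSpace(envValue)) return envValue;
    return fallback;
}

var modelPath = ResolveModelPath(
    args,
    Environment.GetEnvironmentVariable("PHI3_MODEL_PATH"), // assumed variable name
    @"C:\phi-3-mini\cpu_and_mobile\cpu-int4-rtn-block-32");

// Fail loudly before the model is loaded if the folder does not exist.
if (Directory.Exists(modelPath))
{
    Console.WriteLine($"Using model folder: {modelPath}");
}
else
{
    Console.Error.WriteLine($"Model folder not found: {modelPath}");
}
```

The resolved modelPath can then be passed to builder.AddOnnxRuntimeGenAIChatCompletion in place of the hard-coded string.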
I asked the question: How long do mosquito live?
This is the response I received:
Conclusion
You can choose from a variety of SLMs on Hugging Face. The trade-off is that the ONNX model files are large, so in some circumstances it is more desirable to use a model that is hosted online.