Medhat Elmasry: Base64 images with Azure OpenAI Dall-E 3, Semantic Kernel, and C#

We will generate Base64 images using the OpenAI Dall-E 3 service and Semantic Kernel. The Base64 representation of the image will be saved in a text file. Thereafter, we will read the text file from an index.html page using JavaScript and subsequently render the image on a web page.

Source Code: https://github.com/medhatelmasry/DalleImageBase64/

What is Semantic Kernel?

This is the official definition from the Microsoft Learn article "Create AI agents with Semantic Kernel":

Semantic Kernel is an open-source SDK that lets you easily build agents that can call your existing code. As a highly extensible SDK, you can use Semantic Kernel with models from OpenAI, Azure OpenAI, Hugging Face, and more!

Getting Started

In a suitable directory, create a console application named DalleImageBase64 and add to it three packages needed for our application with the following terminal window commands:

dotnet new console -o DalleImageBase64
cd DalleImageBase64
dotnet add package Microsoft.SemanticKernel
dotnet add package System.Configuration.ConfigurationManager
dotnet add package SkiaSharp

Create a file named App.config in the root folder of the console application and add to it the important parameters that allow access to the Azure OpenAI service. Contents of App.config are like the following:

<?xml version="1.0"?>
<configuration>
  <appSettings>
    <add key="endpoint" value="https://fake.openai.azure.com/" />
    <add key="azure-api-key" value="fake-azure-openai-key" />
    <add key="openai-api-key" value="fake-openai-key" />
    <add key="openai-org-id" value="fake-openai-org-id" />
    <add key="gpt-deployment" value="gpt-4o-mini" />
    <add key="dalle-deployment" value="dall-e-3" />
    <add key="openai_or_azure" value="openai" />
  </appSettings>
</configuration>

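NOTE: Since I cannot share my endpoint and API keys with you, the values above are fake. Replace them with your own, and set openai_or_azure to either openai or azure to choose which service the application uses.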

Let's Code

Open Program.cs and delete all its contents. Add the following using statements at the top:

using System.Configuration;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.TextToImage;

We need to read the App.config file settings into our application. We will use the ConfigurationManager from namespace System.Configuration. To read settings from App.config with ConfigurationManager, append the following code to Program.cs:

// Get configuration settings from App.config
string _endpoint = ConfigurationManager.AppSettings["endpoint"]!;
string _azureApiKey = ConfigurationManager.AppSettings["azure-api-key"]!;
string _openaiApiKey = ConfigurationManager.AppSettings["openai-api-key"]!;
string _dalleDeployment = ConfigurationManager.AppSettings["dalle-deployment"]!;
string _gptDeployment = ConfigurationManager.AppSettings["gpt-deployment"]!;
string _openai_or_azure = ConfigurationManager.AppSettings["openai_or_azure"]!;
string _openaiOrgId = ConfigurationManager.AppSettings["openai-org-id"]!;
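
The null-forgiving operator (!) used above only silences the compiler. A missing key still yields null at run time, and the failure then shows up later with a less obvious error. A minimal sketch of a stricter lookup follows. The RequireSetting helper is hypothetical (it is not part of the original project), and it accepts any NameValueCollection, which is the type that ConfigurationManager.AppSettings returns.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical helper: return the value for key, or fail fast with a clear message.
static string RequireSetting(NameValueCollection settings, string key)
{
    string? value = settings[key];
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException($"Missing setting '{key}' in App.config");
    return value;
}

// Stand-in for ConfigurationManager.AppSettings so the sketch runs on its own.
var demoSettings = new NameValueCollection { ["dalle-deployment"] = "dall-e-3" };
Console.WriteLine(RequireSetting(demoSettings, "dalle-deployment")); // prints "dall-e-3"
```

In Program.cs the call would look like RequireSetting(ConfigurationManager.AppSettings, "endpoint").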

Some of the Semantic Kernel APIs used here are marked experimental, so we currently need to suppress the corresponding compiler diagnostics by adding the following to the .csproj file inside the <PropertyGroup> block:

<NoWarn>SKEXP0001, SKEXP0010</NoWarn>

Then, append this code to Program.cs:

// Create a kernel builder
var builder = Kernel.CreateBuilder();

// Add OpenAI services to the kernel
if (_openai_or_azure == "azure") {
    // use azure openai services
    // (recent Semantic Kernel releases may require the separate
    // Microsoft.SemanticKernel.Connectors.AzureOpenAI package for these two calls)
    builder.AddAzureOpenAIChatCompletion(_gptDeployment, _endpoint, _azureApiKey);
    builder.AddAzureOpenAITextToImage(_dalleDeployment, _endpoint, _azureApiKey);
} else {
    // use openai services
    builder.AddOpenAIChatCompletion(_gptDeployment, _openaiApiKey, _openaiOrgId);
    builder.AddOpenAITextToImage(_openaiApiKey, _openaiOrgId);
}

// Build the kernel
var kernel = builder.Build();

We created a builder object from Semantic Kernel and added the chat completion and text-to-image services: AddAzureOpenAIChatCompletion and AddAzureOpenAITextToImage for Azure OpenAI, or AddOpenAIChatCompletion and AddOpenAITextToImage for OpenAI. We then obtained an instance of the kernel object.

Get an instance of the "Dall-E" service from the kernel with the following code:

// Get AI service instance used to generate images
var dallE = kernel.GetRequiredService<ITextToImageService>();

Let us create a prompt that generates an image representing a phrase entered by the user. Append this code to Program.cs:

// create the prompt template; {{$input}} is replaced with the user's phrase
var prompt = @"
Think about an image that represents {{$input}}.";

We then configure the prompt execution settings with:

var executionSettings = new OpenAIPromptExecutionSettings {
    MaxTokens = 256,
    Temperature = 1
};

Temperature controls how random (creative) the AI's output is. For OpenAI chat models it ranges from 0 to 2: values near 0 give more focused, repeatable output, and higher values give more varied output.

We will create a semantic function from our prompt with:

// create a semantic function from the prompt
var genImgFunction = kernel.CreateFunctionFromPrompt(prompt, executionSettings);

Let us ask the user for input with this code:

// Get a phrase from the user
Console.WriteLine("Enter a phrase to generate an image from: ");
string? phrase = Console.ReadLine();
if (string.IsNullOrEmpty(phrase)) {
    Console.WriteLine("No phrase entered.");
    return;
}

Next, we will ask the kernel to fill the prompt with the input received from the user and send it to the chat model, which returns a description of the image.

// Invoke the semantic function to generate an image description
var imageDescResult = await kernel.InvokeAsync(genImgFunction, new() { ["input"] = phrase });
var imageDesc = imageDescResult.ToString();
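
To make the substitution step concrete, here is an illustration of what the chat model ends up receiving. It is only a sketch: RenderPrompt is a hypothetical stand-in that uses a plain string replace, whereas Semantic Kernel's own template engine does more than that.

```csharp
using System;

// Illustration only: mimic how the {{$input}} placeholder is filled in.
// Semantic Kernel's real template engine does more than a string replace.
static string RenderPrompt(string template, string input) =>
    template.Replace("{{$input}}", input);

string template = "Think about an image that represents {{$input}}.";
string rendered = RenderPrompt(template, "a camel roaming the streets of New York");
Console.WriteLine(rendered);
// prints: Think about an image that represents a camel roaming the streets of New York.
```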

Finally, we ask the Dall-E service to do the important work of generating an image based on the description. It returns an image URL. This is done with the following code:

// Use DALL-E 3 to generate an image.
// In this case, OpenAI returns a URL (though you can ask to return a base64 image)
var imageUrl = await dallE.GenerateImageAsync(imageDesc.Trim(), 1024, 1024);

Let’s print the output URL so that the user can pop it into a browser to see what it looks like:

// Display the image URL
Console.WriteLine($"Image URL:\n\n{imageUrl}");

We will next use the SkiaSharp package (installed earlier) to save the image to the computer's file system. Create a helper class named SkiaUtils with the following code:

// If this class lives in its own file (for example SkiaUtils.cs), keep the using
// directive here; otherwise move it to the top of Program.cs.
using SkiaSharp;

public static class SkiaUtils {

    // Download the image at url, draw it on a width x height canvas, and save it as a PNG file
    public static async Task<string> SaveImageToFile(string url, int width, int height, string filename = "image.png") {

        SKImageInfo info = new SKImageInfo(width, height);
        using SKSurface surface = SKSurface.Create(info);
        SKCanvas canvas = surface.Canvas;
        canvas.Clear(SKColors.White);
        using var httpClient = new HttpClient();
        using (Stream stream = await httpClient.GetStreamAsync(url))
        using (MemoryStream memStream = new MemoryStream()) {
            await stream.CopyToAsync(memStream);
            memStream.Seek(0, SeekOrigin.Begin);
            using SKBitmap webBitmap = SKBitmap.Decode(memStream);
            canvas.DrawBitmap(webBitmap, 0, 0, null);
        }
        // Dispose the file stream so that the PNG data is flushed to disk
        using (var fileStream = new FileStream(filename, FileMode.Create)) {
            surface.Snapshot().Encode(SKEncodedImageFormat.Png, 100).SaveTo(fileStream);
        }
        return filename;
    }

    // Download the image at url, draw it on a width x height canvas, and return it as Base64-encoded PNG data
    public static async Task<string> GetImageToBase64String(string url, int width, int height) {

        SKImageInfo info = new SKImageInfo(width, height);
        using SKSurface surface = SKSurface.Create(info);
        SKCanvas canvas = surface.Canvas;
        canvas.Clear(SKColors.White);
        using var httpClient = new HttpClient();
        using (Stream stream = await httpClient.GetStreamAsync(url))
        using (MemoryStream memStream = new MemoryStream()) {
            await stream.CopyToAsync(memStream);
            memStream.Seek(0, SeekOrigin.Begin);
            using SKBitmap webBitmap = SKBitmap.Decode(memStream);
            canvas.DrawBitmap(webBitmap, 0, 0, null);
        }
        using (MemoryStream memStream = new MemoryStream()) {
            surface.Snapshot().Encode(SKEncodedImageFormat.Png, 100).SaveTo(memStream);
            byte[] imageBytes = memStream.ToArray();
            return Convert.ToBase64String(imageBytes);
        }
    }
}

The above SkiaUtils class contains two static methods: SaveImageToFile() and GetImageToBase64String(). The method names are self-explanatory. Let us use these methods in our application. Add the following code to the bottom of Program.cs:

// generate a random number from 0 to 199 to be used for the filename
var random = new Random().Next(0, 200);

// use SkiaUtils class to save the image as a .png file
string filename = await SkiaUtils.SaveImageToFile(imageUrl, 1024, 1024, $"{random}-image.png");

// use SkiaUtils class to get base64 string representation of the image
var base64Image = await SkiaUtils.GetImageToBase64String(imageUrl, 1024, 1024);

// save base64 string representation of the image to a text file
File.WriteAllText($"{random}-base64image.txt", base64Image);

// Display the image filename
Console.WriteLine($"\nImage saved as {filename}");

// Display the base64 image filename
Console.WriteLine($"\nBase64 image saved as {random}-base64image.txt");
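
Note that the two SkiaUtils calls above each download the image, and because the image is re-encoded through Skia, the saved PNG may not be byte-for-byte the file that Dall-E produced. If the original bytes are all you need, a simpler route is to download once and derive both outputs from the same byte array. The following is a sketch under that assumption. SaveBoth and the demo file names are hypothetical, and the sample bytes stand in for a real download (in Program.cs they could come from something like await new HttpClient().GetByteArrayAsync(imageUrl)).

```csharp
using System;
using System.IO;

// Hypothetical helper: write the raw image bytes and their Base64 text side by side.
static string SaveBoth(byte[] imageBytes, string pngPath, string base64Path)
{
    File.WriteAllBytes(pngPath, imageBytes);
    string base64 = Convert.ToBase64String(imageBytes);
    File.WriteAllText(base64Path, base64);
    return base64;
}

// Stand-in data: the 8-byte signature that starts every PNG file.
byte[] sampleBytes = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
string dir = Path.GetTempPath();
string encoded = SaveBoth(sampleBytes,
    Path.Combine(dir, "demo-image.png"),
    Path.Combine(dir, "demo-base64image.txt"));
Console.WriteLine(encoded); // prints "iVBORw0KGgo="
```

This is also why every Base64-encoded PNG begins with the characters iVBORw0KGgo, which is a quick way to check that the text file really contains PNG data.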

Running App

Let’s try it out. Run the app in a terminal window with:

dotnet run

The user is prompted with “Enter a phrase to generate an image from:”. I entered “a camel roaming the streets of New York”. This is the output I received:

I copied and pasted the URL into my browser. This is what the image looked like:

Two files were created in the root folder of the console application - namely: 94-image.png and 94-base64image.txt. Note that your filenames could be different because the numbers in the name are randomly generated.

You can double-click on the .png image file to view it in the default image app on your computer.

Viewing Base64 representation of image in a web page

In the root folder of your console application, create a file named index.html and add to it the following HTML/JavaScript code:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Read Base64 image</title>
</head>

<body>
    <input type="file" id="fileInput" />
    <img src="" id="img"/>
    <script>
        document.getElementById('fileInput')
            .addEventListener('change', (event) => {
                const file = event.target.files[0];
                const reader = new FileReader();

                reader.onload = function () {
                    const content = reader.result;
                    console.log(content);
                    document.getElementById('img')
                        .src = 'data:image/png;base64,' + content;
                };

                reader.onerror = function () {
                    console.error('Error reading the file');
                };

                reader.readAsText(file, 'utf-8');
            });
    </script>
</body>

</html>

The JavaScript in the above index.html file reads the text file and sets its Base64 content to the src attribute of an image tag.
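
The same data URI can also be produced on the C# side, so that the page needs no file picker at all. The sketch below builds a small HTML document with the Base64 text embedded directly in an img tag. BuildPreviewHtml and preview.html are hypothetical names, and the short Base64 string is a placeholder for the contents of the generated text file.

```csharp
using System;
using System.IO;

// Hypothetical helper: wrap Base64 PNG data in a data URI inside a minimal HTML page.
static string BuildPreviewHtml(string base64Png)
{
    // Trim guards against a trailing newline if the text file was edited by hand.
    string dataUri = "data:image/png;base64," + base64Png.Trim();
    return "<!DOCTYPE html>\n<html lang=\"en\">\n<body>\n" +
           $"<img src=\"{dataUri}\" alt=\"Generated image\" />\n" +
           "</body>\n</html>\n";
}

// Placeholder input; in the app this would be the base64Image string.
string html = BuildPreviewHtml("iVBORw0KGgo=\n");
File.WriteAllText(Path.Combine(Path.GetTempPath(), "preview.html"), html);
Console.WriteLine(html);
```

The trade-off is file size: the HTML file carries the whole image inline, which is convenient for a single preview but grows with every embedded image.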

View Base64 representation of the image

Double-click the index.html file in your file system to open it in your browser.

Click the file input on the page, navigate to the text file that contains the Base64 representation of the image, and select it. The same image that you saw earlier is loaded into the web page.

Conclusion

You can take the image URL returned by the Dall-E 3 API and either save the image to your computer or generate a Base64 representation of it.

Medhat Elmasry

Wednesday, February 7, 2024
