Saturday, January 13, 2024

Generate Images with Azure OpenAI Dall-E 3, Semantic Kernel, and C#

It is easy to generate images with the OpenAI Dall-E 3 service and Semantic Kernel. You provide text describing what you want, and the service generates the image for you. In this tutorial, we will use Semantic Kernel and Azure OpenAI to do exactly that.

Source Code:

Companion Video:

What is Semantic Kernel?

This is the official definition obtained from Create AI agents with Semantic Kernel | Microsoft Learn:

Semantic Kernel is an open-source SDK that lets you easily build agents that can call your existing code. As a highly extensible SDK, you can use Semantic Kernel with models from OpenAI, Azure OpenAI, Hugging Face, and more! 

Getting Started

In a suitable directory, create a console application named DalleImage and add the two packages our application needs by running the following commands in a terminal window:

dotnet new console -o DalleImage
cd DalleImage
dotnet add package Microsoft.SemanticKernel
dotnet add package System.Configuration.ConfigurationManager

Create a file named App.config in the root folder of the console application and add to it the settings that allow access to the Azure OpenAI service. The contents of App.config look like this:

<?xml version="1.0"?>
<configuration>
    <appSettings>
        <add key="endpoint" value="" />
        <add key="api-key" value="fakekey-fakekey-fakekey-fakekey" />
        <add key="gpt-deployment" value="gpt-35-turbo" />
        <add key="dalle-deployment" value="dall-e-3" />
    </appSettings>
</configuration>

NOTE: Since I cannot share my endpoint and API key with you, the values shown for these settings are placeholders. Replace them with your own.

Currently, the Dall-E 3 model is in preview and only available in the "Sweden Central" Azure data centre according to

Let's Code

Open Program.cs and delete all its contents. Add the following using statements at the top:

using System.Configuration;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.TextToImage;

We need to read the App.config settings into our application. To do that with the ConfigurationManager class from the System.Configuration namespace, append the following code to Program.cs:

// Get configuration settings from App.config
string _endpoint = ConfigurationManager.AppSettings["endpoint"]!;
string _apiKey = ConfigurationManager.AppSettings["api-key"]!;
string _dalleDeployment = ConfigurationManager.AppSettings["dalle-deployment"]!;
string _gptDeployment = ConfigurationManager.AppSettings["gpt-deployment"]!;
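The null-forgiving `!` operator in the lines above silences the compiler, but a missing or misspelled key in App.config would still produce a null at run time and fail later with a less helpful error. As a minimal sketch, a small helper could fail fast instead. The name `RequireSetting` is hypothetical and is not part of the original program or of any library:

```csharp
using System;
using System.Configuration;

// Hypothetical helper (an assumption, not part of the tutorial's code):
// read a required setting from App.config and throw a clear error
// if the key is missing or its value is blank.
static string RequireSetting(string key)
{
    string? value = ConfigurationManager.AppSettings[key];
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Setting '{key}' is missing or empty in App.config <appSettings>.");
    }
    return value;
}
```

With this in place, a line such as `string _endpoint = RequireSetting("endpoint");` could replace the corresponding `ConfigurationManager.AppSettings[...]!` lookup.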

Currently, some of the Semantic Kernel APIs we use are marked as experimental and raise compiler warnings. We need to suppress these warnings by adding the following to the .csproj file inside the <PropertyGroup> block:

<NoWarn>SKEXP0001, SKEXP0002, SKEXP0011, SKEXP0012</NoWarn>
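For orientation, here is roughly where that line sits in DalleImage.csproj. This is a sketch of a typical console project file. The other properties, and the `net8.0` target framework in particular, are assumptions and may differ from what `dotnet new console` generated on your machine:

```xml
<PropertyGroup>
  <!-- These four properties are assumed defaults from the console template -->
  <OutputType>Exe</OutputType>
  <TargetFramework>net8.0</TargetFramework>
  <ImplicitUsings>enable</ImplicitUsings>
  <Nullable>enable</Nullable>
  <!-- The line added in this tutorial -->
  <NoWarn>SKEXP0001, SKEXP0002, SKEXP0011, SKEXP0012</NoWarn>
</PropertyGroup>
```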

Then, append this code to Program.cs:

// Create a kernel builder
var builder = Kernel.CreateBuilder(); 
// Add OpenAI services to the kernel
builder.AddAzureOpenAITextToImage(_dalleDeployment, _endpoint, _apiKey);
builder.AddAzureOpenAIChatCompletion(_gptDeployment, _endpoint, _apiKey); 
// Build the kernel
var kernel = builder.Build();

We created a builder object from Semantic Kernel, registered the Azure OpenAI text-to-image and chat completion services with AddAzureOpenAITextToImage and AddAzureOpenAIChatCompletion, then built an instance of the kernel object.

Get an instance of the "Dall-E" service from the kernel with the following code:

// Get AI service instance used to generate images
var dallE = kernel.GetRequiredService<ITextToImageService>();

Let us create a prompt that generates an image representing a phrase entered by the user. Append this code to Program.cs:

// Create the prompt template; {{$input}} is replaced with the user's phrase
var prompt = @"
Think about an artificial object that represents {{$input}}.";
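To see what the `{{$input}}` placeholder turns into, the template can be rendered locally without calling any model. The following is a minimal sketch that assumes a Semantic Kernel 1.x package in which `KernelPromptTemplateFactory`, `PromptTemplateConfig`, and `RenderAsync` are available. The sample phrase "a lobster" is made up for illustration:

```csharp
using System;
using Microsoft.SemanticKernel;

// The same template text used in the tutorial.
var prompt = @"
Think about an artificial object that represents {{$input}}.";

// Build a template object from the text (assumes Semantic Kernel 1.x APIs).
var template = new KernelPromptTemplateFactory().Create(new PromptTemplateConfig(prompt));

// Render locally. No AI service is called, so an empty kernel is enough.
string rendered = await template.RenderAsync(
    new Kernel(),
    new KernelArguments { ["input"] = "a lobster" });

// Print the prompt text that would be sent to the chat model.
Console.WriteLine(rendered);
```

This is only a way to inspect the prompt. The tutorial itself lets the kernel do the rendering when the function is invoked.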

We then configure the prompt execution settings with:

var executionSettings = new OpenAIPromptExecutionSettings {
    MaxTokens = 256,
    Temperature = 1
};

Temperature controls how random, and therefore how creative, the AI's output is. For OpenAI chat models it ranges from 0 to 2: values near 0 give the most predictable output, and higher values give more varied output.
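To build some intuition for this setting, here is a toy illustration of how temperature is commonly applied when a model picks its next word: candidate scores are divided by the temperature before being turned into probabilities. This is a sketch of the general technique, not Azure OpenAI's actual implementation. The function name and the scores are made up:

```csharp
using System;
using System.Linq;

// Toy illustration (an assumption about the general technique, not the
// service's real code): scale scores by 1/temperature, then normalize
// them into probabilities that sum to 1.
static double[] SoftmaxWithTemperature(double[] scores, double temperature)
{
    if (temperature <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(temperature), "Must be greater than 0.");
    }
    double max = scores.Max(); // subtract the max for numerical stability
    double[] weights = scores.Select(s => Math.Exp((s - max) / temperature)).ToArray();
    double sum = weights.Sum();
    return weights.Select(w => w / sum).ToArray();
}

// Made-up scores for three candidate words.
double[] scores = { 2.0, 1.0, 0.5 };
foreach (double t in new[] { 0.2, 1.0 })
{
    double[] p = SoftmaxWithTemperature(scores, t);
    Console.WriteLine($"T={t}: " + string.Join(", ", p.Select(x => x.ToString("F2"))));
}
```

At the lower temperature, nearly all of the probability goes to the highest-scoring candidate. At the higher temperature, the other candidates get a real chance of being picked, which is why the output feels more varied.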

We will create a semantic function from our prompt with:

// create a semantic function from the prompt
var genImgFunction = kernel.CreateFunctionFromPrompt(prompt, executionSettings);

Let us ask the user for input with:

// Get a phrase from the user
Console.WriteLine("Enter a phrase to generate an image from: ");
string? phrase = Console.ReadLine();
if (string.IsNullOrEmpty(phrase)) {
    Console.WriteLine("No phrase entered.");
    return; // nothing to generate, so stop here
}

Next, ask the kernel to combine the prompt with the input received from the user and send it to the chat model, producing an image description.

// Invoke the semantic function to generate an image description
var imageDescResult = await kernel.InvokeAsync(genImgFunction, new() { ["input"] = phrase });
var imageDesc = imageDescResult.ToString();

Finally, ask the Dall-E service to do the important work of generating an image based on the description. It returns an image URL. This is done with the following code:

// Use DALL-E 3 to generate an image. 
// In this case, OpenAI returns a URL (though you can ask to return a base64 image)
var imageUrl = await dallE.GenerateImageAsync(imageDesc.Trim(), 1024, 1024);

Let’s print the output URL so that the user can pop it into a browser to see what it looks like:

Console.WriteLine($"Image URL:\n\n{imageUrl}");
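If you would rather keep a copy of the image than open the link by hand, the URL can be downloaded with the standard HttpClient class. This is a minimal sketch. The helper name `SaveImageAsync` and the file name used below are hypothetical, and error handling is left out:

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (not part of the tutorial's code): download the
// image behind the given URL and write its bytes to a local file.
static async Task SaveImageAsync(string imageUrl, string filePath)
{
    using var http = new HttpClient();
    byte[] bytes = await http.GetByteArrayAsync(imageUrl);
    await File.WriteAllBytesAsync(filePath, bytes);
}
```

It could be called right after the URL is printed, for example with `await SaveImageAsync(imageUrl, "generated.png");`. Saving a local copy is worthwhile if the link turns out to be temporary, which is an assumption to check against the service documentation.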

Running the App

Let’s try it out. Run the app in a terminal window with:

dotnet run

The user is prompted with “Enter a phrase to generate an image from:”. I entered “a lobster flying over the pyramids in giza”, and received this output:

I find it pretty fascinating how OpenAI can generate images based on text-based descriptions. I hope you do too.
