Using Azure OpenAI, we will explore the audio-centric Whisper neural net from OpenAI. You can find more details about Whisper at https://github.com/openai/whisper. The examples in this article assume that you have a developer account with Azure. These are the features we will explore:
- Transcribing audio into text
- Converting text into audio
- Translating audio from another spoken language into English text
Source Code: https://github.com/medhatelmasry/WhisperWebAzureOpenAI
Prerequisites:
- You need a subscription with Azure.
- The example uses Razor Pages in ASP.NET Core 9.0
- We will use the standard VS Code editor
- You have installed the “C# Dev Kit” extension in VS Code
Getting Started
We will start by:
- creating an ASP.NET Razor Pages web app
- adding Azure packages to the project
Execute these commands in a terminal window:
dotnet new razor -o WhisperWebAzureOpenAI
cd WhisperWebAzureOpenAI
dotnet add package Azure.AI.OpenAI -v 2.2.0-beta.2
dotnet add package Microsoft.Extensions.Azure
Start VS Code in the current project folder with:
code .
Add the following to appsettings.Development.json:
"AzOpenAI": {
  "Key": "YOUR-AZURE-OPENAI-KEY-HERE",
  "Url": "YOUR-AZURE-OPENAI-ENDPOINT-HERE",
  "Audio2Text": {
    "Model": "whisper",
    "Folder": "audio2text"
  },
  "Text2Audio": {
    "Model": "tts",
    "Folder": "text2audio"
  },
  "Translation": {
    "Model": "whisper",
    "Folder": "translation"
  }
}
NOTE: Replace the value of the Key and Url settings above with your Azure OpenAI key and endpoint.
The whisper model is used for audio-to-text transcription and for translating audio into English. The tts model is used for converting text into audio.
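Throughout the pages below, these nested JSON settings are read with colon-delimited keys such as AzOpenAI:Audio2Text:Model. As a minimal sketch of how the nesting flattens, here is a plain dictionary standing in for IConfiguration (the real app reads appsettings.Development.json instead):

```csharp
using System;
using System.Collections.Generic;

// Sketch only: a dictionary standing in for IConfiguration, showing how
// the nested "AzOpenAI" JSON section flattens into colon-delimited keys.
var config = new Dictionary<string, string>
{
    ["AzOpenAI:Audio2Text:Model"] = "whisper",
    ["AzOpenAI:Audio2Text:Folder"] = "audio2text",
    ["AzOpenAI:Text2Audio:Model"] = "tts",
    ["AzOpenAI:Text2Audio:Folder"] = "text2audio",
    ["AzOpenAI:Translation:Model"] = "whisper",
    ["AzOpenAI:Translation:Folder"] = "translation"
};

// Mirrors _configuration["AzOpenAI:Audio2Text:Model"] in the page models.
Console.WriteLine(config["AzOpenAI:Audio2Text:Model"]); // whisper
Console.WriteLine(config["AzOpenAI:Text2Audio:Model"]); // tts
```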
Add this service to Program.cs:
// add these using directives at the top of Program.cs
using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.Azure;

builder.Services.AddAzureClients(clientBuilder =>
{
    // read the key and endpoint from configuration
    string? key = builder.Configuration["AzOpenAI:Key"];
    string? url = builder.Configuration["AzOpenAI:Url"];
    var credentials = new AzureKeyCredential(key!);

    // Register a custom client factory
    clientBuilder.AddClient<AzureOpenAIClient, AzureOpenAIClientOptions>(
        (options, _, _) => new AzureOpenAIClient(new Uri(url!), credentials, options));
});
Download a zip file from https://medhat.ca/images/audio.zip. Extract the file in the wwwroot folder. This creates the following directory structure under wwwroot:
aboutSpeechSdk.wav
audio_houseplant_care.mp3
speechService.wav
TalkForAFewSeconds16.wav
wikipediaOcelot.wav
Also, note the presence of these audio files in the /wwwroot/audio/translation folder:
audio_arabic.mp3
audio_french.wav
audio_spanish.mp3
Add Razor pages
In VS Code, view your project in the “Solution Explorer” tab:
Right-click on the Pages folder and add a razor page named Audio2Text:
Similarly, add these two razor pages:
- Text2Audio
- Translation
Replace the contents of the respective files as follows:
Audio2Text Razor Page
Audio2Text.cshtml.cs
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;

namespace WhisperWebAzureOpenAI.Pages;

public class Audio2TextModel : PageModel
{
    private readonly ILogger<Audio2TextModel> _logger;
    private readonly AzureOpenAIClient _azureOpenAIClient;
    private readonly IConfiguration _configuration;

    public List<SelectListItem>? AudioFiles { get; set; }

    public Audio2TextModel(ILogger<Audio2TextModel> logger,
        AzureOpenAIClient client,
        IConfiguration configuration)
    {
        _logger = logger;
        _azureOpenAIClient = client;
        _configuration = configuration;

        // create wwwroot/audio folder if it doesn't exist
        string? folder = _configuration["AzOpenAI:Audio2Text:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet()
    {
        AudioFiles = GetAudioFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? audioFile)
    {
        if (string.IsNullOrEmpty(audioFile))
        {
            return Page();
        }

        string? deploymentName = _configuration["AzOpenAI:Audio2Text:Model"];
        var audioClient = _azureOpenAIClient.GetAudioClient(deploymentName);
        var result = await audioClient.TranscribeAudioAsync(audioFile);
        if (result is null)
        {
            return Page();
        }

        string? folder = _configuration["AzOpenAI:Audio2Text:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        ViewData["AudioFile"] = audioFile.StartsWith(path)
            ? audioFile.Substring(path.Length + 1) : audioFile;
        ViewData["Transcription"] = result.Value.Text;
        AudioFiles = GetAudioFiles();
        return Page();
    }

    public List<SelectListItem> GetAudioFiles()
    {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["AzOpenAI:Audio2Text:Folder"];
        string? path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");

        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();
        foreach (var item in list)
        {
            items.Add(new SelectListItem
            {
                Value = item.ToString(),
                Text = item.StartsWith(path) ? item.Substring(path.Length + 1) : item
            });
        }
        return items;
    }
}
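The interesting detail in GetAudioFiles() is that the drop-down stores the full file path as the value but shows only the bare file name as the text, by trimming the wwwroot/audio/... prefix. A minimal standalone sketch of that listing-and-trimming logic, using a temporary folder in place of the real audio folder:

```csharp
using System;
using System.IO;
using System.Linq;

// Sketch: the file-listing and prefix-trimming logic from GetAudioFiles(),
// run against a temporary folder standing in for wwwroot/audio/audio2text.
string path = Path.Combine(Path.GetTempPath(), "audio2text-demo");
Directory.CreateDirectory(path);
File.WriteAllBytes(Path.Combine(path, "sample.wav"), new byte[] { 0 });
File.WriteAllBytes(Path.Combine(path, "sample.mp3"), new byte[] { 0 });

string[] wavFiles = Directory.GetFiles(path, "*.wav");
string[] mp3Files = Directory.GetFiles(path, "*.mp3");
string[] list = wavFiles.Concat(mp3Files).ToArray();

// The page shows only the bare file name; +1 skips the path separator.
var names = list
    .Select(f => f.StartsWith(path) ? f.Substring(path.Length + 1) : f)
    .ToList();

foreach (var name in names)
    Console.WriteLine(name); // sample.wav then sample.mp3
```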
Audio2Text.cshtml
@page
@model Audio2TextModel
@{
    ViewData["Title"] = "Audio to Text Transcription";
}

<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="audioFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>

@if (ViewData["AudioFile"] != null)
{
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}
@if (ViewData["Transcription"] != null)
{
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}
Text2Audio Razor Page
Text2Audio.cshtml.cs
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using OpenAI;
using OpenAI.Audio;

namespace WhisperWebAzureOpenAI.Pages;

public class Text2AudioModel : PageModel
{
    private readonly ILogger<Text2AudioModel> _logger;
    private readonly AzureOpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    const string DefaultText = @"Security officials confiscating bottles of water, tubes of
shower gel and pots of face creams are a common sight at airport security.
But officials enforcing the no-liquids rule at South Korea's Incheon International Airport
have been busy seizing another outlawed item: kimchi, a concoction of salted and fermented
vegetables that is a staple of every Korean dinner table.";

    public Text2AudioModel(ILogger<Text2AudioModel> logger,
        AzureOpenAIClient client,
        IConfiguration configuration)
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // create wwwroot/audio folder if it doesn't exist
        string? folder = _configuration["AzOpenAI:Text2Audio:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet()
    {
        ViewData["sampleText"] = DefaultText;
    }

    public async Task<IActionResult> OnPostAsync(string inputText)
    {
        string? modelName = _configuration["AzOpenAI:Text2Audio:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);
        BinaryData speech = await audioClient.GenerateSpeechAsync(inputText, GeneratedSpeechVoice.Alloy);

        // Generate a consistent file name based on the hash of the input text
        using var sha256 = System.Security.Cryptography.SHA256.Create();
        byte[] hashBytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(inputText));
        string hashString = BitConverter.ToString(hashBytes).Replace("-", "").ToLower();
        string fileName = $"{hashString}.mp3";

        string? folder = _configuration["AzOpenAI:Text2Audio:Folder"];
        string filePath = Path.Combine("wwwroot", "audio", folder!, fileName);

        // Check if the file already exists
        if (!System.IO.File.Exists(filePath))
        {
            using FileStream stream = System.IO.File.OpenWrite(filePath);
            speech.ToStream().CopyTo(stream);
        }

        ViewData["sampleText"] = inputText;
        ViewData["AudioFilePath"] = $"/audio/{folder}/{fileName}";
        return Page();
    }
}
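Hashing the input text in OnPostAsync gives each distinct text a stable, deterministic file name, so submitting the same text twice reuses the cached .mp3 instead of calling the service again. That naming scheme can be exercised on its own; this is just the hashing portion lifted out of the handler, not the full page:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Same text always yields the same name; different text yields a different one.
Console.WriteLine(GetAudioFileName("hello") == GetAudioFileName("hello")); // True
Console.WriteLine(GetAudioFileName("hello") == GetAudioFileName("world")); // False

// Sketch: the deterministic file-naming logic from OnPostAsync above.
// The name is the lowercase SHA-256 hex of the input text plus ".mp3".
string GetAudioFileName(string inputText)
{
    using var sha256 = SHA256.Create();
    byte[] hashBytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(inputText));
    string hashString = BitConverter.ToString(hashBytes).Replace("-", "").ToLower();
    return $"{hashString}.mp3";
}
```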
Text2Audio.cshtml
@page
@model Text2AudioModel
@{
    ViewData["Title"] = "Text to Audio";
}

<h1>@ViewData["Title"]</h1>
<div class="text-center">
    <form method="post">
        <label for="inputText">Enter text to convert to audio:</label><br />
        <textarea name="inputText" id="inputText" cols="80" rows="5" required>@if (ViewData["sampleText"] != null) {@ViewData["sampleText"]}</textarea><br />
        <input type="submit" value="Submit" />
    </form>
    <p></p>
    @if (ViewData["AudioFilePath"] != null)
    {
        <audio controls>
            <source src="@ViewData["AudioFilePath"]" type="audio/mpeg">
            Your browser does not support the audio element.
        </audio>
    }
</div>
Translation Razor Page
Translation.cshtml.cs
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;
using OpenAI;

namespace WhisperWebAzureOpenAI.Pages;

public class TranslationModel : PageModel
{
    private readonly ILogger<TranslationModel> _logger;
    private readonly AzureOpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    public List<SelectListItem>? AudioFiles { get; set; }

    public TranslationModel(ILogger<TranslationModel> logger,
        AzureOpenAIClient client,
        IConfiguration configuration)
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // create wwwroot/audio folder if it doesn't exist
        string? folder = _configuration["AzOpenAI:Translation:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet()
    {
        AudioFiles = GetAudioFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? audioFile)
    {
        if (string.IsNullOrEmpty(audioFile))
        {
            return Page();
        }

        string? modelName = _configuration["AzOpenAI:Translation:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);
        var result = await audioClient.TranslateAudioAsync(audioFile);
        if (result is null)
        {
            return Page();
        }

        string? folder = _configuration["AzOpenAI:Translation:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        ViewData["AudioFile"] = audioFile.StartsWith(path)
            ? audioFile.Substring(path.Length + 1) : audioFile;
        ViewData["Transcription"] = result.Value.Text;
        AudioFiles = GetAudioFiles();
        return Page();
    }

    public List<SelectListItem> GetAudioFiles()
    {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["AzOpenAI:Translation:Folder"];
        string? path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");

        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();
        foreach (var item in list)
        {
            items.Add(new SelectListItem
            {
                Value = item.ToString(),
                Text = item.StartsWith(path) ? item.Substring(path.Length + 1) : item
            });
        }
        return items;
    }
}
Translation.cshtml
@page
@model TranslationModel
@{
    ViewData["Title"] = "Audio Translation";
}

<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="audioFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>

@if (ViewData["AudioFile"] != null)
{
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}
@if (ViewData["Transcription"] != null)
{
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}
Adding pages to menu system
Let us see our new pages in action. First, we need to add links to the three Razor pages in the menu system. Open Pages/Shared/_Layout.cshtml in the editor and add these menu items inside the <ul>…</ul> block:
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Audio2Text">Audio to Text</a>
</li>
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Text2Audio">Text to Audio</a>
</li>
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Translation">Translation</a>
</li>
Let’s try it out!
Start the application by executing the following command in the terminal window:
dotnet watch
Audio to Text Page
Text to Audio Page
Translation Page
Bonus - Streaming audio
Going back to the Text2Audio page, bear in mind that the audio is saved to the server's file system and then linked to the <audio> element. We can instead stream the audio to the browser without saving a file on the server. Let us see how that works. In Text2Audio.cshtml.cs, add the following method:
public async Task<IActionResult> OnGetSpeakAsync(string text)
{
    string? modelName = _configuration["AzOpenAI:Text2Audio:Model"];
    var audioClient = _openAIClient.GetAudioClient(modelName);
    BinaryData speech = await audioClient.GenerateSpeechAsync(text, GeneratedSpeechVoice.Alloy);

    MemoryStream memoryStream = new MemoryStream();
    speech.ToStream().CopyTo(memoryStream);
    memoryStream.Position = 0; // Reset the position to the beginning of the stream

    // the generated speech is MP3 audio, so return the matching MIME type
    return File(memoryStream, "audio/mpeg");
}
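Note the Position reset in the handler above: CopyTo leaves the MemoryStream positioned at its end, so returning it without a reset would send zero bytes to the browser. A small standalone sketch of that detail, using placeholder text bytes instead of real audio:

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: why OnGetSpeakAsync resets memoryStream.Position before returning.
// "fake audio bytes" stands in for the BinaryData returned by the service.
var source = new MemoryStream(Encoding.UTF8.GetBytes("fake audio bytes"));
var memoryStream = new MemoryStream();
source.CopyTo(memoryStream);

// After CopyTo, the position sits at the end of the copied data.
Console.WriteLine(memoryStream.Position); // 16

memoryStream.Position = 0; // reset, as in the handler
Console.WriteLine(new StreamReader(memoryStream).ReadToEnd()); // fake audio bytes
```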
Add this code to Text2Audio.cshtml just before the closing </div> tag:
<button id="speakBtn" class="btn btn-warning">Speak</button>
<audio id="audioPlayer"></audio>

<script>
    document.getElementById('speakBtn').addEventListener('click', function () {
        var text = encodeURIComponent(document.getElementById('inputText').value);
        fetch('/Text2Audio?handler=Speak&text=' + text)
            .then(response => response.blob())
            .then(blob => {
                var url = URL.createObjectURL(blob);
                var audioPlayer = document.getElementById('audioPlayer');
                audioPlayer.src = url;
                audioPlayer.play();
            });
    });
</script>
Run the application and view the Text2Audio page; you will notice a new "Speak" button: