Medhat Elmasry: Using OpenAI Whisper in an ASP.NET Razor Pages app

In this article, we will explore Whisper, the audio-centric neural net from OpenAI, together with OpenAI's text-to-speech model. You can find more details about Whisper at https://github.com/openai/whisper. The examples in this article assume that you have a developer account with OpenAI. These are the features we will explore:

Transcribing audio into text
Converting text into audio
Translating audio from another spoken language into English text

Source Code: https://github.com/medhatelmasry/WhisperWebOpenAI

Prerequisites:

You need a developer subscription with OpenAI.
The example uses Razor Pages in ASP.NET Core on .NET 9.0.
The editor used is standard VS Code.
You have installed the “C# Dev Kit” extension in VS Code.

Getting Started

We will start by:

creating an ASP.NET Razor Pages web app
adding the OpenAI package to the project

Execute these commands in a terminal window:

dotnet new razor -o WhisperWebOpenAI
cd WhisperWebOpenAI
dotnet add package OpenAI

Start VS Code in the current project folder with:

code .

Add the following section to appsettings.Development.json, inside the top-level braces and separated from the existing settings by a comma:

"OpenAI": {
  "Key": "YOUR-OpenAI-KEY",
  "Audio2Text": {
    "Model": "whisper-1",
    "Folder": "audio2text"
  },
  "Text2Audio": {
    "Model": "tts-1",
    "Folder": "text2audio"
  },
  "Translation": {
    "Model": "whisper-1",
    "Folder": "translation"
  }
}

NOTE: Replace the value of the Key setting above with your OpenAI key.

Model whisper-1 is used for audio to text and audio translations. Model tts-1 is used for converting text into audio.
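The page models later in this article read these settings with colon-delimited keys such as OpenAI:Audio2Text:Model. As a rough illustration of how such a key maps onto the nested JSON, here is a small sketch that walks a JSON document segment by segment. This is not how ASP.NET Core's configuration system is implemented; the SettingsLookup class is a hypothetical helper for illustration only, and unlike real configuration keys its lookup is case-sensitive.

```csharp
using System;
using System.Text.Json;

public static class SettingsLookup
{
    // Walks a JSON document using a colon-delimited key such as
    // "OpenAI:Audio2Text:Model". Returns null when any segment is missing.
    // Illustration only: real ASP.NET Core configuration keys are case-insensitive.
    public static string? Get(string json, string key)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement current = doc.RootElement;

        foreach (string segment in key.Split(':'))
        {
            // Each segment must name a property of the current JSON object
            if (current.ValueKind != JsonValueKind.Object ||
                !current.TryGetProperty(segment, out JsonElement next))
            {
                return null;
            }
            current = next;
        }

        // Strings are returned unquoted; anything else as its raw JSON text
        return current.ValueKind == JsonValueKind.String
            ? current.GetString()
            : current.GetRawText();
    }
}
```

For example, calling SettingsLookup.Get with the JSON above (wrapped in an outer pair of braces) and the key "OpenAI:Text2Audio:Model" would return tts-1.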

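Add this service to Program.cs, before the call to builder.Build(). You also need a using OpenAI; directive at the top of the file:

// Add OpenAI service
builder.Services.AddSingleton<OpenAIClient>(sp =>
{
    // Fail early with a clear message if the key is missing
    string apiKey = builder.Configuration["OpenAI:Key"]
        ?? throw new InvalidOperationException("The OpenAI:Key setting is missing.");
    return new OpenAIClient(apiKey);
});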
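If you would rather not keep the key in a settings file, one option is to fall back to an environment variable. The sketch below is only an illustration: ApiKeyResolver is a hypothetical helper, and the OPENAI_API_KEY variable name is an assumption rather than something the sample project defines.

```csharp
using System;

public static class ApiKeyResolver
{
    // Placeholder value used in the sample appsettings fragment
    private const string Placeholder = "YOUR-OpenAI-KEY";

    // Returns the configured key when it looks real, otherwise the value of
    // the environment variable (name is an assumption), otherwise throws.
    public static string Resolve(string? configuredKey, string envVarName = "OPENAI_API_KEY")
    {
        if (!string.IsNullOrWhiteSpace(configuredKey) && configuredKey != Placeholder)
        {
            return configuredKey;
        }

        string? fromEnvironment = Environment.GetEnvironmentVariable(envVarName);
        if (!string.IsNullOrWhiteSpace(fromEnvironment))
        {
            return fromEnvironment;
        }

        throw new InvalidOperationException(
            $"No OpenAI key found in configuration or in the {envVarName} environment variable.");
    }
}
```

Inside the service registration, the key could then be obtained with ApiKeyResolver.Resolve(builder.Configuration["OpenAI:Key"]).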

Download the zip file at https://medhat.ca/images/audio.zip and extract it into the wwwroot folder. This creates an audio folder under wwwroot that holds the sample files used in this article.

Note the presence of these audio files in the /wwwroot/audio/audio2text folder:

aboutSpeechSdk.wav
audio_houseplant_care.mp3
speechService.wav
TalkForAFewSeconds16.wav
wikipediaOcelot.wav

Also, note the presence of these audio files in the /wwwroot/audio/translation folder:

audio_arabic.mp3
audio_french.wav
audio_spanish.mp3

Add razor pages

In VS Code, view your project in the “Solution Explorer” tab.

Right-click on the Pages folder and add a razor page named Audio2Text.

Similarly, add these two razor pages:

Text2Audio
Translation

Replace the contents of the respective files with the following code:

Audio2Text Razor Page

Audio2Text.cshtml.cs

using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;
using OpenAI;

namespace WhisperWebOpenAI.Pages;

public class Audio2TextModel : PageModel {
    private readonly ILogger<Audio2TextModel> _logger;
    private readonly OpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;
    public List<SelectListItem>? AudioFiles { get; set; }

    public Audio2TextModel(ILogger<Audio2TextModel> logger,
        OpenAIClient client,
        IConfiguration configuration
    )
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;
        // create the wwwroot/audio/<folder> directory if it doesn't exist
        string? folder = _configuration["OpenAI:Audio2Text:Folder"];
        string path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path)) {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet() {
        AudioFiles = GetWaveFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? waveFile) {
        // Repopulate the dropdown first so it is filled on every return path
        AudioFiles = GetWaveFiles();
        // Only accept a posted path that matches one of the listed files
        if (string.IsNullOrEmpty(waveFile) || !AudioFiles.Any(f => f.Value == waveFile)) {
            return Page();
        }
        string? modelName = _configuration["OpenAI:Audio2Text:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);
        // Send the audio file to the model and wait for the transcription
        var result = await audioClient.TranscribeAudioAsync(waveFile);
        if (result is null) {
            return Page();
        }
        ViewData["AudioFile"] = Path.GetFileName(waveFile);
        ViewData["Transcription"] = result.Value.Text;
        return Page();
    }

    public List<SelectListItem> GetWaveFiles() {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["OpenAI:Audio2Text:Folder"];
        string path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");
        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();
        foreach (var item in list) {
            items.Add(new SelectListItem
            {
                // Value is the path sent to the API; Text is the bare file name
                Value = item,
                Text = Path.GetFileName(item)
            });
        }
        return items;
    }
}
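The GetWaveFiles method above gathers .wav and .mp3 files with two separate directory scans. As an alternative, here is a minimal sketch that does it in one pass with an extension filter. AudioFileLister is a hypothetical helper that is not part of the sample project, and it returns plain file names rather than SelectListItem objects to stay framework-independent.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class AudioFileLister
{
    // Extensions to include, compared without regard to case
    private static readonly HashSet<string> Extensions =
        new(StringComparer.OrdinalIgnoreCase) { ".wav", ".mp3" };

    // Returns the names (without directory) of the audio files in the folder,
    // sorted alphabetically. A missing folder yields an empty list.
    public static List<string> ListAudioFileNames(string folder)
    {
        if (!Directory.Exists(folder))
        {
            return new List<string>();
        }

        return Directory.EnumerateFiles(folder)
            .Where(file => Extensions.Contains(Path.GetExtension(file)))
            .Select(file => Path.GetFileName(file))
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Sorting the names keeps the dropdown order stable between requests, which the order returned by the file system does not guarantee.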

Audio2Text.cshtml

@page
@model Audio2TextModel

@{ ViewData["Title"] = "Audio to Text Transcription"; }

<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="waveFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>
@if (ViewData["AudioFile"] != null) {
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}
@if (ViewData["Transcription"] != null) {
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}

Text2Audio Razor Page

Text2Audio.cshtml.cs

using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using OpenAI;
using OpenAI.Audio;

namespace WhisperWebOpenAI.Pages;

public class Text2AudioModel : PageModel {
    private readonly ILogger<Text2AudioModel> _logger;
    private readonly OpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;
    const string DefaultText = @"Security officials confiscating bottles of water, tubes of
shower gel and pots of face creams are a common sight at airport security.
But officials enforcing the no-liquids rule at South Korea's Incheon International Airport
have been busy seizing another outlawed item: kimchi, a concoction of salted and fermented
vegetables that is a staple of every Korean dinner table.";

    public Text2AudioModel(ILogger<Text2AudioModel> logger,
        OpenAIClient client,
        IConfiguration configuration
    )
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;
        // create the wwwroot/audio/<folder> directory if it doesn't exist
        string? folder = _configuration["OpenAI:Text2Audio:Folder"];
        string path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path)) {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet() {
        ViewData["sampleText"] = DefaultText;
    }

    public async Task<IActionResult> OnPostAsync(string inputText) {
        ViewData["sampleText"] = inputText;
        if (string.IsNullOrWhiteSpace(inputText)) {
            return Page();
        }
        // Generate a consistent file name based on the hash of the input text
        byte[] hashBytes = System.Security.Cryptography.SHA256.HashData(System.Text.Encoding.UTF8.GetBytes(inputText));
        string hashString = Convert.ToHexString(hashBytes).ToLowerInvariant();
        string fileName = $"{hashString}.mp3";
        string? folder = _configuration["OpenAI:Text2Audio:Folder"];
        string filePath = Path.Combine("wwwroot", "audio", folder!, fileName);
        // Call the API only if this text has not been converted before
        if (!System.IO.File.Exists(filePath)) {
            string? modelName = _configuration["OpenAI:Text2Audio:Model"];
            var audioClient = _openAIClient.GetAudioClient(modelName);
            BinaryData speech = await audioClient.GenerateSpeechAsync(inputText, GeneratedSpeechVoice.Alloy);
            await System.IO.File.WriteAllBytesAsync(filePath, speech.ToArray());
        }
        ViewData["AudioFilePath"] = $"/audio/{folder}/{fileName}";
        return Page();
    }
}
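The page model above names each generated file after a SHA-256 hash of the input text, so the same text always maps to the same file. Here is that idea in isolation as a small sketch. SpeechFileNamer is a hypothetical helper, not part of the sample project, and the .mp3 default extension simply mirrors the code above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SpeechFileNamer
{
    // Builds a deterministic file name from the text: the lowercase hex
    // SHA-256 hash of its UTF-8 bytes, followed by the extension.
    public static string FileNameFor(string text, string extension = ".mp3")
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(text));
        return Convert.ToHexString(hash).ToLowerInvariant() + extension;
    }
}
```

Because the name depends only on the text, submitting identical text twice finds the existing file. Note that the voice and model are not part of the hash, so changing either of them would still reuse a file generated with the old settings.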

Text2Audio.cshtml

@page
@model Text2AudioModel

@{ ViewData["Title"] = "Text to Audio"; }

<h1>@ViewData["Title"]</h1>
<div class="text-center">
    <form method="post">
        <label for="inputText">Enter text to convert to audio:</label>
        <br />
        <textarea name="inputText" id="inputText" cols="80" rows="5" required>@ViewData["sampleText"]</textarea>
        <br /><input type="submit" value="Submit" />
    </form>
    <p></p>
    @if (ViewData["AudioFilePath"] != null) {
        <audio controls>
            <source src="@ViewData["AudioFilePath"]" type="audio/mpeg">
            Your browser does not support the audio element.
        </audio>
    }
</div>

Translation Razor Page

Translation.cshtml.cs

using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;
using OpenAI;

namespace WhisperWebOpenAI.Pages;

public class TranslationModel : PageModel {
    private readonly ILogger<TranslationModel> _logger;
    private readonly OpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;
    public List<SelectListItem>? AudioFiles { get; set; }

    public TranslationModel(ILogger<TranslationModel> logger,
        OpenAIClient client,
        IConfiguration configuration
    )
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;
        // create the wwwroot/audio/<folder> directory if it doesn't exist
        string? folder = _configuration["OpenAI:Translation:Folder"];
        string path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path)) {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet() {
        AudioFiles = GetAudioFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? audioFile) {
        // Repopulate the dropdown first so it is filled on every return path
        AudioFiles = GetAudioFiles();
        // Only accept a posted path that matches one of the listed files
        if (string.IsNullOrEmpty(audioFile) || !AudioFiles.Any(f => f.Value == audioFile)) {
            return Page();
        }
        string? modelName = _configuration["OpenAI:Translation:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);
        // Send the audio file to the model and wait for the English translation
        var result = await audioClient.TranslateAudioAsync(audioFile);
        if (result is null) {
            return Page();
        }
        ViewData["AudioFile"] = Path.GetFileName(audioFile);
        ViewData["Transcription"] = result.Value.Text;
        return Page();
    }

    public List<SelectListItem> GetAudioFiles() {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["OpenAI:Translation:Folder"];
        string path = $"wwwroot/audio/{folder}";
        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");
        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();
        foreach (var item in list) {
            items.Add(new SelectListItem {
                // Value is the path sent to the API; Text is the bare file name
                Value = item,
                Text = Path.GetFileName(item)
            });
        }
        return items;
    }
}

Translation.cshtml

@page
@model TranslationModel

@{ ViewData["Title"] = "Audio Translation"; }

<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="audioFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>
@if (ViewData["AudioFile"] != null) {
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}
@if (ViewData["Transcription"] != null) {
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}

Adding pages to menu system

Let us see our new pages in action. But first, we need to add links to the three razor pages in the menu system. Open Pages/Shared/_Layout.cshtml in the editor and add these menu items inside the <ul> . . . </ul> block:

<li class="nav-item">
<a class="nav-link text-dark" asp-area="" asp-page="/Audio2Text">Audio to Text</a>
</li>
<li class="nav-item">
<a class="nav-link text-dark" asp-area="" asp-page="/Text2Audio">Text to Audio</a>
</li>
<li class="nav-item">
<a class="nav-link text-dark" asp-area="" asp-page="/Translation">Translation</a>
</li>

Let’s try it out!

Start the application by executing the following command in the terminal window:

dotnet watch

Use the new menu links to try each of the three pages: Audio to Text, Text to Audio, and Translation.

Bonus - Streaming audio

Going back to the Text2Audio page, bear in mind that the audio is saved to the server's file system and then linked to the <audio> element. We can instead return the audio directly in the HTTP response, without saving a file on the server. Let us see how that works. In Text2Audio.cshtml.cs, add the following handler method:

public async Task<IActionResult> OnGetSpeakAsync(string text) {
    string? modelName = _configuration["OpenAI:Text2Audio:Model"];
    var audioClient = _openAIClient.GetAudioClient(modelName);
    BinaryData speech = await audioClient.GenerateSpeechAsync(text, GeneratedSpeechVoice.Alloy);
    // The speech API returns MP3 unless another format is requested,
    // so the response is sent back as audio/mpeg
    return File(speech.ToStream(), "audio/mpeg");
}
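The content type passed to File(...) should match the audio format that was actually generated. As far as I know, the speech endpoint produces MP3 unless another format is requested, which is why the first version of this page saves .mp3 files. If you later request a different format, a small lookup like the sketch below could keep the two in step. AudioContentTypes is a hypothetical helper, and the list of formats is deliberately partial.

```csharp
using System;

public static class AudioContentTypes
{
    // Maps a short audio format name to the MIME type to send to the browser.
    // Unknown formats fall back to a generic binary type.
    public static string ForFormat(string format) => format.ToLowerInvariant() switch
    {
        "mp3" => "audio/mpeg",
        "wav" => "audio/wav",
        "flac" => "audio/flac",
        "aac" => "audio/aac",
        _ => "application/octet-stream"
    };
}
```

With such a helper, the handler would end with return File(speech.ToStream(), AudioContentTypes.ForFormat("mp3")); for the default output.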

Add this code to Text2Audio.cshtml just before the closing </div> tag:

<button id="speakBtn" class="btn btn-warning">Speak</button>
<audio id="audioPlayer"></audio>
<script>
    document.getElementById('speakBtn').addEventListener('click', function () {
        // Send the current text to the Speak handler and play the returned audio
        var text = encodeURIComponent(document.getElementById('inputText').value);
        fetch('/Text2Audio?handler=Speak&text=' + text)
            .then(response => response.blob())
            .then(blob => {
                var url = URL.createObjectURL(blob);
                var audioPlayer = document.getElementById('audioPlayer');
                audioPlayer.src = url;
                audioPlayer.play();
            });
    });
</script>

Run the application and open the Text2Audio page. You will notice a new "Speak" button.

Click on the Speak button and the audio will be generated and played back to you without being saved on the server.

Conclusion

With the knowledge of how to use OpenAI Whisper under your belt, I am sure you will build great apps. Happy Coding.

Medhat Elmasry

Thursday, February 20, 2025
