Using Azure OpenAI, we will explore the audio-centric Whisper neural net from OpenAI. You can find more details about Whisper at https://github.com/openai/whisper. The examples in this article assume that you have a developer account with Azure. These are the features we will explore:
- Transcribing audio into text
- Converting text into audio
- Translating audio from another spoken language into English text
Source Code: https://github.com/medhatelmasry/WhisperWebAzureOpenAI
Prerequisites:
- You need a subscription with Azure.
- The example uses Razor Pages in ASP.NET Core 9.0
- We will use the standard VS Code editor
- You have installed the “C# Dev Kit” extension in VS Code
Getting Started
We will start by:
- creating an ASP.NET Razor Pages web app
- adding Azure packages to the project
Execute these commands in a terminal window:
dotnet new razor -o WhisperWebAzureOpenAI
cd WhisperWebAzureOpenAI
dotnet add package Azure.AI.OpenAI -v 2.2.0-beta.2
dotnet add package Microsoft.Extensions.Azure
Start VS Code in the current project folder with:
code .
Add the following to appsettings.Development.json:
"AzOpenAI": {
  "Key": "YOUR-AZURE-OPENAI-KEY-HERE",
  "Url": "YOUR-AZURE-OPENAI-ENDPOINT-HERE",
  "Audio2Text": {
    "Model": "whisper",
    "Folder": "audio2text"
  },
  "Text2Audio": {
    "Model": "tts",
    "Folder": "text2audio"
  },
  "Translation": {
    "Model": "whisper",
    "Folder": "translation"
  }
}
NOTE: Replace the value of the Key and Url settings above with your Azure OpenAI key and endpoint.
The whisper model is used for audio-to-text transcription and for translating audio into English. The tts model is used for converting text into audio.
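Throughout the pages below, these nested JSON settings are read with colon-delimited keys such as AzOpenAI:Audio2Text:Model. As a minimal sketch of how the nesting flattens, here is a plain dictionary standing in for IConfiguration (the real app reads appsettings.Development.json instead):

```csharp
using System;
using System.Collections.Generic;

// Sketch only: a dictionary standing in for IConfiguration, showing how
// the nested "AzOpenAI" JSON section flattens into colon-delimited keys.
var config = new Dictionary<string, string>
{
    ["AzOpenAI:Audio2Text:Model"] = "whisper",
    ["AzOpenAI:Audio2Text:Folder"] = "audio2text",
    ["AzOpenAI:Text2Audio:Model"] = "tts",
    ["AzOpenAI:Text2Audio:Folder"] = "text2audio",
    ["AzOpenAI:Translation:Model"] = "whisper",
    ["AzOpenAI:Translation:Folder"] = "translation"
};

// Mirrors _configuration["AzOpenAI:Audio2Text:Model"] in the page models.
Console.WriteLine(config["AzOpenAI:Audio2Text:Model"]); // whisper
Console.WriteLine(config["AzOpenAI:Text2Audio:Model"]); // tts
```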
Add this service to Program.cs:
// add these using directives at the top of Program.cs
using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.Azure;

builder.Services.AddAzureClients(clientBuilder =>
{
    // read the key and endpoint from configuration
    string? key = builder.Configuration["AzOpenAI:Key"];
    string? url = builder.Configuration["AzOpenAI:Url"];
    var credentials = new AzureKeyCredential(key!);

    // Register a custom client factory
    clientBuilder.AddClient<AzureOpenAIClient, AzureOpenAIClientOptions>(
        (options, _, _) => new AzureOpenAIClient(new Uri(url!), credentials, options));
});
Download a zip file from https://medhat.ca/images/audio.zip. Extract the file in the wwwroot folder. This creates the following directory structure under wwwroot:
aboutSpeechSdk.wav
audio_houseplant_care.mp3
speechService.wav
TalkForAFewSeconds16.wav
wikipediaOcelot.wav
Also, note the presence of these audio files in the /wwwroot/audio/translation folder:
audio_arabic.mp3
audio_french.wav
audio_spanish.mp3
Add Razor pages
In VS Code, view your project in the “Solution Explorer” tab:
Right-click on the Pages folder and add a razor page named Audio2Text:
Similarly, add these two razor pages:
- Text2Audio
- Translation
Replace the contents of the respective files as follows:
Audio2Text Razor Page
Audio2Text.cshtml.cs
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;

namespace WhisperWebAzureOpenAI.Pages;

public class Audio2TextModel : PageModel
{
    private readonly ILogger<Audio2TextModel> _logger;
    private readonly AzureOpenAIClient _azureOpenAIClient;
    private readonly IConfiguration _configuration;

    public List<SelectListItem>? AudioFiles { get; set; }

    public Audio2TextModel(ILogger<Audio2TextModel> logger,
        AzureOpenAIClient client,
        IConfiguration configuration)
    {
        _logger = logger;
        _azureOpenAIClient = client;
        _configuration = configuration;

        // create wwwroot/audio folder if it doesn't exist
        string? folder = _configuration["AzOpenAI:Audio2Text:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet()
    {
        AudioFiles = GetAudioFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? audioFile)
    {
        if (string.IsNullOrEmpty(audioFile))
        {
            return Page();
        }

        string? deploymentName = _configuration["AzOpenAI:Audio2Text:Model"];
        var audioClient = _azureOpenAIClient.GetAudioClient(deploymentName);
        var result = await audioClient.TranscribeAudioAsync(audioFile);
        if (result is null)
        {
            return Page();
        }

        string? folder = _configuration["AzOpenAI:Audio2Text:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        ViewData["AudioFile"] = audioFile.StartsWith(path)
            ? audioFile.Substring(path.Length + 1) : audioFile;
        ViewData["Transcription"] = result.Value.Text;
        AudioFiles = GetAudioFiles();
        return Page();
    }

    public List<SelectListItem> GetAudioFiles()
    {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["AzOpenAI:Audio2Text:Folder"];
        string? path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");

        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();
        foreach (var item in list)
        {
            items.Add(new SelectListItem
            {
                Value = item.ToString(),
                Text = item.StartsWith(path) ? item.Substring(path.Length + 1) : item
            });
        }
        return items;
    }
}
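The interesting detail in GetAudioFiles() is that the drop-down stores the full file path as the value but shows only the bare file name as the text, by trimming the wwwroot/audio/... prefix. A minimal standalone sketch of that listing-and-trimming logic, using a temporary folder in place of the real audio folder:

```csharp
using System;
using System.IO;
using System.Linq;

// Sketch: the file-listing and prefix-trimming logic from GetAudioFiles(),
// run against a temporary folder standing in for wwwroot/audio/audio2text.
string path = Path.Combine(Path.GetTempPath(), "audio2text-demo");
Directory.CreateDirectory(path);
File.WriteAllBytes(Path.Combine(path, "sample.wav"), new byte[] { 0 });
File.WriteAllBytes(Path.Combine(path, "sample.mp3"), new byte[] { 0 });

string[] wavFiles = Directory.GetFiles(path, "*.wav");
string[] mp3Files = Directory.GetFiles(path, "*.mp3");
string[] list = wavFiles.Concat(mp3Files).ToArray();

// The page shows only the bare file name; +1 skips the path separator.
var names = list
    .Select(f => f.StartsWith(path) ? f.Substring(path.Length + 1) : f)
    .ToList();

foreach (var name in names)
    Console.WriteLine(name); // sample.wav then sample.mp3
```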
Audio2Text.cshtml
@page
@model Audio2TextModel
@{
    ViewData["Title"] = "Audio to Text Transcription";
}

<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="audioFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>

@if (ViewData["AudioFile"] != null)
{
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}
@if (ViewData["Transcription"] != null)
{
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}
Text2Audio Razor Page
Text2Audio.cshtml.cs
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using OpenAI;
using OpenAI.Audio;

namespace WhisperWebAzureOpenAI.Pages;

public class Text2AudioModel : PageModel
{
    private readonly ILogger<Text2AudioModel> _logger;
    private readonly AzureOpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    const string DefaultText = @"Security officials confiscating bottles of water, tubes of
shower gel and pots of face creams are a common sight at airport security.
But officials enforcing the no-liquids rule at South Korea's Incheon International Airport
have been busy seizing another outlawed item: kimchi, a concoction of salted and fermented
vegetables that is a staple of every Korean dinner table.";

    public Text2AudioModel(ILogger<Text2AudioModel> logger,
        AzureOpenAIClient client,
        IConfiguration configuration)
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // create wwwroot/audio folder if it doesn't exist
        string? folder = _configuration["AzOpenAI:Text2Audio:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet()
    {
        ViewData["sampleText"] = DefaultText;
    }

    public async Task<IActionResult> OnPostAsync(string inputText)
    {
        string? modelName = _configuration["AzOpenAI:Text2Audio:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);
        BinaryData speech = await audioClient.GenerateSpeechAsync(inputText, GeneratedSpeechVoice.Alloy);

        // Generate a consistent file name based on the hash of the input text
        using var sha256 = System.Security.Cryptography.SHA256.Create();
        byte[] hashBytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(inputText));
        string hashString = BitConverter.ToString(hashBytes).Replace("-", "").ToLower();
        string fileName = $"{hashString}.mp3";

        string? folder = _configuration["AzOpenAI:Text2Audio:Folder"];
        string filePath = Path.Combine("wwwroot", "audio", folder!, fileName);

        // Check if the file already exists
        if (!System.IO.File.Exists(filePath))
        {
            using FileStream stream = System.IO.File.OpenWrite(filePath);
            speech.ToStream().CopyTo(stream);
        }

        ViewData["sampleText"] = inputText;
        ViewData["AudioFilePath"] = $"/audio/{folder}/{fileName}";
        return Page();
    }
}
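Hashing the input text in OnPostAsync gives each distinct text a stable, deterministic file name, so submitting the same text twice reuses the cached .mp3 instead of calling the service again. That naming scheme can be exercised on its own; this is just the hashing portion lifted out of the handler, not the full page:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Same text always yields the same name; different text yields a different one.
Console.WriteLine(GetAudioFileName("hello") == GetAudioFileName("hello")); // True
Console.WriteLine(GetAudioFileName("hello") == GetAudioFileName("world")); // False

// Sketch: the deterministic file-naming logic from OnPostAsync above.
// The name is the lowercase SHA-256 hex of the input text plus ".mp3".
string GetAudioFileName(string inputText)
{
    using var sha256 = SHA256.Create();
    byte[] hashBytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(inputText));
    string hashString = BitConverter.ToString(hashBytes).Replace("-", "").ToLower();
    return $"{hashString}.mp3";
}
```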
Text2Audio.cshtml
@page
@model Text2AudioModel
@{
    ViewData["Title"] = "Text to Audio";
}

<h1>@ViewData["Title"]</h1>
<div class="text-center">
    <form method="post">
        <label for="inputText">Enter text to convert to audio:</label><br />
        <textarea name="inputText" id="inputText" cols="80" rows="5" required>@if (ViewData["sampleText"] != null) {@ViewData["sampleText"]}</textarea><br />
        <input type="submit" value="Submit" />
    </form>
    <p></p>
    @if (ViewData["AudioFilePath"] != null)
    {
        <audio controls>
            <source src="@ViewData["AudioFilePath"]" type="audio/mpeg">
            Your browser does not support the audio element.
        </audio>
    }
</div>
Translation Razor Page
Translation.cshtml.cs
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;
using OpenAI;

namespace WhisperWebAzureOpenAI.Pages;

public class TranslationModel : PageModel
{
    private readonly ILogger<TranslationModel> _logger;
    private readonly AzureOpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    public List<SelectListItem>? AudioFiles { get; set; }

    public TranslationModel(ILogger<TranslationModel> logger,
        AzureOpenAIClient client,
        IConfiguration configuration)
    {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // create wwwroot/audio folder if it doesn't exist
        string? folder = _configuration["AzOpenAI:Translation:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet()
    {
        AudioFiles = GetAudioFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? audioFile)
    {
        if (string.IsNullOrEmpty(audioFile))
        {
            return Page();
        }

        string? modelName = _configuration["AzOpenAI:Translation:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);
        var result = await audioClient.TranslateAudioAsync(audioFile);
        if (result is null)
        {
            return Page();
        }

        string? folder = _configuration["AzOpenAI:Translation:Folder"];
        string? path = $"wwwroot/audio/{folder}";
        ViewData["AudioFile"] = audioFile.StartsWith(path)
            ? audioFile.Substring(path.Length + 1) : audioFile;
        ViewData["Transcription"] = result.Value.Text;
        AudioFiles = GetAudioFiles();
        return Page();
    }

    public List<SelectListItem> GetAudioFiles()
    {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["AzOpenAI:Translation:Folder"];
        string? path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");

        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();
        foreach (var item in list)
        {
            items.Add(new SelectListItem
            {
                Value = item.ToString(),
                Text = item.StartsWith(path) ? item.Substring(path.Length + 1) : item
            });
        }
        return items;
    }
}
Translation.cshtml
@page
@model TranslationModel
@{
    ViewData["Title"] = "Audio Translation";
}

<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="audioFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>

@if (ViewData["AudioFile"] != null)
{
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}
@if (ViewData["Transcription"] != null)
{
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}
Adding pages to menu system
Let us see our new pages in action. First, we need to add links to the three Razor pages in the menu system. Open Pages/Shared/_Layout.cshtml in the editor and add these menu items inside the <ul>…</ul> block:
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Audio2Text">Audio to Text</a>
</li>
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Text2Audio">Text to Audio</a>
</li>
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Translation">Translation</a>
</li>
Let’s try it out!
Start the application by executing the following command in the terminal window:
dotnet watch
Audio to Text Page
Text to Audio Page
Translation Page
Bonus - Streaming audio
Going back to the Text2Audio page, bear in mind that the audio is saved to the server's file system and then linked to the <audio> element. We can instead stream the audio to the browser without saving a file on the server. Let us see how that works. In Text2Audio.cshtml.cs, add the following method:
public async Task<IActionResult> OnGetSpeakAsync(string text)
{
    string? modelName = _configuration["AzOpenAI:Text2Audio:Model"];
    var audioClient = _openAIClient.GetAudioClient(modelName);
    BinaryData speech = await audioClient.GenerateSpeechAsync(text, GeneratedSpeechVoice.Alloy);

    MemoryStream memoryStream = new MemoryStream();
    speech.ToStream().CopyTo(memoryStream);
    memoryStream.Position = 0; // Reset the position to the beginning of the stream

    // the generated speech is MP3 audio, so return the matching MIME type
    return File(memoryStream, "audio/mpeg");
}
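Note the Position reset in the handler above: CopyTo leaves the MemoryStream positioned at its end, so returning it without a reset would send zero bytes to the browser. A small standalone sketch of that detail, using placeholder text bytes instead of real audio:

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: why OnGetSpeakAsync resets memoryStream.Position before returning.
// "fake audio bytes" stands in for the BinaryData returned by the service.
var source = new MemoryStream(Encoding.UTF8.GetBytes("fake audio bytes"));
var memoryStream = new MemoryStream();
source.CopyTo(memoryStream);

// After CopyTo, the position sits at the end of the copied data.
Console.WriteLine(memoryStream.Position); // 16

memoryStream.Position = 0; // reset, as in the handler
Console.WriteLine(new StreamReader(memoryStream).ReadToEnd()); // fake audio bytes
```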
Add this code to Text2Audio.cshtml just before the closing </div> tag:
<button id="speakBtn" class="btn btn-warning">Speak</button>
<audio id="audioPlayer"></audio>

<script>
    document.getElementById('speakBtn').addEventListener('click', function () {
        var text = encodeURIComponent(document.getElementById('inputText').value);
        fetch('/Text2Audio?handler=Speak&text=' + text)
            .then(response => response.blob())
            .then(blob => {
                var url = URL.createObjectURL(blob);
                var audioPlayer = document.getElementById('audioPlayer');
                audioPlayer.src = url;
                audioPlayer.play();
            });
    });
</script>
Run the application and view the Text2Audio page; you will notice a new "Speak" button: