In this article, we will explore the audio-centric Whisper neural net from OpenAI, together with OpenAI's text-to-speech (TTS) model. You can find more details about Whisper at https://github.com/openai/whisper. The examples in this article assume that you have a developer account with OpenAI. These are the features we will explore:
- Transcribing audio into text
- Converting text into audio
- Translating audio from another spoken language into English text
Source Code: https://github.com/medhatelmasry/WhisperWebOpenAI
Prerequisites:
- You need a developer subscription with OpenAI.
- The example uses Razor Pages in ASP.NET Core 9.0
- The editor used is Visual Studio Code (VS Code)
- You have installed the “C# Dev Kit” extension in VS Code
Getting Started
We will start by:
- creating an ASP.NET Razor Pages web app
- adding the OpenAI package to the project
Execute these commands in a terminal window:
dotnet new razor -o WhisperWebOpenAI
cd WhisperWebOpenAI
dotnet add package OpenAI
Start VS Code in the current project folder with:
code .
Add the following section to appsettings.Development.json, inside the outermost braces (use a comma to separate it from the existing "Logging" section):
"OpenAI": {
  "Key": "YOUR-OpenAI-KEY",
  "Audio2Text": {
    "Model": "whisper-1",
    "Folder": "audio2text"
  },
  "Text2Audio": {
    "Model": "tts-1",
    "Folder": "text2audio"
  },
  "Translation": {
    "Model": "whisper-1",
    "Folder": "translation"
  }
}
NOTE: Replace the value of the Key setting above with your OpenAI key.
Model whisper-1 is used for audio to text and audio translations. Model tts-1 is used for converting text into audio.
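The pages read these settings with string keys such as OpenAI:Audio2Text:Model. As a sketch of how the same section could map onto typed settings instead, the records below mirror its shape and are filled with System.Text.Json. The type names (ModelFolder, OpenAISettings, SettingsDemo) are hypothetical and are not part of the sample project.

```csharp
using System;
using System.Text.Json;

// Hypothetical record for each feature: which model to call and which wwwroot/audio subfolder to use.
public sealed record ModelFolder(string Model, string Folder);

// Hypothetical record mirroring the value of the "OpenAI" section.
public sealed record OpenAISettings(
    string Key,
    ModelFolder Audio2Text,
    ModelFolder Text2Audio,
    ModelFolder Translation);

public static class SettingsDemo
{
    // Deserializes the object that follows "OpenAI": in appsettings.Development.json.
    public static OpenAISettings Parse(string json) =>
        JsonSerializer.Deserialize<OpenAISettings>(
            json,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
        ?? throw new InvalidOperationException("The OpenAI section is empty.");
}
```

Typed settings make a misspelled key a compile-time error rather than a silent null at run time.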
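Add this service registration to Program.cs, before the call to builder.Build(), and add a using OpenAI; directive at the top of the file: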
// Add OpenAI service
builder.Services.AddSingleton<OpenAIClient>(sp => {
    string? apiKey = builder.Configuration["OpenAI:Key"];
    return new OpenAIClient(apiKey);
});
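If OpenAI:Key is missing or still holds the placeholder, the problem only shows up on the first API call. A minimal sketch, assuming you would rather fail at startup and accept the conventional OPENAI_API_KEY environment variable as a fallback; ApiKeyResolver is a hypothetical helper, not part of the OpenAI package.

```csharp
using System;

public static class ApiKeyResolver
{
    // Returns the configured key unless it is empty or the placeholder,
    // then falls back to OPENAI_API_KEY; throws if neither is set.
    public static string Resolve(string? configuredKey, Func<string, string?> getEnv)
    {
        if (!string.IsNullOrWhiteSpace(configuredKey) && configuredKey != "YOUR-OpenAI-KEY") {
            return configuredKey;
        }

        string? fromEnv = getEnv("OPENAI_API_KEY");
        if (!string.IsNullOrWhiteSpace(fromEnv)) {
            return fromEnv;
        }

        throw new InvalidOperationException(
            "Set OpenAI:Key in appsettings.Development.json or the OPENAI_API_KEY environment variable.");
    }
}
```

In Program.cs it could be called as ApiKeyResolver.Resolve(builder.Configuration["OpenAI:Key"], Environment.GetEnvironmentVariable), with the result passed to new OpenAIClient(...).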
Download the zip file from https://medhat.ca/images/audio.zip and extract it into the wwwroot folder. This creates an audio folder under wwwroot; the /wwwroot/audio/audio2text folder contains these audio files:
- aboutSpeechSdk.wav
- audio_houseplant_care.mp3
- speechService.wav
- TalkForAFewSeconds16.wav
- wikipediaOcelot.wav
Also, note the presence of these audio files in the /wwwroot/audio/translation folder:
- audio_arabic.mp3
- audio_french.wav
- audio_spanish.mp3
Add Razor pages
In VS Code, open the “Solution Explorer” tab (provided by the C# Dev Kit extension) to view your project.
Right-click on the Pages folder and add a Razor page named Audio2Text.
Similarly, add these two razor pages:
- Text2Audio
- Translation
Replace the contents of the generated files with the following code:
Audio2Text Razor Page
Audio2Text.cshtml.cs
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;
using OpenAI;
namespace WhisperWebOpenAI.Pages;
public class Audio2TextModel : PageModel {
    private readonly ILogger<Audio2TextModel> _logger;
    private readonly OpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    public List<SelectListItem>? AudioFiles { get; set; }

    public Audio2TextModel(ILogger<Audio2TextModel> logger,
        OpenAIClient client,
        IConfiguration configuration) {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // Create the wwwroot/audio/{folder} directory if it doesn't exist
        string? folder = _configuration["OpenAI:Audio2Text:Folder"];
        string path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path)) {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet() {
        AudioFiles = GetWaveFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? waveFile) {
        // Repopulate the drop-down list so it is not empty after a post
        AudioFiles = GetWaveFiles();

        // Only accept a file that is in the list, never an arbitrary posted path
        if (string.IsNullOrEmpty(waveFile) || !AudioFiles.Any(f => f.Value == waveFile)) {
            return Page();
        }

        string? modelName = _configuration["OpenAI:Audio2Text:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);

        // Upload the file to the Whisper model and receive the transcription
        var result = await audioClient.TranscribeAudioAsync(waveFile);

        ViewData["AudioFile"] = Path.GetFileName(waveFile);
        ViewData["Transcription"] = result.Value.Text;
        return Page();
    }

    public List<SelectListItem> GetWaveFiles() {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["OpenAI:Audio2Text:Folder"];
        string path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");

        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();

        foreach (var item in list) {
            items.Add(new SelectListItem {
                Value = item,                 // relative path sent back on submit
                Text = Path.GetFileName(item) // file name shown in the list
            });
        }
        return items;
    }
}
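GetWaveFiles relies on Directory.GetFiles with the *.wav and *.mp3 patterns, which match case-insensitively on Windows but case-sensitively on Linux and macOS, and a posted form value should never be trusted as a file path. A minimal framework-free sketch that addresses both, assuming the same two extensions; AudioFileCatalog and its methods are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class AudioFileCatalog
{
    // Same extensions the Razor pages list.
    private static readonly string[] Extensions = { ".wav", ".mp3" };

    // Paths of .wav/.mp3 files directly inside the folder, matched without regard
    // to case on every OS, and sorted by file name for a stable drop-down order.
    public static List<string> List(string folder) =>
        Directory.EnumerateFiles(folder)
            .Where(f => Extensions.Contains(Path.GetExtension(f), StringComparer.OrdinalIgnoreCase))
            .OrderBy(f => Path.GetFileName(f), StringComparer.OrdinalIgnoreCase)
            .ToList();

    // True only when the posted value is exactly one of the listed paths,
    // so a crafted form post cannot point the handler at another file.
    public static bool IsListed(string folder, string? posted) =>
        !string.IsNullOrEmpty(posted) && List(folder).Contains(posted);
}
```

In a page model, each SelectListItem.Value would come from List(path), and OnPostAsync would return early unless IsListed(path, waveFile) is true.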
Audio2Text.cshtml
@page
@model Audio2TextModel
@{ ViewData["Title"] = "Audio to Text Transcription"; }
<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="waveFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>

@if (ViewData["AudioFile"] != null) {
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}

@if (ViewData["Transcription"] != null) {
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}
Text2Audio Razor Page
Text2Audio.cshtml.cs
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using OpenAI;
using OpenAI.Audio;
namespace WhisperWebOpenAI.Pages;
public class Text2AudioModel : PageModel {
    private readonly ILogger<Text2AudioModel> _logger;
    private readonly OpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    const string DefaultText = @"Security officials confiscating bottles of water, tubes of
shower gel and pots of face creams are a common sight at airport security.
But officials enforcing the no-liquids rule at South Korea's Incheon International Airport
have been busy seizing another outlawed item: kimchi, a concoction of salted and fermented
vegetables that is a staple of every Korean dinner table.";

    public Text2AudioModel(ILogger<Text2AudioModel> logger,
        OpenAIClient client,
        IConfiguration configuration) {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // Create the wwwroot/audio/{folder} directory if it doesn't exist
        string? folder = _configuration["OpenAI:Text2Audio:Folder"];
        string path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path)) {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet() {
        ViewData["sampleText"] = DefaultText;
    }

    public async Task<IActionResult> OnPostAsync(string inputText) {
        string? modelName = _configuration["OpenAI:Text2Audio:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);

        // Generate a consistent file name based on the SHA-256 hash of the input text
        byte[] hashBytes = System.Security.Cryptography.SHA256.HashData(
            System.Text.Encoding.UTF8.GetBytes(inputText));
        string fileName = $"{Convert.ToHexString(hashBytes).ToLowerInvariant()}.mp3";
        string? folder = _configuration["OpenAI:Text2Audio:Folder"];
        string filePath = Path.Combine("wwwroot", "audio", folder!, fileName);

        // Only call the API when this text has not been converted before
        if (!System.IO.File.Exists(filePath)) {
            // tts-1 returns MP3 audio by default
            BinaryData speech = await audioClient.GenerateSpeechAsync(inputText, GeneratedSpeechVoice.Alloy);
            await System.IO.File.WriteAllBytesAsync(filePath, speech.ToArray());
        }

        ViewData["sampleText"] = inputText;
        ViewData["AudioFilePath"] = $"/audio/{folder}/{fileName}";
        return Page();
    }
}
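The page caches generated speech by naming each file after the lowercase hexadecimal SHA-256 hash of the input text, so the same text always maps to the same .mp3 file. The sketch below isolates that naming rule so it can be checked on its own; SpeechCache is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SpeechCache
{
    // Lowercase hex SHA-256 of the UTF-8 bytes of the text, plus ".mp3".
    // Identical text always yields the identical file name.
    public static string FileNameFor(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text))).ToLowerInvariant() + ".mp3";
}
```

The key covers only the text. If you change the voice (GeneratedSpeechVoice.Alloy) or the model, include them in the hashed string as well, otherwise an old file will be served for the new settings.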
Text2Audio.cshtml
@page
@model Text2AudioModel
@{ ViewData["Title"] = "Text to Audio"; }
<h1>@ViewData["Title"]</h1>
<div class="text-center">
    <form method="post">
        <label for="inputText">Enter text to convert to audio:</label>
        <br />
        <textarea name="inputText" id="inputText" cols="80" rows="5" required>@ViewData["sampleText"]</textarea>
        <br />
        <input type="submit" value="Submit" />
    </form>
    <p></p>
    @if (ViewData["AudioFilePath"] != null) {
        <audio controls>
            <source src="@ViewData["AudioFilePath"]" type="audio/mpeg">
            Your browser does not support the audio element.
        </audio>
    }
</div>
Translation Razor Page
Translation.cshtml.cs
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.AspNetCore.Mvc.Rendering;
using OpenAI;
namespace WhisperWebOpenAI.Pages;
public class TranslationModel : PageModel {
    private readonly ILogger<TranslationModel> _logger;
    private readonly OpenAIClient _openAIClient;
    private readonly IConfiguration _configuration;

    public List<SelectListItem>? AudioFiles { get; set; }

    public TranslationModel(ILogger<TranslationModel> logger,
        OpenAIClient client,
        IConfiguration configuration) {
        _logger = logger;
        _openAIClient = client;
        _configuration = configuration;

        // Create the wwwroot/audio/{folder} directory if it doesn't exist
        string? folder = _configuration["OpenAI:Translation:Folder"];
        string path = $"wwwroot/audio/{folder}";
        if (!Directory.Exists(path)) {
            Directory.CreateDirectory(path);
        }
    }

    public void OnGet() {
        AudioFiles = GetAudioFiles();
    }

    public async Task<IActionResult> OnPostAsync(string? audioFile) {
        // Repopulate the drop-down list so it is not empty after a post
        AudioFiles = GetAudioFiles();

        // Only accept a file that is in the list, never an arbitrary posted path
        if (string.IsNullOrEmpty(audioFile) || !AudioFiles.Any(f => f.Value == audioFile)) {
            return Page();
        }

        string? modelName = _configuration["OpenAI:Translation:Model"];
        var audioClient = _openAIClient.GetAudioClient(modelName);

        // Whisper translates the spoken audio into English text
        var result = await audioClient.TranslateAudioAsync(audioFile);

        ViewData["AudioFile"] = Path.GetFileName(audioFile);
        ViewData["Transcription"] = result.Value.Text;
        return Page();
    }

    public List<SelectListItem> GetAudioFiles() {
        List<SelectListItem> items = new List<SelectListItem>();
        string? folder = _configuration["OpenAI:Translation:Folder"];
        string path = $"wwwroot/audio/{folder}";

        // Get files with .wav or .mp3 extensions
        string[] wavFiles = Directory.GetFiles(path, "*.wav");
        string[] mp3Files = Directory.GetFiles(path, "*.mp3");

        // Combine the arrays
        string[] list = wavFiles.Concat(mp3Files).ToArray();

        foreach (var item in list) {
            items.Add(new SelectListItem {
                Value = item,                 // relative path sent back on submit
                Text = Path.GetFileName(item) // file name shown in the list
            });
        }
        return items;
    }
}
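Both Whisper pages upload a whole file per request. OpenAI's speech-to-text documentation states a 25 MB upload limit at the time of writing; treating that as 25 × 1024 × 1024 bytes is an assumption here, so verify it against the current docs. A minimal pre-flight sketch, with the hypothetical name WhisperUploadCheck, that reports a problem before TranscribeAudioAsync or TranslateAudioAsync is called:

```csharp
using System;
using System.IO;

public static class WhisperUploadCheck
{
    // Assumed limit: 25 MB interpreted as MiB. Check OpenAI's current documentation.
    public const long MaxBytes = 25L * 1024 * 1024;

    // Returns a description of the problem, or null when the file looks OK to send.
    public static string? Problem(string path)
    {
        var info = new FileInfo(path);
        if (!info.Exists) return "File not found.";

        // Same two formats the pages list.
        string ext = info.Extension.ToLowerInvariant();
        if (ext != ".wav" && ext != ".mp3") return $"Unsupported extension '{ext}'.";

        if (info.Length > MaxBytes) return $"File is {info.Length} bytes; the limit is {MaxBytes}.";
        return null;
    }
}
```

A page could place the returned message in ViewData instead of letting an oversized upload fail inside the SDK call.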
Translation.cshtml
@page
@model TranslationModel
@{ ViewData["Title"] = "Audio Translation"; }
<div class="text-center">
    <h1 class="display-4">@ViewData["Title"]</h1>
    <form method="post">
        <select asp-items="@Model.AudioFiles" name="audioFile"></select>
        <button type="submit">Submit</button>
    </form>
</div>

@if (ViewData["AudioFile"] != null) {
    <p></p>
    <h3 class="text-danger">@ViewData["AudioFile"]</h3>
}

@if (ViewData["Transcription"] != null) {
    <p class="alert alert-success">@ViewData["Transcription"]</p>
}
Adding pages to menu system
Before trying out the new pages, add links to the three Razor pages in the navigation menu. Open Pages/Shared/_Layout.cshtml in the editor and add these menu items inside the navbar's <ul> . . . </ul> block:
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Audio2Text">Audio to Text</a>
</li>
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Text2Audio">Text to Audio</a>
</li>
<li class="nav-item">
    <a class="nav-link text-dark" asp-area="" asp-page="/Translation">Translation</a>
</li>
Let’s try it out!
Start the application by executing the following command in the terminal window:
dotnet watch
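Audio to Text Page
Choose a file from the drop-down list and click Submit. The page displays the file name followed by the Whisper transcription.
Text to Audio Page
Keep the sample text (or type your own) and click Submit. An audio player appears so you can listen to the generated speech.
Translation
Choose one of the Arabic, French, or Spanish recordings and click Submit to see its English translation.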
Bonus - Streaming audio
Going back to the Text2Audio page, note that the generated audio is saved to the server's file system and then linked to the <audio> element. Instead, we can send the audio straight back to the browser without saving a file on the server. Let us see how that works. In Text2Audio.cshtml.cs, add the following handler method:
public async Task<IActionResult> OnGetSpeakAsync(string text) {
    if (string.IsNullOrWhiteSpace(text)) {
        return BadRequest();
    }

    string? modelName = _configuration["OpenAI:Text2Audio:Model"];
    var audioClient = _openAIClient.GetAudioClient(modelName);
    BinaryData speech = await audioClient.GenerateSpeechAsync(text, GeneratedSpeechVoice.Alloy);

    // Return the audio directly in the response; nothing is written to disk.
    // tts-1 produces MP3 by default, so label the response audio/mpeg.
    return File(speech.ToStream(), "audio/mpeg");
}
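The content type passed to File(...) should match the format the model actually returns; tts-1 returns MP3 unless a different output format is requested. A small sketch of that mapping, with the hypothetical name SpeechContentTypes, limited to formats whose MIME types are standard:

```csharp
using System;
using System.Collections.Generic;

public static class SpeechContentTypes
{
    // Hypothetical lookup from an output format name to the MIME type to send.
    private static readonly Dictionary<string, string> Map = new(StringComparer.OrdinalIgnoreCase)
    {
        ["mp3"] = "audio/mpeg",
        ["wav"] = "audio/wav",
        ["flac"] = "audio/flac",
        ["aac"] = "audio/aac",
    };

    // Unknown formats fall back to a generic binary type rather than a wrong audio label.
    public static string For(string format) =>
        Map.TryGetValue(format, out var type) ? type : "application/octet-stream";
}
```

Keeping the format and the content type in one place avoids the mismatch where the bytes are MP3 but the response claims to be WAV.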
Add this code to Text2Audio.cshtml just before the closing </div> tag:
<button id="speakBtn" type="button" class="btn btn-warning">Speak</button>
<audio id="audioPlayer"></audio>
<script>
    document.getElementById('speakBtn').addEventListener('click', function () {
        var text = encodeURIComponent(document.getElementById('inputText').value);
        // handler=Speak routes the request to OnGetSpeakAsync
        fetch('/Text2Audio?handler=Speak&text=' + text)
            .then(response => {
                if (!response.ok) throw new Error('Speech request failed: ' + response.status);
                return response.blob();
            })
            .then(blob => {
                var url = URL.createObjectURL(blob);
                var audioPlayer = document.getElementById('audioPlayer');
                audioPlayer.src = url;
                audioPlayer.play();
            })
            .catch(err => console.error(err));
    });
</script>
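Razor Pages picks the handler method from the handler query value, so handler=Speak on a GET request runs OnGetSpeakAsync, and the text travels percent-encoded in the query string. The sketch below builds the same URL on the server side; SpeakUrl is a hypothetical helper, and Uri.EscapeDataString escapes a few characters (such as ! and parentheses) that encodeURIComponent leaves alone, which the server decodes the same way.

```csharp
using System;

public static class SpeakUrl
{
    // Same shape as the fetch() URL: handler selects OnGetSpeakAsync,
    // and the text is percent-encoded so spaces, & and = survive the trip.
    public static string For(string text) =>
        "/Text2Audio?handler=Speak&text=" + Uri.EscapeDataString(text);
}
```

For long passages, a GET URL can run into server or browser URL length limits. A POST handler such as OnPostSpeakAsync, with the text in the request body, avoids that, although Razor Pages would then also expect the antiforgery token with the request.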