How to Transcribe Audio Files using OpenAI's Whisper Model from macOS Finder
Effortlessly Transcribe Audio Files Right from macOS Finder Using OpenAI's Whisper Model
Introduction
Welcome to a practical guide on how to transcribe audio files right from your macOS Finder using OpenAI's Whisper model. Have you ever found yourself with a heap of audio files that you wished could magically transform into written text? Well, your wish is about to come true.
To make it even easier, we've created an associated YouTube video tutorial for you to follow along. Whether you're a seasoned developer or someone who's new to the Terminal, this guide has got you covered.
Why Should You Care?
Efficiency: Transcribing manually is time-consuming. Automation saves you a ton of time.
Versatility: The transcriptions are created in multiple formats including TXT, SRT, VTT, JSON, and TSV.
Accessibility: This setup is especially useful if you are into content creation, academic research, or any field requiring accessibility features.
What Will You Learn?
Setting up Your Environment: Get your system ready for the magic.
Scripting a Quick Action: A small, powerful script that does the heavy lifting.
Automator Setup: Integrating the script into macOS's Automator for a seamless experience.
Usage: How to actually use your new quick action to transcribe audio files.
Ready to dive in? Let's get started by setting up your environment. On to the next section!
Step 1: Setting Up the Environment
Before diving into creating the quick action, it's essential to set up your computer's environment correctly. You'll need a macOS operating system and access to the Terminal for this tutorial.
1.1 Install Homebrew
First, you'll need to install Homebrew, which is a package manager for macOS. It allows you to easily download and manage software directly from the command line.
Open Terminal: You can find it in Applications > Utilities > Terminal.
Run the following command: This will download and install Homebrew.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
For more details, you can visit the Homebrew homepage.
1.2 Install OpenAI Whisper
Next, you'll install OpenAI's Whisper, the audio-to-text model we're going to use for transcribing audio files.
In the Terminal, execute this command: This will install the Whisper package.
brew install openai-whisper
1.3 Update Zsh Configuration File
To ensure that the installed packages can be accessed in our quick action script, we need to add Homebrew to our system path explicitly. For this, you'll update your Zsh configuration file (~/.zshrc).
Open the Zsh configuration file: Use a text editor like nano for this task.
nano ~/.zshrc
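Add Homebrew to the system path: Insert the following line somewhere in the .zshrc file. The path shown is Homebrew's default location on Apple Silicon Macs; on Intel Macs the default is /usr/local/bin instead.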
export PATH="/opt/homebrew/bin:$PATH"
Save and Exit: When using nano, press CTRL + X, then Y to confirm the changes, then Enter to keep the file name.
Reload the Zsh configuration: To apply the changes without restarting your terminal, run:
source ~/.zshrc
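Each time ~/.zshrc is reloaded, the export line above prepends the same directory to PATH again. The following is a minimal sketch of an idempotent alternative that could replace that line. The helper name prepend_path_once is hypothetical, and the two directories assume Homebrew was installed under its default prefix.

```shell
# Hypothetical helper: prepend a directory to PATH only if it exists
# and is not already listed.
prepend_path_once() {
  [ -d "$1" ] || return 0            # skip directories that do not exist
  case ":$PATH:" in
    *":$1:"*) ;;                     # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Assumption: Homebrew lives under one of its default prefixes.
prepend_path_once /opt/homebrew/bin  # Apple Silicon
prepend_path_once /usr/local/bin     # Intel
```

Because missing directories are skipped, the same two calls should be safe on either kind of Mac.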
You've now set up your macOS environment to use Homebrew and OpenAI's Whisper model. In the next section, we'll dive into creating the script that will power our quick action.
Step 2: Creating the Quick Action Script
In this section, we will go through the steps to create a quick action script that will enable us to transcribe audio files right from the macOS Finder using OpenAI's Whisper model.
Start by creating a file where we’ll write our script:
touch transcribe.sh
2.1 Load Zsh Configuration
When Automator runs a script, it starts a non-interactive shell, which does not load ~/.zshrc automatically. This means that even if you've installed Whisper via Homebrew, Automator won't find the whisper command by default.
To solve this problem, we need to explicitly load the Zsh configuration file at the beginning of our script.
#!/bin/zsh
source ~/.zshrc
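If the path is still wrong after loading the configuration, the Quick Action will simply appear to do nothing. One way to fail early with a readable message is to check for the command first. This is a minimal sketch, and the helper name require_command is hypothetical.

```shell
# Hypothetical helper: succeed if a command can be found on PATH,
# otherwise print a message to stderr and return a non-zero status.
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH ($PATH)" >&2
  return 1
}

# In transcribe.sh this could follow `source ~/.zshrc`:
#   require_command whisper || exit 1
```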
2.2 Create the Transcription Loop
The next step is to create a loop that iterates through each of the selected audio files in the Finder. This loop will then execute the transcription command on each file.
Here is how you can create a basic loop in a Zsh script:
#!/bin/zsh
source ~/.zshrc
for audio_file in "$@"; do
  : # placeholder; the transcription command goes here
done
2.3 Configure the Whisper Command
Once the loop is in place, we need to insert the Whisper command for transcribing the audio. Specify the language being spoken with the --language option: Whisper then skips automatic language detection, which saves time and avoids a wrong guess.
Here's how you can add the Whisper command into the loop:
#!/bin/zsh
source ~/.zshrc
for audio_file in "$@"; do
  whisper "$audio_file" --language English
done
2.4 Set Output Directory
It's important to specify where the transcriptions will be saved. For ease of access, we'll save the transcriptions in the same directory as the source audio files.
Add the --output_dir option to the Whisper command to set the output directory:
#!/bin/zsh
source ~/.zshrc
for audio_file in "$@"; do
  whisper "$audio_file" \
    --language English \
    --output_dir "$(dirname "$audio_file")"
done
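A hedged variant of the loop above wraps it in a function and skips any argument that is not a regular file. WHISPER_BIN is a hypothetical hook, not a Whisper setting: it lets you swap in another command, such as echo, to preview what would run.

```shell
# WHISPER_BIN is a hypothetical override for dry runs; it defaults to the real CLI.
WHISPER_BIN="${WHISPER_BIN:-whisper}"

transcribe_files() {
  for audio_file in "$@"; do
    if [ ! -f "$audio_file" ]; then
      echo "skipping (not a file): $audio_file" >&2
      continue
    fi
    # Quoting keeps a path with spaces together as one argument.
    "$WHISPER_BIN" "$audio_file" \
      --language English \
      --output_dir "$(dirname "$audio_file")"
  done
}

# In transcribe.sh, the last line would then be:
#   transcribe_files "$@"
```

Setting WHISPER_BIN=echo before calling transcribe_files prints each command's arguments instead of running Whisper.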
2.5 Make Script Executable
After saving your script, you'll need to make it executable so that it can be run as a quick action. Open the Terminal and navigate to the directory where you saved your script, then run the following command:
chmod +x transcribe.sh
Your quick action script is now ready to be integrated into Automator, which we'll cover in the next section.
Summary:
We loaded the Zsh configuration for proper execution.
We created a loop to handle multiple audio files.
We used the Whisper command within the loop for transcription.
We specified the output directory for the transcriptions.
Finally, we made the script executable.
By completing these steps, you've built the core logic for your macOS quick action. This will allow you to transcribe audio files effortlessly right from your Finder window.
Step 3: Creating the Quick Action in Automator
Creating a Quick Action in Automator allows you to run your script directly from the macOS Finder. This section will walk you through each step.
3.1 Open Automator App
Search for Automator: Open Spotlight by pressing Cmd + Space and type "Automator."
Launch Automator: Click on the Automator app to open it.
3.2 Configure Quick Action
Create a New Workflow: After launching Automator, select New Document and then choose Quick Action.
Set Workflow Parameters:
Workflow receives current: Choose "audio files" from the dropdown. This ensures the quick action only appears for audio files.
Image and Color: You can customize the icon and color for your quick action here, but if you like it simple, black works just fine.
3.3 Add Shell Script
Find Shell Script Option: In the left-hand panel, search for "Shell Script" in the Actions search bar.
Drag and Drop: Drag the "Run Shell Script" action to the workflow area on the right.
Here's what you need to focus on:
Shell: Make sure the shell is set to /bin/zsh.
Pass input: Choose "as arguments" from the dropdown. This ensures that the audio files you select in Finder will be passed as arguments to the script.
3.4 Set Input and Arguments
You'll need to call the script you created earlier from this shell script action. Rather than pasting the whole script, simply invoke the file you've already created.
Invoke the Script: In the code area of the "Run Shell Script" action, enter the full path to your script followed by "$@".
For example:
/path/to/transcribe.sh "$@"
"$@" ensures all the selected audio files are passed as arguments.
3.5 Save and Name
Save Your Quick Action: Hit Cmd + S to bring up the save dialog.
Name It: Give your quick action a meaningful name like "Transcribe with Whisper."
Confirm: Click Save.
And there you have it! Your Quick Action is ready to use. The next time you right-click an audio file in Finder, you'll see your new Quick Action as an option for easy and fast transcription.
Step 4: Using Your Quick Action
Now that you've set everything up, it's time to put your Quick Action to use. It's a straightforward process, and once you get the hang of it, transcribing audio files will be a breeze.
Locating the Quick Action
Navigate to Finder: Open a Finder window and locate the audio file you want to transcribe.
Right-click the Audio File: Right-click (or Control-click, if you're using a single-button mouse) on the audio file to bring up the context menu.
Find Your Quick Action: Scroll down to the 'Quick Actions' section of the context menu and find the Quick Action you created. It should be named whatever you chose during the Automator setup process.
Running the Quick Action
Select Your Quick Action: Once you've located your custom Quick Action, click on it.
Wait for the Transcription: While the Quick Action runs, macOS typically shows a spinning gear icon in the menu bar. The script from this guide gives no other feedback unless you add some. Wait for it to complete.
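If you want explicit feedback when a run finishes, one option is a macOS notification sent through osascript. This is a minimal sketch: the helper names notification_script and notify are hypothetical, and the osascript call is guarded so the function does nothing where that tool is unavailable.

```shell
# Hypothetical helper: build the AppleScript for a notification.
# $1 = message, $2 = title. Double quotes and backslashes are removed
# so both strings stay valid AppleScript literals.
notification_script() {
  msg=$(printf '%s' "$1" | tr -d '"\\')
  title=$(printf '%s' "$2" | tr -d '"\\')
  printf 'display notification "%s" with title "%s"' "$msg" "$title"
}

# Hypothetical helper: show the notification if osascript is available.
notify() {
  if command -v osascript >/dev/null 2>&1; then
    osascript -e "$(notification_script "$1" "$2")"
  fi
}

# At the end of transcribe.sh, after the loop:
#   notify "Transcription finished" "Whisper"
```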
Checking the Output
Same Directory: By default, the transcriptions will be saved in the same directory as the source audio file.
File Formats: The transcriptions will be available in multiple formats including TXT, SRT, VTT, JSON, and TSV.
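To check that a run produced everything, you can list the file names to look for. The sketch below assumes Whisper's default output format (all) and a release that names each output after the audio file's base name without its extension. Older releases may have used a different naming scheme, so treat the result as a guide. The function name expected_outputs is hypothetical.

```shell
# Hypothetical helper: print the transcription files expected for one audio file.
expected_outputs() {
  dir=$(dirname "$1")
  base=$(basename "$1")
  base=${base%.*}                 # drop the last extension, e.g. ".mp3"
  for ext in txt srt vtt json tsv; do
    printf '%s/%s.%s\n' "$dir" "$base" "$ext"
  done
}

# Example: expected_outputs "$HOME/Recordings/interview.m4a"
```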
Troubleshooting
Check Terminal Output: If the Quick Action doesn’t seem to be working, you can open Terminal and run your script manually to see any error messages.
Refresh Finder: Sometimes, you may need to refresh the Finder window to see the newly created transcription files.
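Because Automator does not display the script's output, another way to see what went wrong is to capture it in a log file. This is a minimal sketch, and the helper name run_logged and the default log location are assumptions.

```shell
# Assumed log location; override LOG_FILE to write somewhere else.
LOG_FILE="${LOG_FILE:-$HOME/transcribe.log}"

# Hypothetical helper: run a command, appending its stdout and stderr to the
# log, and return the command's own exit status.
run_logged() {
  echo "--- $(date): $*" >>"$LOG_FILE"
  "$@" >>"$LOG_FILE" 2>&1
}

# Inside the loop, the whisper call could be wrapped like this:
#   run_logged whisper "$audio_file" --language English \
#     --output_dir "$(dirname "$audio_file")"
```

After a failed run, opening the log file shows the error message Whisper printed.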
Tips
Batch Transcription: This Quick Action works on multiple files. Just select all the audio files you want to transcribe, right-click, and run your Quick Action.
Change Output Directory: If you want to save the transcriptions in a different directory, change the --output_dir argument in your transcribe.sh script.
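As one example of changing the output directory, the sketch below collects transcriptions in a transcripts subfolder next to each audio file. The helper name output_dir_for and the folder name are hypothetical.

```shell
# Hypothetical helper: create a "transcripts" folder beside the audio file
# and print its path.
output_dir_for() {
  out_dir="$(dirname "$1")/transcripts"
  mkdir -p "$out_dir" && printf '%s\n' "$out_dir"
}

# In transcribe.sh, the whisper call would become:
#   whisper "$audio_file" --language English \
#     --output_dir "$(output_dir_for "$audio_file")"
```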
That’s it! You now know how to transcribe audio files directly from Finder using your custom Quick Action. It’s a simple and efficient method that removes the need for any third-party apps or services for this particular task.
Conclusion
Congratulations, you've just streamlined your audio transcription workflow by creating a quick action in the macOS Finder using OpenAI's Whisper model. Let's recap the main points:
Efficiency: Transcribing audio files is now as simple as a right-click in the Finder.
Customization: You can tailor the settings, such as specifying the language, to meet your specific needs.
Multiple Formats: Your transcriptions will be available in various formats like text, SRT, VTT, JSON, and TSV. This offers flexibility for different use-cases.
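Beyond the language, the Whisper command line also accepts options such as --model and --output_format. Here is a minimal sketch of a wrapper that exposes both. The function name transcribe_with_options, its default values, and the WHISPER_BIN dry-run hook are assumptions for illustration.

```shell
# Hypothetical wrapper: transcribe_with_options FILE [MODEL] [FORMAT]
# WHISPER_BIN is an illustrative hook for dry runs, not a Whisper setting.
transcribe_with_options() {
  audio_file=$1
  model=${2:-small}      # a Whisper model size, e.g. tiny, base, small, medium
  format=${3:-all}       # txt, vtt, srt, tsv, json, or all
  "${WHISPER_BIN:-whisper}" "$audio_file" \
    --language English \
    --model "$model" \
    --output_format "$format" \
    --output_dir "$(dirname "$audio_file")"
}

# Example: subtitles only, using the base model
#   transcribe_with_options interview.m4a base srt
```

Smaller models generally run faster at some cost in accuracy, so the choice depends on your hardware and recordings.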
Next Steps
Experiment: Feel free to modify the script to include additional features or settings you may find useful.
Share: If you find this useful, consider sharing it with colleagues or contributing to community forums.
Learn More: To dive deeper into the tools and technologies used, see the Additional Resources section at the end of this post.
By incorporating this quick action into your routine, you've made a complex task straightforward and easily accessible right from your Finder. Whether you're well-versed in command line operations or new to the Terminal, this guide offers a simplified approach to audio transcription.
That's it! If you liked this guide, don't forget to subscribe for more content.
Additional Resources
If you're looking to dive deeper into the topics covered in this blog post, below are some valuable resources that can help you expand your knowledge.
Package Managers
Homebrew: If you're new to Homebrew, the package manager for macOS, their official website offers excellent guides and documentation.
# To install Homebrew, run this command in your Terminal
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Transcription Models
OpenAI's Whisper: OpenAI has comprehensive documentation for their Whisper ASR (Automatic Speech Recognition) system. Visit their official documentation for more details.
Shell Configuration
Zsh Configuration: If you're new to Zsh or shell scripting, this introductory guide can be a good starting point.
# To edit your Zsh configuration file
nano ~/.zshrc
Automator and Quick Actions
Automator for macOS: For a beginner-friendly guide on Automator, Apple's official support page is a great resource.
Quick Actions: To understand more about quick actions in Finder, check out this tutorial.
Learning More About Terminal
Command Line Basics: For those who are new to the Terminal, this Beginner’s Guide to the Mac OS Terminal can be incredibly helpful.
Feel free to explore these resources to become more proficient in handling transcriptions, scripting, and automations on your macOS system.