SingleApi

Internet, programming, artificial intelligence

Offline Whisper Audio Transcription and Ollama-Voice Assistant

Posted on September 23, 2024

WhisperLive is a real-time transcription application that uses OpenAI's Whisper model to convert speech to text. It can transcribe both live microphone input and pre-recorded audio files.

To start the server side on a Mac, run:

from whisper_live.server import TranscriptionServer
server = TranscriptionServer()
server.run("0.0.0.0", 9090)

This snippet creates a TranscriptionServer instance and starts it listening on port 9090 on all network interfaces (0.0.0.0). On the client side, use:

from whisper_live.client import TranscriptionClient
client = TranscriptionClient("localhost", 9090, model="base.en")
client()
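
The client can only connect once the server is listening, so a script that launches both may want to wait for port 9090 first. The following is a minimal sketch using only the Python standard library; wait_for_port is a hypothetical helper introduced here for illustration and is not part of WhisperLive.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; not part of WhisperLive.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A launcher script could call wait_for_port("localhost", 9090) and create the TranscriptionClient only when it returns True.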

Before running either snippet, make sure the dependencies are installed on your Mac. PortAudio is available through Homebrew, and WhisperLive is distributed as a Python package:

brew install portaudio
pip install whisper-live

Once the model weights have been downloaded, this setup runs without internet access, which makes it a good fit for projects that need real-time transcription offline.

Using Ollama-Voice for Whisper Audio Transcription

Ollama-voice combines three tools that work together entirely offline, which suits applications where internet connectivity is limited or unavailable. The system consists of:

Speech Recognition: Whisper running local models in offline mode. Whisper transcribes speech to text locally on the device, without any online connection.

whisper audio.wav --model your_model_name --language en

This command transcribes an audio file with a local Whisper model. Replace “your_model_name” with the name of the model being used; audio.wav is a placeholder for your own recording.

Large Language Model: Ollama running local models in offline mode. Ollama generates a human-like response to the transcribed input.

ollama run your_model_name

As with Whisper, this command starts a local model. Again, replace “your_model_name” with the name of the model being used.

Offline Text To Speech: pyttsx3 converts text into speech without an internet connection, so the system can produce audible responses even when it is offline.

import pyttsx3
engine = pyttsx3.init()
engine.say('Hello, World!')
engine.runAndWait()

This code initializes pyttsx3 with the platform's default driver (nsss on macOS, sapi5 on Windows, espeak on Linux) and speaks a simple “Hello, World!” message; runAndWait() blocks until the queued speech has finished. In a more complete setup, the same calls would speak the text generated by the Ollama model.
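
The speech-recognition and language-model stages can also be driven from a Python script rather than typed by hand. The sketch below is one possible approach, not how Ollama-voice itself is implemented: it shells out to the Whisper command-line tool and calls Ollama's local HTTP API at its default address, http://localhost:11434/api/generate. The function names are hypothetical, and the transcript file name assumes Whisper writes a .txt file named after the audio file into the output directory.

```python
import json
import subprocess
import tempfile
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_whisper_command(audio_path: str, model: str, language: str, output_dir: str) -> list[str]:
    """Build the argument list for the Whisper CLI (plain-text output only)."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", language,
        "--output_format", "txt",
        "--output_dir", output_dir,
    ]


def transcribe_file(audio_path: str, model: str = "base.en", language: str = "en") -> str:
    """Run the Whisper CLI and return the transcript text."""
    with tempfile.TemporaryDirectory() as output_dir:
        subprocess.run(build_whisper_command(audio_path, model, language, output_dir), check=True)
        # Assumption: the CLI names the transcript after the audio file's stem.
        transcript = Path(output_dir) / (Path(audio_path).stem + ".txt")
        return transcript.read_text(encoding="utf-8").strip()


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a locally running Ollama server and return its reply text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["response"]
```

With Whisper installed and an Ollama server running, ask_ollama("your_model_name", transcribe_file("audio.wav")) would return the model's answer to the spoken question; both names are placeholders.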

By combining these three tools in offline mode, you can build a complete voice assistant pipeline: speech recognition, language-model response generation, and text-to-speech. This setup is particularly useful for applications that must run locally without internet access.
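
How the three stages hand data to one another can be sketched as a single conversational turn. This is an illustrative outline rather than Ollama-voice's actual code: run_turn is a hypothetical function, and the three stages are passed in as plain callables so the example runs with stand-ins even when Whisper, Ollama, and pyttsx3 are not installed.

```python
from typing import Callable


def run_turn(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one listen -> think -> speak turn and return the reply text."""
    question = transcribe(audio_path).strip()
    if not question:
        return ""  # nothing was heard, so skip the LLM and TTS stages
    reply = generate(question).strip()
    if reply:
        speak(reply)
    return reply


# Stand-in stages: each lambda takes the place of a real Whisper, Ollama,
# or pyttsx3 call. "question.wav" is a placeholder file name.
spoken: list[str] = []
reply = run_turn(
    "question.wav",
    transcribe=lambda path: " What is two plus two? ",
    generate=lambda text: f"You asked: {text}",
    speak=spoken.append,
)
print(reply)  # You asked: What is two plus two?
```

Passing the stages as arguments keeps the turn logic independent of any one library, so a different speech recognizer or text-to-speech engine can be swapped in without touching run_turn.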
