Install

npx clawhub@latest install auto-whisper-safe

Full SKILL.md

Open original

Metadata table.

name	version	description	tags
auto-whisper-safe	1.0.0	RAM-safe voice transcription with auto-chunking — works on 16GB machines without crashes	whispertranscriptionvoiceaudioram-safe

SKILL.md content below is scrollable.

Auto-Whisper Safe — RAM-Friendly Voice Transcription

Name: Auto Whisper Safe OpenClaw Plugin
Author: neal-collab

Transcribe voice messages and long audio files using OpenAI Whisper without crashing your machine. Designed for 16GB RAM systems running other processes (like OpenClaw agents).

The Problem

Whisper's turbo and large models use 6-10GB RAM. On a 16GB machine running OpenClaw + Ollama + other services, this causes OOM crashes. Existing Whisper skills don't handle this.

The Solution

Auto-detects audio length via ffprobe
Splits long audio (>10min) into 10-min chunks automatically
Uses base model by default (~1.5GB RAM — safe on any 16GB machine)
Merges transcripts seamlessly — no gaps, no duplicates
Cleans up temp files automatically

Usage

# Basic usage
./transcribe.sh /path/to/audio.ogg

# Custom model (if you have more RAM)
WHISPER_MODEL=small ./transcribe.sh /path/to/audio.ogg

# Custom language
WHISPER_LANG=en ./transcribe.sh /path/to/audio.ogg

# Custom output directory
./transcribe.sh /path/to/audio.ogg /path/to/output/

RAM Usage by Model

Model	RAM	Speed	Accuracy	Recommended For
`tiny`	~1GB	⚡⚡⚡	★★	Quick previews, low-RAM systems
`base`	~1.5GB	⚡⚡	★★★	Default — best balance ✅
`small`	~2.5GB	⚡	★★★★	When accuracy matters more
`medium`	~5GB	🐢	★★★★★	32GB+ RAM only
`turbo`	~6GB	🐢🐢	★★★★★	Dedicated transcription machines

OpenClaw Integration

Add to your agent's BOOTSTRAP.md:

## Voice Message Handling

When you receive `<media:audio>`, ALWAYS transcribe first:

1. Run: `./skills/auto-whisper-safe/transcribe.sh <audio-path>`
2. Read the output transcript file
3. Respond based on the transcribed content

Do this automatically — voice messages are meant to be transcribed.

Environment Variables

Variable	Default	Description
`WHISPER_MODEL`	`base`	Whisper model size
`WHISPER_LANG`	`en`	Audio language (ISO code)

How Chunking Works

Audio ≤10min → transcribed directly (no splitting)
Audio >10min → split into 10-min segments via ffmpeg
Each segment transcribed independently
Transcripts concatenated in order
Temp files cleaned up on exit (even on errors)

Installation

# macOS
brew install openai-whisper ffmpeg

# Ubuntu/Debian
pip install openai-whisper
apt install ffmpeg

# Verify
whisper --help && ffmpeg -version

Why This Over Other Whisper Skills

✅ RAM-safe: Won't crash your 16GB machine
✅ Auto-chunking: Handles 1-hour podcasts without issues
✅ Cleanup: No temp files left behind
✅ Progress: Shows chunk-by-chunk progress
✅ Configurable: Model + language via env vars
✅ OpenClaw-native: Drop-in for any agent's BOOTSTRAP.md

Real-World Performance

Tested on Ubuntu 22.04, 16GB RAM, running OpenClaw (10 agents) + Ollama simultaneously:

Audio Length	Model	RAM Peak	Time	Result
2 min voice memo	base	1.4GB	~15s	✅ Perfect
12 min podcast clip	base	1.5GB (chunked)	~90s	✅ 2 chunks, seamless
45 min interview	base	1.5GB (chunked)	~6min	✅ 5 chunks, seamless
2 min voice memo	tiny	0.9GB	~8s	✅ Good enough for quick reads

Supported Audio Formats

ffmpeg handles the conversion, so virtually any format works:

✅ .ogg (Telegram voice messages)
✅ .mp3, .m4a, .wav, .flac
✅ .webm (browser recordings)
✅ .opus (WhatsApp voice messages)

Changelog

v1.0.0

Initial release
Auto-chunking for long audio (>10min)
RAM-safe defaults (base model, 1.5GB)
Progress tracking per chunk
Automatic temp file cleanup
Configurable model and language

Original Repository URL: https://github.com/openclaw/skills/blob/main/skills/neal-collab/auto-whisper-safe

Latest commit: https://github.com/openclaw/skills/commit/d8c21e42e07ff4d1163b87fbd348f89913de6464

Auto Whisper Safe OpenClaw Plugin & Skill | ClawHub

What this skill does

Install

Full SKILL.md

Auto-Whisper Safe — RAM-Friendly Voice Transcription

The Problem

The Solution

Usage

RAM Usage by Model

OpenClaw Integration

Environment Variables

How Chunking Works

Installation

Why This Over Other Whisper Skills

Real-World Performance

Supported Audio Formats

Changelog

v1.0.0

Related skills

addis-assistant-stt

agent-voice

akaunting

alexa-cli

announcer

assemblyai-transcribe