A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
Updated
Jun 3, 2024 - Python
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Synchronized Translation for Videos. Video dubbing
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
how to compress large knowledge base (.mp4, .mp3, .wav) and transfer it into readable, short, summarized form for effective knowledge transfer
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
faster_whisper GUI with PySide6
Comprehensive qualitative data analysis software for UX research. User interview tagging, AI-supported analysis, team management, etc.
A PyTorch-based Speech Toolkit
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Production First and Production Ready End-to-End Speech Recognition Toolkit
Official Python SDK for Deepgram's automated speech recognition APIs.
The AssemblyAI JavaScript SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
An audio labeling tool for transcription, diarization, VAD, and more.
Speech-to-text server framework with next-gen Kaldi
The AssemblyAI Ruby SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.
The AssemblyAI Java SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.
Add a description, image, and links to the asr topic page so that developers can more easily learn about it.
To associate your repository with the asr topic, visit your repo's landing page and select "manage topics."