Overview
Enter a single word to find podcast episodes that contain it. The tool uses machine learning to transcribe, not analyze, 4,634 recent podcast episodes in German and English. The transcripts are generated with OpenAI's Whisper "tiny" model and SYSTRAN's Faster Whisper, then stored in a database that the search function queries. Enter a word, and the system lists the episodes in which it occurs. You can then play an episode directly by pressing the "listen" button.
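To illustrate the store-and-search step, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names build_index and search, and the choice of SQLite are assumptions made for this example; the project only states that transcripts are kept in a SQL database.

```python
import re
import sqlite3


def build_index(conn, episodes):
    """Store (title, transcript) pairs and index every distinct word per episode.

    The schema is hypothetical, chosen only for this sketch.
    """
    conn.execute(
        "CREATE TABLE episodes (id INTEGER PRIMARY KEY, title TEXT, transcript TEXT)"
    )
    conn.execute(
        "CREATE TABLE words (word TEXT, episode_id INTEGER, "
        "PRIMARY KEY (word, episode_id))"
    )
    for title, transcript in episodes:
        cur = conn.execute(
            "INSERT INTO episodes (title, transcript) VALUES (?, ?)",
            (title, transcript),
        )
        # \w+ is Unicode-aware in Python 3, so German umlauts stay inside words.
        tokens = set(re.findall(r"\w+", transcript.lower()))
        conn.executemany(
            "INSERT INTO words (word, episode_id) VALUES (?, ?)",
            [(token, cur.lastrowid) for token in tokens],
        )
    conn.commit()


def search(conn, word):
    """Return titles of episodes whose transcript contains the exact word."""
    rows = conn.execute(
        "SELECT e.title FROM words w "
        "JOIN episodes e ON e.id = w.episode_id "
        "WHERE w.word = ? ORDER BY e.id",
        (word.strip().lower(),),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, [("Episode A", "Today we talk about coffee and tea.")])
    print(search(conn, "coffee"))
```

Indexing whole words rather than scanning transcripts with LIKE keeps each lookup to a single key match, which fits the single-word search described above. It also means partial words such as "cof" return nothing.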
Technical Details
- Transcription Tools: SYSTRAN Faster Whisper, OpenAI Whisper
- Technology Stack: Python, SQL, Flask, Bootstrap, HTML, CSS, JavaScript
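As a rough illustration of the transcription step, the sketch below uses the faster-whisper Python package with the "tiny" model. The helper names (join_texts, transcribe_episode), the CPU device, and the int8 compute type are assumptions for this example, not the project's confirmed settings.

```python
def join_texts(texts):
    """Join segment texts into one transcript string, skipping empty pieces."""
    cleaned = (text.strip() for text in texts)
    return " ".join(piece for piece in cleaned if piece)


def transcribe_episode(model, audio_path):
    """Transcribe one audio file; returns (detected language code, transcript).

    `model` is expected to be a faster_whisper.WhisperModel instance.
    """
    # transcribe() yields segments lazily, so the work happens while joining.
    segments, info = model.transcribe(audio_path)
    return info.language, join_texts(segment.text for segment in segments)


if __name__ == "__main__":
    # Third-party dependency: pip install faster-whisper
    from faster_whisper import WhisperModel

    # Load the model once and reuse it for every episode.
    model = WhisperModel("tiny", device="cpu", compute_type="int8")
    # "episode.mp3" is a placeholder path for this sketch.
    language, transcript = transcribe_episode(model, "episode.mp3")
    print(language, transcript[:200])
```

Whisper detects the spoken language per file, which is one plausible way to handle a mixed German and English collection without tagging episodes by hand.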
Why is it a Proof of Concept?
- Scope Limitation: While 4,634 episodes may seem extensive, they are a small fraction of the podcast episodes that could be transcribed.
- Model Efficiency: The "tiny" model from OpenAI's Whisper is used for its speed, even though it is less accurate than the larger Whisper models. Transcribing the current dataset took three full days of continuous PC operation.
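The figures above allow a back-of-the-envelope throughput estimate. The sketch below assumes that "three full days" means exactly 72 hours and that transcription time scales linearly with the number of episodes; both are simplifications, and days_needed is a hypothetical helper.

```python
EPISODES = 4634
DAYS = 3
SECONDS_PER_DAY = 24 * 60 * 60

# Average wall-clock time spent per episode, assuming exactly 72 hours in total.
seconds_per_episode = DAYS * SECONDS_PER_DAY / EPISODES


def days_needed(episode_count):
    """Estimate transcription time in days, assuming linear scaling."""
    return episode_count * seconds_per_episode / SECONDS_PER_DAY


print(f"{seconds_per_episode:.1f} seconds per episode")        # about 55.9
print(f"{days_needed(100_000):.1f} days for 100,000 episodes")  # about 64.7
```

At under a minute per episode on a single PC, a collection of 100,000 episodes would still take roughly two months, which shows why the current dataset is only a proof of concept.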
Potential Use Cases
This search tool can be useful in a variety of scenarios, including:
- Podcast fans can search for their favorite topics to find episodes relevant to their interests.
- Researchers can quickly locate specific topics or discussions within a large body of podcast content, facilitating qualitative research.
- Journalists can use this tool to track how frequently certain topics are discussed or to find quotes and opinions from specific episodes.
- Language learners can find podcasts where specific words are used, helping them understand the context and usage in natural conversations.
Next Steps
This project was developed as a weekend project by jimmydigital.de. It is primarily a demonstration, but feedback is welcome. If you have suggestions for additional features or have noticed any bugs, please contact me at [email protected].