If you are a knowledge worker, you probably learn a lot of stuff by watching educational videos.
And LLMs can help you get more out of the time you invest in watching these videos.
Here are some examples:
1 Format the autogenerated transcript
One of the best transcription services is TurboScribe, which offers unlimited transcription for a flat rate.
However, the transcription is not always 100% accurate, and you may also want to remove the ums and the ahs.
You can simply provide the transcript to the LLM, and it will return a well-formatted transcript without the verbal tics and with most of the misrecognized words fixed.
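Long transcripts usually need to be sent in pieces. Here is a minimal sketch of that step, assuming you split on line breaks and wrap each piece in a cleanup prompt. The function names, the prompt wording and the 4000-character limit are my own placeholders, not anything TurboScribe or a particular LLM requires, and the actual LLM call is left out.

```python
# Hypothetical helper names; the prompt text and size limit are assumptions.
CLEANUP_INSTRUCTIONS = (
    "Clean up the transcript below. Remove filler words such as 'um' and 'ah', "
    "fix words that look misrecognized, add punctuation and paragraph breaks, "
    "and do not add or remove any content."
)


def split_into_chunks(transcript: str, max_chars: int = 4000) -> list[str]:
    """Group transcript lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for line in transcript.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if current and len(current) + len(line) + 1 > max_chars:
            chunks.append(current)
            current = line
        else:
            current = f"{current}\n{line}" if current else line
    if current:
        chunks.append(current)
    return chunks


def build_cleanup_prompt(chunk: str) -> str:
    """Return the text you would send to the LLM for one chunk."""
    return f"{CLEANUP_INSTRUCTIONS}\n\nTranscript:\n{chunk}"


if __name__ == "__main__":
    raw = "Um so today we will, ah, talk about embeddings.\n\nThey are, um, vectors."
    for chunk in split_into_chunks(raw, max_chars=60):
        print(build_cleanup_prompt(chunk))
        print("=" * 20)
```

You would send each prompt to the LLM of your choice and join the cleaned chunks back together in order.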
2 Identify the speaker names from the transcript
TurboScribe also provides speaker diarization, meaning it can identify the different voices and assign them labels as ‘Speaker 1’, ‘Speaker 2’ etc.
Since most speakers either introduce themselves or are introduced by the host (for example, in a podcast), you can send the diarized transcript to the LLM in a consistent format and ask it to match each label to a name. The LLM will usually identify the speaker names correctly.
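As a rough sketch of how this could work, assume the transcript uses labels like 'Speaker 1' and you ask the LLM to reply with a JSON object that maps each label to a name. The prompt wording, the JSON reply format and the function names are my assumptions. The reply in the example is made up by hand, not real model output.

```python
import json
import re


def build_speaker_prompt(transcript: str) -> str:
    """Ask for a JSON mapping from diarization labels to names (assumed format)."""
    return (
        "Below is a diarized transcript with labels such as 'Speaker 1'. "
        "Using the introductions in the text, reply with only a JSON object "
        "that maps each label to the speaker's name, or to null if unclear.\n\n"
        + transcript
    )


def apply_speaker_names(transcript: str, llm_reply: str) -> str:
    """Replace each 'Speaker N' label with the name from the LLM's JSON reply."""
    mapping = json.loads(llm_reply)

    def rename(match: re.Match) -> str:
        label = match.group(0)
        # Keep the original label when the name is missing or null.
        return mapping.get(label) or label

    # \b keeps 'Speaker 1' from matching inside 'Speaker 10'.
    return re.sub(r"\bSpeaker \d+\b", rename, transcript)


if __name__ == "__main__":
    transcript = (
        "Speaker 1: Welcome to the show, I'm Ann.\n"
        "Speaker 2: Thanks Ann, Bob here."
    )
    # Hand-written stand-in for what the LLM might return.
    reply = '{"Speaker 1": "Ann", "Speaker 2": "Bob"}'
    print(apply_speaker_names(transcript, reply))
```

Asking for JSON keeps the reply easy to parse. It is still worth checking the names by eye, since the LLM can guess wrong when nobody is introduced.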
3 Automatically generate chapters from the transcript
You can also send the transcript to the LLM and ask it to automatically generate chapters for the given transcript. LLMs have become surprisingly good at this task.
You can combine the chapter outline with the transcript and generate something I call an OutScript. This OutScript is easy to skim, so you can quickly preview as well as recall the contents of a video.
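To make the idea concrete, here is one way the combining step could look. It assumes the chapters come back as (start time in seconds, title) pairs and the transcript is a list of (start time in seconds, text) segments. The layout, a chapter heading followed by its indented transcript lines, is only my illustration of a skimmable format, and the function name is a placeholder.

```python
from bisect import bisect_right


def build_outscript(chapters, segments):
    """Place each transcript segment under the chapter it falls in.

    chapters: list of (start_seconds, title)
    segments: list of (start_seconds, text)
    """
    if not chapters:
        return "\n".join(text for _, text in sorted(segments))

    chapters = sorted(chapters)
    starts = [start for start, _ in chapters]
    grouped = [[] for _ in chapters]

    for seg_start, text in sorted(segments):
        # Last chapter that starts at or before this segment; segments that
        # come before the first chapter are placed under the first chapter.
        index = max(bisect_right(starts, seg_start) - 1, 0)
        grouped[index].append(text)

    lines = []
    for (start, title), texts in zip(chapters, grouped):
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {title}")
        lines.extend(f"    {text}" for text in texts)
    return "\n".join(lines)


if __name__ == "__main__":
    chapters = [(0, "Intro"), (90, "Main idea")]
    segments = [(0, "Hello."), (45, "Today's topic."), (95, "The key point.")]
    print(build_outscript(chapters, segments))
```

The chapter headings give you the preview, and the indented lines underneath let you drill down when you want to recall a specific part.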
4 GPT search using Retrieval Augmented Generation
Of course, the most important use case for an LLM in this context is the ability to search across videos and get a single answer that draws on different sections or paragraphs of those videos.
This is commonly known as Retrieval Augmented Generation, or RAG. The relevant sections of the video transcripts are first identified based on some kind of similarity to the question asked. Those chunks are then merged and sent to the LLM, which uses them as the basis for building the final response.
RAG is especially helpful for videos, and you will find that it is usually far superior to simple keyword search.
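The retrieve-then-merge steps can be sketched in a few lines. To keep the example self-contained, it scores similarity with plain word counts and cosine similarity, which is a simplification on my part. Real RAG setups typically use embeddings from a model instead. The function names and the prompt wording are also assumptions, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k transcript chunks most similar to the question."""
    query = word_counts(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, word_counts(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(question: str, chunks: list[str], top_k: int = 2) -> str:
    """Merge the retrieved chunks into one prompt for the LLM."""
    context = "\n---\n".join(retrieve(question, chunks, top_k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        "Gradient descent updates the weights using the learning rate.",
        "Our host thanks this week's sponsors.",
        "A high learning rate can make training diverge.",
    ]
    print(build_rag_prompt("What does the learning rate do?", chunks))
```

In this toy example the two chunks about the learning rate are retrieved and the sponsor message is left out, which is the behaviour you want from the retrieval step.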
You can implement this yourself by writing some code, but it will likely be quite error-prone and not easy to maintain.
A good alternative is to convert all the transcripts to PDF format and use ChatPDF to do a folder search. This is equivalent to doing RAG over the set of video transcripts.
In my Text Analytics using LLMs course, I have a bonus chapter which explains how you can do each step and implement custom GPT search over your video transcripts.