Who Said What? Coping with hundreds of hours of audio evidence

Recently, I have had multiple cases where a substantial portion of the evidence was from body-worn video or recorded phone calls. In one case, a dozen police officers spent several hours executing a search warrant and questioning the home's residents. In another case, the investigating officer interviewed twenty witnesses about different aspects of the case. The sheer volume of audio evidence in cases like these makes it a challenge to analyze efficiently, let alone keep all the facts straight.

In cases like these, artificial intelligence transcription services can be a game-changer. At Lucid Truth Technologies, we harness technology to help uncover the truth. This article is a high-level overview of how AI transcription works and some associated considerations.

Some companies offer AI transcription models as a service, typically charging $0.60 per recorded hour. Subscribers can upload supported audio or video files through an application programming interface (API) and receive the transcribed data back. An API is a documented, standardized way for one program to communicate with another. We have created a program that invokes the API and processes hundreds of hours of audio overnight in a batch, keeping track of all the source file details and formatting the results meaningfully for investigators and lawyers.
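
As a rough illustration of that batch workflow, here is a minimal Python sketch. It is not our production program. The `transcribe` argument is a stand-in for whatever client a given transcription service provides, and the names `SourceRecord`, `run_batch`, and `estimate_cost` are hypothetical. The rate is the typical $0.60 per recorded hour mentioned above.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable

RATE_PER_HOUR = 0.60  # typical price quoted above, in US dollars


@dataclass
class SourceRecord:
    """Details kept for each source file so a transcript can be traced back to it."""
    path: str
    size_bytes: int
    sha256: str
    transcript: str


def estimate_cost(total_hours: float, rate: float = RATE_PER_HOUR) -> float:
    """Estimated service charge for a given number of recorded hours."""
    return round(total_hours * rate, 2)


def run_batch(files: Iterable[Path], transcribe: Callable[[Path], str]) -> list[SourceRecord]:
    """Pass each file to the (hypothetical) transcription client and record its details."""
    records = []
    for path in files:
        data = path.read_bytes()
        records.append(
            SourceRecord(
                path=str(path),
                size_bytes=len(data),
                # A hash of the file contents ties the transcript to one exact recording.
                sha256=hashlib.sha256(data).hexdigest(),
                transcript=transcribe(path),
            )
        )
    return records
```

At that rate, `estimate_cost(300)` puts 300 hours of recordings at about $180 in service charges. Storing a hash alongside each transcript is one possible way to show later which exact file a transcript came from.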

Speaker diarization is the technical term for detecting the different speakers in a recording and attributing to each speaker what he or she said. This is particularly important in the case of witness interviews. Always remember that this technology is imperfect and may conflate speakers if everyone starts talking at once. Therefore, submitting a transcript generated by an AI model as evidence is inadvisable. Instead, use the transcript to identify the portions of the recorded audio that are of the greatest evidentiary interest.

As two or more speakers take turns in a conversation, the statements made during each turn are called an "utterance." The transcriptions created by our software include the start and end timestamps of each utterance, making it easy to review that portion of the audio or video recording in context. A future version will include a hyperlink to that portion of the recording.
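
To make the idea of a timestamped utterance concrete, here is a minimal Python sketch of how one might be represented and printed. The field names, the millisecond units, and the output layout are assumptions made for illustration. They are not the format of any particular transcription service, nor necessarily the layout our software produces.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # label assigned by diarization, e.g. "A" or "B"
    start_ms: int  # offset from the start of the recording, in milliseconds
    end_ms: int    # offset at which the speaker's turn ends
    text: str


def to_timestamp(ms: int) -> str:
    """Convert milliseconds to HH:MM:SS for locating a passage in a media player."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def format_utterance(u: Utterance) -> str:
    """Render one speaker turn as a single transcript line."""
    return f"[{to_timestamp(u.start_ms)} - {to_timestamp(u.end_ms)}] Speaker {u.speaker}: {u.text}"
```

For example, a turn by speaker A running from 83,000 ms to 90,500 ms would be printed with the range `[00:01:23 - 00:01:30]`, which tells a reviewer where to cue up the recording.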

Of course, once you have the transcript, it can be searched, indexed, color-coded, and summarized using the same text-processing techniques applied to the rest of the case data. If AI transcription is something that may assist your case, reach out to us, and we will be glad to help.