AI assisted transcription and subtitle creation

bonmrmjp
Posts: 36
Joined: Mon Jun 27, 2022 4:45 am

Post by bonmrmjp »   0 likes

It's pretty much working. I put together an example of it in action. Here's a zip with the audio from "Tjejernas Omklädningsrum" along with the dummy subtitle file. You can run the commands in Google Colaboratory.

https://mega.nz/file/iVwWCLiI#vYduuuMIs ... lGBGpY2sBE

ghost
Site Admin
Posts: 8602
Joined: Sun Mar 07, 2004 1:00 am

Post by ghost »   1 likes

bonmrmjp wrote: Mon May 01, 2023 12:19 pm Its pretty much working. I put together an example of it in action. Here's a zip with the audio from "Tjejernas Omklädningsrum" along with the dummy subtitle file. You can run the commands in with Google Colaboratory

https://mega.nz/file/iVwWCLiI#vYduuuMIs ... lGBGpY2sBE
I'll try it. Thanks a lot.

Just tell me: How did you create this dummy srt-file with SubtitleEdit? I can't find an option.

bonmrmjp
Posts: 36
Joined: Mon Jun 27, 2022 4:45 am

Post by bonmrmjp »   0 likes

I did it manually, usually by dragging the mouse over the waveform to cover the stretches where someone was talking.

ghost
Site Admin
Posts: 8602
Joined: Sun Mar 07, 2004 1:00 am

Post by ghost »   0 likes

bonmrmjp wrote: Mon May 01, 2023 2:22 pm I did it manually.. Usually by dragging the mouse over the waveform to cover when someone was talking.

That'll be a lot of work for a 90-minute movie. :o

I don't know if they changed something, but I did a new transcription today using --word_timestamps True and nearly all timings were accurate. Cues only started a few seconds too early (and ran too long) when there was a longer pause before them.
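
For anyone scripting this instead of using the command line, here is a minimal sketch of trimming a cue to its first and last word, which is one way to deal with cues that start too early after a long pause. It assumes segments shaped like the ones Whisper's Python transcribe() returns with word_timestamps=True (each segment a dict with "start", "end", "text" and a "words" list). The helper names srt_time, tighten_segment and to_srt are made up for this example and are not part of Whisper.

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def tighten_segment(segment: dict) -> tuple[float, float]:
    """Clamp a segment to its first and last word, if word timings exist."""
    words = segment.get("words") or []
    if not words:
        # No word-level timings available: keep the segment as it is.
        return segment["start"], segment["end"]
    start = max(segment["start"], words[0]["start"])
    end = min(segment["end"], words[-1]["end"])
    return start, end


def to_srt(segments: list[dict]) -> str:
    """Build SRT text from a list of segments (assumed Whisper-like dicts)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = tighten_segment(seg)
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

With real output you would pass result["segments"] to to_srt() and write the returned string to a .srt file.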

bonmrmjp
Posts: 36
Joined: Mon Jun 27, 2022 4:45 am

Post by bonmrmjp »   0 likes

You can do very long lines. So if the timing is OK, you can do a single entry for an entire scene. Or start with everything selected and just block out the sections without dialog. Though when there's a lot of noise, I've had good luck keeping clips to just the dialog.

I'll check out the word timestamps. Maybe we'll only need this approach for special cases.
Like maybe run through the movie the first time and see what it can do on its own, then use this to rescan the more challenging areas.
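
The script in the zip isn't shown in this thread, so the following is only a sketch of one way the dummy-subtitle idea could be wired up, not the actual method: read the timing lines of the dummy SRT and turn them into (start, end) ranges in seconds, which could then be used to cut the audio before transcribing. The names srt_ranges and merge_close and the 1-second merge gap are assumptions for illustration.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
CUE_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def srt_ranges(srt_text: str) -> list[tuple[float, float]]:
    """Return (start, end) in seconds for every timing line in SRT text."""
    ranges = []
    for match in CUE_RE.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        if end > start:  # skip empty or reversed cues
            ranges.append((start, end))
    return ranges


def merge_close(ranges, max_gap: float = 1.0):
    """Merge ranges that overlap or sit within max_gap seconds of each other."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging nearby cues keeps a whole scene together as one clip, while leaving long silent gaps out of what gets transcribed.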

pillowbaker
Posts: 2169
Joined: Mon Mar 07, 2022 4:05 am

Post by pillowbaker »   1 likes

So something funny happened to me while I was tinkering with the subtitles for Hilito de sangre. You can laugh at me. :D

Perhaps it was my imagination, but I thought I noticed some lines in the film that were not represented in the subtitles, and I am not talking about the lines in Chinese. So I decided to run the movie through whisper. I had recently upgraded to the new version of subtitleEdit, and I wanted to try its whisper function again. I had David's upscale, but I was trying to focus on deadman's ISO file.

So I loaded it into SE and pushed it into playing the film part of the ISO (even though it's just vlc, I thought that would work) and then launched the whisper transcription. Then I went to bed, as I normally do when I run whisper through SE.

Can you guess what happened?

You know how DVDs often have the looping animated menu, right? Yeah, the whisper audio extraction never made it past the 40 second animated menu, which looped over and over during audio extraction, its first step. So it looped and extracted, over and over and over, while I slept the night away.

I checked it the next morning before having to leave and was bamboozled to find whisper still trying to extract, 0 space on the C drive, and a few resulting errors popping up. At first I didn't know these correlated, and wondered why the heck I would have no space left on the main drive. I quit all programs, including emule and qbittorrent, and it would have to wait till I got home in the evening.

I eventually found the culprit was the massive wav file that SE was trying to extract for whisper, which had reached a whopping 180gig. Of a 40 second loop.
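
To put that number in perspective, here is a rough size check for an uncompressed WAV. The thread doesn't say which format SE extracts to, so the defaults below (16 kHz, mono, 16-bit) are an assumption, and pcm_wav_bytes and is_runaway are made-up names for this sketch.

```python
def pcm_wav_bytes(duration_s, sample_rate=16_000, channels=1, sample_width=2):
    """Size of the audio data in an uncompressed PCM WAV (header not counted).

    The defaults are assumed values, not something confirmed for SE.
    """
    return int(duration_s * sample_rate * channels * sample_width)


def is_runaway(actual_bytes, duration_s, slack=1.5, **fmt):
    """True if a file is far bigger than its source duration can explain."""
    return actual_bytes > slack * pcm_wav_bytes(duration_s, **fmt)
```

Under those assumptions, 40 seconds of audio is about 1.28 MB, so a file that is tens of gigabytes is a sure sign the extraction is looping. A script that polls the output file size against a check like is_runaway() could stop the job long before the drive fills up.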

Due to having zero drive space, my torrent client's "fastresume" files were mostly broken. I assume this is because the client announces to the tracker for every active torrent. This caused me to have over 600 torrents missing in my client when I relaunched it. It took me the evening to figure out how to reload them en masse (I don't believe I found them all, but good enough).

To top it off, my emule's "known.met" file had been corrupted and needed to be recreated. So I lost almost all upload data for everything I'd been sourcing, and had to rehash 600 files. Phew!

I've since learned how to back these things up.

Question to those familiar with emule's internal workings. I do have a backup "known.met" file back from July. Should I replace it with the new one it created? Does it make much difference?

Night457
Global Moderator
Posts: 5399
Joined: Sat Dec 28, 2019 3:44 pm

Post by Night457 »   0 likes

pillowbaker wrote: Mon Oct 16, 2023 4:45 am You can laugh at me. :D
I groaned! So much work to fix it. This is another reason I always convert an ISO to MKV: no menu to deal with.

What was the menu SAYING?
pillowbaker wrote: Question to those familiar with emule's internal workings. I do have a backup "known.met" file back from July. Should I replace it with the new one it created?
Back up the eMule/config folder regularly (it is small). Use an external drive to save it in case you do terrible things to your computer. :D When eMule gets messed up and there is nothing "known" anymore, close it first BEFORE you bother to fully rehash the files. Then copy the backup known.met into your active config folder to replace the newer tiny file. Restart eMule. This should give you hashes for everything up to the backup point. So instead of rehashing 600 files, it will only have to do the changes since then and save you lots of time.

Of course, do NOT do this if your files are already hashed and there is nothing to restore! You will just be going back to old information if you do.

So yes, you should back up the new "known.met" file. You have likely lost statistics such as how much of each file you have shared.

Naturally, I forget to keep my backups up-to-date. I have spent hours rehashing hundreds of files because the backup is so old.

Some day, I MIGHT learn what all the files in the config folder are FOR, so that I know what backups to use for what problems.

You might have lost a record of what files you were downloading. If there are still partial files in the Temp folder AND you happen to know what they were so you can reload the ed2k links from FLM, then MetFile Regenerator might help restore them and continue downloading from where you left off. If you were downloading so much that you can not remember them all, then that program might not help.

ghost
Site Admin
Posts: 8602
Joined: Sun Mar 07, 2004 1:00 am

Post by ghost »   1 likes

Yes, I really recommend backing up your config folder regularly, so you can restore it when a file gets corrupted. It has helped me many times, and my backup software does it automatically every night.
Night457 wrote: Backup the eMule/config folder regularly (it is small).
??? Mine has a size of 1.91 GB. It would probably take a whole day to re-hash all the files I'm sharing.

Night457
Global Moderator
Posts: 5399
Joined: Sat Dec 28, 2019 3:44 pm

Post by Night457 »   0 likes

pillowbaker wrote: Mon Oct 16, 2023 4:45 am You can laugh at me. :D
Now it is YOUR turn to laugh. Guess what happened to me this morning? Guess how long it has been since my last manual backup?

I had to port over the old backups of known.met and sharedfiles.dat, so now I only have to rehash about HALF my current files. Of course this thread reminded me that I was overdue for a backup, but I figured "Oh, I'm busy. I'll do it tomorrow." CLEARLY ghost's method of automated daily backups is the best solution. Rehashing only one day of files is not so bad, but a couple months of them is tedious.
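
For those without backup software, a dated copy of the config folder is easy to script. This is a minimal sketch, not what ghost's backup software does: the function name backup_config, the "emule-config-" folder prefix and the keep=7 retention are all assumptions, and you have to supply your own paths. It is probably safest to run it while eMule is closed so the files are not being written to.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(config_dir, backup_root, keep=7):
    """Copy config_dir into a new timestamped folder under backup_root.

    Keeps only the `keep` most recent copies (keep must be at least 1).
    """
    config_dir = Path(config_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"emule-config-{stamp}"
    shutil.copytree(config_dir, target)

    # Folder names sort chronologically, so the oldest copies come first.
    copies = sorted(p for p in backup_root.glob("emule-config-*") if p.is_dir())
    for old in copies[:-keep]:
        shutil.rmtree(old)
    return target
```

Point backup_root at an external drive and have the operating system's task scheduler run the script once a day, and a bad morning costs at most one day of rehashing.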

Triela
Posts: 446
Joined: Sun Jul 05, 2020 3:42 pm

Post by Triela »   0 likes

lsg1310 wrote: Thu Apr 06, 2023 6:09 am Sorry if this is a dumb question, but do any of these sites work to translate an SRT file you already have?
David32441 wrote: Thu Apr 06, 2023 11:13 am Not sure, but there's always been Google Translate, which you can copy and paste the text into. But it limits you to 1000 or 2000 chars, so if you have a big SRT file you often need to copy in 1-80, 81-160, 161- ...
Triela wrote: Mon Apr 10, 2023 1:13 am I used to have a whole Word macro for this, designed to cut the SRT into parts of 5000 characters, so I could cut and paste those into the browser. But these days it does that itself: if you paste more than 5000 characters into it, it just chops the rest off. It takes some wiggling and wriggling, but it's a fairly fast and easy process.
By the way, you can overcome the 5000 character limit in translate.google.com if you upload the .SRT as a text file.
The process is this:
1st, change the extension from .SRT to .TXT and convert the .TXT file to a .DOCX file (use the site convertio.co for that; not .com, but .co).
2nd, upload the .DOCX to Google Translate.
3rd, download the translated .DOCX.
4th, convert the .DOCX back to a .TXT file with convertio.co and change the extension from .TXT back to .SRT.
Done.
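
If you would rather stick with copy and paste, the cutting that the old Word macro did can be sketched in a few lines. This is not the macro itself, just an assumed equivalent: split_srt is a made-up name, and the 5000-character default is the paste limit mentioned above. It cuts only between subtitle entries, so no cue is split in half.

```python
def split_srt(srt_text: str, limit: int = 5000) -> list[str]:
    """Split SRT text into chunks of at most `limit` characters.

    Chunks break only on the blank line between cues. A single cue longer
    than `limit` is kept whole as its own oversized chunk.
    """
    normalized = srt_text.replace("\r\n", "\n")
    blocks = [b.strip() for b in normalized.split("\n\n") if b.strip()]

    chunks, current = [], ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks
```

Paste each chunk in turn, then join the translated pieces with a blank line between them to rebuild the file. Check the timing lines afterwards, since a translator may alter them.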