Necessary pre-processing: Separate the voice from the accompaniment with UVR (skip this step if there is no accompaniment). Cut the input audio into shorter segments with slicer, because Whisper takes input shorter than 30 seconds. Manually check ...
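The length limit in the slicing step can be illustrated with a minimal sketch. It only computes even cut points so that every segment stays under 30 seconds. It is not how the slicer tool itself works (that tool may well cut at silences instead), and the function name `slice_boundaries` and its parameters are hypothetical, chosen for illustration.

```python
import math


def slice_boundaries(total_seconds: float, max_seconds: float = 30.0):
    """Return (start, end) pairs in seconds covering the whole clip.

    Hypothetical helper: splits the clip into equal pieces, each strictly
    shorter than max_seconds. It does not look at the audio content.
    """
    if total_seconds <= 0:
        return []
    # floor(total / max) + 1 pieces guarantees each piece is < max_seconds,
    # even when total_seconds is an exact multiple of max_seconds.
    pieces = math.floor(total_seconds / max_seconds) + 1
    step = total_seconds / pieces
    bounds = []
    for i in range(pieces):
        start = i * step
        # Pin the final end to the clip length to avoid floating-point drift.
        end = total_seconds if i == pieces - 1 else (i + 1) * step
        bounds.append((start, end))
    return bounds


if __name__ == "__main__":
    for start, end in slice_boundaries(75.0):
        print(f"{start:.1f}s to {end:.1f}s")
```

Equal-length cuts can land in the middle of a word, so the resulting segments would still need the manual check mentioned above.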