What is VCTUBE?

VCTUBE is an open-source Python library that automatically generates paired speech data (audio with matching text) from a given YouTube video URL.

Why Do We Need VCTUBE?

Recent studies have shown that Text-to-Speech (TTS) systems based on deep neural networks (e.g., Tacotron, Deep Voice) can generate high-quality, human-like speech.
However, training such a model reportedly requires a large amount of speech data:
at least 10 hours of paired data are needed to generate high-quality speech. In practice, collecting and processing that much speech data is challenging.
VCTUBE is designed to address this problem. YouTube hosts many videos, and many of them have subtitles, so audio and its matching text can be collected together.

The architecture of VCTUBE's overall process.

How To Use VCTUBE?

Requirements for VCTUBE

Currently requires Python >= 3.6

FFmpeg
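
Both requirements can be checked before installing. The following is a minimal sketch that uses only the Python standard library; the helper name `check_requirements` is hypothetical and is not part of VCTUBE.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d is required" % min_python)
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

If nothing is printed, both the Python version and the FFmpeg executable were found.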

First, install the VCTUBE library with pip:

pip3 install vctube

Command for VCTUBE

from vctube import VCtube

playlist_name = ""
playlist_url = ""
lang = ""   # e.g. ko, en, fr, de ...

vc = VCtube(playlist_name, playlist_url, lang)
vc.download_audio()      # download audio from YouTube
vc.download_captions()   # download captions from YouTube
vc.audio_split()         # split audio according to the captions
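
To make the splitting step more concrete, here is a minimal sketch of how caption timestamps could be turned into FFmpeg commands that cut one audio file into per-caption clips. This is not VCTUBE's actual implementation: the `Caption` type, the `build_split_commands` helper, the file names, and the assumption that each caption carries a start time and a duration in seconds are all illustrative.

```python
from typing import List, NamedTuple


class Caption(NamedTuple):
    text: str
    start: float      # seconds from the beginning of the audio
    duration: float   # length of the caption in seconds


def build_split_commands(audio_path: str,
                         captions: List[Caption],
                         out_prefix: str) -> List[List[str]]:
    """Build one ffmpeg command per caption; nothing is executed here."""
    commands = []
    for index, caption in enumerate(captions):
        out_path = "%s_%04d.wav" % (out_prefix, index)
        commands.append([
            "ffmpeg", "-y",                     # overwrite existing output
            "-i", audio_path,                   # input audio file
            "-ss", "%.3f" % caption.start,      # where the clip begins
            "-t", "%.3f" % caption.duration,    # how long the clip lasts
            out_path,
        ])
    return commands
```

Each returned command is a plain argument list, so it could be run with `subprocess.run(command, check=True)`. Keeping command construction separate from execution makes the timing logic easy to inspect without FFmpeg installed.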

VCTUBE Example

Setting for VCTUBE

from vctube import VCtube

playlist_url = "https://www.youtube.com/watch?v=fj5BcN6Blks"
playlist_name = "TEST"
lang = "en"   # e.g. ko, en, fr, de ...
vc = VCtube(playlist_name, playlist_url, lang)

Result of this process

vc.download_audio()
vc.download_captions()
vc.audio_split()
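
Because at least 10 hours of paired data are cited as the target, it can be useful to check how much audio a run produced. The sketch below sums the length of all WAV files under a directory using the standard `wave` module. The helper name `total_hours` is hypothetical, and both the output location and the WAV format are assumptions rather than documented VCTUBE behavior; adjust them to whatever the library actually writes.

```python
import wave
from pathlib import Path


def total_hours(directory):
    """Sum the duration of every .wav file under `directory`, in hours."""
    seconds = 0.0
    for path in sorted(Path(directory).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # duration = number of frames / frames per second
            seconds += wav_file.getnframes() / wav_file.getframerate()
    return seconds / 3600.0
```

Calling `total_hours` on the directory that holds the split clips gives a rough figure to compare against the 10-hour guideline.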

Audio file information

Paper URL

Our Lab Site

Code URL