VC-with-GAN

CS 753 ASR project

Usage Steps:

1. Run bash download.sh to prepare the VCC2018 dataset.
2. Run analyzer.py to extract features and write them to binary files. (This takes a few minutes.)
3. Run build.py to record statistics such as spectral extrema and pitch.
4. To train a VAWGAN, for example, run

python main.py \
--model VAWGAN \
--trainer VAWGANTrainer \
--architecture architecture-vawgan-vcc2016.json

5. You can find your models in ./logdir/train/[timestamp]
6. To convert the voice, run

python convert.py \
--src VCC2SF1 \
--trg VCC2TM1 \
--model VAWGAN \
--checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
--file_pattern "./dataset/vcc2018/bin/Training Set/{}/[0-9]*.bin"

*Replace [timestamp] and [id] with the timestamp and model id of your training run.
7. You can find the converted wav files in ./logdir/output/[timestamp]
8. If you want to convert all the voices, run

./convert_all.sh \
--model VAWGAN \
--checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
--output_dir [directory to store converted audio]
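
Filling in [timestamp] and [id] by hand gets tedious across many training runs. The following is a minimal sketch of a helper that picks the newest checkpoint prefix under ./logdir/train. It assumes TensorFlow-style checkpoint files named model.ckpt-<id>.index and run directory names that sort chronologically. Neither assumption is confirmed by this README, and latest_checkpoint is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: each checkpoint has an index file such as
# logdir/train/<timestamp>/model.ckpt-12345.index
CKPT_RE = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint(train_root: str = "logdir/train") -> Optional[str]:
    """Return the highest-id checkpoint prefix in the newest run, or None."""
    root = Path(train_root)
    if not root.is_dir():
        return None
    # Assumption: run directory names sort chronologically, so the
    # last name in sorted order is the most recent run.
    run_dirs = sorted((d for d in root.iterdir() if d.is_dir()), reverse=True)
    for run_dir in run_dirs:
        best_id, best_prefix = -1, None
        for f in run_dir.iterdir():
            m = CKPT_RE.match(f.name)
            # Compare ids numerically so that 1000 beats 200.
            if m and int(m.group(2)) > best_id:
                best_id, best_prefix = int(m.group(2)), m.group(1)
        if best_prefix is not None:
            return str(run_dir / best_prefix)
        # A run without checkpoints is skipped in favour of an older run.
    return None


if __name__ == "__main__":
    print(latest_checkpoint() or "no checkpoint found")
```

The returned string can be passed as the --checkpoint argument of the convert.py and convert_all.sh commands above.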

Usage for Sentence Embeddings:

1. Ensure you have w_prob_dict.pkl and w_vec_dict.pkl in the data directory.
    1. For w_prob_dict.pkl you have two options. Either use get_word_prob_from_corpus, which requires a corpus as input (we used WikiText), or get a csv file of unigram probabilities (the source, http://norvig.com/ngrams/, is mentioned in the report) and use the function get_w_prob_from_csv.
    2. For w_vec_dict.pkl, initialize a Sentence_Embedding object and then call the function prune_word_vec. This keeps only the embeddings of words that appear in the transcriptions, since parsing the whole fastText data takes much more time (and RAM).
    3. All pickle files are shared here: https://drive.google.com/drive/folders/1FWGGEQ9wTUewBDFq5ssT4BP4cMyt8lh1
2. Download the dataset using bash download.sh
3. Run python sentence_embedding.py. This should create sent_emb.pkl inside the data directory.
4. Run analyzer.py to extract features and store them along with the sentence embeddings.
5. Run build.py to compute statistics about the features.
6. To train with sentence embeddings, run

python main.py \
--model VAWGAN_S \
--trainer VAWGAN_S \
--architecture architecture-vawgan-sent.json

7. For conversion, run

python convert.py \
--src VCC2SF1 \
--trg VCC2TM1 \
--model VAWGAN_S \
--checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
--file_pattern "./dataset/vcc2018/bin/Training Set/{}/[0-9]*.bin"

or

./convert_all.sh \
--model VAWGAN_S \
--checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
--output_dir [directory to store converted audio]
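
The two pickle files from step 1 can be thought of as a word-to-probability dictionary and a word-to-vector dictionary. The sketch below shows one way such dictionaries could be built. It is not the repository's get_word_prob_from_corpus, get_w_prob_from_csv, or prune_word_vec: all function names here are hypothetical, the pickles are assumed to hold plain Python dicts, the counts file is assumed to contain "word<TAB>count" rows, and the vector file is assumed to be in the fastText text format (a "<num_words> <dim>" header followed by one word and its floats per line).

```python
import csv
import pickle
from collections import Counter
from typing import Dict, Iterable, List, Set


def word_probs_from_tokens(tokens: Iterable[str]) -> Dict[str, float]:
    """Unigram probability of each lowercased token: count / total."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_probs_from_counts_file(path: str, delimiter: str = "\t") -> Dict[str, float]:
    """Read 'word<delimiter>count' rows and normalise counts to probabilities."""
    counts: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh, delimiter=delimiter):
            if len(row) >= 2:
                counts[row[0].lower()] += int(row[1])
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def prune_vectors(vec_path: str, vocab: Set[str]) -> Dict[str, List[float]]:
    """Stream the vector file and keep only words that occur in vocab."""
    kept: Dict[str, List[float]] = {}
    with open(vec_path, encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            parts = line.rstrip().split(" ")
            if i == 0 and len(parts) == 2:
                continue  # skip the "<num_words> <dim>" header
            if parts[0] in vocab:
                kept[parts[0]] = [float(x) for x in parts[1:]]
    return kept


def save_pickle(obj, path: str) -> None:
    """Write obj to path, e.g. data/w_prob_dict.pkl or data/w_vec_dict.pkl."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
```

Streaming the vector file line by line and discarding words outside the transcription vocabulary keeps memory use proportional to the vocabulary rather than to the full embedding file, which is the motivation given for pruning in step 1.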

