Multi-Modal Models with Hugging Face


Exercise

Creating speech embeddings

Time to encode an audio array into a speaker embedding! A speaker embedding captures the characteristics of a speaker's voice. A model can use it to personalize generated audio to that speaker, which makes it essential for generating speech in a specific speaker's voice.

The pretrained spkrec-xvect-voxceleb model (speaker_model) and the VCTK dataset (dataset) have been loaded for you.

Instructions

  • Complete the create_speaker_embedding() function definition by calculating the raw embedding from the waveform using the speaker_model.
  • Extract the audio array from the data point at index 10 of the dataset.
  • Calculate a speaker embedding from the audio array using the create_speaker_embedding() function.
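The three steps above can be sketched as follows. This is a minimal sketch, not the official solution. The L2-normalization and squeeze steps follow a common pattern for SpeechT5 speaker embeddings and are an assumption here. In the exercise, speaker_model is a pretrained SpeechBrain x-vector model, typically loaded with `EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")`, and dataset is VCTK. So that the snippet runs on its own, both are replaced by hypothetical stand-ins (`StubSpeakerModel` and a list of fake data points). The stub returns a random tensor with the (batch, 1, 512) shape of an x-vector, not a real embedding.

```python
import numpy as np
import torch


class StubSpeakerModel:
    """Hypothetical stand-in for the preloaded speaker_model.

    Mimics only the output shape of an x-vector encoder, (batch, 1, 512);
    the values are random, not a real speaker embedding.
    """

    def encode_batch(self, wavs):
        if wavs.dim() == 1:
            wavs = wavs.unsqueeze(0)  # treat a single waveform as a batch of one
        gen = torch.Generator().manual_seed(0)
        return torch.randn(wavs.shape[0], 1, 512, generator=gen)


# Hypothetical stand-ins; in the exercise these are already loaded for you.
speaker_model = StubSpeakerModel()
dataset = [
    {"audio": {"array": np.zeros(16000, dtype=np.float32), "sampling_rate": 16000}}
    for _ in range(11)
]


def create_speaker_embedding(waveform):
    with torch.no_grad():
        # Raw embedding from the speaker model: shape (1, 1, 512)
        speaker_embeddings = speaker_model.encode_batch(
            torch.tensor(waveform, dtype=torch.float32)
        )
        # L2-normalize along the embedding dimension
        speaker_embeddings = torch.nn.functional.normalize(speaker_embeddings, dim=2)
        # Drop the singleton dimensions and convert to NumPy: shape (512,)
        speaker_embeddings = speaker_embeddings.squeeze().cpu().numpy()
    return speaker_embeddings


# Extract the audio array from the data point at index 10
audio_array = dataset[10]["audio"]["array"]

# Calculate the speaker embedding from the audio array
speaker_embedding = create_speaker_embedding(audio_array)
print(speaker_embedding.shape)  # (512,)
```

Wrapping the forward pass in `torch.no_grad()` skips gradient tracking, since the embedding is only used as a fixed input. Returning a flat NumPy vector makes it easy to store the embedding or pass it on later.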