The Whisper model is pre-trained on 96 languages, which means the pre-trained tokenizer already has a vast vocabulary encompassing many thousands of words. I would recommend that you leverage this pre-trained tokenizer directly rather than training a new one. Why? Because then we can also leverage all of the pre-trained Whisper weights directly: if we keep the pre-trained tokenizer, we can load all of the weights (and so all of the knowledge!). If we build a new tokenizer instead, we have to randomly initialise some of the Whisper weights to match the new vocabulary, meaning we lose some of the knowledge from pre-training. The model also quickly learns which parts of the pre-trained tokenizer's vocabulary it needs during fine-tuning. My advice would be to follow this blog post for fine-tuning the model.
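As a minimal sketch of what "keeping the pre-trained tokenizer" looks like in practice (not taken from the original post; the model checkpoint and language here are example choices), you can load the processor and model straight from the Hugging Face Hub so that no weights need to be re-initialised:

```python
# Sketch: reuse the pre-trained Whisper tokenizer rather than training a new one.
# "openai/whisper-small" and language="hindi" are illustrative choices.
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# The processor bundles the feature extractor and the pre-trained tokenizer.
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="hindi", task="transcribe"
)

# Because the tokenizer (and hence the output vocabulary) is unchanged,
# every pre-trained weight loads as-is; nothing is randomly initialised.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# The multilingual vocabulary already covers the target language's text.
ids = processor.tokenizer("नमस्ते दुनिया").input_ids
print(processor.tokenizer.decode(ids, skip_special_tokens=True))
```

Swapping in a freshly trained tokenizer would change the output vocabulary size, forcing the embedding and output projection layers to be re-initialised, which is exactly the loss of pre-trained knowledge described above.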