ylacombe
/

mms-mar-finetuned-monospeaker



ylacombe HF staff

Update README.md

a8ee254 11 months ago


	---
	library_name: transformers
	pipeline_tag: text-to-speech
	tags:
	- transformers.js
	- mms
	- vits
	license: cc-by-nc-4.0
	datasets:
	- ylacombe/google-marathi
	language:
	- mr
	---

	## Model

	This is a fine-tuned version of the [Marathi version](https://huggingface.co/facebook/mms-tts-mar) of the Massively Multilingual Speech (MMS) models, which are lightweight, low-latency TTS models based on the [VITS architecture](https://huggingface.co/docs/transformers/model_doc/vits).

	It was trained in around 20 minutes on as few as 80 to 150 samples from this [Marathi dataset](https://huggingface.co/datasets/ylacombe/google-marathi).

	The training recipe is available in this GitHub repository: [ylacombe/finetune-hf-vits](https://github.com/ylacombe/finetune-hf-vits).


	## Usage

	### Transformers

	```python
	from transformers import pipeline
	import scipy.io.wavfile

	model_id = "ylacombe/mms-mar-finetuned-monospeaker"
	synthesiser = pipeline("text-to-speech", model_id)  # add device=0 if you want to use a GPU

	# Marathi input: "Hello, how are you today?"
	speech = synthesiser("नमस्कार, तुम्ही आज कसे आहात?")

	# squeeze() drops a leading batch axis, if present, so a 1-D mono signal is written
	scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"].squeeze())
	```
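
The pipeline above hides the tokenizer and model calls. For finer control, the same checkpoint can probably be driven through the lower-level `VitsModel` and `AutoTokenizer` classes from Transformers. The sketch below assumes the repository ships the usual tokenizer files; the `synthesize` helper and its `seed` default are illustrative names, not part of the original card. Because VITS draws random noise during generation, fixing the seed is what makes repeated runs produce the same audio.

```python
def synthesize(text, model_id="ylacombe/mms-mar-finetuned-monospeaker", seed=555):
    """Return (waveform, sampling_rate) for `text` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not load torch.
    import torch
    from transformers import AutoTokenizer, VitsModel, set_seed

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")

    # VITS samples noise internally; a fixed seed makes the output repeatable.
    set_seed(seed)
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch_size, num_samples)

    # Return the first (and only) item of the batch as a 1-D NumPy array.
    return waveform[0].numpy(), model.config.sampling_rate
```

Calling `waveform, rate = synthesize("नमस्कार, तुम्ही आज कसे आहात?")` downloads the checkpoint on first use; the returned array can then be passed to `scipy.io.wavfile.write` as in the pipeline example.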

	### Transformers.js

	If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@xenova/transformers) using:
	```bash
	npm i @xenova/transformers
	```

	Example: Generate Marathi speech with `ylacombe/mms-mar-finetuned-monospeaker`.
	```js
	import { pipeline } from '@xenova/transformers';

	// Create a text-to-speech pipeline
	const synthesizer = await pipeline('text-to-speech', 'ylacombe/mms-mar-finetuned-monospeaker', {
	  quantized: false, // Remove this line to use the quantized version (default)
	});

	// Generate speech
	// Marathi input: "Hello, how are you today?"
	const output = await synthesizer('नमस्कार, तुम्ही आज कसे आहात?');
	console.log(output);
	// {
	//   audio: Float32Array(N) [ ... ], // N depends on the input text
	//   sampling_rate: 16000
	// }
	```

	Optionally, save the audio to a WAV file (Node.js):
	```js
	import wavefile from 'wavefile';
	import fs from 'fs';

	const wav = new wavefile.WaveFile();
	wav.fromScratch(1, output.sampling_rate, '32f', output.audio);
	fs.writeFileSync('out.wav', wav.toBuffer());
	```
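
Both saving snippets store 32-bit floating-point samples (SciPy keeps the `float32` dtype, and the Node.js example requests `'32f'`). Some players and tools may only handle integer PCM, so the following is a minimal Python sketch of a 16-bit PCM writer that uses only the standard library. The helper names `float_to_pcm16` and `write_pcm16_wav` are hypothetical, and the sketch assumes a mono signal with values nominally in [-1.0, 1.0].

```python
import struct
import wave


def float_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono samples as 16-bit PCM WAV; return the duration in seconds."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        # Pack all samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm) / sampling_rate
```

With the Python pipeline output, a call such as `write_pcm16_wav("finetuned_output_pcm16.wav", speech["audio"].squeeze(), speech["sampling_rate"])` would write the converted file and return its length in seconds.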