---
datasets:
- davidrrobinson/AnimalSpeak
---

# Model card for BioLingual

Model card for BioLingual: Transferable Models for Bioacoustics with Human Language Supervision

An audio-text model for bioacoustics based on contrastive language-audio pretraining.

# Usage

You can use this model for bioacoustic zero-shot audio classification, or for fine-tuning on bioacoustic tasks (a sketch of one fine-tuning setup is given at the end of this card).

## Perform zero-shot audio classification

### Using `pipeline`

```python
from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]

audio_classifier = pipeline(task="zero-shot-audio-classification", model="davidrrobinson/BioLingual")
output = audio_classifier(audio, candidate_labels=["Sound of a sperm whale", "Sound of a sea lion"])
print(output)
>>> [{"score": 0.999, "label": "Sound of a sperm whale"}, {"score": 0.001, "label": "Sound of a sea lion"}]
```
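
Under the hood, the pipeline embeds the audio and each candidate label into a shared space and softmaxes their similarity scores. A minimal sketch of that computation, assuming the checkpoint follows the standard `ClapModel`/`ClapProcessor` API used elsewhere in this card:

```python
import torch
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

labels = ["Sound of a sperm whale", "Sound of a sea lion"]
inputs = processor(text=labels, audios=audio, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Audio-text similarity logits, softmaxed over the candidate labels
probs = outputs.logits_per_audio.softmax(dim=-1)  # shape: (1, num_labels)
for label, p in zip(labels, probs[0]):
    print(f"{label}: {p.item():.3f}")
```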

## Run the model:

You can also get the audio and text embeddings using `ClapModel`.

### Run the model on CPU:

```python
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt")
audio_embed = model.get_audio_features(**inputs)
```
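
The block above only extracts the audio side; text embeddings come from `get_text_features` in the same way. A minimal sketch (the captions are illustrative placeholders):

```python
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

# Captions are placeholders; any bioacoustic text descriptions work here
captions = ["Sound of a sperm whale", "Sound of a sea lion"]
text_inputs = processor(text=captions, return_tensors="pt", padding=True)
text_embed = model.get_text_features(**text_inputs)  # shape: (num_captions, projection_dim)
```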

### Run the model on GPU:

```python
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]

model = ClapModel.from_pretrained("davidrrobinson/BioLingual").to(0)
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt").to(0)
audio_embed = model.get_audio_features(**inputs)
```
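
## Fine-tune on bioacoustic tasks

The Usage section also mentions fine-tuning. One lightweight option, sketched below under assumptions of mine rather than a recipe from the BioLingual paper, is a linear probe trained on frozen audio embeddings; the class count, learning rate, and data handling are placeholders:

```python
# A hedged sketch of a linear probe on frozen BioLingual audio embeddings.
# The number of classes, optimizer settings, and input format below are
# illustrative placeholders, not a prescribed fine-tuning recipe.
import torch
from torch import nn
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")
model.eval()  # keep the backbone frozen; only the probe is trained

num_classes = 10  # placeholder: number of classes in your task
probe = nn.Linear(model.config.projection_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(waveforms, labels):
    # waveforms: list of 1-D numpy arrays; labels: LongTensor of class ids
    inputs = processor(audios=waveforms, return_tensors="pt")
    with torch.no_grad():  # frozen backbone: embeddings carry no gradient
        embeds = model.get_audio_features(**inputs)
    logits = probe(embeds)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```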