PyTorch
English
llama
sound language model
jan-hq committed on
Commit
3a4e909
1 Parent(s): aa894fe

Create README.md

  1. README.md +78 -0
README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ datasets:
+ - homebrewltd/instruction-speech-whispervq-v2
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sound language model
+ ---
+
+ ## Model Details
+
+ We have developed and released the [llama3s](https://huggingface.co/collections/homebrew-research/llama3-s-669df2139f0576abc6eb7405) family of models, which natively understands audio and text input.
+
+ We continually pretrain the expanded-vocabulary checkpoint [homebrewltd/llama3.1-s-whispervq-init](https://huggingface.co/homebrewltd/llama3.1-s-whispervq-init) on 900M tokens from the [homebrewltd/raw-speech-whispervq-v1](https://huggingface.co/datasets/homebrewltd/raw-speech-whispervq-v1) dataset.
+
+ **Model developers:** Homebrew Research.
+
+ **Input:** Text and sound.
+
+ **Output:** Text.
+
+ **Model Architecture:** Llama-3.
+
+ **Language(s):** English.
+
+ ## Intended Use
+
+ **Intended Use Cases:** This family is primarily intended for research applications. This version aims to further improve the LLM's sound understanding capabilities.
+
+ **Out-of-scope:** The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.
+
+ ## Training process
+ **Training Metrics Image**: The image below is a snapshot of the training loss curve.
+
+ ![train_log](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/iAbaP7SCoyZ8tz2hyK8k0.png)
+
+ ### Hardware
+
+ **GPU Configuration**: Cluster of 10x NVIDIA A6000-48GB.
+
+ **GPU Usage**:
+ - **Continual Training**: 30 hours.
+
+ ### Training Arguments
+
+ We use the [torchtune](https://github.com/pytorch/torchtune) library for its up-to-date FSDP2 training implementation.
+
+ | Parameter | Continual Training |
+ |----------------------------|-------------------------|
+ | **Epoch** | 1 |
+ | **Global batch size** | 480 |
+ | **Learning Rate** | 2e-4 |
+ | **Learning Rate Scheduler** | Cosine with warmup |
+ | **Optimizer** | AdamW fused |
+ | **Warmup Steps** | 50 |
+ | **Weight Decay** | 0.01 |
+ | **Max Sequence Length** | 512 |
+ | **Max Training Steps** | 2000 |
+
+ ## Citation Information
+
+ **BibTeX:**
+
+ ```
+ @article{llama3s2024,
+ title={Llama3-S: Sound Instruction Language Model},
+ author={Homebrew Research},
+ year={2024},
+ month={August},
+ url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}}
+ ```
+
+ ## Acknowledgement
+
+ - **[WhisperSpeech](https://github.com/collabora/WhisperSpeech)**
+
+ - **[Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)**
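
As a reading aid for the Training Arguments table in the README added above, the following is a minimal sketch of what a "cosine with warmup" learning rate schedule could look like with the listed values (peak 2e-4, 50 warmup steps, 2000 max steps). The README does not spell out the exact schedule shape, so the linear warmup from zero and the decay to zero are assumptions, and the function names are hypothetical rather than torchtune APIs. The per-GPU figure likewise assumes that the global batch size counts sequences and is split evenly across the 10 GPUs.

```python
import math

# Values copied from the Training Arguments and Hardware sections.
PEAK_LR = 2e-4
WARMUP_STEPS = 50
MAX_STEPS = 2000
GLOBAL_BATCH_SIZE = 480
NUM_GPUS = 10


def lr_at_step(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumed shape (not confirmed by the README): linear warmup from 0 to
    PEAK_LR over WARMUP_STEPS, then half-cosine decay to 0 at MAX_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold at the final value past MAX_STEPS
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def sequences_per_gpu_per_step() -> int:
    """Sequences each GPU handles per optimizer step.

    Assumes an even split; this is the product of the per-device
    micro-batch size and the gradient accumulation steps.
    """
    return GLOBAL_BATCH_SIZE // NUM_GPUS


if __name__ == "__main__":
    for step in (0, 25, 50, 1025, 2000):
        print(f"step {step:>4}: lr = {lr_at_step(step):.2e}")
    print("sequences per GPU per step:", sequences_per_gpu_per_step())
```

Under these assumptions the rate reaches its peak at step 50, falls to half the peak midway through the decay phase (step 1025), and each GPU processes 48 sequences per optimizer step.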