cataluna84 committed
Update README.md
README.md
CHANGED
@@ -8,21 +8,37 @@ pinned: false
 ---
 
 Multilingual language models are typically large, requiring significant computational resources.
+![Deployment Challenges](DeploymentChallenges.png)
 
-Can we create multilingual models that maintain performance comparable to their larger counterparts while reducing size and latency and improving inference speed?
+Can we create multilingual models that maintain performance comparable to their larger counterparts while reducing size and latency and improving inference speed when running in production with huge batch sizes?
+![MemoryVariations through time](MemoryVariations(Latency).png)
 
 # Techniques:
+
 - Pruning
--
--
--
-
--
-
+  - Unstructured Pruning
+  - Structured Pruning
+  - Semi-Structured Pruning
+
+  - Methods Used
+    - SparseGPT | [GitHub](https://github.com/VishnuVardhanSaiLanka/sparsegpt/tree/aya)
+    - ShortGPT | [KLDBasedPruning & Perplexity Sensitivities](https://github.com/rsk2327/DistAya/tree/main)
+
+- Knowledge Distillation
+  - Hidden State-Based Distillation ~ [DistillKit](https://arcee-ai-distillkit.my.canva.site/) | [GitHub](https://github.com/ShayekhBinIslam/DistillKit)
+  - Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
+  - On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
   - Minitron: Compact Language models via Pruning & Knowledge Distillation
   - DistiLLM: Towards Streamlined Distillation for Large Language Models
+
 - Quantization
--
+  - Quantization-Aware Training (QAT)
+  - Post-Training Quantization (PTQ)
+  - KV Cache Quantization
+  - Weight & Activation Quantization
+
+- Low-Rank Factorization
+
 - Fine-Tuning | [GitHub](https://github.com/rsk2327/DistAya/tree/track/fine-tuning)
 
 # Datasets:
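
The README change names three pruning patterns (unstructured, structured, semi-structured) without showing how they differ. The sketch below illustrates the three sparsity patterns on a tiny weight matrix. It assumes plain magnitude-based selection purely for illustration; the methods the README links to (SparseGPT, ShortGPT) use their own selection criteria, and none of the function names here come from the DistAya repositories.

```python
# Illustrative only: magnitude-based masks for the three pruning patterns.
# Function names and the toy matrix W are hypothetical, not project code.

def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude entries anywhere in the matrix."""
    cells = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    k = int(len(cells) * sparsity)  # number of individual weights to zero
    pruned = [row[:] for row in weights]
    for _, i, j in cells[:k]:
        pruned[i][j] = 0.0
    return pruned


def structured_prune(weights, rows_to_drop):
    """Zero whole rows (e.g. neurons), picking those with the smallest L1 norm."""
    norms = sorted((sum(abs(w) for w in row), i) for i, row in enumerate(weights))
    drop = {i for _, i in norms[:rows_to_drop]}
    return [
        [0.0] * len(row) if i in drop else row[:]
        for i, row in enumerate(weights)
    ]


def semi_structured_prune(weights, n=2, m=4):
    """N:M sparsity: in each group of m consecutive weights, keep the n largest."""
    pruned = []
    for row in weights:
        new_row = row[:]
        for start in range(0, len(row), m):
            group = range(start, min(start + m, len(row)))
            keep = set(sorted(group, key=lambda j: abs(row[j]), reverse=True)[:n])
            for j in group:
                if j not in keep:
                    new_row[j] = 0.0
        pruned.append(new_row)
    return pruned


W = [[0.9, -0.7, 0.4, 0.05],
     [0.2, -0.1, 0.03, 0.6]]

if __name__ == "__main__":
    print("unstructured (50%):", unstructured_prune(W, 0.5))
    print("structured (1 row):", structured_prune(W, 1))
    print("semi-structured 2:4:", semi_structured_prune(W, 2, 4))
```

All three calls remove half of the weights in `W`, but the zeros land in different places: anywhere in the matrix, in one whole row, or exactly two per group of four. The regular patterns are generally easier for hardware to exploit, while the unstructured one gives the most freedom in choosing which weights to keep.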