aashish1904 committed on
Commit 3c990fd
1 Parent(s): 2049c9b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +117 -0
README.md ADDED
@@ -0,0 +1,117 @@
---
license: llama3.1
base_model: Llama-3.1-8B-Instruct
pipeline_tag: text-generation
library_name: transformers
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/Theia-Llama-3.1-8B-v1-GGUF
This is a quantized version of [Chainbase-Labs/Theia-Llama-3.1-8B-v1](https://huggingface.co/Chainbase-Labs/Theia-Llama-3.1-8B-v1), created using llama.cpp.

# Original Model Card

# Theia-Llama-3.1-8B-v1

**Theia-Llama-3.1-8B-v1 is an open-source crypto LLM, trained on a carefully designed dataset from the crypto field.**

## Technical Implementation

### Crypto-Oriented Dataset

The training dataset is curated from two primary sources to create a comprehensive representation of blockchain
projects. The first source is data collected from **CoinMarketCap**, focusing on the top **2000 projects** ranked by
market capitalization. This includes a wide range of project-specific documents such as whitepapers, official blog
posts, and news articles. The second core component of the dataset comprises detailed research reports on these projects
gathered from various credible sources on the internet, providing in-depth insights into project fundamentals,
development progress, and market impact. After constructing the dataset, both manual and algorithmic filtering are
applied to ensure data accuracy and eliminate redundancy.
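
The card does not spell out the filtering pipeline, so as a purely illustrative sketch of one simple algorithmic redundancy check, the snippet below drops exact duplicates after light text normalization. The normalization rule and the overall approach are assumptions, not the actual Theia data pipeline.

```python
# Minimal redundancy-filtering sketch (illustrative only; not the Theia pipeline).
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially reformatted copies collide."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "Bitcoin is a decentralized digital currency.",
    "Bitcoin   is a decentralized digital currency.",  # whitespace-only variant
    "Ethereum introduced general-purpose smart contracts.",
]
print(deduplicate(corpus))  # the whitespace-only duplicate is removed
```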

### Model Fine-tuning and Quantization

Theia-Llama-3.1-8B-v1 is fine-tuned from the base model (Llama-3.1-8B) and specifically tailored to the cryptocurrency
domain. We employ LoRA (Low-Rank Adaptation) to fine-tune the model effectively, leveraging its ability to adapt large
pre-trained models to specific tasks with a smaller computational footprint. Our training methodology is further
enhanced through the use of LLaMA Factory, an open-source training framework. We integrate **DeepSpeed**, Microsoft's
distributed training engine, to optimize resource utilization and training efficiency. Techniques such as ZeRO (Zero
Redundancy Optimizer), offloading, sparse attention, 1-bit Adam, and pipeline parallelism are employed to accelerate the
training process and reduce memory consumption. A fine-tuned model is also built by Chainbase Labs using the
novel [D-DoRA](https://docs.chainbase.com/theia/Developers/Glossary/D2ORA), a decentralized training scheme. Since the
LoRA version is much easier for developers to deploy and experiment with, we are releasing it first for the crypto AI
community.
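
To make the LoRA setup concrete, the sketch below shows a minimal PEFT-style configuration for adapting the base model. The rank, alpha, dropout, and target modules are illustrative assumptions; the actual training run used LLaMA Factory with DeepSpeed and is not reproduced here.

```python
# Minimal LoRA adaptation sketch (hyperparameters are illustrative assumptions,
# not the exact settings used for Theia-Llama-3.1-8B-v1).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Low-rank adapters are attached to the attention projections; only these small
# matrices are trained, which keeps the memory footprint far below full fine-tuning.
lora_config = LoraConfig(
    r=16,                 # adapter rank (assumption)
    lora_alpha=32,        # scaling factor (assumption)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```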

In addition to fine-tuning, we have quantized the model to optimize it for efficient deployment, specifically into the
Q8 GGUF format `Theia-Llama-3.1-8B-v1-Q8_0.gguf`. Model quantization is a process that reduces the precision of the
model's weights from floating point (typically FP16 or FP32) to lower-bit representations, in this case 8-bit
integers (Q8). The primary benefit of quantization is that it significantly reduces the model's memory footprint and
improves inference speed while maintaining an acceptable level of accuracy. This makes the model more accessible for
use in resource-constrained environments, such as edge devices or lower-tier GPUs.
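
As a rough illustration of how a Q8_0 GGUF file like this can be produced, the sketch below drives llama.cpp's conversion and quantization tools from Python. The paths, script name, and binary name are assumptions that vary across llama.cpp versions, and this is not necessarily the exact pipeline used to build the published file.

```python
# Illustrative HF-to-GGUF quantization workflow using llama.cpp tools.
# Assumes a local llama.cpp checkout and the original model downloaded to
# ./Theia-Llama-3.1-8B-v1; script/binary names differ between llama.cpp releases.
import subprocess

# 1) Convert the Hugging Face checkpoint to a full-precision (FP16) GGUF file.
subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py", "Theia-Llama-3.1-8B-v1",
        "--outfile", "Theia-Llama-3.1-8B-v1-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2) Quantize the FP16 GGUF down to 8-bit integer weights (Q8_0).
subprocess.run(
    [
        "llama.cpp/llama-quantize",
        "Theia-Llama-3.1-8B-v1-f16.gguf",
        "Theia-Llama-3.1-8B-v1-Q8_0.gguf",
        "Q8_0",
    ],
    check=True,
)
```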

## Benchmark

To evaluate current LLMs in the crypto domain, we have proposed a benchmark for evaluating Crypto AI Models, which
is the first AI model benchmark tailored specifically for the crypto domain. The models are evaluated across seven
dimensions, including crypto knowledge comprehension and generation, knowledge coverage, and reasoning capabilities.
A detailed paper will follow to elaborate on this benchmark. Here we initially release the results of benchmarking
the understanding and generation capabilities in the crypto domain on 11 open-source and closed-source LLMs from OpenAI,
Google, Meta, Qwen, and DeepSeek. For the open-source LLMs, we choose models with a parameter size similar to
ours (~8B). For the closed-source LLMs, we choose the popular models with the most end-users.

| Model                     | Perplexity ↓ | BERTScore ↑ |
|---------------------------|--------------|-------------|
| **Theia-Llama-3.1-8B-v1** | **1.184**    | **0.861**   |
| ChatGPT-4o                | 1.256        | 0.837       |
| ChatGPT-4o-mini           | 1.257        | 0.794       |
| ChatGPT-3.5-turbo         | 1.233        | 0.838       |
| Claude-3-sonnet (~70b)    | N.A.         | 0.848       |
| Gemini-1.5-Pro            | N.A.         | 0.830       |
| Gemini-1.5-Flash          | N.A.         | 0.828       |
| Llama-3.1-8B-Instruct     | 1.270        | 0.835       |
| Mistral-7B-Instruct-v0.3  | 1.258        | 0.844       |
| Qwen2.5-7B-Instruct       | 1.392        | 0.832       |
| Gemma-2-9b                | 1.248        | 0.832       |
| Deepseek-llm-7b-chat      | 1.348        | 0.846       |
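
For reference, the sketch below shows one common way to compute the two reported metric types, token-level perplexity with transformers and BERTScore with the `bert_score` package. The evaluation set and exact protocol behind the table are not published in this card, so this is an illustration of the metrics rather than a reproduction of the benchmark; the example texts are placeholders.

```python
# Illustrative metric computation only; not the benchmark's actual data or protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from bert_score import score as bert_score

model_id = "Chainbase-Labs/Theia-Llama-3.1-8B-v1"  # original (non-GGUF) weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean negative log-likelihood of the tokens)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

# Placeholder candidate/reference pair for BERTScore.
candidates = ["A whitepaper describes a crypto project's design and goals."]
references = ["A whitepaper explains a blockchain project's architecture and objectives."]
precision, recall, f1 = bert_score(candidates, references, lang="en")

print(perplexity(candidates[0]), f1.mean().item())
```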

## System Prompt

The system prompt used for training this model is:

```
You are a helpful assistant who will answer crypto related questions.
```

## Chat Format

The model uses the standard Llama 3.1 chat format. Here's an example:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 29 September 2024

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
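
Rather than assembling this template string by hand, the prompt can be built with the tokenizer's chat template. The sketch below assumes the original Chainbase-Labs/Theia-Llama-3.1-8B-v1 repository ships the standard Llama 3.1 tokenizer and chat template; it pairs the training system prompt above with a sample user question.

```python
# Build a Llama 3.1-format prompt via the tokenizer's chat template
# (assumes the original repo exposes the standard Llama 3.1 template).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Chainbase-Labs/Theia-Llama-3.1-8B-v1")

messages = [
    {"role": "system", "content": "You are a helpful assistant who will answer crypto related questions."},
    {"role": "user", "content": "What is the capital of France?"},
]

# add_generation_prompt=True appends the assistant header so the model starts answering.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```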

## Tips for Performance

We initially recommend the following set of parameters:

```
sequence length = 256
temperature = 0
top-k-sampling = -1
top-p = 1
context window = 39680
```
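
As a usage sketch under these settings, the snippet below runs the Q8_0 GGUF with llama-cpp-python and the training system prompt. The local file path is an assumption (download the GGUF from this repo first), and `top_k=-1` relies on llama.cpp treating a non-positive top-k as "no cutoff".

```python
# Run the quantized model locally with llama-cpp-python using the recommended
# parameters above. Install with: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="Theia-Llama-3.1-8B-v1-Q8_0.gguf",  # assumed local path to the GGUF
    n_ctx=39680,                                    # recommended context window
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant who will answer crypto related questions."},
        {"role": "user", "content": "What is a blockchain whitepaper?"},
    ],
    max_tokens=256,   # recommended sequence length
    temperature=0,    # greedy decoding
    top_k=-1,         # disable the top-k cutoff
    top_p=1.0,
)
print(output["choices"][0]["message"]["content"])
```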