kimhyeongjun committed
Commit 5255cd3 • 1 Parent(s): f4fec9c

Update README.md

Files changed (1): README.md (+8 -0)

README.md CHANGED
@@ -15,11 +15,16 @@ model-index:
 
 # kimhyeongjun/Hermes-3-Llama-3.1-8B-Ko-Finance-Advisors
 
+This is a toy project built to pass the idle time during Chuseok (Korean Thanksgiving Day).
+
 This model is a fine-tuned version of [NousResearch/Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) on the Korean_synthetic_financial_dataset_21K.
 
+This is a toy project to relieve the boredom of the Chuseok holiday.
+
 This model is a version of [NousResearch/Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) fine-tuned on the Korean_synthetic_financial_dataset_21K.
 
 ## Model description
+
 Based on finance PDF data collected directly from the web, we refined the raw data using the 'meta-llama/Meta-Llama-3.1-70B-Instruct' model.
 After generating synthetic data based on the cleaned data, we further evaluated the quality of the generated data using the 'meta-llama/Llama-Guard-3-8B' and 'RLHFlow/ArmoRM-Llama3-8B-v0.1' models.
 We then used 'Alibaba-NLP/gte-large-en-v1.5' to extract embeddings and applied Faiss to perform Jaccard-distance-based nearest-neighbor analysis to construct the final 21k dataset, which is multidimensional and sophisticated.
@@ -28,6 +33,9 @@ We then used 'Alibaba-NLP/gte-large-en-v1.5' to extract embeddings and applied F
 After generating synthetic data from the refined data, we evaluated the quality of the generated data in depth with the 'meta-llama/Llama-Guard-3-8B' and 'RLHFlow/ArmoRM-Llama3-8B-v0.1' models.
 We then used 'Alibaba-NLP/gte-large-en-v1.5' to extract embeddings and applied Faiss to perform Jaccard-distance-based nearest-neighbor analysis, yielding the final multidimensional, sophisticated 21k dataset.
 
+## Task duration
+3 days (2024-09-14 to 2024-09-16)
+
 
 ## sample
 
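The Jaccard-distance nearest-neighbor step in the model description can be sketched in miniature. The actual pipeline used 'Alibaba-NLP/gte-large-en-v1.5' embeddings with Faiss, and its exact script is not published; the pure-Python sketch below (the function names, whitespace tokenization, and the 0.5 threshold are illustrative assumptions, not the authors' code) only shows the underlying idea: keep a sample when its nearest already-kept neighbor, by Jaccard distance over token sets, is farther than a threshold.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical token sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def dedup_by_nearest_neighbor(texts, min_distance=0.5):
    """Greedy near-duplicate filter: keep a text only if its Jaccard
    distance to every already-kept text exceeds min_distance."""
    kept, kept_tokens = [], []
    for text in texts:
        tokens = set(text.lower().split())  # toy tokenizer: whitespace split
        if all(jaccard_distance(tokens, t) > min_distance for t in kept_tokens):
            kept.append(text)
            kept_tokens.append(tokens)
    return kept

samples = [
    "What is the interest rate on this savings account?",
    "What is the interest rate for this savings account?",  # near-duplicate, dropped
    "How are capital gains on Korean stocks taxed?",
]
print(dedup_by_nearest_neighbor(samples))  # keeps the 1st and 3rd samples
```

At 21k samples a greedy pairwise scan like this is still feasible, but an index such as Faiss (as used here) makes the nearest-neighbor lookups scale to much larger candidate pools.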