kimhyeongjun committed
Commit 9b7a5d6 • 1 Parent(s): 18eea68
Update README.md
README.md CHANGED
@@ -25,14 +25,19 @@ This model is a fine-tuned version of [NousResearch/Hermes-3-Llama-3.1-8B](https
 
 ## Model description
 
+Everything happened automatically without any user intervention.
+
 Based on finance PDF data collected directly from the web, we refined the raw data using the 'meta-llama/Meta-Llama-3.1-70B-Instruct' model.
 After generating synthetic data based on the cleaned data, we further evaluated the quality of the generated data using the 'meta-llama/Llama-Guard-3-8B' and 'RLHFlow/ArmoRM-Llama3-8B-v0.1' models.
 We then used 'Alibaba-NLP/gte-large-en-v1.5' to extract embeddings and applied Faiss to perform Jaccard distance-based nearest neighbor analysis to construct the final dataset of 21k, which is multidimensional and sophisticated.
 
+[Korean] The entire process ran automatically, without user intervention.
+
 [Korean] Based on finance-related PDF data collected directly from the web, we refined the raw data using the 'meta-llama/Meta-Llama-3.1-70B-Instruct' model.
 [Korean] After generating synthetic data from the refined data, we evaluated the quality of the generated data in depth using the 'meta-llama/Llama-Guard-3-8B' and 'RLHFlow/ArmoRM-Llama3-8B-v0.1' models.
 [Korean] We then used 'Alibaba-NLP/gte-large-en-v1.5' to extract embeddings and applied Faiss to perform Jaccard distance-based nearest neighbor analysis, directly constructing a multidimensional and sophisticated final dataset of 21k.
 
+
 ## Task duration
 3days (20240914~20240916)
 
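The README text in this diff mentions Jaccard distance-based nearest neighbor analysis but does not show how it was done. The sketch below only illustrates the general idea and is not the author's pipeline: as a stand-in for Faiss over 'Alibaba-NLP/gte-large-en-v1.5' embeddings, it uses plain Python sets of whitespace tokens. The function names, the sample texts, and the `threshold` value of 0.3 are all hypothetical.

```python
from typing import List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_neighbor(index: int, token_sets: List[Set[str]]) -> Tuple[int, float]:
    """Brute-force search for the item closest to token_sets[index]."""
    best_j, best_d = -1, float("inf")
    for j, other in enumerate(token_sets):
        if j == index:
            continue  # never match an item with itself
        d = jaccard_distance(token_sets[index], other)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


def drop_near_duplicates(texts: List[str], threshold: float = 0.3) -> List[str]:
    """Greedily keep a text only if it is farther than `threshold`
    from every text kept so far. `threshold` is an illustrative value."""
    kept: List[str] = []
    kept_sets: List[Set[str]] = []
    for text in texts:
        tokens = set(text.lower().split())
        if all(jaccard_distance(tokens, s) > threshold for s in kept_sets):
            kept.append(text)
            kept_sets.append(tokens)
    return kept


# Hypothetical sample data, not taken from the actual dataset.
samples = [
    "what is a bond yield",
    "what is a bond yield ?",
    "how do index funds work",
]
filtered = drop_near_duplicates(samples)
```

Here the second sample shares five of six tokens with the first, so it falls under the threshold and is dropped, while the third shares none and is kept. A real run over 21k items would more plausibly use an approximate or batched index than this quadratic loop.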