NTIS/KoRnDAlpaca-Polyglot-12.8B

NTIS committed on Sep 19, 2023

Commit 9f95d79 (1 parent: e5d5649)

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -57,7 +57,7 @@ print(output)
 - The process of building the dataset is as follows
   * A. Extract important texts related to technology, such as technology trends and technology definitions, from research reports.
   * B. Preprocess the extracted text
-  * C. Generate question and answer pairs (total 1.5 million)  based on the extracted text by using ChatGPT API(temporarily). Scheduled to be replaced with our own question&answer generation model(`23.11)
+  * C. Generate question and answer pairs (total 1.5 million)  based on the extracted text by using ChatGPT API(temporarily), which scheduled to be replaced with our own question&answer generation model(`23.11)
   * D. Reformat the dataset in the form of (Instruction, Output, Source). ‘Instruction’ is the user's question, ‘Output’ is the answer, and ‘Source’ is the research report identification code that the answer is based on.
   * E. Remove low-quality data by the data quality evaluation module. Use only high-quality Q&As for training. (1 million)
     * ※ In KoRnDAlpaca v2 (planned for `23.10), in addition to Q&A, the instruction dataset will be added to generate long-form technology trends.
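Steps D and E above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline. It models one (Instruction, Output, Source) record and keeps only records whose quality score clears a threshold. The field names, the `toy_score` heuristic, the 0.5 threshold and the `REPORT-0001`-style source codes are hypothetical placeholders. The real data quality evaluation module is not described in this README.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class InstructionRecord:
    instruction: str  # the user's question
    output: str       # the answer
    source: str       # ID code of the research report the answer is based on


def filter_records(
    records: Iterable[InstructionRecord],
    score: Callable[[InstructionRecord], float],
    threshold: float,
) -> List[InstructionRecord]:
    """Keep records whose quality score is at least `threshold`.

    `score` is a hypothetical stand-in for the data quality
    evaluation module mentioned in step E.
    """
    return [r for r in records if score(r) >= threshold]


def toy_score(record: InstructionRecord) -> float:
    # Hypothetical heuristic: reject empty questions or very short answers.
    if not record.instruction.strip() or len(record.output.strip()) < 10:
        return 0.0
    return 1.0


# Placeholder records; the source codes are invented for illustration only.
records = [
    InstructionRecord(
        "What is a solid-state battery?",
        "A battery that uses a solid electrolyte instead of a liquid one.",
        "REPORT-0001",
    ),
    InstructionRecord("Define X.", "X.", "REPORT-0002"),
]

kept = filter_records(records, toy_score, threshold=0.5)
for r in kept:
    # One JSON object per line, keeping non-ASCII (e.g. Korean) text readable.
    print(json.dumps(asdict(r), ensure_ascii=False))
```

Passing the scorer as a function keeps the filtering step independent of how quality is judged. A model-based evaluator could replace `toy_score` without changing `filter_records`.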