Update README_EN.md
README_EN.md CHANGED +3 -71
@@ -124,75 +124,7 @@ Here [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC
|
|
124 |
|
125 |
|
126 |
|
127 |
-
# 4.
|
128 |
-
|
129 |
-
| Name | Download | Quantity | Description |
|
130 |
-
| ---------------------- | ------------------------------------------------------------ | -------- | ------------------------------------------------------------ |
|
131 |
-
| InstructIE | [Google drive](https://drive.google.com/file/d/1raf0h98x3GgIhaDyNn1dLle9_HvwD6wT/view?usp=sharing) <br/> [Baidu Netdisk](https://pan.baidu.com/s/1-u8bD85H1Otbzk-gjLxaFw?pwd=c1i6) | 200,000+ | InstructIE dataset (bilingual in Chinese and English) |
|
132 |
-
|
133 |
-
|
134 |
-
|
135 |
-
The `InstructIE` dataset contains two core files: `InstructIE-zh.json` and `InstructIE-en.json`. Both files cover a range of fields that provide detailed descriptions of different aspects of the dataset:
|
136 |
-
|
137 |
-
- `'id'`: A unique identifier for each data entry, ensuring the independence and traceability of the data items.
|
138 |
-
- `'cate'`: The text's subject category, which provides a high-level categorical label for the content (there are 12 categories in total).
|
139 |
-
- `'text'`: The text to be extracted.
|
140 |
-
- `'relation'`: The **relationship triples** contained in the text. These fields allow users to freely construct instructions and expected outputs for information extraction.
|
141 |
-
|
142 |
-
|
143 |
-
|
144 |
-
<details>
|
145 |
-
<summary><b>Explanation of each field</b></summary>
|
146 |
-
|
147 |
-
|
148 |
-
| Field | Description |
|
149 |
-
| ----------- | ---------------------------------------------------------------- |
|
150 |
-
| id | The unique identifier for each data point. |
|
151 |
-
| cate | The category of the text's subject, with a total of 12 different thematic categories. |
|
152 |
-
| input | The input text for the model, with the goal of extracting all the involved relationship triples. |
|
153 |
-
| instruction | Instructions guiding the model to perform information extraction tasks. |
|
154 |
-
| output | The expected output result of the model. |
|
155 |
-
| relation | Describes the relationship triples contained in the text, i.e., the connections between entities (head, relation, tail). |
|
156 |
-
|
157 |
-
</details>
|
158 |
-
|
159 |
-
|
160 |
-
<details>
|
161 |
-
<summary><b>Example of data</b></summary>
|
162 |
-
|
163 |
-
|
164 |
-
```json
|
165 |
-
{
|
166 |
-
"id": "6e4f87f7f92b1b9bd5cb3d2c3f2cbbc364caaed30940a1f8b7b48b04e64ec403",
|
167 |
-
"cate": "Person",
|
168 |
-
"input": "Dionisio Pérez Gutiérrez (born 1872 in Grazalema (Cádiz) - died 23 February 1935 in Madrid) was a Spanish writer, journalist, and gastronome. He has been called \"one of Spain's most authoritative food writers\" and was an early adopter of the term Hispanidad.\nHis pen name, \"Post-Thebussem\", was chosen as a show of support for Mariano Pardo de Figueroa, who went by the handle \"Dr. Thebussem\".",
|
169 |
-
"entity": [
|
170 |
-
{"entity": "Dionisio Pérez Gutiérrez", "entity_type": "human"},
|
171 |
-
{"entity": "Post-Thebussem", "entity_type": "human"},
|
172 |
-
{"entity": "Grazalema", "entity_type": "geographic_region"},
|
173 |
-
{"entity": "Cádiz", "entity_type": "geographic_region"},
|
174 |
-
{"entity": "Madrid", "entity_type": "geographic_region"},
|
175 |
-
{"entity": "gastronome", "entity_type": "event"},
|
176 |
-
{"entity": "Spain", "entity_type": "geographic_region"},
|
177 |
-
{"entity": "Hispanidad", "entity_type": "architectural_structure"},
|
178 |
-
{"entity": "Mariano Pardo de Figueroa", "entity_type": "human"},
|
179 |
-
{"entity": "23 February 1935", "entity_type": "time"}
|
180 |
-
],
|
181 |
-
"relation": [
|
182 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "country of citizenship", "tail": "Spain"},
|
183 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "place of birth", "tail":"Grazalema"},
|
184 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "place of death", "tail": "Madrid"},
|
185 |
-
{"head": "Mariano Pardo de Figueroa", "relation": "country of citizenship", "tail": "Spain"},
|
186 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "alternative name", "tail": "Post-Thebussem"},
|
187 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "date of death", "tail": "23 February 1935"}
|
188 |
-
]
|
189 |
-
}
|
190 |
-
```
|
191 |
-
|
192 |
-
</details>
|
193 |
-
|
194 |
-
|
195 |
-
# 5.Convert script
|
196 |
|
197 |
**Training Data Transformation**
|
198 |
|
@@ -306,7 +238,7 @@ After data conversion, you will obtain structured data containing the `input` te
|
|
306 |
|
307 |
|
308 |
|
309 |
-
#
|
310 |
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference using the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.
|
311 |
|
312 |
```bash
|
@@ -322,7 +254,7 @@ If GPU memory is not enough, you can use `--bits 8` or `--bits 4`.
|
|
322 |
|
323 |
|
324 |
|
325 |
-
#
|
326 |
|
327 |
We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), to convert the model's string output into a list and calculate the F1 score.
|
328 |
|
|
|
124 |
|
125 |
|
126 |
|
127 |
+
# 4.Convert script
128 |
|
129 |
**Training Data Transformation**
|
130 |
|
|
|
238 |
|
239 |
|
240 |
|
241 |
+
# 5.Usage
|
242 |
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference using the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.
|
243 |
|
244 |
```bash
|
|
|
254 |
|
255 |
|
256 |
|
257 |
+
# 6.Evaluate
|
258 |
|
259 |
We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), to convert the model's string output into a list and calculate the F1 score.
|
260 |
|
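The evaluation step described above (string output to list, then F1) can be sketched roughly as follows. This is not the repository's `evaluate.py`: the output format `(head,relation,tail),(head,relation,tail)` and the helper names `parse_triples` and `micro_f1` are assumptions made for illustration only, and the real script may parse and score differently.

```python
import re
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed (hypothetical) output format: "(head,relation,tail),(head,relation,tail)".
TRIPLE_PATTERN = re.compile(r"\(([^()]*)\)")


def parse_triples(text: str) -> Set[Triple]:
    """Convert a model output string into a set of (head, relation, tail) triples."""
    triples: Set[Triple] = set()
    for body in TRIPLE_PATTERN.findall(text):
        parts = [part.strip() for part in body.split(",")]
        # Skip malformed groups that do not have exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.add((parts[0], parts[1], parts[2]))
    return triples


def micro_f1(predictions: Iterable[str], references: Iterable[str]) -> float:
    """Micro-averaged F1 over exact-match triples across all examples."""
    true_positives = predicted_total = gold_total = 0
    for predicted_text, gold_text in zip(predictions, references):
        predicted = parse_triples(predicted_text)
        gold = parse_triples(gold_text)
        true_positives += len(predicted & gold)
        predicted_total += len(predicted)
        gold_total += len(gold)
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    preds = ["(A,place of birth,B),(A,place of death,C)"]
    golds = ["(A,place of birth,B),(A,country of citizenship,D)"]
    print(micro_f1(preds, golds))
```

Splitting on commas is the simplest choice but breaks for entities that themselves contain commas; a stricter output format (for example JSON) would avoid that ambiguity.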