Update README.md
README.md
CHANGED
@@ -53,6 +53,27 @@ To use tensorboard to visualize the training loss curve:
 pip install future tensorboard
 ```
 
+5. If your training process is killed during data preprocessing, you can modify the `map_num_proc` in xtuner/xtuner/dataset
+/huggingface.py
+```
+def process(dataset,
+            do_dataset_tokenization=True,
+            tokenizer=None,
+            max_length=None,
+            dataset_map_fn=None,
+            template_map_fn=None,
+            max_dataset_length=None,
+            split='train',
+            remove_unused_columns=False,
+            rename_maps=[],
+            shuffle_before_pack=True,
+            pack_to_max_length=True,
+            use_varlen_attn=False,
+            input_ids_with_output=True,
+            with_image_token=False,
+            map_num_proc=32): # modify it to 1
+```
+
 ## Data prepration
 1. File structure
 
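
The change above recommends lowering `map_num_proc` by editing the library source. As a rough sketch of the same idea, the snippet below chooses a worker count instead of hard-coding it. `pick_map_num_proc` and its `low_memory` flag are hypothetical names and are not part of xtuner. It also assumes that the process is killed because too many parallel preprocessing workers exhaust memory, which the README does not state explicitly.

```python
import os


def pick_map_num_proc(requested=32, cpu_count=None, low_memory=False):
    """Hypothetical helper: choose a worker count for dataset preprocessing.

    `requested` mirrors the default `map_num_proc=32` shown in the diff.
    `low_memory=True` falls back to a single process, which is the value
    the README suggests when preprocessing gets killed.
    """
    if low_memory:
        return 1
    if cpu_count is None:
        # os.cpu_count() may return None on some platforms.
        cpu_count = os.cpu_count() or 1
    # Never ask for more workers than there are CPUs, and never fewer than 1.
    return max(1, min(requested, cpu_count))


if __name__ == "__main__":
    print(pick_map_num_proc(low_memory=True))
    print(pick_map_num_proc(requested=32, cpu_count=8))
```

The returned value would then replace the default `32`, for example by writing it into the signature shown in the diff.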