StarCycle committed on
Commit eb37f82
1 Parent(s): df457f7

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -53,6 +53,27 @@ To use tensorboard to visualize the training loss curve:
  pip install future tensorboard
  ```
 
+ 5. If your training process is killed during data preprocessing (often a sign that it ran out of memory), reduce `map_num_proc` in `xtuner/xtuner/dataset/huggingface.py`.
+ This value sets how many worker processes preprocess the dataset in parallel: the default of 32 is typically fast but memory-hungry, while 1 is slower but uses the least memory.
+ ```python
+ def process(dataset,
+             do_dataset_tokenization=True,
+             tokenizer=None,
+             max_length=None,
+             dataset_map_fn=None,
+             template_map_fn=None,
+             max_dataset_length=None,
+             split='train',
+             remove_unused_columns=False,
+             rename_maps=[],
+             shuffle_before_pack=True,
+             pack_to_max_length=True,
+             use_varlen_attn=False,
+             input_ids_with_output=True,
+             with_image_token=False,
+             map_num_proc=32):  # change this to 1
+ ```
+
  ## Data preparation
  1. File structure
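Why lowering `map_num_proc` helps can be sketched as follows. Assuming `map_num_proc` is forwarded to Hugging Face `datasets`' `Dataset.map(num_proc=...)`, the dataset is split into that many shards, and each shard is preprocessed by its own worker process at the same time, so peak memory grows with the process count. The helper `contiguous_shards` below is hypothetical (it is not part of xtuner or `datasets`); it only illustrates one way `n` rows can be divided into `num_proc` contiguous, near-equal shards.

```python
from typing import List, Tuple


def contiguous_shards(num_rows: int, num_proc: int) -> List[Tuple[int, int]]:
    """Split row indices [0, num_rows) into num_proc contiguous [start, end) ranges.

    Hypothetical illustration only: each range stands for the shard one worker
    process would hold and preprocess in a num_proc-style parallel map.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be >= 1")
    div, mod = divmod(num_rows, num_proc)
    shards = []
    for index in range(num_proc):
        # The first `mod` shards get one extra row, so sizes differ by at most 1.
        start = div * index + min(index, mod)
        end = start + div + (1 if index < mod else 0)
        shards.append((start, end))
    return shards


if __name__ == "__main__":
    # Three workers: three shards are processed concurrently.
    print(contiguous_shards(10, 3))  # [(0, 4), (4, 7), (7, 10)]
    # One worker: the whole dataset is handled by a single process.
    print(contiguous_shards(10, 1))  # [(0, 10)]
```

With `map_num_proc=32`, up to 32 such shards are in flight at once; with 1, there is a single shard and a single process. An intermediate value (for example 4 or 8) may be a reasonable compromise between speed and memory on machines where 32 is too many.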