P-tuning fine-tuning fails with 'ChatGLMModel' object has no attribute 'prefix_encoder'

#19
by moumouliu - opened

When fine-tuning the chatglm2-6b model with the p-tuning method, training fails with 'ChatGLMModel' object has no attribute 'prefix_encoder'.
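For reference, a minimal way to check whether the cached modeling code creates the prefix encoder at all. This is only a sketch, assuming the ChatGLM2 p-tuning convention that the encoder is built when config.pre_seq_len is set; the value 128 below is purely illustrative:

```python
from transformers import AutoConfig, AutoModel

# Ask for a prefix encoder the same way the p-tuning script does,
# by setting pre_seq_len on the config before loading.
config = AutoConfig.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
config.pre_seq_len = 128  # illustrative; the script passes this via --pre_seq_len

model = AutoModel.from_pretrained("THUDM/chatglm2-6b", config=config, trust_remote_code=True)

# With up-to-date remote modeling code this prints True; with a stale
# cached modeling_chatglm.py the attribute is never created, and the
# forward pass raises the AttributeError above.
print(hasattr(model.transformer, "prefix_encoder"))
```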

I ran into the same problem.

I ran into the same problem.

The model was updated last night. You can update the model code and the project code and try again. A new error came up afterwards, though:
[WARNING|modeling_utils.py:3034] 2023-06-30 10:38:17,622 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at THUDM/chatglm2-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2690] 2023-06-30 10:38:17,916 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
Traceback (most recent call last):
  File "main.py", line 430, in <module>
    main()
  File "main.py", line 248, in main
    train_dataset = train_dataset.map(
  File "E:\ProgramData\anaconda3\envs\glm2\lib\site-packages\datasets\arrow_dataset.py", line 578, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "E:\ProgramData\anaconda3\envs\glm2\lib\site-packages\datasets\arrow_dataset.py", line 543, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "E:\ProgramData\anaconda3\envs\glm2\lib\site-packages\datasets\arrow_dataset.py", line 3073, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "E:\ProgramData\anaconda3\envs\glm2\lib\site-packages\datasets\arrow_dataset.py", line 3449, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "E:\ProgramData\anaconda3\envs\glm2\lib\site-packages\datasets\arrow_dataset.py", line 3330, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "main.py", line 219, in preprocess_function_train
    context_length = input_ids.index(tokenizer.bos_token_id)
ValueError: None is not in list
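The ValueError comes from tokenizer.bos_token_id being None, so list.index can never find it. A quick diagnostic sketch to confirm which special-token ids the tokenizer actually exposes:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# If bos_token_id prints as None, main.py's
# input_ids.index(tokenizer.bos_token_id) fails with exactly this
# "None is not in list" error.
print(tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)
```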

I ran into the same problem.


The eos_token_id inside SPTokenizer is None; you need to set it to 2 manually, or switch to the eos_id from the GLM tokenizer.
After that, though, a new error appeared: "piece id is out of range." I checked and the id does not actually fall outside the vocabulary range. Hoping an expert can weigh in.
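A minimal sketch of that workaround as described above. Everything here is an assumption based on the comment: the fallback id 2 is the value suggested there, not something verified against the vocabulary, and your local tokenization_chatglm.py may behave differently:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# Hypothetical guard for the preprocessing code: fall back to the
# hard-coded id 2 when the custom SPTokenizer reports no eos id.
eos_id = tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 2
print("using eos id:", eos_id)
```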
