	---
	language: fr
	license: mit
	tags:
	- roberta
	- text-classification
	- nli
	base_model: almanach/camembertv2-base
	datasets:
	- FLUE-XNLI
	metrics:
	- accuracy
	pipeline_tag: text-classification
	library_name: transformers
	model-index:
	- name: almanach/camembertv2-base-xnli
	  results:
	  - task:
	      type: text-classification
	      name: Natural Language Inference
	    dataset:
	      type: flue-XNLI
	      name: FLUE-XNLI
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 0.82851
	      verified: false
	---

	# Model Card for almanach/camembertv2-base-xnli

	almanach/camembertv2-base-xnli is a RoBERTa-architecture model for text classification, fine-tuned on the FLUE-XNLI dataset for French Natural Language Inference (NLI). It reaches an accuracy of 0.82851 on the FLUE-XNLI evaluation split.

	The model is one of several fine-tuned checkpoints derived from almanach/camembertv2-base.

	## Model Details

	### Model Description

	- Developed by: Wissam Antoun (PhD student at Almanach, Inria-Paris)
	- Model type: roberta
	- Language(s) (NLP): French
	- License: MIT
	- Finetuned from model: almanach/camembertv2-base

	### Model Sources

	- Repository: https://github.com/WissamAntoun/camemberta
	- Paper: https://arxiv.org/abs/2411.08868

	## Uses

	The model performs Natural Language Inference on French text: given a premise and a hypothesis, it classifies their relationship (entailment, neutral, or contradiction in XNLI).

	## Bias, Risks, and Limitations

	The model may reproduce biases present in its pretraining and fine-tuning data. It was fine-tuned only on FLUE-XNLI, so it may not generalize well to other datasets, domains, or tasks.


	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

	# Load the fine-tuned NLI checkpoint and its tokenizer from the Hugging Face Hub.
	model = AutoModelForSequenceClassification.from_pretrained("almanach/camembertv2-base-xnli")
	tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base-xnli")

	classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

	# Pass the premise as "text" and the hypothesis as "text_pair".
	result = classifier({
	    "text": "Le livre est très intéressant et j'ai appris beaucoup de choses.",
	    "text_pair": "Le livre est très ennuyeux et je n'ai rien appris.",
	})
	print(result)
	```
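The pipeline returns a label and a score for the sentence pair. As a rough illustration of how such a score relates to the model's three output logits, the sketch below applies a softmax in plain Python. The logit values and the label order are hypothetical; the actual mapping is stored in the checkpoint's `config.id2label` and should be checked there.

```python
import math

# Hypothetical label order; the real mapping lives in model.config.id2label.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label=ID2LABEL):
    """Return the highest-probability label and its score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Made-up logits for one premise/hypothesis pair (illustration only).
example_logits = [-1.2, 0.3, 2.5]
prediction = top_label(example_logits)
print(prediction)
```

With real model outputs, the same computation would be applied to the logits returned by `model(**inputs).logits`.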


	## Training Details

	### Training Data

	The model is trained on the FLUE-XNLI dataset.

	- Dataset Name: FLUE-XNLI
	- Dataset Size:
	  - Train: 392702
	  - Dev: 2490
	  - Test: 5010


	### Training Procedure

	The model was fine-tuned with the `run_xnli.py` example script from the Hugging Face Transformers repository.



	#### Training Hyperparameters

	```yml
	accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'': True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'': None}'
	adafactor: false
	adam_beta1: 0.9
	adam_beta2: 0.999
	adam_epsilon: 1.0e-08
	auto_find_batch_size: false
	base_model: camembertv2
	base_model_name: camembertv2-base-bf16-p2-17000
	batch_eval_metrics: false
	bf16: false
	bf16_full_eval: false
	data_seed: 666.0
	dataloader_drop_last: false
	dataloader_num_workers: 0
	dataloader_persistent_workers: false
	dataloader_pin_memory: true
	dataloader_prefetch_factor: .nan
	ddp_backend: .nan
	ddp_broadcast_buffers: .nan
	ddp_bucket_cap_mb: .nan
	ddp_find_unused_parameters: .nan
	ddp_timeout: 1800
	debug: '[]'
	deepspeed: .nan
	disable_tqdm: false
	dispatch_batches: .nan
	do_eval: true
	do_predict: false
	do_train: true
	epoch: 10.0
	eval_accumulation_steps: 4
	eval_accuracy: 0.8285140562248996
	eval_delay: 0
	eval_do_concat_batches: true
	eval_loss: 0.5347269773483276
	eval_on_start: false
	eval_runtime: 6.7497
	eval_samples: 2490
	eval_samples_per_second: 368.907
	eval_steps: .nan
	eval_steps_per_second: 46.224
	eval_strategy: epoch
	eval_use_gather_object: false
	evaluation_strategy: epoch
	fp16: false
	fp16_backend: auto
	fp16_full_eval: false
	fp16_opt_level: O1
	fsdp: '[]'
	fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'': False}'
	fsdp_min_num_params: 0
	fsdp_transformer_layer_cls_to_wrap: .nan
	full_determinism: false
	gradient_accumulation_steps: 4
	gradient_checkpointing: false
	gradient_checkpointing_kwargs: .nan
	greater_is_better: true
	group_by_length: false
	half_precision_backend: auto
	hub_always_push: false
	hub_model_id: .nan
	hub_private_repo: false
	hub_strategy: every_save
	hub_token: <HUB_TOKEN>
	ignore_data_skip: false
	include_inputs_for_metrics: false
	include_num_input_tokens_seen: false
	include_tokens_per_second: false
	jit_mode_eval: false
	label_names: .nan
	label_smoothing_factor: 0.0
	learning_rate: 1.0e-05
	length_column_name: length
	load_best_model_at_end: true
	local_rank: 0
	log_level: debug
	log_level_replica: warning
	log_on_each_node: true
	logging_dir: /scratch/camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1/SEED-666/logs
	logging_first_step: false
	logging_nan_inf_filter: true
	logging_steps: 100
	logging_strategy: steps
	lr_scheduler_kwargs: '{}'
	lr_scheduler_type: cosine
	max_grad_norm: 1.0
	max_steps: -1
	metric_for_best_model: accuracy
	mp_parameters: .nan
	name: camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1
	neftune_noise_alpha: .nan
	no_cuda: false
	num_train_epochs: 10.0
	optim: adamw_torch
	optim_args: .nan
	optim_target_modules: .nan
	output_dir: /scratch/camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1/SEED-666
	overwrite_output_dir: false
	past_index: -1
	per_device_eval_batch_size: 8
	per_device_train_batch_size: 8
	per_gpu_eval_batch_size: .nan
	per_gpu_train_batch_size: .nan
	prediction_loss_only: false
	push_to_hub: false
	push_to_hub_model_id: .nan
	push_to_hub_organization: .nan
	push_to_hub_token: <PUSH_TO_HUB_TOKEN>
	ray_scope: last
	remove_unused_columns: true
	report_to: '[''tensorboard'']'
	restore_callback_states_from_checkpoint: false
	resume_from_checkpoint: .nan
	run_name: /scratch/camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1/SEED-666
	save_on_each_node: false
	save_only_model: false
	save_safetensors: true
	save_steps: 500
	save_strategy: epoch
	save_total_limit: .nan
	seed: 666
	skip_memory_metrics: true
	split_batches: .nan
	tf32: .nan
	torch_compile: true
	torch_compile_backend: inductor
	torch_compile_mode: .nan
	torch_empty_cache_steps: .nan
	torchdynamo: .nan
	total_flos: 1.617427903829713e+17
	tpu_metrics_debug: false
	tpu_num_cores: .nan
	train_loss: 0.3309724763735177
	train_runtime: 41426.0671
	train_samples: 392702
	train_samples_per_second: 94.796
	train_steps_per_second: 2.962
	use_cpu: false
	use_ipex: false
	use_legacy_prediction_loop: false
	use_mps_device: false
	warmup_ratio: 0.1
	warmup_steps: 0
	weight_decay: 0.0

	```
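The schedule settings in this dump (`lr_scheduler_type: cosine`, `warmup_ratio: 0.1`, `learning_rate: 1.0e-05`) can be made concrete with a little arithmetic. The sketch below estimates the number of optimizer steps and evaluates a linear-warmup, cosine-decay learning rate at a given step. It assumes a single training device (not stated in the dump, although the logged ratio of about 32 samples per step matches 8 × 4 × 1). The function names are illustrative and not taken from the training script, and exact step counts may differ slightly between Transformers versions.

```python
import math

# Values copied from the hyperparameter dump.
TRAIN_SAMPLES = 392702
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
EPOCHS = 10
PEAK_LR = 1.0e-05
WARMUP_RATIO = 0.1
NUM_DEVICES = 1  # assumption: the dump does not state the device count


def total_optimizer_steps():
    """Estimate optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(TRAIN_SAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
    updates_per_epoch = max(batches_per_epoch // GRAD_ACCUM, 1)
    return updates_per_epoch * EPOCHS


def lr_at(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = total_optimizer_steps()
warmup = math.ceil(total * WARMUP_RATIO)
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

print(f"effective batch size: {effective_batch}")
print(f"estimated optimizer steps: {total}, warmup steps: {warmup}")
for step in (0, warmup // 2, warmup, total // 2, total):
    print(step, lr_at(step, total, warmup))
```

Under these assumptions the estimate is close to the step count implied by the logged `train_runtime` and `train_steps_per_second`.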

	#### Results

	Accuracy: 0.82851
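
To put the reported accuracy in context, the sketch below recovers the number of correctly classified examples from `eval_accuracy` and `eval_samples` in the hyperparameter dump, and adds a rough 95% interval. The interval uses a normal approximation and assumes independent examples; it is an illustration, not a figure reported by the authors.

```python
import math

EVAL_ACCURACY = 0.8285140562248996  # eval_accuracy from the dump
EVAL_SAMPLES = 2490                 # eval_samples from the dump

# Number of evaluation examples classified correctly.
correct = round(EVAL_ACCURACY * EVAL_SAMPLES)

# Binomial standard error of the accuracy estimate (normal approximation).
std_err = math.sqrt(EVAL_ACCURACY * (1.0 - EVAL_ACCURACY) / EVAL_SAMPLES)
low = EVAL_ACCURACY - 1.96 * std_err
high = EVAL_ACCURACY + 1.96 * std_err

print(f"correct predictions: {correct} / {EVAL_SAMPLES}")
print(f"approximate 95% interval: [{low:.4f}, {high:.4f}]")
```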

	## Technical Specifications

	### Model Architecture and Objective

	RoBERTa architecture with a sequence classification head.

	## Citation

	BibTeX:

	```bibtex
	@misc{antoun2024camembert20smarterfrench,
	    title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
	    author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
	    year={2024},
	    eprint={2411.08868},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL},
	    url={https://arxiv.org/abs/2411.08868},
	}
	```