---
language: fr
license: mit
tags:
- roberta
- text-classification
- nli
base_model: almanach/camembertv2-base
datasets:
- FLUE-XNLI
metrics:
- accuracy
pipeline_tag: text-classification
library_name: transformers
model-index:
- name: almanach/camembertv2-base-xnli
  results:
  - task:
      type: text-classification
      name: Natural Language Inference
    dataset:
      type: flue-XNLI
      name: FLUE-XNLI
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.82851
      verified: false
---

# Model Card for almanach/camembertv2-base-xnli

almanach/camembertv2-base-xnli is a RoBERTa model for text classification, fine-tuned on the FLUE-XNLI dataset for Natural Language Inference. It achieves an accuracy of 0.82851 on FLUE-XNLI. The model is part of the almanach/camembertv2-base family of fine-tuned models.

## Model Details

### Model Description

- **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
- **Model type:** roberta
- **Language(s) (NLP):** French
- **License:** MIT
- **Finetuned from model:** almanach/camembertv2-base

### Model Sources

- **Repository:** https://github.com/WissamAntoun/camemberta
- **Paper:** https://arxiv.org/abs/2411.08868

## Uses

The model can be used for French Natural Language Inference: given a premise and a hypothesis, it predicts whether the hypothesis is entailed by, neutral with respect to, or contradicted by the premise.

## Bias, Risks, and Limitations

The model may reproduce biases present in its training data, may not generalize well to other datasets or tasks, and is limited by the domain and size of the data it was fine-tuned on.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model = AutoModelForSequenceClassification.from_pretrained("almanach/camembertv2-base-xnli")
tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base-xnli")

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# NLI input: "text" is the premise, "text_pair" is the hypothesis
classifier({
    "text": "Le livre est très intéressant et j'ai appris beaucoup de choses.",
    "text_pair": "Le livre est très ennuyeux et je n'ai rien appris.",
})
```

## Training Details

### Training Data

The model is fine-tuned on the FLUE-XNLI dataset.

- Dataset Name: FLUE-XNLI
- Dataset Size:
  - Train: 392702
  - Dev: 2490
  - Test: 5010

### Training Procedure

The model was fine-tuned with the `run_xnli.py` example script from the Hugging Face Transformers repository.
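For reference, the sketch below approximates this setup with the `Trainer` API in recent versions of Transformers. It is not the original script; in particular, using the public `xnli` dataset (French config) in place of the FLUE packaging is an assumption. The key settings mirror the hyperparameters listed below.

```python
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumption: the FLUE XNLI task corresponds to the French split of the public
# "xnli" dataset (premise/hypothesis pairs with 3 labels).
dataset = load_dataset("xnli", "fr")

tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "almanach/camembertv2-base", num_labels=3
)

def tokenize(batch):
    return tokenizer(
        batch["premise"], batch["hypothesis"], truncation=True, max_length=160
    )

tokenized = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

# Mirrors the reported run: lr 1e-5, 10 epochs, per-device batch size 8 with
# gradient accumulation 4 (effective batch 32), cosine schedule with 10% warmup,
# seed 666, best checkpoint selected by accuracy.
args = TrainingArguments(
    output_dir="camembertv2-base-xnli",
    learning_rate=1e-5,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    seed=666,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```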
#### Training Hyperparameters

```yml
accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'': True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'': None}'
adafactor: false
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1.0e-08
auto_find_batch_size: false
base_model: camembertv2
base_model_name: camembertv2-base-bf16-p2-17000
batch_eval_metrics: false
bf16: false
bf16_full_eval: false
data_seed: 666.0
dataloader_drop_last: false
dataloader_num_workers: 0
dataloader_persistent_workers: false
dataloader_pin_memory: true
dataloader_prefetch_factor: .nan
ddp_backend: .nan
ddp_broadcast_buffers: .nan
ddp_bucket_cap_mb: .nan
ddp_find_unused_parameters: .nan
ddp_timeout: 1800
debug: '[]'
deepspeed: .nan
disable_tqdm: false
dispatch_batches: .nan
do_eval: true
do_predict: false
do_train: true
epoch: 10.0
eval_accumulation_steps: 4
eval_accuracy: 0.8285140562248996
eval_delay: 0
eval_do_concat_batches: true
eval_loss: 0.5347269773483276
eval_on_start: false
eval_runtime: 6.7497
eval_samples: 2490
eval_samples_per_second: 368.907
eval_steps: .nan
eval_steps_per_second: 46.224
eval_strategy: epoch
eval_use_gather_object: false
evaluation_strategy: epoch
fp16: false
fp16_backend: auto
fp16_full_eval: false
fp16_opt_level: O1
fsdp: '[]'
fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'': False}'
fsdp_min_num_params: 0
fsdp_transformer_layer_cls_to_wrap: .nan
full_determinism: false
gradient_accumulation_steps: 4
gradient_checkpointing: false
gradient_checkpointing_kwargs: .nan
greater_is_better: true
group_by_length: false
half_precision_backend: auto
hub_always_push: false
hub_model_id: .nan
hub_private_repo: false
hub_strategy: every_save
hub_token:
ignore_data_skip: false
include_inputs_for_metrics: false
include_num_input_tokens_seen: false
include_tokens_per_second: false
jit_mode_eval: false
label_names: .nan
label_smoothing_factor: 0.0
learning_rate: 1.0e-05
length_column_name: length
load_best_model_at_end: true
local_rank: 0
log_level: debug
log_level_replica: warning
log_on_each_node: true
logging_dir: /scratch/camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1/SEED-666/logs
logging_first_step: false
logging_nan_inf_filter: true
logging_steps: 100
logging_strategy: steps
lr_scheduler_kwargs: '{}'
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_steps: -1
metric_for_best_model: accuracy
mp_parameters: .nan
name: camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1
neftune_noise_alpha: .nan
no_cuda: false
num_train_epochs: 10.0
optim: adamw_torch
optim_args: .nan
optim_target_modules: .nan
output_dir: /scratch/camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1/SEED-666
overwrite_output_dir: false
past_index: -1
per_device_eval_batch_size: 8
per_device_train_batch_size: 8
per_gpu_eval_batch_size: .nan
per_gpu_train_batch_size: .nan
prediction_loss_only: false
push_to_hub: false
push_to_hub_model_id: .nan
push_to_hub_organization: .nan
push_to_hub_token:
ray_scope: last
remove_unused_columns: true
report_to: '[''tensorboard'']'
restore_callback_states_from_checkpoint: false
resume_from_checkpoint: .nan
run_name: /scratch/camembertv2/runs/results/xnli/camembertv2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-cosine-warmup_steps-0.1/SEED-666
save_on_each_node: false
save_only_model: false
save_safetensors: true
save_steps: 500
save_strategy: epoch
save_total_limit: .nan
seed: 666
skip_memory_metrics: true
split_batches: .nan
tf32: .nan
torch_compile: true
torch_compile_backend: inductor
torch_compile_mode: .nan
torch_empty_cache_steps: .nan
torchdynamo: .nan
total_flos: 1.617427903829713e+17
tpu_metrics_debug: false
tpu_num_cores: .nan
train_loss: 0.3309724763735177
train_runtime: 41426.0671
train_samples: 392702
train_samples_per_second: 94.796
train_steps_per_second: 2.962
use_cpu: false
use_ipex: false
use_legacy_prediction_loop: false
use_mps_device: false
warmup_ratio: 0.1
warmup_steps: 0
weight_decay: 0.0
```

#### Results

**Accuracy:** 0.82851

A rough, hedged re-evaluation sketch is provided at the end of this card, after the citation.

## Technical Specifications

### Model Architecture and Objective

RoBERTa encoder with a sequence-classification head over the three NLI labels (entailment, neutral, contradiction).

## Citation

**BibTeX:**

```bibtex
@misc{antoun2024camembert20smarterfrench,
  title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
  author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
  year={2024},
  eprint={2411.08868},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.08868},
}
```
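As referenced in the Results section above, the sketch below is one rough way to re-score the model. It assumes the French split of the public `xnli` dataset stands in for the FLUE-XNLI evaluation data; the reported 0.82851 comes from the original evaluation run, so small differences are expected.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the French "xnli" validation split stands in for the FLUE-XNLI dev data.
dataset = load_dataset("xnli", "fr", split="validation")

tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base-xnli")
model = AutoModelForSequenceClassification.from_pretrained("almanach/camembertv2-base-xnli")
model.eval()

correct = 0
for example in dataset:
    inputs = tokenizer(
        example["premise"],
        example["hypothesis"],
        truncation=True,
        max_length=160,
        return_tensors="pt",
    )
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])

print(f"accuracy: {correct / len(dataset):.5f}")
```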