---
language:
- pt
license: apache-2.0
datasets:
- nicholasKluge/Pt-Corpus
model-index:
- name: Mistral-7B-v0.2-Base_ptbr
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 64.94
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 53.96
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 45.42
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 90.11
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 72.51
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 69.04
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 79.62
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 58.52
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 62.32
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
---

This is a base model pre-trained on roughly 1B Portuguese tokens, starting from the official model weights (Mistral-7B-v0.2). It is intended to be used as a starting point for fine-tuning.

Note: awaiting the [official results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).

| Task | Mistral Base PTBR | Mistral Base | Improvement (points) |
|-------------------------------|-------------------|--------------|----------------------|
| assin2_rte | 90.2 | 87.74 | 2.46 |
| assin2_sts | 72.45 | 67.05 | 5.4 |
| bluex | 53.27 | 53.27 | 0 |
| enem | 64.66 | 62.42 | 2.24 |
| faquad_nli | 68.11 | 47.63 | 20.48 |
| hatebr_offensive_binary | 79.65 | 77.63 | 2.02 |
| oab_exams | 45.42 | 45.24 | 0.18 |
| portuguese_hate_speech_binary | 59.18 | 55.72 | 3.46 |

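
The Improvement column is the absolute difference, in score points, between the PT-BR model and the original base model on each task. A minimal Python sketch that recomputes it from the preliminary scores in the table above; the `scores` dictionary and the `improvements` helper are hypothetical names used only for illustration, not part of any released tooling:

```python
# Preliminary scores copied from the table above: (Mistral Base PTBR, Mistral Base).
# The dictionary layout and helper name are illustrative, not official tooling.
scores = {
    "assin2_rte": (90.2, 87.74),
    "assin2_sts": (72.45, 67.05),
    "bluex": (53.27, 53.27),
    "enem": (64.66, 62.42),
    "faquad_nli": (68.11, 47.63),
    "hatebr_offensive_binary": (79.65, 77.63),
    "oab_exams": (45.42, 45.24),
    "portuguese_hate_speech_binary": (59.18, 55.72),
}


def improvements(table):
    """Return the absolute gain (in score points) of the PT-BR model over the base model."""
    # Round to two decimals to hide floating-point noise from the subtraction.
    return {task: round(ptbr - base, 2) for task, (ptbr, base) in table.items()}


if __name__ == "__main__":
    for task, delta in improvements(scores).items():
        print(f"{task:<30} {delta:+.2f}")
```

Running it lists FaQuAD NLI as the largest gain (+20.48) and BLUEX as unchanged, matching the table.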

# Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/JJhooww/Mistral-7B-v0.2-Base_ptbr) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).

| Metric | Value |
|--------------------------|---------|
| Average | 66.27 |
| ENEM Challenge (No Images) | 64.94 |
| BLUEX (No Images) | 53.96 |
| OAB Exams | 45.42 |
| Assin2 RTE | 90.11 |
| Assin2 STS | 72.51 |
| FaQuAD NLI | 69.04 |
| HateBR Binary | 79.62 |
| PT Hate Speech Binary | 58.52 |
| tweetSentBR | 62.32 |
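
Assuming the Average row is the plain unweighted mean of the nine task scores (an assumption, though it does reproduce the reported 66.27), it can be checked with a short Python sketch; `results` and `leaderboard_average` are hypothetical names for illustration:

```python
from statistics import mean

# Official leaderboard scores copied from the table above.
results = {
    "ENEM Challenge (No Images)": 64.94,
    "BLUEX (No Images)": 53.96,
    "OAB Exams": 45.42,
    "Assin2 RTE": 90.11,
    "Assin2 STS": 72.51,
    "FaQuAD NLI": 69.04,
    "HateBR Binary": 79.62,
    "PT Hate Speech Binary": 58.52,
    "tweetSentBR": 62.32,
}


def leaderboard_average(scores):
    """Unweighted mean of the per-task scores, rounded to two decimals.

    Assumption: the leaderboard's Average is a simple mean with equal task weights.
    """
    return round(mean(scores.values()), 2)


if __name__ == "__main__":
    print(leaderboard_average(results))  # prints 66.27
```

Averaging this way puts accuracy, macro-F1 and Pearson scores on the same 0-100 scale, so every task counts equally regardless of which metric it reports.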