---
title: Perplexity
emoji: πŸŒ–
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
For more information, see https://huggingface.co/docs/transformers/perplexity
---
# Metric Card for Perplexity
## Metric Description
Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
For more information, see https://huggingface.co/docs/transformers/perplexity
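As a minimal illustration of that definition, the sketch below computes perplexity from a list of hypothetical per-token log-likelihoods (the values are made up for the example; in practice they come from the model's predicted token distributions):

```python
import math

# Hypothetical per-token log-likelihoods (natural log) for a 4-token sequence;
# in practice these come from the model's predicted token distributions.
log_likelihoods = [-2.1, -0.7, -3.4, -1.2]

# PPL = exp(average negative log-likelihood), with exponent base e.
avg_nll = -sum(log_likelihoods) / len(log_likelihoods)
ppl = math.exp(avg_nll)
print(round(ppl, 2))  # 6.36
```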
## How to Use
At minimum, this metric requires the model and data as inputs.
```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', data=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
```
### Inputs
- **model** (`Union[str, AutoModelForCausalLM]`): the model used for calculating perplexity.
- **data** (`list` of `str`): input text; each separate text snippet is one list entry.
- **device** (`str`): device to run on; defaults to `'cuda'` when available.
- **max_length** (`int`): maximum sequence length; defaults to 2048.
### Output Values
- **loss** (`float`): the loss of the model's predictions compared to the reference.
- **perplexity** (`float`): measures the uncertainty of the model when predicting text; lower perplexity indicates better model performance.
Output Example(s):
```python
{'loss': 3.8299286365509033, 'perplexity': 46.05925369262695}
```
This metric outputs a dictionary containing the loss and the perplexity score.
### Examples
```python
>>> import evaluate
>>> from datasets import load_dataset
>>> perplexity = evaluate.load("d-matrix/perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> results = perplexity.compute(model='distilgpt2',data=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.8299286365509033
>>> print(results['perplexity'])
46.05925369262695
```
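The two values returned above are directly related: since perplexity is the exponentiated average negative log-likelihood, `perplexity = exp(loss)`. This can be checked with the standard library, using the loss from the example output:

```python
import math

# The returned perplexity is the exponential of the returned loss.
loss = 3.8299286365509033
print(math.exp(loss))  # ≈ 46.059
```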
## Citation(s)
https://huggingface.co/docs/transformers/perplexity