T5-Spanish-Efficient-TINY (NEW Deep-Narrow version in Spanish - March 2024)

T5-Efficient-TINY is a variation of Google's original T5 that follows the T5 model architecture. This variant was trained by Javier Albarracín of Quantico AI. The original version was released with the paper Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers by Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler.

This version of the model was trained from scratch on a Spanish dataset. It REQUIRES FINE-TUNING: it has not been trained on any downstream task. Its advantage is that it is a native Spanish model and can serve as a base for training simple tasks. Because of its relatively low complexity and its size (<29 MB), it is well suited to running on CPU.

It has its own Spanish tokenizer (lowercase letters only) with a vocabulary of 5000 tokens.
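Since the tokenizer covers lowercase letters only, it is probably safest to lowercase text before tokenizing it. The following is a minimal sketch, not the author's documented usage: the card does not say whether the tokenizer lowercases input on its own, and `model_id` is a placeholder for wherever this checkpoint is hosted. The helper names are illustrative.

```python
def normalize(text: str) -> str:
    """Lowercase the input so it matches the lowercase-only vocabulary."""
    return text.lower()


def load(model_id: str):
    """Load the tokenizer and model; model_id is a placeholder, not a real repo name."""
    # Imported here so the normalization helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


sample = normalize("¿Qué ES Madrid?")
```

Accented characters are kept as they are; only the case changes.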

Model architecture details:

This model - T5-spanish-efficient-tiny - is of the Tiny type, with variations in the model dimension and in the size of the feed-forward layers. It has roughly 7 million parameters and requires 29 MB of memory in full precision (fp32) or 15 MB of memory in half precision (fp16 or bf16).

This Spanish model was built with lighter characteristics than the original Tiny model.

Model	nl (el/dl)	ff	dm	kv	nh	#Params
This	4/3	512	320	64	4	7M
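The parameter count and memory footprint can be roughly cross-checked from the table row above. This is a back-of-the-envelope sketch under assumptions the card does not state: tied input/output embeddings, a non-gated feed-forward block, no bias terms, and the 5000-token vocabulary mentioned earlier. The function names are illustrative.

```python
def estimate_t5_params(vocab: int, dm: int, ff: int, kv: int, nh: int, el: int, dl: int) -> int:
    """Approximate weight count of a T5-style encoder-decoder.

    Assumes tied embeddings, non-gated feed-forward and no biases.
    Layer norms and relative position biases are ignored because they
    add only a few thousand weights.
    """
    inner = nh * kv                                # total width of the attention heads
    attention = 4 * dm * inner                     # q, k, v and output projections
    feed_forward = 2 * dm * ff                     # input and output projections
    encoder = el * (attention + feed_forward)
    decoder = dl * (2 * attention + feed_forward)  # self-attention plus cross-attention
    embeddings = vocab * dm
    return embeddings + encoder + decoder


def memory_mb(n_params: int, bytes_per_param: int) -> float:
    """Memory in MB (10^6 bytes) needed just to hold the weights."""
    return n_params * bytes_per_param / 1e6


n_params = estimate_t5_params(vocab=5000, dm=320, ff=512, kv=64, nh=4, el=4, dl=3)
print(n_params, memory_mb(n_params, 4), memory_mb(n_params, 2))
```

Under these assumptions the estimate is about 7.2 million weights, or roughly 28.7 MB in fp32 and 14.3 MB in fp16, which is in line with the 7M and 29 MB figures quoted in this card.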

A summary of the original T5 models is shown here:

Model	nl (el/dl)	ff	dm	kv	nh	#Params
Tiny	4/4	1024	256	32	4	16M
Mini	4/4	1536	384	32	8	31M
Small	6/6	2048	512	32	8	60M
Base	12/12	3072	768	64	12	220M
Large	24/24	4096	1024	64	16	738M
XL	24/24	16384	1024	128	32	3B
XXL	24/24	65536	1024	128	128	11B

Abbreviations used:

Abbreviation	Definition
nl	Number of transformer blocks (depth)
dm	Dimension of embedding vector (output vector of a transformer block)
kv	Dimension of key/value projection matrix
nh	Number of attention heads
ff	Dimension of intermediate vector within transformer block (size of feed-forward projection matrix)
el	Number of transformer blocks in the encoder (encoder depth)
dl	Number of transformer blocks in the decoder (decoder depth)
sh	Signifies that attention heads are shared
skv	Signifies that key-values projection matrices are tied

If a model checkpoint specifies no el or dl, then the number of encoder layers and the number of decoder layers both equal nl.
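The depth convention above can be expressed as a small helper. This is only an illustrative sketch: the function name and the string format ("4/3" for el/dl, or a bare "6" for nl) are assumptions modeled on the tables in this card.

```python
from typing import Tuple


def parse_depth(spec: str) -> Tuple[int, int]:
    """Return (encoder_layers, decoder_layers) from 'el/dl' or a bare 'nl'."""
    if "/" in spec:
        el, dl = spec.split("/")
        return int(el), int(dl)
    nl = int(spec)
    # No explicit el or dl: both stacks use nl layers.
    return nl, nl


this_model = parse_depth("4/3")
symmetric = parse_depth("6")
```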

Pre-Training

It was pre-trained on 2 million randomly sampled records from the Spanish-language version of the MSMARCO dataset.

Fine-Tuning

Note: This model requires fine-tuning before it can be used. Here are some examples of how to do it:

PyTorch:

Summarization
Question Answering
Text Classification - Note: You will have to slightly adapt the training example here to make it work with an encoder-decoder model.

TensorFlow:

Summarization
Text Classification - Note: You will have to slightly adapt the training example here to make it work with an encoder-decoder model.

JAX/Flax:

Summarization
Text Classification - Note: You will have to slightly adapt the training example here to make it work with an encoder-decoder model.
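One common way to adapt a text classification example to an encoder-decoder model, as the notes above suggest, is to cast it as text-to-text: the input is a prefixed sentence and the target is the label written out as a word. The sketch below shows only that data preparation step. The prefix, label names, and field names are hypothetical choices for illustration, and inputs are lowercased because the tokenizer is lowercase-only.

```python
from typing import Dict, List

# Hypothetical label names and task prefix, chosen only for illustration.
LABELS: List[str] = ["negativo", "positivo"]
PREFIX = "clasificar: "


def to_text2text(text: str, label_id: int) -> Dict[str, str]:
    """Turn a (text, label id) pair into an input/target string pair."""
    return {
        "input_text": (PREFIX + text).lower(),
        "target_text": LABELS[label_id],
    }


def to_label_id(generated: str) -> int:
    """Map generated text back to a label id, or -1 if it matches no label."""
    cleaned = generated.strip().lower()
    return LABELS.index(cleaned) if cleaned in LABELS else -1


example = to_text2text("Me ENCANTÓ la película", 1)
```

The resulting pairs can then be tokenized and passed to a seq2seq training loop in the same way as a summarization dataset.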