ESM joelniklaus/lextreme

Model Details

Model Description

ESM

  • Developed by: David Schulte
  • Model type: ESM
  • Base Model: bert-base-multilingual-uncased
  • Intermediate Task: joelniklaus/lextreme
  • ESM architecture: linear
  • Language(s) (NLP): [More Information Needed]
  • License: Apache-2.0 license

Training Details

Intermediate Task

  • Task ID: joelniklaus/lextreme
  • Subset [optional]: swiss_criticality_prediction_citation_considerations
  • Text Column: input
  • Label Column: label
  • Dataset Split: train
  • Sample size [optional]: 2523
  • Sample seed [optional]:
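
For reference, this subset can be loaded with the datasets library. A minimal sketch, assuming the subset, split, and columns listed above (no sample seed is listed, so the seed value below is an illustrative placeholder):

from datasets import load_dataset

# Load the intermediate task listed above from the Hugging Face Hub
dataset = load_dataset(
    "joelniklaus/lextreme",
    "swiss_criticality_prediction_citation_considerations",
    split="train",
)

# Draw the 2523-example sample listed above (seed 42 is a placeholder)
dataset = dataset.shuffle(seed=42).select(range(2523))
print(dataset[0]["input"], dataset[0]["label"])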

Training Procedure [optional]

Language Model Training Hyperparameters [optional]

  • Epochs: 3
  • Batch size: 32
  • Learning rate: 2e-05
  • Weight Decay: 0.01
  • Optimizer: AdamW
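
A minimal sketch of how these hyperparameters map onto the Hugging Face Trainer API; AdamW is the Trainer's default optimizer, and the output directory is an illustrative placeholder:

from transformers import TrainingArguments

# The language model hyperparameters above, expressed as TrainingArguments.
# AdamW is the Trainer's default optimizer; the output path is a placeholder.
training_args = TrainingArguments(
    output_dir="./lm_finetuning",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
)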

ESM Training Hyperparameters [optional]

  • Epochs: 10
  • Batch size: 32
  • Learning rate: 0.001
  • Weight Decay: 0.01
  • Optimizer: AdamW
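
A minimal sketch of ESM training under these hyperparameters, assuming the linear architecture listed above: a single linear map from base-model embeddings to fine-tuned embeddings, fitted with an MSE loss. The embedding dimension matches bert-base-multilingual-uncased; the data tensors are placeholders:

import torch
from torch import nn

# Illustrative sketch: a linear ESM maps base-model embeddings to
# fine-tuned embeddings. The random tensors stand in for real embeddings.
hidden_dim = 768  # hidden size of bert-base-multilingual-uncased
esm = nn.Linear(hidden_dim, hidden_dim)
optimizer = torch.optim.AdamW(esm.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.MSELoss()

base_emb = torch.randn(2523, hidden_dim)       # placeholder base embeddings
finetuned_emb = torch.randn(2523, hidden_dim)  # placeholder target embeddings
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(base_emb, finetuned_emb),
    batch_size=32,
    shuffle=True,
)

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(esm(x), y).backward()
        optimizer.step()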

Additional training details [optional]

Model evaluation

Evaluation of fine-tuned language model [optional]

Evaluation of ESM [optional]

MSE:

Additional evaluation details [optional]

What are Embedding Space Maps?

Embedding Space Maps (ESMs) are neural networks that approximate the effect of fine-tuning a language model on a task. They can be used to quickly transform embeddings from a base model to approximate how a fine-tuned model would embed the input text. ESMs can be used for intermediate task selection with the ESM-LogME workflow.
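
Concretely, for a linear ESM such as this one, applying the map is a single matrix multiplication on top of the base model's embedding. A minimal sketch; the mean pooling and the untrained placeholder map are assumptions for illustration, not this model's exact pipeline:

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Illustrative sketch: transform a base-model embedding with a linear ESM.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
base_model = AutoModel.from_pretrained("bert-base-multilingual-uncased")
esm = nn.Linear(768, 768)  # placeholder for a trained linear ESM

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    base_embedding = base_model(**inputs).last_hidden_state.mean(dim=1)
    approx_finetuned = esm(base_embedding)  # approximates the fine-tuned embedding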

How can I use Embedding Space Maps for Intermediate Task Selection?


We release hf-dataset-selector, a Python package for intermediate task selection using Embedding Space Maps.

hf-dataset-selector fetches ESMs for a given language model and uses them to find the best dataset for applying intermediate training to the target task. ESMs are found by their tags on the Hugging Face Hub.

from hfselect import Dataset, compute_task_ranking

# Load target dataset from the Hugging Face Hub
dataset = Dataset.from_hugging_face(
    name="stanfordnlp/imdb",
    split="train",
    text_col="text",
    label_col="label",
    is_regression=False,
    num_examples=1000,
    seed=42
)

# Fetch ESMs and rank tasks
task_ranking = compute_task_ranking(
    dataset=dataset,
    model_name="bert-base-multilingual-uncased"
)

# Display top 5 recommendations
print(task_ranking[:5])

For more information on how to use ESMs, please have a look at the official GitHub repository.

Citation

If you use Embedding Space Maps, please cite our paper.

BibTeX:

@misc{schulte2024moreparameterefficientselectionintermediate,
      title={Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning}, 
      author={David Schulte and Felix Hamborg and Alan Akbik},
      year={2024},
      eprint={2410.15148},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.15148}, 
}

APA:

Schulte, D., Hamborg, F., & Akbik, A. (2024). Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. arXiv preprint arXiv:2410.15148.

Additional Information
