File size: 4,119 Bytes
e564c77
8447feb
 
 
 
e564c77
8447feb
 
e564c77
 
8447feb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
---
base_model: bert-base-multilingual-uncased
datasets:
- nyu-mll/multi_nli
license: apache-2.0
tags:
- embedding_space_map
- BaseLM:bert-base-multilingual-uncased
---

# ESM nyu-mll/multi_nli

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

ESM

- **Developed by:** David Schulte
- **Model type:** ESM
- **Base Model:** bert-base-multilingual-uncased
- **Intermediate Task:** nyu-mll/multi_nli
- **ESM architecture:** linear
- **Language(s) (NLP):** [More Information Needed]
- **License:** Apache-2.0 license

## Training Details

### Intermediate Task
- **Task ID:** nyu-mll/multi_nli
- **Subset [optional]:** default
- **Text Column:** ['premise', 'hypothesis']
- **Label Column:** label
- **Dataset Split:**  train
- **Sample size [optional]:** 10000
- **Sample seed [optional]:** 42

### Training Procedure [optional]

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Language Model Training Hyperparameters [optional]
- **Epochs:** 3
- **Batch size:** 32
- **Learning rate:** 2e-05
- **Weight Decay:** 0.01
- **Optimizer**: AdamW

### ESM Training Hyperparameters [optional]
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 0.001
- **Weight Decay:** 0.01
- **Optimizer**: AdamW


### Additional trainiung details [optional]


## Model evaluation

### Evaluation of fine-tuned language model [optional]


### Evaluation of ESM [optional]
MSE: 

### Additional evaluation details [optional]



## What are Embedding Space Maps?

<!-- This section describes the evaluation protocols and provides the results. -->
Embedding Space Maps (ESMs) are neural networks that approximate the effect of fine-tuning a language model on a task. They can be used to quickly transform embeddings from a base model to approximate how a fine-tuned model would embed the the input text.
ESMs can be used for intermediate task selection with the ESM-LogME workflow.

## How can I use Embedding Space Maps for Intermediate Task Selection?
[![PyPI version](https://img.shields.io/pypi/v/hf-dataset-selector.svg)](https://pypi.org/project/hf-dataset-selector)

We release **hf-dataset-selector**, a Python package for intermediate task selection using Embedding Space Maps.

**hf-dataset-selector** fetches ESMs for a given language model and uses it to find the best dataset for applying intermediate training to the target task. ESMs are found by their tags on the Huggingface Hub.

```python
from hfselect import Dataset, compute_task_ranking

# Load target dataset from the Hugging Face Hub
dataset = Dataset.from_hugging_face(
    name="stanfordnlp/imdb",
    split="train",
    text_col="text",
    label_col="label",
    is_regression=False,
    num_examples=1000,
    seed=42
)

# Fetch ESMs and rank tasks
task_ranking = compute_task_ranking(
    dataset=dataset,
    model_name="bert-base-multilingual-uncased"
)

# Display top 5 recommendations
print(task_ranking[:5])
```

For more information on how to use ESMs please have a look at the [official Github repository](https://github.com/davidschulte/hf-dataset-selector).

## Citation


<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you are using this Embedding Space Maps, please cite our [paper](https://arxiv.org/abs/2410.15148).

**BibTeX:**


```
@misc{schulte2024moreparameterefficientselectionintermediate,
      title={Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning}, 
      author={David Schulte and Felix Hamborg and Alan Akbik},
      year={2024},
      eprint={2410.15148},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.15148}, 
}
```


**APA:**

```
Schulte, D., Hamborg, F., & Akbik, A. (2024). Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. arXiv preprint arXiv:2410.15148.
```

## Additional Information