license: mit
language:
- en
pipeline_tag: feature-extraction
tags:
- sentiment-analysis
- text-classification
- generic
- sentiment-classification
datasets:
- Numind/C4_sentiment-analysis
## Model
The base version of [e5-v2](https://huggingface.co/intfloat/e5-base-v2), fine-tuned on an annotated subset of [C4](https://huggingface.co/datasets/Numind/C4_sentiment-analysis). This model provides generic embeddings for sentiment analysis. The embeddings can be used out of the box or fine-tuned on specific datasets.

Blog post: https://www.numind.ai/blog/creating-task-specific-foundation-models-with-gpt-4

## Usage
The example below encodes a text and computes its embedding.
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the fine-tuned encoder and its tokenizer
model = AutoModel.from_pretrained("Numind/e5-base-sentiment_analysis")
tokenizer = AutoTokenizer.from_pretrained("Numind/e5-base-sentiment_analysis")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

size = 256  # fixed sequence length, in tokens
text = "This movie is amazing"

# Tokenize, truncating or padding to exactly `size` tokens
encoding = tokenizer(
    text,
    truncation=True,
    padding="max_length",
    max_length=size,
)

# Only input_ids are passed (no attention mask); add a batch dimension -> (1, size)
input_ids = torch.tensor(encoding.input_ids).unsqueeze(0).to(device)
with torch.no_grad():
    emb = model(input_ids, output_hidden_states=True).hidden_states[-1].cpu()

# Average over all token positions to get one vector per text
embText = torch.mean(emb, dim=1)
```
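
To illustrate what using the embeddings "out of the box" could look like, here is a minimal sketch of a nearest-centroid sentiment classifier based on cosine similarity. It is not taken from the model's authors: the helper names (`cosine`, `centroid`, `fit_centroids`, `predict`) are hypothetical, and the vectors are toy 3-dimensional placeholders rather than real model outputs. It is written in plain Python so it runs without any extra dependency.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(embeddings, labels):
    """Group embeddings by label and average each group."""
    groups = {}
    for emb, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(emb)
    return {label: centroid(vecs) for label, vecs in groups.items()}


def predict(centroids, emb):
    """Return the label whose centroid is most similar to `emb`."""
    return max(centroids, key=lambda label: cosine(centroids[label], emb))


# Toy placeholder vectors (assumption: stand-ins for real embeddings)
train_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
]
train_labels = ["positive", "positive", "negative", "negative"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.7, 0.3, 0.0])
```

With real data, each vector would come from the encoding step, for example `embText[0].tolist()`, and the labels from a small annotated set. A linear classifier trained on the embeddings is another common option.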