Update README.md
README.md CHANGED
````diff
@@ -17,18 +17,25 @@ CryptoBERT was trained with a max sequence length of 128. Technically, it can ha
 # Classification Example
 ```python
 from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer
-from datasets import load_dataset
-dataset_name = "ElKulako/stocktwits-crypto"
-dataset = load_dataset(dataset_name)
 model_name = "ElKulako/cryptobert"
-
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
 model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels = 3)
-pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer,
+pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, max_length=64, truncation=True, padding = 'max_length')
+# post_1 & post_3 = bullish, post_2 = bearish
+post_1 = " see y'all tomorrow and can't wait to see ada in the morning, i wonder what price it is going to be at. 😎🐂🤠💯😴, bitcoin is looking good go for it and flash by that 45k. "
+post_2 = " alright racers, it’s a race to the bottom! good luck today and remember there are no losers (minus those who invested in currency nobody really uses) take your marks... are you ready? go!!"
+post_3 = " i'm never selling. the whole market can bottom out. i'll continue to hold this dumpster fire until the day i die if i need to."
+df_posts = [post_1, post_2, post_3]
 preds = pipe(df_posts)
+print(preds)
 
 
 ```
 
+```
+[{'label': 'Bullish', 'score': 0.8734585642814636}, {'label': 'Bearish', 'score': 0.9889495372772217}, {'label': 'Bullish', 'score': 0.6595883965492249}]
+```
+
 ## Training Corpus
 CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. The following communities were used as sources for our corpora:
 
````
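The removed lines loaded the ElKulako/stocktwits-crypto dataset but never passed it to the pipeline; the updated example scores three hard-coded posts instead. If you do want to score the dataset itself, a minimal sketch along these lines should work. The `train` split and `text` column names are assumptions about the dataset schema rather than something the diff confirms, and `max_length=128` simply matches the training length mentioned in the hunk header.

```python
# Hypothetical sketch: score the stocktwits-crypto dataset with the updated pipeline.
# The "train" split and "text" column are assumptions; check the dataset card for the real schema.
from datasets import load_dataset
from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "ElKulako/cryptobert"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# max_length=128 matches the length CryptoBERT was trained with (see the hunk header above).
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer,
                                  max_length=128, truncation=True, padding="max_length")

dataset = load_dataset("ElKulako/stocktwits-crypto")
posts = dataset["train"]["text"][:100]  # assumed split/column names; first 100 posts as a smoke test
preds = pipe(posts)
print(preds[:3])
```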
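The corpus description gives two filtering rules: duplicates were dropped and only posts longer than 4 words were kept. Below is a minimal sketch of that filtering, assuming plain-string posts and whitespace word counting; the README does not specify the exact deduplication or tokenization used.

```python
def filter_corpus(posts):
    """Keep non-duplicate posts with more than 4 whitespace-separated words.

    Assumption: "length above 4 words" means strictly more than 4 tokens after
    a simple whitespace split; the README does not spell out the exact rule.
    """
    seen = set()
    kept = []
    for post in posts:
        normalized = post.strip()
        if len(normalized.split()) <= 4:  # too short
            continue
        if normalized in seen:  # duplicate
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept

print(filter_corpus(["btc to the moon", "btc to the moon", "i think eth flips btc this cycle for real"]))
# -> ['i think eth flips btc this cycle for real']
```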