jonsaadfalcon committed
Commit 4b33c8a
Parent: e291f68

Update README.md

Files changed (1): README.md (+7, -4)
README.md CHANGED
@@ -16,8 +16,9 @@ Check out our [GitHub](https://github.com/HazyResearch/m2/tree/main) for instruc
 
 You can load this model using Hugging Face `AutoModel`:
 ```python
-from transformers import AutoModelForMaskedLM
-model = AutoModelForMaskedLM.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1", trust_remote_code=True)
+from transformers import AutoModelForMaskedLM, BertConfig
+config = BertConfig.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1")
+model = AutoModelForMaskedLM.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1", config=config, trust_remote_code=True)
 ```
 
 This model uses the Hugging Face `bert-base-uncased` tokenizer:
@@ -30,11 +31,12 @@ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
 
 This model generates embeddings for retrieval. The embeddings have a dimensionality of 768:
 ```
-from transformers import AutoTokenizer, AutoModelForMaskedLM
+from transformers import AutoTokenizer, AutoModelForMaskedLM, BertConfig
 
 max_seq_length = 32768
 testing_string = "Every morning, I make a cup of coffee to start my day."
-model = AutoModelForMaskedLM.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1", trust_remote_code=True)
+config = BertConfig.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1")
+model = AutoModelForMaskedLM.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1", config=config, trust_remote_code=True)
 
 tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=max_seq_length)
 input_ids = tokenizer([testing_string], return_tensors="pt", padding="max_length", return_token_type_ids=False, truncation=True, max_length=max_seq_length)
@@ -50,6 +52,7 @@ This model requires `trust_remote_code=True` to be passed to the `from_pretraine
 ```python
 mlm = AutoModelForMaskedLM.from_pretrained(
     "hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1",
+    config=config,
     trust_remote_code=True,
 )
 ```
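
For completeness, here is how the updated snippet might be run end to end. The hunks above stop at tokenization, so the forward pass below is a minimal sketch: the `sentence_embedding` output key and the `torch.no_grad()` wrapper are assumptions (other M2-BERT retrieval model cards expose embeddings under that key), not something this commit shows; verify the exact output against the model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, BertConfig

max_seq_length = 32768
testing_string = "Every morning, I make a cup of coffee to start my day."

# Load the checkpoint's config explicitly, as this commit does, then build the model.
config = BertConfig.from_pretrained("hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1")
model = AutoModelForMaskedLM.from_pretrained(
    "hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1",
    config=config,
    trust_remote_code=True,
)

# Tokenize with the standard bert-base-uncased tokenizer, padded to the full 32K context.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=max_seq_length)
input_ids = tokenizer(
    [testing_string],
    return_tensors="pt",
    padding="max_length",
    return_token_type_ids=False,
    truncation=True,
    max_length=max_seq_length,
)

# Forward pass; no gradients needed for inference.
with torch.no_grad():
    outputs = model(**input_ids)

embedding = outputs["sentence_embedding"]  # assumed output key; expected shape (1, 768)
print(embedding.shape)
```

Passing an explicit `BertConfig` appears intended to pin the checkpoint's configuration before the remote code constructs the model, rather than relying on `from_pretrained` to resolve it implicitly.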