ychenNLP committed
Commit aa9e512 • 1 Parent(s): 15d2ec1

Update README.md

Files changed (1): README.md (+21, -1)
README.md CHANGED
@@ -13,4 +13,24 @@ Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned
 
 - 📖 Paper: [Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction](https://arxiv.org/abs/2305.13582)
 - 🤗 Model: [GoLLIE-7B-TF](https://huggingface.co/ychenNLP/GoLLIE-7B-TF)
- - 🚀 Example Jupyter Notebooks: [GoLLIE-TF Notebooks](notebooks/tf.ipynb)
+ - 🚀 Example Jupyter Notebooks: [GoLLIE-TF Notebooks](notebooks/tf.ipynb)
+
+
+ **Important** (from the GoLLIE README): Our flash attention implementation has small numerical differences compared to the attention implementation in Huggingface.
+ You must use the flag `trust_remote_code=True` or you will get inferior results. Flash attention requires an available CUDA GPU; running GoLLIE
+ pre-trained models on a CPU is not supported. We plan to address this in future releases. First, install flash attention 2:
+ ```bash
+ pip install flash-attn --no-build-isolation
+ pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
+ ```
+
+ Then you can load the model using:
+
+ ```python
30
+ import torch
31
+ from transformers import AutoTokenizer, AutoModelForCausalLM
32
+
33
+ tokenizer = AutoTokenizer.from_pretrained("HiTZ/GoLLIE-7B")
34
+ model = AutoModelForCausalLM.from_pretrained("HiTZ/GoLLIE-7B", trust_remote_code=True, torch_dtype=torch.bfloat16)
35
+ model.to("cuda")
36
+ ```
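
For a rough end-to-end check after loading, a minimal generation call along the lines below should work. This sketch is not part of the committed README: the model id `ychenNLP/GoLLIE-7B-TF` is only taken from the model link above, the prompt string is a placeholder (GoLLIE-style models expect code-formatted guideline prompts, see the example notebook `notebooks/tf.ipynb`), and the generation settings are arbitrary.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Mirrors the loading snippet in the diff; model id assumed from the links above.
# Swap in "HiTZ/GoLLIE-7B" for the base GoLLIE model.
model_id = "ychenNLP/GoLLIE-7B-TF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,       # required for the custom flash attention code path
    torch_dtype=torch.bfloat16,
)
model.to("cuda")                  # flash attention needs a CUDA GPU; CPU is unsupported

# Placeholder prompt: real GoLLIE/GoLLIE-TF inputs are code-style guideline
# definitions followed by the text to annotate (see notebooks/tf.ipynb).
prompt = "..."

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```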