cfli committed on
Commit
73ef392
1 Parent(s): 66f9124

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,19 +1,24 @@
1
  ---
2
- {}
 
 
 
 
 
3
  ---
4
- # LLARA-7B-BEIR
5
 
6
- This model is fine-tuned from LLaMA-2-7B using LoRA and the embedding size is 4096.
7
 
8
- ## Training Data
9
 
10
- The model is fine-tuned on the training split of [MS MARCO Passage Ranking](https://microsoft.github.io/msmarco/Datasets) datasets for 1 epoch. Please check our paper for details.
 
 
11
 
12
- ## Usage
13
 
14
- Below is an example to encode a query and a passage, and then compute their similarity using their embedding.
15
 
16
- ```python
17
  import torch
18
  from transformers import AutoModel, AutoTokenizer, LlamaModel
19
 
@@ -64,8 +69,8 @@ def get_passage_inputs(passages, tokenizer, max_length=512):
64
  )
65
 
66
  # Load the tokenizer and model
67
- tokenizer = AutoTokenizer.from_pretrained('cfli/LLARA-beir')
68
- model = AutoModel.from_pretrained('cfli/LLARA-beir')
69
 
70
  # Define query and passage inputs
71
  query = "What is llama?"
@@ -92,6 +97,27 @@ with torch.no_grad():
92
  score = query_embedding @ passage_embeddings.T
93
  print(score)
94
 
95
 
 
96
 
97
  ```
 
1
  ---
2
+ pipeline_tag: sentence-similarity
3
+ tags:
4
+ - sentence-transformers
5
+ - feature-extraction
6
+ - sentence-similarity
7
+ license: mit
8
  ---
 
9
 
10
+ For more details please refer to our github repo: https://github.com/FlagOpen/FlagEmbedding
11
 
12
+ # LLARA ([paper](https://arxiv.org/pdf/2312.15503))
13
 
14
+ In this project, we introduce LLaRA:
15
+ - EBAE: Embedding-Based Auto-Encoding.
16
+ - EBAR: Embedding-Based Auto-Regression.
17
 
 
18
 
19
+ ## Usage
20
 
21
+ ```
22
  import torch
23
  from transformers import AutoModel, AutoTokenizer, LlamaModel
24
 
 
69
  )
70
 
71
  # Load the tokenizer and model
72
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/LLARA-beir')
73
+ model = AutoModel.from_pretrained('BAAI/LLARA-beir')
74
 
75
  # Define query and passage inputs
76
  query = "What is llama?"
 
97
  score = query_embedding @ passage_embeddings.T
98
  print(score)
99
 
100
+ ```
101
+
102
+
103
+ ## Acknowledgement
104
+
105
+ Thanks to the authors of open-sourced datasets, including MSMARCO, BEIR, etc.
106
+ Thanks to the open-sourced libraries like [Pyserini](https://github.com/castorini/pyserini).
107
+
108
+
109
+
110
+ ## Citation
111
 
112
+ If you find this repository useful, please consider giving a star :star: and citation
113
 
114
+ ```
115
+ @misc{li2023making,
116
+ title={Making Large Language Models A Better Foundation For Dense Retrieval},
117
+ author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
118
+ year={2023},
119
+ eprint={2312.15503},
120
+ archivePrefix={arXiv},
121
+ primaryClass={cs.CL}
122
+ }
123
  ```
model-00001-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:01dd7ea5dfa7418b6f2d6a29c50d7a632a74459e5445f6d13d04b22f23201f13
3
  size 4840658560
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d6fccd8125fbe08012de41b19f254477b4fb7653016d54f63a4cf05d6344a058
3
  size 4840658560
model-00002-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bde0b61c7e53a84ddc7432a2049fb125191da2a7d5dbae7fe4a592f35e2224fc
3
  size 4857206856
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2ab32cbdc54b39e8f04666421c5e6ad181b59954cf257a9d131d55ebb16e1268
3
  size 4857206856
model-00003-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:75c9c8c1a7ffdc359e7be1ec0e40c7598d2a2167c1ea1380ad67b5409c80c4f2
3
  size 4857206904
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9ca1bfeacefd35b9dd1c801c45fa0ad809071faccdd99253e4cf5255cf38ee9b
3
  size 4857206904
model-00004-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b99f7143e6bb96b6d56fbda293f97e4aa7554a13691949e5aa0d3ccdd422029f
3
  size 4857206904
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0b027c43b746337990b12d3f0f6f7b2c2dc0524cd632738eab22557bc0446be0
3
  size 4857206904
model-00005-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:728bfd285dd8bf48458715de351982e1fa9449a8cf8111f2e9da05e567e3c346
3
  size 4857206904
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:54890415310f56758951b79a88be1592fba4bd5168053f379c59c4a6e7f1bd25
3
  size 4857206904
model-00006-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d5034cc2eafecc0c004dad0880a7e7d75a61bd8ecdf2b1e4bfdc6bd1b8f9e8b7
3
  size 2684734256
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f8fb6aa36b09ed23b828f92500c76dc784a22511661bd2200e76cf79d9c52a7
3
  size 2684734256
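
Each `.safetensors` entry above is a Git LFS pointer (a `version` line, an `oid sha256:` digest, and a `size` in bytes), not the weights themselves. As a minimal sketch of how a downloaded shard could be checked against its pointer: the helper names `parse_lfs_pointer` and `verify_file` are hypothetical and not part of this repository or of `huggingface_hub`, and the local file path is whatever your download produced.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into {'oid': hex digest, 'size': byte count}."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    # Cheap check first: a size mismatch means the hash cannot match.
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so multi-gigabyte shards are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


# Pointer text for model-00006-of-00006.safetensors as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:8f8fb6aa36b09ed23b828f92500c76dc784a22511661bd2200e76cf79d9c52a7
size 2684734256
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

With the shard downloaded, `verify_file("model-00006-of-00006.safetensors", pointer)` would be expected to return `True` for an intact file; a `False` result suggests a truncated or stale download.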