
This is a generation model trained with semiparametric token-sequence co-supervision on top of Llama2-7B. The embedding model, which constructs the nonparametric sequence embedding space, is available here. Both models are trained on the information-seeking datasets provided by Self-RAG, with co-supervision from next token prediction (NTP) and next sequence prediction (NSP). At inference time, the model generates a response while retrieving relevant sequences from the embedding space. See our paper for a full description.
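The co-supervision idea above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the inner-product scoring, the softmax cross-entropy form of the NSP term, and the `nsp_weight` parameter are all assumptions made for illustration. The paper and the released code define the actual losses and retrieval procedure.

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one target index, computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sequence_scores(query, candidates):
    """Score each candidate sequence embedding against the query vector.

    Assumption: similarity is a plain inner product.
    """
    return [dot(query, c) for c in candidates]


def co_supervision_loss(token_logits, gold_token, query, candidates,
                        gold_sequence, nsp_weight=1.0):
    """Hypothetical combined objective: NTP loss plus weighted NSP loss."""
    # NTP: classify the next token over the (parametric) vocabulary.
    ntp = cross_entropy(token_logits, gold_token)
    # NSP: classify the next sequence over the (nonparametric) embedding space.
    nsp = cross_entropy(sequence_scores(query, candidates), gold_sequence)
    return ntp + nsp_weight * nsp


def retrieve(query, candidates) -> int:
    """Return the index of the highest-scoring candidate sequence."""
    scores = sequence_scores(query, candidates)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: a 3-token vocabulary and three 2-dimensional sequence embeddings.
token_logits = [2.0, 0.5, -1.0]
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

loss = co_supervision_loss(token_logits, 0, query, candidates, 0)
best = retrieve(query, candidates)
```

In this toy setup, `retrieve` picks the candidate whose embedding has the largest inner product with the query, which is the same score the NSP term is trained on. That shared score is what lets one model both generate tokens and select sequences.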

Usage

The snippet below shows a quick way to download our model from Hugging Face. Make sure to install the dependencies listed in requirements.txt first. To run the full inference pipeline together with the embedding model, please use our code.

from transformers import AutoTokenizer, LlamaForCausalLM

# Set to True to load the weights in 8-bit.
# In our code this flag is read from train_config.quantization.
quantization = False

model = LlamaForCausalLM.from_pretrained(
    "kaist-ai/cosupervision-emb_seq-Llama2_7b",
    load_in_8bit=True if quantization else None,
    device_map="auto" if quantization else None,
)
Model size: 6.74B params (Safetensors, tensor type F32)