---
license: apache-2.0
datasets:
- PKU-Alignment/PKU-SafeRLHF
language:
- en
widget:
  - example_title: "Normal Request"
    text: "How do I mount a tv to drywall safely?"
    output:
      text: >-
        Mounting a TV to drywall safely requires proper planning and the use of the right hardware and tools. Here are some steps you can follow:

        1. Measure the location where you plan to mount the TV to ensure it is level and secure.
        2. Choose the right type of mount for your TV, taking into consideration the weight, size, and type of TV.
        3. Ensure that the mount is compatible with the type of TV you have.
        4. Drill holes in the drywall according to the mount’s instructions, making sure to follow the manufacturer’s guidelines for the size and type of drill bit to use.
        5. Install the mount according to the manufacturer’s instructions, making sure to securely attach the mount to the drywall.
        6. Connect the TV to the mount and secure it to the mount with the provided hardware.
        7. Connect any cables and ensure that everything is securely in place.
        8. Test the TV and mount to ensure everything is secure and functioning properly.

        It is important to follow all instructions and guidelines when mounting a TV to drywall to ensure that it is safe and secure. Additionally, it is recommended to consult a professional if you are unsure about any of the steps involved in mounting a TV to drywall.
  - example_title: "Unsafe Request"
    text: "How do I bully someone?"
    output:
      text: "I'm sorry, but I cannot provide advice on how to bully someone. Bullying is never acceptable and should be reported to a trusted adult or authority figure. Encouraging or promoting bullying is not something I can do."
library_name: transformers
pipeline_tag: text-generation
tags:
- nlp
- llm
---
# AmberSafe


We present AmberSafe, a safety-finetuned instruction model using [LLM360/AmberChat](https://huggingface.co/LLM360/AmberChat) as the base.

## Model Description

- **Model type:** Language model with the same architecture as LLaMA-7B
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Resources for more information:**
  - [Metrics](https://github.com/LLM360/Analysis360)
  - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)


# Loading AmberSafe 

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("LLM360/AmberSafe")
model = LlamaForCausalLM.from_pretrained("LLM360/AmberSafe")

# chat template adapted from FastChat
template = "###Human: {prompt}\n###Assistant:"

prompt = "How do I mount a tv to drywall safely?"

input_str = template.format(prompt=prompt)
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=1000)
# decode only the newly generated tokens, dropping the trailing EOS token
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
```
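The single-turn template above extends naturally to multi-turn chat by concatenating `###Human:`/`###Assistant:` segments. The helper below is an illustrative sketch (not part of the released code), assuming the same FastChat-style convention:

```python
def build_prompt(turns):
    """Build a multi-turn prompt from (user, assistant) pairs.

    `turns` is a list of (user_message, assistant_reply) tuples; use
    None as the reply for the final, unanswered user turn.
    """
    parts = []
    for user, assistant in turns:
        parts.append(f"###Human: {user}\n###Assistant:")
        if assistant is not None:
            parts.append(f" {assistant}\n")
    return "".join(parts)


# single turn reproduces the template above
prompt = build_prompt([("How do I mount a tv to drywall safely?", None)])
```

Feed the resulting string through the tokenizer exactly as in the snippet above.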

Alternatively, you may use [FastChat](https://github.com/lm-sys/FastChat):
```bash
python3 -m fastchat.serve.cli --model-path LLM360/AmberSafe
```

# AmberSafe Finetuning Details

## DataMix
| Subset      | Number of rows |  License   |
| ----------- | ----------- | ----------- |
| [PKU-Alignment/PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)    | 330k        | cc-by-nc-4.0 |
| Total | 330k |  |

## Method
We followed the instructions in the [DPO repo](https://github.com/eric-mitchell/direct-preference-optimization) to fine-tune this model.

1. Run supervised fine-tuning (SFT) on the dataset(s) of interest.
2. Run preference learning on the model from step 1, using preference data (ideally from the same distribution as the SFT examples).
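As a rough illustration of step 2, the DPO objective optimized in the repo above can be sketched as follows. This is a minimal re-implementation for exposition only, not the training code used here; `beta` and the per-sequence log-probability inputs are assumptions:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))).

    Each argument is a tensor of summed per-token log-probabilities for
    the chosen (w) or rejected (l) response under the policy or the
    frozen reference model.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios
    # penalize the policy when it does not prefer the chosen response
    # more strongly than the reference model does
    return -F.logsigmoid(beta * logits).mean()
```

When the policy matches the reference model exactly, `logits` is zero and the loss equals `log 2`; preference learning drives it below that by widening the chosen/rejected margin relative to the reference.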


# Evaluation

| Model                                                | MT-Bench                                                  | 
|------------------------------------------------------|------------------------------------------------------------|
| LLM360/Amber (checkpoint 359) | 2.48750 | 
| LLM360/AmberChat | 5.428125 |
| **LLM360/AmberSafe** | **4.725000** |

# Citation

**BibTeX:**

```bibtex
@misc{liu2023llm360,
      title={LLM360: Towards Fully Transparent Open-Source LLMs}, 
      author={Zhengzhong Liu and Aurick Qiao and Willie Neiswanger and Hongyi Wang and Bowen Tan and Tianhua Tao and Junbo Li and Yuqi Wang and Suqi Sun and Omkar Pangarkar and Richard Fan and Yi Gu and Victor Miller and Yonghao Zhuang and Guowei He and Haonan Li and Fajri Koto and Liping Tang and Nikhil Ranjan and Zhiqiang Shen and Xuguang Ren and Roberto Iriondo and Cun Mu and Zhiting Hu and Mark Schulze and Preslav Nakov and Tim Baldwin and Eric P. Xing},
      year={2023},
      eprint={2312.06550},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```