File size: 3,339 Bytes
83eac59
 
 
 
 
 
 
 
 
 
 
 
 
 
10af5a1
 
 
 
83eac59
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
---
license: llama3.1
language:
- en
inference: false
fine-tuning: false
tags:
- nvidia
- llama3.1
datasets:
- nvidia/HelpSteer2
base_model: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
pipeline_tag: text-generation
library_name: transformers
---

3bpw exl2 quant of: https://huggingface.co/huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated

---
# huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated

This is an uncensored version of [nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF) created with abliteration (see [this article](https://huggingface.co/blog/mlabonne/abliteration) to know more about it).

Special thanks to [@FailSpy](https://huggingface.co/failspy) for the original code and technique. Please follow him if you're interested in abliterated models.


## Usage
You can use this model in your applications by loading it with Hugging Face's `transformers` library,   
If the desired result is not achieved, you can clear the conversation and try again:


```python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize conversation context
initial_messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]
messages = initial_messages.copy()  # Copy the initial conversation context

# Enter conversation loop
while True:
    # Get user input
    user_input = input("User: ").strip()  # Strip leading and trailing spaces

    # If the user types '/exit', end the conversation
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break

    # If the user types '/clean', reset the conversation context
    if user_input.lower() == "/clean":
        messages = initial_messages.copy()  # Reset conversation context
        print("Chat history cleared. Starting a new conversation.")
        continue

    # If input is empty, prompt the user and continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Add user input to the conversation
    messages.append({"role": "user", "content": user_input})

    # Build the chat template
    tokenized_message = tokenizer.apply_chat_template(
        messages,
        tokenize=True, 
        add_generation_prompt=True,
        return_tensors="pt", 
        return_dict=True
    )

    # Generate a response from the model
    response_token_ids = model.generate(
        tokenized_message['input_ids'].cuda(),
        attention_mask=tokenized_message['attention_mask'].cuda(),  
        max_new_tokens=4096, 
        pad_token_id = tokenizer.eos_token_id
    )

    # Extract model output, removing special tokens
    generated_tokens = response_token_ids[:, len(tokenized_message['input_ids'][0]):]
    generated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

    # Add the model's response to the conversation
    messages.append({"role": "assistant", "content": generated_text})

    # Print the model's response
    print(f"Response: {generated_text}")

```