huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned

This is a pruned version of huihui-ai/Moonlight-16B-A3B-Instruct-abliterated, reduced from 64 experts to 32. The pruned model is intended mainly for code generation.
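
As a quick sanity check, the expert count can be read directly from the checkpoint's config. This is a minimal sketch; it assumes the config exposes the DeepSeek-V3-style field n_routed_experts:

from transformers import AutoConfig

# Read the routed-expert count from the pruned checkpoint's config.
# NOTE: "n_routed_experts" is the field name used by DeepSeek-V3-style
# configs; treating it as present here is an assumption.
cfg = AutoConfig.from_pretrained(
    "huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned",
    trust_remote_code=True
)
print(getattr(cfg, "n_routed_experts", None))  # expected: 32 after pruning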

This is a validation test to see whether the model can be pruned to specific requirements while still maintaining acceptable performance. The model size has been reduced by about half, and no distortion was observed.

This shows that the model can be pruned according to one's needs.

The pruned model has a total parameter count equivalent to roughly 8B.
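
A quick way to verify this locally is to sum the parameter tensors. This is a minimal sketch; loading the weights in bfloat16 needs roughly 18 GB of memory:

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e9:.2f}B")  # roughly 8B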

This model has the same architecture as the deepseek-ai/DeepSeek-V3 model, and we will also try pruning deepseek-ai/DeepSeek-V3.

Use with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned"
model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID, 
    device_map="auto", 
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
# Use the EOS token for padding if no pad token is defined
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

# Initialize conversation context
initial_messages = [
    {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."}
]
messages = initial_messages.copy()  # Copy the initial conversation context

# Enter conversation loop
while True:
    # Get user input
    user_input = input("User: ").strip()  # Strip leading and trailing spaces

    # If the user types '/exit', end the conversation
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break

    # If the user types '/clear', reset the conversation context
    if user_input.lower() == "/clear":
        messages = initial_messages.copy()  # Reset conversation context
        print("Chat history cleared. Starting a new conversation.")
        continue

    # If input is empty, prompt the user and continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Add user input to the conversation
    messages.append({"role": "user", "content": user_input})

    # Apply the chat template and move the inputs to the model's device
    tokenized_message = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    response_token_ids = model.generate(
        tokenized_message["input_ids"],
        attention_mask=tokenized_message["attention_mask"],
        use_cache=False, pad_token_id=tokenizer.pad_token_id, max_new_tokens=8192
    )
    # Keep only the newly generated tokens, dropping the prompt
    generated_tokens = response_token_ids[:, tokenized_message["input_ids"].shape[1]:]
    response = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

    # Add the model's response to the conversation
    messages.append({"role": "assistant", "content": response})

    # Print the model's response
    print(f"Response: {response}")

Donation

If you like this model, please click 'like' and follow us for more updates.
You can follow x.com/support_huihui to get the latest model information from huihui.ai.

Your donation helps us continue development and improvement; even the price of a cup of coffee makes a difference.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge