metadata
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - code
  - moe
datasets:
  - andersonbcdefg/synthetic_retrieval_tasks
  - ise-uiuc/Magicoder-Evol-Instruct-110K
metrics:
  - code_eval
model-index:
  - name: moe-x33
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 26.19
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 26.44
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 24.93
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 51.14
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 50.99
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
          name: Open LLM Leaderboard

33x Coding Model

33x-coder is a Llama-based model available on Hugging Face, designed to assist with coding tasks. It specializes in understanding and generating code, and is trained on a range of programming languages and coding scenarios. Whether you are debugging, seeking coding advice, or generating entire scripts, 33x-coder is intended to provide relevant, syntactically correct code snippets and programming guidance, helping to reduce development time and improve code quality.

Importing necessary libraries from transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

Initialize the tokenizer and model

tokenizer = AutoTokenizer.from_pretrained("senseable/33x-coder")
model = AutoModelForCausalLM.from_pretrained("senseable/33x-coder").cuda()

User's request for a prime-checking function in Python

messages = [
    {'role': 'user', 'content': "Write a Python function to check if a number is prime."}
]

Preparing the input for the model by applying the chat template (with the generation prompt appended) and sending the result to the same device as the model

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

Generating a response from the model with specific parameters for text generation

outputs = model.generate(
    inputs,
    max_new_tokens=512,      # Maximum number of new tokens to generate
    do_sample=False,         # Greedy decoding: always pick the most likely next token
    top_k=50,                # Top-k filtering; only takes effect when do_sample=True
    top_p=0.95,              # Nucleus (top-p) filtering; only takes effect when do_sample=True
    num_return_sequences=1,  # Number of sequences returned for each input in the batch
    eos_token_id=32021,      # End-of-sequence token id
)

Decoding and printing the generated response

start_index = len(inputs[0])
generated_output_tokens = outputs[0][start_index:]
decoded_output = tokenizer.decode(generated_output_tokens, skip_special_tokens=True)
print("Generated Code:\n", decoded_output)
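
The generation call above passes `top_k=50` and `top_p=0.95` together with `do_sample=False`. In transformers these two filters act only when sampling is enabled, so greedy decoding simply takes the single most likely token. The sketch below illustrates, on a made-up toy distribution, what the two filters would do if sampling were turned on. It is a simplified pure-Python illustration, not the library's implementation; the function name `top_k_top_p_filter` and the toy probabilities are assumptions introduced here.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return a renormalized {token: prob} dict after top-k, then top-p filtering."""
    # Top-k: keep only the top_k most probable tokens, then renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total_k = sum(p for _, p in ranked)
    ranked = [(token, p / total_k) for token, p in ranked]

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a valid distribution to sample from.
    total_p = sum(p for _, p in kept)
    return {token: p / total_p for token, p in kept}


# Toy next-token distribution (hypothetical values, not model output).
toy = {"def": 0.5, "import": 0.3, "class": 0.15, "print": 0.04, "#": 0.01}

filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.75)  # candidates when sampling
greedy = max(toy, key=toy.get)                           # choice when do_sample=False
print(filtered, greedy)
```

With `do_sample=False`, only the `greedy` choice matters; the filtered distribution becomes relevant once `do_sample=True` is set.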

license: apache-2.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 29.95 |
| AI2 Reasoning Challenge (25-Shot) | 26.19 |
| HellaSwag (10-Shot) | 26.44 |
| MMLU (5-Shot) | 24.93 |
| TruthfulQA (0-shot) | 51.14 |
| Winogrande (5-shot) | 50.99 |
| GSM8k (5-shot) | 0.00 |
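
As a quick check on the reported average, the snippet below recomputes it from the six benchmark scores listed above. It assumes the average is the unweighted mean of those six values, which the numbers are consistent with; the variable names are illustrative.

```python
# Benchmark scores as reported in the results above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 26.19,
    "HellaSwag (10-Shot)": 26.44,
    "MMLU (5-Shot)": 24.93,
    "TruthfulQA (0-shot)": 51.14,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean across the six benchmarks (assumed averaging scheme).
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```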