speechless-tora-code-7b-v1.0

Code: https://github.com/uukuguy/speechless

The following datasets were used to fine-tune llm_agents/tora-code-7b-v1.0, with the aim of improving the model's reasoning and planning abilities.

Total 201,981 samples.

  • jondurbin/airoboros-2.2: filtered to categories related to coding, reasoning and planning. 23,462 samples.
  • Open-Orca/OpenOrca: filtered to the 'cot' category of the 1M GPT-4 subset. 74,440 samples.
  • garage-bAInd/Open-Platypus: 100%. 24,926 samples.
  • WizardLM/WizardLM_evol_instruct_V2_196k: coding conversation part. 30,185 samples.
  • TokenBender/python_eval_instruct_51k: samples with “python” in the output. 40,309 samples.
  • Spider: 8,659 samples.
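
As a quick consistency check, the per-source counts above add up to the stated total. A minimal Python sketch (the DATASET_MIX name is only an illustrative variable, not something from the speechless code):

```python
# Sample counts per source, copied from the dataset list above.
DATASET_MIX = {
    "jondurbin/airoboros-2.2": 23_462,
    "Open-Orca/OpenOrca": 74_440,
    "garage-bAInd/Open-Platypus": 24_926,
    "WizardLM/WizardLM_evol_instruct_V2_196k": 30_185,
    "TokenBender/python_eval_instruct_51k": 40_309,
    "Spider": 8_659,
}

total = sum(DATASET_MIX.values())
print(f"total: {total:,}")  # total: 201,981

# Share of the mix contributed by each source, largest first.
for name, count in sorted(DATASET_MIX.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} {count:>7,} {count / total:6.1%}")
```

On these numbers, OpenOrca chain-of-thought data is the largest single component of the mix.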

How to Prompt the Model

This model accepts the Alpaca instruction format.

For example:

You are an intelligent programming assistant.

### Instruction:
Implement a linked list in C++

### Response:
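
The layout above can be assembled programmatically. This is a minimal sketch with a hypothetical helper, build_alpaca_prompt; it assumes the system line and section headers are separated by blank lines exactly as in the example, and that the prompt ends with a newline after "### Response:" so the model writes its answer on the next line.

```python
DEFAULT_SYSTEM = "You are an intelligent programming assistant."


def build_alpaca_prompt(instruction: str, system: str = DEFAULT_SYSTEM) -> str:
    """Hypothetical helper: format one instruction in the Alpaca layout shown above."""
    # system line, blank line, instruction section, blank line, open response section
    return f"{system}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


prompt = build_alpaca_prompt("Implement a linked list in C++")
print(prompt)
```

The resulting string can then be tokenized and passed to a generation API such as generate() in Hugging Face transformers; the model's answer is the text it produces after "### Response:".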

HumanEval

Metric Value
humaneval-python 51.829

Big Code Models Leaderboard (CodeLlama baselines, for comparison)

CodeLlama-34B-Python: 53.29
CodeLlama-34B-Instruct: 50.79
CodeLlama-13B-Instruct: 50.6
CodeLlama-34B: 45.11
CodeLlama-13B-Python: 42.89
CodeLlama-13B: 35.07
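
To place this model among the CodeLlama baselines listed above, the scores can simply be sorted. This sketch assumes the CodeLlama figures are humaneval-python scores from the same leaderboard and are therefore directly comparable with the 51.829 reported above; the card does not state the metric for them explicitly.

```python
# humaneval-python scores: this model (from the HumanEval table) plus the
# CodeLlama baselines listed above (assumed to be the same metric).
SCORES = {
    "speechless-tora-code-7b-v1.0": 51.829,
    "CodeLlama-34B-Python": 53.29,
    "CodeLlama-34B-Instruct": 50.79,
    "CodeLlama-13B-Instruct": 50.6,
    "CodeLlama-34B": 45.11,
    "CodeLlama-13B-Python": 42.89,
    "CodeLlama-13B": 35.07,
}

# Highest score first.
ranking = sorted(SCORES, key=SCORES.get, reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name:30s} {SCORES[name]:.2f}")
```

Under that assumption, the 7B model ranks second, behind only CodeLlama-34B-Python.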

LM-Evaluation-Harness

Open LLM Leaderboard

Metric Value
ARC 42.66
HellaSwag 65.16
MMLU 38.56
TruthfulQA 42.06
Average 47.11

Parameters

lr 2e-4
lr_scheduler_type cosine
weight_decay 0.0
optim paged_adamw_8bit
flash_attention True
rerope False
max_new_tokens 4096
num_train_epochs 2
bits 4
lora_r 64
lora_alpha 16
lora_dropout 0.05
double_quant True
quant_type nf4
dataset_format airoboros
mini_batch_size 2
gradient_accumulation_steps 32
bf16 True
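
Two derived quantities are implied by these settings: the effective batch size per optimizer step and the LoRA scaling factor. The sketch below is a hypothetical dataclass, not the speechless training code; it assumes mini_batch_size is per GPU and that both A800 GPUs listed below run data-parallel, and it uses the standard LoRA scaling of lora_alpha / lora_r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for a subset of the hyperparameters listed above."""
    lr: float = 2e-4
    num_train_epochs: int = 2
    bits: int = 4
    lora_r: int = 64
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    mini_batch_size: int = 2               # assumption: per-GPU micro-batch
    gradient_accumulation_steps: int = 32
    num_gpus: int = 2                      # assumption: data-parallel over both A800s

    @property
    def effective_batch_size(self) -> int:
        # samples consumed per optimizer step across all devices
        return self.mini_batch_size * self.gradient_accumulation_steps * self.num_gpus

    @property
    def lora_scaling(self) -> float:
        # LoRA multiplies the low-rank update B @ A by alpha / r
        return self.lora_alpha / self.lora_r


cfg = FinetuneConfig()
print(cfg.effective_batch_size)  # 128
print(cfg.lora_scaling)          # 0.25
```

With lora_alpha smaller than lora_r, the adapter update is scaled down to a quarter of its raw magnitude.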

Hardware: 2 x A800-80G

epoch 2.0
train_loss 0.5891
train_runtime 19:24:49.43
train_samples_per_second 5.664
train_steps_per_second 0.044
eval_loss 0.5872
eval_runtime 0:00:15.59
eval_samples_per_second 12.822
eval_steps_per_second 6.411
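
The training throughput figures can be cross-checked against each other. This sketch only does arithmetic on the logged values and assumes the runtime is in hours:minutes:seconds.

```python
from datetime import timedelta

# Figures copied from the training log above.
train_runtime = timedelta(hours=19, minutes=24, seconds=49.43)
samples_per_second = 5.664
steps_per_second = 0.044

seconds = train_runtime.total_seconds()
print(f"runtime: {seconds:,.0f} s")                                     # runtime: 69,889 s
print(f"samples per step: {samples_per_second / steps_per_second:.1f}")  # samples per step: 128.7
print(f"approx. optimizer steps: {seconds * steps_per_second:,.0f}")    # approx. optimizer steps: 3,075
```

About 128.7 samples per optimizer step is close to 2 x 32 x 2 = 128 (mini_batch_size x gradient_accumulation_steps x GPUs, if the batch size is per GPU); the gap is within the rounding of the logged 0.044 steps/s.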

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 40.1
ARC (25-shot) 42.66
HellaSwag (10-shot) 65.16
MMLU (5-shot) 38.56
TruthfulQA (0-shot) 42.06
Winogrande (5-shot) 62.9
GSM8K (5-shot) 0.91
DROP (3-shot) 28.48
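
The two Open LLM Leaderboard tables report different averages (47.11 and 40.1) because they average over different task sets. A short sketch reproducing both from the per-task scores above:

```python
from statistics import mean

# Scores from the Open LLM Leaderboard tables above (insertion order preserved).
scores = {
    "ARC (25-shot)": 42.66,
    "HellaSwag (10-shot)": 65.16,
    "MMLU (5-shot)": 38.56,
    "TruthfulQA (0-shot)": 42.06,
    "Winogrande (5-shot)": 62.9,
    "GSM8K (5-shot)": 0.91,
    "DROP (3-shot)": 28.48,
}

four_task = mean(list(scores.values())[:4])  # ARC, HellaSwag, MMLU, TruthfulQA
seven_task = mean(scores.values())           # adds Winogrande, GSM8K, DROP
print(f"4-task average: {four_task:.2f}")    # 4-task average: 47.11
print(f"7-task average: {seven_task:.2f}")   # 7-task average: 40.10
```

The drop comes mostly from GSM8K (0.91) and DROP (28.48), which pull the seven-task mean well below the original four-task one.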