---
language:
- en
model-index:
- name: Dans-CreepingSenseOfDoom
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 53.33
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PocketDoc/Dans-CreepingSenseOfDoom
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 78.9
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PocketDoc/Dans-CreepingSenseOfDoom
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 48.09
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PocketDoc/Dans-CreepingSenseOfDoom
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 37.84
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PocketDoc/Dans-CreepingSenseOfDoom
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 73.32
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PocketDoc/Dans-CreepingSenseOfDoom
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PocketDoc/Dans-CreepingSenseOfDoom
      name: Open LLM Leaderboard
---
### What is the model for?
This model is built for running text-based adventure games. It can produce both concise replies and more expansive, novel-like descriptions. You switch between these two response styles through the system message.
### What's in the sausage?
This model was fine-tuned from [Holodeck-1](https://huggingface.co/KoboldAI/LLAMA2-13B-Holodeck-1) on a deduplicated version of the Skein text adventure dataset, augmented with system messages in the 'Metharme' prompting format.
### PROMPT FORMAT:
The prompt format follows the Pygmalion Metharme format, shown below.
```
<|system|>{system message here}<|user|>{user action here}<|model|>{model response}
<|system|>{system message here}<|model|>{model response}
<|system|>{system message here}<|user|>{user action here}<|model|>{model response}<|user|>{user action here}<|model|>{model response}
```
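The templates above can be assembled with a small helper. This is a minimal sketch, not part of the model's tooling: the function name `build_metharme_prompt` and the `(role, text)` history layout are assumptions made for illustration. Only the three role tokens come from the format shown above.

```python
from typing import Iterable, Tuple

# Role tokens taken from the prompt format above.
SYSTEM_TOKEN = "<|system|>"
ROLE_TOKENS = {"user": "<|user|>", "model": "<|model|>"}


def build_metharme_prompt(system_message: str,
                          history: Iterable[Tuple[str, str]] = ()) -> str:
    """Concatenate a system message and prior turns into one prompt string.

    `history` is a hypothetical layout: (role, text) pairs where role is
    "user" or "model". The prompt ends with <|model|> so that generation
    continues as the model's next reply.
    """
    parts = [SYSTEM_TOKEN + system_message]
    for role, text in history:
        if role not in ROLE_TOKENS:
            raise ValueError(f"unknown role: {role!r}")
        parts.append(ROLE_TOKENS[role] + text)
    parts.append(ROLE_TOKENS["model"])  # cue the model's response
    return "".join(parts)


if __name__ == "__main__":
    print(build_metharme_prompt("Mode: Adventure", [("user", "you look around")]))
```

Passing an empty history yields the second template (system message followed directly by the model turn), and appending earlier `("model", ...)` replies yields the multi-turn form.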
### EXAMPLES:
##### For shorter responses:
```
<|system|>Mode: Adventure
Theme: Science Fiction, cats, money, aliens, space, stars, siblings, future, trade
Tense: Second person present
Extra: Short response length<|user|>you look around<|model|>{CURSOR HERE}
```
```
<|system|>You are a dungeon master of sorts, guiding the reader through a story based on the following themes: Lovecraftian, Horror, city, research. Do not be afraid to get creative with your responses or to tell them they can't do something when it doesnt make sense for the situation. Narrate their actions and observations as they occur and drive the story forward.<|user|>you look around<|model|>{CURSOR HERE}
```
##### For longer, novel-like responses:
```
<|system|>You're tasked with creating an interactive story around the genres of historical, historical, RPG, serious. Guide the user through this tale, describing their actions and surroundings using second person present tense. Lengthy and descriptive responses will enhance the experience.<|user|>you look around<|model|>{CURSOR HERE}
```
##### With a model message first:
```
<|system|>Mode: Story
Theme: fantasy, female protagonist, grimdark
Perspective and Tense: Second person present
Directions: Write something to hook the user into the story then narrate their actions and observations as they occur while driving the story forward.<|model|>{CURSOR HERE}
```
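The key-value style system messages in the examples above can also be generated from structured settings. The sketch below mirrors only the field names of the first example (`Mode`, `Theme`, `Tense`, `Extra`); the function name and its parameters are hypothetical, and other examples use different labels such as `Perspective and Tense` and `Directions`, so adjust as needed.

```python
from typing import Optional, Sequence


def build_system_message(mode: str,
                         themes: Sequence[str],
                         tense: str = "Second person present",
                         extra: Optional[str] = None) -> str:
    """Build a key-value system message like the first example above.

    The field labels are copied from that example; whether the model is
    sensitive to other labels or orderings is not stated in this card.
    """
    lines = [
        f"Mode: {mode}",
        f"Theme: {', '.join(themes)}",
        f"Tense: {tense}",
    ]
    if extra:
        # Optional hint, e.g. "Short response length" for concise replies.
        lines.append(f"Extra: {extra}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_message("Adventure",
                               ["Science Fiction", "cats", "space"],
                               extra="Short response length"))
```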
### Some quick and dirty training details:
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="150" height="24"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- Sequence length: 4096
- \# of epochs: 3
- Training time: 8 hours
- Hardware: 1x RTX 3090
- Training type: QLoRA
- PEFT R/A: 32/32
### Credits:
#### Holodeck-1:
Thank you to Mr. Seeker and the Kobold AI team for the wonderful model Holodeck-1.
[Holodeck-1 Huggingface page](https://huggingface.co/KoboldAI/LLAMA2-13B-Holodeck-1)
#### Skein Text Adventure Data:
Thank you to the [Kobold AI](https://huggingface.co/KoboldAI) community for curating the Skein dataset, which is pivotal to this model's capabilities.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_PocketDoc__Dans-CreepingSenseOfDoom).
| Metric |Value|
|---------------------------------|----:|
|Avg. |48.58|
|AI2 Reasoning Challenge (25-Shot)|53.33|
|HellaSwag (10-Shot) |78.90|
|MMLU (5-Shot) |48.09|
|TruthfulQA (0-shot) |37.84|
|Winogrande (5-shot) |73.32|
|GSM8k (5-shot) | 0.00|
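
As a quick check, the `Avg.` row appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it from the values in the table; treating the average as a plain mean is an assumption about how the leaderboard aggregates.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 53.33,
    "HellaSwag (10-Shot)": 78.90,
    "MMLU (5-Shot)": 48.09,
    "TruthfulQA (0-shot)": 37.84,
    "Winogrande (5-shot)": 73.32,
    "GSM8k (5-shot)": 0.00,
}

# Assumed aggregation: unweighted mean, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 48.58
```

Note that the GSM8k score of 0.00 pulls the mean down considerably; the mean of the other five benchmarks is noticeably higher.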