---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
tags:
- axolotl
---
### Model Description
A model that can generate [Honeycomb Queries](https://www.honeycomb.io/blog/introducing-query-assistant).
_fine-tuned by [Hamel Husain](https://hamel.dev)_
## How to Get Started with the Model
Make sure you install all the dependencies:
```bash
pip install transformers==4.36.2 datasets==2.15.0 peft==0.6.0 accelerate==0.24.1 bitsandbytes==0.41.3.post2 safetensors==0.4.1 scipy==1.11.4 sentencepiece==0.1.99 protobuf==4.23.4 --upgrade
```
Next, load the model and tokenizer:
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
model_id='hamel/hc-mistral-qlora-6'
model = AutoPeftModelForCausalLM.from_pretrained(model_id).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
```
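If GPU memory is tight, you can instead load the adapter with its base model quantized to 4-bit via bitsandbytes (already pinned in the install step above). This is an optional sketch, not part of the original recipe:
```python
import torch
from transformers import BitsAndBytesConfig

# Optional alternative to the full-precision `.cuda()` load above:
# quantize the base model to 4-bit to cut GPU memory (assumes a CUDA GPU).
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",  # placement is handled for you; do not call .cuda() here
)
```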
Next, define a helper that constructs the prompt (alpaca style). **It is important that you follow this prompt exactly, because this is how the model was trained.**
```python
def prompt(nlq, cols):
    return f"""[INST] <<SYS>>
Honeycomb AI suggests queries based on user input and candidate columns.
<</SYS>>
User Input: {nlq}
Candidate Columns: {cols}
[/INST]
"""
def prompt_tok(nlq, cols):
    _p = prompt(nlq, cols)
    input_ids = tokenizer(_p, return_tensors="pt", truncation=True).input_ids.cuda()
    # Greedy decoding; max_new_tokens is generous so the query is never cut off.
    out_ids = model.generate(input_ids=input_ids, max_new_tokens=5000,
                             do_sample=False)
    # Decode and strip the prompt so only the generated query is returned.
    return tokenizer.batch_decode(out_ids.detach().cpu().numpy(),
                                  skip_special_tokens=True)[0][len(_p):]
```
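To sanity-check the format, you can print a rendered prompt. The output below follows directly from the f-string above; the query and two-column list here are made-up examples:
```python
# Verify the exact prompt format the model was trained on.
print(prompt("Slow requests by endpoint", ['duration_ms', 'http.route']))
# [INST] <<SYS>>
# Honeycomb AI suggests queries based on user input and candidate columns.
# <</SYS>>
# User Input: Slow requests by endpoint
# Candidate Columns: ['duration_ms', 'http.route']
# [/INST]
```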
Next, make predictions:
```python
nlq = "Exception count by exception and caller"
cols = ['error', 'exception.message', 'exception.type', 'exception.stacktrace', 'SampleRate', 'name', 'db.user', 'type', 'duration_ms', 'db.name', 'service.name', 'http.method', 'db.system', 'status_code', 'db.operation', 'library.name', 'process.pid', 'net.transport', 'messaging.system', 'rpc.system', 'http.target', 'db.statement', 'library.version', 'status_message', 'parent_name', 'aws.region', 'process.command', 'rpc.method', 'span.kind', 'serializer.name', 'net.peer.name', 'rpc.service', 'http.scheme', 'process.runtime.name', 'serializer.format', 'serializer.renderer', 'net.peer.port', 'process.runtime.version', 'http.status_code', 'telemetry.sdk.language', 'trace.parent_id', 'process.runtime.description', 'span.num_events', 'messaging.destination', 'net.peer.ip', 'trace.trace_id', 'telemetry.instrumentation_library', 'trace.span_id', 'span.num_links', 'meta.signal_type', 'http.route']
out = prompt_tok(nlq, cols)
print(out)
```
## Training Details
See [this wandb run](https://wandb.ai/hamelsmu/hc-axolotl-mistral/runs/et2e62s4/overview?workspace=user-hamelsmu).
### Training Data
~90k synthetically generated Honeycomb queries. This data is located in [alpaca_synth_queries.jsonl](training_data/alpaca_synth_queries.jsonl).
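If you want to inspect the data, one way (a sketch, assuming the file is standard JSONL with alpaca-style fields) is to load it with the `datasets` library installed above:
```python
from datasets import load_dataset

# Load the synthetic queries from the local JSONL file.
# Field names are an assumption based on the alpaca format; inspect ds[0] to confirm.
ds = load_dataset("json", data_files="training_data/alpaca_synth_queries.jsonl", split="train")
print(len(ds))  # expect on the order of ~90k rows
print(ds[0])    # one alpaca-style record
```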
### Training Procedure
Trained with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl/tree/main); see [this config](configs/axolotl_config.yml).
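For reference, a typical axolotl launch for a config like this looks roughly as follows. This is a sketch of the standard axolotl CLI, not necessarily the exact invocation used for this run:
```bash
# Standard axolotl training launch against the config above (sketch).
accelerate launch -m axolotl.cli.train configs/axolotl_config.yml
```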