---
license: apache-2.0
language:
- en
pipeline_tag: summarization
widget:
- text: What is the peak phase of T-eV?
example_title: Question Answering
tags:
- arxiv
---
# Table of Contents
1. [TL;DR](#tldr)
2. [Model Description](#model-description)
3. [Usage](#usage)
4. [Training Data](#training-data)
5. [Training procedure](#training-procedure)
6. [Citation](#citation)
# TL;DR
This is a [Phi-1_5](https://huggingface.co/microsoft/phi-1_5) model fine-tuned on [camel-ai/physics](https://huggingface.co/datasets/camel-ai/physics). This model is for research purposes only and ***should not be used in production settings***.
## Model Description
- **Model type:** Language model
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Related Models:** [Phi-1_5](https://huggingface.co/microsoft/phi-1_5)
# Usage
The example script below shows how to use the model with `transformers`:
## Using the PyTorch model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

base_model = "ArtifactAI/phi-physics"
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

def generate(prompt):
    # Wrap the question in the instruction template used during fine-tuning.
    inputs = tokenizer(
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request. "
        "If you are adding additional white spaces, stop writing.\n\n"
        f"### Instruction:\n{prompt}\n\n### Response:\n",
        return_tensors="pt",
        return_attention_mask=False,
    )
    # Stream generated tokens to stdout as they are produced.
    streamer = TextStreamer(tokenizer, skip_prompt=True)
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=500)

generate("What are the common techniques used in identifying a new species, and how can scientists accurately categorize it within the existing taxonomy system?")
```
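The instruction template and the `### Response:` delimiter can also be factored into small standalone helpers, so the same formatting is reusable for non-streaming generation where the decoded output still contains the prompt. This is a minimal sketch; `format_prompt` and `extract_response` are illustrative names, not part of the released code:

```python
RESPONSE_TAG = "### Response:\n"

def format_prompt(instruction: str) -> str:
    """Wrap a raw question in the instruction template the model expects."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n{RESPONSE_TAG}"
    )

def extract_response(decoded: str) -> str:
    """Keep only the text after the response tag in a decoded generation."""
    return decoded.split(RESPONSE_TAG, 1)[-1].strip()

# With non-streaming generation you would pass format_prompt(...) to the
# tokenizer, then run extract_response on tokenizer.decode(output_ids).
full = format_prompt("What is Ohm's law?")
print(extract_response(full + "V = IR"))  # -> V = IR
```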
## Training Data
The model was trained on [camel-ai/physics](https://huggingface.co/datasets/camel-ai/physics), a dataset of question/answer pairs.
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
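The settings above correspond to `transformers`' `BitsAndBytesConfig`. As a sketch, the equivalent object could be constructed like this (the variable name is illustrative, and options left at their defaults above are omitted):

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization settings listed above: 4-bit NF4 weights with
# double quantization and float16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```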
### Framework versions
- PEFT 0.6.2
# Citation
```
@misc{phi-physics,
      title={phi-physics},
      author={Matthew Kenney},
      year={2023}
}
```