---
license: apache-2.0
inference: false
---

# SLIM-QA-GEN-PHI-3  


**slim-qa-gen-phi-3** is a specialized function-calling model that generates question-and-answer pairs from a context passage, with output in the form of a Python dictionary, e.g.,  

&nbsp;&nbsp;&nbsp;&nbsp;`{'question': ['What were earnings per share in the most recent quarter?'], 'answer': ['$2.39'] }`  

This model is fine-tuned on top of the phi-3-mini-4k-instruct base model.  

For fast inference, we recommend the quantized 'tool' version, e.g.,  [**'slim-qa-gen-phi-3-tool'**](https://huggingface.co/llmware/slim-qa-gen-phi-3-tool).  


## Prompt format:

`function = "generate"`  
`params = "{'question, answer', 'boolean', or 'multiple choice'}"`  
`prompt = "<human>: " + {text} + "\n" + `  
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;`"<{function}> " + {params} + " </{function}>" + "\n<bot>:"`  
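
As a minimal sketch of the template above, the prompt can be assembled by a small helper. The names `build_prompt` and `VALID_PARAMS` are hypothetical (they are not part of any library), and the exact string mirrors the one built in the Transformers script in this card, including the colon after `<human>`.

```python
# Hypothetical helper, not part of transformers or llmware.
# The three modes are the params values listed in the prompt format above.
VALID_PARAMS = ("question, answer", "boolean", "multiple choice")


def build_prompt(text, params="question, answer", function="generate"):
    """Wrap a context passage in the function-calling prompt template."""
    if params not in VALID_PARAMS:
        raise ValueError(f"params must be one of {VALID_PARAMS}, got {params!r}")
    return f"<human>: {text}\n<{function}> {params} </{function}>\n<bot>:"


if __name__ == "__main__":
    print(build_prompt("Earnings per share were $2.39 in the most recent quarter.", "boolean"))
```

Validating `params` up front is a design choice for this sketch only: it surfaces typos in the mode name before any tokens are generated.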


<details>
<summary>Transformers Script</summary>

    import ast

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("llmware/slim-qa-gen-phi-3")
    tokenizer = AutoTokenizer.from_pretrained("llmware/slim-qa-gen-phi-3")

    function = "generate"
    params = "boolean"

    text = "Tesla stock declined yesterday 8% in premarket trading after a poorly-received event in San Francisco yesterday, in which the company indicated a likely shortfall in revenue."

    prompt = "<human>: " + text + "\n" + f"<{function}> {params} </{function}>\n<bot>:"

    inputs = tokenizer(prompt, return_tensors="pt")
    # number of prompt tokens, used below to strip the prompt from the output
    start_of_input = len(inputs.input_ids[0])

    outputs = model.generate(
        inputs.input_ids.to('cpu'),
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.7,
        max_new_tokens=200
    )

    # decode only the newly generated tokens
    output_only = tokenizer.decode(outputs[0][start_of_input:], skip_special_tokens=True)

    print("output only: ", output_only)

    # [OUTPUT]:  {'llm_response': {'question': ['Did Tesla stock decline more than 5% yesterday?'], 'answer':['yes'] } }

    # here's the fun part: convert the string output to a python dictionary
    try:
        output_dict = ast.literal_eval(output_only)
        print("success - converted to python dictionary automatically")
    except (ValueError, SyntaxError):
        print("fail - could not convert to python dictionary automatically - ", output_only)

</details>
 
<details>
<summary>Using as Function Call in LLMWare</summary>

    from llmware.models import ModelCatalog

    # `text` is the context passage, e.g., the string defined in the Transformers script
    slim_model = ModelCatalog().load_model("llmware/slim-qa-gen-phi-3", sample=True, temperature=0.5)
    response = slim_model.function_call(text, params=["boolean"], function="generate")

    print("llmware - llm_response: ", response)

</details>  
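
Once a response is available, the questions and answers can be paired up for downstream use. The following is a minimal sketch, assuming the response is either the raw generated string or a dictionary shaped like the sample output shown in this card (optionally wrapped in an `llm_response` key). The helper name `extract_qa_pairs` is hypothetical and not part of llmware or transformers.

```python
import ast


def extract_qa_pairs(response):
    """Return a list of (question, answer) tuples, or [] if parsing fails."""
    # Accept either the raw generated string or an already-parsed dictionary.
    if isinstance(response, str):
        try:
            response = ast.literal_eval(response.strip())
        except (ValueError, SyntaxError):
            return []
    if not isinstance(response, dict):
        return []

    # Assumption: the payload may be wrapped in an 'llm_response' key,
    # as in the sample output shown in this card.
    payload = response.get("llm_response", response)
    if not isinstance(payload, dict):
        return []

    questions = payload.get("question", [])
    answers = payload.get("answer", [])
    return list(zip(questions, answers))


if __name__ == "__main__":
    sample = "{'llm_response': {'question': ['Did Tesla stock decline more than 5% yesterday?'], 'answer': ['yes']}}"
    for question, answer in extract_qa_pairs(sample):
        print(question, "->", answer)
```

Returning an empty list on malformed output, rather than raising, lets a caller simply retry generation when sampling produces a string that is not a valid dictionary.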

    
## Model Card Contact

Darren Oberst & llmware team  

[Join us on Discord](https://discord.gg/MhZn5Nc39h)