lewtun (HF staff) committed
Commit
9763e46
1 Parent(s): 3dcc536

Update README.md

Files changed (1)
  1. README.md +82 -6
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- base_model: HuggingFaceH4/starcoder2-15b-ift
  tags:
  - alignment-handbook
  - generated_from_trainer
@@ -7,16 +7,92 @@ datasets:
  - HuggingFaceH4/ultrafeedback_binarized
  - HuggingFaceH4/orca_dpo_pairs
  model-index:
- - name: starcoder2-15b-dpo-v4.0
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # starcoder2-15b-dpo-v4.0

- This model is a fine-tuned version of [HuggingFaceH4/starcoder2-15b-ift](https://huggingface.co/HuggingFaceH4/starcoder2-15b-ift) on the HuggingFaceH4/ultrafeedback_binarized and the HuggingFaceH4/orca_dpo_pairs datasets.

  It achieves the following results on the evaluation set:
  - Loss: 0.4347
  - Rewards/chosen: -0.9461
 
  ---
+ base_model: HuggingFaceH4/starchat2-15b-sft-v0.1
  tags:
  - alignment-handbook
  - generated_from_trainer

  - HuggingFaceH4/ultrafeedback_binarized
  - HuggingFaceH4/orca_dpo_pairs
  model-index:
+ - name: starchat2-15b-v0.1
  results: []
  ---

+ <img src="https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1/resolve/main/model_logo.png" alt="StarChat2 15B Logo" width="800" style="margin-left: auto; margin-right: auto; display: block;"/>
+
+ # Model Card for StarChat2 15B
+
+ StarChat is a series of language models trained to act as helpful coding assistants. StarChat2 is the latest model in the series: a fine-tuned version of [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b) that was trained with SFT and DPO on a mix of synthetic datasets.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Model type:** A 16B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
+ - **Language(s) (NLP):** Primarily English and 80+ programming languages.
+ - **License:** BigCode Open RAIL-M v1
+ - **Finetuned from model:** [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b)
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://github.com/huggingface/alignment-handbook
+ - **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
+
+ ## Intended uses & limitations
+
+ The model was fine-tuned on a blend of chat, code, math, and reasoning datasets. As a result, it can be used as a general chat model, and you can test its coding capabilities in our [demo](https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground).
+
+ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
+
+ ```python
+ # pip install 'transformers @ git+https://github.com/huggingface/transformers.git@831bc25d8fdb85768402f772cf65cc3d7872b211'
+ # pip install accelerate
+
+ import torch
+ from transformers import pipeline
+
+ # Load the model in bfloat16 and shard it across the available devices
+ pipe = pipeline(
+     "text-generation",
+     model="HuggingFaceH4/starchat2-15b-v0.1",
+     device_map="auto",
+     torch_dtype=torch.bfloat16,
+ )
+ messages = [
+     {
+         "role": "system",
+         "content": "You are StarChat2, an expert programming assistant",
+     },
+     {"role": "user", "content": "Write a simple website in HTML. When a user clicks the button, it shows a random Chuck Norris joke."},
+ ]
+ # Sample a response, stopping at the end-of-turn token
+ outputs = pipe(
+     messages,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.7,
+     top_k=50,
+     top_p=0.95,
+     stop_sequence="<|im_end|>",
+ )
+ # The pipeline returns the whole conversation; the last message is the reply
+ print(outputs[0]["generated_text"][-1]["content"])
+ ```
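+
+ If you want finer control, here is an equivalent sketch using `AutoModelForCausalLM` and the tokenizer's chat template directly. It assumes the checkpoint ships a ChatML-style template (the `<|im_end|>` stop token above comes from that template), so treat it as illustrative rather than an official recipe:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ checkpoint = "HuggingFaceH4/starchat2-15b-v0.1"
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+ model = AutoModelForCausalLM.from_pretrained(
+     checkpoint,
+     device_map="auto",
+     torch_dtype=torch.bfloat16,
+ )
+
+ messages = [
+     {"role": "system", "content": "You are StarChat2, an expert programming assistant"},
+     {"role": "user", "content": "Write a Python function that merges two sorted lists."},
+ ]
+ # Render the conversation with the model's chat template and append the
+ # assistant prefix so generation starts at the reply
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ output_ids = model.generate(
+     input_ids,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.95,
+ )
+ # Decode only the newly generated tokens
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```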
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ StarChat2 15B has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
+ Models trained primarily on code data will also have a more skewed demographic bias commensurate with the demographics of the GitHub community; for more on this, see the [StarCoder2 dataset](https://huggingface.co/datasets/bigcode/the-stack-v2).
+
+ Since the base model was pretrained on a large corpus of code, it may produce code snippets that are syntactically valid but semantically incorrect.
+ For example, it may produce code that does not compile or that produces incorrect results.
+ It may also produce code that is vulnerable to security exploits.
+ We have also observed that the model tends to produce false URLs, which should be carefully inspected before clicking.
+
+ StarChat2 15B was fine-tuned from the base model [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b); please refer to its model card's [Limitations Section](https://huggingface.co/bigcode/starcoder2-15b#limitations) for relevant information.
+ In particular, the model was evaluated on some categories of gender biases, propensity for toxicity, and risk of suggesting code completions with known security flaws; these evaluations are reported in its [technical report](https://huggingface.co/papers/2402.19173).
+
+ ## Training details
+
+ This model is a fine-tuned version of [starchat2-15b-sft-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-sft-v0.1) on the HuggingFaceH4/ultrafeedback_binarized and the HuggingFaceH4/orca_dpo_pairs datasets; the DPO objective used in this final stage is sketched below.
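+
+ For intuition, DPO trains the policy to prefer the chosen completion over the rejected one in each preference pair, relative to a frozen reference model (here, the SFT checkpoint). A minimal sketch of the standard objective, assuming sequence-level log-probabilities and an illustrative `beta` (the exact training code and hyperparameters live in the alignment-handbook repo):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def dpo_loss(
+     policy_chosen_logps: torch.Tensor,
+     policy_rejected_logps: torch.Tensor,
+     ref_chosen_logps: torch.Tensor,
+     ref_rejected_logps: torch.Tensor,
+     beta: float = 0.1,  # illustrative value, not the exact hyperparameter used here
+ ) -> torch.Tensor:
+     """Direct Preference Optimization loss over a batch of preference pairs."""
+     # Log-ratios of policy vs. reference for chosen and rejected completions
+     chosen_logratios = policy_chosen_logps - ref_chosen_logps
+     rejected_logratios = policy_rejected_logps - ref_rejected_logps
+     # Maximize the margin between chosen and rejected, scaled by beta
+     return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
+ ```
+
+ In this formulation, the "Rewards/chosen" metric reported below corresponds to the beta-scaled policy/reference log-ratio for the chosen completions.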
  It achieves the following results on the evaluation set:
  - Loss: 0.4347
  - Rewards/chosen: -0.9461