stojchet/cae853d7940781dc7e9d9554f584df5f

PEFT

Safetensors

trl

sft

Generated from Trainer

stojchet committed on Jul 4

Commit 28f0dfa • 1 Parent(s): 243d73d

End of training

Files changed (1)

README.md +296 -196

README.md CHANGED

@@ -1,199 +1,299 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+datasets:
+- generator
+library_name: peft
+license: other
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: cae853d7940781dc7e9d9554f584df5f
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/stojchets/huggingface/runs/cae853d7940781dc7e9d9554f584df5f)
+# cae853d7940781dc7e9d9554f584df5f
+This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on the generator dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.1733
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1.41e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 3
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.2662        | 0.0128 | 1    | 1.2383          |
+| 1.2249        | 0.0256 | 2    | 1.2367          |
+| 1.2567        | 0.0384 | 3    | 1.2352          |
+| 1.155         | 0.0512 | 4    | 1.2338          |
+| 1.2201        | 0.064  | 5    | 1.2324          |
+| 1.2077        | 0.0768 | 6    | 1.2310          |
+| 1.2095        | 0.0896 | 7    | 1.2296          |
+| 1.2579        | 0.1024 | 8    | 1.2283          |
+| 1.2189        | 0.1152 | 9    | 1.2271          |
+| 1.2382        | 0.128  | 10   | 1.2258          |
+| 1.2605        | 0.1408 | 11   | 1.2248          |
+| 1.1883        | 0.1536 | 12   | 1.2239          |
+| 1.1614        | 0.1664 | 13   | 1.2230          |
+| 1.2769        | 0.1792 | 14   | 1.2220          |
+| 1.2099        | 0.192  | 15   | 1.2211          |
+| 1.2414        | 0.2048 | 16   | 1.2201          |
+| 1.2291        | 0.2176 | 17   | 1.2192          |
+| 1.2381        | 0.2304 | 18   | 1.2184          |
+| 1.1776        | 0.2432 | 19   | 1.2175          |
+| 1.1788        | 0.256  | 20   | 1.2167          |
+| 1.2061        | 0.2688 | 21   | 1.2159          |
+| 1.1856        | 0.2816 | 22   | 1.2150          |
+| 1.2252        | 0.2944 | 23   | 1.2142          |
+| 1.2646        | 0.3072 | 24   | 1.2134          |
+| 1.1888        | 0.32   | 25   | 1.2126          |
+| 1.228         | 0.3328 | 26   | 1.2118          |
+| 1.1969        | 0.3456 | 27   | 1.2111          |
+| 1.1779        | 0.3584 | 28   | 1.2103          |
+| 1.1726        | 0.3712 | 29   | 1.2096          |
+| 1.1582        | 0.384  | 30   | 1.2089          |
+| 1.1643        | 0.3968 | 31   | 1.2083          |
+| 1.1878        | 0.4096 | 32   | 1.2076          |
+| 1.2315        | 0.4224 | 33   | 1.2070          |
+| 1.2022        | 0.4352 | 34   | 1.2063          |
+| 1.1669        | 0.448  | 35   | 1.2057          |
+| 1.1609        | 0.4608 | 36   | 1.2051          |
+| 1.1888        | 0.4736 | 37   | 1.2045          |
+| 1.2044        | 0.4864 | 38   | 1.2039          |
+| 1.2389        | 0.4992 | 39   | 1.2033          |
+| 1.1755        | 0.512  | 40   | 1.2027          |
+| 1.1997        | 0.5248 | 41   | 1.2021          |
+| 1.1997        | 0.5376 | 42   | 1.2015          |
+| 1.1511        | 0.5504 | 43   | 1.2009          |
+| 1.1689        | 0.5632 | 44   | 1.2004          |
+| 1.1654        | 0.576  | 45   | 1.1998          |
+| 1.2018        | 0.5888 | 46   | 1.1993          |
+| 1.1503        | 0.6016 | 47   | 1.1988          |
+| 1.1835        | 0.6144 | 48   | 1.1983          |
+| 1.1831        | 0.6272 | 49   | 1.1977          |
+| 1.1629        | 0.64   | 50   | 1.1972          |
+| 1.2002        | 0.6528 | 51   | 1.1967          |
+| 1.1467        | 0.6656 | 52   | 1.1963          |
+| 1.193         | 0.6784 | 53   | 1.1959          |
+| 1.1652        | 0.6912 | 54   | 1.1955          |
+| 1.1446        | 0.704  | 55   | 1.1950          |
+| 1.1657        | 0.7168 | 56   | 1.1946          |
+| 1.1865        | 0.7296 | 57   | 1.1941          |
+| 1.1803        | 0.7424 | 58   | 1.1936          |
+| 1.1562        | 0.7552 | 59   | 1.1931          |
+| 1.1881        | 0.768  | 60   | 1.1926          |
+| 1.2279        | 0.7808 | 61   | 1.1921          |
+| 1.2158        | 0.7936 | 62   | 1.1915          |
+| 1.1586        | 0.8064 | 63   | 1.1910          |
+| 1.2019        | 0.8192 | 64   | 1.1906          |
+| 1.155         | 0.832  | 65   | 1.1901          |
+| 1.1142        | 0.8448 | 66   | 1.1897          |
+| 1.2389        | 0.8576 | 67   | 1.1894          |
+| 1.1259        | 0.8704 | 68   | 1.1889          |
+| 1.1568        | 0.8832 | 69   | 1.1886          |
+| 1.1306        | 0.896  | 70   | 1.1882          |
+| 1.1814        | 0.9088 | 71   | 1.1877          |
+| 1.2137        | 0.9216 | 72   | 1.1873          |
+| 1.1884        | 0.9344 | 73   | 1.1868          |
+| 1.1446        | 0.9472 | 74   | 1.1863          |
+| 1.1979        | 0.96   | 75   | 1.1858          |
+| 1.2137        | 0.9728 | 76   | 1.1854          |
+| 1.1541        | 0.9856 | 77   | 1.1851          |
+| 1.1775        | 0.9984 | 78   | 1.1847          |
+| 1.1489        | 1.0112 | 79   | 1.1844          |
+| 1.131         | 1.024  | 80   | 1.1841          |
+| 1.1427        | 1.0368 | 81   | 1.1837          |
+| 1.2006        | 1.0496 | 82   | 1.1833          |
+| 1.1473        | 1.0624 | 83   | 1.1830          |
+| 1.1315        | 1.0752 | 84   | 1.1826          |
+| 1.1497        | 1.088  | 85   | 1.1823          |
+| 1.1845        | 1.1008 | 86   | 1.1820          |
+| 1.1845        | 1.1136 | 87   | 1.1817          |
+| 1.1167        | 1.1264 | 88   | 1.1814          |
+| 1.1639        | 1.1392 | 89   | 1.1811          |
+| 1.1952        | 1.152  | 90   | 1.1808          |
+| 1.1327        | 1.1648 | 91   | 1.1805          |
+| 1.0937        | 1.1776 | 92   | 1.1802          |
+| 1.1549        | 1.1904 | 93   | 1.1799          |
+| 1.1704        | 1.2032 | 94   | 1.1797          |
+| 1.1479        | 1.216  | 95   | 1.1794          |
+| 1.2221        | 1.2288 | 96   | 1.1792          |
+| 1.1193        | 1.2416 | 97   | 1.1789          |
+| 1.1259        | 1.2544 | 98   | 1.1786          |
+| 1.1816        | 1.2672 | 99   | 1.1784          |
+| 1.1566        | 1.28   | 100  | 1.1782          |
+| 1.1093        | 1.2928 | 101  | 1.1780          |
+| 1.1985        | 1.3056 | 102  | 1.1779          |
+| 1.1553        | 1.3184 | 103  | 1.1778          |
+| 1.1772        | 1.3312 | 104  | 1.1776          |
+| 1.1154        | 1.3440 | 105  | 1.1775          |
+| 1.1666        | 1.3568 | 106  | 1.1774          |
+| 1.1494        | 1.3696 | 107  | 1.1772          |
+| 1.1508        | 1.3824 | 108  | 1.1771          |
+| 1.201         | 1.3952 | 109  | 1.1770          |
+| 1.1919        | 1.408  | 110  | 1.1769          |
+| 1.1885        | 1.4208 | 111  | 1.1768          |
+| 1.2055        | 1.4336 | 112  | 1.1767          |
+| 1.1522        | 1.4464 | 113  | 1.1766          |
+| 1.1565        | 1.4592 | 114  | 1.1765          |
+| 1.1551        | 1.472  | 115  | 1.1764          |
+| 1.17          | 1.4848 | 116  | 1.1763          |
+| 1.1631        | 1.4976 | 117  | 1.1762          |
+| 1.1396        | 1.5104 | 118  | 1.1761          |
+| 1.1355        | 1.5232 | 119  | 1.1760          |
+| 1.1606        | 1.536  | 120  | 1.1760          |
+| 1.1594        | 1.5488 | 121  | 1.1759          |
+| 1.1783        | 1.5616 | 122  | 1.1758          |
+| 1.1592        | 1.5744 | 123  | 1.1758          |
+| 1.1159        | 1.5872 | 124  | 1.1757          |
+| 1.1807        | 1.6    | 125  | 1.1756          |
+| 1.2294        | 1.6128 | 126  | 1.1756          |
+| 1.1922        | 1.6256 | 127  | 1.1755          |
+| 1.1532        | 1.6384 | 128  | 1.1755          |
+| 1.1956        | 1.6512 | 129  | 1.1754          |
+| 1.1954        | 1.6640 | 130  | 1.1754          |
+| 1.1479        | 1.6768 | 131  | 1.1753          |
+| 1.1398        | 1.6896 | 132  | 1.1753          |
+| 1.1724        | 1.7024 | 133  | 1.1752          |
+| 1.1397        | 1.7152 | 134  | 1.1752          |
+| 1.2162        | 1.728  | 135  | 1.1751          |
+| 1.1854        | 1.7408 | 136  | 1.1751          |
+| 1.1411        | 1.7536 | 137  | 1.1751          |
+| 1.0747        | 1.7664 | 138  | 1.1750          |
+| 1.1727        | 1.7792 | 139  | 1.1750          |
+| 1.1701        | 1.792  | 140  | 1.1750          |
+| 1.1688        | 1.8048 | 141  | 1.1750          |
+| 1.1545        | 1.8176 | 142  | 1.1750          |
+| 1.1512        | 1.8304 | 143  | 1.1749          |
+| 1.203         | 1.8432 | 144  | 1.1749          |
+| 1.1665        | 1.8560 | 145  | 1.1749          |
+| 1.186         | 1.8688 | 146  | 1.1748          |
+| 1.1283        | 1.8816 | 147  | 1.1748          |
+| 1.1555        | 1.8944 | 148  | 1.1748          |
+| 1.1243        | 1.9072 | 149  | 1.1748          |
+| 1.1767        | 1.92   | 150  | 1.1747          |
+| 1.1505        | 1.9328 | 151  | 1.1747          |
+| 1.1012        | 1.9456 | 152  | 1.1747          |
+| 1.2098        | 1.9584 | 153  | 1.1747          |
+| 1.1476        | 1.9712 | 154  | 1.1746          |
+| 1.2055        | 1.984  | 155  | 1.1746          |
+| 1.1539        | 1.9968 | 156  | 1.1746          |
+| 1.176         | 2.0096 | 157  | 1.1745          |
+| 1.1357        | 2.0224 | 158  | 1.1745          |
+| 1.1943        | 2.0352 | 159  | 1.1745          |
+| 1.1447        | 2.048  | 160  | 1.1744          |
+| 1.123         | 2.0608 | 161  | 1.1744          |
+| 1.1638        | 2.0736 | 162  | 1.1744          |
+| 1.1551        | 2.0864 | 163  | 1.1744          |
+| 1.1409        | 2.0992 | 164  | 1.1743          |
+| 1.1071        | 2.112  | 165  | 1.1743          |
+| 1.1705        | 2.1248 | 166  | 1.1743          |
+| 1.2038        | 2.1376 | 167  | 1.1742          |
+| 1.1734        | 2.1504 | 168  | 1.1742          |
+| 1.1538        | 2.1632 | 169  | 1.1742          |
+| 1.179         | 2.176  | 170  | 1.1742          |
+| 1.1614        | 2.1888 | 171  | 1.1741          |
+| 1.1397        | 2.2016 | 172  | 1.1741          |
+| 1.1569        | 2.2144 | 173  | 1.1741          |
+| 1.1379        | 2.2272 | 174  | 1.1740          |
+| 1.1304        | 2.24   | 175  | 1.1740          |
+| 1.1855        | 2.2528 | 176  | 1.1740          |
+| 1.1763        | 2.2656 | 177  | 1.1740          |
+| 1.1194        | 2.2784 | 178  | 1.1739          |
+| 1.0971        | 2.2912 | 179  | 1.1739          |
+| 1.1566        | 2.304  | 180  | 1.1739          |
+| 1.1421        | 2.3168 | 181  | 1.1739          |
+| 1.1645        | 2.3296 | 182  | 1.1738          |
+| 1.1782        | 2.3424 | 183  | 1.1738          |
+| 1.1514        | 2.3552 | 184  | 1.1738          |
+| 1.175         | 2.368  | 185  | 1.1738          |
+| 1.1279        | 2.3808 | 186  | 1.1738          |
+| 1.1158        | 2.3936 | 187  | 1.1738          |
+| 1.202         | 2.4064 | 188  | 1.1737          |
+| 1.164         | 2.4192 | 189  | 1.1737          |
+| 1.1431        | 2.432  | 190  | 1.1737          |
+| 1.1271        | 2.4448 | 191  | 1.1737          |
+| 1.1746        | 2.4576 | 192  | 1.1736          |
+| 1.1126        | 2.4704 | 193  | 1.1736          |
+| 1.1652        | 2.4832 | 194  | 1.1736          |
+| 1.1692        | 2.496  | 195  | 1.1736          |
+| 1.1764        | 2.5088 | 196  | 1.1736          |
+| 1.1905        | 2.5216 | 197  | 1.1736          |
+| 1.1679        | 2.5344 | 198  | 1.1735          |
+| 1.1324        | 2.5472 | 199  | 1.1735          |
+| 1.124         | 2.56   | 200  | 1.1735          |
+| 1.1296        | 2.5728 | 201  | 1.1735          |
+| 1.1498        | 2.5856 | 202  | 1.1735          |
+| 1.1845        | 2.5984 | 203  | 1.1735          |
+| 1.0965        | 2.6112 | 204  | 1.1735          |
+| 1.1511        | 2.624  | 205  | 1.1735          |
+| 1.1703        | 2.6368 | 206  | 1.1734          |
+| 1.1948        | 2.6496 | 207  | 1.1734          |
+| 1.1688        | 2.6624 | 208  | 1.1734          |
+| 1.1528        | 2.6752 | 209  | 1.1734          |
+| 1.1261        | 2.6880 | 210  | 1.1734          |
+| 1.1662        | 2.7008 | 211  | 1.1734          |
+| 1.1596        | 2.7136 | 212  | 1.1734          |
+| 1.1474        | 2.7264 | 213  | 1.1734          |
+| 1.1813        | 2.7392 | 214  | 1.1734          |
+| 1.1624        | 2.752  | 215  | 1.1734          |
+| 1.1604        | 2.7648 | 216  | 1.1734          |
+| 1.1596        | 2.7776 | 217  | 1.1734          |
+| 1.2008        | 2.7904 | 218  | 1.1734          |
+| 1.1813        | 2.8032 | 219  | 1.1734          |
+| 1.2147        | 2.816  | 220  | 1.1734          |
+| 1.1821        | 2.8288 | 221  | 1.1734          |
+| 1.1476        | 2.8416 | 222  | 1.1734          |
+| 1.1416        | 2.8544 | 223  | 1.1734          |
+| 1.1228        | 2.8672 | 224  | 1.1733          |
+| 1.1908        | 2.88   | 225  | 1.1733          |
+| 1.1666        | 2.8928 | 226  | 1.1733          |
+| 1.0962        | 2.9056 | 227  | 1.1733          |
+| 1.1721        | 2.9184 | 228  | 1.1733          |
+| 1.1158        | 2.9312 | 229  | 1.1733          |
+| 1.1282        | 2.944  | 230  | 1.1733          |
+| 1.1401        | 2.9568 | 231  | 1.1733          |
+| 1.1897        | 2.9696 | 232  | 1.1733          |
+| 1.1395        | 2.9824 | 233  | 1.1733          |
+| 1.141         | 2.9952 | 234  | 1.1733          |
+### Framework versions
+- PEFT 0.10.0
+- Transformers 4.43.0.dev0
+- Pytorch 2.2.2+cu121
+- Datasets 2.19.2
+- Tokenizers 0.19.1
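
The hyperparameters and the step/epoch columns in the added model card can be cross-checked against each other. The following is a minimal sketch of that arithmetic, not the Trainer's own code. It assumes a single training device and zero warmup steps, neither of which is stated in the card, and the estimated dataset size is only inferred from the epoch value logged at step 1. All function and constant names are hypothetical.

```python
import math

# Values copied from the "Training hyperparameters" list in the card.
LEARNING_RATE = 1.41e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
NUM_EPOCHS = 3

# Assumptions, NOT stated in the card: one device, no warmup steps.
NUM_DEVICES = 1
WARMUP_STEPS = 0

# Epoch value logged at step 1 in the "Training results" table.
EPOCH_AT_STEP_1 = 0.0128


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def estimated_train_examples() -> int:
    """Rough training-set size implied by the epoch fraction of one step."""
    return round(effective_batch_size() / EPOCH_AT_STEP_1)


def optimizer_steps_per_epoch(num_examples: int) -> int:
    """Micro-batches per epoch, grouped into accumulation windows."""
    micro_batches = math.ceil(num_examples / (TRAIN_BATCH_SIZE * NUM_DEVICES))
    return max(micro_batches // GRADIENT_ACCUMULATION_STEPS, 1)


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate under a linear decay to zero (with optional warmup)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


if __name__ == "__main__":
    examples = estimated_train_examples()
    per_epoch = optimizer_steps_per_epoch(examples)
    total_steps = per_epoch * NUM_EPOCHS
    print("effective batch size:", effective_batch_size())
    print("estimated training examples:", examples)
    print("optimizer steps per epoch:", per_epoch)
    print("total optimizer steps:", total_steps)
    print("lr at midpoint:", linear_lr(total_steps // 2, total_steps))
```

Under these assumptions the sketch gives an effective batch size of 128 (matching `total_train_batch_size`), 78 optimizer steps per epoch, and 234 steps in total, which agrees with the final Step value in the results table.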