Leon-Leee committed on
Commit
e88bb02
1 Parent(s): 517a948

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -1,5 +1,4 @@
1
  ---
2
- license name: deepseek
3
  library_name: transformers
4
  tags:
5
  - code
@@ -12,11 +11,13 @@ datasets:
12
  metrics:
13
  - code_eval
14
  pipeline_tag: text-generation
 
 
15
  ---
16
  ## AIGCodeGeek-DS-6.7B
17
 
18
  ### Introduction
19
- AIGCodeGeek-DS-6.7B is the first released version of our Code-LLM family with competitive performance on public and private benchmarks.
20
 
21
  ### Model Details
22
  #### Model Description
@@ -26,7 +27,7 @@ AIGCodeGeek-DS-6.7B is the first released version of our Code-LLM family with co
26
 
27
  ### Training data
28
  A mixture of samples from high-quality open-source (read *Acknowledgements*) and our private datasets.
29
- We have made contamination detection as Magicoder/Bigcode did.
30
 
31
  ### Evaluation
32
  results to be added.
@@ -48,11 +49,12 @@ attrdict
48
  ### QuickStart
49
 
50
  ```python
 
51
  from transformers import AutoTokenizer, AutoModelForCausalLM
52
  tokenizer = AutoTokenizer.from_pretrained("aigcode/AIGCodeGeek-DS-6.7B", trust_remote_code=True)
53
  model = AutoModelForCausalLM.from_pretrained("aigcode/AIGCodeGeek-DS-6.7B", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
54
  messages=[
55
- { 'role': 'user', 'content': "write a quick sort algorithm in python."}
56
  ]
57
  inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
58
  # tokenizer.eos_token_id is the id of <|EOT|> token
@@ -60,8 +62,6 @@ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50,
60
  print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
61
  ```
62
 
63
- ### Limits
64
-
65
 
66
  ### Acknowledgements
67
  We gain a lot of knowledge and resources from the open-source community:
 
1
  ---
 
2
  library_name: transformers
3
  tags:
4
  - code
 
11
  metrics:
12
  - code_eval
13
  pipeline_tag: text-generation
14
+ license: other
15
+ license name: deepseek
16
  ---
17
  ## AIGCodeGeek-DS-6.7B
18
 
19
  ### Introduction
20
+ AIGCodeGeek-DS-6.7B is the first release in our Code-LLM family, with competitive performance on public and private benchmarks.
21
 
22
  ### Model Details
23
  #### Model Description
 
27
 
28
  ### Training data
29
  A mixture of samples from high-quality open-source (read *Acknowledgements*) and our private datasets.
30
+ We performed contamination detection following the approach used by Magicoder/BigCode (https://github.com/ise-uiuc/magicoder/blob/main/src/magicoder/decontamination/find_substrings.py).
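For readers unfamiliar with substring-based contamination detection, the idea can be sketched roughly as follows. This is a minimal illustration, not the Magicoder script linked above and not necessarily the exact procedure used for this model: the helper names (`normalize`, `is_contaminated`, `decontaminate`), the whitespace/lowercase normalization, and the `min_chars` threshold are all assumptions made for the example.

```python
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting changes do not hide a match."""
    return " ".join(text.split()).lower()


def is_contaminated(sample: str, benchmark_snippets: Iterable[str], min_chars: int = 20) -> bool:
    """Return True if any sufficiently long benchmark snippet occurs inside the sample."""
    haystack = normalize(sample)
    for snippet in benchmark_snippets:
        needle = normalize(snippet)
        # Skip very short snippets: they would match almost any sample.
        if len(needle) >= min_chars and needle in haystack:
            return True
    return False


def decontaminate(samples: List[str], benchmark_snippets: List[str]) -> List[str]:
    """Keep only the training samples that contain no benchmark snippet."""
    return [s for s in samples if not is_contaminated(s, benchmark_snippets)]


if __name__ == "__main__":
    # Hypothetical benchmark prompt and training samples, for illustration only.
    benchmark = ["def has_close_elements(numbers, threshold):"]
    train = [
        "Write a function.\ndef has_close_elements(numbers,   threshold):\n    pass",
        "def add(a, b):\n    return a + b",
    ]
    kept = decontaminate(train, benchmark)
    print(f"kept {len(kept)} of {len(train)} samples")
```

In practice the benchmark snippets would come from the evaluation sets being protected (for example, prompts and reference solutions), and flagged samples would be dropped from the training mixture before fine-tuning.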
31
 
32
  ### Evaluation
33
  results to be added.
 
49
  ### QuickStart
50
 
51
  ```python
52
+ import torch
53
  from transformers import AutoTokenizer, AutoModelForCausalLM
54
  tokenizer = AutoTokenizer.from_pretrained("aigcode/AIGCodeGeek-DS-6.7B", trust_remote_code=True)
55
  model = AutoModelForCausalLM.from_pretrained("aigcode/AIGCodeGeek-DS-6.7B", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
56
  messages=[
57
+ { 'role': 'user', 'content': "write a merge sort algorithm in python."}
58
  ]
59
  inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
60
  # tokenizer.eos_token_id is the id of <|EOT|> token
 
62
  print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
63
  ```
64
 
 
 
65
 
66
  ### Acknowledgements
67
  We gain a lot of knowledge and resources from the open-source community: