CHANGELOG.md ADDED
@@ -0,0 +1,3 @@
1
+ ## v1.0
2
+
3
+ - Add accelerated ChatGLM-6B model (from: https://huggingface.co/THUDM/chatglm-6b)
CHANGES.rst DELETED
@@ -1,10 +0,0 @@
1
- Changelog (lyraChatGLM)
2
-
3
- ## 2.0
4
- - rebuild the whole system using a modified FasterTransformer
5
- - add dynamic library & models for Volta architecture.
6
- - further acceleration, remove token generation limits.
7
-
8
- ## 1.0
9
-
10
- - add lyraChatGLM model, from original weights
Dockerfile ADDED
@@ -0,0 +1,11 @@
1
+ FROM nvcr.io/nvidia/pytorch:23.02-py3
2
+
3
+ WORKDIR /workdir
4
+
5
+ COPY requirements.txt /workdir/
6
+
7
+ # installing icetk pulls in protobuf 3.18.3, but we need protobuf==3.20.3, so reinstall it afterwards
8
+ RUN pip install -r requirements.txt && \
9
+ pip install protobuf==3.20.3
10
+
11
+
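The Dockerfile reinstalls protobuf after `requirements.txt` because, per its comment, icetk pulls in protobuf 3.18.3. Below is a minimal sketch for confirming that the pin actually holds inside the built image. The helper names `parse_pin` and `check_pin` are hypothetical, and the sketch relies only on the standard-library `importlib.metadata` (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_pin(requirement: str) -> Tuple[str, str]:
    """Split an exact pin such as 'protobuf==3.20.3' into (name, version)."""
    name, sep, pinned = requirement.partition("==")
    if not sep or not name.strip() or not pinned.strip():
        raise ValueError(f"not an exact '==' pin: {requirement!r}")
    return name.strip(), pinned.strip()


def check_pin(requirement: str) -> bool:
    """Return True only if the installed distribution matches the exact pin."""
    name, pinned = parse_pin(requirement)
    try:
        installed = version(name)
    except PackageNotFoundError:
        return False  # the package is not installed at all
    return installed == pinned


if __name__ == "__main__":
    requirement = "protobuf==3.20.3"
    status = "OK" if check_pin(requirement) else "mismatch or missing"
    print(f"{requirement}: {status}")
```

Run inside the container after the build. A "mismatch" result would mean a later install step overrode the pin.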
LISENCE DELETED
@@ -1,420 +0,0 @@
1
- MIT License
2
-
3
- Copyright (c) 2023 Tencent Music Entertainment
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
22
-
23
-
24
- Other dependencies and licenses:
25
-
26
- Open Source Software Licensed under The ChatGLM-6B License and the Apache License Version 2.0 :
27
- --------------------------------------------------------------------
28
- 1. chatglm-6b
29
-
30
- File: https://github.com/THUDM/ChatGLM-6B
31
- License: The ChatGLM-6B License and Apache License Version 2.0
32
- For details: https://github.com/THUDM/ChatGLM-6B/blob/main/MODEL_LICENSE
33
- https://github.com/THUDM/ChatGLM-6B/blob/main/LICENSE
34
-
35
- APPENDIX: How to apply the Apache License to your work.
36
-
37
- To apply the Apache License to your work, attach the following
38
- boilerplate notice, with the fields enclosed by brackets "[]"
39
- replaced with your own identifying information. (Don't include
40
- the brackets!) The text should be enclosed in the appropriate
41
- comment syntax for the file format. We also recommend that a
42
- file or class name and description of purpose be included on the
43
- same "printed page" as the copyright notice for easier
44
- identification within third-party archives.
45
-
46
- Copyright Zhengxiao Du
47
-
48
- Licensed under the Apache License, Version 2.0 (the "License");
49
- you may not use this file except in compliance with the License.
50
- You may obtain a copy of the License at
51
-
52
- http://www.apache.org/licenses/LICENSE-2.0
53
-
54
- Unless required by applicable law or agreed to in writing, software
55
- distributed under the License is distributed on an "AS IS" BASIS,
56
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
57
- See the License for the specific language governing permissions and
58
- limitations under the License.
59
-
60
- A copy of the Apache License Version 2.0 is included in this file.
61
-
62
-
63
- Terms of The ChatGLM-6B License:
64
- --------------------------------------------------------------------
65
-
66
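- 1. Definitions
67
-
68
- “Licensor” means the ChatGLM-6B model team that distributes its Software.
69
-
70
- “Software” means the ChatGLM-6B model parameters made available under this license.
71
-
72
- 2. License Grant
73
-
74
- Subject to the terms and conditions of this License, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license, solely for your non-commercial research purposes.
75
-
76
- The above copyright notice and this license notice shall be included in all copies or substantial portions of the Software.
77
-
78
- 3. Restriction
79
-
80
- You may not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any commercial, military, or illegal purposes.
81
-
82
- You may not use the Software for any act that endangers national security and national unity, harms the public interest of society, or infringes upon personal rights and interests.
83
-
84
- 4. Disclaimer
85
-
86
- The Software is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the Software or the use or other dealings in the Software.
87
-
88
- 5. Limitation of Liability
89
-
90
- Except to the extent prohibited by applicable law, in no event and under no legal theory, whether based in tort, negligence, contract, liability, or otherwise, will any Licensor be liable to you for any direct, indirect, special, incidental, exemplary, or consequential damages, or any other commercial losses, even if the Licensor has been advised of the possibility of such damages.
91
-
92
- 6. Dispute Resolution
93
-
94
- This license is governed by and construed in accordance with the laws of the People's Republic of China. Any dispute arising from or in connection with this license shall be submitted to the Haidian District People's Court in Beijing.
95
-
96
- Please note that the license may be updated to a more comprehensive version. For any questions about the license and copyright, please contact us at glm-130b@googlegroups.com.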
97
-
98
- 1. Definitions
99
-
100
- “Licensor” means the ChatGLM-6B Model Team that distributes its Software.
101
-
102
- “Software” means the ChatGLM-6B model parameters made available under this license.
103
-
104
- 2. License Grant
105
-
106
- Subject to the terms and conditions of this License, the Licensor hereby grants to you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license to use the Software solely for your non-commercial research purposes.
107
-
108
- The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
109
-
110
- 3. Restriction
111
-
112
- You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any commercial, military, or illegal purposes.
113
-
114
- You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
115
-
116
- 4. Disclaimer
117
-
118
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
119
-
120
- 5. Limitation of Liability
121
-
122
- EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
123
-
124
- 6. Dispute Resolution
125
-
126
- This license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
127
-
128
- Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at glm-130b@googlegroups.com.
129
-
130
-
131
- Open Source Software Licensed under the Apache License Version 2.0:
132
- --------------------------------------------------------------------
133
- 1. huggingface/transformers
134
- Copyright 2018- The Hugging Face team. All rights reserved.
135
-
136
-
137
- Terms of the Apache License Version 2.0:
138
- --------------------------------------------------------------------
139
- Apache License
140
-
141
- Version 2.0, January 2004
142
-
143
- http://www.apache.org/licenses/
144
-
145
- TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
146
- 1. Definitions.
147
-
148
- "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
149
-
150
- "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
151
-
152
- "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
153
-
154
- "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
155
-
156
- "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
157
-
158
- "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
159
-
160
- "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
161
-
162
- "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
163
-
164
- "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
165
-
166
- "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
167
-
168
- 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
169
-
170
- 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
171
-
172
- 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
173
-
174
- You must give any other recipients of the Work or Derivative Works a copy of this License; and
175
-
176
- You must cause any modified files to carry prominent notices stating that You changed the files; and
177
-
178
- You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
179
-
180
- If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
181
-
182
- You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
183
-
184
- 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
185
-
186
- 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
187
-
188
- 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
189
-
190
- 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
191
-
192
- 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
193
-
194
- END OF TERMS AND CONDITIONS
195
-
196
-
197
- Open Source Software Licensed under the Modified BSD License:
198
- --------------------------------------------------------------------
199
- 1. pytorch
200
-
201
- From PyTorch:
202
-
203
- Copyright (c) 2016- Facebook, Inc (Adam Paszke)
204
- Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
205
- Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
206
- Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
207
- Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
208
- Copyright (c) 2011-2013 NYU (Clement Farabet)
209
- Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
210
- Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
211
- Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
212
-
213
- From Caffe2:
214
-
215
- Copyright (c) 2016-present, Facebook Inc. All rights reserved.
216
-
217
- All contributions by Facebook:
218
- Copyright (c) 2016 Facebook Inc.
219
-
220
- All contributions by Google:
221
- Copyright (c) 2015 Google Inc.
222
- All rights reserved.
223
-
224
- All contributions by Yangqing Jia:
225
- Copyright (c) 2015 Yangqing Jia
226
- All rights reserved.
227
-
228
- All contributions by Kakao Brain:
229
- Copyright 2019-2020 Kakao Brain
230
-
231
- All contributions by Cruise LLC:
232
- Copyright (c) 2022 Cruise LLC.
233
- All rights reserved.
234
-
235
- All contributions from Caffe:
236
- Copyright(c) 2013, 2014, 2015, the respective contributors
237
- All rights reserved.
238
-
239
- All other contributions:
240
- Copyright(c) 2015, 2016 the respective contributors
241
- All rights reserved.
242
-
243
- Caffe2 uses a copyright model similar to Caffe: each contributor holds
244
- copyright over their contributions to Caffe2. The project versioning records
245
- all such contribution and copyright details. If a contributor wants to further
246
- mark their specific copyright on a particular contribution, they should
247
- indicate their copyright solely in the commit message of the change when it is
248
- committed.
249
-
250
- All rights reserved.
251
-
252
-
253
- Terms of the Modified BSD License:
254
- -------------------------------------------------------------------
255
- This project is licensed under the terms of the Modified BSD License, as follows:
256
-
257
- Redistribution and use in source and binary forms, with or without
258
- modification, are permitted provided that the following conditions are met:
259
-
260
- 1. Redistributions of source code must retain the above copyright
261
- notice, this list of conditions and the following disclaimer.
262
-
263
- 2. Redistributions in binary form must reproduce the above copyright
264
- notice, this list of conditions and the following disclaimer in the
265
- documentation and/or other materials provided with the distribution.
266
-
267
- 3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
268
- and IDIAP Research Institute nor the names of its contributors may be
269
- used to endorse or promote products derived from this software without
270
- specific prior written permission.
271
-
272
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
273
- AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
274
- IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
275
- ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
276
- LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
277
- CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
278
- SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
279
- INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
280
- CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
281
- ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
282
- POSSIBILITY OF SUCH DAMAGE.
283
-
284
-
285
- Open Source Software Licensed under the Python Software Foundation License Version 2:
286
- --------------------------------------------------------------------------
287
- 1. Python/cpython
288
- Copyright © 2001-2023 Python Software Foundation. All rights reserved
289
-
290
-
291
- A. HISTORY OF THE SOFTWARE
292
- ==========================
293
-
294
- Python was created in the early 1990s by Guido van Rossum at Stichting
295
- Mathematisch Centrum (CWI, see https://www.cwi.nl) in the Netherlands
296
- as a successor of a language called ABC. Guido remains Python's
297
- principal author, although it includes many contributions from others.
298
-
299
- In 1995, Guido continued his work on Python at the Corporation for
300
- National Research Initiatives (CNRI, see https://www.cnri.reston.va.us)
301
- in Reston, Virginia where he released several versions of the
302
- software.
303
-
304
- In May 2000, Guido and the Python core development team moved to
305
- BeOpen.com to form the BeOpen PythonLabs team. In October of the same
306
- year, the PythonLabs team moved to Digital Creations, which became
307
- Zope Corporation. In 2001, the Python Software Foundation (PSF, see
308
- https://www.python.org/psf/) was formed, a non-profit organization
309
- created specifically to own Python-related Intellectual Property.
310
- Zope Corporation was a sponsoring member of the PSF.
311
-
312
- All Python releases are Open Source (see https://opensource.org for
313
- the Open Source Definition). Historically, most, but not all, Python
314
- releases have also been GPL-compatible; the table below summarizes
315
- the various releases.
316
-
317
- Release Derived Year Owner GPL-
318
- from compatible? (1)
319
-
320
- 0.9.0 thru 1.2 1991-1995 CWI yes
321
- 1.3 thru 1.5.2 1.2 1995-1999 CNRI yes
322
- 1.6 1.5.2 2000 CNRI no
323
- 2.0 1.6 2000 BeOpen.com no
324
- 1.6.1 1.6 2001 CNRI yes (2)
325
- 2.1 2.0+1.6.1 2001 PSF no
326
- 2.0.1 2.0+1.6.1 2001 PSF yes
327
- 2.1.1 2.1+2.0.1 2001 PSF yes
328
- 2.1.2 2.1.1 2002 PSF yes
329
- 2.1.3 2.1.2 2002 PSF yes
330
- 2.2 and above 2.1.1 2001-now PSF yes
331
-
332
- Footnotes:
333
-
334
- (1) GPL-compatible doesn't mean that we're distributing Python under
335
- the GPL. All Python licenses, unlike the GPL, let you distribute
336
- a modified version without making your changes open source. The
337
- GPL-compatible licenses make it possible to combine Python with
338
- other software that is released under the GPL; the others don't.
339
-
340
- (2) According to Richard Stallman, 1.6.1 is not GPL-compatible,
341
- because its license has a choice of law clause. According to
342
- CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1
343
- is "not incompatible" with the GPL.
344
-
345
- Thanks to the many outside volunteers who have worked under Guido's
346
- direction to make these releases possible.
347
-
348
-
349
- B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
350
- ===============================================================
351
-
352
- Python software and documentation are licensed under the
353
- Python Software Foundation License Version 2.
354
-
355
- Starting with Python 3.8.6, examples, recipes, and other code in
356
- the documentation are dual licensed under the PSF License Version 2
357
- and the Zero-Clause BSD license.
358
-
359
- Some software incorporated into Python is under different licenses.
360
- The licenses are listed with code falling under that license.
361
-
362
-
363
- PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
364
- --------------------------------------------
365
-
366
- 1. This LICENSE AGREEMENT is between the Python Software Foundation
367
- ("PSF"), and the Individual or Organization ("Licensee") accessing and
368
- otherwise using this software ("Python") in source or binary form and
369
- its associated documentation.
370
-
371
- 2. Subject to the terms and conditions of this License Agreement, PSF hereby
372
- grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce,
373
- analyze, test, perform and/or display publicly, prepare derivative works,
374
- distribute, and otherwise use Python alone or in any derivative version,
375
- provided, however, that PSF's License Agreement and PSF's notice of copyright,
376
- i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
377
- 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation;
378
- All Rights Reserved" are retained in Python alone or in any derivative version
379
- prepared by Licensee.
380
-
381
- 3. In the event Licensee prepares a derivative work that is based on
382
- or incorporates Python or any part thereof, and wants to make
383
- the derivative work available to others as provided herein, then
384
- Licensee hereby agrees to include in any such work a brief summary of
385
- the changes made to Python.
386
-
387
- 4. PSF is making Python available to Licensee on an "AS IS"
388
- basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
389
- IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
390
- DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
391
- FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
392
- INFRINGE ANY THIRD PARTY RIGHTS.
393
-
394
- 5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
395
- FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
396
- A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
397
- OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
398
-
399
- 6. This License Agreement will automatically terminate upon a material
400
- breach of its terms and conditions.
401
-
402
- 7. Nothing in this License Agreement shall be deemed to create any
403
- relationship of agency, partnership, or joint venture between PSF and
404
- Licensee. This License Agreement does not grant permission to use PSF
405
- trademarks or trade name in a trademark sense to endorse or promote
406
- products or services of Licensee, or any third party.
407
-
408
- 8. By copying, installing or otherwise using Python, Licensee
409
- agrees to be bound by the terms and conditions of this License
410
- Agreement.
411
-
412
-
413
- Open Source Software:
414
- --------------------------------------------------------------------
415
- 1. icetk
416
- File: https://github.com/THUDM/icetk
417
-
418
-
419
-
420
-
README.md CHANGED
@@ -1,153 +1,120 @@
1
  ---
2
- license: mit
3
- language: en
 
4
  tags:
5
  - LLM
6
- - ChatGLM6B
 
7
  ---
8
- ## Breaking News!
9
 
10
- **We know what you want, and here you go!**
11
 
12
- - Newly released lyraChatGLM model, suitable for both Ampere (A100/A10) and Volta (V100) GPUs
13
- - lyraChatGLM has been further optimized, reaching **9000 tokens/s** on A100 and **3900 tokens/s** on V100, about **5.5x** faster than the latest official version (as of 2023/6/1).
14
- - Memory usage has been optimized too: batch_size can now be set as high as **256** on A100!
15
- - INT8 weight-only PTQ (post-training quantization) is supported
16
 
17
- **Note that the code has been fully updated as well; you need to use the new API (see `Uses` below).**
18
 
19
- If you like our work and are considering joining us, feel free to drop a line to benbinwu@tencent.com.
20
-
21
- P.S. We have recently received many inquiries about accelerating customized models. We currently **have no plans** to release the conversion tool, and we do not think customized models can be used with the current release.
22
- ****
23
- ## Model Card for lyraChatGLM
24
-
25
- lyraChatGLM is currently the **fastest ChatGLM-6B** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.
26
-
27
- lyraChatGLM achieves a **300x** inference speedup over the early original version. We are still working hard to further improve performance.
28
-
29
- Its main features (updated on 2023-06-20):
29
- - weights: the original ChatGLM-6B weights released by THUDM.
30
- - device: NVIDIA GPUs with the Ampere or Volta architecture (A100, A10, V100...).
31
- - batch_size: compiled with dynamic batch size; the maximum depends on the device.
32
- - Both CUDA 11.X and 12.X are now supported.
33
- - Model loading has been further optimized: load time dropped from a few minutes to under 10 s in non-INT8 mode, and to about 1 min in INT8 mode!
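The `arch` and `cuda_version` arguments in the `Uses` example must match the GPU and the CUDA toolkit. Below is a minimal sketch of deriving them, assuming NVIDIA's published compute capabilities for the GPUs named here (V100 = 7.0, A100 = 8.0, A10 = 8.6) and a version string such as the one PyTorch exposes as `torch.version.cuda`. The helper `lyra_build_args` is hypothetical and not part of lyraChatGLM.

```python
from typing import Optional, Tuple

# Compute capabilities of the GPUs named in this README (NVIDIA's published values).
AMPERE = {(8, 0), (8, 6)}  # A100 = 8.0, A10 = 8.6
VOLTA = {(7, 0)}           # V100 = 7.0


def arch_from_capability(major: int, minor: int) -> Optional[str]:
    """Map a compute capability to the `arch` string used in the Uses example."""
    if (major, minor) in AMPERE:
        return "Ampere"
    if (major, minor) in VOLTA:
        return "Volta"
    return None  # e.g. Turing (7.5) or Ada (8.9): not listed as supported


def lyra_build_args(capability: Tuple[int, int], cuda_version: str) -> Tuple[str, int]:
    """Return (arch, cuda_version) for LyraChatGLM6B, or raise if unsupported."""
    arch = arch_from_capability(*capability)
    if arch is None:
        raise ValueError(f"unsupported compute capability {capability}")
    cuda_major = int(cuda_version.split(".")[0])
    if cuda_major not in (11, 12):
        raise ValueError(f"unsupported CUDA version {cuda_version!r}")
    return arch, cuda_major


if __name__ == "__main__":
    # Example values for an A100 with CUDA 12.0. On a real machine they could come
    # from torch.cuda.get_device_capability() and torch.version.cuda.
    print(lyra_build_args((8, 0), "12.0"))  # ('Ampere', 12)
```

The helper checks exact capabilities rather than only the major version, because major version 8 also covers Ada GPUs (8.9), which this README does not list.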
35
 
36
  ## Speed
37
- - original version (fixed batch infer): commit id 1d240ba
38
-
39
- ### test on A100 40G
40
- 1. Maximum batch size and maximum speed for each version of the model.
41
- |version|max_batch_size|max_speed|
42
- |:-:|:-:|:-:|
43
- |original|1|30 tokens/s|
44
- |original(fixed batch infer)|192|1638.52 tokens/s|
45
- |lyraChatGLM(current)|256|9082.60 tokens/s|
46
- 2. Speed of each version at the same batch size.
47
- |version|batch_size=1|batch_size=8|batch_size=64|batch_size=128|
48
- |:-:|:-:|:-:|:-:|:-:|
49
- |original|30 tokens/s| - | - | - |
50
- |original(fixed batch infer)|34.48 tokens/s|356.29 tokens/s|1638.52 tokens/s|1338.45 tokens/s|
51
- |lyraChatGLM(current)|110.05 tokens/s|843.60 tokens/s|4926.92 tokens/s|7235.04 tokens/s|
52
-
53
- ### test on V100
54
- 1. Maximum batch size and maximum speed for each version of the model.
55
- |version|max_batch_size|max_speed|
56
- |:-:|:-:|:-:|
57
- |original|1|17.83 tokens/s|
58
- |original(fixed batch infer)|128|992.20 tokens/s|
59
- |lyraChatGLM(current)|192|3958.39 tokens/s|
60
- 2. Speed of each version at the same batch size.
61
- |version|batch_size=1|batch_size=8|batch_size=64|batch_size=128|
62
- |:-:|:-:|:-:|:-:|:-:|
63
- |original|17.83 tokens/s| - | - | - |
64
- |original(fixed batch infer)|17.83 tokens/s|228.95 tokens/s|889.7 tokens/s|922.20 tokens/s|
65
- |lyraChatGLM(current)|59.33 tokens/s|514.15 tokens/s|2849.88 tokens/s|3958.39 tokens/s|
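The headline ratios can be recomputed from the tables above. The sketch below uses only numbers copied from those tables. "Speedup" here simply means lyraChatGLM throughput divided by the fixed-batch original at the same batch size, and per-sequence speed is approximated as total throughput divided by batch size.

```python
# Throughput in tokens/s copied from the speed tables above, keyed by batch size.
A100 = {
    "original_fixed": {1: 34.48, 8: 356.29, 64: 1638.52, 128: 1338.45},
    "lyra": {1: 110.05, 8: 843.60, 64: 4926.92, 128: 7235.04},
}
V100 = {
    "original_fixed": {1: 17.83, 8: 228.95, 64: 889.7, 128: 922.20},
    "lyra": {1: 59.33, 8: 514.15, 64: 2849.88, 128: 3958.39},
}


def speedups(table):
    """lyraChatGLM throughput / original fixed-batch throughput, per batch size."""
    return {bs: table["lyra"][bs] / table["original_fixed"][bs] for bs in table["lyra"]}


def per_sequence(table, bs):
    """Approximate tokens/s seen by each sequence in a batch of size bs."""
    return table["lyra"][bs] / bs


if __name__ == "__main__":
    for name, table in (("A100", A100), ("V100", V100)):
        ratios = ", ".join(f"bs={bs}: {r:.2f}x" for bs, r in speedups(table).items())
        print(f"{name}: {ratios}")
    # Max-speed rows on A100: lyraChatGLM vs. the fixed-batch original ("about 5.5x").
    print(f"A100 peak vs. fixed-batch original: {9082.60 / 1638.52:.2f}x")
    # Max-speed rows on A100: lyraChatGLM vs. the unbatched original ("300x").
    print(f"A100 peak vs. original: {9082.60 / 30:.0f}x")
    print(f"A100 per-sequence speed at bs=128: {per_sequence(A100, 128):.1f} tokens/s")
```

The per-sequence figure shows the trade-off behind the large batch sizes: total throughput grows much faster than the speed any single request sees.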
66
-
- ## Model Sources
-
- - **Repository:** https://huggingface.co/THUDM/chatglm-6b
-
- ## Docker Environment Recommendation
-
- - For CUDA 11.X: we recommend `nvcr.io/nvidia/pytorch:22.12-py3`
- - For CUDA 12.0: we recommend `nvcr.io/nvidia/pytorch:23.02-py3`
-
- ```bash
- docker pull nvcr.io/nvidia/pytorch:23.02-py3
- docker run --rm -it --gpus all -v ${PWD}:/lyraChatGLM nvcr.io/nvidia/pytorch:23.02-py3
-
- # inside the container
- cd /lyraChatGLM
- pip install -r requirements.txt
- python demo.py
- ```

- ## Uses

- ```python
- from lyraChatGLM import LyraChatGLM6B

- model_path = "./models/1-gpu-fp16.bin"
- tokenizer_path = "./models"
- data_type = "fp16"
- int8_mode = 0  # 1 for INT8 weight-only PTQ
- max_output_length = 150
- arch = "Ampere"  # Ampere or Volta
- cuda_version = 12  # 11 or 12

- model = LyraChatGLM6B(model_path, tokenizer_path, data_type, int8_mode, arch, cuda_version)
- # "List 3 different machine learning algorithms and explain where each applies."
- prompt = "列出3个不同的机器学习算法,并说明它们的适用范围."
- # test_batch_size = 256  # e.g. prompts = [prompt] * test_batch_size for batched generation

- prompts = [prompt, ]

- # If you want different outputs within the same batch, set do_sample to True
- output_texts = model.generate(prompts, output_length=max_output_length, top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=False)

- print(output_texts)

- ```
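- ## Demo output
-
- ### input
- List 3 different machine learning algorithms and explain the scope in which each applies.

- ### output
- Here are three common machine learning algorithms and their applicable scopes:

- 1. Decision Tree: a decision tree is a naive Bayes model based on classification and regression problems. It predicts results by building a series of progressively splitting branches. It is suitable for cases with simple features, large amounts of data, and a dataset size within an acceptable range.

- 2. Random Forest: random forest is an ensemble learning algorithm made up of multiple decision trees. Its advantage is that it can handle large-scale data and high-dimensional features. It is suitable for scenarios that require modeling multiple variables, such as medical diagnosis and financial risk assessment.

- 3. Support Vector Machine: the support vector machine is a supervised learning method, usually used for classification problems. It can handle high-dimensional data and has relatively high accuracy. It is suitable for classification or regression problems on high-dimensional data, such as image recognition and natural language processing.
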
- ## INT8

- **INT8 usage**:

- The current version supports INT8 weight-only PTQ (post-training quantization). To enable it, set `int8_mode` to `1` in demo.py.

- **In this mode, GPU memory usage drops by about half and the speed can double.**

- This solves the issue mentioned in https://github.com/THUDM/ChatGLM-6B/issues/1042.

- However, the speed gain is largest at batch sizes of 128 or less. If you are not using an A100 GPU, reduce the
- batch size to get the benefit; we recommend a batch size of 64. This mode is well suited to GPUs with
- limited VRAM, or to real-time services where large batch sizes are hard to use.

- Note that although we aligned accuracy in our test cases, INT8 results may differ slightly
- in some untested scenarios.

- ## Citation
- ``` bibtex
  @Misc{lyraChatGLM2023,
-   author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Bin Wu},
-   title =        {lyraChatGLM: Accelerating ChatGLM to 9000+ tokens/s},
-   howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
-   year =         {2023}
  }
  ```

- ## Report bug
- - Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraChatGLM/discussions
- - Report bugs with a `[bug]` tag in the title.
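The INT8 section above describes weight-only post-training quantization (PTQ), but this diff does not show how the lyraChatGLM kernels implement it. The pure-Python sketch below illustrates one common scheme and is only an illustration. It assumes symmetric, per-output-row absmax scaling to int8. It shows why weight memory is roughly halved relative to FP16: each weight drops from 2 bytes to 1, plus one FP16 scale per row. The function names are hypothetical. Activations stay in floating point, and the weights are rescaled on the fly.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows_int8(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row absmax quantization: w ~= q * scale, with q in [-127, 127]."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        absmax = max((abs(x) for x in row), default=0.0)
        scale = absmax / 127.0 if absmax > 0 else 1.0  # all-zero rows get a dummy scale
        q_rows.append([max(-127, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def matvec_weight_only(q_rows: List[List[int]], scales: List[float], x: List[float]) -> List[float]:
    """y = W x with int8 weights: accumulate q * x per row, then apply that row's scale."""
    return [s * sum(q * xi for q, xi in zip(row, x)) for row, s in zip(q_rows, scales)]


def weight_bytes(rows: int, cols: int, int8: bool) -> int:
    """FP16 = 2 bytes per weight; INT8 = 1 byte per weight plus one FP16 scale per row."""
    return rows * cols + 2 * rows if int8 else 2 * rows * cols


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.1]]
    q, s = quantize_rows_int8(w)
    print(q)
    print(matvec_weight_only(q, s, [1.0, 2.0, 3.0]))
    # e.g. a 4096 x 16384 projection (hidden_size x inner_hidden_size in models/config.json)
    print(weight_bytes(4096, 16384, int8=True) / weight_bytes(4096, 16384, int8=False))
```

Per-row scales keep one outlier row from degrading the precision of every other row. This is why weight-only schemes usually scale per output channel rather than per tensor.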
 
  ---
+ license: creativeml-openrail-m
+ language:
+ - en
  tags:
  - LLM
+ - tensorRT
+ - ChatGLM
  ---
+ ## Model Card for lyraChatGLM

+ lyraChatGLM is currently the **fastest ChatGLM-6B** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.

+ The inference speed of lyraChatGLM has achieved **10x** acceleration over the early original version. We are still working hard to further improve the performance.

+ Among its main features are:

+ - weights: original ChatGLM-6B weights released by THUDM.
+ - device: lyraChatGLM is mainly based on TensorRT compiled for SM=80 (A100, for example).
+ - batch_size: compiled with dynamic batch size, max batch_size = 8

  ## Speed

+ ### test environment

+ - device: Nvidia A100 40G
+ - batch size: 8

+ **Since the early ChatGLM version did not support batch inference, `original` in the table below was measured with batch_size=1.**

+ **According to [this discussion](https://huggingface.co/TMElyralab/lyraChatGLM/discussions/6), this bug has been fixed, and the speed at batch_size=8 reaches up to 137 tokens/s. We will evaluate and update the latest performance.**

+ |version|speed|
+ |:-:|:-:|
+ |original|30 tokens/s|
+ |lyraChatGLM|310 tokens/s|

+ ## Model Sources

+ - **Repository:** https://huggingface.co/THUDM/chatglm-6b

+ ## Try the demo in 2 quick steps

+ ``` bash
+ # step 1 (git-lfs is needed to fetch the model and .so files)
+ git lfs install
+ git clone https://huggingface.co/TMElyralab/lyraChatGLM
+ cd lyraChatGLM

+ # step 2
+ docker run --gpus=1 --rm --net=host -v ${PWD}:/workdir yibolu96/lyra-chatglm-env:0.0.1 python3 /workdir/demo.py
+ ```

+ ## Uses

+ ```python
+ from transformers import AutoTokenizer
+ from lyraChatGLM import GLM6B, FasterChatGLM
+ import os
+
+ current_workdir = os.path.dirname(__file__)
+
+ MAX_OUT_LEN = 100
+ chatglm6b_dir = os.path.join(current_workdir, "models")
+ tokenizer = AutoTokenizer.from_pretrained(chatglm6b_dir, trust_remote_code=True)
+ input_str = ["为什么我们需要对深度学习模型加速?", ]  # "Why do we need to accelerate deep learning models?"
+ inputs = tokenizer(input_str, return_tensors="pt", padding=True)
+ input_ids = inputs.input_ids.to('cuda:0')
+
+ plan_path = os.path.join(current_workdir, "models/glm6b-bs8.ftm")
+
+ # kernel for chat model.
+ kernel = GLM6B(plan_path=plan_path,
+                batch_size=1,
+                num_beams=1,
+                use_cache=True,
+                num_heads=32,
+                emb_size_per_heads=128,
+                decoder_layers=28,
+                vocab_size=150528,
+                max_seq_len=MAX_OUT_LEN)
+
+ chat = FasterChatGLM(model_dir=chatglm6b_dir, kernel=kernel).half().cuda()
+
+ # generate
+ sample_output = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
+ # de-tokenize model output to text
+ res = tokenizer.decode(sample_output[0], skip_special_tokens=True)
+ print(res)
+ ```
+ ## Demo output

+ ### input
+ Why do we need to accelerate deep learning models?

+ ### output
+ Why do we need to accelerate deep learning models? Training deep learning models requires a large amount of computing resources; especially when training a model, it needs a lot of memory, GPUs (graphics processing units), and other computing resources. Therefore, training a deep learning model takes a certain amount of time, and if the model cannot be trained quickly, training may progress slowly or not be possible at all.

+ Here are some reasons why we need to accelerate deep learning models:

+ 1. Training deep neural networks requires a large amount of computing resources; especially when training deep neural networks, more computing resources are needed, so faster training is required.

+ ### TODO:

+ We plan to implement a FasterTransformer version for a much faster release. Stay tuned!

+ ## Citation
+ ``` bibtex
  @Misc{lyraChatGLM2023,
+   author = {Kangjian Wu and Zhengtao Wang and Yibo Lu and Bin Wu},
+   title = {lyraChatGLM: Accelerating ChatGLM by 10x+},
+   howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
+   year = {2023}
  }
  ```

+ ## Report bug
+ - Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraChatGLM/discussions
+ - Report bugs with a `[bug]` tag in the title.
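The speed figures in this diff are aggregate throughput, so the resulting ratios depend on which rows and batch sizes are compared. The sketch below uses only numbers copied from the previous README's A100 40G tables and from the model card above. It computes speedups at equal batch size and the per-sequence rate, which is aggregate tokens/s divided by batch size. The headline "10x" compares lyraChatGLM at batch_size=8 with the original at batch_size=1. The helper names are illustrative.

```python
from typing import Dict

# Aggregate throughput in tokens/s, keyed by batch size, copied from the
# "test on A100 40G" table in the previous README.
A100_FIXED_BATCH: Dict[int, float] = {1: 34.48, 8: 356.29, 64: 1638.52, 128: 1338.45}
A100_LYRA: Dict[int, float] = {1: 110.05, 8: 843.60, 64: 4926.92, 128: 7235.04}


def speedups(new: Dict[int, float], baseline: Dict[int, float]) -> Dict[int, float]:
    """Ratio new / baseline for every batch size measured in both tables."""
    return {bs: new[bs] / baseline[bs] for bs in sorted(new) if bs in baseline}


def per_sequence_rate(table: Dict[int, float]) -> Dict[int, float]:
    """Aggregate tokens/s divided by batch size: roughly what each request sees."""
    return {bs: tps / bs for bs, tps in table.items()}


if __name__ == "__main__":
    for bs, ratio in speedups(A100_LYRA, A100_FIXED_BATCH).items():
        print(f"batch_size={bs:>3}: {ratio:.2f}x")
    # Model card: lyraChatGLM 310 tokens/s at batch_size=8 vs original 30 tokens/s at batch_size=1.
    print(f"headline ratio: {310 / 30:.1f}x")
    print(f"lyraChatGLM per-sequence rate at batch_size=8: {per_sequence_rate({8: 310.0})[8]:.2f} tokens/s")
```

Throughput tables are most comparable when both rows share a batch size. The per-sequence rate is closer to the latency an individual request experiences.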
demo.py CHANGED
@@ -1,22 +1,35 @@
- from lyraChatGLM import LyraChatGLM6B
- import numpy as np

- model_path = "./models/1-gpu-fp16.bin"
- tokenizer_path = "./models"
- inference_data_type = "fp16"
- int8_mode = 0
- max_output_length = 150
- arch = "Volta" # Ampere or Volta
- cuda_version = 11 # cuda version, we currently support 11 and 12

- model = LyraChatGLM6B(model_path, tokenizer_path, inference_data_type, int8_mode, arch, cuda_version)

- prompt = "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。"
- # test_batch_size = 256

- prompts = [prompt, ]

- # # If you want to get different output in same batch, you can set do_sample to True
- output_texts = model.generate(prompts, output_length=max_output_length,top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=False)

- print(output_texts)
+ # coding=utf-8

+ from transformers import AutoTokenizer
+ from lyraChatGLM import GLM6B, FasterChatGLM
+ import os

+ current_workdir = os.path.dirname(__file__)

+ MAX_OUT_LEN = 100
+ chatglm6b_dir = os.path.join(current_workdir, "models")
+ tokenizer = AutoTokenizer.from_pretrained(chatglm6b_dir, trust_remote_code=True)
+ input_str = ["为什么我们需要对深度学习模型加速?", ]  # "Why do we need to accelerate deep learning models?"
+ inputs = tokenizer(input_str, return_tensors="pt", padding=True)
+ input_ids = inputs.input_ids.to('cuda:0')

+ plan_path = os.path.join(current_workdir, "models/glm6b-bs8.ftm")

+ # kernel for chat model.
+ kernel = GLM6B(plan_path=plan_path,
+                batch_size=1,
+                num_beams=1,
+                use_cache=True,
+                num_heads=32,
+                emb_size_per_heads=128,
+                decoder_layers=28,
+                vocab_size=150528,
+                max_seq_len=MAX_OUT_LEN)

+ chat = FasterChatGLM(model_dir=chatglm6b_dir, kernel=kernel).half().cuda()
+
+ # generate
+ sample_output = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
+ # de-tokenize model output to text
+ res = tokenizer.decode(sample_output[0], skip_special_tokens=True)
+ print(res)
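demo.py sends a single prompt, while the model card states a maximum batch size of 8 and the engine file is named `glm6b-bs8.ftm`. The minimal sketch below assumes that longer prompt lists should be split into groups of at most 8 before tokenization. `iter_batches` and `MAX_ENGINE_BATCH` are hypothetical names. This diff does not document whether the `GLM6B(batch_size=...)` argument must also change for each group.

```python
from typing import Iterator, List, Sequence

# Assumption: the limit comes from "max batch_size = 8" in the model card and the
# glm6b-bs8.ftm file name; it is not read from the engine itself.
MAX_ENGINE_BATCH = 8


def iter_batches(prompts: Sequence[str], max_batch: int = MAX_ENGINE_BATCH) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_batch prompts, preserving order."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(prompts), max_batch):
        yield list(prompts[start:start + max_batch])


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(20)]
    print([len(group) for group in iter_batches(prompts)])
```

Each group would then go through `tokenizer(group, return_tensors="pt", padding=True)` and `chat.generate(...)` as in demo.py.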
lyraChatGLM/__init__.py CHANGED
@@ -1 +1,10 @@
- from .lyra_glm import LyraChatGLM6B
+ import os
+ import ctypes
+
+ current_workdir = os.path.dirname(__file__)
+ # Load the TensorRT plugin library bundled with the package.
+ ctypes.cdll.LoadLibrary(os.path.join(current_workdir, "libnvinfer_plugin.so"))
+ # Ask torch to load its shared libraries with RTLD_GLOBAL; must be set before `import torch`.
+ os.environ["TORCH_USE_RTLD_GLOBAL"] = "YES"
+
+ import torch
+ from .glm import GLM6B
+ from .model import FasterChatGLM
lyraChatGLM/config.py DELETED
@@ -1,31 +0,0 @@
- import dataclasses
- from typing import Optional
-
-
- @dataclasses.dataclass
- class ChatGLM6BParam:
-     num_heads: int = 32
-     size_per_head: int = 128
-     inter_size: int = 16384
-     num_layers: int = 28
-     vocab_size: int = 130528
-     start_id: Optional[int] = 130004
-     end_id: Optional[int] = 130005
-     tensor_para_size: int = 1
-     pipeline_para_size: int = 1
-     remove_padding: bool = True
-     shared_contexts_ratio: float = 0.0
-     layernorm_eps: float = 1e-5
-     weights_data_type: str = "fp16"
-
-     def __post_init__(self):
-         if not 0.0 <= self.shared_contexts_ratio <= 1.0:
-             raise ValueError(
-                 f'Got an invalid value of shared_context_ratio '
-                 f'{self.shared_contexts_ratio} - range: [0.0, 1.0]')
-
-     def asdict(self):
-         return dataclasses.asdict(self)
-
-
- CHATGLM_6B_PARAM = ChatGLM6BParam()
lyraChatGLM/ftlib/libth_transformer_sm70_cu11.so DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0826346c748380e8e9fdd7e1f7130bad0f2485a65a8ecd4beb33d19e85c4d79e
- size 114280392
lyraChatGLM/{ftlib/libth_transformer_sm70_cu12.so → glm.cpython-38-x86_64-linux-gnu.so} RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2d9829541f5edccf8d59e275e1259404168750e3419902fc4c88f789baad3f20
- size 114203064
  version https://git-lfs.github.com/spec/v1
+ oid sha256:feaeb19a7b780cdb669066bb096726d23f0c3ed401fe2f71adf12c66960c0d07
+ size 188432
lyraChatGLM/{ftlib/libth_transformer_sm80_cu11.so → libnvinfer_plugin.so} RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:60a06f87ca10c5d556f965a5178aac50cbcbcec0265a7bcf18751e6ef73a807c
- size 200894104
  version https://git-lfs.github.com/spec/v1
+ oid sha256:0a87eb31795009c545422ef978f607d97be5454c68f09cb829352c0529d1ba8b
+ size 235256088
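The `.so` files above are stored as Git LFS pointers, each a `version` line, an `oid sha256:<hex digest>` line and a `size` line. If `git lfs pull` has not run, the package directory holds these small text files instead of the binaries. In that case the `ctypes` call in `lyraChatGLM/__init__.py` cannot load `libnvinfer_plugin.so`. The sketch below parses a pointer and checks a local file's size and SHA-256 against it. It is a hypothetical helper, not part of lyraChatGLM.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Pointer text copied from the libnvinfer_plugin.so entry in this diff.
PLUGIN_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0a87eb31795009c545422ef978f607d97be5454c68f09cb829352c0529d1ba8b
size 235256088
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


def matches_pointer(path: Path, pointer_text: str) -> bool:
    """True if the file at path has the size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    if path.stat().st_size != int(fields["size"]):
        return False  # e.g. the small pointer file itself is still checked out
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


if __name__ == "__main__":
    plugin = Path("lyraChatGLM/libnvinfer_plugin.so")
    if plugin.exists():
        print("plugin binary present:", matches_pointer(plugin, PLUGIN_POINTER))
```

The size check runs first because it is cheap and already catches the common case of an un-pulled pointer file. Hashing is only needed to confirm a full-size download.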
lyraChatGLM/lyra_glm.py DELETED
@@ -1,177 +0,0 @@
- from __future__ import annotations
-
- import configparser
- import pathlib
- import typing
-
- import torch
- import transformers
-
- from .config import CHATGLM_6B_PARAM
- from .model import ChatGLM6BModel
-
- class LyraChatGLM6B:
-     def __init__(self, model_path, tokenizer_path=None, dtype='fp16', int8_mode=0, arch="Ampere", cuda_version="11") -> None:
-         self.model_path = model_path
-         self.tokenizer_path = tokenizer_path
-         self.dtype = dtype
-         self.arch=arch
-         # if dtype != 'int8':
-         #     int8_mode = 0
-         self.cuda_version = cuda_version
-         self.int8_mode = int8_mode
-
-         self.model, self.tokenizer = self.load_model_and_tokenizer()
-         if not (arch in ["Ampere", "Volta"]):
-             raise ValueError("Only support GPU device Ampere(A100,A10) or Volta(V100)")
-
-         print("Got model and tokenizer")
-
-     def load_model_and_tokenizer(self):
-         if self.tokenizer_path is None:
-             tokenizer_path = self.model_path
-         else:
-             tokenizer_path = self.tokenizer_path
-
-         print(f'Loading tokenizer from {pathlib.Path(tokenizer_path).parent}')
-         tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
-
-         checkpoint_path = pathlib.Path(self.model_path)
-
-         config_path = checkpoint_path.parent / 'config.ini'
-
-         if config_path.exists():
-             # Read model params from config.
-             cfg = configparser.ConfigParser()
-             cfg.read(config_path)
-             model_name = 'glm6b'
-             inference_data_type = self.dtype
-             if inference_data_type == None:
-                 inference_data_type = cfg.get(model_name, "weight_data_type")
-             model_args = dict(
-                 head_num=cfg.getint(model_name, 'head_num'),
-                 size_per_head=cfg.getint(model_name, "size_per_head"),
-                 layer_num=cfg.getint(model_name, "num_layer"),
-                 tensor_para_size=cfg.getint(model_name, "tensor_para_size"),
-                 vocab_size=cfg.getint(model_name, "vocab_size"),
-                 start_id=cfg.getint(model_name, "start_id"),
-                 end_id=cfg.getint(model_name, "end_id"),
-                 weights_data_type=cfg.get(model_name, "weight_data_type"),
-                 layernorm_eps=cfg.getfloat(model_name, 'layernorm_eps'),
-                 inference_data_type=inference_data_type)
-         else:
-             inference_data_type = self.dtype
-             if inference_data_type == None:
-                 inference_data_type = CHATGLM_6B_PARAM.weights_data_type
-             model_args = dict(head_num=CHATGLM_6B_PARAM.num_heads,
-                               size_per_head=CHATGLM_6B_PARAM.size_per_head,
-                               vocab_size=CHATGLM_6B_PARAM.vocab_size,
-                               start_id=CHATGLM_6B_PARAM.start_id or tokenizer.bos_token_id,
-                               end_id=CHATGLM_6B_PARAM.end_id or tokenizer.eos_token_id,
-                               layer_num=CHATGLM_6B_PARAM.num_layers,
-                               tensor_para_size=CHATGLM_6B_PARAM.tensor_para_size,
-                               weights_data_type=CHATGLM_6B_PARAM.weights_data_type,
-                               layernorm_eps=CHATGLM_6B_PARAM.layernorm_eps,
-                               inference_data_type=inference_data_type,
-                               )
-
-         # update common parameters
-         model_args.update(dict(
-             rotary_embedding_dim=64,
-             max_seq_len=0,  # for position seq embedding
-             pipeline_para_size=CHATGLM_6B_PARAM.pipeline_para_size,
-             shared_contexts_ratio=CHATGLM_6B_PARAM.shared_contexts_ratio,
-             int8_mode=self.int8_mode,
-             model_path=self.model_path,
-             cuda_version=self.cuda_version,
-         ))
-
-         print('[INFO] Load Our Highly Optimized LyraChatGLM6B model')
-         for k, v in model_args.items():
-             print(f' - {k.ljust(25, ".")}: {v}')
-
-         # Check sanity and consistency between the model and tokenizer.
-         checklist = ['head_num', 'size_per_head', 'vocab_size', 'layer_num',
-                      'tensor_para_size', 'tensor_para_size', 'weights_data_type']
-         if None in [model_args[k] for k in checklist]:
-             none_params = [p for p in checklist if model_args[p] is None]
-             print(f'[WARNING] Found None parameters {none_params}. They must '
-                   f'be provided either by config file or CLI arguments.')
-         if model_args['start_id'] != tokenizer.bos_token_id:
-             print('[WARNING] Given start_id is not matched with the bos token '
-                   'id of the pretrained tokenizer.')
-         if model_args['end_id'] not in (tokenizer.pad_token_id, tokenizer.eos_token_id):
-             print('[WARNING] Given end_id is not matched with neither pad '
-                   'token id nor eos token id of the pretrained tokenizer.')
-
-         print(f'Loading tokenizer from {self.model_path}')
-         model = ChatGLM6BModel(arch=self.arch,**model_args)
-
-         return model, tokenizer
-
-     def generate(self, prompts: typing.List[str] | str,
-                  output_length: int = 512,
-                  beam_width: int = 1,
-                  top_k: typing.Optional[torch.IntTensor] = 1,
-                  top_p: typing.Optional[torch.FloatTensor] = 1.0,
-                  beam_search_diversity_rate: typing.Optional[torch.FloatTensor] = 0.0,
-                  temperature: typing.Optional[torch.FloatTensor] = 1.0,
-                  len_penalty: typing.Optional[torch.FloatTensor] = 0.0,
-                  repetition_penalty: typing.Optional[torch.FloatTensor] = 1.0,
-                  presence_penalty: typing.Optional[torch.FloatTensor] = None,
-                  min_length: typing.Optional[torch.IntTensor] = None,
-                  bad_words_list: typing.Optional[torch.IntTensor] = None,
-                  do_sample: bool = False,
-                  return_output_length: bool = False,
-                  return_cum_log_probs: int = 0):
-         #
-         if isinstance(prompts, str):
-             prompts = [prompts, ]
-
-         inputs = prompts
-
-         batch_size = len(inputs)
-         ones_int = torch.ones(size=[batch_size], dtype=torch.int32)
-         ones_float = torch.ones(size=[batch_size], dtype=torch.float32)
-
-         # input_token_ids = self.tokenizer(prompts, return_tensors="pt", padding=True).input_ids.int()
-         raw_input_token_ids = self.tokenizer(prompts, padding=True)
-         input_token_ids = torch.tensor (raw_input_token_ids["input_ids"],dtype=torch.int32)
-
-         input_lengths = torch.IntTensor([len(ids) for ids in input_token_ids])
-         mask_positions = torch.IntTensor([seq.index(130001) for seq in input_token_ids.tolist()])
-
-         random_seed = None
-         if do_sample:
-             random_seed = torch.randint(0, 262144, (batch_size,), dtype=torch.long)
-
-         outputs = self.model(start_ids=input_token_ids,
-                              start_lengths=input_lengths,
-                              mask_positions=mask_positions,
-                              output_len=output_length,
-                              beam_width=beam_width,
-                              top_k=top_k*ones_int,
-                              top_p=top_p*ones_float,
-                              beam_search_diversity_rate=beam_search_diversity_rate*ones_float,
-                              temperature=temperature*ones_float,
-                              len_penalty=len_penalty*ones_float,
-                              repetition_penalty=repetition_penalty*ones_float,
-                              presence_penalty=presence_penalty,
-                              min_length=min_length,
-                              random_seed=random_seed,
-                              bad_words_list=bad_words_list,
-                              return_output_length=return_output_length,
-                              return_cum_log_probs=return_cum_log_probs)
-
-         if return_cum_log_probs > 0:
-             outputs = outputs[0]  # output_token_ids.
-
-         # Slice the generated token ids of the 1st beam result.
-         # output = input tokens + generated tokens.
-         output_token_ids = [out[0, length:].cpu()
-                             for out, length in zip(outputs, input_lengths)]
-
-         output_texts = self.tokenizer.batch_decode(
-             output_token_ids, skip_special_tokens=False)
-
-         return output_texts
lyraChatGLM/model.py CHANGED
@@ -1,195 +1,131 @@
- import os
- import h5py
- import pathlib
- import typing
-
- import numpy as np
  import torch
- import torch.distributed as dist
- import torch.nn as nn
-
- class ChatGLM6BModel(nn.Module):
-     def __init__(self,
-                  head_num, size_per_head,
-                  vocab_size,
-                  rotary_embedding_dim,
-                  start_id, end_id, layer_num,
-                  arch,
-                  max_seq_len: int,
-                  tensor_para_size: int,
-                  pipeline_para_size: int,
-                  inference_data_type: str,
-                  model_path,
-                  cuda_version,
-                  inter_size: int = 0,
-                  # glm_variant_params
-                  layernorm_eps: float = 1e-5,
-                  layernorm_type: typing.Literal['pre_layernorm', 'post_layernorm'] = "pre_layernorm",
-                  activation_type: str = "Gelu",
-                  gpt_with_moe: bool = False,
-                  expert_num: int = 0,
-                  moe_k: int = 0,
-                  moe_layer_index: typing.List = [],
-                  has_positional_encoding: bool = False,
-                  has_pre_decoder_layernorm: bool = False,
-                  has_post_decoder_layernorm: bool = True,
-                  has_adapters: bool = False,
-                  adapter_inter_size: int = 0,
-                  use_attention_linear_bias: bool = False,
-                  int8_mode: int = 0,
-                  weights_data_type: typing.Union[str, np.dtype] = np.float32,
-                  shared_contexts_ratio: float = 1.0):
-         super().__init__()
-         self.head_num = head_num
-         self.size_per_head = size_per_head
-         self.vocab_size = vocab_size
-         self.rotary_embedding_dim = rotary_embedding_dim
-         self.start_id = start_id
-         self.end_id = end_id
-         self.layer_num = layer_num
-         self.inter_size = inter_size if inter_size != 0 else 4 * self.head_num * self.size_per_head
-         self.arch = arch
-         self.model_path = model_path
-         # gpt_variant_params
-         self.layernorm_eps = layernorm_eps
-         self.layernorm_type = layernorm_type
-         self.activation_type = activation_type
-         self.gpt_with_moe = gpt_with_moe
-         self.expert_num = expert_num
-         self.moe_k = moe_k
-         self.moe_layer_index = moe_layer_index
-         self.has_positional_encoding = has_positional_encoding
-         self.has_pre_decoder_layernorm = has_pre_decoder_layernorm
-         self.has_post_decoder_layernorm = has_post_decoder_layernorm
-         self.has_adapters = has_adapters
-         self.adapter_inter_size = adapter_inter_size
-         self.use_attention_linear_bias = use_attention_linear_bias
-
-         # multi-gpu params
-         self.tensor_para_size = tensor_para_size
-         self.pipeline_para_size = pipeline_para_size
-         self.use_sparse_gemm = False
-         self.build_model = False
-         self.int8_mode = int8_mode
-         self.weights_data_type = weights_data_type
-         self.shared_contexts_ratio = shared_contexts_ratio
-
-         assert torch.cuda.is_available(), "CUDA is required for this model."
-
-         assert head_num % tensor_para_size == 0, "head_num must be a multiple of tensor_para_size."
-         assert layer_num % pipeline_para_size == 0, "layer_num must be a multiple of pipeline_para_size."
-
-         self.device = 0
-
-         # Load the C++ model into Pytorch model.
-         sm = "sm80"
-
-         if arch == "Ampere":
-             sm = "sm80"
-         elif arch == "Volta":
-             sm = "sm70"
-         else:
-             raise Exception(f"unsupported arch: {arch}")

-         cu = 'cu11'
-         if cuda_version == 11:
-             cu = 'cu11'
-         elif cuda_version == 12:
-             cu = 'cu12'
          else:
-             raise Exception(f"unsupported cuda version: {cuda_version}")
-
-         lib_path = pathlib.Path(__file__).parent / "ftlib" / f"libth_transformer_{sm}_{cu}.so"
-         torch.classes.load_library(os.path.abspath(lib_path))
-
-         self.model = torch.classes.FasterTransformer.GlmOp(
-             self.head_num, self.size_per_head, self.inter_size,
-             self.layer_num,
-             self.expert_num,
-             self.moe_k,
-             self.moe_layer_index,
-             self.vocab_size,
-             self.rotary_embedding_dim,
-             self.start_id, self.end_id,
-             self.tensor_para_size, self.pipeline_para_size, self.int8_mode,
-             # GLM variant parameters
-             self.layernorm_eps,
-             self.layernorm_type,
-             self.activation_type,
-             self.has_positional_encoding,
-             self.has_pre_decoder_layernorm,
-             self.has_post_decoder_layernorm,
-             self.has_adapters,
-             self.adapter_inter_size,
-             self.use_attention_linear_bias,
-             self.model_path,
-             self.weights_data_type,
-             inference_data_type,
-             self.shared_contexts_ratio)
-         self.build_model = True
-
-     def forward(self,
-                 start_ids: torch.IntTensor,
-                 start_lengths: torch.IntTensor,
-                 mask_positions: torch.IntTensor,
-                 output_len: int,
-                 beam_width: int = 1,
-                 top_k: typing.Optional[torch.IntTensor] = None,
-                 top_p: typing.Optional[torch.FloatTensor] = None,
-                 beam_search_diversity_rate: typing.Optional[torch.FloatTensor] = None,
-                 temperature: typing.Optional[torch.FloatTensor] = None,
-                 len_penalty: typing.Optional[torch.FloatTensor] = None,
-                 repetition_penalty: typing.Optional[torch.FloatTensor] = None,
-                 presence_penalty: typing.Optional[torch.FloatTensor] = None,
-                 min_length: typing.Optional[torch.IntTensor] = None,
-                 random_seed: typing.Optional[torch.LongTensor] = None,
-                 bad_words_list: typing.Optional[torch.IntTensor] = None,
-                 return_output_length: bool = False,
-                 return_cum_log_probs: int = 0):
-
-         input_len = start_ids.size(1)
-         assert input_len > 0, "input len must be larger than zero. For an unconditional case, use start_id as the first token."
-
-         # Inputs to device
-         start_ids = start_ids.cuda(self.device)
-         start_lengths = start_lengths.cuda(self.device)
-         mask_positions = mask_positions.cuda(self.device)
-
-         # outputs: output_ids, output_lengths, output_cum_log_probs (optional)
-         outputs = self.model.forward(start_ids,
-                                      start_lengths,
-                                      mask_positions,
-                                      output_len,
-                                      beam_width,  # optional, can be None
-                                      top_k,  # optional, can be None
-                                      top_p,  # optional, can be None
-                                      beam_search_diversity_rate,  # optional, can be None
-                                      temperature,  # optional, can be None
-                                      len_penalty,  # optional, can be None
-                                      repetition_penalty,  # optional, can be None
-                                      presence_penalty,  # optional, can be None
-                                      min_length,  # optional, can be None
-                                      random_seed,  # optional, can be None
-                                      bad_words_list,  # optional, can be None
-                                      return_cum_log_probs)  # optional, can be None
-         if return_cum_log_probs == 0:
-             output_ids, output_lengths = outputs
          else:
-             output_ids, output_lengths, output_cum_log_probs = outputs
-         if return_output_length:
-             if return_cum_log_probs > 0:
-                 return output_ids, output_lengths, output_cum_log_probs
              else:
-                 return output_ids, output_lengths
          else:
-             return output_ids

-     def set_input_tensor(self, input_tensor):
-         """Set input tensor to be used instead of forward()'s input.

-         When doing pipeline parallelism the input from the previous
-         stage comes from communication, not from the input, so the
-         model's forward_step_func won't have it. This function is thus
-         used by internal code to bypass the input provided by the
-         forward_step_func"""
-         self.input_tensor = input_tensor
1
  import torch
2
+ from transformers.modeling_outputs import CausalLMOutputWithPast
3
+ from transformers.modeling_utils import PreTrainedModel
4
+ from transformers import AutoConfig
5
+ from typing import Dict, List, Tuple, Union, Optional
6
+
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
 
8
+ class FasterChatGLM(PreTrainedModel):
9
+ def __init__(self, model_dir, kernel, *inputs, **kwargs):
10
+ config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
11
+ config.n_head = config.num_attention_heads
12
+ config.n_embd = config.hidden_size
13
+ config.n_layer = config.num_layers
14
+ super().__init__(config, *inputs, **kwargs)
15
+ self.kernel = kernel
16
+ self.fake_reg = torch.nn.Linear(2, 2)
17
+ self.position_encoding_2d = True
18
+
19
+ def forward(self, input_ids, position_ids, attention_mask, past_key_values, *args, **kwargs):
20
+ inputs_values = [input_ids, position_ids, attention_mask]
21
+ if past_key_values is not None:
22
+ inputs_values = inputs_values + past_key_values
23
+
24
+ computed = self.kernel.infer(inputs_values)
25
+ logits = computed[0]
26
+ if len(computed) == 1:
27
+ present_key_values = None
28
  else:
29
+ present_key_values = computed[1:]
30
+
31
+ return CausalLMOutputWithPast(logits=logits, past_key_values=present_key_values)
32
+
33
+ def get_masks_and_position_ids(self, seq, mask_position, context_length, device, gmask=False):
34
+ attention_mask = torch.ones((1, context_length, context_length), device=device)
35
+ attention_mask.tril_()
36
+ attention_mask[..., :context_length - 1] = 1
37
+ attention_mask.unsqueeze_(1)
38
+ attention_mask = (attention_mask < 0.5).bool()
39
+
40
+ if self.position_encoding_2d:
41
+ seq_length = seq.index(150004)
42
+ position_ids = torch.arange(context_length, dtype=torch.long, device=device)
43
+ if not gmask:
44
+ position_ids[seq_length:] = mask_position
45
+ block_position_ids = torch.cat((
46
+ torch.zeros(seq_length, dtype=torch.long, device=device),
47
+ torch.arange(context_length - seq_length, dtype=torch.long, device=device) + 1
48
+ ))
49
+ position_ids = torch.stack((position_ids, block_position_ids), dim=0)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
  else:
51
+ position_ids = torch.arange(context_length, dtype=torch.long, device=device)
52
+ if not gmask:
53
+ position_ids[context_length - 1:] = mask_position
54
+
55
+ position_ids = position_ids.unsqueeze(0)
56
+
57
+ return attention_mask, position_ids
58
+
59
+ def prepare_one_sample(self, input_id, mask_token, past, past_key_values, use_gmask):
60
+
61
+ seq = input_id.tolist()
62
+ mask_position = seq.index(mask_token)
63
+
64
+ if mask_token not in seq:
65
+ raise ValueError("You have to add either [MASK] or [gMASK] in your input")
66
+
67
+ # only last token for input_ids if past is not None
68
+ if past is not None or past_key_values is not None:
69
+ context_length = seq.index(150004)
70
+ last_token = input_id[-1].unsqueeze(-1).unsqueeze(0) # 2 dim
71
+ proc_input_id = last_token
72
+ if self.position_encoding_2d:
73
+ position_ids = torch.tensor([[[mask_position], [len(seq) - context_length]]], dtype=torch.long,
74
+ device=input_id.device)
75
  else:
76
+ position_ids = torch.tensor([[mask_position]], dtype=torch.long, device=input_id.device)
77
+
78
+ attention_mask = torch.zeros(1, 1, 1, 1, device=input_id.device)
79
  else:
80
+ proc_input_id = input_id.unsqueeze(0)
81
+ attention_mask, position_ids = self.get_masks_and_position_ids(
82
+ seq=seq,
83
+ mask_position=mask_position,
84
+ context_length=len(seq),
85
+ device=input_id.device,
86
+ gmask=use_gmask
87
+ )
88
+
89
+ return (proc_input_id.to(torch.int32), position_ids.to(torch.int32),
90
+ attention_mask.to(torch.bool))
91
+
92
+ def prepare_inputs_for_generation(
93
+ self,
94
+ input_ids: torch.LongTensor,
95
+ past: Optional[torch.Tensor] = None,
96
+ past_key_values: Optional[torch.Tensor] = None,
97
+ attention_mask: Optional[torch.Tensor] = None,
98
+ use_cache: bool = None,
99
+ **kwargs
100
+ ) -> dict:
101
 
102
+ MASK, gMASK = 150000, 150001
103
+ mask_token = MASK if MASK in input_ids else gMASK
104
+ use_gmask = False if MASK in input_ids else gMASK
105
+
106
+ batch_input_ids, batch_position_ids, batch_attention_mask = [], [], []
107
+ for input_id in input_ids:
108
+ proc_input_id, position_id, attention_mask = self.prepare_one_sample(
109
+ input_id, mask_token, past, past_key_values, use_gmask)
110
+ batch_input_ids.append(proc_input_id)
111
+ batch_position_ids.append(position_id)
112
+ batch_attention_mask.append(attention_mask)
113
+
114
+ batch_input_ids = torch.vstack(batch_input_ids)
115
+ batch_position_ids = torch.vstack(batch_position_ids)
116
+ batch_attention_mask = torch.vstack(batch_attention_mask)
117
+
118
+ if past is None:
119
+ past = past_key_values
120
+
121
+ if past is not None or past_key_values is not None:
122
+ self.kernel.set_context_mode(False)
123
+ else:
124
+ self.kernel.set_context_mode(self.config.use_cache)
125
 
126
+ return {
127
+ "input_ids": batch_input_ids,
128
+ "past_key_values": past_key_values,
129
+ "position_ids": batch_position_ids,
130
+ "attention_mask": batch_attention_mask
131
+ }
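The incremental branch of `prepare_one_sample` feeds only the newest token and builds 2-D position ids from the mask position and from the distance to `<sop>`. Below is a minimal pure-Python sketch of that arithmetic, with lists standing in for tensors. `select_mask_token` and `decode_step_inputs` are hypothetical helpers. `mask_position = seq.index(mask_token)` is an assumption, because that line falls outside this hunk. As in `prepare_inputs_for_generation`, the mask token is chosen once for the whole batch.

```python
from typing import List, Tuple

# Ids copied from the diff: [MASK]/[gMASK] from prepare_inputs_for_generation,
# <sop> (150004) from prepare_one_sample.
MASK, GMASK, BOS = 150000, 150001, 150004


def select_mask_token(batch: List[List[int]]) -> Tuple[int, bool]:
    """One decision for the whole batch, as `MASK in input_ids` checks the 2-D tensor."""
    if any(MASK in seq for seq in batch):
        return MASK, False
    return GMASK, True  # no [MASK] anywhere: fall back to [gMASK]


def decode_step_inputs(seq: List[int], mask_token: int, position_encoding_2d: bool = True):
    """Inputs for one incremental step: only the newest token is fed to the model."""
    mask_position = seq.index(mask_token)  # assumption: first occurrence of the mask token
    context_length = seq.index(BOS)        # the prompt ends at <sop>
    if position_encoding_2d:
        # row 0: position of the mask; row 1: offset inside the generated block
        position_ids = [[mask_position], [len(seq) - context_length]]
    else:
        position_ids = [[mask_position]]
    return [seq[-1]], position_ids


batch = [[5, 6, GMASK, BOS, 20, 21], [7, GMASK, BOS, 30]]
mask_token, use_gmask = select_mask_token(batch)
steps = [decode_step_inputs(seq, mask_token) for seq in batch]
print(steps)  # [([21], [[2], [3]]), ([30], [[1], [2]])]
```

The sketch returns a real `bool` for the [gMASK] flag rather than the token id.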
models/1-gpu-fp16.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:9bab22c98c57766bc31410c819858fa704490ca76dc04df7331d188c56fba1b1
3
- size 12346572800
models/config.ini DELETED
@@ -1,13 +0,0 @@
1
- [glm6b]
2
- model_name = chatglm-6b
3
- head_num = 32
4
- size_per_head = 128
5
- inter_size = 16384
6
- max_pos_seq_len = 2048
7
- num_layer = 28
8
- vocab_size = 130528
9
- start_id = 130004
10
- end_id = 130005
11
- weight_data_type = fp16
12
- tensor_para_size = 1
13
- layernorm_eps = 1e-5
models/config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "_name_or_path": "THUDM/chatglm-6b",
3
+ "architectures": [
4
+ "ChatGLMModel"
5
+ ],
6
+ "auto_map": {
7
+ "AutoConfig": "configuration_chatglm.ChatGLMConfig",
8
+ "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
9
+ "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
10
+ },
11
+ "bos_token_id": 150004,
12
+ "eos_token_id": 150005,
13
+ "hidden_size": 4096,
14
+ "inner_hidden_size": 16384,
15
+ "layernorm_epsilon": 1e-05,
16
+ "max_sequence_length": 2048,
17
+ "model_type": "chatglm",
18
+ "num_attention_heads": 32,
19
+ "num_layers": 28,
20
+ "position_encoding_2d": true,
21
+ "torch_dtype": "float16",
22
+ "transformers_version": "4.23.1",
23
+ "use_cache": true,
24
+ "vocab_size": 150528
25
+ }
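The new `config.json` describes the same transformer shape as the deleted `config.ini`, but it uses different keys. The per-head width (`size_per_head = 128` in the ini file) is no longer stored and must be derived. The key mapping used below (`head_num` to `num_attention_heads`, `inter_size` to `inner_hidden_size`, `num_layer` to `num_layers`) is inferred from the matching values, not stated in the diff. The vocabulary size and special-token ids do differ between the two files. Here is a small sanity check on a subset of the JSON:

```python
import json

# Subset of models/config.json from the diff above.
CONFIG_JSON = """
{
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "num_attention_heads": 32,
  "num_layers": 28,
  "max_sequence_length": 2048
}
"""

cfg = json.loads(CONFIG_JSON)

# Per-head width; the deleted config.ini stored this directly as size_per_head.
size_per_head, remainder = divmod(cfg["hidden_size"], cfg["num_attention_heads"])
assert remainder == 0, "hidden_size must split evenly across heads"

# Feed-forward width relative to the hidden width (inter_size in config.ini).
ffn_ratio = cfg["inner_hidden_size"] // cfg["hidden_size"]

print(size_per_head, ffn_ratio, cfg["num_layers"])  # 128 4 28
```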
models/configuration_chatglm.py ADDED
@@ -0,0 +1,92 @@
1
+ """ ChatGLM model configuration """
2
+
3
+ from transformers.configuration_utils import PretrainedConfig
4
+ from transformers.utils import logging
5
+
6
+ logger = logging.get_logger(__name__)
7
+
8
+
9
+ class ChatGLMConfig(PretrainedConfig):
10
+ r"""
11
+ This is the configuration class to store the configuration of a [`~ChatGLMModel`].
12
+ It is used to instantiate a ChatGLM model according to the specified arguments, defining the model
13
+ architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
14
+ the ChatGLM-6B [THUDM/ChatGLM-6B](https://huggingface.co/THUDM/chatglm-6b) architecture.
15
+
16
+ Configuration objects inherit from [`PretrainedConfig`] and can be used
17
+ to control the model outputs. Read the documentation from [`PretrainedConfig`]
18
+ for more information.
19
+
20
+
21
+ Args:
22
+ vocab_size (`int`, *optional*, defaults to 150528):
23
+ Vocabulary size of the ChatGLM-6B model. Defines the number of different tokens that can be represented by the
24
+ `input_ids` passed when calling [`~ChatGLMModel`] or
25
+ [`~TFChatGLMModel`].
26
+ hidden_size (`int`, *optional*, defaults to 4096):
27
+ Dimension of the encoder layers and the pooler layer.
28
+ num_layers (`int`, *optional*, defaults to 28):
29
+ Number of hidden layers in the Transformer encoder.
30
+ num_attention_heads (`int`, *optional*, defaults to 32):
31
+ Number of attention heads for each attention layer in the Transformer encoder.
32
+ inner_hidden_size (`int`, *optional*, defaults to 16384):
33
+ Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
34
+ max_sequence_length (`int`, *optional*, defaults to 2048):
35
+ The maximum sequence length that this model might ever be used with.
36
+ Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
37
+ layernorm_epsilon (`float`, *optional*, defaults to 1e-5):
38
+ The epsilon used by the layer normalization layers.
39
+ use_cache (`bool`, *optional*, defaults to `False`):
40
+ Whether the model should return the last key/values attentions (not used by all models).
41
+ Example:
42
+
43
+ ```python
44
+ >>> from configuration_chatglm import ChatGLMConfig
45
+ >>> from modeling_chatglm import ChatGLMModel
46
+
47
+ >>> # Initializing a ChatGLM-6B THUDM/ChatGLM-6B style configuration
48
+ >>> configuration = ChatGLMConfig()
49
+
50
+ >>> # Initializing a model from the THUDM/ChatGLM-6B style configuration
51
+ >>> model = ChatGLMModel(configuration)
52
+
53
+ >>> # Accessing the model configuration
54
+ >>> configuration = model.config
55
+ ```
56
+ """
57
+ model_type = "chatglm"
58
+
59
+ def __init__(
60
+ self,
61
+ vocab_size=150528,
62
+ hidden_size=4096,
63
+ num_layers=28,
64
+ num_attention_heads=32,
65
+ layernorm_epsilon=1e-5,
66
+ use_cache=False,
67
+ bos_token_id=150004,
68
+ eos_token_id=150005,
69
+ pad_token_id=0,
70
+ max_sequence_length=2048,
71
+ inner_hidden_size=16384,
72
+ position_encoding_2d=True,
73
+ **kwargs
74
+ ):
75
+ self.num_layers = num_layers
76
+ self.vocab_size = vocab_size
77
+ self.hidden_size = hidden_size
78
+ self.num_attention_heads = num_attention_heads
79
+ self.max_sequence_length = max_sequence_length
80
+ self.layernorm_epsilon = layernorm_epsilon
81
+ self.inner_hidden_size = inner_hidden_size
82
+ self.use_cache = use_cache
83
+ self.bos_token_id = bos_token_id
84
+ self.eos_token_id = eos_token_id
85
+ self.pad_token_id = pad_token_id
86
+ self.position_encoding_2d = position_encoding_2d
87
+ super().__init__(
88
+ pad_token_id=pad_token_id,
89
+ bos_token_id=bos_token_id,
90
+ eos_token_id=eos_token_id,
91
+ **kwargs
92
+ )
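The constructor defaults to `use_cache=False`, but `models/config.json` stores `true`. When the configuration is loaded with `from_pretrained`, the stored values are passed to `__init__` as keyword arguments, so the JSON value should take precedence. This matters because `prepare_inputs_for_generation` reads `self.config.use_cache` when it picks the kernel's context mode. Below is a plain-dict sketch of that precedence. `effective_config` is a hypothetical helper, not the transformers loader.

```python
# Keyword defaults of ChatGLMConfig.__init__, copied from the diff (subset).
INIT_DEFAULTS = {
    "use_cache": False,
    "max_sequence_length": 2048,
    "position_encoding_2d": True,
    "bos_token_id": 150004,
    "eos_token_id": 150005,
}

# Values stored in models/config.json for the same keys.
CONFIG_JSON_VALUES = {
    "use_cache": True,
    "max_sequence_length": 2048,
    "position_encoding_2d": True,
    "bos_token_id": 150004,
    "eos_token_id": 150005,
}


def effective_config(defaults: dict, saved: dict) -> dict:
    """Saved values win over constructor defaults, like passing them as keyword arguments."""
    return {**defaults, **saved}


merged = effective_config(INIT_DEFAULTS, CONFIG_JSON_VALUES)
changed = {k for k in INIT_DEFAULTS if INIT_DEFAULTS[k] != merged[k]}
print(changed)  # {'use_cache'}
```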
lyraChatGLM/ftlib/libth_transformer_sm80_cu12.so → models/glm6b-bs8.ftm RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:146841b4ef362048507a576d20cb1e5bb02e0d67f3fcfce351ce25f00989dfbd
3
- size 200980552
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:54e97fb542110a3a226058eb76b6019bbaf91d3165da6ac95aa3976ee75b0421
3
+ size 14706031108
models/ice_text.model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5e974d9a69c242ce014c88c2b26089270f6198f3c0b700a887666cd3e816f17e
3
- size 2706249
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:99871e0c85db81ad7af1028854fd091cd5778c8414ae9d94bbbc10d02c831c21
3
+ size 2699926
models/tokenization_chatglm.py CHANGED
@@ -1,13 +1,17 @@
1
  """Tokenization classes for ChatGLM."""
 
 
2
  from typing import List, Optional, Union
 
3
  import os
 
 
4
 
5
  from transformers.tokenization_utils import PreTrainedTokenizer
6
- from transformers.utils import logging, PaddingStrategy
7
- from transformers.tokenization_utils_base import EncodedInput, BatchEncoding
8
- from typing import Dict
9
- import sentencepiece as spm
10
- import numpy as np
11
 
12
  logger = logging.get_logger(__name__)
13
 
@@ -16,55 +20,61 @@ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
16
  }
17
 
18
 
19
- class TextTokenizer:
20
- def __init__(self, model_path):
21
- self.sp = spm.SentencePieceProcessor()
22
- self.sp.Load(model_path)
23
- self.num_tokens = self.sp.vocab_size()
24
-
25
- def encode(self, text):
26
- return self.sp.EncodeAsIds(text)
27
-
28
- def decode(self, ids: List[int]):
29
- return self.sp.DecodeIds(ids)
30
-
31
- def tokenize(self, text):
32
- return self.sp.EncodeAsPieces(text)
33
-
34
- def convert_tokens_to_string(self, tokens):
35
- return self.sp.DecodePieces(tokens)
36
-
37
- def convert_tokens_to_ids(self, tokens):
38
- return [self.sp.PieceToId(token) for token in tokens]
39
-
40
- def convert_token_to_id(self, token):
41
- return self.sp.PieceToId(token)
42
-
43
- def convert_id_to_token(self, idx):
44
- return self.sp.IdToPiece(idx)
45
-
46
- def __len__(self):
47
- return self.num_tokens
48
-
49
-
50
  class SPTokenizer:
51
  def __init__(
52
- self,
53
- vocab_file,
54
- num_image_tokens=20000,
55
- max_blank_length=80,
56
- byte_fallback=True,
57
  ):
58
  assert vocab_file is not None
59
  self.vocab_file = vocab_file
60
- self.num_image_tokens = num_image_tokens
61
  self.special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "<unused_0>", "<sop>", "<eop>", "<ENC>", "<dBLOCK>"]
62
  self.max_blank_length = max_blank_length
63
  self.byte_fallback = byte_fallback
64
- self.text_tokenizer = TextTokenizer(vocab_file)
 
65
 
66
- def _get_text_tokenizer(self):
67
- return self.text_tokenizer
 
 
 
68
 
69
  @staticmethod
70
  def get_blank_token(length: int):
@@ -75,6 +85,10 @@ class SPTokenizer:
75
  def get_tab_token():
76
  return f"<|tab|>"
77
 
 
 
 
 
78
  @property
79
  def num_text_tokens(self):
80
  return self.text_tokenizer.num_tokens
@@ -98,7 +112,7 @@ class SPTokenizer:
98
  return text
99
 
100
  def encode(
101
- self, text: str, linebreak=True, whitespaces=True, add_dummy_prefix=True
102
  ) -> List[int]:
103
  """
104
  @param text: Text to encode.
@@ -110,31 +124,22 @@ class SPTokenizer:
110
  text = self._preprocess(text, linebreak, whitespaces)
111
  if not add_dummy_prefix:
112
  text = "<n>" + text
113
- tmp = self._get_text_tokenizer().encode(text)
114
  tokens = [x + self.num_image_tokens for x in tmp]
115
  return tokens if add_dummy_prefix else tokens[2:]
116
 
117
- def postprocess(self, text):
 
 
 
118
  text = text.replace("<n>", "\n")
119
  text = text.replace(SPTokenizer.get_tab_token(), "\t")
120
  for i in range(2, self.max_blank_length + 1):
121
  text = text.replace(self.get_blank_token(i), " " * i)
122
  return text
123
 
124
- def decode(self, text_ids: List[int]) -> str:
125
- ids = [int(_id) - self.num_image_tokens for _id in text_ids]
126
- ids = [_id for _id in ids if _id >= 0]
127
- text = self._get_text_tokenizer().decode(ids)
128
- text = self.postprocess(text)
129
- return text
130
-
131
- def decode_tokens(self, tokens: List[str]) -> str:
132
- text = self._get_text_tokenizer().convert_tokens_to_string(tokens)
133
- text = self.postprocess(text)
134
- return text
135
-
136
  def tokenize(
137
- self, text: str, linebreak=True, whitespaces=True, add_dummy_prefix=True
138
  ) -> List[str]:
139
  """
140
  @param text: Text to encode.
@@ -146,7 +151,7 @@ class SPTokenizer:
146
  text = self._preprocess(text, linebreak, whitespaces)
147
  if not add_dummy_prefix:
148
  text = "<n>" + text
149
- tokens = self._get_text_tokenizer().tokenize(text)
150
  return tokens if add_dummy_prefix else tokens[2:]
151
 
152
  def __getitem__(self, x: Union[int, str]):
@@ -175,36 +180,25 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
175
 
176
  vocab_files_names = {"vocab_file": "ice_text.model"}
177
  max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
178
- model_input_names = ["input_ids", "attention_mask", "position_ids"]
179
 
180
  def __init__(
181
  self,
182
  vocab_file,
183
  do_lower_case=False,
184
  remove_space=False,
185
- bos_token='<sop>',
186
- eos_token='<eop>',
187
- end_token='</s>',
188
  mask_token='[MASK]',
189
  gmask_token='[gMASK]',
190
  padding_side="left",
191
- pad_token="<pad>",
192
- unk_token="<unk>",
193
- num_image_tokens=20000,
194
  **kwargs
195
  ) -> None:
196
  super().__init__(
197
  do_lower_case=do_lower_case,
198
  remove_space=remove_space,
199
  padding_side=padding_side,
200
- bos_token=bos_token,
201
- eos_token=eos_token,
202
- end_token=end_token,
203
- mask_token=mask_token,
204
- gmask_token=gmask_token,
205
- pad_token=pad_token,
206
- unk_token=unk_token,
207
- num_image_tokens=num_image_tokens,
208
  **kwargs
209
  )
210
 
@@ -214,29 +208,23 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
214
 
215
  self.bos_token = bos_token
216
  self.eos_token = eos_token
217
- self.end_token = end_token
218
  self.mask_token = mask_token
219
- self.gmask_token = gmask_token
220
 
221
- self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
222
 
223
  """ Initialisation """
224
 
225
  @property
226
- def gmask_token_id(self) -> Optional[int]:
227
- if self.gmask_token is None:
228
- return None
229
- return self.convert_tokens_to_ids(self.gmask_token)
230
-
231
- @property
232
- def end_token_id(self) -> Optional[int]:
233
  """
234
- `Optional[int]`: Id of the end of context token in the vocabulary. Returns `None` if the token has not been
235
  set.
236
  """
237
- if self.end_token is None:
238
  return None
239
- return self.convert_tokens_to_ids(self.end_token)
240
 
241
  @property
242
  def vocab_size(self):
@@ -268,21 +256,25 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
268
 
269
  return seq
270
 
271
- def convert_tokens_to_string(self, tokens: List[str]) -> str:
272
- return self.sp_tokenizer.decode_tokens(tokens)
273
-
274
- def _decode(
275
  self,
276
- token_ids: Union[int, List[int]],
 
 
 
277
  **kwargs
278
  ) -> str:
279
- if isinstance(token_ids, int):
280
- token_ids = [token_ids]
281
- if len(token_ids) == 0:
282
- return ""
283
- if self.pad_token_id in token_ids: # remove pad
284
- token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
285
- return super()._decode(token_ids, **kwargs)
 
 
 
 
286
 
287
  def _convert_token_to_id(self, token):
288
  """ Converts a token (str) in an id using the vocab. """
@@ -307,7 +299,7 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
307
  """
308
  if os.path.isdir(save_directory):
309
  vocab_file = os.path.join(
310
- save_directory, self.vocab_files_names["vocab_file"]
311
  )
312
  else:
313
  vocab_file = save_directory
@@ -339,105 +331,16 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
339
  Returns:
340
  `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
341
  """
342
- gmask_id = self.sp_tokenizer[self.gmask_token]
343
- eos_id = self.sp_tokenizer[self.eos_token]
344
- token_ids_0 = token_ids_0 + [gmask_id, self.sp_tokenizer[self.bos_token]]
345
  if token_ids_1 is not None:
346
- token_ids_0 = token_ids_0 + token_ids_1 + [eos_id]
347
- return token_ids_0
 
 
 
348
 
349
- def _pad(
350
- self,
351
- encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
352
- max_length: Optional[int] = None,
353
- padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
354
- pad_to_multiple_of: Optional[int] = None,
355
- return_attention_mask: Optional[bool] = None,
356
- ) -> dict:
357
- """
358
- Pad encoded inputs (on left/right and up to predefined length or max length in the batch)
359
 
360
- Args:
361
- encoded_inputs:
362
- Dictionary of tokenized inputs (`List[int]`) or batch of tokenized inputs (`List[List[int]]`).
363
- max_length: maximum length of the returned list and optionally padding length (see below).
364
- Will truncate by taking into account the special tokens.
365
- padding_strategy: PaddingStrategy to use for padding.
366
-
367
- - PaddingStrategy.LONGEST Pad to the longest sequence in the batch
368
- - PaddingStrategy.MAX_LENGTH: Pad to the max length (default)
369
- - PaddingStrategy.DO_NOT_PAD: Do not pad
370
- The tokenizer padding sides are defined in self.padding_side:
371
-
372
- - 'left': pads on the left of the sequences
373
- - 'right': pads on the right of the sequences
374
- pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
375
- This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
376
- `>= 7.5` (Volta).
377
- return_attention_mask:
378
- (optional) Set to False to avoid returning attention mask (default: set to model specifics)
379
- """
380
- # Load from model defaults
381
- bos_token_id = self.sp_tokenizer[self.bos_token]
382
- mask_token_id = self.sp_tokenizer[self.mask_token]
383
- gmask_token_id = self.sp_tokenizer[self.gmask_token]
384
- assert self.padding_side == "left"
385
-
386
- required_input = encoded_inputs[self.model_input_names[0]]
387
- seq_length = len(required_input)
388
-
389
- if padding_strategy == PaddingStrategy.LONGEST:
390
- max_length = len(required_input)
391
-
392
- if max_length is not None and pad_to_multiple_of is not None and (max_length % pad_to_multiple_of != 0):
393
- max_length = ((max_length // pad_to_multiple_of) + 1) * pad_to_multiple_of
394
-
395
- needs_to_be_padded = padding_strategy != PaddingStrategy.DO_NOT_PAD and len(required_input) != max_length
396
-
397
- # Initialize attention mask if not present.
398
- if max_length is not None:
399
- if "attention_mask" not in encoded_inputs:
400
- if bos_token_id in required_input:
401
- context_length = required_input.index(bos_token_id)
402
- else:
403
- context_length = seq_length
404
- attention_mask = np.ones((1, seq_length, seq_length))
405
- attention_mask = np.tril(attention_mask)
406
- attention_mask[:, :, :context_length] = 1
407
- attention_mask = np.bool_(attention_mask < 0.5)
408
- encoded_inputs["attention_mask"] = attention_mask
409
-
410
- if "position_ids" not in encoded_inputs:
411
- if bos_token_id in required_input:
412
- context_length = required_input.index(bos_token_id)
413
- else:
414
- context_length = seq_length
415
- position_ids = np.arange(seq_length, dtype=np.int64)
416
- mask_token = mask_token_id if mask_token_id in required_input else gmask_token_id
417
- if mask_token in required_input:
418
- mask_position = required_input.index(mask_token)
419
- position_ids[context_length:] = mask_position
420
- block_position_ids = np.concatenate(
421
- [np.zeros(context_length, dtype=np.int64),
422
- np.arange(1, seq_length - context_length + 1, dtype=np.int64)])
423
- encoded_inputs["position_ids"] = np.stack([position_ids, block_position_ids], axis=0)
424
-
425
- if needs_to_be_padded:
426
- difference = max_length - len(required_input)
427
-
428
- if "attention_mask" in encoded_inputs:
429
- encoded_inputs["attention_mask"] = np.pad(encoded_inputs["attention_mask"],
430
- pad_width=[(0, 0), (difference, 0), (difference, 0)],
431
- mode='constant', constant_values=True)
432
- if "token_type_ids" in encoded_inputs:
433
- encoded_inputs["token_type_ids"] = [self.pad_token_type_id] * difference + encoded_inputs[
434
- "token_type_ids"
435
- ]
436
- if "special_tokens_mask" in encoded_inputs:
437
- encoded_inputs["special_tokens_mask"] = [1] * difference + encoded_inputs["special_tokens_mask"]
438
- if "position_ids" in encoded_inputs:
439
- encoded_inputs["position_ids"] = np.pad(encoded_inputs["position_ids"],
440
- pad_width=[(0, 0), (difference, 0)])
441
- encoded_inputs[self.model_input_names[0]] = [self.pad_token_id] * difference + required_input
442
-
443
- return encoded_inputs
 
1
  """Tokenization classes for ChatGLM."""
2
+ import sys
3
+ import unicodedata
4
  from typing import List, Optional, Union
5
+ from functools import lru_cache
6
  import os
7
+ import collections
8
+ import re
9
 
10
  from transformers.tokenization_utils import PreTrainedTokenizer
11
+ from icetk.text_tokenizer import TextTokenizer
12
+ from icetk.utils import auto_create
13
+ import icetk.sentencepiece_model_pb2 as sp_model
14
+ from transformers.utils import logging
 
15
 
16
  logger = logging.get_logger(__name__)
17
 
 
20
  }
21
 
22
 
23
  class SPTokenizer:
24
  def __init__(
25
+ self,
26
+ vocab_file,
27
+ max_blank_length=80,
28
+ byte_fallback=True,
 
29
  ):
30
  assert vocab_file is not None
31
  self.vocab_file = vocab_file
 
32
  self.special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "<unused_0>", "<sop>", "<eop>", "<ENC>", "<dBLOCK>"]
33
  self.max_blank_length = max_blank_length
34
  self.byte_fallback = byte_fallback
35
+ self.text_tokenizer = self._build_text_tokenizer(encode_special_tokens=False)
36
+ self.special_text_tokenizer = self._build_text_tokenizer(encode_special_tokens=True)
37
+
38
+ @staticmethod
39
+ def _configure_tokenizer(
40
+ text_tokenizer: TextTokenizer,
41
+ special_tokens: List[str],
42
+ max_blank_length: int,
43
+ byte_fallback: bool,
44
+ encode_special_tokens=False,
45
+ ):
46
+ # special token
47
+ special_token_type = 4 if encode_special_tokens else 3 # 3 - CONTROL, 4 - USER_DEFINE
48
+ for token in special_tokens:
49
+ text_tokenizer.proto.pieces.append(
50
+ sp_model.ModelProto.SentencePiece(piece=token, score=0.0, type=special_token_type)
51
+ )
52
+ # whitespaces
53
+ for token in [SPTokenizer.get_tab_token()] + [
54
+ SPTokenizer.get_blank_token(i) for i in range(2, max_blank_length + 1)
55
+ ]:
56
+ text_tokenizer.proto.pieces.append(sp_model.ModelProto.SentencePiece(piece=token, score=0.0, type=4))
57
+ # byte fallback
58
+ if byte_fallback:
59
+ text_tokenizer.proto.trainer_spec.byte_fallback = True
60
+ for i in range(256):
61
+ text_tokenizer.proto.pieces.append(
62
+ sp_model.ModelProto.SentencePiece(piece="<0x{:02X}>".format(i), score=0.0, type=6)
63
+ )
64
+ text_tokenizer.refresh()
65
+
66
+ def _build_text_tokenizer(self, encode_special_tokens=False):
67
+ tokenizer = TextTokenizer(self.vocab_file)
68
+ self._configure_tokenizer(
69
+ tokenizer, self.special_tokens, self.max_blank_length, self.byte_fallback, encode_special_tokens
70
+ )
71
+ return tokenizer
72
 
73
+ def _get_text_tokenizer(self, encode_special_tokens=False):
74
+ if encode_special_tokens:
75
+ return self.special_text_tokenizer
76
+ else:
77
+ return self.text_tokenizer
78
 
79
  @staticmethod
80
  def get_blank_token(length: int):
 
85
  def get_tab_token():
86
  return f"<|tab|>"
87
 
88
+ @property
89
+ def num_image_tokens(self):
90
+ return 20000
91
+
92
  @property
93
  def num_text_tokens(self):
94
  return self.text_tokenizer.num_tokens
 
112
  return text
113
 
114
  def encode(
115
+ self, text: str, linebreak=True, whitespaces=True, special_tokens=False, add_dummy_prefix=True
116
  ) -> List[int]:
117
  """
118
  @param text: Text to encode.
 
124
  text = self._preprocess(text, linebreak, whitespaces)
125
  if not add_dummy_prefix:
126
  text = "<n>" + text
127
+ tmp = self._get_text_tokenizer(encode_special_tokens=special_tokens).encode(text)
128
  tokens = [x + self.num_image_tokens for x in tmp]
129
  return tokens if add_dummy_prefix else tokens[2:]
130
 
131
+ def decode(self, text_ids: List[int], special_tokens=False) -> str:
132
+ ids = [int(_id) - self.num_image_tokens for _id in text_ids]
133
+ ids = [_id for _id in ids if _id >= 0]
134
+ text = self._get_text_tokenizer(encode_special_tokens=special_tokens).decode(ids)
135
  text = text.replace("<n>", "\n")
136
  text = text.replace(SPTokenizer.get_tab_token(), "\t")
137
  for i in range(2, self.max_blank_length + 1):
138
  text = text.replace(self.get_blank_token(i), " " * i)
139
  return text
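`encode` shifts every SentencePiece id by `num_image_tokens` (20000). `decode` undoes the shift, drops anything that lands in the image range, and then turns the `<n>`, `<|tab|>` and blank-run placeholders back into whitespace. A standalone sketch of those three steps follows. The function names are hypothetical, and the `<|blank_N|>` spelling is assumed as above.

```python
from typing import List

NUM_IMAGE_TOKENS = 20000  # SPTokenizer.num_image_tokens in the diff
MAX_BLANK_LENGTH = 80     # SPTokenizer default max_blank_length


def to_model_ids(sp_ids: List[int]) -> List[int]:
    """As in encode(): shift SentencePiece ids past the reserved image-token range."""
    return [x + NUM_IMAGE_TOKENS for x in sp_ids]


def to_sp_ids(model_ids: List[int]) -> List[int]:
    """As in decode(): undo the shift and drop ids that fall inside the image range."""
    ids = [int(i) - NUM_IMAGE_TOKENS for i in model_ids]
    return [i for i in ids if i >= 0]


def restore_whitespace(text: str) -> str:
    """As in decode(): turn the newline, tab and blank placeholders back into whitespace."""
    text = text.replace("<n>", "\n")
    text = text.replace("<|tab|>", "\t")
    for i in range(2, MAX_BLANK_LENGTH + 1):
        # assumed spelling of SPTokenizer.get_blank_token(i), whose body is not shown here
        text = text.replace(f"<|blank_{i}|>", " " * i)
    return text


print(to_sp_ids(to_model_ids([0, 5, 42])))  # [0, 5, 42]
```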
140
 
 
 
 
 
 
 
 
 
 
 
 
 
141
  def tokenize(
142
+ self, text: str, linebreak=True, whitespaces=True, special_tokens=False, add_dummy_prefix=True
143
  ) -> List[str]:
144
  """
145
  @param text: Text to encode.
 
151
  text = self._preprocess(text, linebreak, whitespaces)
152
  if not add_dummy_prefix:
153
  text = "<n>" + text
154
+ tokens = self._get_text_tokenizer(encode_special_tokens=special_tokens).tokenize(text)
155
  return tokens if add_dummy_prefix else tokens[2:]
156
 
157
  def __getitem__(self, x: Union[int, str]):
 
180
 
181
  vocab_files_names = {"vocab_file": "ice_text.model"}
182
  max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
183
+ model_input_names = ["input_ids"]
184
 
185
  def __init__(
186
  self,
187
  vocab_file,
188
  do_lower_case=False,
189
  remove_space=False,
190
+ bos_token='sop',
191
+ eos_token='eos',
192
+ eop_token='eop',
193
  mask_token='[MASK]',
194
  gmask_token='[gMASK]',
195
  padding_side="left",
 
 
 
196
  **kwargs
197
  ) -> None:
198
  super().__init__(
199
  do_lower_case=do_lower_case,
200
  remove_space=remove_space,
201
  padding_side=padding_side,
 
 
 
 
 
 
 
 
202
  **kwargs
203
  )
204
 
 
208
 
209
  self.bos_token = bos_token
210
  self.eos_token = eos_token
211
+ self.eop_token = eop_token
212
  self.mask_token = mask_token
213
+ self.gMASK_token = gmask_token
214
 
215
+ self.sp_tokenizer = SPTokenizer(vocab_file)
216
 
217
  """ Initialisation """
218
 
219
  @property
220
+ def eop_token_id(self) -> Optional[int]:
 
 
 
 
 
 
221
  """
222
+ `Optional[int]`: Id of the end of sentence token in the vocabulary. Returns `None` if the token has not been
223
  set.
224
  """
225
+ if self.eop_token is None:
226
  return None
227
+ return self.convert_tokens_to_ids(self.eop_token)
228
 
229
  @property
230
  def vocab_size(self):
 
256
 
257
  return seq
258
 
259
+ def decode(
 
 
 
260
  self,
261
+ token_ids: Union[List[int], List[List[int]]],
262
+ skip_special_tokens: bool = False,
263
+ clean_up_tokenization_spaces: bool = True,
264
+ spaces_between_special_tokens: bool = True,
265
  **kwargs
266
  ) -> str:
267
+ if isinstance(token_ids[0], list):
268
+ tokens = []
269
+ for single_token_ids in token_ids:
270
+ if self.pad_token_id in single_token_ids: # remove pad
271
+ single_token_ids = list(filter((self.pad_token_id).__ne__, single_token_ids))
272
+ tokens.append(self.sp_tokenizer.decode(single_token_ids))
273
+ return tokens
274
+ else:
275
+ if self.pad_token_id in token_ids: # remove pad
276
+ token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
277
+ return self.sp_tokenizer.decode(token_ids)
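The new `decode` chooses the batch or single-sequence path by inspecting `token_ids[0]`. An empty list therefore raises `IndexError`, and a bare `int` fails as well. The removed `_decode` handled both cases. Below is a sketch of the same pad-stripping logic with those two guards restored. `decode_ids` and the `sp_decode` callback are hypothetical stand-ins for `ChatGLMTokenizer.decode` and `SPTokenizer.decode`.

```python
from typing import Callable, List, Union


def decode_ids(
    token_ids: Union[int, List[int], List[List[int]]],
    pad_token_id: int,
    sp_decode: Callable[[List[int]], str],
) -> Union[str, List[str]]:
    """Strip padding, then decode one sequence or a batch of sequences."""
    if isinstance(token_ids, int):
        token_ids = [token_ids]
    if len(token_ids) == 0:
        return ""  # indexing token_ids[0] here would raise IndexError
    if isinstance(token_ids[0], list):
        # batch: decode each row independently
        return [decode_ids(ids, pad_token_id, sp_decode) for ids in token_ids]
    return sp_decode([t for t in token_ids if t != pad_token_id])


stub = lambda ids: " ".join(map(str, ids))  # stand-in for SPTokenizer.decode
print(decode_ids([[3, 10, 11], [12, 3]], pad_token_id=3, sp_decode=stub))  # ['10 11', '12']
```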
278
 
279
  def _convert_token_to_id(self, token):
280
  """ Converts a token (str) in an id using the vocab. """
 
299
  """
300
  if os.path.isdir(save_directory):
301
  vocab_file = os.path.join(
302
+ save_directory, VOCAB_FILES_NAMES["vocab_file"]
303
  )
304
  else:
305
  vocab_file = save_directory
 
331
  Returns:
332
  `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
333
  """
 
 
 
334
  if token_ids_1 is not None:
335
+ token_ids_0 += token_ids_1
336
+ mask_ids = self.sp_tokenizer[self.mask_token]
337
+ gmask_ids = self.sp_tokenizer[self.gMASK_token]
338
+ if mask_ids not in token_ids_0 and gmask_ids not in token_ids_0:
339
+ token_ids_0 += [gmask_ids]
340
 
341
+ if token_ids_0[-1] != mask_ids and token_ids_0[-1] != gmask_ids:
342
+ token_ids_0 += [self.sp_tokenizer[self.eos_token]]
 
 
 
 
 
 
 
 
343
 
344
+ token_ids_0 += [self.sp_tokenizer[self.bos_token]]
345
+
346
+ return token_ids_0
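For a plain prompt, `build_inputs_with_special_tokens` produces `prompt + [gMASK] + <sop>`. When a `[MASK]` sits inside the text, it appends `eos` before `<sop>`. The version in the diff extends `token_ids_0` with `+=`, which also modifies the caller's list in place. The sketch below keeps the same decision order but copies the input first. The mask and `<sop>` ids come from the modeling code in this diff. `EOS_ID` is a placeholder, and `build_inputs` is a hypothetical name.

```python
from typing import List, Optional

# [MASK]/[gMASK]/<sop> ids as used in the modeling code of this diff.
MASK_ID, GMASK_ID, BOS_ID = 150000, 150001, 150004
EOS_ID = 42  # placeholder only; the real "</s>" id depends on ice_text.model


def build_inputs(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Same decision order as build_inputs_with_special_tokens, without mutating the input."""
    ids = list(token_ids_0)
    if token_ids_1 is not None:
        ids += token_ids_1
    if MASK_ID not in ids and GMASK_ID not in ids:
        ids.append(GMASK_ID)        # default: generate after a trailing [gMASK]
    if ids[-1] not in (MASK_ID, GMASK_ID):
        ids.append(EOS_ID)          # mask inside the text: close the context with eos
    ids.append(BOS_ID)              # generation starts after <sop>
    return ids


print(build_inputs([10, 11]))  # [10, 11, 150001, 150004]
```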
models/tokenizer_config.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "name_or_path": "THUDM/chatglm-6b",
3
  "bos_token": "<sop>",
4
- "eos_token": "<eop>",
5
- "end_token": "</s>",
6
  "gmask_token": "[gMASK]",
7
  "mask_token": "[MASK]",
8
  "pad_token": "<pad>",
@@ -10,7 +10,6 @@
10
  "remove_space": false,
11
  "do_lower_case": false,
12
  "tokenizer_class": "ChatGLMTokenizer",
13
- "num_image_tokens": 0,
14
  "auto_map": {
15
  "AutoTokenizer": [
16
  "tokenization_chatglm.ChatGLMTokenizer",
 
1
  {
2
  "name_or_path": "THUDM/chatglm-6b",
3
  "bos_token": "<sop>",
4
+ "eop_token": "<eop>",
5
+ "eos_token": "</s>",
6
  "gmask_token": "[gMASK]",
7
  "mask_token": "[MASK]",
8
  "pad_token": "<pad>",
 
10
  "remove_space": false,
11
  "do_lower_case": false,
12
  "tokenizer_class": "ChatGLMTokenizer",
 
13
  "auto_map": {
14
  "AutoTokenizer": [
15
  "tokenization_chatglm.ChatGLMTokenizer",
requirements.txt CHANGED
@@ -1,9 +1,4 @@
1
  icetk
2
- cpm_kernels
3
- transformers
4
- huggingface_hub
5
- numpy
6
- setuptools
7
  torch
8
- h5py
9
- protobuf==3.20.3
 
1
  icetk
 
 
 
 
 
2
  torch
3
+ transformers
4
+