torotoki committed on
Commit
36dbf43
1 Parent(s): 496294f

Upload folder using huggingface_hub

LICENSE-cc-by-nc-4.txt ADDED
@@ -0,0 +1,622 @@
1
+ Copyright 2023- Preferred Networks, Inc. All rights reserved.
2
+
3
+
4
+ Attribution-NonCommercial 4.0 International
5
+
6
+ =======================================================================
7
+
8
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
9
+ does not provide legal services or legal advice. Distribution of
10
+ Creative Commons public licenses does not create a lawyer-client or
11
+ other relationship. Creative Commons makes its licenses and related
12
+ information available on an "as-is" basis. Creative Commons gives no
13
+ warranties regarding its licenses, any material licensed under their
14
+ terms and conditions, or any related information. Creative Commons
15
+ disclaims all liability for damages resulting from their use to the
16
+ fullest extent possible.
17
+
18
+ Using Creative Commons Public Licenses
19
+
20
+ Creative Commons public licenses provide a standard set of terms and
21
+ conditions that creators and other rights holders may use to share
22
+ original works of authorship and other material subject to copyright
23
+ and certain other rights specified in the public license below. The
24
+ following considerations are for informational purposes only, are not
25
+ exhaustive, and do not form part of our licenses.
26
+
27
+ Considerations for licensors: Our public licenses are
28
+ intended for use by those authorized to give the public
29
+ permission to use material in ways otherwise restricted by
30
+ copyright and certain other rights. Our licenses are
31
+ irrevocable. Licensors should read and understand the terms
32
+ and conditions of the license they choose before applying it.
33
+ Licensors should also secure all rights necessary before
34
+ applying our licenses so that the public can reuse the
35
+ material as expected. Licensors should clearly mark any
36
+ material not subject to the license. This includes other CC-
37
+ licensed material, or material used under an exception or
38
+ limitation to copyright. More considerations for licensors:
39
+ wiki.creativecommons.org/Considerations_for_licensors
40
+
41
+ Considerations for the public: By using one of our public
42
+ licenses, a licensor grants the public permission to use the
43
+ licensed material under specified terms and conditions. If
44
+ the licensor's permission is not necessary for any reason--for
45
+ example, because of any applicable exception or limitation to
46
+ copyright--then that use is not regulated by the license. Our
47
+ licenses grant only permissions under copyright and certain
48
+ other rights that a licensor has authority to grant. Use of
49
+ the licensed material may still be restricted for other
50
+ reasons, including because others have copyright or other
51
+ rights in the material. A licensor may make special requests,
52
+ such as asking that all changes be marked or described.
53
+ Although not required by our licenses, you are encouraged to
54
+ respect those requests where reasonable. More considerations
55
+ for the public:
56
+ wiki.creativecommons.org/Considerations_for_licensees
57
+
58
+ =======================================================================
59
+
60
+ Creative Commons Attribution-NonCommercial 4.0 International Public
61
+ License
62
+
63
+ By exercising the Licensed Rights (defined below), You accept and agree
64
+ to be bound by the terms and conditions of this Creative Commons
65
+ Attribution-NonCommercial 4.0 International Public License ("Public
66
+ License"). To the extent this Public License may be interpreted as a
67
+ contract, You are granted the Licensed Rights in consideration of Your
68
+ acceptance of these terms and conditions, and the Licensor grants You
69
+ such rights in consideration of benefits the Licensor receives from
70
+ making the Licensed Material available under these terms and
71
+ conditions.
72
+
73
+
74
+ Section 1 -- Definitions.
75
+
76
+ a. Adapted Material means material subject to Copyright and Similar
77
+ Rights that is derived from or based upon the Licensed Material
78
+ and in which the Licensed Material is translated, altered,
79
+ arranged, transformed, or otherwise modified in a manner requiring
80
+ permission under the Copyright and Similar Rights held by the
81
+ Licensor. For purposes of this Public License, where the Licensed
82
+ Material is a musical work, performance, or sound recording,
83
+ Adapted Material is always produced where the Licensed Material is
84
+ synched in timed relation with a moving image.
85
+
86
+ b. Adapter's License means the license You apply to Your Copyright
87
+ and Similar Rights in Your contributions to Adapted Material in
88
+ accordance with the terms and conditions of this Public License.
89
+
90
+ c. Copyright and Similar Rights means copyright and/or similar rights
91
+ closely related to copyright including, without limitation,
92
+ performance, broadcast, sound recording, and Sui Generis Database
93
+ Rights, without regard to how the rights are labeled or
94
+ categorized. For purposes of this Public License, the rights
95
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
96
+ Rights.
97
+ d. Effective Technological Measures means those measures that, in the
98
+ absence of proper authority, may not be circumvented under laws
99
+ fulfilling obligations under Article 11 of the WIPO Copyright
100
+ Treaty adopted on December 20, 1996, and/or similar international
101
+ agreements.
102
+
103
+ e. Exceptions and Limitations means fair use, fair dealing, and/or
104
+ any other exception or limitation to Copyright and Similar Rights
105
+ that applies to Your use of the Licensed Material.
106
+
107
+ f. Licensed Material means the artistic or literary work, database,
108
+ or other material to which the Licensor applied this Public
109
+ License.
110
+
111
+ g. Licensed Rights means the rights granted to You subject to the
112
+ terms and conditions of this Public License, which are limited to
113
+ all Copyright and Similar Rights that apply to Your use of the
114
+ Licensed Material and that the Licensor has authority to license.
115
+
116
+ h. Licensor means the individual(s) or entity(ies) granting rights
117
+ under this Public License.
118
+
119
+ i. NonCommercial means not primarily intended for or directed towards
120
+ commercial advantage or monetary compensation. For purposes of
121
+ this Public License, the exchange of the Licensed Material for
122
+ other material subject to Copyright and Similar Rights by digital
123
+ file-sharing or similar means is NonCommercial provided there is
124
+ no payment of monetary compensation in connection with the
125
+ exchange.
126
+
127
+ j. Share means to provide material to the public by any means or
128
+ process that requires permission under the Licensed Rights, such
129
+ as reproduction, public display, public performance, distribution,
130
+ dissemination, communication, or importation, and to make material
131
+ available to the public including in ways that members of the
132
+ public may access the material from a place and at a time
133
+ individually chosen by them.
134
+
135
+ k. Sui Generis Database Rights means rights other than copyright
136
+ resulting from Directive 96/9/EC of the European Parliament and of
137
+ the Council of 11 March 1996 on the legal protection of databases,
138
+ as amended and/or succeeded, as well as other essentially
139
+ equivalent rights anywhere in the world.
140
+
141
+ l. You means the individual or entity exercising the Licensed Rights
142
+ under this Public License. Your has a corresponding meaning.
143
+
144
+
145
+ Section 2 -- Scope.
146
+
147
+ a. License grant.
148
+
149
+ 1. Subject to the terms and conditions of this Public License,
150
+ the Licensor hereby grants You a worldwide, royalty-free,
151
+ non-sublicensable, non-exclusive, irrevocable license to
152
+ exercise the Licensed Rights in the Licensed Material to:
153
+
154
+ a. reproduce and Share the Licensed Material, in whole or
155
+ in part, for NonCommercial purposes only; and
156
+
157
+ b. produce, reproduce, and Share Adapted Material for
158
+ NonCommercial purposes only.
159
+
160
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
161
+ Exceptions and Limitations apply to Your use, this Public
162
+ License does not apply, and You do not need to comply with
163
+ its terms and conditions.
164
+
165
+ 3. Term. The term of this Public License is specified in Section
166
+ 6(a).
167
+
168
+ 4. Media and formats; technical modifications allowed. The
169
+ Licensor authorizes You to exercise the Licensed Rights in
170
+ all media and formats whether now known or hereafter created,
171
+ and to make technical modifications necessary to do so. The
172
+ Licensor waives and/or agrees not to assert any right or
173
+ authority to forbid You from making technical modifications
174
+ necessary to exercise the Licensed Rights, including
175
+ technical modifications necessary to circumvent Effective
176
+ Technological Measures. For purposes of this Public License,
177
+ simply making modifications authorized by this Section 2(a)
178
+ (4) never produces Adapted Material.
179
+
180
+ 5. Downstream recipients.
181
+
182
+ a. Offer from the Licensor -- Licensed Material. Every
183
+ recipient of the Licensed Material automatically
184
+ receives an offer from the Licensor to exercise the
185
+ Licensed Rights under the terms and conditions of this
186
+ Public License.
187
+
188
+ b. No downstream restrictions. You may not offer or impose
189
+ any additional or different terms or conditions on, or
190
+ apply any Effective Technological Measures to, the
191
+ Licensed Material if doing so restricts exercise of the
192
+ Licensed Rights by any recipient of the Licensed
193
+ Material.
194
+
195
+ 6. No endorsement. Nothing in this Public License constitutes or
196
+ may be construed as permission to assert or imply that You
197
+ are, or that Your use of the Licensed Material is, connected
198
+ with, or sponsored, endorsed, or granted official status by,
199
+ the Licensor or others designated to receive attribution as
200
+ provided in Section 3(a)(1)(A)(i).
201
+
202
+ b. Other rights.
203
+
204
+ 1. Moral rights, such as the right of integrity, are not
205
+ licensed under this Public License, nor are publicity,
206
+ privacy, and/or other similar personality rights; however, to
207
+ the extent possible, the Licensor waives and/or agrees not to
208
+ assert any such rights held by the Licensor to the limited
209
+ extent necessary to allow You to exercise the Licensed
210
+ Rights, but not otherwise.
211
+
212
+ 2. Patent and trademark rights are not licensed under this
213
+ Public License.
214
+
215
+ 3. To the extent possible, the Licensor waives any right to
216
+ collect royalties from You for the exercise of the Licensed
217
+ Rights, whether directly or through a collecting society
218
+ under any voluntary or waivable statutory or compulsory
219
+ licensing scheme. In all other cases the Licensor expressly
220
+ reserves any right to collect such royalties, including when
221
+ the Licensed Material is used other than for NonCommercial
222
+ purposes.
223
+
224
+
225
+ Section 3 -- License Conditions.
226
+
227
+ Your exercise of the Licensed Rights is expressly made subject to the
228
+ following conditions.
229
+
230
+ a. Attribution.
231
+
232
+ 1. If You Share the Licensed Material (including in modified
233
+ form), You must:
234
+
235
+ a. retain the following if it is supplied by the Licensor
236
+ with the Licensed Material:
237
+
238
+ i. identification of the creator(s) of the Licensed
239
+ Material and any others designated to receive
240
+ attribution, in any reasonable manner requested by
241
+ the Licensor (including by pseudonym if
242
+ designated);
243
+
244
+ ii. a copyright notice;
245
+
246
+ iii. a notice that refers to this Public License;
247
+
248
+ iv. a notice that refers to the disclaimer of
249
+ warranties;
250
+
251
+ v. a URI or hyperlink to the Licensed Material to the
252
+ extent reasonably practicable;
253
+
254
+ b. indicate if You modified the Licensed Material and
255
+ retain an indication of any previous modifications; and
256
+
257
+ c. indicate the Licensed Material is licensed under this
258
+ Public License, and include the text of, or the URI or
259
+ hyperlink to, this Public License.
260
+
261
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
262
+ reasonable manner based on the medium, means, and context in
263
+ which You Share the Licensed Material. For example, it may be
264
+ reasonable to satisfy the conditions by providing a URI or
265
+ hyperlink to a resource that includes the required
266
+ information.
267
+
268
+ 3. If requested by the Licensor, You must remove any of the
269
+ information required by Section 3(a)(1)(A) to the extent
270
+ reasonably practicable.
271
+
272
+ 4. If You Share Adapted Material You produce, the Adapter's
273
+ License You apply must not prevent recipients of the Adapted
274
+ Material from complying with this Public License.
275
+
276
+
277
+ Section 4 -- Sui Generis Database Rights.
278
+
279
+ Where the Licensed Rights include Sui Generis Database Rights that
280
+ apply to Your use of the Licensed Material:
281
+
282
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
283
+ to extract, reuse, reproduce, and Share all or a substantial
284
+ portion of the contents of the database for NonCommercial purposes
285
+ only;
286
+
287
+ b. if You include all or a substantial portion of the database
288
+ contents in a database in which You have Sui Generis Database
289
+ Rights, then the database in which You have Sui Generis Database
290
+ Rights (but not its individual contents) is Adapted Material; and
291
+
292
+ c. You must comply with the conditions in Section 3(a) if You Share
293
+ all or a substantial portion of the contents of the database.
294
+
295
+ For the avoidance of doubt, this Section 4 supplements and does not
296
+ replace Your obligations under this Public License where the Licensed
297
+ Rights include other Copyright and Similar Rights.
298
+
299
+
300
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
301
+
302
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
303
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
304
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
305
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
306
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
307
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
308
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
309
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
310
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
311
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
312
+
313
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
314
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
315
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
316
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
317
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
318
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
319
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
320
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
321
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
322
+
323
+ c. The disclaimer of warranties and limitation of liability provided
324
+ above shall be interpreted in a manner that, to the extent
325
+ possible, most closely approximates an absolute disclaimer and
326
+ waiver of all liability.
327
+
328
+
329
+ Section 6 -- Term and Termination.
330
+
331
+ a. This Public License applies for the term of the Copyright and
332
+ Similar Rights licensed here. However, if You fail to comply with
333
+ this Public License, then Your rights under this Public License
334
+ terminate automatically.
335
+
336
+ b. Where Your right to use the Licensed Material has terminated under
337
+ Section 6(a), it reinstates:
338
+
339
+ 1. automatically as of the date the violation is cured, provided
340
+ it is cured within 30 days of Your discovery of the
341
+ violation; or
342
+
343
+ 2. upon express reinstatement by the Licensor.
344
+
345
+ For the avoidance of doubt, this Section 6(b) does not affect any
346
+ right the Licensor may have to seek remedies for Your violations
347
+ of this Public License.
348
+
349
+ c. For the avoidance of doubt, the Licensor may also offer the
350
+ Licensed Material under separate terms or conditions or stop
351
+ distributing the Licensed Material at any time; however, doing so
352
+ will not terminate this Public License.
353
+
354
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
355
+ License.
356
+
357
+
358
+ Section 7 -- Other Terms and Conditions.
359
+
360
+ a. The Licensor shall not be bound by any additional or different
361
+ terms or conditions communicated by You unless expressly agreed.
362
+
363
+ b. Any arrangements, understandings, or agreements regarding the
364
+ Licensed Material not stated herein are separate from and
365
+ independent of the terms and conditions of this Public License.
366
+
367
+
368
+ Section 8 -- Interpretation.
369
+
370
+ a. For the avoidance of doubt, this Public License does not, and
371
+ shall not be interpreted to, reduce, limit, restrict, or impose
372
+ conditions on any use of the Licensed Material that could lawfully
373
+ be made without permission under this Public License.
374
+
375
+ b. To the extent possible, if any provision of this Public License is
376
+ deemed unenforceable, it shall be automatically reformed to the
377
+ minimum extent necessary to make it enforceable. If the provision
378
+ cannot be reformed, it shall be severed from this Public License
379
+ without affecting the enforceability of the remaining terms and
380
+ conditions.
381
+
382
+ c. No term or condition of this Public License will be waived and no
383
+ failure to comply consented to unless expressly agreed to by the
384
+ Licensor.
385
+
386
+ d. Nothing in this Public License constitutes or may be interpreted
387
+ as a limitation upon, or waiver of, any privileges and immunities
388
+ that apply to the Licensor or You, including from the legal
389
+ processes of any jurisdiction or authority.
390
+
391
+ =======================================================================
392
+
393
+ Creative Commons is not a party to its public
394
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
395
+ its public licenses to material it publishes and in those instances
396
+ will be considered the “Licensor.” The text of the Creative Commons
397
+ public licenses is dedicated to the public domain under the CC0 Public
398
+ Domain Dedication. Except for the limited purpose of indicating that
399
+ material is shared under a Creative Commons public license or as
400
+ otherwise permitted by the Creative Commons policies published at
401
+ creativecommons.org/policies, Creative Commons does not authorize the
402
+ use of the trademark "Creative Commons" or any other trademark or logo
403
+ of Creative Commons without its prior written consent including,
404
+ without limitation, in connection with any unauthorized modifications
405
+ to any of its public licenses or any other arrangements,
406
+ understandings, or agreements concerning use of licensed material. For
407
+ the avoidance of doubt, this paragraph does not form part of the
408
+ public licenses.
409
+
410
+ Creative Commons may be contacted at creativecommons.org.
411
+
412
+
413
+
414
+ ---
415
+
416
+ This software contains modified code from the Hugging Face transformers library, which is released under the Apache v2.0 license.
417
+
418
+ ---
419
+ Copyright 2018- The Hugging Face team. All rights reserved.
420
+
421
+ Apache License
422
+ Version 2.0, January 2004
423
+ http://www.apache.org/licenses/
424
+
425
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
426
+
427
+ 1. Definitions.
428
+
429
+ "License" shall mean the terms and conditions for use, reproduction,
430
+ and distribution as defined by Sections 1 through 9 of this document.
431
+
432
+ "Licensor" shall mean the copyright owner or entity authorized by
433
+ the copyright owner that is granting the License.
434
+
435
+ "Legal Entity" shall mean the union of the acting entity and all
436
+ other entities that control, are controlled by, or are under common
437
+ control with that entity. For the purposes of this definition,
438
+ "control" means (i) the power, direct or indirect, to cause the
439
+ direction or management of such entity, whether by contract or
440
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
441
+ outstanding shares, or (iii) beneficial ownership of such entity.
442
+
443
+ "You" (or "Your") shall mean an individual or Legal Entity
444
+ exercising permissions granted by this License.
445
+
446
+ "Source" form shall mean the preferred form for making modifications,
447
+ including but not limited to software source code, documentation
448
+ source, and configuration files.
449
+
450
+ "Object" form shall mean any form resulting from mechanical
451
+ transformation or translation of a Source form, including but
452
+ not limited to compiled object code, generated documentation,
453
+ and conversions to other media types.
454
+
455
+ "Work" shall mean the work of authorship, whether in Source or
456
+ Object form, made available under the License, as indicated by a
457
+ copyright notice that is included in or attached to the work
458
+ (an example is provided in the Appendix below).
459
+
460
+ "Derivative Works" shall mean any work, whether in Source or Object
461
+ form, that is based on (or derived from) the Work and for which the
462
+ editorial revisions, annotations, elaborations, or other modifications
463
+ represent, as a whole, an original work of authorship. For the purposes
464
+ of this License, Derivative Works shall not include works that remain
465
+ separable from, or merely link (or bind by name) to the interfaces of,
466
+ the Work and Derivative Works thereof.
467
+
468
+ "Contribution" shall mean any work of authorship, including
469
+ the original version of the Work and any modifications or additions
470
+ to that Work or Derivative Works thereof, that is intentionally
471
+ submitted to Licensor for inclusion in the Work by the copyright owner
472
+ or by an individual or Legal Entity authorized to submit on behalf of
473
+ the copyright owner. For the purposes of this definition, "submitted"
474
+ means any form of electronic, verbal, or written communication sent
475
+ to the Licensor or its representatives, including but not limited to
476
+ communication on electronic mailing lists, source code control systems,
477
+ and issue tracking systems that are managed by, or on behalf of, the
478
+ Licensor for the purpose of discussing and improving the Work, but
479
+ excluding communication that is conspicuously marked or otherwise
480
+ designated in writing by the copyright owner as "Not a Contribution."
481
+
482
+ "Contributor" shall mean Licensor and any individual or Legal Entity
483
+ on behalf of whom a Contribution has been received by Licensor and
484
+ subsequently incorporated within the Work.
485
+
486
+ 2. Grant of Copyright License. Subject to the terms and conditions of
487
+ this License, each Contributor hereby grants to You a perpetual,
488
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
489
+ copyright license to reproduce, prepare Derivative Works of,
490
+ publicly display, publicly perform, sublicense, and distribute the
491
+ Work and such Derivative Works in Source or Object form.
492
+
493
+ 3. Grant of Patent License. Subject to the terms and conditions of
494
+ this License, each Contributor hereby grants to You a perpetual,
495
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
496
+ (except as stated in this section) patent license to make, have made,
497
+ use, offer to sell, sell, import, and otherwise transfer the Work,
498
+ where such license applies only to those patent claims licensable
499
+ by such Contributor that are necessarily infringed by their
500
+ Contribution(s) alone or by combination of their Contribution(s)
501
+ with the Work to which such Contribution(s) was submitted. If You
502
+ institute patent litigation against any entity (including a
503
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
504
+ or a Contribution incorporated within the Work constitutes direct
505
+ or contributory patent infringement, then any patent licenses
506
+ granted to You under this License for that Work shall terminate
507
+ as of the date such litigation is filed.
508
+
509
+ 4. Redistribution. You may reproduce and distribute copies of the
510
+ Work or Derivative Works thereof in any medium, with or without
511
+ modifications, and in Source or Object form, provided that You
512
+ meet the following conditions:
513
+
514
+ (a) You must give any other recipients of the Work or
515
+ Derivative Works a copy of this License; and
516
+
517
+ (b) You must cause any modified files to carry prominent notices
518
+ stating that You changed the files; and
519
+
520
+ (c) You must retain, in the Source form of any Derivative Works
521
+ that You distribute, all copyright, patent, trademark, and
522
+ attribution notices from the Source form of the Work,
523
+ excluding those notices that do not pertain to any part of
524
+ the Derivative Works; and
525
+
526
+ (d) If the Work includes a "NOTICE" text file as part of its
527
+ distribution, then any Derivative Works that You distribute must
528
+ include a readable copy of the attribution notices contained
529
+ within such NOTICE file, excluding those notices that do not
530
+ pertain to any part of the Derivative Works, in at least one
531
+ of the following places: within a NOTICE text file distributed
532
+ as part of the Derivative Works; within the Source form or
533
+ documentation, if provided along with the Derivative Works; or,
534
+ within a display generated by the Derivative Works, if and
535
+ wherever such third-party notices normally appear. The contents
536
+ of the NOTICE file are for informational purposes only and
537
+ do not modify the License. You may add Your own attribution
538
+ notices within Derivative Works that You distribute, alongside
539
+ or as an addendum to the NOTICE text from the Work, provided
540
+ that such additional attribution notices cannot be construed
541
+ as modifying the License.
542
+
543
+ You may add Your own copyright statement to Your modifications and
544
+ may provide additional or different license terms and conditions
545
+ for use, reproduction, or distribution of Your modifications, or
546
+ for any such Derivative Works as a whole, provided Your use,
547
+ reproduction, and distribution of the Work otherwise complies with
548
+ the conditions stated in this License.
549
+
550
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
551
+ any Contribution intentionally submitted for inclusion in the Work
552
+ by You to the Licensor shall be under the terms and conditions of
553
+ this License, without any additional terms or conditions.
554
+ Notwithstanding the above, nothing herein shall supersede or modify
555
+ the terms of any separate license agreement you may have executed
556
+ with Licensor regarding such Contributions.
557
+
558
+ 6. Trademarks. This License does not grant permission to use the trade
559
+ names, trademarks, service marks, or product names of the Licensor,
560
+ except as required for reasonable and customary use in describing the
561
+ origin of the Work and reproducing the content of the NOTICE file.
562
+
563
+ 7. Disclaimer of Warranty. Unless required by applicable law or
564
+ agreed to in writing, Licensor provides the Work (and each
565
+ Contributor provides its Contributions) on an "AS IS" BASIS,
566
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
567
+ implied, including, without limitation, any warranties or conditions
568
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
569
+ PARTICULAR PURPOSE. You are solely responsible for determining the
570
+ appropriateness of using or redistributing the Work and assume any
571
+ risks associated with Your exercise of permissions under this License.
572
+
573
+ 8. Limitation of Liability. In no event and under no legal theory,
574
+ whether in tort (including negligence), contract, or otherwise,
575
+ unless required by applicable law (such as deliberate and grossly
576
+ negligent acts) or agreed to in writing, shall any Contributor be
577
+ liable to You for damages, including any direct, indirect, special,
578
+ incidental, or consequential damages of any character arising as a
579
+ result of this License or out of the use or inability to use the
580
+ Work (including but not limited to damages for loss of goodwill,
581
+ work stoppage, computer failure or malfunction, or any and all
582
+ other commercial damages or losses), even if such Contributor
583
+ has been advised of the possibility of such damages.
584
+
585
+ 9. Accepting Warranty or Additional Liability. While redistributing
586
+ the Work or Derivative Works thereof, You may choose to offer,
587
+ and charge a fee for, acceptance of support, warranty, indemnity,
588
+ or other liability obligations and/or rights consistent with this
589
+ License. However, in accepting such obligations, You may act only
590
+ on Your own behalf and on Your sole responsibility, not on behalf
591
+ of any other Contributor, and only if You agree to indemnify,
592
+ defend, and hold each Contributor harmless for any liability
593
+ incurred by, or claims asserted against, such Contributor by reason
594
+ of your accepting any such warranty or additional liability.
595
+
596
+ END OF TERMS AND CONDITIONS
597
+
598
+ APPENDIX: How to apply the Apache License to your work.
599
+
600
+ To apply the Apache License to your work, attach the following
601
+ boilerplate notice, with the fields enclosed by brackets "[]"
602
+ replaced with your own identifying information. (Don't include
603
+ the brackets!) The text should be enclosed in the appropriate
604
+ comment syntax for the file format. We also recommend that a
605
+ file or class name and description of purpose be included on the
606
+ same "printed page" as the copyright notice for easier
607
+ identification within third-party archives.
608
+
609
+ Copyright [yyyy] [name of copyright owner]
610
+
611
+ Licensed under the Apache License, Version 2.0 (the "License");
612
+ you may not use this file except in compliance with the License.
613
+ You may obtain a copy of the License at
614
+
615
+ http://www.apache.org/licenses/LICENSE-2.0
616
+
617
+ Unless required by applicable law or agreed to in writing, software
618
+ distributed under the License is distributed on an "AS IS" BASIS,
619
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
620
+ See the License for the specific language governing permissions and
621
+ limitations under the License.
622
+
README.md ADDED
@@ -0,0 +1,106 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - ja
5
+ license: apache-2.0
6
+ library_name: transformers
7
+ pipeline_tag: text-generation
8
+ ---
9
+
10
+ # PLaMo-13B
11
+
12
+ ## Model Description
13
+ PLaMo-13B is a LLaMA-based 13B model pre-trained on English and Japanese open datasets, developed by Preferred Networks, Inc.
14
+ PLaMo-13B is released under the Apache v2.0 license.
15
+
16
+ [PLaMo-13B Release blog (Japanese)](https://tech.preferred.jp/ja/blog/llm-plamo/)
17
+
18
+ ## Usage
19
+
20
+ ### Requirements
21
+
22
+ - numpy
23
+ - safetensors
24
+ - sentencepiece
25
+ - torch
26
+ - transformers
27
+
28
+ ### Use a pipeline as a high-level helper
29
+ ```python
30
+ import transformers
31
+ pipeline = transformers.pipeline("text-generation", model="pfnet/plamo-13b", trust_remote_code=True)
32
+ print(pipeline("The future of artificial intelligence technology is ", max_new_tokens=32))
33
+ ```
34
+
35
+ ### Load model directly
36
+ ```python
37
+ from transformers import AutoTokenizer, AutoModelForCausalLM
38
+ tokenizer = AutoTokenizer.from_pretrained("pfnet/plamo-13b", trust_remote_code=True)
39
+ model = AutoModelForCausalLM.from_pretrained("pfnet/plamo-13b", trust_remote_code=True)
40
+ text = "これからの人工知能技術は"
41
+ input_ids = tokenizer(text, return_tensors="pt").input_ids
42
+ generated_tokens = model.generate(
43
+ inputs=input_ids,
44
+ max_new_tokens=32,
45
+ do_sample=True,
46
+ top_k=50,
47
+ top_p=0.95,
48
+ temperature=1.0,
49
+ )[0]
50
+ generated_text = tokenizer.decode(generated_tokens)
51
+ print(generated_text)
52
+ ```
53
+
54
+ ## Model Details
55
+
56
+ - Model size: 13B
57
+ - Trained tokens: 1.5T tokens (English: 1.32T tokens, Japanese: 0.18T tokens)
58
+ - Context length: 4096
59
+ - Developed by: Preferred Networks, Inc.
60
+ - Model type: Causal decoder-only
61
+ - Language(s): English, Japanese
62
+ - License: Apache v2.0
63
+
64
+ ## Training Dataset
65
+
66
+ ### English
67
+
68
+ - C4 - English
69
+ - Project Gutenberg
70
+ - RedPajama - Arxiv
71
+ - RedPajama - CommonCrawl - English
72
+ - RedPajama - Github
73
+ - RedPajama - StackExchange
74
+ - RedPajama - Wikipedia
75
+
76
+ ### Japanese
77
+
78
+ - mC4 - Japanese
79
+ - Wikipedia - Japanese
80
+
81
+ ## Tokenizer
82
+ PLaMo-13B uses a SentencePiece tokenizer trained on a subset of the model's pre-training datasets.
83
+
84
+ ## Bias, Risks, and Limitations
85
+ PLaMo-13B is a new technology that carries risks with use. Testing conducted to date has been in English and Japanese, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, PLaMo-13B’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. Therefore, before deploying any application of PLaMo-13B, developers should perform safety testing and tuning tailored to their specific use of the model.
86
+
87
+ ## How to cite
88
+ ```tex
89
+ @online{PLaMo2023Introducing,
90
+ author = {Preferred Networks, Inc},
91
+ title = {PLaMo-13B},
92
+ year = {2023},
93
+ url = {https://huggingface.co/pfnet/plamo-13b},
94
+ urldate = {2023-09-28}
95
+ }
96
+ ```
97
+
98
+ ## References
99
+ ```tex
100
+ @article{touvron2023llama,
101
+ title={LLaMA: Open and Efficient Foundation Language Models},
102
+ author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
103
+ journal={arXiv preprint arXiv:2302.13971},
104
+ year={2023}
105
+ }
106
+ ```
added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "</s>": 2,
3
+ "<cls>": 4,
4
+ "<mask>": 6,
5
+ "<pad>": 3,
6
+ "<s>": 1,
7
+ "<sep>": 5,
8
+ "<unk>": 0
9
+ }
config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "architectures": [
3
+ "PlamoForCausalLM"
4
+ ],
5
+ "auto_map": {
6
+ "AutoConfig": "modeling_plamo.PlamoConfig",
7
+ "AutoModelForCausalLM": "modeling_plamo.PlamoForCausalLM"
8
+ },
9
+ "bos_token_id": 1,
10
+ "eos_token_id": 2,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 5120,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 16640,
15
+ "max_position_embeddings": 8192,
16
+ "model_type": "plamo",
17
+ "n_shared_head": 8,
18
+ "num_attention_heads": 40,
19
+ "num_hidden_layers": 40,
20
+ "num_key_value_heads": 40,
21
+ "pad_token_id": 0,
22
+ "rms_norm_eps": 1e-06,
23
+ "tie_word_embeddings": false,
24
+ "tokenizer_class": "PlamoTokenizer",
25
+ "torch_dtype": "bfloat16",
26
+ "transformers_version": "4.34.0",
27
+ "use_cache": true,
28
+ "vocab_size": 50432
29
+ }
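As a rough cross-check of the "13B" figure, the values in config.json can be turned into a parameter count. The sketch below is not taken from modeling_plamo.py. It assumes LLaMA-style projections without biases, that `n_shared_head` means each key/value head is shared by 8 query heads (so `k_proj` and `v_proj` are 8 times narrower than `q_proj`), and that there is one final norm after the last layer. The function name `estimate_params` is hypothetical. Under those assumptions the count equals `total_size / 2` from model.safetensors.index.json, which is what two bytes per `bfloat16` weight would give.

```python
# Values copied from config.json in this commit.
cfg = {
    "hidden_size": 5120,
    "intermediate_size": 16640,
    "num_attention_heads": 40,
    "num_hidden_layers": 40,
    "n_shared_head": 8,
    "vocab_size": 50432,
}

# "total_size" from model.safetensors.index.json, in bytes.
TOTAL_SIZE_BYTES = 26199091200


def estimate_params(cfg):
    """Estimate the weight count under the assumptions stated in the lead-in."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    # ASSUMPTION: K/V heads are shared across groups of n_shared_head query heads.
    kv_dim = head_dim * cfg["num_attention_heads"] // cfg["n_shared_head"]
    attn = 2 * h * h + 2 * h * kv_dim        # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * h * cfg["intermediate_size"]   # gate_proj, up_proj, down_proj
    layer_norm = h                           # one norm.weight per layer (see weight_map)
    per_layer = attn + mlp + layer_norm
    embeddings = 2 * cfg["vocab_size"] * h   # embed_tokens + lm_head (untied)
    final_norm = h                           # ASSUMPTION: a single final norm
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


n_params = estimate_params(cfg)
print(f"estimated parameters: {n_params:,}")
print("matches total_size at 2 bytes/weight:", n_params * 2 == TOTAL_SIZE_BYTES)
```

The exact match suggests the assumptions are reasonable, but the model code remains the authority on the actual layer shapes.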
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "pad_token_id": 0,
6
+ "transformers_version": "4.34.0"
7
+ }
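The special-token IDs in config.json and generation_config.json can be resolved against added_tokens.json to see which token strings they refer to. This is a minimal sketch that uses only the values shown in this commit; the variable names are illustrative.

```python
# Token-to-ID map copied from added_tokens.json.
added_tokens = {
    "</s>": 2, "<cls>": 4, "<mask>": 6, "<pad>": 3,
    "<s>": 1, "<sep>": 5, "<unk>": 0,
}

# Special-token IDs copied from config.json / generation_config.json.
special_ids = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}

# Invert the map so IDs can be looked up.
id_to_token = {token_id: token for token, token_id in added_tokens.items()}

resolved = {name: id_to_token[token_id] for name, token_id in special_ids.items()}
for name, token in resolved.items():
    print(f"{name} -> {token}")
```

Note that `pad_token_id` is 0, which resolves to `<unk>` rather than to `<pad>` (ID 3). The commit does not say whether this is intentional, so it may be worth checking before relying on padding behavior in batched generation.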
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8ebe7756693f16872bf7d4a9a632d1aa1a31ced95efe93af31423c4a1c5dc8eb
3
+ size 9953775928
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:669b4488229bfba0d3b40edeb4d984a36206c7437ce5e64dd5b47c550203dd51
3
+ size 9896104952
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6d7e3217fb1f9cce9de7bb4faa3abbd68e161bcd61657419308878fbabf6583
3
+ size 6349249520
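The three `.safetensors` entries above are Git LFS pointer files, not the weights themselves: each records the SHA-256 digest and byte size of the real shard. The sketch below parses the pointers shown in this commit, sums the shard sizes, and compares the sum with `total_size` from model.safetensors.index.json. The helper names `parse_lfs_pointer` and `sha256_matches` are hypothetical, and the reading of the small difference as per-file header overhead is an assumption, not something the commit states.

```python
import hashlib

# Pointer contents copied from this commit.
POINTERS = {
    "model-00001-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:8ebe7756693f16872bf7d4a9a632d1aa1a31ced95efe93af31423c4a1c5dc8eb
size 9953775928""",
    "model-00002-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:669b4488229bfba0d3b40edeb4d984a36206c7437ce5e64dd5b47c550203dd51
size 9896104952""",
    "model-00003-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:f6d7e3217fb1f9cce9de7bb4faa3abbd68e161bcd61657419308878fbabf6583
size 6349249520""",
}

# "total_size" from model.safetensors.index.json, in bytes.
TOTAL_SIZE_BYTES = 26199091200


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a dict entry."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())


def sha256_matches(path, pointer, chunk_size=1 << 20):
    """Stream a downloaded shard and compare its digest with the pointer's oid."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest() == pointer["oid"]


parsed = {name: parse_lfs_pointer(text) for name, text in POINTERS.items()}
shard_total = sum(int(p["size"]) for p in parsed.values())
overhead = shard_total - TOTAL_SIZE_BYTES

print(f"sum of shard sizes: {shard_total:,} bytes")
print(f"index total_size:   {TOTAL_SIZE_BYTES:,} bytes")
print(f"difference:         {overhead:,} bytes")
```

After downloading a shard, `sha256_matches("model-00001-of-00003.safetensors", parsed["model-00001-of-00003.safetensors"])` should return `True` if the file is intact.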
model.safetensors.index.json ADDED
@@ -0,0 +1,330 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 26199091200
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00003-of-00003.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
8
+ "model.layers.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
9
+ "model.layers.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
10
+ "model.layers.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
11
+ "model.layers.layers.0.norm.weight": "model-00001-of-00003.safetensors",
12
+ "model.layers.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
13
+ "model.layers.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
14
+ "model.layers.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
15
+ "model.layers.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
16
+ "model.layers.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
17
+ "model.layers.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
18
+ "model.layers.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
19
+ "model.layers.layers.1.norm.weight": "model-00001-of-00003.safetensors",
20
+ "model.layers.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
21
+ "model.layers.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
22
+ "model.layers.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
23
+ "model.layers.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
24
+ "model.layers.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
25
+ "model.layers.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
26
+ "model.layers.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
27
+ "model.layers.layers.10.norm.weight": "model-00001-of-00003.safetensors",
28
+ "model.layers.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
29
+ "model.layers.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
30
+ "model.layers.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
31
+ "model.layers.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
32
+ "model.layers.layers.11.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
33
+ "model.layers.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
34
+ "model.layers.layers.11.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
35
+ "model.layers.layers.11.norm.weight": "model-00001-of-00003.safetensors",
36
+ "model.layers.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
37
+ "model.layers.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
38
+ "model.layers.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
39
+ "model.layers.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
40
+ "model.layers.layers.12.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
41
+ "model.layers.layers.12.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
42
+ "model.layers.layers.12.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
43
+ "model.layers.layers.12.norm.weight": "model-00001-of-00003.safetensors",
44
+ "model.layers.layers.12.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
45
+ "model.layers.layers.12.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
46
+ "model.layers.layers.12.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
47
+ "model.layers.layers.12.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
48
+ "model.layers.layers.13.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
49
+ "model.layers.layers.13.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
50
+ "model.layers.layers.13.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
51
+ "model.layers.layers.13.norm.weight": "model-00001-of-00003.safetensors",
52
+ "model.layers.layers.13.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
53
+ "model.layers.layers.13.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
54
+ "model.layers.layers.13.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
55
+ "model.layers.layers.13.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
56
+ "model.layers.layers.14.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
57
+ "model.layers.layers.14.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
58
+ "model.layers.layers.14.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
59
+ "model.layers.layers.14.norm.weight": "model-00001-of-00003.safetensors",
60
+ "model.layers.layers.14.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
61
+ "model.layers.layers.14.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
62
+ "model.layers.layers.14.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
63
+ "model.layers.layers.14.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
64
+ "model.layers.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
65
+ "model.layers.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
66
+ "model.layers.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
67
+ "model.layers.layers.15.norm.weight": "model-00002-of-00003.safetensors",
68
+ "model.layers.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
69
+ "model.layers.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
70
+ "model.layers.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
71
+ "model.layers.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
72
+ "model.layers.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
73
+ "model.layers.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
74
+ "model.layers.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
75
+ "model.layers.layers.16.norm.weight": "model-00002-of-00003.safetensors",
76
+ "model.layers.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
77
+ "model.layers.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
78
+ "model.layers.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
79
+ "model.layers.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
80
+ "model.layers.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
81
+ "model.layers.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
82
+ "model.layers.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
83
+ "model.layers.layers.17.norm.weight": "model-00002-of-00003.safetensors",
84
+ "model.layers.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
85
+ "model.layers.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
86
+ "model.layers.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
87
+ "model.layers.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
88
+ "model.layers.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
89
+ "model.layers.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
90
+ "model.layers.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
91
+ "model.layers.layers.18.norm.weight": "model-00002-of-00003.safetensors",
92
+ "model.layers.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
93
+ "model.layers.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
94
+ "model.layers.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
95
+ "model.layers.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
96
+ "model.layers.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
97
+ "model.layers.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
98
+ "model.layers.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
99
+ "model.layers.layers.19.norm.weight": "model-00002-of-00003.safetensors",
100
+ "model.layers.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
101
+ "model.layers.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
102
+ "model.layers.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
103
+ "model.layers.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
104
+ "model.layers.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
105
+ "model.layers.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
106
+ "model.layers.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
107
+ "model.layers.layers.2.norm.weight": "model-00001-of-00003.safetensors",
108
+ "model.layers.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
109
+ "model.layers.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
110
+ "model.layers.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
111
+ "model.layers.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
112
+ "model.layers.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
113
+ "model.layers.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
114
+ "model.layers.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
115
+ "model.layers.layers.20.norm.weight": "model-00002-of-00003.safetensors",
116
+ "model.layers.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
117
+ "model.layers.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
118
+ "model.layers.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
119
+ "model.layers.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
120
+ "model.layers.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
121
+ "model.layers.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
122
+ "model.layers.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
123
+ "model.layers.layers.21.norm.weight": "model-00002-of-00003.safetensors",
124
+ "model.layers.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
125
+ "model.layers.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
126
+ "model.layers.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
127
+ "model.layers.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
128
+ "model.layers.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
129
+ "model.layers.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
130
+ "model.layers.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
131
+ "model.layers.layers.22.norm.weight": "model-00002-of-00003.safetensors",
132
+ "model.layers.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
133
+ "model.layers.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
134
+ "model.layers.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
135
+ "model.layers.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
136
+ "model.layers.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
137
+ "model.layers.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
138
+ "model.layers.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
139
+ "model.layers.layers.23.norm.weight": "model-00002-of-00003.safetensors",
140
+ "model.layers.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
141
+ "model.layers.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
142
+ "model.layers.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
143
+ "model.layers.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
144
+ "model.layers.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
145
+ "model.layers.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
146
+ "model.layers.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
147
+ "model.layers.layers.24.norm.weight": "model-00002-of-00003.safetensors",
148
+ "model.layers.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
149
+ "model.layers.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
150
+ "model.layers.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
151
+ "model.layers.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
152
+ "model.layers.layers.25.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
153
+ "model.layers.layers.25.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
154
+ "model.layers.layers.25.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
155
+ "model.layers.layers.25.norm.weight": "model-00002-of-00003.safetensors",
156
+ "model.layers.layers.25.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
157
+ "model.layers.layers.25.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
158
+ "model.layers.layers.25.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
159
+ "model.layers.layers.25.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
160
+ "model.layers.layers.26.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
161
+ "model.layers.layers.26.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
162
+ "model.layers.layers.26.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
163
+ "model.layers.layers.26.norm.weight": "model-00002-of-00003.safetensors",
164
+ "model.layers.layers.26.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
165
+ "model.layers.layers.26.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
166
+ "model.layers.layers.26.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
167
+ "model.layers.layers.26.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
168
+ "model.layers.layers.27.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
169
+ "model.layers.layers.27.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
170
+ "model.layers.layers.27.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
171
+ "model.layers.layers.27.norm.weight": "model-00002-of-00003.safetensors",
172
+ "model.layers.layers.27.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
173
+ "model.layers.layers.27.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
174
+ "model.layers.layers.27.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
175
+ "model.layers.layers.27.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
176
+ "model.layers.layers.28.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
177
+ "model.layers.layers.28.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
178
+ "model.layers.layers.28.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
179
+ "model.layers.layers.28.norm.weight": "model-00002-of-00003.safetensors",
180
+ "model.layers.layers.28.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
181
+ "model.layers.layers.28.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
182
+ "model.layers.layers.28.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
183
+ "model.layers.layers.28.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
184
+ "model.layers.layers.29.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
185
+ "model.layers.layers.29.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
186
+ "model.layers.layers.29.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
187
+ "model.layers.layers.29.norm.weight": "model-00002-of-00003.safetensors",
188
+ "model.layers.layers.29.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
189
+ "model.layers.layers.29.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
190
+ "model.layers.layers.29.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
191
+ "model.layers.layers.29.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
192
+ "model.layers.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
193
+ "model.layers.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
194
+ "model.layers.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
195
+ "model.layers.layers.3.norm.weight": "model-00001-of-00003.safetensors",
196
+ "model.layers.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
197
+ "model.layers.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
198
+ "model.layers.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
199
+ "model.layers.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
200
+ "model.layers.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
201
+ "model.layers.layers.30.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
202
+ "model.layers.layers.30.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
203
+ "model.layers.layers.30.norm.weight": "model-00003-of-00003.safetensors",
204
+ "model.layers.layers.30.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
205
+ "model.layers.layers.30.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
206
+ "model.layers.layers.30.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
207
+ "model.layers.layers.30.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
208
+ "model.layers.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
209
+ "model.layers.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
210
+ "model.layers.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
211
+ "model.layers.layers.31.norm.weight": "model-00003-of-00003.safetensors",
212
+ "model.layers.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
213
+ "model.layers.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
214
+ "model.layers.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
215
+ "model.layers.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
216
+ "model.layers.layers.32.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
217
+ "model.layers.layers.32.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
218
+ "model.layers.layers.32.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
219
+ "model.layers.layers.32.norm.weight": "model-00003-of-00003.safetensors",
220
+ "model.layers.layers.32.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
221
+ "model.layers.layers.32.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
222
+ "model.layers.layers.32.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
223
+ "model.layers.layers.32.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
224
+ "model.layers.layers.33.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
225
+ "model.layers.layers.33.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
226
+ "model.layers.layers.33.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
227
+ "model.layers.layers.33.norm.weight": "model-00003-of-00003.safetensors",
228
+ "model.layers.layers.33.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
229
+ "model.layers.layers.33.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
230
+ "model.layers.layers.33.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
231
+ "model.layers.layers.33.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
232
+ "model.layers.layers.34.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
233
+ "model.layers.layers.34.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
234
+ "model.layers.layers.34.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
235
+ "model.layers.layers.34.norm.weight": "model-00003-of-00003.safetensors",
236
+ "model.layers.layers.34.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
237
+ "model.layers.layers.34.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
238
+ "model.layers.layers.34.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
239
+ "model.layers.layers.34.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
240
+ "model.layers.layers.35.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
241
+ "model.layers.layers.35.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
242
+ "model.layers.layers.35.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
243
+ "model.layers.layers.35.norm.weight": "model-00003-of-00003.safetensors",
244
+ "model.layers.layers.35.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
245
+ "model.layers.layers.35.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
246
+ "model.layers.layers.35.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
247
+ "model.layers.layers.35.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
248
+ "model.layers.layers.36.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
249
+ "model.layers.layers.36.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
250
+ "model.layers.layers.36.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
251
+ "model.layers.layers.36.norm.weight": "model-00003-of-00003.safetensors",
252
+ "model.layers.layers.36.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
253
+ "model.layers.layers.36.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
254
+ "model.layers.layers.36.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
255
+ "model.layers.layers.36.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
256
+ "model.layers.layers.37.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
257
+ "model.layers.layers.37.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
258
+ "model.layers.layers.37.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
259
+ "model.layers.layers.37.norm.weight": "model-00003-of-00003.safetensors",
260
+ "model.layers.layers.37.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
261
+ "model.layers.layers.37.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
262
+ "model.layers.layers.37.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
263
+ "model.layers.layers.37.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
264
+ "model.layers.layers.38.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
265
+ "model.layers.layers.38.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
266
+ "model.layers.layers.38.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
267
+ "model.layers.layers.38.norm.weight": "model-00003-of-00003.safetensors",
268
+ "model.layers.layers.38.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
269
+ "model.layers.layers.38.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
270
+ "model.layers.layers.38.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
271
+ "model.layers.layers.38.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
272
+ "model.layers.layers.39.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
273
+ "model.layers.layers.39.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
274
+ "model.layers.layers.39.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
275
+ "model.layers.layers.39.norm.weight": "model-00003-of-00003.safetensors",
276
+ "model.layers.layers.39.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
277
+ "model.layers.layers.39.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
278
+ "model.layers.layers.39.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
279
+ "model.layers.layers.39.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
280
+ "model.layers.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
281
+ "model.layers.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
282
+ "model.layers.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
283
+ "model.layers.layers.4.norm.weight": "model-00001-of-00003.safetensors",
284
+ "model.layers.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
285
+ "model.layers.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
286
+ "model.layers.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
287
+ "model.layers.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
288
+ "model.layers.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
289
+ "model.layers.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
290
+ "model.layers.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
291
+ "model.layers.layers.5.norm.weight": "model-00001-of-00003.safetensors",
292
+ "model.layers.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
293
+ "model.layers.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
294
+ "model.layers.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
295
+ "model.layers.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
296
+ "model.layers.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
297
+ "model.layers.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
298
+ "model.layers.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
299
+ "model.layers.layers.6.norm.weight": "model-00001-of-00003.safetensors",
300
+ "model.layers.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
301
+ "model.layers.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
302
+ "model.layers.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
303
+ "model.layers.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
304
+ "model.layers.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
305
+ "model.layers.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
306
+ "model.layers.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
307
+ "model.layers.layers.7.norm.weight": "model-00001-of-00003.safetensors",
308
+ "model.layers.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
309
+ "model.layers.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
310
+ "model.layers.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
311
+ "model.layers.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
312
+ "model.layers.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
313
+ "model.layers.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
314
+ "model.layers.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
315
+ "model.layers.layers.8.norm.weight": "model-00001-of-00003.safetensors",
316
+ "model.layers.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
317
+ "model.layers.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
318
+ "model.layers.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
319
+ "model.layers.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
320
+ "model.layers.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
321
+ "model.layers.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
322
+ "model.layers.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
323
+ "model.layers.layers.9.norm.weight": "model-00001-of-00003.safetensors",
324
+ "model.layers.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
325
+ "model.layers.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
326
+ "model.layers.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
327
+ "model.layers.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
328
+ "model.norm.weight": "model-00003-of-00003.safetensors"
329
+ }
330
+ }
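Note on the index above: each entry maps one parameter name to the shard file that stores it. The following is a minimal sketch of grouping parameters by shard, assuming the entries sit under a top-level "weight_map" key (the usual layout for sharded checkpoints, but that key is not visible in this part of the diff). The three sample entries are copied from the lines above; `group_by_shard` is a hypothetical helper, not part of the commit.

```python
import json
from collections import defaultdict

# Three entries copied from the diff above. The enclosing "weight_map" key
# is an assumption about the part of the file that is not shown here.
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "model.layers.layers.4.norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.layers.39.norm.weight": "model-00003-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors"
  }
}
""")


def group_by_shard(index):
    """Return {shard file name: sorted list of parameter names stored in it}."""
    shards = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return {shard: sorted(names) for shard, names in shards.items()}


GROUPED = group_by_shard(SAMPLE_INDEX)
for shard, names in sorted(GROUPED.items()):
    print(shard, len(names))
```

A grouping like this makes it easy to see which shard must be opened to read a given layer, for example when inspecting only the final norm weights.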
modeling_plamo.py ADDED
@@ -0,0 +1,705 @@
1
+ from typing import Any, Dict, List, NamedTuple, Optional, Tuple, Union
2
+
3
+ import numpy as np
4
+ import torch
5
+ from torch import nn
6
+ from torch.nn import functional as F
7
+ from transformers import PretrainedConfig, PreTrainedModel
8
+ from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
9
+
10
+
11
+ class DecoderInput(NamedTuple):
12
+ hidden_states: torch.Tensor
13
+ position_ids: torch.Tensor
14
+ attention_mask: Optional[torch.Tensor] = None
15
+ past_key_values: Optional[List[torch.FloatTensor]] = None
16
+ output_hidden_states: Optional[bool] = False
17
+ output_attentions: Optional[bool] = False
18
+ use_cache: Optional[bool] = False
19
+ gradient_checkpointing: bool = False
20
+
21
+
22
+ class DecoderOutput(NamedTuple):
23
+ hidden_states: torch.Tensor
24
+ all_hidden_states: Optional[Tuple[torch.Tensor, ...]]
25
+ all_self_attns: Optional[Tuple[torch.Tensor, ...]]
26
+ next_decoder_cache: Optional[Tuple[torch.Tensor, ...]]
27
+
28
+
29
+ class PlamoConfig(PretrainedConfig): # type: ignore
30
+ model_type: str = "plamo"
31
+
32
+ def __init__(
33
+ self,
34
+ vocab_size: int = 32000,
35
+ hidden_size: int = 4096,
36
+ intermediate_size: int = 13312,
37
+ num_hidden_layers: int = 32,
38
+ num_attention_heads: int = 32,
39
+ num_key_value_heads: Optional[int] = None,
40
+ max_position_embeddings: int = 2048,
41
+ initializer_range: float = 0.02,
42
+ rms_norm_eps: float = 1e-6,
43
+ use_cache: bool = True,
44
+ tokenizer_class: str = "PlamoTokenizer",
45
+ pad_token_id: Optional[int] = None,
46
+ bos_token_id: int = 1,
47
+ eos_token_id: int = 2,
48
+ n_shared_head: int = 8,
49
+ tie_word_embeddings: bool = False,
50
+ **kwargs: Any,
51
+ ) -> None:
52
+ self.vocab_size = vocab_size
53
+ self.max_position_embeddings = max_position_embeddings
54
+ self.hidden_size = hidden_size
55
+ self.intermediate_size = intermediate_size
56
+ self.num_hidden_layers = num_hidden_layers
57
+ self.num_attention_heads = num_attention_heads
58
+
59
+ # for backward compatibility
60
+ if num_key_value_heads is None:
61
+ num_key_value_heads = num_attention_heads
62
+
63
+ self.num_key_value_heads = num_key_value_heads
64
+ self.initializer_range = initializer_range
65
+ self.rms_norm_eps = rms_norm_eps
66
+ self.use_cache = use_cache
67
+
68
+ self.n_shared_head = n_shared_head
69
+
70
+ super().__init__(
71
+ tokenizer_class=tokenizer_class,
72
+ pad_token_id=pad_token_id,
73
+ bos_token_id=bos_token_id,
74
+ eos_token_id=eos_token_id,
75
+ tie_word_embeddings=tie_word_embeddings,
76
+ **kwargs,
77
+ )
78
+
79
+
80
+ # Copied from transformers.models.bart.modeling_bart._make_causal_mask
81
+ def _make_causal_mask(
82
+ input_ids_shape: Tuple[int, int], dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0
83
+ ) -> torch.Tensor:
84
+ """
85
+ Make the causal mask used for uni-directional (causal) self-attention.
86
+ """
87
+ bsz, tgt_len = input_ids_shape
88
+ mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
89
+ mask_cond = torch.arange(mask.size(-1), device=device)
90
+ mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
91
+ mask = mask.to(dtype)
92
+
93
+ if past_key_values_length > 0:
94
+ mask = torch.cat([torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1)
95
+ return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
96
+
97
+
98
+ # Copied from transformers.models.bart.modeling_bart._expand_mask
99
+ def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None) -> torch.Tensor:
100
+ """
101
+ Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
102
+ """
103
+ bsz, src_len = mask.size()
104
+ tgt_len = tgt_len if tgt_len is not None else src_len
105
+
106
+ expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
107
+
108
+ inverted_mask = 1.0 - expanded_mask
109
+
110
+ return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min) # type: ignore
111
+
112
+
113
+ class RotaryEmbedding(torch.nn.Module):
114
+ def __init__(
115
+ self, dim: int, max_position_embeddings: int = 2048, base: int = 10000, device: Optional[torch.device] = None
116
+ ) -> None:
117
+ super().__init__()
118
+
119
+ self.dim = dim
120
+ self.max_position_embeddings = max_position_embeddings
121
+ self.base = base
122
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
123
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
124
+
125
+ # Build here to make `torch.jit.trace` work.
126
+ self._set_cos_sin_cache(
127
+ seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
128
+ )
129
+
130
+ def _set_cos_sin_cache(self, seq_len: int, device: Any, dtype: Any) -> None:
131
+ self.max_seq_len_cached = seq_len
132
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype) # type: ignore
133
+
134
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
135
+ # Differs from the paper: a different permutation is used, but the resulting calculation is the same
136
+ emb = torch.cat((freqs, freqs), dim=-1)
137
+ self.register_buffer("cos_cached", emb.cos()[None, None, :, :].to(dtype), persistent=False)
138
+ self.register_buffer("sin_cached", emb.sin()[None, None, :, :].to(dtype), persistent=False)
139
+
140
+ def forward(self, x: torch.Tensor, seq_len: int) -> Tuple[torch.Tensor, torch.Tensor]:
141
+ # x: [bs, num_attention_heads, seq_len, head_size]
142
+ if seq_len > self.max_seq_len_cached:
143
+ self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
144
+
145
+ return (
146
+ self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype), # type: ignore
147
+ self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype), # type: ignore
148
+ )
149
+
150
+
151
+ def _rotate_half(x: torch.Tensor) -> torch.Tensor:
152
+ """Rotates half the hidden dims of the input."""
153
+ x1 = x[..., : x.shape[-1] // 2]
154
+ x2 = x[..., x.shape[-1] // 2 :]
155
+ return torch.cat((-x2, x1), dim=-1)
156
+
157
+
158
+ def _rotary_pos_emb(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
159
+ # The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
160
+ cos = cos.squeeze(1).squeeze(0) # [seq_len, dim]
161
+ sin = sin.squeeze(1).squeeze(0) # [seq_len, dim]
162
+ cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
163
+ sin = sin[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
164
+ x_embed = (x * cos) + (_rotate_half(x) * sin)
165
+ return x_embed
166
+
167
+
168
+ class RMSNorm(nn.Module):
169
+ def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
170
+ super().__init__()
171
+ self.weight = nn.Parameter(torch.ones(hidden_size))
172
+ self.variance_epsilon = eps
173
+
174
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
175
+ input_dtype = hidden_states.dtype
176
+ hidden_states = hidden_states.to(torch.float32)
177
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
178
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
179
+ return self.weight * hidden_states.to(input_dtype)
180
+
181
+
182
+ class Attention(torch.nn.Module):
183
+ def __init__(self, config: PlamoConfig) -> None:
184
+ super().__init__()
185
+ self.config = config
186
+ self.hidden_size = config.hidden_size
187
+ head_dim = self.hidden_size // config.num_attention_heads
188
+ self.max_position_embeddings = config.max_position_embeddings
189
+
190
+ self.q_num_heads = config.num_attention_heads
191
+ self.qk_dim = self.v_dim = head_dim
192
+ self.k_num_heads = self.v_num_heads = int(np.ceil(self.q_num_heads / config.n_shared_head))
193
+
194
+ self.q_proj = nn.Linear(self.hidden_size, self.q_num_heads * self.qk_dim, bias=False)
195
+ self.k_proj = nn.Linear(self.hidden_size, self.k_num_heads * self.qk_dim, bias=False)
196
+ self.v_proj = nn.Linear(self.hidden_size, self.v_num_heads * self.v_dim, bias=False)
197
+ self.o_proj = nn.Linear(self.q_num_heads * self.v_dim, self.hidden_size, bias=False)
198
+ self.rotary_emb = RotaryEmbedding(self.qk_dim, max_position_embeddings=self.max_position_embeddings)
199
+
200
+ def forward(
201
+ self,
202
+ hidden_states: torch.Tensor,
203
+ attention_mask: Optional[torch.Tensor] = None,
204
+ position_ids: Optional[torch.Tensor] = None,
205
+ past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
206
+ output_attentions: bool = False,
207
+ use_cache: bool = False,
208
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor, torch.Tensor]]]:
209
+ bsz, q_len, _ = hidden_states.size()
210
+
211
+ query_states = self.q_proj(hidden_states).view(bsz, q_len, self.q_num_heads, self.qk_dim).transpose(1, 2)
212
+ key_states = self.k_proj(hidden_states).view(bsz, q_len, self.k_num_heads, self.qk_dim).transpose(1, 2)
213
+ value_states = self.v_proj(hidden_states).view(bsz, q_len, self.v_num_heads, self.v_dim).transpose(1, 2)
214
+
215
+ def _expand_kv(t: torch.Tensor, repeat: int, target: int) -> torch.Tensor:
216
+ return t.repeat(1, repeat, 1, 1)[:, :target]
217
+
218
+ # expand shared kv
219
+ assert self.k_num_heads == self.v_num_heads
220
+ key_states = _expand_kv(key_states, self.config.n_shared_head, self.q_num_heads)
221
+ value_states = _expand_kv(value_states, self.config.n_shared_head, self.q_num_heads)
222
+
223
+ kv_seq_len = key_states.shape[-2]
224
+ if past_key_value is not None:
225
+ kv_seq_len += past_key_value[0].shape[-2]
226
+ cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
227
+ assert position_ids is not None
228
+ query_states = _rotary_pos_emb(query_states, cos, sin, position_ids)
229
+ key_states = _rotary_pos_emb(key_states, cos, sin, position_ids)
230
+ # [bsz, nh, t, hd]
231
+
232
+ if past_key_value is not None:
233
+ # reuse k, v, self_attention
234
+ key_states = torch.cat([past_key_value[0], key_states], dim=2)
235
+ value_states = torch.cat([past_key_value[1], value_states], dim=2)
236
+
237
+ past_key_value = (key_states, value_states) if use_cache else None
238
+
239
+ attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attn_mask=attention_mask)
240
+ attn_output = attn_output.transpose(1, 2)
241
+
242
+ attn_output = attn_output.reshape(bsz, q_len, self.q_num_heads * self.v_dim)
243
+ attn_output = self.o_proj(attn_output)
244
+
245
+ # F.scaled_dot_product_attention does not return attention weights, so attn_weights is always None
246
+ attn_weights = None
247
+
248
+ return attn_output, attn_weights, past_key_value
249
+
250
+
251
+ class MLP(nn.Module):
252
+ def __init__(self, config: PlamoConfig) -> None:
253
+ super().__init__()
254
+ self.config = config
255
+ self.hidden_size = config.hidden_size
256
+ self.intermediate_size = config.intermediate_size
257
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
258
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
259
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
260
+ self.act_fn = torch.nn.functional.silu
261
+
262
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
263
+ return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x)) # type: ignore
264
+
265
+
266
+ class PlamoDecoderLayer(torch.nn.Module):
267
+ def __init__(self, config: PlamoConfig) -> None:
268
+ super().__init__()
269
+ self.config = config
270
+ self.hidden_size = config.hidden_size
271
+ self.self_attn = Attention(config)
272
+ self.mlp = MLP(config)
273
+ self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
274
+
275
+ def forward(
276
+ self,
277
+ hidden_states: torch.Tensor,
278
+ attention_mask: Optional[torch.Tensor] = None,
279
+ position_ids: Optional[torch.LongTensor] = None,
280
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
281
+ output_attentions: Optional[bool] = False,
282
+ use_cache: Optional[bool] = False,
283
+ ) -> Tuple[Any, ...]:
284
+ # from LlamaDecoder
285
+ residual = hidden_states
286
+
287
+ hidden_states = self.norm(hidden_states)
288
+
289
+ # Self Attention
290
+ hidden_states_sa, self_attn_weights, present_key_value = self.self_attn(
291
+ hidden_states=hidden_states,
292
+ attention_mask=attention_mask,
293
+ position_ids=position_ids,
294
+ past_key_value=past_key_value,
295
+ output_attentions=output_attentions,
296
+ use_cache=use_cache,
297
+ )
298
+
299
+ # Fully Connected
300
+ hidden_states_mlp = self.mlp(hidden_states)
301
+
302
+ # Residual
303
+ hidden_states = residual + hidden_states_sa + hidden_states_mlp
304
+
305
+ outputs: Any = (hidden_states,)
306
+
307
+ if output_attentions:
308
+ outputs += (self_attn_weights,)
309
+
310
+ if use_cache:
311
+ outputs += (present_key_value,)
312
+
313
+ return outputs # type: ignore
314
+
315
+
316
+ class PlamoDecoder(torch.nn.Module):
317
+ def __init__(self, config: PlamoConfig) -> None:
318
+ super().__init__()
319
+ self.layers = torch.nn.ModuleList([PlamoDecoderLayer(config) for _ in range(config.num_hidden_layers)])
320
+
321
+ def forward(self, x: DecoderInput) -> DecoderOutput:
322
+ all_hidden_states: Optional[Tuple[torch.Tensor, ...]] = () if x.output_hidden_states else None
323
+ all_self_attns: Optional[Tuple[torch.Tensor, ...]] = () if x.output_attentions else None
324
+ next_decoder_cache: Optional[Tuple[torch.Tensor, ...]] = () if x.use_cache else None
325
+ hidden_states = x.hidden_states
326
+
327
+ for idx, decoder_layer in enumerate(self.layers):
328
+ if x.output_hidden_states:
329
+ assert all_hidden_states is not None
330
+ all_hidden_states += (hidden_states,)
331
+
332
+ past_key_value = x.past_key_values[idx] if x.past_key_values is not None else None
333
+
334
+ if self.training and x.gradient_checkpointing:
335
+
336
+ def create_custom_forward(module): # type: ignore
337
+ def custom_forward(*inputs): # type: ignore
338
+ # None for past_key_value
339
+ return module(*inputs, x.output_attentions, None)
340
+
341
+ return custom_forward
342
+
343
+ layer_outputs = torch.utils.checkpoint.checkpoint(
344
+ create_custom_forward(decoder_layer), # type: ignore
345
+ hidden_states,
346
+ x.attention_mask,
347
+ x.position_ids,
348
+ None,
349
+ )
350
+ else:
351
+ layer_outputs = decoder_layer(
352
+ hidden_states,
353
+ attention_mask=x.attention_mask,
354
+ position_ids=x.position_ids,
355
+ past_key_value=past_key_value,
356
+ output_attentions=x.output_attentions,
357
+ use_cache=x.use_cache,
358
+ )
359
+
360
+ hidden_states = layer_outputs[0]
361
+
362
+ if x.use_cache:
363
+ cache = layer_outputs[2 if x.output_attentions else 1]
364
+ assert cache is not None
365
+ assert next_decoder_cache is not None
366
+ next_decoder_cache += (cache,)
367
+
368
+ if x.output_attentions:
369
+ assert layer_outputs[1] is not None
370
+ assert all_self_attns is not None
371
+ all_self_attns += (layer_outputs[1],)
372
+ return DecoderOutput(hidden_states, all_hidden_states, all_self_attns, next_decoder_cache)
373
+
374
+
375
+ class PlamoPreTrainedModel(PreTrainedModel): # type: ignore
376
+ config_class = PlamoConfig
377
+ _no_split_modules: List[str]
378
+ base_model_prefix = "model"
379
+ supports_gradient_checkpointing = True
380
+ _no_split_modules = ["PlamoDecoderLayer"]
381
+ _skip_keys_device_placement = "past_key_values"
382
+ _keys_to_ignore_on_load_unexpected = [r"decoder\.version"]
383
+
384
+ def _init_weights(self, module: torch.nn.Module) -> None:
385
+ std = self.config.initializer_range
386
+ if isinstance(module, nn.Linear):
387
+ module.weight.data.normal_(mean=0.0, std=std)
388
+ if module.bias is not None:
389
+ module.bias.data.zero_()
390
+ elif isinstance(module, nn.Embedding):
391
+ module.weight.data.normal_(mean=0.0, std=std)
392
+ if module.padding_idx is not None:
393
+ module.weight.data[module.padding_idx].zero_()
394
+
395
+ def _set_gradient_checkpointing(self, module: torch.nn.Module, value: bool = False) -> None:
396
+ module.gradient_checkpointing = value # type: ignore
397
+
398
+
399
+ class PlamoModel(PlamoPreTrainedModel):
400
+ def __init__(self, config: PlamoConfig):
401
+ super().__init__(config)
402
+ self.padding_idx = config.pad_token_id
403
+ self.vocab_size = config.vocab_size
404
+
405
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
406
+ self.layers = PlamoDecoder(config) # type: ignore
407
+ self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
408
+
409
+ self.gradient_checkpointing = False
410
+ # Initialize weights and apply final processing
411
+ self.post_init()
412
+
413
+ def get_input_embeddings(self) -> torch.nn.Embedding:
414
+ return self.embed_tokens
415
+
416
+ def set_input_embeddings(self, value: torch.nn.Embedding) -> None:
417
+ self.embed_tokens = value
418
+
419
+ # Copied from transformers.models.bart.modeling_bart.BartDecoder._prepare_decoder_attention_mask
420
+ def _prepare_decoder_attention_mask(
421
+ self,
422
+ attention_mask: torch.Tensor,
423
+ input_shape: Tuple[int, int],
424
+ inputs_embeds: Optional[torch.FloatTensor],
425
+ past_key_values_length: int,
426
+ ) -> Optional[torch.Tensor]:
427
+ # create causal mask
428
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
429
+ combined_attention_mask: Optional[torch.Tensor] = None
430
+ if input_shape[-1] > 1:
431
+ assert inputs_embeds is not None
432
+ combined_attention_mask = _make_causal_mask(
433
+ input_shape,
434
+ inputs_embeds.dtype,
435
+ device=inputs_embeds.device,
436
+ past_key_values_length=past_key_values_length,
437
+ )
438
+
439
+ if attention_mask is not None:
440
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
441
+ assert inputs_embeds is not None
442
+ expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
443
+ inputs_embeds.device
444
+ )
445
+ combined_attention_mask = (
446
+ expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
447
+ )
448
+
449
+ return combined_attention_mask
450
+
451
+ def forward(
452
+ self,
453
+ input_ids: Optional[torch.LongTensor] = None,
454
+ attention_mask: Optional[torch.Tensor] = None,
455
+ position_ids: Optional[torch.Tensor] = None,
456
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
457
+ inputs_embeds: Optional[torch.FloatTensor] = None,
458
+ use_cache: Optional[bool] = None,
459
+ output_attentions: Optional[bool] = None,
460
+ output_hidden_states: Optional[bool] = None,
461
+ return_dict: Optional[bool] = None,
462
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
463
+ assert input_ids is not None
464
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
465
+ output_hidden_states = (
466
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
467
+ )
468
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
469
+
470
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
471
+
472
+ # retrieve input_ids and inputs_embeds
473
+ if input_ids is not None and inputs_embeds is not None:
474
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
475
+ elif input_ids is not None:
476
+ batch_size, seq_length = input_ids.shape
477
+ else:
478
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
479
+
480
+ seq_length_with_past = seq_length
481
+ past_key_values_length = 0
482
+
483
+ if past_key_values is not None:
484
+ past_key_values_length = past_key_values[0][0].shape[2]
485
+ seq_length_with_past = seq_length_with_past + past_key_values_length
486
+
487
+ if position_ids is None:
488
+ device = input_ids.device
489
+ position_ids = torch.arange(
490
+ past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
491
+ )
492
+ position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
493
+ else:
494
+ position_ids = position_ids.view(-1, seq_length).long()
495
+
496
+ if inputs_embeds is None:
497
+ inputs_embeds = self.embed_tokens(input_ids)
498
+ # build the combined causal + padding attention mask
499
+ if attention_mask is None:
500
+ attention_mask = torch.ones(
501
+ (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embeds.device
502
+ )
503
+ attention_mask = self._prepare_decoder_attention_mask(
504
+ attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_length
505
+ )
506
+
507
+ hidden_states = inputs_embeds
508
+
509
+ if self.gradient_checkpointing and self.training:
510
+ if use_cache:
511
+ use_cache = False
512
+
513
+ # decoder layers
514
+ out = self.layers(
515
+ DecoderInput(
516
+ hidden_states,
517
+ position_ids,
518
+ attention_mask,
519
+ past_key_values,
520
+ output_hidden_states,
521
+ output_attentions,
522
+ use_cache,
523
+ self.gradient_checkpointing,
524
+ )
525
+ )
526
+ assert isinstance(out, DecoderOutput)
527
+ hidden_states = out.hidden_states
528
+ all_hidden_states = out.all_hidden_states
529
+ all_self_attns = out.all_self_attns
530
+ next_decoder_cache = out.next_decoder_cache
531
+
532
+ hidden_states = self.norm(hidden_states)
533
+
534
+ # add hidden states from the last decoder layer
535
+ if output_hidden_states:
536
+ assert all_hidden_states is not None
537
+ all_hidden_states += (hidden_states,)
538
+
539
+ next_cache = next_decoder_cache if use_cache else None
540
+ if not return_dict:
541
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
542
+ return BaseModelOutputWithPast(
543
+ last_hidden_state=hidden_states,
544
+ past_key_values=next_cache,
545
+ hidden_states=all_hidden_states,
546
+ attentions=all_self_attns,
547
+ )
548
+
549
+
550
+ class PlamoForCausalLM(PlamoPreTrainedModel):
551
+ def __init__(self, config: PretrainedConfig) -> None:
552
+ super().__init__(config)
553
+ self.model = PlamoModel(config)
554
+
555
+ self.lm_head: torch.nn.Module = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
556
+
557
+ # Initialize weights and apply final processing
558
+ self.post_init()
559
+
560
+ def get_input_embeddings(self) -> torch.nn.Embedding:
561
+ return self.model.embed_tokens
562
+
563
+ def set_input_embeddings(self, value: torch.nn.Embedding) -> None:
564
+ self.model.embed_tokens = value
565
+
566
+ def get_output_embeddings(self) -> torch.nn.Module:
567
+ return self.lm_head
568
+
569
+ def set_output_embeddings(self, new_embeddings: torch.nn.Module) -> None:
570
+ self.lm_head = new_embeddings
571
+
572
+ def set_decoder(self, decoder: PlamoModel) -> None:
573
+ self.model = decoder
574
+
575
+ def get_decoder(self) -> PlamoModel:
576
+ return self.model
577
+
578
+ def forward( # type: ignore
579
+ self,
580
+ input_ids: Optional[torch.LongTensor] = None,
581
+ attention_mask: Optional[torch.Tensor] = None,
582
+ position_ids: Optional[torch.Tensor] = None,
583
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
584
+ inputs_embeds: Optional[torch.FloatTensor] = None,
585
+ labels: Optional[torch.LongTensor] = None,
586
+ use_cache: Optional[bool] = None,
587
+ output_attentions: Optional[bool] = None,
588
+ output_hidden_states: Optional[bool] = None,
589
+ return_dict: Optional[bool] = None,
590
+ ) -> Union[Tuple, CausalLMOutputWithPast]:
591
+ r"""
592
+ Args:
593
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
594
+ Labels for computing the causal language modeling loss. Indices should either be in `[0, ...,
595
+ config.vocab_size - 1]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
596
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`.
597
+
598
+ Returns:
599
+
600
+ Example:
601
+
602
+ ```python
603
+ >>> from transformers import AutoModelForCausalLM, AutoTokenizer
604
+
605
+ >>> model = AutoModelForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS, trust_remote_code=True)
606
+ >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER, trust_remote_code=True)
607
+
608
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
609
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
610
+
611
+ >>> # Generate
612
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
613
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
614
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
615
+ ```"""
616
+ assert input_ids is not None
617
+
618
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
619
+ output_hidden_states = (
620
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
621
+ )
622
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
623
+
624
+ # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn)
625
+ outputs = self.model(
626
+ input_ids=input_ids,
627
+ attention_mask=attention_mask,
628
+ position_ids=position_ids,
629
+ past_key_values=past_key_values,
630
+ inputs_embeds=inputs_embeds,
631
+ use_cache=use_cache,
632
+ output_attentions=output_attentions,
633
+ output_hidden_states=output_hidden_states,
634
+ return_dict=return_dict,
635
+ )
636
+
637
+ hidden_states = outputs[0]
638
+ logits = self.lm_head(hidden_states)
639
+
640
+ loss = None
641
+ if labels is not None:
642
+ # Shift so that tokens < n predict n
643
+ shift_logits = logits[..., :-1, :].contiguous()
644
+ shift_labels = labels[..., 1:].contiguous()
645
+ # Flatten the tokens
646
+ loss_fct = nn.CrossEntropyLoss()
647
+ shift_logits = shift_logits.view(-1, self.config.vocab_size)
648
+ shift_labels = shift_labels.view(-1)
649
+ # Enable model parallelism
650
+ shift_labels = shift_labels.to(shift_logits.device)
651
+ loss = loss_fct(shift_logits, shift_labels)
652
+
653
+ if not return_dict:
654
+ output = (logits,) + outputs[1:]
655
+ return (loss,) + output if loss is not None else output
656
+
657
+ return CausalLMOutputWithPast(
658
+ loss=loss,
659
+ logits=logits,
660
+ past_key_values=outputs.past_key_values,
661
+ hidden_states=outputs.hidden_states,
662
+ attentions=outputs.attentions,
663
+ )
664
+
665
+ def prepare_inputs_for_generation(
666
+ self,
667
+ input_ids: torch.Tensor,
668
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
669
+ attention_mask: Optional[torch.Tensor] = None,
670
+ inputs_embeds: Optional[torch.Tensor] = None,
671
+ **kwargs: Any,
672
+ ) -> Dict[str, Any]:
673
+ if past_key_values:
674
+ input_ids = input_ids[:, -1:]
675
+
676
+ position_ids = kwargs.get("position_ids", None)
677
+ if attention_mask is not None and position_ids is None:
678
+ # create position_ids on the fly for batch generation
679
+ position_ids = attention_mask.long().cumsum(-1) - 1
680
+ position_ids.masked_fill_(attention_mask == 0, 1)
681
+ if past_key_values:
682
+ position_ids = position_ids[:, -1].unsqueeze(-1)
683
+
684
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
685
+ if inputs_embeds is not None and past_key_values is None:
686
+ model_inputs: Dict[str, Any] = {"inputs_embeds": inputs_embeds}
687
+ else:
688
+ model_inputs = {"input_ids": input_ids}
689
+
690
+ model_inputs.update(
691
+ {
692
+ "position_ids": position_ids,
693
+ "past_key_values": past_key_values,
694
+ "use_cache": kwargs.get("use_cache"),
695
+ "attention_mask": attention_mask,
696
+ }
697
+ )
698
+ return model_inputs
699
+
700
+ @staticmethod
701
+ def _reorder_cache(past_key_values: List[torch.FloatTensor], beam_idx: int) -> Tuple[Any, ...]:
702
+ reordered_past: Tuple[Any, ...] = ()
703
+ for layer_past in past_key_values:
704
+ reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
705
+ return reordered_past
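The position-id handling in `prepare_inputs_for_generation` above can be illustrated without tensors. What follows is a minimal pure-Python sketch, not the model's code: the helper name `position_ids_from_mask` is hypothetical, and nested lists stand in for the `attention_mask` tensor. It mirrors `attention_mask.long().cumsum(-1) - 1` followed by `masked_fill_(attention_mask == 0, 1)`, so real tokens are numbered from 0 and padded slots receive the placeholder value 1.

```python
from itertools import accumulate
from typing import List


def position_ids_from_mask(attention_mask: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper: list-based analogue of the cumsum/masked_fill logic."""
    result = []
    for row in attention_mask:
        running = list(accumulate(row))  # cumulative sum along the sequence
        # Real tokens get (cumsum - 1); padded slots are filled with 1.
        result.append([c - 1 if m else 1 for c, m in zip(running, row)])
    return result


# First row is left-padded by two tokens, second row has no padding.
mask = [[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]
position_ids = position_ids_from_mask(mask)
print(position_ids)  # [[1, 1, 0, 1, 2], [0, 1, 2, 3, 4]]
```

With a left-padded batch, the first real token of every row therefore starts at position 0 regardless of how much padding precedes it.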
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa5a08b51271f2e35ad0f41b322e131df8cd783b62315cdbd25480f6f39cae45
+size 26199259022
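The three lines added for `pytorch_model.bin` are a Git LFS pointer rather than the weights themselves: each line is a key, a space, and a value. As a rough sketch of how such a pointer can be read (the function name `parse_lfs_pointer` is an assumption for illustration and is not part of this commit or of any Git LFS tooling):

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Hypothetical helper: split each 'key value' line of a pointer file."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fa5a08b51271f2e35ad0f41b322e131df8cd783b62315cdbd25480f6f39cae45
size 26199259022
"""

pointer = parse_lfs_pointer(pointer_text)
hash_algo, _, digest = pointer["oid"].partition(":")
size_gib = int(pointer["size"]) / 2**30  # size is given in bytes

print(hash_algo, len(digest), round(size_gib, 1))  # sha256 64 24.4
```

The `size` field shows that the actual checkpoint is roughly 24.4 GiB, which is why only the pointer is stored in the repository history.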
special_tokens_map.json ADDED
@@ -0,0 +1,18 @@
+{
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>",
+    "<pad>",
+    "<cls>",
+    "<sep>",
+    "<mask>"
+  ],
+  "bos_token": "<s>",
+  "cls_token": "<cls>",
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "pad_token": "<pad>",
+  "sep_token": "<sep>",
+  "unk_token": "<unk>"
+}
tokenization_plamo.py ADDED
@@ -0,0 +1,151 @@
+import os
+from shutil import copyfile
+from typing import Any, Dict, List, Optional, Tuple
+
+import sentencepiece as spm
+from transformers.tokenization_utils import PreTrainedTokenizer
+from transformers.utils import logging
+
+VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}
+logger = logging.get_logger(__name__)
+
+
+class PlamoTokenizer(PreTrainedTokenizer):  # type: ignore
+    vocab_files_names = VOCAB_FILES_NAMES
+    model_input_names = ["input_ids", "attention_mask"]
+
+    def __init__(
+        self,
+        vocab_file: str,
+        unk_token: str = "<unk>",
+        bos_token: str = "<s>",
+        eos_token: str = "</s>",
+        pad_token: str = "<pad>",
+        cls_token: str = "<cls>",
+        sep_token: str = "<sep>",
+        mask_token: str = "<mask>",
+        sp_model_kwargs: Optional[Dict[str, Any]] = None,
+        clean_up_tokenization_spaces: bool = False,
+        **kwargs: Any,
+    ) -> None:
+        if "add_bos_token" not in kwargs:
+            kwargs["add_bos_token"] = False
+        if "add_eos_token" not in kwargs:
+            kwargs["add_eos_token"] = False
+        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
+        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
+        self.sp_model.Load(vocab_file)
+        self.vocab_file = vocab_file
+        self.add_bos_token = kwargs["add_bos_token"]
+        self.add_eos_token = kwargs["add_eos_token"]
+
+        super().__init__(
+            vocab_file=vocab_file,
+            unk_token=unk_token,
+            bos_token=bos_token,
+            eos_token=eos_token,
+            pad_token=pad_token,
+            cls_token=cls_token,
+            sep_token=sep_token,
+            mask_token=mask_token,
+            sp_model_kwargs=sp_model_kwargs,
+            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
+            **kwargs,
+        )
+
+    # The functions below are copied from the Hugging Face transformers LlamaTokenizer implementation to fix the tokenizer's behaviour.
+    # https://github.com/huggingface/transformers/blob/v4.30.2/src/transformers/models/llama/tokenization_llama.py
+
+    def __getstate__(self) -> Dict[str, Any]:
+        state = self.__dict__.copy()
+        state["sp_model"] = None
+        state["sp_model_proto"] = self.sp_model.serialized_model_proto()
+        return state
+
+    def __setstate__(self, d: Dict[str, Any]) -> None:
+        self.__dict__ = d
+        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
+        self.sp_model.LoadFromSerializedProto(self.sp_model_proto)
+
+    @property
+    def vocab_size(self) -> Any:
+        """Returns the vocabulary size."""
+        return self.sp_model.get_piece_size()
+
+    def get_vocab(self) -> Dict[str, int]:
+        """Returns the vocabulary as a dict."""
+        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
+        vocab.update(self.added_tokens_encoder)
+        return vocab
+
+    def convert_tokens_to_string(self, tokens: List[str]) -> str:
+        """Converts a sequence of tokens (strings) into a single string."""
+        current_sub_tokens: List[str] = []
+        out_string = ""
+        prev_is_special = False
+        for i, token in enumerate(tokens):
+            # make sure that special tokens are not decoded using sentencepiece model
+            if token in self.all_special_tokens:
+                if not prev_is_special and i != 0:
+                    out_string += " "
+                out_string += self.sp_model.decode(current_sub_tokens) + token
+                prev_is_special = True
+                current_sub_tokens = []
+            else:
+                current_sub_tokens.append(token)
+                prev_is_special = False
+        out_string += self.sp_model.decode(current_sub_tokens)
+        return out_string
+
+    def _tokenize(self, text: str) -> Any:
+        """Returns a tokenized string."""
+        return self.sp_model.encode(text, out_type=str)
+
+    def _convert_token_to_id(self, token: str) -> Any:
+        """Converts a token (str) into an id using the vocab."""
+        return self.sp_model.piece_to_id(token)
+
+    def _convert_id_to_token(self, index: int) -> Any:
+        """Converts an index (integer) into a token (str) using the vocab."""
+        token = self.sp_model.IdToPiece(index)
+        return token
+
+    def build_inputs_with_special_tokens(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
+        eos_token_id = [self.eos_token_id] if self.add_eos_token else []
+
+        output = bos_token_id + token_ids_0 + eos_token_id
+
+        if token_ids_1 is not None:
+            output = output + bos_token_id + token_ids_1 + eos_token_id
+
+        return output
+
+    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
+        """
+        Save the vocabulary and special tokens file to a directory.
+
+        Args:
+            save_directory (`str`):
+                The directory in which to save the vocabulary.
+
+        Returns:
+            `Tuple(str)`: Paths to the files saved.
+        """
+        if not os.path.isdir(save_directory):
+            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
+            return ("",)
+        out_vocab_file = os.path.join(
+            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
+        )
+
+        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
+            copyfile(self.vocab_file, out_vocab_file)
+        elif not os.path.isfile(self.vocab_file):
+            with open(out_vocab_file, "wb") as fi:
+                content_spiece_model = self.sp_model.serialized_model_proto()
+                fi.write(content_spiece_model)
+
+        return (out_vocab_file,)
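The effect of `add_bos_token` and `add_eos_token` on `build_inputs_with_special_tokens` above is easiest to see on small inputs. The sketch below is a standalone restatement of that method, not the tokenizer itself: the function name `build_inputs` and the token IDs 10, 11 and 20 are made up for illustration, while the IDs 1 (`<s>`) and 2 (`</s>`) are assumed from the `added_tokens_decoder` table in this commit's tokenizer_config.json.

```python
from typing import List, Optional

BOS_ID = 1  # "<s>", assumed from added_tokens_decoder
EOS_ID = 2  # "</s>", assumed from added_tokens_decoder


def build_inputs(
    token_ids_0: List[int],
    token_ids_1: Optional[List[int]] = None,
    add_bos_token: bool = False,
    add_eos_token: bool = False,
) -> List[int]:
    """Hypothetical standalone version of build_inputs_with_special_tokens."""
    bos = [BOS_ID] if add_bos_token else []
    eos = [EOS_ID] if add_eos_token else []
    output = bos + token_ids_0 + eos
    if token_ids_1 is not None:
        # A second sequence is wrapped in the same way and appended.
        output = output + bos + token_ids_1 + eos
    return output


class_default = build_inputs([10, 11])
with_eos = build_inputs([10, 11], add_eos_token=True)
pair = build_inputs([10, 11], [20], add_bos_token=True, add_eos_token=True)

print(class_default)  # [10, 11]
print(with_eos)       # [10, 11, 2]
print(pair)           # [1, 10, 11, 2, 1, 20, 2]
```

Note that the class defaults both flags to `False`, whereas the committed tokenizer_config.json sets `add_eos_token` to `true`, so a tokenizer loaded with that config would behave like the `with_eos` case.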
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:59fe756bef3dc5d4813bd2eb9aeb7c39138cbd71e665bc85e6a4c10e766465da
+size 1122464
tokenizer_config.json ADDED
@@ -0,0 +1,90 @@
+{
+  "add_bos_token": false,
+  "add_eos_token": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<cls>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<mask>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>",
+    "<pad>",
+    "<cls>",
+    "<sep>",
+    "<mask>"
+  ],
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_plamo.PlamoTokenizer",
+      null
+    ]
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<cls>",
+  "eos_token": "</s>",
+  "local_file_only": true,
+  "mask_token": "<mask>",
+  "model_max_length": 2048,
+  "pad_token": "<pad>",
+  "sep_token": "<sep>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "PlamoTokenizer",
+  "tokenizer_file": null,
+  "unk_token": "<unk>"
+}
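The `added_tokens_decoder` table in tokenizer_config.json keys each special token by its ID written as a JSON string. The following sketch shows one way to recover integer IDs for the named special tokens from such a config. It uses only the standard `json` module and a trimmed copy of the file (four of the seven entries, with the per-token flags reduced to `special`); the variable names are illustrative and this is not how `transformers` itself loads the file.

```python
import json

# Trimmed excerpt of tokenizer_config.json, for illustration only.
config_text = """
{
  "add_bos_token": false,
  "add_eos_token": true,
  "added_tokens_decoder": {
    "0": {"content": "<unk>", "special": true},
    "1": {"content": "<s>", "special": true},
    "2": {"content": "</s>", "special": true},
    "3": {"content": "<pad>", "special": true}
  },
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>"
}
"""

config = json.loads(config_text)

# JSON object keys are always strings, so the IDs need an explicit int().
token_to_id = {
    entry["content"]: int(token_id)
    for token_id, entry in config["added_tokens_decoder"].items()
}
named_ids = {
    name: token_to_id[config[name]]
    for name in ("bos_token", "eos_token", "pad_token", "unk_token")
}

print(named_ids)  # {'bos_token': 1, 'eos_token': 2, 'pad_token': 3, 'unk_token': 0}
```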