<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Glossary
This glossary defines general machine learning and 🤗 Transformers terms to help you better understand the documentation.
## A
### attention mask
The attention mask is an optional argument used when batching sequences together.
<Youtube id="M6adb1j2jPI"/>
This argument indicates to the model which tokens should be attended to, and which should not.

For example, consider these two sequences:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
```
The encoded versions have different lengths:
```python
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
```
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length of the second one, or the second one needs to be truncated down to the length of the first one.

In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask it to pad like this:
```python
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
```
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
```python
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
```
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating the position of the padded indices so that the model does not attend to them. For the [`BertTokenizer`], `1` indicates a value that should be attended to, while `0` indicates a padded value. This attention mask is in the dictionary returned by the tokenizer under the key "attention_mask":
```python
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
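
As a minimal sketch (assuming PyTorch is installed and reusing `padded_sequences` from above), the padded IDs and the mask can be converted into tensors and passed to a model together:

```python
>>> import torch
>>> from transformers import BertModel

>>> model = BertModel.from_pretrained("google-bert/bert-base-cased")
>>> input_ids = torch.tensor(padded_sequences["input_ids"])
>>> attention_mask = torch.tensor(padded_sequences["attention_mask"])
>>> # the mask tells the model to ignore the padded positions of the first sequence
>>> outputs = model(input_ids, attention_mask=attention_mask)
```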
### autoencoding models
See [encoder models](#encoder-models) and [masked language modeling](#masked-language-modeling-mlm).
### autoregressive models
See [causal language modeling](#causal-language-modeling) and [decoder models](#decoder-models).
## B
### backbone
The backbone is the network (embeddings and layers) that outputs the raw hidden states or features. It is usually connected to a [head](#head), which accepts the features as its input to make a prediction. For example, [`ViTModel`] is a backbone without a specific head on top. Other models can also use [`ViTModel`] as a backbone, such as [DPT](model_doc/dpt).
## C
### causal language modeling
A pretraining task where the model reads the texts in order and has to predict the next word. It is usually done by reading the whole sentence, but using a mask inside the model to hide the future tokens at a certain timestep.
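
As an illustrative sketch (assuming the `openai-community/gpt2` checkpoint), passing the input IDs as `labels` lets a causal language model compute the next-word prediction loss; the one-position shift of the labels happens inside the model:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> inputs = tokenizer("This glossary defines machine learning terms", return_tensors="pt")
>>> # the labels are the inputs themselves; the model shifts them internally
>>> outputs = model(**inputs, labels=inputs["input_ids"])
>>> outputs.loss  # next-word prediction loss
```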
### channel
Color images are made up of some combination of values in three channels: red, green, and blue (RGB), while grayscale images only have one channel. In 🤗 Transformers, the channel can be the first or last dimension of an image's tensor: [`n_channels`, `height`, `width`] or [`height`, `width`, `n_channels`].
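
For instance, a minimal sketch in PyTorch showing the two conventions for the same image tensor:

```python
>>> import torch

>>> channels_first = torch.rand(3, 224, 224)  # [n_channels, height, width]
>>> channels_last = channels_first.permute(1, 2, 0)  # [height, width, n_channels]
>>> channels_first.shape, channels_last.shape
(torch.Size([3, 224, 224]), torch.Size([224, 224, 3]))
```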
### connectionist temporal classification (CTC)
An algorithm that allows a model to learn without knowing exactly how the input and output are aligned; CTC calculates the distribution of all possible outputs for a given input and chooses the most likely output from it. CTC is commonly used in speech recognition tasks because speech doesn't always cleanly align with the transcript, for a variety of reasons such as a speaker's different speech rates.
### convolution
A type of layer in a neural network where the input matrix is multiplied element-wise by a smaller matrix (kernel or filter) and the values are summed up in a new matrix. This is known as a convolutional operation, which is repeated over the entire input matrix, with each operation applied to a different segment of the input matrix. Convolutional neural networks (CNNs) are commonly used in computer vision.
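
As a small sketch of the operation (using `torch.nn.Conv2d`; the sizes here are illustrative):

```python
>>> import torch
>>> from torch import nn

>>> # a 3x3 kernel slides over a 3-channel image and produces 16 feature maps
>>> conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
>>> image = torch.rand(1, 3, 224, 224)  # [batch, channels, height, width]
>>> conv(image).shape
torch.Size([1, 16, 222, 222])
```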
## D
### decoder input IDs
This input is specific to encoder-decoder models and contains the input IDs that will be fed to the decoder. These inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually built in a way specific to each model.

Most encoder-decoder models (BART, T5) create their `decoder_input_ids` on their own from the `labels`. In such models, passing the `labels` is the preferred way to handle training.

Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
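
For example, a minimal sketch with T5 (assuming the `t5-small` checkpoint): passing `labels` is enough, and the model derives the `decoder_input_ids` internally:

```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
>>> inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
>>> labels = tokenizer("Das Haus ist wunderbar.", return_tensors="pt").input_ids
>>> # decoder_input_ids are built from the labels inside the model
>>> outputs = model(**inputs, labels=labels)
```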
### decoder models
Also referred to as autoregressive models, decoder models involve a pretraining task (causal language modeling) where the model reads the texts in order and has to predict the next word. It is usually done by reading the whole sentence with a mask to hide future tokens at a certain timestep.
<Youtube id="d_ixlCubqQw"/>
### deep learning (DL)
Machine learning algorithms which use neural networks with several layers.
## E
### encoder models
Also known as autoencoding models, encoder models take an input (such as text or images) and transform it into a condensed numerical representation called an embedding. Oftentimes, encoder models are pretrained using techniques like [masked language modeling](#masked-language-modeling-mlm), which masks parts of the input sequence and forces the model to create more meaningful representations.
<Youtube id="H39Z_720T5s"/>
## F
### feature extraction
The process of selecting and transforming raw data into a set of features that are more informative and useful for machine learning algorithms. Some examples of feature extraction include transforming raw text into word embeddings and extracting important features such as edges or shapes from image/video data.
### feed forward chunking
In each residual attention block in transformers, the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g., for `google-bert/bert-base-uncased`).

For an input of size `[batch_size, sequence_length]`, the memory required to store the intermediate feed forward embeddings `[batch_size, sequence_length, config.intermediate_size]` can account for a large fraction of the memory use. The authors of [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) noticed that since the computation is independent of the `sequence_length` dimension, it is mathematically equivalent to compute the output embeddings of both feed forward layers `[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n` individually and concatenate them afterward to `[batch_size, sequence_length, config.hidden_size]` with `n = sequence_length`, which trades increased computation time against reduced memory use, but yields a mathematically equivalent result.

For models employing the function [`apply_chunking_to_forward`], the `chunk_size` defines the number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time complexity. If `chunk_size` is set to 0, no feed forward chunking is done.
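
A minimal sketch of the mechanism (assuming [`apply_chunking_to_forward`] is importable from `transformers.pytorch_utils`; the toy forward function stands in for a real feed forward layer):

```python
>>> import torch
>>> from transformers.pytorch_utils import apply_chunking_to_forward

>>> def feed_forward(hidden_states):
...     # stand-in for the two feed forward layers; preserves the tensor shape
...     return torch.relu(hidden_states)

>>> hidden_states = torch.rand(1, 512, 768)  # [batch_size, sequence_length, hidden_size]
>>> # chunk_size=128 processes the sequence dimension (dim 1) in 4 chunks of 128
>>> output = apply_chunking_to_forward(feed_forward, 128, 1, hidden_states)
>>> output.shape
torch.Size([1, 512, 768])
```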
### finetuned models
Finetuning is a form of transfer learning which involves taking a pretrained model, freezing its weights, and replacing the output layer with a newly added [model head](#head). The model head is then trained on your target dataset.

See the [Fine-tune a pretrained model](https://huggingface.co/docs/transformers/training) tutorial for more details, and learn how to fine-tune models with 🤗 Transformers.
## H
### head
The model head refers to the last layer of a neural network that accepts the raw hidden states and projects them onto a different dimension. There is a different model head for each task. For example:

  * [`GPT2ForSequenceClassification`] is a sequence classification head - a linear layer - on top of the base [`GPT2Model`].
  * [`ViTForImageClassification`] is an image classification head - a linear layer on top of the final hidden state of the `CLS` token - on top of the base [`ViTModel`].
  * [`Wav2Vec2ForCTC`] is a language modeling head with [CTC](#connectionist-temporal-classification-ctc) on top of the base [`Wav2Vec2Model`].
## I
### image patch
Vision-based Transformers models split an image into smaller patches which are linearly embedded, and then passed as a sequence to the model. You can find the `patch_size` (or resolution) of the model in its configuration.
### inference
Inference is the process of evaluating a model on new data after training is complete. See the [Pipeline for inference](https://huggingface.co/docs/transformers/pipeline_tutorial) tutorial to learn how to perform inference with 🤗 Transformers.
### input IDs
The input IDs are often the only required parameters to be passed to the model as input. They are token indices: numerical representations of the tokens building the sequences that will be used as input by the model.
<Youtube id="VFp38yj8h3A"/>
Each tokenizer works differently, but the underlying mechanism remains the same. Here's an example using the BERT tokenizer, which is a [WordPiece](https://arxiv.org/pdf/1609.08144.pdf) tokenizer:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"
```
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary:
```python
>>> tokenized_sequence = tokenizer.tokenize(sequence)
```
The tokens are either words or subwords. Here, for instance, "VRAM" wasn't in the model vocabulary, so it's been split into "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix is added for "RA" and "M":
```python
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding the sentence to the tokenizer, which leverages the Rust implementation of [🤗 Tokenizers](https://github.com/huggingface/tokenizers) for peak performance:
```python
>>> inputs = tokenizer(sequence)
```
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The token indices are under the key `input_ids`:
```python
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
```
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them), which are special IDs the model sometimes uses.

If we decode the previous sequence of IDs,
```python
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
```
we will see
```python
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
```
because this is the way a [`BertModel`] is going to expect its inputs.
## L
### Labels
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels should be the expected prediction of the model: it will use the standard loss in order to compute the loss between its predictions and the expected value (the label).

These labels are different according to the model head, for example:

- For sequence classification models ([`BertForSequenceClassification`]), the model expects a tensor of dimension `(batch_size)` with each value of the batch corresponding to the expected label of the entire sequence.
- For token classification models ([`BertForTokenClassification`]), the model expects a tensor of dimension `(batch_size, seq_length)` with each value corresponding to the expected label of each individual token.
- For masked language modeling ([`BertForMaskedLM`]), the model expects a tensor of dimension `(batch_size, seq_length)` with each value corresponding to the expected label of each individual token: the labels being the token ID of the masked token, and values to be ignored for the rest (usually -100).
- For sequence to sequence tasks ([`BartForConditionalGeneration`], [`MBartForConditionalGeneration`]), the model expects a tensor of dimension `(batch_size, tgt_seq_length)` with each value corresponding to the target sequence associated with each input sequence. During training, both BART and T5 will make the appropriate `decoder_input_ids` and decoder attention masks internally; they usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework.
- For image classification models ([`ViTForImageClassification`]), the model expects a tensor of dimension `(batch_size)` with each value of the batch corresponding to the expected label of each individual image.
- For semantic segmentation models ([`SegformerForSemanticSegmentation`]), the model expects a tensor of dimension `(batch_size, height, width)` with each value of the batch corresponding to the expected label of each individual pixel.
- For object detection models ([`DetrForObjectDetection`]), the model expects a list of dictionaries with `class_labels` and `boxes` keys, where each value of the batch corresponds to the expected labels and bounding boxes of each individual image.
- For automatic speech recognition models ([`Wav2Vec2ForCTC`]), the model expects a tensor of dimension `(batch_size, target_length)` with each value corresponding to the expected label of each individual token.
<Tip>
Each model's labels may be different, so be sure to always check the documentation of each model for more information about their specific labels!
</Tip>
The base models ([`BertModel`]) do not accept labels, as these are the base transformer models, simply outputting features.
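
As a concrete sketch of the first case above (assuming a binary classification head initialized from the `google-bert/bert-base-cased` checkpoint; the label value is arbitrary), passing `labels` makes the model return its loss:

```python
>>> import torch
>>> from transformers import BertForSequenceClassification, BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> model = BertForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=2)
>>> inputs = tokenizer("This is a short sequence.", return_tensors="pt")
>>> labels = torch.tensor([1])  # one expected label per sequence in the batch
>>> outputs = model(**inputs, labels=labels)
>>> outputs.loss  # cross-entropy between the prediction and the label
```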
### large language models (LLM)
A generic term that refers to transformer language models (GPT-3, BLOOM, OPT) trained on a large quantity of data. These models also tend to have a large number of learnable parameters (e.g., 175 billion for GPT-3).
## M
### masked language modeling (MLM)
A pretraining task where the model sees a corrupted version of the texts, usually done by masking some tokens randomly, and has to predict the original text.
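
For instance, a minimal sketch with a BERT masked language modeling head (assuming the `google-bert/bert-base-cased` checkpoint):

```python
>>> from transformers import AutoModelForMaskedLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> logits = model(**inputs).logits
>>> # find the masked position and decode the model's top prediction for it
>>> mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
>>> tokenizer.decode(logits[0, mask_index].argmax(-1))
```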
### multimodal
A task that combines texts with another kind of inputs (for instance, images).
## N
### Natural language generation (NLG)
All tasks related to generating text (for instance, [Write With Transformers](https://transformer.huggingface.co/), translation).
### Natural language processing (NLP)
A generic way to say "deal with texts".
### Natural language understanding (NLU)
All tasks related to understanding what is in a text (for instance, classifying the whole text or individual words).
## P
### pipeline
A pipeline in 🤗 Transformers is an abstraction referring to a series of steps that are executed in a specific order to preprocess and transform data and return a prediction from a model. Some example stages found in a pipeline might be data preprocessing, feature extraction, and normalization.

For more details, see [Pipelines for inference](https://huggingface.co/docs/transformers/pipeline_tutorial).
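
A short usage sketch (the task string selects a default model, which is downloaded on first use; the printed score is illustrative):

```python
>>> from transformers import pipeline

>>> # tokenization, the model forward pass, and post-processing happen in one call
>>> classifier = pipeline("sentiment-analysis")
>>> classifier("We are very happy to show you the 🤗 Transformers library.")  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```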
### pixel values
A tensor of the numerical representations of an image that is passed to a model. The pixel values have a shape of [`batch_size`, `num_channels`, `height`, `width`], and are generated from an image processor.
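
For example, a sketch using an image processor (assuming the `google/vit-base-patch16-224` checkpoint and an example image from the COCO dataset):

```python
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoImageProcessor

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
>>> pixel_values = image_processor(image, return_tensors="pt").pixel_values
>>> pixel_values.shape  # [batch_size, num_channels, height, width]
torch.Size([1, 3, 224, 224])
```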
### pooling
An operation that reduces a matrix into a smaller matrix, either by taking the maximum or the average of the pooled dimension(s). Pooling layers are commonly found between convolutional layers to downsample the feature representation.
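
A minimal sketch with max pooling in PyTorch:

```python
>>> import torch
>>> from torch import nn

>>> # a 2x2 max pooling window with stride 2 halves the height and width
>>> pool = nn.MaxPool2d(kernel_size=2, stride=2)
>>> features = torch.rand(1, 16, 224, 224)
>>> pool(features).shape
torch.Size([1, 16, 112, 112])
```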
### position IDs
Contrary to RNNs, which have the position of each token embedded within them, transformers are unaware of the position of each token. Therefore, the position IDs (`position_ids`) are used by the model to identify each token's position in the list of tokens.

They are an optional parameter. If no `position_ids` are passed to the model, the IDs are automatically created as absolute positional embeddings.

Absolute positional embeddings are selected in the range `[0, config.max_position_embeddings - 1]`. Some models use other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
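
As a sketch, absolute position IDs are simply the index of each token in the sequence, broadcast over the batch:

```python
>>> import torch

>>> batch_size, seq_length = 2, 6
>>> position_ids = torch.arange(seq_length).unsqueeze(0).expand(batch_size, -1)
>>> position_ids
tensor([[0, 1, 2, 3, 4, 5],
        [0, 1, 2, 3, 4, 5]])
```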
### preprocessing
The task of preparing raw data into a format that can be easily consumed by machine learning models. For example, text is typically preprocessed by tokenization. To gain a better idea of what preprocessing looks like for other input types, check out the [Preprocess](https://huggingface.co/docs/transformers/preprocessing) tutorial.
### pretrained model
A model that has been pretrained on some data (for instance, all of Wikipedia). Pretraining methods involve a self-supervised objective, which can be reading the text and trying to predict the next word (see [causal language modeling](#causal-language-modeling)) or masking some words and trying to predict them (see [masked language modeling](#masked-language-modeling-mlm)).

Speech and vision models have their own pretraining objectives. For example, Wav2Vec2 is a speech model pretrained on a contrastive task which requires the model to identify the "true" speech representation from a set of "false" speech representations. On the other hand, BEiT is a vision model pretrained on a masked image modeling task which masks some of the image patches and requires the model to predict the masked patches (similar to the masked language modeling objective).
## R
### recurrent neural network (RNN)
A type of model that uses a loop over a layer to process texts.
### representation learning
A subfield of machine learning which focuses on learning meaningful representations of raw data. Some examples of representation learning techniques include word embeddings, autoencoders, and Generative Adversarial Networks (GANs).
## S
### sampling rate
A measurement in hertz of the number of samples (e.g., of an audio signal) taken per second. The sampling rate is a result of discretizing a continuous signal such as speech.
### self-attention
Each element of the input finds out which other elements of the input it should attend to.
### self-supervised learning
A category of machine learning techniques in which a model creates its own learning objective from unlabeled data. It differs from [unsupervised learning](#unsupervised-learning) and [supervised learning](#supervised-learning) in that the learning process is supervised, but not explicitly by the user.

One example of self-supervised learning is [masked language modeling](#masked-language-modeling-mlm), where a model is passed sentences with a proportion of their tokens removed and learns to predict the missing tokens.
### semi-supervised learning
A broad category of machine learning training techniques that leverages a small amount of labeled data with a larger quantity of unlabeled data to improve the accuracy of a model, unlike [supervised learning](#supervised-learning) and [unsupervised learning](#unsupervised-learning). An example of a semi-supervised learning approach is "self-training", in which a model is trained on labeled data and then used to make predictions on the unlabeled data. The portion of the unlabeled data that the model predicts with the most confidence gets added to the labeled dataset and used to retrain the model.
### sequence-to-sequence (seq2seq)
Models that generate a new sequence from an input, like translation models or summarization models (such as [Bart](model_doc/bart) or [T5](model_doc/t5)).
### stride
In [convolution](#convolution) or [pooling](#pooling), the stride refers to the distance the kernel is moved over a matrix. A stride of 1 means the kernel is moved one pixel over at a time, and a stride of 2 means the kernel is moved two pixels over at a time.
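
For example, a stride of 2 in a convolution halves the output resolution (a sketch with `torch.nn.Conv2d`):

```python
>>> import torch
>>> from torch import nn

>>> # the 3x3 kernel moves two pixels at a time
>>> conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2)
>>> conv(torch.rand(1, 3, 224, 224)).shape
torch.Size([1, 16, 111, 111])
```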
### supervised learning
A form of model training that directly uses labeled data to correct and instruct model performance. Data is fed into the model being trained, and its predictions are compared to the known labels. The model updates its weights based on how incorrect its predictions were, and the process is repeated to optimize model performance.
## T
### token
A part of a sentence, usually a word, but can also be a subword (non-common words are often split into subwords) or a punctuation symbol.
### token Type IDs
Some models' purpose is to do classification on pairs of sentences or question answering.
<Youtube id="0u3ioSwev3s"/>
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the classifier (`[CLS]`) and separator (`[SEP]`) tokens. For example, the BERT model builds its two sequence input as such:
```python
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
```
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to `tokenizer` as two arguments (and not a list, like before), like this:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
```
which will return:
```python
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
```
This is enough for some models to understand where one sequence ends and where another begins. However, other models, such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary mask identifying the two types of sequence in the model.

The tokenizer returns this mask as the "token_type_ids" entry:
```python
>>> encoded_dict["token_type_ids"]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
The first sequence, the "context" used for the question, has all its tokens represented by a `0`, whereas the second sequence, corresponding to the question, has all its tokens represented by a `1`.

Some models, like [`XLNetModel`], use an additional token represented by a `2`.
### transfer learning
A technique that involves taking a pretrained model and adapting it to a dataset specific to your task. Instead of training a model from scratch, you can leverage knowledge obtained from an existing model as a starting point. This speeds up the learning process and reduces the amount of training data needed.
### transformer
Self-attention based deep learning model architecture.
## U
### unsupervised learning
A form of model training in which data provided to the model is not labeled. Unsupervised learning techniques leverage statistical information of the data distribution to find patterns useful for the task at hand.
|