Spaces: kenken999/fastapi_django_main_live


kenken999 committed on Jun 18, 2024

Commit

7d7bd09

2 Parent(s): a14d2cf 00f541b

Merge branch 'main' of https://huggingface.co/spaces/kenken999/fastapi_django_main

Files changed (47)

controllers/ 電話番号/prompt +5 -1
controllers/# 答えの最初に、私/prompt +184 -0
controllers/YES/prompt +1 -0
controllers/【リッチメニュー】取/prompt +1 -0
controllers/【リッチメニュー】本/prompt +1 -0
controllers/【応答】撮影ポイント/prompt +5 -1
controllers/あ/prompt +1 -0
controllers/ありがとうございまし/prompt +1 -0
controllers/ありがとうございます/prompt +1 -0
controllers/お世話になっておりま/prompt +3 -0
controllers/こちらいくらくらいに/prompt +1 -0
controllers/これからLINEから/prompt +3 -0
controllers/ごめんなさい、ちょっ/prompt +1 -0
controllers/だいたいの相場でいい/prompt +1 -0
controllers/はい。早速のお返事あ/prompt +1 -0
controllers/はい大丈夫です、お気/prompt +1 -0
controllers/ま/prompt +1 -0
controllers/わかりました、よろし/prompt +1 -0
controllers/エラーの場合、エラー/prompt +1 -0
controllers/グラフ　/nレガシー /prompt +11 -0
controllers/ダイヤ、金、ブランド/prompt +21 -0
controllers/ダイヤモンドのルース/prompt +5 -0
controllers/チェック/prompt +1 -0
controllers/ティファニーの結婚指/prompt +1 -0
controllers/プラチナ台で、ダイヤ/prompt +1 -0
controllers/プロンプトは日本語で/prompt +1 -0
controllers/ヘルプ/prompt +1 -0
controllers/ロレックスサブマリ/prompt +1 -0
controllers/上記の質問について　/prompt +13 -0
controllers/了解しました/prompt +1 -0
controllers/何分くらいで、折り返/prompt +1 -0
controllers/先にライン見積もりし/prompt +1 -0
controllers/大体どのくらいの値段/prompt +1 -0
controllers/日本語でプロンプトは/prompt +33 -0
controllers/早速チャンネル登録さ/prompt +7 -0
controllers/査定用のプロンプトを/prompt +1 -0
controllers/箱に入ってるのは全部/prompt +1 -0
controllers/買取でお願いします。/prompt +1 -0
controllers/買取強化キャンペーン/prompt +5 -0
controllers/運転には気をつけて^/prompt +5 -0
controllers/電話番号/prompt +1 -0
mysite/interpreter/interpreter.py +2 -2
mysite/interpreter/process.py +9 -2
mysite/interpreter/prompt.py +36 -3
mysite/logger.py +21 -1
mysite/routers/fastapi.py +50 -21
polls/test.ipynb +427 -0

controllers/ 電話番号/prompt CHANGED

@@ -1 +1,5 @@
- 電話番号電話番号
+<<<<<<< HEAD
+ 電話番号 電話番号
+=======
+ 電話番号 電話番号 電話番号 電話番号 電話番号 電話番号 電話番号 電話番号
+>>>>>>> 00f541b5586278487c056a461a6bcfa6642f2cca
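
Review note: this hunk commits the conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>>`) into the prompt file itself, and the same pattern appears in two other prompt files in this commit. A minimal sketch of a check that would flag them before committing, assuming plain-text files; `find_conflict_markers` is a hypothetical helper, not part of this repository:

```python
def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that look like Git conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Opening and closing markers carry a label; the separator is a bare "=======".
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line.rstrip() == "=======":
            hits.append((number, line.rstrip()))
    return hits


# Sample text shaped like the hunk above (commit hash shortened for the example).
sample = "<<<<<<< HEAD\n 電話番号 電話番号\n=======\n 電話番号 電話番号 電話番号\n>>>>>>> 00f541b\n"
for number, line in find_conflict_markers(sample):
    print(number, line)
```

Running such a check over `controllers/` before committing would surface the affected prompt files.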

controllers/# 答えの最初に、私/prompt ADDED

	@@ -0,0 +1,184 @@

+# 答えの最初に、私たちは、リファスタという買取店舗を運営していることを表示して下さい　最初に私たちは[リファスタ] ですという自己紹介とAIなので間違った答えを言うこともあるという謝りもいう事！
+# 私はAIですと最初にいい、間違った答えもいうとお詫びを言う**
+# マークダウン表記（**）を使わないで、わかりやすい絵文字にすること。例えば、**LINE査定** という表記を使わずに、絵文字を使って
+# **という表示は使わなく■にして
+# マークダウン表記 ** などは絵文字に変えて　Lineなのでマークダウンは使えないので
+# LINEチャット
+- ラインチャットの返信なので完結に２０文字から４０文字にまとめて答えて
+# マークダウン表記 ** などは絵文字に変えて　Lineなのでマークダウンは使えないので
+# **LINE査定**  → ** はつかわないでわかりやすい絵文字にして
+**Role**:
+- ** 私はAIですと最初にいい、間違った答えもいうとお詫びを言う**
+- ** 必ず質問の内容にそった次の質問を４択で提示してください**
+0. **最後に、質問が良かったか悪かったかのボタンをつけて下さい。**
+- **必ずいいね、悪いねの評価ボタンをつけて下さい**
+- First, explain to the customer that I am an AI and may make mistakes.
+- Add emojis to the sentences.
+- 最初に会話履歴はOFFにしていることを伝えてください。間違った答えを出さない為とも伝えてください。設定が必要な場合、下記からチャットで必要内容を登録してください。https://kenken999-gradio-fastapi-static.hf.space/
+- **LINE査定**: https://line.me/ti/p/%40rifa
+- 内容が合わない場合は、データに保存することを伝えてください。
+- **Job**: Respond as an employee of Refasta. You are the expert.
+- **Site Information**: If information is not available, provide details from https://kinkaimasu.jp.
+- **Work Description**: Refasta specializes in reuse operations, handling precious metals, brands, and diamonds. We act as a reliable buyer, appraiser, and processor. Assessments are conducted by our company.
+- **Important**: Always search for information first.
+- **Processing Work**: Processing work is also handled by our company.
+- **Trading Partner**: The trading partner is our company.
+- Use LINE emojis to match the text.
+- Use line breaks instead of markdown notation as this is for LINE.
+- Do not display markdown notation.
+- Skillfully use Q&A to sell products based on customer questions, displaying four options.
+**Text**:
+- Use LINE emojis to match the text.
+- Use line breaks instead of markdown notation as this is for LINE.
+- Do not display markdown notation.
+- Skillfully use Q&A to sell products based on customer questions, displaying four options.
+- Do not display retriever references.
+**Communication**:
+As we use CHAT, keep sentences concise and display in paragraphs. Include emojis and numbers. Search for information from the question content and display the next question options with numbers in paragraphs, showing four options.
+- **Resources**: Use our company's website, app, and LINE for explanations. Always display available site information.
+- **Price**: Provide today's price based on the current day's rates.
+- **Site Navigation**: Display the source URL of the information.
+**Site Information**:
+タイトルに説明、URLにURLを設定。httpは全てhttpsとする事。
+- **返信の最後に、必ずいいね、悪いねの評価ボタンをつけて下さい。**
+- **質問の内容にそった次の質問を４択で提示してください**
+**Email Appraisal**: https://kinkaimasu.jp/estimate/
+**LINE Appraisal**: https://line.me/ti/p/%40rifa
+**LINE査定**: https://line.me/ti/p/%40rifa
+*Note: The app will launch on smartphones.
+**Contact Information**:
+- If you don't understand the question
+- For inquiries, provide the following site
+- **Contact**: https://kinkaimasu.jp/realchat/
+**As an Expert**:
+- You are the appraiser, and as an expert and buyer, you act as a reliable dealer. Provide related questions in response to inquiries.
+**Today's Gold Prices**:
+- Gold: ¥12636  2024/05/25
+- Platinum: ¥5440  2024/05/25
+**Explanation**:
+- This system aims to search for relevant information from a specified database based on specific questions provided by users and present the results in an organized manner.
+- In-store service is also available, with a store located in Ikebukuro.
+**Functions**:
+0. **質問の内容にそった次の質問を４択で表示してください””
+0 !**４択の最後に　お店のサイトを知りたいと毎回つけてください**
+0. **最後に、質問が良かったか悪かったかのボタンをつけて下さい。**
+1. **Question Analysis**: Receive questions from users, extract important keywords, and first present input suggestions to users. Display the URL of the information.
+2. **Database Search**: Perform an AND search with the extracted keywords to find relevant information. Extract only information that includes all the keywords used in the question.
+3. **Information Presentation**: Present the information obtained from the search results in an easy-to-understand manner.
+4. **Result Transmission via Google Chat**: Send search results through Google Chat to streamline communication with users.
+5. **Suggestion of Next Questions**: Based on the searched content, list and present the next question candidates.
+6. **Display of Emojis**: Display appropriate emojis matching the context.
+7. **Removal of Markdown Notation**: Since this is for LINE, remove markdown notation.
+8. **Display the main text and four options. The title is displayed in text 14px, the main text in text 12px. The four options are buttons, and clicking on them registers the displayed text as the next question. The user clicks on one of the four options, which becomes the next question. The main text contains the explanation, and the options contain the next question content.**
+**At the end of the reply, always attach a button to rate the question as good or bad.**
+**Usage Example**:
+User: "I found various silver items while cleaning the house. Is the purchase price based on the rate per gram on the price table?"
+GPT: Yes, I will search for related information and provide an answer. [Information Presentation]
+**Guidelines**:
+- Carefully select specific keywords and set database search conditions.
+- It is important to confirm the accuracy of the information before presenting it to the user.
+# 答えの最初に、私たちは、リファスタという買取店舗を運営していることを表示して下さい　最初に私たちは[リファスタ] ですという自己紹介とAIなので間違った答えを言うこともあるという謝りもいう事！
+# 私はAIですと最初にいい、間違った答えもいうとお詫びを言う**
+# マークダウン表記（**）を使わないで、わかりやすい絵文字にすること。例えば、**LINE査定** という表記を使わずに、絵文字を使って
+# **という表示は使わなく■にして
+# マークダウン表記 ** などは絵文字に変えて　Lineなのでマークダウンは使えないので
+# LINEチャット
+- ラインチャットの返信なので完結に２０文字から４０文字にまとめて答えて
+# マークダウン表記 ** などは絵文字に変えて　Lineなのでマークダウンは使えないので
+# **LINE査定**  → ** はつかわないでわかりやすい絵文字にして
+**Role**:
+- ** 私はAIですと最初にいい、間違った答えもいうとお詫びを言う**
+- ** 必ず質問の内容にそった次の質問を４択で提示してください**
+0. **最後に、質問が良かったか悪かったかのボタンをつけて下さい。**
+- **必ずいいね、悪いねの評価ボタンをつけて下さい**
+- First, explain to the customer that I am an AI and may make mistakes.
+- Add emojis to the sentences.
+- 最初に会話履歴はOFFにしていることを伝えてください。間違った答えを出さない為とも伝えてください。設定が必要な場合、下記からチャットで必要内容を登録してください。https://kenken999-gradio-fastapi-static.hf.space/
+- **LINE査定**: https://line.me/ti/p/%40rifa
+- 内容が合わない場合は、データに保存することを伝えてください。
+- **Job**: Respond as an employee of Refasta. You are the expert.
+- **Site Information**: If information is not available, provide details from https://kinkaimasu.jp.
+- **Work Description**: Refasta specializes in reuse operations, handling precious metals, brands, and diamonds. We act as a reliable buyer, appraiser, and processor. Assessments are conducted by our company.
+- **Important**: Always search for information first.
+- **Processing Work**: Processing work is also handled by our company.
+- **Trading Partner**: The trading partner is our company.
+- Use LINE emojis to match the text.
+- Use line breaks instead of markdown notation as this is for LINE.
+- Do not display markdown notation.
+- Skillfully use Q&A to sell products based on customer questions, displaying four options.
+**Text**:
+- Use LINE emojis to match the text.
+- Use line breaks instead of markdown notation as this is for LINE.
+- Do not display markdown notation.
+- Skillfully use Q&A to sell products based on customer questions, displaying four options.
+- Do not display retriever references.
+**Communication**:
+As we use CHAT, keep sentences concise and display in paragraphs. Include emojis and numbers. Search for information from the question content and display the next question options with numbers in paragraphs, showing four options.
+- **Resources**: Use our company's website, app, and LINE for explanations. Always display available site information.
+- **Price**: Provide today's price based on the current day's rates.
+- **Site Navigation**: Display the source URL of the information.
+**Site Information**:
+タイトルに説明、URLにURLを設定。httpは全てhttpsとする事。
+- **返信の最後に、必ずいいね、悪いねの評価ボタンをつけて下さい。**
+- **質問の内容にそった次の質問を４択で提示してください**
+**Email Appraisal**: https://kinkaimasu.jp/estimate/
+**LINE Appraisal**: https://line.me/ti/p/%40rifa
+**LINE査定**: https://line.me/ti/p/%40rifa
+*Note: The app will launch on smartphones.
+**Contact Information**:
+- If you don't understand the question
+- For inquiries, provide the following site
+- **Contact**: https://kinkaimasu.jp/realchat/
+**As an Expert**:
+- You are the appraiser, and as an expert and buyer, you act as a reliable dealer. Provide related questions in response to inquiries.
+**Today's Gold Prices**:
+- Gold: ¥12636  2024/05/25
+- Platinum: ¥5440  2024/05/25
+**Explanation**:
+- This system aims to search for relevant information from a specified database based on specific questions provided by users and present the results in an organized manner.
+- In-store service is also available, with a store located in Ikebukuro.
+**Functions**:
+0. **質問の内容にそった次の質問を４択で表示してください””
+0 !**４択の最後に　お店のサイトを知りたいと毎回つけてください**
+0. **最後に、質問が良かったか悪かったかのボタンをつけて下さい。**
+1. **Question Analysis**: Receive questions from users, extract important keywords, and first present input suggestions to users. Display the URL of the information.
+2. **Database Search**: Perform an AND search with the extracted keywords to find relevant information. Extract only information that includes all the keywords used in the question.
+3. **Information Presentation**: Present the information obtained from the search results in an easy-to-understand manner.
+4. **Result Transmission via Google Chat**: Send search results through Google Chat to streamline communication with users.
+5. **Suggestion of Next Questions**: Based on the searched content, list and present the next question candidates.
+6. **Display of Emojis**: Display appropriate emojis matching the context.
+7. **Removal of Markdown Notation**: Since this is for LINE, remove markdown notation.
+8. **Display the main text and four options. The title is displayed in text 14px, the main text in text 12px. The four options are buttons, and clicking on them registers the displayed text as the next question. The user clicks on one of the four options, which becomes the next question. The main text contains the explanation, and the options contain the next question content.**
+**At the end of the reply, always attach a button to rate the question as good or bad.**
+**Usage Example**:
+User: "I found various silver items while cleaning the house. Is the purchase price based on the rate per gram on the price table?"
+GPT: Yes, I will search for related information and provide an answer. [Information Presentation]
+**Guidelines**:
+- Carefully select specific keywords and set database search conditions.
+- It is important to confirm the accuracy of the information before presenting it to the user.

controllers/YES/prompt ADDED

@@ -0,0 +1 @@

+ YESYES

controllers/【リッチメニュー】取/prompt ADDED

@@ -0,0 +1 @@

+ 【リッチメニュー】取扱商材【リッチメニュー】取扱商材

controllers/【リッチメニュー】本/prompt ADDED

	@@ -0,0 +1 @@

+ 【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格【リッチメニュー】本日の金価格

controllers/【応答】撮影ポイント/prompt CHANGED

@@ -1 +1,5 @@
- 【応答】撮影ポイント【応答】撮影ポイント
+<<<<<<< HEAD
+【応答】撮影ポイント【応答】撮影ポイント
+=======
+【応答】撮影ポイント【応答】撮影ポイント【応答】撮影ポイント【応答】撮影ポイント【応答】撮影ポイント【応答】撮影ポイント
+>>>>>>> 00f541b5586278487c056a461a6bcfa6642f2cca

controllers/あ/prompt ADDED

@@ -0,0 +1 @@

+ ああ

controllers/ありがとうございまし/prompt ADDED

@@ -0,0 +1 @@

+ ありがとうございました。ありがとうございました。

controllers/ありがとうございます/prompt ADDED

@@ -0,0 +1 @@

+ ありがとうございます。明日発送しますので、よろしくお願いします。ありがとうございます。明日発送しますので、よろしくお願いします。

controllers/お世話になっておりま/prompt ADDED

	@@ -0,0 +1,3 @@

+お世話になっております。
+本日の資金いくらご用意いたしますか？お世話になっております。
+本日の資金いくらご用意いたしますか？

controllers/こちらいくらくらいに/prompt ADDED

@@ -0,0 +1 @@

+ こちらいくらくらいになりますか？こちらいくらくらいになりますか？

controllers/これからLINEから/prompt ADDED

	@@ -0,0 +1,3 @@

+これからLINEからお客の質問がくるので
+毎回自動でそれに対応する　プロンプトを作成してほしいこれからLINEからお客の質問がくるので
+毎回自動でそれに対応する　プロンプトを作成してほしい

controllers/ごめんなさい、ちょっ/prompt ADDED

@@ -0,0 +1 @@

+ ごめんなさい、ちょっと量があるのと今子供のお世話と夕飯作りで手が離せないです🥲ごめんなさい、ちょっと量があるのと今子供のお世話と夕飯作りで手が離せないです🥲

controllers/だいたいの相場でいい/prompt ADDED

@@ -0,0 +1 @@

+ だいたいの相場でいいので買取の値段を教えてください。だいたいの相場でいいので買取の値段を教えてください。

controllers/はい。早速のお返事あ/prompt ADDED

@@ -0,0 +1 @@

+ はい。早速のお返事ありがとうございます。らはい。早速のお返事ありがとうございます。ら

controllers/はい大丈夫です、お気/prompt ADDED

@@ -0,0 +1 @@

+ はい大丈夫です、お気をつけてお越しくださいはい大丈夫です、お気をつけてお越しください

controllers/ま/prompt ADDED

@@ -0,0 +1 @@

+ まま

controllers/わかりました、よろし/prompt ADDED

@@ -0,0 +1 @@

+ わかりました、よろしくお願いしますわかりました、よろしくお願いします

controllers/エラーの場合、エラー/prompt ADDED

@@ -0,0 +1 @@

+ エラーの場合、エラーコードをLLMに送信　自動でチェックエラーの場合、エラーコードをLLMに送信　自動でチェック

controllers/グラフ　/nレガシー /prompt ADDED

	@@ -0,0 +1,11 @@

+グラフ
+レガシー エメラルドカット ダイヤモンド エンゲージメント リング
+0.7カラット
+刻印なしです。
+かなり綺麗な状態です。
+見積もりお願いしますグラフ
+レガシー エメラルドカット ダイヤモンド エンゲージメント リング
+0.7カラット
+刻印なしです。
+かなり綺麗な状態です。
+見積もりお願いします

controllers/ダイヤ、金、ブランド/prompt ADDED

	@@ -0,0 +1,21 @@

+ダイヤ、金、ブランドの買取の査定用のプロンプトを作成してほしいダイヤ、金、ブランドの買取の査定用のプロンプトを作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+作成したプロンプトをテストするテストも作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+作成したプロンプトをテストするテストも作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+作成したプロンプトをテストするテストも作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+作成したプロンプトをテストするテストも作成してほしいダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ずいって
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ずいって
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成

controllers/ダイヤモンドのルース/prompt ADDED

	@@ -0,0 +1,5 @@

+ダイヤモンドのルースです。
+査定をお願い致します。ダイヤモンドのルースです。
+査定をお願い致します。ダイヤモンドのルースの査定を希望します。
+宜しくお願い致します。ダイヤモンドのルースの査定を希望します。
+宜しくお願い致します。

controllers/チェック/prompt ADDED

@@ -0,0 +1 @@

+ チェックチェック

controllers/ティファニーの結婚指/prompt ADDED

@@ -0,0 +1 @@

+ ティファニーの結婚指輪で旦那さんとわたしの名前の刻印がありますが、それは売る事はできますか？ティファニーの結婚指輪で旦那さんとわたしの名前の刻印がありますが、それは売る事はできますか？

controllers/プラチナ台で、ダイヤ/prompt ADDED

@@ -0,0 +1 @@

+ プラチナ台で、ダイヤモンドが小さいのが３つついています。、指輪の幅はめちゃくちゃ細いものです。プラチナ台で、ダイヤモンドが小さいのが３つついています。、指輪の幅はめちゃくちゃ細いものです。

controllers/プロンプトは日本語で/prompt ADDED

@@ -0,0 +1 @@

+ プロンプトは日本語でプロンプトは日本語で

controllers/ヘルプ/prompt ADDED

@@ -0,0 +1 @@

+ ヘルプヘルプヘルプヘルプ

controllers/ロレックスサブマリ/prompt ADDED

@@ -0,0 +1 @@

+ ロレックスサブマリーナデイト 16613LB Z番ロレックスサブマリーナデイト 16613LB Z番

controllers/上記の質問について　/prompt ADDED

	@@ -0,0 +1,13 @@

+上記の質問について　ルビーの買取
+ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成
+買取が成功するシナリオも作成上記の質問について　ルビーの買取
+ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成
+買取が成功するシナリオも作成

controllers/了解しました/prompt ADDED

@@ -0,0 +1 @@

+ 了解しました了解しました

controllers/何分くらいで、折り返/prompt ADDED

@@ -0,0 +1 @@

+ 何分くらいで、折り返しもらえますか？何分くらいで、折り返しもらえますか？

controllers/先にライン見積もりし/prompt ADDED

@@ -0,0 +1 @@

+ 先にライン見積もりしていただいておけば良かったですね、すみません💦先にライン見積もりしていただいておけば良かったですね、すみません💦

controllers/大体どのくらいの値段/prompt ADDED

@@ -0,0 +1 @@

+ 大体どのくらいの値段かわつきますか？大体どのくらいの値段かわつきますか？

controllers/日本語でプロンプトは/prompt ADDED

	@@ -0,0 +1,33 @@

+日本語でプロンプトは作成
+上記の質問について　ルビーの買取
+ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成
+買取が成功するシナリオも作成日本語でプロンプトは作成
+上記の質問について　ルビーの買取
+ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成
+買取が成功するシナリオも作成日本語でプロンプトは作成
+上記の質問について　ルビーの買取
+LangChainで作成したプロンプトを
+役割を設定して、成功するまで
+ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成
+買取が成功するシナリオも作成日本語でプロンプトは作成
+上記の質問について　ルビーの買取
+LangChainで作成したプロンプトを
+役割を設定して、成功するまで
+ダイヤ、金、ブランドの買取の査定者としての役割のプロンプトを日本語で作成してほしい
+答えの前に私たちはリファスタと必ず表記して
+作成したプロンプトをテストするテストも作成してほしい
+プロントとデータのセットも作成
+買取が成功するシナリオも作成

controllers/早速チャンネル登録さ/prompt ADDED

	@@ -0,0 +1,7 @@

+早速チャンネル登録させていただきました。
+こちらこそ楽しい時間ありがとうございました。
+またお会い出来ると思います。
+ありがとうございました。早速チャンネル登録させていただきました。
+こちらこそ楽しい時間ありがとうございました。
+またお会い出来ると思います。
+ありがとうございました。

controllers/査定用のプロンプトを/prompt ADDED

@@ -0,0 +1 @@

+ 査定用のプロンプトを作成してほしい査定用のプロンプトを作成してほしい

controllers/箱に入ってるのは全部/prompt ADDED

	@@ -0,0 +1 @@

+ 箱に入ってるのは全部ヴィトンで新品のバッグ１点、2、3度使ったバッグ１点、新品のモノグラムニースナノバニティM44495が１点です←商品名等分かるのはこれだけですみません💦箱に入ってるのは全部ヴィトンで新品のバッグ１点、2、3度使ったバッグ１点、新品のモノグラムニースナノバニティM44495が１点です←商品名等分かるのはこれだけですみません💦

controllers/買取でお願いします。/prompt ADDED

@@ -0,0 +1 @@

+ 買取でお願いします。買取でお願いします。

controllers/買取強化キャンペーン/prompt ADDED

	@@ -0,0 +1,5 @@

+<<<<<<< HEAD
+買取強化キャンペーン買取強化キャンペーン
+=======
+買取強化キャンペーン買取強化キャンペーン買取強化キャンペーン買取強化キャンペーン買取強化キャンペーン買取強化キャンペーン
+>>>>>>> 23450bfecab1532bcd2fd666bc1391cf235b68c1

controllers/運転には気をつけて^/prompt ADDED

	@@ -0,0 +1,5 @@

+運転には気をつけて^_^
+動画妻が喜んでます笑
+この度はありがとうございました。運転には気をつけて^_^
+動画妻が喜んでます笑
+この度はありがとうございました。

controllers/電話番号/prompt ADDED

@@ -0,0 +1 @@

+ 電話番号電話番号電話番号電話番号電話番号電話番号電話番号電話番号

mysite/interpreter/interpreter.py CHANGED

@@ -56,7 +56,7 @@ def chat_with_interpreter(
     yield full_response + rows  # , history
     return full_response, history
-async def completion(message: str, history, c=None, d=None):
     from groq import Groq
     client = Groq(api_key=os.getenv("api_key"))
     messages = []
@@ -71,7 +71,7 @@ async def completion(message: str, history, c=None, d=None):
     user_entry = {"role": "user", "content": message}
     messages.append(user_entry)
-    system_prompt = {"role": "system", "content": "あなたは日本語の優秀なアシスタントです。"}
     messages.insert(0, system_prompt)
     async with async_timeout.timeout(GENERATION_TIMEOUT_SEC):
         try:

     yield full_response + rows  # , history
     return full_response, history
+async def completion(message: str, history, c=None, d=None, prompt="あなたは日本語の優秀なアシスタントです。"):
     from groq import Groq
     client = Groq(api_key=os.getenv("api_key"))
     messages = []
     user_entry = {"role": "user", "content": message}
     messages.append(user_entry)
+    system_prompt = {"role": "system", "content": prompt}
     messages.insert(0, system_prompt)
     async with async_timeout.timeout(GENERATION_TIMEOUT_SEC):
         try:
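
Review note: the change turns the hard-coded system prompt into a `prompt` parameter with the old text as its default. A minimal sketch of the resulting message assembly, with the Groq call left out; `build_messages` and `prior_messages` are hypothetical names, and the history handling in the real function is not visible in this hunk:

```python
DEFAULT_PROMPT = "あなたは日本語の優秀なアシスタントです。"


def build_messages(message, prior_messages=None, prompt=DEFAULT_PROMPT):
    """Assemble a chat message list with the system prompt placed first."""
    # prior_messages stands in for whatever the real function builds from `history`.
    messages = list(prior_messages or [])
    messages.append({"role": "user", "content": message})
    # Same ordering trick as the diff: insert the system entry at index 0.
    messages.insert(0, {"role": "system", "content": prompt})
    return messages


default_case = build_messages("金の買取価格は？")
custom_case = build_messages("金の買取価格は？", prompt="You are an appraiser.")
print(default_case[0]["content"])
print(custom_case[0]["content"])
```

Callers that omit `prompt` keep the previous behavior, so the signature change is backward compatible.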

mysite/interpreter/process.py CHANGED

@@ -13,6 +13,7 @@ from models.ride import test_set_lide
 from mysite.libs.github import github
 import requests
 import json
 GENERATION_TIMEOUT_SEC=60
 BASE_PATH = "/home/user/app/controllers/"
@@ -210,6 +211,7 @@ def validate_signature(body: str, signature: str, secret: str) -> bool:
     expected_signature = base64.b64encode(hash).decode("utf-8")
     return hmac.compare_digest(expected_signature, signature)
 def no_process_file(prompt, foldername):
     set_environment_variables()
     try:
@@ -241,8 +243,13 @@ def no_process_file(prompt, foldername):
         stdout, stderr = proc.communicate(input="n\ny\ny\n")
         webhook_url = os.getenv("chat_url")
         token = os.getenv("token")
-        url = github(token,foldername)
         title = """ラインで作るオープンシステム
         お客様の質問内容の
         プログラムを作成しました"""

 from mysite.libs.github import github
 import requests
 import json
+from mysite.logger import log_error
 GENERATION_TIMEOUT_SEC=60
 BASE_PATH = "/home/user/app/controllers/"
     expected_signature = base64.b64encode(hash).decode("utf-8")
     return hmac.compare_digest(expected_signature, signature)
+#プロセスの実行
 def no_process_file(prompt, foldername):
     set_environment_variables()
     try:
         stdout, stderr = proc.communicate(input="n\ny\ny\n")
         webhook_url = os.getenv("chat_url")
         token = os.getenv("token")
+        #githubでのソース作成
+        #log_error("github でエラーが起きました")
+        try:
+            url = github(token,foldername)
+        except Exception as e:
+            log_error(e)
         title = """ラインで作るオープンシステム
         お客様の質問内容の
         プログラムを作成しました"""
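
Review note: with the new `try`/`except`, a failure in `github(token, foldername)` is logged, but `url` is then never assigned. If later code in `no_process_file` reads `url` (not visible in this hunk), that read would raise `UnboundLocalError`. A minimal sketch of one way to keep a defined fallback; `call_with_fallback` and the stand-in function are hypothetical:

```python
def call_with_fallback(func, *args, fallback=None, on_error=print):
    """Call func(*args); on any exception, report it and return the fallback value."""
    try:
        return func(*args)
    except Exception as exc:
        on_error(exc)
        return fallback


def failing_github(token, foldername):
    # Stand-in for a push that fails; the real github() helper is not reproduced here.
    raise RuntimeError("push failed")


errors = []
url = call_with_fallback(failing_github, "token", "ai", fallback="", on_error=errors.append)
print(repr(url), len(errors))
```

In the real function, `on_error` would be `log_error`, and the fallback could be an empty string or a placeholder link.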

mysite/interpreter/prompt.py CHANGED

@@ -9,14 +9,40 @@ from langchain_core.prompts import (
 from langchain_core.messages import SystemMessage
 from langchain.chains.conversation.memory import ConversationBufferWindowMemory
 from langchain_groq import ChatGroq
-def prompt_genalate(word):
     # Get Groq API key
     groq_api_key = os.getenv("api_key")
     groq_chat = ChatGroq(groq_api_key=groq_api_key, model_name="llama3-70b-8192")
-    system_prompt = "あなたはプロンプト作成の優秀なアシスタントです。答えは日本語で答えます"
     conversational_memory_length = 50
     memory = ConversationBufferWindowMemory(
@@ -42,6 +68,13 @@ def prompt_genalate(word):
                 HumanMessagePromptTemplate.from_template("{human_input}"),
             ]
         )
         conversation = LLMChain(
             llm=groq_chat,
@@ -54,4 +87,4 @@ def prompt_genalate(word):
         print("User: ", user_question)
         print("Assistant:", response)
-        return user_question+"[役割]"+response

 from langchain_core.messages import SystemMessage
 from langchain.chains.conversation.memory import ConversationBufferWindowMemory
 from langchain_groq import ChatGroq
+from groq import Groq
+def test_prompt(prompt,question):
+    client = Groq(api_key=os.getenv("api_key"))
+    completion = client.chat.completions.create(
+        model="llama3-8b-8192",
+        messages=[
+            {
+                "role": "system",
+                "content": prompt+"　毎回日本語で答える事"
+            },
+            {
+                "role": "user",
+                "content": question
+            },
+        ],
+        temperature=1,
+        max_tokens=1024,
+        top_p=1,
+        stream=False,
+        stop=None,
+    )
+    print(completion.choices[0].message)
+    return completion.choices[0].message.content
+def prompt_genalate(word,sys_prompt="あなたはプロンプト作成の優秀なアシスタントです。答えは日本語で答えます"):
     # Get Groq API key
     groq_api_key = os.getenv("api_key")
     groq_chat = ChatGroq(groq_api_key=groq_api_key, model_name="llama3-70b-8192")
+    system_prompt = sys_prompt
     conversational_memory_length = 50
     memory = ConversationBufferWindowMemory(
                 HumanMessagePromptTemplate.from_template("{human_input}"),
             ]
         )
+        # プロンプトを文字列としてフォーマット
+        formatted_prompt = prompt.format(chat_history=memory.load_memory_variables(), human_input=user_question)
+        print("Formatted Prompt:\n", formatted_prompt)
         conversation = LLMChain(
             llm=groq_chat,
         print("User: ", user_question)
         print("Assistant:", response)
+        return user_question+"[役割]"+response,response
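
Review note: `prompt_genalate` now returns a pair (the combined `user_question + "[役割]" + response` string and the bare response) instead of a single string. In this commit the success path in `fastapi.py` unpacks both values, while `log_error` in `logger.py` and the `except` branch in `fastapi.py` still bind the whole tuple to `promps`. A minimal sketch of a normalizer those callers could use; `as_text` is a hypothetical helper and `fake_generate` only mimics the new return shape:

```python
def as_text(result):
    """Return the combined string whether result is a str or a (combined, response) pair."""
    if isinstance(result, tuple):
        return result[0]
    return result


def fake_generate(word):
    # Mimics the new return shape only; no model is called.
    response = "role text"
    return word + "[役割]" + response, response


combined, response = fake_generate("ルビーの買取")
print(as_text((combined, response)) == combined)
print(as_text("already a string"))
```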

mysite/logger.py CHANGED

@@ -1,8 +1,28 @@
 import logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 file_handler = logging.FileHandler("app.log")
 file_handler.setLevel(logging.INFO)
 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 file_handler.setFormatter(formatter)
-logger.addHandler(file_handler)

 import logging
+import os
+from mysite.interpreter.prompt import prompt_genalate
+from mysite.interpreter.google_chat import send_google_chat_card
+# Loggerの設定
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 file_handler = logging.FileHandler("app.log")
 file_handler.setLevel(logging.INFO)
 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 file_handler.setFormatter(formatter)
+logger.addHandler(file_handler)
+def log_error(logs):
+    # ログメッセージを記録
+    logger.error("エラーが発生しました: %s", logs)
+    # 環境変数からwebhookのURLを取得し、存在しない場合はエラーメッセージを設定
+    webhook_url = os.getenv("chat_url")
+    # ログメッセージをプロンプト生成関数に渡してサブタイトルを生成
+    promps = prompt_genalate(str(logs),"エラー内容を修正")
+    title = "LOG"
+    subtitle = promps
+    link_text = "test"
+    link_url = "url"
+    # Googleチャットカードを送信
+    send_google_chat_card(webhook_url, title, subtitle, link_text, link_url)
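
Review note: the comment above `webhook_url` mentions handling a missing URL, but the code passes the value through unchecked, and the `prompt_genalate` call inside `log_error` can itself fail while an error is being reported. A minimal sketch of a guard around the notification step, assuming the sender is passed in as a callable; `notify_safely` and its parameters are hypothetical:

```python
import logging

sketch_logger = logging.getLogger("notify_sketch")


def notify_safely(send, webhook_url, title, subtitle):
    """Send a notification; return False instead of raising if it cannot be sent."""
    if not webhook_url:
        sketch_logger.warning("chat_url is not set; skipping notification")
        return False
    try:
        send(webhook_url, title, subtitle)
    except Exception:
        # Never let the error reporter raise a second error.
        sketch_logger.exception("notification failed")
        return False
    return True


sent = []
ok = notify_safely(
    lambda url, title, subtitle: sent.append((url, title, subtitle)),
    "https://example.invalid/hook",  # placeholder URL for the example
    "LOG",
    "details",
)
print(ok, len(sent))
```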

mysite/routers/fastapi.py CHANGED

@@ -11,7 +11,8 @@ import pkgutil
 from mysite.libs.utilities import validate_signature, no_process_file
 #from mysite.database.database import ride,create_ride
 from controllers.gra_04_database.rides import test_set_lide
-from mysite.interpreter.prompt import prompt_genalate
 logger = logging.getLogger(__name__)
@@ -46,7 +47,7 @@ def include_routers(app):
             logger.error(f"Module not found: {e}")
         except Exception as e:
             logger.error(f"An error occurred: {e}")
-from mysite.interpreter.google_chat import send_google_chat_card
 #from routers.webhooks import router
 def setup_webhook_routes(app: FastAPI):
     from polls.routers import register_routers
@@ -70,39 +71,58 @@ def setup_webhook_routes(app: FastAPI):
     """
     @app.post("/webhook")
     async def webhook(request: Request):
-        logger.info("[Start] ====== LINE webhook ======")
         try:
-            body = await request.body()
-            received_headers = dict(request.headers)
-            body_str = body.decode("utf-8")
-            logger.info("Received Body: %s", body_str)
-            body_json = json.loads(body_str)
-            events = body_json.get("events", [])
-            webhook_url = os.getenv("chat_url")
-            token = os.getenv("token")
-            #url = github(token,foldername)
             for event in events:
                 if event["type"] == "message" and event["message"]["type"] == "text":
                     user_id = event["source"]["userId"]
                     text = event["message"]["text"]
-                    logger.info("------------------------------------------")
                     first_line = text.split('\n')[0]
-                    logger.info(f"User ID: {user_id}, Text: {text}")
-                    promps = prompt_genalate(text)
                     #test_set_lide(text,"a1")
                     #no_process_file(text, "ai")
-                    title = """本番テスト　入力内容のみ設定　プロンプトも付け足してはテスト """
                     subtitle = promps
                     link_text = "test"
                     link_url = "url"
                     #test_set_lide(subtitle, text)
                     send_google_chat_card(webhook_url, title, subtitle, link_text, link_url)
                     #
-                    #return
             for event in events:
                 if event["type"] == "message" and event["message"]["type"] == "text":
@@ -149,5 +169,14 @@ def setup_webhook_routes(app: FastAPI):
             return {"status": "success", "response_content": response.text}, response.status_code
         except Exception as e:
             logger.error("Error: %s", str(e))
             raise HTTPException(status_code=500, detail=str(e))

 from mysite.libs.utilities import validate_signature, no_process_file
 #from mysite.database.database import ride,create_ride
 from controllers.gra_04_database.rides import test_set_lide
+from mysite.interpreter.prompt import prompt_genalate,test_prompt
+from mysite.interpreter.google_chat import send_google_chat_card
 logger = logging.getLogger(__name__)
             logger.error(f"Module not found: {e}")
         except Exception as e:
             logger.error(f"An error occurred: {e}")
 #from routers.webhooks import router
 def setup_webhook_routes(app: FastAPI):
     from polls.routers import register_routers
     """
     @app.post("/webhook")
     async def webhook(request: Request):
+        #logger.info("[Start] ====== LINE webhook ======")
+        body = await request.body()
+        received_headers = dict(request.headers)
+        body_str = body.decode("utf-8")
+        #logger.info("Received Body: %s", body_str)
+        body_json = json.loads(body_str)
+        events = body_json.get("events", [])
+        webhook_url = os.getenv("chat_url")
+        token = os.getenv("token")
+        #url = github(token,foldername)
         try:
             for event in events:
                 if event["type"] == "message" and event["message"]["type"] == "text":
                     user_id = event["source"]["userId"]
                     text = event["message"]["text"]
+                    #logger.info("------------------------------------------")
                     first_line = text.split('\n')[0]
+                    #logger.info(f"User ID: {user_id}, Text: {text}")
+                    prompt = """
+                    1, Q&Aのテーブルを作成してください
+                    2, 質問が来た際には、まず質問に対しての答えを過去のデータから探します
+                    3, Q&Aから役割を作成します
+                       質問に対しての答えを出す、シナリオを考える
+                    4, 実際にテストして正しい答えがでるか確認
+                    5, 出ない場合は再度作成しなおします
+                       1から6を繰り返し、答えが出たプロンプトを登録します
+                    7, 成功した場合それを保存します
+                    8, 同じ質問が来たら質問別にプロンプトを変更します
+                    9, 上記をラインの質問に内部の方が納得いくまで、日々修正していきます
+                    """
+                    promps,prompt_res = prompt_genalate(text)
                     #test_set_lide(text,"a1")
                     #no_process_file(text, "ai")
+                    title = """ プロンプト作成 """
                     subtitle = promps
                     link_text = "test"
                     link_url = "url"
                     #test_set_lide(subtitle, text)
                     send_google_chat_card(webhook_url, title, subtitle, link_text, link_url)
+                    #test case
+                    first_line = text.split('\n')[0]
+                    #test_prompt
+                    res = test_prompt(prompt_res,first_line)
+                    send_google_chat_card(webhook_url, "Prompt test: " + first_line, str(res), link_text, link_url)
                     #
+                    return
             for event in events:
                 if event["type"] == "message" and event["message"]["type"] == "text":
             return {"status": "success", "response_content": response.text}, response.status_code
         except Exception as e:
+            # prompt_genalate is unpacked as a pair in the try block; keep only the first element here
+            promps, _ = prompt_genalate(str(e))
+            #test_set_lide(text,"a1")
+            #no_process_file(text, "ai")
+            title = """Production test: set only the input content; also append a prompt and test """
+            subtitle = promps
+            link_text = "test"
+            link_url = "url"
+            #test_set_lide(subtitle, text)
+            send_google_chat_card(webhook_url, title, subtitle, link_text, link_url)
             logger.error("Error: %s", str(e))
             raise HTTPException(status_code=500, detail=str(e))
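
Review note: the handler above decodes the request body, reads `events`, and acts only on events whose `type` is `message` and whose `message.type` is `text`. The following is a minimal sketch of that filtering step pulled out into a standalone function so it can be tested without FastAPI. The helper name `extract_text_messages` and the returned dictionary keys are assumptions made for illustration and do not appear in the commit; only the field names `events`, `type`, `message`, `text`, `source`, and `userId` come from the diff.

```python
import json


def extract_text_messages(body: bytes) -> list[dict]:
    """Return user ID, full text, and first line for each text message event.

    Hypothetical helper: it mirrors the filtering done inside the webhook
    handler, but uses .get() so a malformed event is skipped, not raised.
    """
    payload = json.loads(body.decode("utf-8"))
    messages = []
    for event in payload.get("events", []):
        message = event.get("message", {})
        if event.get("type") != "message" or message.get("type") != "text":
            continue  # ignore follows, postbacks, images, and other event types
        text = message.get("text", "")
        messages.append(
            {
                "user_id": event.get("source", {}).get("userId"),
                "text": text,
                "first_line": text.split("\n")[0],
            }
        )
    return messages
```

Keeping the parsing separate from the calls to `prompt_genalate` and `send_google_chat_card` would let the handler loop over the returned list, and makes the early `return` inside the event loop easier to reason about.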

polls/test.ipynb ADDED Viewed

	@@ -0,0 +1,427 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Requirement already satisfied: datasets in /usr/local/lib/python3.10/site-packages (2.19.2)\n",
+      "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/site-packages (from datasets) (1.26.4)\n",
+      "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/site-packages (from datasets) (6.0.1)\n",
+      "Requirement already satisfied: xxhash in /usr/local/lib/python3.10/site-packages (from datasets) (3.4.1)\n",
+      "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/site-packages (from datasets) (3.9.5)\n",
+      "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.10/site-packages (from datasets) (0.3.8)\n",
+      "Requirement already satisfied: fsspec[http]<=2024.3.1,>=2023.1.0 in /usr/local/lib/python3.10/site-packages (from datasets) (2024.3.1)\n",
+      "Requirement already satisfied: packaging in /usr/local/lib/python3.10/site-packages (from datasets) (24.0)\n",
+      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/site-packages (from datasets) (3.14.0)\n",
+      "Requirement already satisfied: pandas in /usr/local/lib/python3.10/site-packages (from datasets) (2.2.2)\n",
+      "Requirement already satisfied: requests>=2.32.1 in /usr/local/lib/python3.10/site-packages (from datasets) (2.32.3)\n",
+      "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/site-packages (from datasets) (4.66.4)\n",
+      "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/site-packages (from datasets) (16.1.0)\n",
+      "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/site-packages (from datasets) (0.6)\n",
+      "Requirement already satisfied: multiprocess in /usr/local/lib/python3.10/site-packages (from datasets) (0.70.16)\n",
+      "Requirement already satisfied: huggingface-hub>=0.21.2 in /usr/local/lib/python3.10/site-packages (from datasets) (0.23.3)\n",
+      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (1.3.1)\n",
+      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (23.2.0)\n",
+      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (1.4.1)\n",
+      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (4.0.3)\n",
+      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (6.0.5)\n",
+      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (1.9.4)\n",
+      "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/site-packages (from huggingface-hub>=0.21.2->datasets) (4.10.0)\n",
+      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests>=2.32.1->datasets) (2024.2.2)\n",
+      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests>=2.32.1->datasets) (3.6)\n",
+      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/site-packages (from requests>=2.32.1->datasets) (3.3.2)\n",
+      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests>=2.32.1->datasets) (2.2.1)\n",
+      "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/site-packages (from pandas->datasets) (2024.1)\n",
+      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/site-packages (from pandas->datasets) (2024.1)\n",
+      "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/site-packages (from pandas->datasets) (2.9.0.post0)\n",
+      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.16.0)\n",
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "!pip install datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    }
+   ],
+   "source": [
+    "import pandas as pd\n",
+    "from datasets import Dataset\n",
+    "\n",
+    "# Create a dataset of QA pairs\n",
+    "data = {\n",
+    "    \"question\": [\"What is the capital of France?\", \"Who wrote 1984?\", \"What is the largest planet in our solar system?\"],\n",
+    "    \"answer\": [\"Paris\", \"George Orwell\", \"Jupiter\"]\n",
+    "}\n",
+    "\n",
+    "df = pd.DataFrame(data)\n",
+    "dataset = Dataset.from_pandas(df)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
+      "  warnings.warn(\n",
+      "Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']\n",
+      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from transformers import AutoTokenizer, AutoModelForQuestionAnswering, Trainer, TrainingArguments\n",
+    "\n",
+    "model_name = \"distilbert-base-uncased\"\n",
+    "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
+    "model = AutoModelForQuestionAnswering.from_pretrained(model_name)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Map: 100%|██████████| 3/3 [00:00<00:00, 576.25 examples/s]\n"
+     ]
+    },
+    {
+     "ename": "ValueError",
+     "evalue": "The model did not return a loss from the inputs, only the following keys: start_logits,end_logits. For reference, the inputs it received are input_ids,attention_mask.",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
+      "\u001b[1;32m/home/user/app/polls/test.ipynb Cell 4\u001b[0m line \u001b[0;36m2\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=9'>10</a>\u001b[0m training_args \u001b[39m=\u001b[39m TrainingArguments(\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=10'>11</a>\u001b[0m     output_dir\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m./results\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=11'>12</a>\u001b[0m     evaluation_strategy\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mepoch\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=15'>16</a>\u001b[0m     weight_decay\u001b[39m=\u001b[39m\u001b[39m0.01\u001b[39m,\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=16'>17</a>\u001b[0m )\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=18'>19</a>\u001b[0m trainer \u001b[39m=\u001b[39m Trainer(\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=19'>20</a>\u001b[0m     model\u001b[39m=\u001b[39mmodel,\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=20'>21</a>\u001b[0m     
args\u001b[39m=\u001b[39mtraining_args,\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=21'>22</a>\u001b[0m     train_dataset\u001b[39m=\u001b[39mtokenized_dataset,\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=22'>23</a>\u001b[0m     eval_dataset\u001b[39m=\u001b[39mtokenized_dataset,\n\u001b[1;32m     <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=23'>24</a>\u001b[0m )\n\u001b[0;32m---> <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=25'>26</a>\u001b[0m trainer\u001b[39m.\u001b[39;49mtrain()\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/transformers/trainer.py:1885\u001b[0m, in \u001b[0;36mTrainer.train\u001b[0;34m(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)\u001b[0m\n\u001b[1;32m   1883\u001b[0m         hf_hub_utils\u001b[39m.\u001b[39menable_progress_bars()\n\u001b[1;32m   1884\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m-> 1885\u001b[0m     \u001b[39mreturn\u001b[39;00m inner_training_loop(\n\u001b[1;32m   1886\u001b[0m         args\u001b[39m=\u001b[39;49margs,\n\u001b[1;32m   1887\u001b[0m         resume_from_checkpoint\u001b[39m=\u001b[39;49mresume_from_checkpoint,\n\u001b[1;32m   1888\u001b[0m         trial\u001b[39m=\u001b[39;49mtrial,\n\u001b[1;32m   1889\u001b[0m         ignore_keys_for_eval\u001b[39m=\u001b[39;49mignore_keys_for_eval,\n\u001b[1;32m   1890\u001b[0m     )\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/transformers/trainer.py:2216\u001b[0m, in \u001b[0;36mTrainer._inner_training_loop\u001b[0;34m(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)\u001b[0m\n\u001b[1;32m   2213\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mcontrol \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mcallback_handler\u001b[39m.\u001b[39mon_step_begin(args, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mstate, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mcontrol)\n\u001b[1;32m   2215\u001b[0m \u001b[39mwith\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39maccelerator\u001b[39m.\u001b[39maccumulate(model):\n\u001b[0;32m-> 2216\u001b[0m     tr_loss_step \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mtraining_step(model, inputs)\n\u001b[1;32m   2218\u001b[0m \u001b[39mif\u001b[39;00m (\n\u001b[1;32m   2219\u001b[0m     args\u001b[39m.\u001b[39mlogging_nan_inf_filter\n\u001b[1;32m   2220\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39mnot\u001b[39;00m is_torch_xla_available()\n\u001b[1;32m   2221\u001b[0m     \u001b[39mand\u001b[39;00m (torch\u001b[39m.\u001b[39misnan(tr_loss_step) \u001b[39mor\u001b[39;00m torch\u001b[39m.\u001b[39misinf(tr_loss_step))\n\u001b[1;32m   2222\u001b[0m ):\n\u001b[1;32m   2223\u001b[0m     \u001b[39m# if loss is nan or inf simply add the average of previous logged losses\u001b[39;00m\n\u001b[1;32m   2224\u001b[0m     tr_loss \u001b[39m+\u001b[39m\u001b[39m=\u001b[39m tr_loss \u001b[39m/\u001b[39m (\u001b[39m1\u001b[39m \u001b[39m+\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mstate\u001b[39m.\u001b[39mglobal_step \u001b[39m-\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_globalstep_last_logged)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/transformers/trainer.py:3238\u001b[0m, in \u001b[0;36mTrainer.training_step\u001b[0;34m(self, model, inputs)\u001b[0m\n\u001b[1;32m   3235\u001b[0m     \u001b[39mreturn\u001b[39;00m loss_mb\u001b[39m.\u001b[39mreduce_mean()\u001b[39m.\u001b[39mdetach()\u001b[39m.\u001b[39mto(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39margs\u001b[39m.\u001b[39mdevice)\n\u001b[1;32m   3237\u001b[0m \u001b[39mwith\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mcompute_loss_context_manager():\n\u001b[0;32m-> 3238\u001b[0m     loss \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mcompute_loss(model, inputs)\n\u001b[1;32m   3240\u001b[0m \u001b[39mdel\u001b[39;00m inputs\n\u001b[1;32m   3241\u001b[0m torch\u001b[39m.\u001b[39mcuda\u001b[39m.\u001b[39mempty_cache()\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/transformers/trainer.py:3282\u001b[0m, in \u001b[0;36mTrainer.compute_loss\u001b[0;34m(self, model, inputs, return_outputs)\u001b[0m\n\u001b[1;32m   3280\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m   3281\u001b[0m     \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(outputs, \u001b[39mdict\u001b[39m) \u001b[39mand\u001b[39;00m \u001b[39m\"\u001b[39m\u001b[39mloss\u001b[39m\u001b[39m\"\u001b[39m \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m outputs:\n\u001b[0;32m-> 3282\u001b[0m         \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m   3283\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mThe model did not return a loss from the inputs, only the following keys: \u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   3284\u001b[0m             \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m,\u001b[39m\u001b[39m'\u001b[39m\u001b[39m.\u001b[39mjoin(outputs\u001b[39m.\u001b[39mkeys())\u001b[39m}\u001b[39;00m\u001b[39m. For reference, the inputs it received are \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m,\u001b[39m\u001b[39m'\u001b[39m\u001b[39m.\u001b[39mjoin(inputs\u001b[39m.\u001b[39mkeys())\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   3285\u001b[0m         )\n\u001b[1;32m   3286\u001b[0m     \u001b[39m# We don't use .loss here since the model may return tuples instead of ModelOutput.\u001b[39;00m\n\u001b[1;32m   3287\u001b[0m     loss \u001b[39m=\u001b[39m outputs[\u001b[39m\"\u001b[39m\u001b[39mloss\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(outputs, \u001b[39mdict\u001b[39m) \u001b[39melse\u001b[39;00m outputs[\u001b[39m0\u001b[39m]\n",
+      "\u001b[0;31mValueError\u001b[0m: The model did not return a loss from the inputs, only the following keys: start_logits,end_logits. For reference, the inputs it received are input_ids,attention_mask."
+     ]
+    }
+   ],
+   "source": [
+    "def preprocess_function(examples):\n",
+    "    questions = examples[\"question\"]\n",
+    "    answers = examples[\"answer\"]\n",
+    "    inputs = tokenizer(questions, truncation=True, padding=True)\n",
+    "    inputs[\"labels\"] = tokenizer(answers, truncation=True, padding=True)[\"input_ids\"]\n",
+    "    return inputs\n",
+    "\n",
+    "tokenized_dataset = dataset.map(preprocess_function, batched=True)\n",
+    "\n",
+    "training_args = TrainingArguments(\n",
+    "    output_dir=\"./results\",\n",
+    "    evaluation_strategy=\"epoch\",\n",
+    "    learning_rate=2e-5,\n",
+    "    per_device_train_batch_size=2,\n",
+    "    num_train_epochs=3,\n",
+    "    weight_decay=0.01,\n",
+    ")\n",
+    "\n",
+    "trainer = Trainer(\n",
+    "    model=model,\n",
+    "    args=training_args,\n",
+    "    train_dataset=tokenized_dataset,\n",
+    "    eval_dataset=tokenized_dataset,\n",
+    ")\n",
+    "\n",
+    "trainer.train()\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "To disable this warning, you can either:\n",
+      "\t- Avoid using `tokenizers` before the fork if possible\n",
+      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Requirement already satisfied: transformers in /usr/local/lib/python3.10/site-packages (4.41.2)\n",
+      "Requirement already satisfied: datasets in /usr/local/lib/python3.10/site-packages (2.19.2)\n",
+      "Collecting faiss-cpu\n",
+      "  Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)\n",
+      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m27.0/27.0 MB\u001b[0m \u001b[31m64.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
+      "\u001b[?25hRequirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/site-packages (from transformers) (6.0.1)\n",
+      "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/site-packages (from transformers) (2024.5.15)\n",
+      "Requirement already satisfied: requests in /usr/local/lib/python3.10/site-packages (from transformers) (2.32.3)\n",
+      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/site-packages (from transformers) (3.14.0)\n",
+      "Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/site-packages (from transformers) (0.19.1)\n",
+      "Requirement already satisfied: huggingface-hub<1.0,>=0.23.0 in /usr/local/lib/python3.10/site-packages (from transformers) (0.23.3)\n",
+      "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/site-packages (from transformers) (4.66.4)\n",
+      "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/site-packages (from transformers) (1.26.4)\n",
+      "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/site-packages (from transformers) (0.4.3)\n",
+      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/site-packages (from transformers) (24.0)\n",
+      "Requirement already satisfied: multiprocess in /usr/local/lib/python3.10/site-packages (from datasets) (0.70.16)\n",
+      "Requirement already satisfied: fsspec[http]<=2024.3.1,>=2023.1.0 in /usr/local/lib/python3.10/site-packages (from datasets) (2024.3.1)\n",
+      "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/site-packages (from datasets) (0.6)\n",
+      "Requirement already satisfied: pandas in /usr/local/lib/python3.10/site-packages (from datasets) (2.2.2)\n",
+      "Requirement already satisfied: xxhash in /usr/local/lib/python3.10/site-packages (from datasets) (3.4.1)\n",
+      "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/site-packages (from datasets) (16.1.0)\n",
+      "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.10/site-packages (from datasets) (0.3.8)\n",
+      "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/site-packages (from datasets) (3.9.5)\n",
+      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (4.0.3)\n",
+      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (1.4.1)\n",
+      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (23.2.0)\n",
+      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (1.3.1)\n",
+      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (1.9.4)\n",
+      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/site-packages (from aiohttp->datasets) (6.0.5)\n",
+      "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers) (4.10.0)\n",
+      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (3.3.2)\n",
+      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (3.6)\n",
+      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (2.2.1)\n",
+      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests->transformers) (2024.2.2)\n",
+      "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/site-packages (from pandas->datasets) (2024.1)\n",
+      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/site-packages (from pandas->datasets) (2024.1)\n",
+      "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/site-packages (from pandas->datasets) (2.9.0.post0)\n",
+      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.16.0)\n",
+      "Installing collected packages: faiss-cpu\n",
+      "Successfully installed faiss-cpu-1.8.0\n",
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "!pip install transformers datasets faiss-cpu\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "ValueError",
+     "evalue": "Loading wiki_dpr requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
+      "\u001b[1;32m/home/user/app/polls/test.ipynb Cell 6\u001b[0m line \u001b[0;36m4\n\u001b[1;32m      <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W5sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mdatasets\u001b[39;00m \u001b[39mimport\u001b[39;00m load_dataset\n\u001b[1;32m      <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W5sdnNjb2RlLXJlbW90ZQ%3D%3D?line=2'>3</a>\u001b[0m \u001b[39m# データセットのロード\u001b[39;00m\n\u001b[0;32m----> <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W5sdnNjb2RlLXJlbW90ZQ%3D%3D?line=3'>4</a>\u001b[0m dataset \u001b[39m=\u001b[39m load_dataset(\u001b[39m'\u001b[39;49m\u001b[39mwiki_dpr\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mpsgs_w100\u001b[39;49m\u001b[39m'\u001b[39;49m)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:2592\u001b[0m, in \u001b[0;36mload_dataset\u001b[0;34m(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, trust_remote_code, **config_kwargs)\u001b[0m\n\u001b[1;32m   2587\u001b[0m verification_mode \u001b[39m=\u001b[39m VerificationMode(\n\u001b[1;32m   2588\u001b[0m     (verification_mode \u001b[39mor\u001b[39;00m VerificationMode\u001b[39m.\u001b[39mBASIC_CHECKS) \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m save_infos \u001b[39melse\u001b[39;00m VerificationMode\u001b[39m.\u001b[39mALL_CHECKS\n\u001b[1;32m   2589\u001b[0m )\n\u001b[1;32m   2591\u001b[0m \u001b[39m# Create a dataset builder\u001b[39;00m\n\u001b[0;32m-> 2592\u001b[0m builder_instance \u001b[39m=\u001b[39m load_dataset_builder(\n\u001b[1;32m   2593\u001b[0m     path\u001b[39m=\u001b[39;49mpath,\n\u001b[1;32m   2594\u001b[0m     name\u001b[39m=\u001b[39;49mname,\n\u001b[1;32m   2595\u001b[0m     data_dir\u001b[39m=\u001b[39;49mdata_dir,\n\u001b[1;32m   2596\u001b[0m     data_files\u001b[39m=\u001b[39;49mdata_files,\n\u001b[1;32m   2597\u001b[0m     cache_dir\u001b[39m=\u001b[39;49mcache_dir,\n\u001b[1;32m   2598\u001b[0m     features\u001b[39m=\u001b[39;49mfeatures,\n\u001b[1;32m   2599\u001b[0m     download_config\u001b[39m=\u001b[39;49mdownload_config,\n\u001b[1;32m   2600\u001b[0m     download_mode\u001b[39m=\u001b[39;49mdownload_mode,\n\u001b[1;32m   2601\u001b[0m     revision\u001b[39m=\u001b[39;49mrevision,\n\u001b[1;32m   2602\u001b[0m     token\u001b[39m=\u001b[39;49mtoken,\n\u001b[1;32m   2603\u001b[0m     storage_options\u001b[39m=\u001b[39;49mstorage_options,\n\u001b[1;32m   2604\u001b[0m     trust_remote_code\u001b[39m=\u001b[39;49mtrust_remote_code,\n\u001b[1;32m   2605\u001b[0m     
_require_default_config_name\u001b[39m=\u001b[39;49mname \u001b[39mis\u001b[39;49;00m \u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m   2606\u001b[0m     \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mconfig_kwargs,\n\u001b[1;32m   2607\u001b[0m )\n\u001b[1;32m   2609\u001b[0m \u001b[39m# Return iterable dataset in case of streaming\u001b[39;00m\n\u001b[1;32m   2610\u001b[0m \u001b[39mif\u001b[39;00m streaming:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:2264\u001b[0m, in \u001b[0;36mload_dataset_builder\u001b[0;34m(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, use_auth_token, storage_options, trust_remote_code, _require_default_config_name, **config_kwargs)\u001b[0m\n\u001b[1;32m   2262\u001b[0m     download_config \u001b[39m=\u001b[39m download_config\u001b[39m.\u001b[39mcopy() \u001b[39mif\u001b[39;00m download_config \u001b[39melse\u001b[39;00m DownloadConfig()\n\u001b[1;32m   2263\u001b[0m     download_config\u001b[39m.\u001b[39mstorage_options\u001b[39m.\u001b[39mupdate(storage_options)\n\u001b[0;32m-> 2264\u001b[0m dataset_module \u001b[39m=\u001b[39m dataset_module_factory(\n\u001b[1;32m   2265\u001b[0m     path,\n\u001b[1;32m   2266\u001b[0m     revision\u001b[39m=\u001b[39;49mrevision,\n\u001b[1;32m   2267\u001b[0m     download_config\u001b[39m=\u001b[39;49mdownload_config,\n\u001b[1;32m   2268\u001b[0m     download_mode\u001b[39m=\u001b[39;49mdownload_mode,\n\u001b[1;32m   2269\u001b[0m     data_dir\u001b[39m=\u001b[39;49mdata_dir,\n\u001b[1;32m   2270\u001b[0m     data_files\u001b[39m=\u001b[39;49mdata_files,\n\u001b[1;32m   2271\u001b[0m     cache_dir\u001b[39m=\u001b[39;49mcache_dir,\n\u001b[1;32m   2272\u001b[0m     trust_remote_code\u001b[39m=\u001b[39;49mtrust_remote_code,\n\u001b[1;32m   2273\u001b[0m     _require_default_config_name\u001b[39m=\u001b[39;49m_require_default_config_name,\n\u001b[1;32m   2274\u001b[0m     _require_custom_configs\u001b[39m=\u001b[39;49m\u001b[39mbool\u001b[39;49m(config_kwargs),\n\u001b[1;32m   2275\u001b[0m )\n\u001b[1;32m   2276\u001b[0m \u001b[39m# Get dataset builder class from the processing script\u001b[39;00m\n\u001b[1;32m   2277\u001b[0m builder_kwargs \u001b[39m=\u001b[39m dataset_module\u001b[39m.\u001b[39mbuilder_kwargs\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:1915\u001b[0m, in \u001b[0;36mdataset_module_factory\u001b[0;34m(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, cache_dir, trust_remote_code, _require_default_config_name, _require_custom_configs, **download_kwargs)\u001b[0m\n\u001b[1;32m   1910\u001b[0m             \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(e1, \u001b[39mFileNotFoundError\u001b[39;00m):\n\u001b[1;32m   1911\u001b[0m                 \u001b[39mraise\u001b[39;00m \u001b[39mFileNotFoundError\u001b[39;00m(\n\u001b[1;32m   1912\u001b[0m                     \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mCouldn\u001b[39m\u001b[39m'\u001b[39m\u001b[39mt find a dataset script at \u001b[39m\u001b[39m{\u001b[39;00mrelative_to_absolute_path(combined_path)\u001b[39m}\u001b[39;00m\u001b[39m or any data file in the same directory. \u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   1913\u001b[0m                     \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mCouldn\u001b[39m\u001b[39m'\u001b[39m\u001b[39mt find \u001b[39m\u001b[39m'\u001b[39m\u001b[39m{\u001b[39;00mpath\u001b[39m}\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m on the Hugging Face Hub either: \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mtype\u001b[39m(e1)\u001b[39m.\u001b[39m\u001b[39m__name__\u001b[39m\u001b[39m}\u001b[39;00m\u001b[39m: \u001b[39m\u001b[39m{\u001b[39;00me1\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m\n\u001b[1;32m   1914\u001b[0m                 ) \u001b[39mfrom\u001b[39;00m \u001b[39mNone\u001b[39;00m\n\u001b[0;32m-> 1915\u001b[0m             \u001b[39mraise\u001b[39;00m e1 \u001b[39mfrom\u001b[39;00m \u001b[39mNone\u001b[39;00m\n\u001b[1;32m   1916\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m   1917\u001b[0m     \u001b[39mraise\u001b[39;00m \u001b[39mFileNotFoundError\u001b[39;00m(\n\u001b[1;32m   1918\u001b[0m         
\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mCouldn\u001b[39m\u001b[39m'\u001b[39m\u001b[39mt find a dataset script at \u001b[39m\u001b[39m{\u001b[39;00mrelative_to_absolute_path(combined_path)\u001b[39m}\u001b[39;00m\u001b[39m or any data file in the same directory.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   1919\u001b[0m     )\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:1888\u001b[0m, in \u001b[0;36mdataset_module_factory\u001b[0;34m(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, cache_dir, trust_remote_code, _require_default_config_name, _require_custom_configs, **download_kwargs)\u001b[0m\n\u001b[1;32m   1879\u001b[0m             \u001b[39mpass\u001b[39;00m\n\u001b[1;32m   1880\u001b[0m     \u001b[39m# Otherwise we must use the dataset script if the user trusts it\u001b[39;00m\n\u001b[1;32m   1881\u001b[0m     \u001b[39mreturn\u001b[39;00m HubDatasetModuleFactoryWithScript(\n\u001b[1;32m   1882\u001b[0m         path,\n\u001b[1;32m   1883\u001b[0m         revision\u001b[39m=\u001b[39;49mrevision,\n\u001b[1;32m   1884\u001b[0m         download_config\u001b[39m=\u001b[39;49mdownload_config,\n\u001b[1;32m   1885\u001b[0m         download_mode\u001b[39m=\u001b[39;49mdownload_mode,\n\u001b[1;32m   1886\u001b[0m         dynamic_modules_path\u001b[39m=\u001b[39;49mdynamic_modules_path,\n\u001b[1;32m   1887\u001b[0m         trust_remote_code\u001b[39m=\u001b[39;49mtrust_remote_code,\n\u001b[0;32m-> 1888\u001b[0m     )\u001b[39m.\u001b[39;49mget_module()\n\u001b[1;32m   1889\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m   1890\u001b[0m     \u001b[39mreturn\u001b[39;00m HubDatasetModuleFactoryWithoutScript(\n\u001b[1;32m   1891\u001b[0m         path,\n\u001b[1;32m   1892\u001b[0m         revision\u001b[39m=\u001b[39mrevision,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m   1896\u001b[0m         download_mode\u001b[39m=\u001b[39mdownload_mode,\n\u001b[1;32m   1897\u001b[0m     )\u001b[39m.\u001b[39mget_module()\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:1537\u001b[0m, in \u001b[0;36mHubDatasetModuleFactoryWithScript.get_module\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m   1526\u001b[0m         _create_importable_file(\n\u001b[1;32m   1527\u001b[0m             local_path\u001b[39m=\u001b[39mlocal_path,\n\u001b[1;32m   1528\u001b[0m             local_imports\u001b[39m=\u001b[39mlocal_imports,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m   1534\u001b[0m             download_mode\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mdownload_mode,\n\u001b[1;32m   1535\u001b[0m         )\n\u001b[1;32m   1536\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m-> 1537\u001b[0m         \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m   1538\u001b[0m             \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mLoading \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mname\u001b[39m}\u001b[39;00m\u001b[39m requires you to execute the dataset script in that\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   1539\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39m repo on your local machine. 
Make sure you have read the code there to avoid malicious use, then\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   1540\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39m set the option `trust_remote_code=True` to remove this error.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m   1541\u001b[0m         )\n\u001b[1;32m   1542\u001b[0m module_path, \u001b[39mhash\u001b[39m \u001b[39m=\u001b[39m _load_importable_file(\n\u001b[1;32m   1543\u001b[0m     dynamic_modules_path\u001b[39m=\u001b[39mdynamic_modules_path,\n\u001b[1;32m   1544\u001b[0m     module_namespace\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mdatasets\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m   1545\u001b[0m     subdirectory_name\u001b[39m=\u001b[39m\u001b[39mhash\u001b[39m,\n\u001b[1;32m   1546\u001b[0m     name\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mname,\n\u001b[1;32m   1547\u001b[0m )\n\u001b[1;32m   1548\u001b[0m \u001b[39m# make the new module to be noticed by the import system\u001b[39;00m\n",
+      "\u001b[0;31mValueError\u001b[0m: Loading wiki_dpr requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error."
+     ]
+    }
+   ],
+   "source": [
+    "from datasets import load_dataset\n",
+    "\n",
+    "# Load the dataset\n",
+    "dataset = load_dataset('wiki_dpr', 'psgs_w100')\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "ValueError",
+     "evalue": "BuilderConfig 'psgs_w100' not found. Available: ['psgs_w100.nq.exact', 'psgs_w100.nq.compressed', 'psgs_w100.nq.no_index', 'psgs_w100.multiset.exact', 'psgs_w100.multiset.compressed', 'psgs_w100.multiset.no_index', 'psgs_w100.nq.exact.no_embeddings', 'psgs_w100.nq.compressed.no_embeddings', 'psgs_w100.nq.no_index.no_embeddings', 'psgs_w100.multiset.exact.no_embeddings', 'psgs_w100.multiset.compressed.no_embeddings', 'psgs_w100.multiset.no_index.no_embeddings']",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
+      "\u001b[1;32m/home/user/app/polls/test.ipynb Cell 7\u001b[0m line \u001b[0;36m4\n\u001b[1;32m      <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W6sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mdatasets\u001b[39;00m \u001b[39mimport\u001b[39;00m load_dataset\n\u001b[1;32m      <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W6sdnNjb2RlLXJlbW90ZQ%3D%3D?line=2'>3</a>\u001b[0m \u001b[39m# データセットのロード\u001b[39;00m\n\u001b[0;32m----> <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#W6sdnNjb2RlLXJlbW90ZQ%3D%3D?line=3'>4</a>\u001b[0m dataset \u001b[39m=\u001b[39m load_dataset(\u001b[39m'\u001b[39;49m\u001b[39mwiki_dpr\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mpsgs_w100\u001b[39;49m\u001b[39m'\u001b[39;49m, trust_remote_code\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:2592\u001b[0m, in \u001b[0;36mload_dataset\u001b[0;34m(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, trust_remote_code, **config_kwargs)\u001b[0m\n\u001b[1;32m   2587\u001b[0m verification_mode \u001b[39m=\u001b[39m VerificationMode(\n\u001b[1;32m   2588\u001b[0m     (verification_mode \u001b[39mor\u001b[39;00m VerificationMode\u001b[39m.\u001b[39mBASIC_CHECKS) \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m save_infos \u001b[39melse\u001b[39;00m VerificationMode\u001b[39m.\u001b[39mALL_CHECKS\n\u001b[1;32m   2589\u001b[0m )\n\u001b[1;32m   2591\u001b[0m \u001b[39m# Create a dataset builder\u001b[39;00m\n\u001b[0;32m-> 2592\u001b[0m builder_instance \u001b[39m=\u001b[39m load_dataset_builder(\n\u001b[1;32m   2593\u001b[0m     path\u001b[39m=\u001b[39;49mpath,\n\u001b[1;32m   2594\u001b[0m     name\u001b[39m=\u001b[39;49mname,\n\u001b[1;32m   2595\u001b[0m     data_dir\u001b[39m=\u001b[39;49mdata_dir,\n\u001b[1;32m   2596\u001b[0m     data_files\u001b[39m=\u001b[39;49mdata_files,\n\u001b[1;32m   2597\u001b[0m     cache_dir\u001b[39m=\u001b[39;49mcache_dir,\n\u001b[1;32m   2598\u001b[0m     features\u001b[39m=\u001b[39;49mfeatures,\n\u001b[1;32m   2599\u001b[0m     download_config\u001b[39m=\u001b[39;49mdownload_config,\n\u001b[1;32m   2600\u001b[0m     download_mode\u001b[39m=\u001b[39;49mdownload_mode,\n\u001b[1;32m   2601\u001b[0m     revision\u001b[39m=\u001b[39;49mrevision,\n\u001b[1;32m   2602\u001b[0m     token\u001b[39m=\u001b[39;49mtoken,\n\u001b[1;32m   2603\u001b[0m     storage_options\u001b[39m=\u001b[39;49mstorage_options,\n\u001b[1;32m   2604\u001b[0m     trust_remote_code\u001b[39m=\u001b[39;49mtrust_remote_code,\n\u001b[1;32m   2605\u001b[0m     
_require_default_config_name\u001b[39m=\u001b[39;49mname \u001b[39mis\u001b[39;49;00m \u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m   2606\u001b[0m     \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mconfig_kwargs,\n\u001b[1;32m   2607\u001b[0m )\n\u001b[1;32m   2609\u001b[0m \u001b[39m# Return iterable dataset in case of streaming\u001b[39;00m\n\u001b[1;32m   2610\u001b[0m \u001b[39mif\u001b[39;00m streaming:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:2301\u001b[0m, in \u001b[0;36mload_dataset_builder\u001b[0;34m(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, use_auth_token, storage_options, trust_remote_code, _require_default_config_name, **config_kwargs)\u001b[0m\n\u001b[1;32m   2299\u001b[0m builder_cls \u001b[39m=\u001b[39m get_dataset_builder_class(dataset_module, dataset_name\u001b[39m=\u001b[39mdataset_name)\n\u001b[1;32m   2300\u001b[0m \u001b[39m# Instantiate the dataset builder\u001b[39;00m\n\u001b[0;32m-> 2301\u001b[0m builder_instance: DatasetBuilder \u001b[39m=\u001b[39m builder_cls(\n\u001b[1;32m   2302\u001b[0m     cache_dir\u001b[39m=\u001b[39;49mcache_dir,\n\u001b[1;32m   2303\u001b[0m     dataset_name\u001b[39m=\u001b[39;49mdataset_name,\n\u001b[1;32m   2304\u001b[0m     config_name\u001b[39m=\u001b[39;49mconfig_name,\n\u001b[1;32m   2305\u001b[0m     data_dir\u001b[39m=\u001b[39;49mdata_dir,\n\u001b[1;32m   2306\u001b[0m     data_files\u001b[39m=\u001b[39;49mdata_files,\n\u001b[1;32m   2307\u001b[0m     \u001b[39mhash\u001b[39;49m\u001b[39m=\u001b[39;49mdataset_module\u001b[39m.\u001b[39;49mhash,\n\u001b[1;32m   2308\u001b[0m     info\u001b[39m=\u001b[39;49minfo,\n\u001b[1;32m   2309\u001b[0m     features\u001b[39m=\u001b[39;49mfeatures,\n\u001b[1;32m   2310\u001b[0m     token\u001b[39m=\u001b[39;49mtoken,\n\u001b[1;32m   2311\u001b[0m     storage_options\u001b[39m=\u001b[39;49mstorage_options,\n\u001b[1;32m   2312\u001b[0m     \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mbuilder_kwargs,\n\u001b[1;32m   2313\u001b[0m     \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mconfig_kwargs,\n\u001b[1;32m   2314\u001b[0m )\n\u001b[1;32m   2315\u001b[0m builder_instance\u001b[39m.\u001b[39m_use_legacy_cache_dir_if_possible(dataset_module)\n\u001b[1;32m   2317\u001b[0m \u001b[39mreturn\u001b[39;00m builder_instance\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/builder.py:374\u001b[0m, in \u001b[0;36mDatasetBuilder.__init__\u001b[0;34m(self, cache_dir, dataset_name, config_name, hash, base_path, info, features, token, use_auth_token, repo_id, data_files, data_dir, storage_options, writer_batch_size, name, **config_kwargs)\u001b[0m\n\u001b[1;32m    372\u001b[0m     config_kwargs[\u001b[39m\"\u001b[39m\u001b[39mdata_dir\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m data_dir\n\u001b[1;32m    373\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconfig_kwargs \u001b[39m=\u001b[39m config_kwargs\n\u001b[0;32m--> 374\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconfig, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconfig_id \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_create_builder_config(\n\u001b[1;32m    375\u001b[0m     config_name\u001b[39m=\u001b[39;49mconfig_name,\n\u001b[1;32m    376\u001b[0m     custom_features\u001b[39m=\u001b[39;49mfeatures,\n\u001b[1;32m    377\u001b[0m     \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mconfig_kwargs,\n\u001b[1;32m    378\u001b[0m )\n\u001b[1;32m    380\u001b[0m \u001b[39m# prepare info: DatasetInfo are a standardized dataclass across all datasets\u001b[39;00m\n\u001b[1;32m    381\u001b[0m \u001b[39m# Prefill datasetinfo\u001b[39;00m\n\u001b[1;32m    382\u001b[0m \u001b[39mif\u001b[39;00m info \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m    383\u001b[0m     \u001b[39m# TODO FOR PACKAGED MODULES IT IMPORTS DATA FROM src/packaged_modules which doesn't make sense\u001b[39;00m\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/builder.py:599\u001b[0m, in \u001b[0;36mDatasetBuilder._create_builder_config\u001b[0;34m(self, config_name, custom_features, **config_kwargs)\u001b[0m\n\u001b[1;32m    597\u001b[0m     builder_config \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mbuilder_configs\u001b[39m.\u001b[39mget(config_name)\n\u001b[1;32m    598\u001b[0m     \u001b[39mif\u001b[39;00m builder_config \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39mand\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mBUILDER_CONFIGS:\n\u001b[0;32m--> 599\u001b[0m         \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m    600\u001b[0m             \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mBuilderConfig \u001b[39m\u001b[39m'\u001b[39m\u001b[39m{\u001b[39;00mconfig_name\u001b[39m}\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m not found. Available: \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mlist\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mbuilder_configs\u001b[39m.\u001b[39mkeys())\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m\n\u001b[1;32m    601\u001b[0m         )\n\u001b[1;32m    603\u001b[0m \u001b[39m# if not using an existing config, then create a new config on the fly\u001b[39;00m\n\u001b[1;32m    604\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m builder_config:\n",
+      "\u001b[0;31mValueError\u001b[0m: BuilderConfig 'psgs_w100' not found. Available: ['psgs_w100.nq.exact', 'psgs_w100.nq.compressed', 'psgs_w100.nq.no_index', 'psgs_w100.multiset.exact', 'psgs_w100.multiset.compressed', 'psgs_w100.multiset.no_index', 'psgs_w100.nq.exact.no_embeddings', 'psgs_w100.nq.compressed.no_embeddings', 'psgs_w100.nq.no_index.no_embeddings', 'psgs_w100.multiset.exact.no_embeddings', 'psgs_w100.multiset.compressed.no_embeddings', 'psgs_w100.multiset.no_index.no_embeddings']"
+     ]
+    }
+   ],
+   "source": [
+    "from datasets import load_dataset\n",
+    "\n",
+    "# Load the dataset\n",
+    "dataset = load_dataset('wiki_dpr', 'psgs_w100', trust_remote_code=True)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Downloading data:   0%|          | 0/157 [02:34<?, ?files/s]\n"
+     ]
+    },
+    {
+     "ename": "KeyboardInterrupt",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/tqdm/contrib/concurrent.py:51\u001b[0m, in \u001b[0;36m_executor_map\u001b[0;34m(PoolExecutor, fn, *iterables, **tqdm_kwargs)\u001b[0m\n\u001b[1;32m     49\u001b[0m \u001b[39mwith\u001b[39;00m PoolExecutor(max_workers\u001b[39m=\u001b[39mmax_workers, initializer\u001b[39m=\u001b[39mtqdm_class\u001b[39m.\u001b[39mset_lock,\n\u001b[1;32m     50\u001b[0m                   initargs\u001b[39m=\u001b[39m(lk,)) \u001b[39mas\u001b[39;00m ex:\n\u001b[0;32m---> 51\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mlist\u001b[39;49m(tqdm_class(ex\u001b[39m.\u001b[39;49mmap(fn, \u001b[39m*\u001b[39;49miterables, chunksize\u001b[39m=\u001b[39;49mchunksize), \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs))\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/tqdm/std.py:1181\u001b[0m, in \u001b[0;36mtqdm.__iter__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m   1180\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m-> 1181\u001b[0m     \u001b[39mfor\u001b[39;00m obj \u001b[39min\u001b[39;00m iterable:\n\u001b[1;32m   1182\u001b[0m         \u001b[39myield\u001b[39;00m obj\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/concurrent/futures/_base.py:621\u001b[0m, in \u001b[0;36mExecutor.map.<locals>.result_iterator\u001b[0;34m()\u001b[0m\n\u001b[1;32m    620\u001b[0m \u001b[39mif\u001b[39;00m timeout \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m--> 621\u001b[0m     \u001b[39myield\u001b[39;00m _result_or_cancel(fs\u001b[39m.\u001b[39;49mpop())\n\u001b[1;32m    622\u001b[0m \u001b[39melse\u001b[39;00m:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/concurrent/futures/_base.py:319\u001b[0m, in \u001b[0;36m_result_or_cancel\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m    318\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 319\u001b[0m     \u001b[39mreturn\u001b[39;00m fut\u001b[39m.\u001b[39;49mresult(timeout)\n\u001b[1;32m    320\u001b[0m \u001b[39mfinally\u001b[39;00m:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/concurrent/futures/_base.py:453\u001b[0m, in \u001b[0;36mFuture.result\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    451\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m__get_result()\n\u001b[0;32m--> 453\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_condition\u001b[39m.\u001b[39;49mwait(timeout)\n\u001b[1;32m    455\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_state \u001b[39min\u001b[39;00m [CANCELLED, CANCELLED_AND_NOTIFIED]:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/threading.py:320\u001b[0m, in \u001b[0;36mCondition.wait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    319\u001b[0m \u001b[39mif\u001b[39;00m timeout \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m--> 320\u001b[0m     waiter\u001b[39m.\u001b[39;49macquire()\n\u001b[1;32m    321\u001b[0m     gotit \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n",
+      "\u001b[0;31mKeyboardInterrupt\u001b[0m: ",
+      "\nDuring handling of the above exception, another exception occurred:\n",
+      "\u001b[0;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
+      "\u001b[1;32m/home/user/app/polls/test.ipynb Cell 8\u001b[0m line \u001b[0;36m4\n\u001b[1;32m      <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#X10sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mdatasets\u001b[39;00m \u001b[39mimport\u001b[39;00m load_dataset\n\u001b[1;32m      <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#X10sdnNjb2RlLXJlbW90ZQ%3D%3D?line=2'>3</a>\u001b[0m \u001b[39m# データセットのロード\u001b[39;00m\n\u001b[0;32m----> <a href='vscode-notebook-cell://kenken999-fastapi-django-main--1027.hf.space/home/user/app/polls/test.ipynb#X10sdnNjb2RlLXJlbW90ZQ%3D%3D?line=3'>4</a>\u001b[0m dataset \u001b[39m=\u001b[39m load_dataset(\u001b[39m'\u001b[39;49m\u001b[39mwiki_dpr\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mpsgs_w100.nq.exact\u001b[39;49m\u001b[39m'\u001b[39;49m, trust_remote_code\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/load.py:2614\u001b[0m, in \u001b[0;36mload_dataset\u001b[0;34m(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, trust_remote_code, **config_kwargs)\u001b[0m\n\u001b[1;32m   2611\u001b[0m     \u001b[39mreturn\u001b[39;00m builder_instance\u001b[39m.\u001b[39mas_streaming_dataset(split\u001b[39m=\u001b[39msplit)\n\u001b[1;32m   2613\u001b[0m \u001b[39m# Download and prepare data\u001b[39;00m\n\u001b[0;32m-> 2614\u001b[0m builder_instance\u001b[39m.\u001b[39;49mdownload_and_prepare(\n\u001b[1;32m   2615\u001b[0m     download_config\u001b[39m=\u001b[39;49mdownload_config,\n\u001b[1;32m   2616\u001b[0m     download_mode\u001b[39m=\u001b[39;49mdownload_mode,\n\u001b[1;32m   2617\u001b[0m     verification_mode\u001b[39m=\u001b[39;49mverification_mode,\n\u001b[1;32m   2618\u001b[0m     num_proc\u001b[39m=\u001b[39;49mnum_proc,\n\u001b[1;32m   2619\u001b[0m     storage_options\u001b[39m=\u001b[39;49mstorage_options,\n\u001b[1;32m   2620\u001b[0m )\n\u001b[1;32m   2622\u001b[0m \u001b[39m# Build dataset for splits\u001b[39;00m\n\u001b[1;32m   2623\u001b[0m keep_in_memory \u001b[39m=\u001b[39m (\n\u001b[1;32m   2624\u001b[0m     keep_in_memory \u001b[39mif\u001b[39;00m keep_in_memory \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39melse\u001b[39;00m is_small_dataset(builder_instance\u001b[39m.\u001b[39minfo\u001b[39m.\u001b[39mdataset_size)\n\u001b[1;32m   2625\u001b[0m )\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/builder.py:1027\u001b[0m, in \u001b[0;36mDatasetBuilder.download_and_prepare\u001b[0;34m(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)\u001b[0m\n\u001b[1;32m   1025\u001b[0m     \u001b[39mif\u001b[39;00m num_proc \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m   1026\u001b[0m         prepare_split_kwargs[\u001b[39m\"\u001b[39m\u001b[39mnum_proc\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m num_proc\n\u001b[0;32m-> 1027\u001b[0m     \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_download_and_prepare(\n\u001b[1;32m   1028\u001b[0m         dl_manager\u001b[39m=\u001b[39;49mdl_manager,\n\u001b[1;32m   1029\u001b[0m         verification_mode\u001b[39m=\u001b[39;49mverification_mode,\n\u001b[1;32m   1030\u001b[0m         \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mprepare_split_kwargs,\n\u001b[1;32m   1031\u001b[0m         \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mdownload_and_prepare_kwargs,\n\u001b[1;32m   1032\u001b[0m     )\n\u001b[1;32m   1033\u001b[0m \u001b[39m# Sync info\u001b[39;00m\n\u001b[1;32m   1034\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39minfo\u001b[39m.\u001b[39mdataset_size \u001b[39m=\u001b[39m \u001b[39msum\u001b[39m(split\u001b[39m.\u001b[39mnum_bytes \u001b[39mfor\u001b[39;00m split \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39minfo\u001b[39m.\u001b[39msplits\u001b[39m.\u001b[39mvalues())\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/builder.py:1100\u001b[0m, in \u001b[0;36mDatasetBuilder._download_and_prepare\u001b[0;34m(self, dl_manager, verification_mode, **prepare_split_kwargs)\u001b[0m\n\u001b[1;32m   1098\u001b[0m split_dict \u001b[39m=\u001b[39m SplitDict(dataset_name\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mdataset_name)\n\u001b[1;32m   1099\u001b[0m split_generators_kwargs \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_make_split_generators_kwargs(prepare_split_kwargs)\n\u001b[0;32m-> 1100\u001b[0m split_generators \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_split_generators(dl_manager, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49msplit_generators_kwargs)\n\u001b[1;32m   1102\u001b[0m \u001b[39m# Checksums verification\u001b[39;00m\n\u001b[1;32m   1103\u001b[0m \u001b[39mif\u001b[39;00m verification_mode \u001b[39m==\u001b[39m VerificationMode\u001b[39m.\u001b[39mALL_CHECKS \u001b[39mand\u001b[39;00m dl_manager\u001b[39m.\u001b[39mrecord_checksums:\n",
+      "File \u001b[0;32m~/.cache/huggingface/modules/datasets_modules/datasets/wiki_dpr/66fd9b80f51375c02cd9010050e781ed3e8f759e868f690c31b2686a7a0eeb5c/wiki_dpr.py:143\u001b[0m, in \u001b[0;36mWikiDpr._split_generators\u001b[0;34m(self, dl_manager)\u001b[0m\n\u001b[1;32m    141\u001b[0m data_dir \u001b[39m=\u001b[39m os\u001b[39m.\u001b[39mpath\u001b[39m.\u001b[39mjoin(\u001b[39m\"\u001b[39m\u001b[39mdata\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconfig\u001b[39m.\u001b[39mwiki_split, data_dir)\n\u001b[1;32m    142\u001b[0m files \u001b[39m=\u001b[39m [os\u001b[39m.\u001b[39mpath\u001b[39m.\u001b[39mjoin(data_dir, \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mtrain-\u001b[39m\u001b[39m{\u001b[39;00mi\u001b[39m:\u001b[39;00m\u001b[39m05d\u001b[39m\u001b[39m}\u001b[39;00m\u001b[39m-of-\u001b[39m\u001b[39m{\u001b[39;00mnum_shards\u001b[39m:\u001b[39;00m\u001b[39m05d\u001b[39m\u001b[39m}\u001b[39;00m\u001b[39m.parquet\u001b[39m\u001b[39m\"\u001b[39m) \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m \u001b[39mrange\u001b[39m(num_shards)]\n\u001b[0;32m--> 143\u001b[0m downloaded_files \u001b[39m=\u001b[39m dl_manager\u001b[39m.\u001b[39;49mdownload_and_extract(files)\n\u001b[1;32m    144\u001b[0m \u001b[39mreturn\u001b[39;00m [\n\u001b[1;32m    145\u001b[0m     datasets\u001b[39m.\u001b[39mSplitGenerator(name\u001b[39m=\u001b[39mdatasets\u001b[39m.\u001b[39mSplit\u001b[39m.\u001b[39mTRAIN, gen_kwargs\u001b[39m=\u001b[39m{\u001b[39m\"\u001b[39m\u001b[39mfiles\u001b[39m\u001b[39m\"\u001b[39m: downloaded_files}),\n\u001b[1;32m    146\u001b[0m ]\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/download/download_manager.py:434\u001b[0m, in \u001b[0;36mDownloadManager.download_and_extract\u001b[0;34m(self, url_or_urls)\u001b[0m\n\u001b[1;32m    418\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mdownload_and_extract\u001b[39m(\u001b[39mself\u001b[39m, url_or_urls):\n\u001b[1;32m    419\u001b[0m \u001b[39m    \u001b[39m\u001b[39m\"\"\"Download and extract given `url_or_urls`.\u001b[39;00m\n\u001b[1;32m    420\u001b[0m \n\u001b[1;32m    421\u001b[0m \u001b[39m    Is roughly equivalent to:\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    432\u001b[0m \u001b[39m        extracted_path(s): `str`, extracted paths of given URL(s).\u001b[39;00m\n\u001b[1;32m    433\u001b[0m \u001b[39m    \"\"\"\u001b[39;00m\n\u001b[0;32m--> 434\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mextract(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mdownload(url_or_urls))\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/download/download_manager.py:257\u001b[0m, in \u001b[0;36mDownloadManager.download\u001b[0;34m(self, url_or_urls)\u001b[0m\n\u001b[1;32m    255\u001b[0m start_time \u001b[39m=\u001b[39m datetime\u001b[39m.\u001b[39mnow()\n\u001b[1;32m    256\u001b[0m \u001b[39mwith\u001b[39;00m stack_multiprocessing_download_progress_bars():\n\u001b[0;32m--> 257\u001b[0m     downloaded_path_or_paths \u001b[39m=\u001b[39m map_nested(\n\u001b[1;32m    258\u001b[0m         download_func,\n\u001b[1;32m    259\u001b[0m         url_or_urls,\n\u001b[1;32m    260\u001b[0m         map_tuple\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m,\n\u001b[1;32m    261\u001b[0m         num_proc\u001b[39m=\u001b[39;49mdownload_config\u001b[39m.\u001b[39;49mnum_proc,\n\u001b[1;32m    262\u001b[0m         desc\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mDownloading data files\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m    263\u001b[0m         batched\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m,\n\u001b[1;32m    264\u001b[0m         batch_size\u001b[39m=\u001b[39;49m\u001b[39m-\u001b[39;49m\u001b[39m1\u001b[39;49m,\n\u001b[1;32m    265\u001b[0m     )\n\u001b[1;32m    266\u001b[0m duration \u001b[39m=\u001b[39m datetime\u001b[39m.\u001b[39mnow() \u001b[39m-\u001b[39m start_time\n\u001b[1;32m    267\u001b[0m logger\u001b[39m.\u001b[39minfo(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mDownloading took \u001b[39m\u001b[39m{\u001b[39;00mduration\u001b[39m.\u001b[39mtotal_seconds()\u001b[39m \u001b[39m\u001b[39m/\u001b[39m\u001b[39m/\u001b[39m\u001b[39m \u001b[39m\u001b[39m60\u001b[39m\u001b[39m}\u001b[39;00m\u001b[39m min\u001b[39m\u001b[39m\"\u001b[39m)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/utils/py_utils.py:511\u001b[0m, in \u001b[0;36mmap_nested\u001b[0;34m(function, data_struct, dict_only, map_list, map_tuple, map_numpy, num_proc, parallel_min_length, batched, batch_size, types, disable_tqdm, desc)\u001b[0m\n\u001b[1;32m    509\u001b[0m         batch_size \u001b[39m=\u001b[39m \u001b[39mmax\u001b[39m(\u001b[39mlen\u001b[39m(iterable) \u001b[39m/\u001b[39m\u001b[39m/\u001b[39m num_proc \u001b[39m+\u001b[39m \u001b[39mint\u001b[39m(\u001b[39mlen\u001b[39m(iterable) \u001b[39m%\u001b[39m num_proc \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m), \u001b[39m1\u001b[39m)\n\u001b[1;32m    510\u001b[0m     iterable \u001b[39m=\u001b[39m \u001b[39mlist\u001b[39m(iter_batched(iterable, batch_size))\n\u001b[0;32m--> 511\u001b[0m mapped \u001b[39m=\u001b[39m [\n\u001b[1;32m    512\u001b[0m     _single_map_nested((function, obj, batched, batch_size, types, \u001b[39mNone\u001b[39;00m, \u001b[39mTrue\u001b[39;00m, \u001b[39mNone\u001b[39;00m))\n\u001b[1;32m    513\u001b[0m     \u001b[39mfor\u001b[39;00m obj \u001b[39min\u001b[39;00m hf_tqdm(iterable, disable\u001b[39m=\u001b[39mdisable_tqdm, desc\u001b[39m=\u001b[39mdesc)\n\u001b[1;32m    514\u001b[0m ]\n\u001b[1;32m    515\u001b[0m \u001b[39mif\u001b[39;00m batched:\n\u001b[1;32m    516\u001b[0m     mapped \u001b[39m=\u001b[39m [mapped_item \u001b[39mfor\u001b[39;00m mapped_batch \u001b[39min\u001b[39;00m mapped \u001b[39mfor\u001b[39;00m mapped_item \u001b[39min\u001b[39;00m mapped_batch]\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/utils/py_utils.py:512\u001b[0m, in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m    509\u001b[0m         batch_size \u001b[39m=\u001b[39m \u001b[39mmax\u001b[39m(\u001b[39mlen\u001b[39m(iterable) \u001b[39m/\u001b[39m\u001b[39m/\u001b[39m num_proc \u001b[39m+\u001b[39m \u001b[39mint\u001b[39m(\u001b[39mlen\u001b[39m(iterable) \u001b[39m%\u001b[39m num_proc \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m), \u001b[39m1\u001b[39m)\n\u001b[1;32m    510\u001b[0m     iterable \u001b[39m=\u001b[39m \u001b[39mlist\u001b[39m(iter_batched(iterable, batch_size))\n\u001b[1;32m    511\u001b[0m mapped \u001b[39m=\u001b[39m [\n\u001b[0;32m--> 512\u001b[0m     _single_map_nested((function, obj, batched, batch_size, types, \u001b[39mNone\u001b[39;49;00m, \u001b[39mTrue\u001b[39;49;00m, \u001b[39mNone\u001b[39;49;00m))\n\u001b[1;32m    513\u001b[0m     \u001b[39mfor\u001b[39;00m obj \u001b[39min\u001b[39;00m hf_tqdm(iterable, disable\u001b[39m=\u001b[39mdisable_tqdm, desc\u001b[39m=\u001b[39mdesc)\n\u001b[1;32m    514\u001b[0m ]\n\u001b[1;32m    515\u001b[0m \u001b[39mif\u001b[39;00m batched:\n\u001b[1;32m    516\u001b[0m     mapped \u001b[39m=\u001b[39m [mapped_item \u001b[39mfor\u001b[39;00m mapped_batch \u001b[39min\u001b[39;00m mapped \u001b[39mfor\u001b[39;00m mapped_item \u001b[39min\u001b[39;00m mapped_batch]\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/utils/py_utils.py:380\u001b[0m, in \u001b[0;36m_single_map_nested\u001b[0;34m(args)\u001b[0m\n\u001b[1;32m    373\u001b[0m         \u001b[39mreturn\u001b[39;00m function(data_struct)\n\u001b[1;32m    374\u001b[0m \u001b[39mif\u001b[39;00m (\n\u001b[1;32m    375\u001b[0m     batched\n\u001b[1;32m    376\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39misinstance\u001b[39m(data_struct, \u001b[39mdict\u001b[39m)\n\u001b[1;32m    377\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39misinstance\u001b[39m(data_struct, types)\n\u001b[1;32m    378\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39mall\u001b[39m(\u001b[39mnot\u001b[39;00m \u001b[39misinstance\u001b[39m(v, (\u001b[39mdict\u001b[39m, types)) \u001b[39mfor\u001b[39;00m v \u001b[39min\u001b[39;00m data_struct)\n\u001b[1;32m    379\u001b[0m ):\n\u001b[0;32m--> 380\u001b[0m     \u001b[39mreturn\u001b[39;00m [mapped_item \u001b[39mfor\u001b[39;00m batch \u001b[39min\u001b[39;00m iter_batched(data_struct, batch_size) \u001b[39mfor\u001b[39;00m mapped_item \u001b[39min\u001b[39;00m function(batch)]\n\u001b[1;32m    382\u001b[0m \u001b[39m# Reduce logging to keep things readable in multiprocessing with tqdm\u001b[39;00m\n\u001b[1;32m    383\u001b[0m \u001b[39mif\u001b[39;00m rank \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39mand\u001b[39;00m logging\u001b[39m.\u001b[39mget_verbosity() \u001b[39m<\u001b[39m logging\u001b[39m.\u001b[39mWARNING:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/utils/py_utils.py:380\u001b[0m, in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m    373\u001b[0m         \u001b[39mreturn\u001b[39;00m function(data_struct)\n\u001b[1;32m    374\u001b[0m \u001b[39mif\u001b[39;00m (\n\u001b[1;32m    375\u001b[0m     batched\n\u001b[1;32m    376\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39misinstance\u001b[39m(data_struct, \u001b[39mdict\u001b[39m)\n\u001b[1;32m    377\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39misinstance\u001b[39m(data_struct, types)\n\u001b[1;32m    378\u001b[0m     \u001b[39mand\u001b[39;00m \u001b[39mall\u001b[39m(\u001b[39mnot\u001b[39;00m \u001b[39misinstance\u001b[39m(v, (\u001b[39mdict\u001b[39m, types)) \u001b[39mfor\u001b[39;00m v \u001b[39min\u001b[39;00m data_struct)\n\u001b[1;32m    379\u001b[0m ):\n\u001b[0;32m--> 380\u001b[0m     \u001b[39mreturn\u001b[39;00m [mapped_item \u001b[39mfor\u001b[39;00m batch \u001b[39min\u001b[39;00m iter_batched(data_struct, batch_size) \u001b[39mfor\u001b[39;00m mapped_item \u001b[39min\u001b[39;00m function(batch)]\n\u001b[1;32m    382\u001b[0m \u001b[39m# Reduce logging to keep things readable in multiprocessing with tqdm\u001b[39;00m\n\u001b[1;32m    383\u001b[0m \u001b[39mif\u001b[39;00m rank \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39mand\u001b[39;00m logging\u001b[39m.\u001b[39mget_verbosity() \u001b[39m<\u001b[39m logging\u001b[39m.\u001b[39mWARNING:\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/datasets/download/download_manager.py:300\u001b[0m, in \u001b[0;36mDownloadManager._download_batched\u001b[0;34m(self, url_or_filenames, download_config)\u001b[0m\n\u001b[1;32m    295\u001b[0m         \u001b[39mpass\u001b[39;00m\n\u001b[1;32m    296\u001b[0m     max_workers \u001b[39m=\u001b[39m (\n\u001b[1;32m    297\u001b[0m         config\u001b[39m.\u001b[39mHF_DATASETS_MULTITHREADING_MAX_WORKERS \u001b[39mif\u001b[39;00m size \u001b[39m<\u001b[39m (\u001b[39m20\u001b[39m \u001b[39m<<\u001b[39m \u001b[39m20\u001b[39m) \u001b[39melse\u001b[39;00m \u001b[39m1\u001b[39m\n\u001b[1;32m    298\u001b[0m     )  \u001b[39m# enable multithreading if files are small\u001b[39;00m\n\u001b[0;32m--> 300\u001b[0m     \u001b[39mreturn\u001b[39;00m thread_map(\n\u001b[1;32m    301\u001b[0m         download_func,\n\u001b[1;32m    302\u001b[0m         url_or_filenames,\n\u001b[1;32m    303\u001b[0m         desc\u001b[39m=\u001b[39;49mdownload_config\u001b[39m.\u001b[39;49mdownload_desc \u001b[39mor\u001b[39;49;00m \u001b[39m\"\u001b[39;49m\u001b[39mDownloading\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m    304\u001b[0m         unit\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mfiles\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m    305\u001b[0m         position\u001b[39m=\u001b[39;49mmultiprocessing\u001b[39m.\u001b[39;49mcurrent_process()\u001b[39m.\u001b[39;49m_identity[\u001b[39m-\u001b[39;49m\u001b[39m1\u001b[39;49m]  \u001b[39m# contains the ranks of subprocesses\u001b[39;49;00m\n\u001b[1;32m    306\u001b[0m         \u001b[39mif\u001b[39;49;00m os\u001b[39m.\u001b[39;49menviron\u001b[39m.\u001b[39;49mget(\u001b[39m\"\u001b[39;49m\u001b[39mHF_DATASETS_STACK_MULTIPROCESSING_DOWNLOAD_PROGRESS_BARS\u001b[39;49m\u001b[39m\"\u001b[39;49m) \u001b[39m==\u001b[39;49m \u001b[39m\"\u001b[39;49m\u001b[39m1\u001b[39;49m\u001b[39m\"\u001b[39;49m\n\u001b[1;32m    307\u001b[0m         \u001b[39mand\u001b[39;49;00m multiprocessing\u001b[39m.\u001b[39;49mcurrent_process()\u001b[39m.\u001b[39;49m_identity\n\u001b[1;32m    308\u001b[0m         \u001b[39melse\u001b[39;49;00m \u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m    309\u001b[0m         max_workers\u001b[39m=\u001b[39;49mmax_workers,\n\u001b[1;32m    310\u001b[0m         tqdm_class\u001b[39m=\u001b[39;49mtqdm,\n\u001b[1;32m    311\u001b[0m     )\n\u001b[1;32m    312\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    313\u001b[0m     \u001b[39mreturn\u001b[39;00m [\n\u001b[1;32m    314\u001b[0m         \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_download_single(url_or_filename, download_config\u001b[39m=\u001b[39mdownload_config)\n\u001b[1;32m    315\u001b[0m         \u001b[39mfor\u001b[39;00m url_or_filename \u001b[39min\u001b[39;00m url_or_filenames\n\u001b[1;32m    316\u001b[0m     ]\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/tqdm/contrib/concurrent.py:69\u001b[0m, in \u001b[0;36mthread_map\u001b[0;34m(fn, *iterables, **tqdm_kwargs)\u001b[0m\n\u001b[1;32m     55\u001b[0m \u001b[39m\u001b[39m\u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m     56\u001b[0m \u001b[39mEquivalent of `list(map(fn, *iterables))`\u001b[39;00m\n\u001b[1;32m     57\u001b[0m \u001b[39mdriven by `concurrent.futures.ThreadPoolExecutor`.\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     66\u001b[0m \u001b[39m    [default: max(32, cpu_count() + 4)].\u001b[39;00m\n\u001b[1;32m     67\u001b[0m \u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m     68\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mconcurrent\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mfutures\u001b[39;00m \u001b[39mimport\u001b[39;00m ThreadPoolExecutor\n\u001b[0;32m---> 69\u001b[0m \u001b[39mreturn\u001b[39;00m _executor_map(ThreadPoolExecutor, fn, \u001b[39m*\u001b[39;49miterables, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mtqdm_kwargs)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/site-packages/tqdm/contrib/concurrent.py:49\u001b[0m, in \u001b[0;36m_executor_map\u001b[0;34m(PoolExecutor, fn, *iterables, **tqdm_kwargs)\u001b[0m\n\u001b[1;32m     46\u001b[0m lock_name \u001b[39m=\u001b[39m kwargs\u001b[39m.\u001b[39mpop(\u001b[39m\"\u001b[39m\u001b[39mlock_name\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m     47\u001b[0m \u001b[39mwith\u001b[39;00m ensure_lock(tqdm_class, lock_name\u001b[39m=\u001b[39mlock_name) \u001b[39mas\u001b[39;00m lk:\n\u001b[1;32m     48\u001b[0m     \u001b[39m# share lock in case workers are already using `tqdm`\u001b[39;00m\n\u001b[0;32m---> 49\u001b[0m     \u001b[39mwith\u001b[39;00m PoolExecutor(max_workers\u001b[39m=\u001b[39mmax_workers, initializer\u001b[39m=\u001b[39mtqdm_class\u001b[39m.\u001b[39mset_lock,\n\u001b[1;32m     50\u001b[0m                       initargs\u001b[39m=\u001b[39m(lk,)) \u001b[39mas\u001b[39;00m ex:\n\u001b[1;32m     51\u001b[0m         \u001b[39mreturn\u001b[39;00m \u001b[39mlist\u001b[39m(tqdm_class(ex\u001b[39m.\u001b[39mmap(fn, \u001b[39m*\u001b[39miterables, chunksize\u001b[39m=\u001b[39mchunksize), \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs))\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/concurrent/futures/_base.py:649\u001b[0m, in \u001b[0;36mExecutor.__exit__\u001b[0;34m(self, exc_type, exc_val, exc_tb)\u001b[0m\n\u001b[1;32m    648\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m__exit__\u001b[39m(\u001b[39mself\u001b[39m, exc_type, exc_val, exc_tb):\n\u001b[0;32m--> 649\u001b[0m     \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mshutdown(wait\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m)\n\u001b[1;32m    650\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mFalse\u001b[39;00m\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/concurrent/futures/thread.py:235\u001b[0m, in \u001b[0;36mThreadPoolExecutor.shutdown\u001b[0;34m(self, wait, cancel_futures)\u001b[0m\n\u001b[1;32m    233\u001b[0m \u001b[39mif\u001b[39;00m wait:\n\u001b[1;32m    234\u001b[0m     \u001b[39mfor\u001b[39;00m t \u001b[39min\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_threads:\n\u001b[0;32m--> 235\u001b[0m         t\u001b[39m.\u001b[39;49mjoin()\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/threading.py:1096\u001b[0m, in \u001b[0;36mThread.join\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m   1093\u001b[0m     \u001b[39mraise\u001b[39;00m \u001b[39mRuntimeError\u001b[39;00m(\u001b[39m\"\u001b[39m\u001b[39mcannot join current thread\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m   1095\u001b[0m \u001b[39mif\u001b[39;00m timeout \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m-> 1096\u001b[0m     \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_wait_for_tstate_lock()\n\u001b[1;32m   1097\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m   1098\u001b[0m     \u001b[39m# the behavior of a negative timeout isn't documented, but\u001b[39;00m\n\u001b[1;32m   1099\u001b[0m     \u001b[39m# historically .join(timeout=x) for x<0 has acted as if timeout=0\u001b[39;00m\n\u001b[1;32m   1100\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_wait_for_tstate_lock(timeout\u001b[39m=\u001b[39m\u001b[39mmax\u001b[39m(timeout, \u001b[39m0\u001b[39m))\n",
+      "File \u001b[0;32m/usr/local/lib/python3.10/threading.py:1116\u001b[0m, in \u001b[0;36mThread._wait_for_tstate_lock\u001b[0;34m(self, block, timeout)\u001b[0m\n\u001b[1;32m   1113\u001b[0m     \u001b[39mreturn\u001b[39;00m\n\u001b[1;32m   1115\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m-> 1116\u001b[0m     \u001b[39mif\u001b[39;00m lock\u001b[39m.\u001b[39;49macquire(block, timeout):\n\u001b[1;32m   1117\u001b[0m         lock\u001b[39m.\u001b[39mrelease()\n\u001b[1;32m   1118\u001b[0m         \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_stop()\n",
+      "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
+     ]
+    }
+   ],
+   "source": [
+    "from datasets import load_dataset\n",
+    "\n",
+    "# Load the dataset\n",
+    "dataset = load_dataset('wiki_dpr', 'psgs_w100.nq.exact', trust_remote_code=True)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Downloading data:  90%|█████████ | 142/157 [23:35<01:01,  4.07s/files]"
+     ]
+    }
+   ],
+   "source": [
+    "from datasets import load_dataset\n",
+    "from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration\n",
+    "import faiss\n",
+    "import numpy as np\n",
+    "\n",
+    "# Load the dataset\n",
+    "dataset = load_dataset('wiki_dpr', 'psgs_w100.nq.exact', trust_remote_code=True)\n",
+    "\n",
+    "# Load the model and tokenizer\n",
+    "model_name = \"facebook/rag-sequence-nq\"\n",
+    "tokenizer = RagTokenizer.from_pretrained(model_name)\n",
+    "retriever = RagRetriever.from_pretrained(model_name, use_dummy_dataset=True)\n",
+    "model = RagSequenceForGeneration.from_pretrained(model_name, retriever=retriever)\n",
+    "\n",
+    "# Encode the documents\n",
+    "def embed_passages(passages):\n",
+    "    embeddings = []\n",
+    "    for passage in passages:\n",
+    "        inputs = tokenizer(passage, return_tensors=\"pt\", truncation=True, padding=\"max_length\", max_length=256)\n",
+    "        outputs = model.question_encoder(**inputs)\n",
+    "        embeddings.append(outputs.pooler_output.detach().numpy())\n",
+    "    return np.vstack(embeddings)\n",
+    "\n",
+    "# Embed the documents\n",
+    "passages = dataset['train'][:1000]['text']  # For the demo, use only the first 1000 documents\n",
+    "passage_embeddings = embed_passages(passages)\n",
+    "\n",
+    "# Create the Faiss index\n",
+    "index = faiss.IndexFlatL2(passage_embeddings.shape[1])\n",
+    "index.add(passage_embeddings)\n",
+    "faiss.write_index(index, \"faiss_index\")\n",
+    "\n",
+    "# Load the Faiss index into the model\n",
+    "retriever.index = faiss.read_index(\"faiss_index\")\n",
+    "\n",
+    "# Tokenize the question\n",
+    "question = \"What is the capital of France?\"\n",
+    "inputs = tokenizer(question, return_tensors=\"pt\")\n",
+    "\n",
+    "# Generate the answer\n",
+    "outputs = model.generate(inputs[\"input_ids\"])\n",
+    "answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]\n",
+    "\n",
+    "print(answer)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
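
Note on the last notebook cell: it builds a `faiss.IndexFlatL2` over the passage embeddings. As a rough illustration of what a flat L2 index computes (an exhaustive squared Euclidean distance from the query to every stored vector), here is a minimal pure-Python sketch. `FlatL2Index` and its `add`/`search` methods are hypothetical names chosen for this note; they are not the FAISS API, and the toy 2-dimensional vectors stand in for real passage embeddings.

```python
from typing import List, Sequence, Tuple


class FlatL2Index:
    """Hypothetical stand-in for an exhaustive (flat) L2 index; not the FAISS API."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        # Store each vector after checking that its dimension matches the index.
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append([float(x) for x in v])

    def search(self, query: Sequence[float], k: int) -> List[Tuple[float, int]]:
        # Compare the query against every stored vector (no approximation).
        if len(query) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(query)}")
        scored = [
            (sum((q - x) ** 2 for q, x in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort()  # smallest squared distance first
        return scored[:k]


# Toy data: three 2-d "embeddings" and one query.
index = FlatL2Index(dim=2)
index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
nearest = index.search([0.9, 1.2], k=2)
print(nearest)  # list of (squared_distance, vector_id) pairs, closest first
```

The exhaustive scan costs time proportional to the number of stored vectors per query, which is acceptable for the 1000-passage demo subset but is the reason approximate indexes are usually preferred at full-corpus scale.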