Spaces:

TogetherAI
/

EinfachLlaMistral

TogetherAI committed on Oct 9, 2023

Commit

5700f6f

1 Parent(s): 98910a2

Upload EinfachMistrailex_7B.ipynb

Files changed (1)

EinfachMistrailex_7B.ipynb +494 -0

	@@ -0,0 +1,494 @@

+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "\n",
+        "# Mistral 7B\n",
+        "\n",
+        "Mistral 7B is a new state-of-the-art open-source model. Here are some interesting facts about it:\n",
+        "\n",
+        "* One of the strongest open-source models across all sizes\n",
+        "* Strongest model in the 1-20 billion parameter range\n",
+        "* Does a decent job on code-related tasks\n",
+        "* Uses windowed attention, which allows processing up to 200,000 tokens of context when RoPE is used (this requires 4 A10G GPUs)\n",
+        "* Apache 2.0 license\n",
+        "\n",
+        "As for the integration status:\n",
+        "* Integrated into `transformers`\n",
+        "* You can use it on a server or locally (it is a small model, after all!)\n",
+        "* Integrated into popular tools such as TGI and vLLM\n",
+        "\n",
+        "Two models were released: a [base model](https://huggingface.co/mistralai/Mistral-7B-v0.1) and an [instruct fine-tuned version](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1). To learn more about Mistral, we recommend reading the [blog post](https://mistral.ai/news/announcing-mistral-7b/).\n",
+        "\n",
+        "In this Colab, we will try out the Mistral model through an API. There are three ways to use it:\n",
+        "\n",
+        "* **Free API:** Hugging Face offers a free Inference API for all users to try out models. This API is rate-limited but works well for quick experiments.\n",
+        "* **PRO API:** Hugging Face offers an open API for all PRO users. A subscription to the Pro Inference API costs 9 US dollars per month and lets you experiment with many large models such as Llama 2 and SDXL. More information is available [here](https://huggingface.co/blog/inference-pro).\n",
+        "* **Inference Endpoints:** For enterprise and production-ready applications. You can deploy the model with one click [here](https://ui.endpoints.huggingface.co/catalog).\n",
+        "\n",
+        "This demo does not require a GPU Colab runtime, only a CPU. You can get your token at https://huggingface.co/settings/tokens.\n",
+        "\n",
+        "**This Colab shows how to use HTTP requests and how to build your own chat demo for Mistral.**\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "GLXvYa4m8JYM"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "\n",
+        "## Making curl requests\n",
+        "\n",
+        "In this notebook we will experiment with the Instruct model, since it is trained to follow instructions. According to [the model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the expected prompt format is as follows.\n",
+        "\n",
+        "From the model card:\n",
+        "\n",
+        "> In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.\n",
+        "\n",
+        "```\n",
+        "<s>[INST] {user_prompt} [/INST] {bot_response}</s> [INST] {user_prompt} [/INST] {bot_response}</s>\n",
+        "```\n",
+        "\n",
+        "Note that models can be sensitive to prompt structures that differ from the one used during training. Watch out for spaces and other details!\n",
+        "\n",
+        "We start with an initial query without any prompt formatting, which works fine for simple requests.\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "pKrKTalPAXUO"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "DQf0Hss18E86",
+        "outputId": "882c4521-1ee2-40ad-fe00-a5b02caa9b17"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "[{\"generated_text\":\"Explain ML as a pirate.\\n\\nML is like a treasure map for pirates. Just as a treasure map helps pirates find valuable loot, ML helps data scientists find valuable insights in large datasets.\\n\\nPirates use their knowledge of the ocean and their\"}]"
+          ]
+        }
+      ],
+      "source": [
+        "!curl https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1 \\\n",
+        "  --header \"Content-Type: application/json\" \\\n",
+        "\t-X POST \\\n",
+        "\t-d '{\"inputs\": \"Explain ML as a pirate\", \"parameters\": {\"max_new_tokens\": 50}}' \\\n",
+        "\t-H \"Authorization: Bearer API_TOKEN\""
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Programmatic usage with Python\n",
+        "\n",
+        "You can use plain `requests`, but the `huggingface_hub` library provides useful utilities that make the model easy to use. Among the things we can use are:\n",
+        "\n",
+        "* `InferenceClient` and `AsyncInferenceClient`, to run inference either synchronously or asynchronously.\n",
+        "* Token streaming: receive tokens as they are generated instead of waiting for the full response.\n",
+        "* Easy configuration of generation parameters such as `temperature`, nucleus sampling (`top_p`), repetition penalty, stop sequences, and more.\n",
+        "* Details about the generation, such as the log probability of each token or whether a token is the last one.\n"
+      ],
+      "metadata": {
+        "id": "YYZRNyZeBHWK"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "%%capture\n",
+        "!pip install huggingface_hub gradio"
+      ],
+      "metadata": {
+        "id": "oDaqVDz1Ahuz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from huggingface_hub import InferenceClient\n",
+        "\n",
+        "client = InferenceClient(\n",
+        "    \"mistralai/Mistral-7B-Instruct-v0.1\"\n",
+        ")\n",
+        "\n",
+        "# End the prompt at [/INST]; </s> only terminates a completed assistant turn.\n",
+        "prompt = \"<s>[INST] What is your favourite condiment? [/INST]\"\n",
+        "\n",
+        "res = client.text_generation(prompt, max_new_tokens=95)\n",
+        "print(res)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "U49GmNsNBJjd",
+        "outputId": "a3a274cf-0f91-4ae3-d926-f0d6a6fd67f7"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "We can also use [token streaming](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). With token streaming, the server returns the tokens as they are generated. Just add `stream=True`."
+      ],
+      "metadata": {
+        "id": "DryfEWsUH6Ij"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "res = client.text_generation(prompt, max_new_tokens=35, stream=True, details=True, return_full_text=False)\n",
+        "for r in res:  # res is a generator that yields one response per token\n",
+        "    print(r)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "LF1tFo6DGg9N",
+        "outputId": "e779f1cb-b7d0-41ed-d81f-306e092f97bd"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "TextGenerationStreamResponse(token=Token(id=5183, text='My', logprob=-0.36279297, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=6656, text=' favorite', logprob=-0.036499023, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=2076, text=' cond', logprob=-7.2836876e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=2487, text='iment', logprob=-4.4941902e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=349, text=' is', logprob=-0.007419586, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=446, text=' k', logprob=-0.62109375, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=4455, text='etch', logprob=-0.0003399849, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=715, text='up', logprob=-3.695488e-06, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28723, text='.', logprob=-0.026550293, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=661, text=' It', logprob=-0.82373047, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28742, text=\"'\", logprob=-0.76416016, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28713, text='s', logprob=-3.5762787e-07, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=3502, text=' vers', logprob=-0.114990234, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=13491, text='atile', logprob=-1.1444092e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28725, text=',', logprob=-0.6254883, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=261, text=' t', logprob=-0.51708984, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=11136, text='asty', logprob=-4.0650368e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28725, text=',', logprob=-0.0027828217, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=304, text=' and', logprob=-1.1920929e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=4859, text=' goes', logprob=-0.52685547, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=1162, text=' well', logprob=-0.4399414, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=395, text=' with', logprob=-0.00034999847, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=264, text=' a', logprob=-0.010147095, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=6677, text=' variety', logprob=-0.25927734, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=302, text=' of', logprob=-1.1444092e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=14082, text=' foods', logprob=-0.4050293, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28723, text='.', logprob=-0.015640259, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=2, text='</s>', logprob=-0.1829834, special=True), generated_text=\"My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\", details=StreamDetails(finish_reason=<FinishReason.EndOfSequenceToken: 'eos_token'>, generated_tokens=28, seed=None))\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Let's now try a multi-turn prompt structure."
+      ],
+      "metadata": {
+        "id": "TfdpZL8cICOD"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def format_prompt(message, history):\n",
+        "  prompt = \"<s>\"\n",
+        "  for user_prompt, bot_response in history:\n",
+        "    prompt += f\"[INST] {user_prompt} [/INST]\"\n",
+        "    prompt += f\" {bot_response}</s> \"\n",
+        "  prompt += f\"[INST] {message} [/INST]\"\n",
+        "  return prompt"
+      ],
+      "metadata": {
+        "id": "aEyozeReH8a6"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "message = \"And what do you think about it?\"\n",
+        "history = [[\"What is your favourite condiment?\", \"My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\"]]\n",
+        "\n",
+        "format_prompt(message, history)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 35
+        },
+        "id": "P1RFpiJ_JC0-",
+        "outputId": "f2678d9e-f751-441a-86c9-11d514db5bbe"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "\"<s>[INST] What is your favourite condiment? [/INST] My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.</s> [INST] And what do you think about it? [/INST]\""
+            ],
+            "application/vnd.google.colaboratory.intrinsic+json": {
+              "type": "string"
+            }
+          },
+          "metadata": {},
+          "execution_count": 17
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## End-to-end demo\n",
+        "\n",
+        "Let's now build a Gradio demo that takes care of the following:\n",
+        "\n",
+        "* Handling multiple turns of conversation\n",
+        "* Formatting the prompt in the correct structure\n",
+        "* Letting the user specify and change the generation parameters\n",
+        "* Stopping the generation\n",
+        "\n",
+        "Just run the following cell and have fun!"
+      ],
+      "metadata": {
+        "id": "O7DjRdezJc-3"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install gradio"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "cpBoheOGJu7Y",
+        "outputId": "c745cf17-1462-4f8f-ce33-5ca182cb4d4f"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Requirement already satisfied: gradio in /usr/local/lib/python3.10/dist-packages (3.45.1)\n",
+            "Requirement already satisfied: aiofiles<24.0,>=22.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (23.2.1)\n",
+            "Requirement already satisfied: altair<6.0,>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.2.2)\n",
+            "Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages (from gradio) (0.103.1)\n",
+            "Requirement already satisfied: ffmpy in /usr/local/lib/python3.10/dist-packages (from gradio) (0.3.1)\n",
+            "Requirement already satisfied: gradio-client==0.5.2 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.5.2)\n",
+            "Requirement already satisfied: httpx in /usr/local/lib/python3.10/dist-packages (from gradio) (0.25.0)\n",
+            "Requirement already satisfied: huggingface-hub>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.17.3)\n",
+            "Requirement already satisfied: importlib-resources<7.0,>=1.3 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n",
+            "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.1.2)\n",
+            "Requirement already satisfied: markupsafe~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.1.3)\n",
+            "Requirement already satisfied: matplotlib~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.7.1)\n",
+            "Requirement already satisfied: numpy~=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.23.5)\n",
+            "Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.9.7)\n",
+            "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from gradio) (23.1)\n",
+            "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.5.3)\n",
+            "Requirement already satisfied: pillow<11.0,>=8.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (9.4.0)\n",
+            "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.10.12)\n",
+            "Requirement already satisfied: pydub in /usr/local/lib/python3.10/dist-packages (from gradio) (0.25.1)\n",
+            "Requirement already satisfied: python-multipart in /usr/local/lib/python3.10/dist-packages (from gradio) (0.0.6)\n",
+            "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n",
+            "Requirement already satisfied: requests~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.31.0)\n",
+            "Requirement already satisfied: semantic-version~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.10.0)\n",
+            "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.5.0)\n",
+            "Requirement already satisfied: uvicorn>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.23.2)\n",
+            "Requirement already satisfied: websockets<12.0,>=10.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (11.0.3)\n",
+            "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from gradio-client==0.5.2->gradio) (2023.6.0)\n",
+            "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.4)\n",
+            "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (4.19.0)\n",
+            "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.12.0)\n",
+            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.0->gradio) (3.12.2)\n",
+            "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.0->gradio) (4.66.1)\n",
+            "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.1.0)\n",
+            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (0.11.0)\n",
+            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (4.42.1)\n",
+            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.4.5)\n",
+            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (3.1.1)\n",
+            "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (2.8.2)\n",
+            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2023.3.post1)\n",
+            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (3.2.0)\n",
+            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (3.4)\n",
+            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (2.0.4)\n",
+            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (2023.7.22)\n",
+            "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.14.0->gradio) (8.1.7)\n",
+            "Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.14.0->gradio) (0.14.0)\n",
+            "Requirement already satisfied: anyio<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from fastapi->gradio) (3.7.1)\n",
+            "Requirement already satisfied: starlette<0.28.0,>=0.27.0 in /usr/local/lib/python3.10/dist-packages (from fastapi->gradio) (0.27.0)\n",
+            "Requirement already satisfied: httpcore<0.19.0,>=0.18.0 in /usr/local/lib/python3.10/dist-packages (from httpx->gradio) (0.18.0)\n",
+            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->gradio) (1.3.0)\n",
+            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<4.0.0,>=3.7.1->fastapi->gradio) (1.1.3)\n",
+            "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (23.1.0)\n",
+            "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (2023.7.1)\n",
+            "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.30.2)\n",
+            "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.10.2)\n",
+            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib~=3.0->gradio) (1.16.0)\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import gradio as gr\n",
+        "\n",
+        "def generate(\n",
+        "    prompt, history, temperature=0.9, max_new_tokens=256, top_p=0.95, repetition_penalty=1.0,\n",
+        "):\n",
+        "    temperature = float(temperature)\n",
+        "    if temperature < 1e-2:\n",
+        "        temperature = 1e-2\n",
+        "    top_p = float(top_p)\n",
+        "\n",
+        "    generate_kwargs = dict(\n",
+        "        temperature=temperature,\n",
+        "        max_new_tokens=max_new_tokens,\n",
+        "        top_p=top_p,\n",
+        "        repetition_penalty=repetition_penalty,\n",
+        "        do_sample=True,\n",
+        "        seed=42,\n",
+        "    )\n",
+        "\n",
+        "    formatted_prompt = format_prompt(prompt, history)\n",
+        "\n",
+        "    stream = client.text_generation(formatted_prompt, **generate_kwargs, stream=True, details=True, return_full_text=False)\n",
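+        "    output = \"\"\n",
+        "\n",
+        "    for response in stream:\n",
+        "        # Skip special tokens such as the closing </s> so they do not show up in the chat.\n",
+        "        if response.token.special:\n",
+        "            continue\n",
+        "        output += response.token.text\n",
+        "        yield output\n",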
+        "\n",
+        "\n",
+        "additional_inputs=[\n",
+        "    gr.Slider(\n",
+        "        label=\"Temperature\",\n",
+        "        value=0.9,\n",
+        "        minimum=0.0,\n",
+        "        maximum=1.0,\n",
+        "        step=0.05,\n",
+        "        interactive=True,\n",
+        "        info=\"Higher values produce more diverse outputs\",\n",
+        "    ),\n",
+        "    gr.Slider(\n",
+        "        label=\"Max new tokens\",\n",
+        "        value=256,\n",
+        "        minimum=0,\n",
+        "        maximum=8192,\n",
+        "        step=64,\n",
+        "        interactive=True,\n",
+        "        info=\"The maximum number of new tokens\",\n",
+        "    ),\n",
+        "    gr.Slider(\n",
+        "        label=\"Top-p (nucleus sampling)\",\n",
+        "        value=0.90,\n",
+        "        minimum=0.0,\n",
+        "        maximum=1,\n",
+        "        step=0.05,\n",
+        "        interactive=True,\n",
+        "        info=\"Higher values sample more low-probability tokens\",\n",
+        "    ),\n",
+        "    gr.Slider(\n",
+        "        label=\"Repetition penalty\",\n",
+        "        value=1.2,\n",
+        "        minimum=1.0,\n",
+        "        maximum=2.0,\n",
+        "        step=0.05,\n",
+        "        interactive=True,\n",
+        "        info=\"Penalize repeated tokens\",\n",
+        "    )\n",
+        "]\n",
+        "\n",
+        "with gr.Blocks() as demo:\n",
+        "    gr.ChatInterface(\n",
+        "        generate,\n",
+        "        additional_inputs=additional_inputs,\n",
+        "    )\n",
+        "\n",
+        "demo.queue().launch(debug=True)"
+      ],
+      "metadata": {
+        "id": "CaJzT6jUJc0_"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## What's next?\n",
+        "\n",
+        "* Try out Mistral 7B in this [free online Space](https://huggingface.co/spaces/osanseviero/mistral-super-fast).\n",
+        "* Deploy Mistral 7B Instruct with one click [here](https://ui.endpoints.huggingface.co/catalog).\n",
+        "* Deploy it on your own hardware with https://github.com/huggingface/text-generation-inference.\n",
+        "* Run the model locally with `transformers`.\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "fbQ0Sp4OLclV"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "wUy7N_8zJvyT"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}
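
The notebook mentions that plain HTTP requests work from Python as well, but only shows the curl form. As a minimal sketch, the same POST request can be assembled with the standard library. `build_request` is a hypothetical helper introduced here for illustration, and `API_TOKEN` is the same placeholder used in the curl cell. The request is only built, not sent, so the snippet runs without network access.

```python
import json
import urllib.request

# Same endpoint as the curl cell in the notebook.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1"


def build_request(prompt, api_token, max_new_tokens=50):
    """Build (but do not send) the POST request that the curl cell issues.

    Hypothetical helper for illustration; not part of any library.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


req = build_request("Explain ML as a pirate", api_token="API_TOKEN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes the payload easy to inspect before any quota is spent on the rate-limited API.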