{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to using vLLM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Senior Data Scientist: Dr. Eddy Giusepe Chirinos Isidro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Study links:\n", "\n", "* [vllm-project](https://github.com/vllm-project/vllm?tab=readme-ov-file)\n", "\n", "* [vllm: quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`vLLM` is a fast, easy-to-use library for `LLM` inference and serving." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![vLLM logo](https://pypi-camo.freetls.fastly.net/78b171d927e29d3adc6067494d26adffc78c8532/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f766c6c6d2d70726f6a6563742f766c6c6d2f6d61696e2f646f63732f736f757263652f6173736574732f6c6f676f732f766c6c6d2d6c6f676f2d746578742d6c696768742e706e67)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the following command in a terminal and leave it running, as you would with `ollama`. It serves the model through the OpenAI-compatible API that the cells below call at `http://localhost:8000/v1`:\n", "\n", "```bash\n", "vllm serve Qwen/Qwen2.5-1.5B-Instruct\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from openai import OpenAI\n", "\n", "# Point the OpenAI client's API key and base URL at the vLLM API server.\n", "# The key is a placeholder: the server above was started without an API key.\n", "openai_api_key = \"EMPTY\"\n", "openai_api_base = \"http://localhost:8000/v1\"\n", "\n", "client = OpenAI(\n", "    api_key=openai_api_key,\n", "    base_url=openai_api_base,\n", ")\n", "\n", "# Plain text completion: the model continues the prompt.\n", "completion = client.completions.create(\n", "    model=\"Qwen/Qwen2.5-1.5B-Instruct\",\n", "    prompt=\"San Francisco is a\",\n", ")\n", "\n", "print(\"Completion result:\", completion.choices[0].text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from openai import OpenAI\n", "\n", "# Point the OpenAI client's API key and base URL at the vLLM API server.\n", "openai_api_key = \"EMPTY\"\n", "openai_api_base = \"http://localhost:8000/v1\"\n", "\n", 
"client = OpenAI(\n", "    api_key=openai_api_key,\n", "    base_url=openai_api_base,\n", ")\n", "\n", "# Chat completion: a system message sets the behavior, the user message asks the question.\n", "chat_response = client.chat.completions.create(\n", "    model=\"Qwen/Qwen2.5-1.5B-Instruct\",\n", "    messages=[\n", "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n", "        {\"role\": \"user\", \"content\": \"Tell me a joke.\"},\n", "    ],\n", ")\n", "\n", "print(\"Chat response:\", chat_response.choices[0].message.content)\n" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 2 }