EddyGiusepe committed on
Commit
bbb2a4d
·
1 Parent(s): ebf7352

Using vLLM with the Qwen model

Files changed (1)
  1. intro-vllm.ipynb +123 -0
intro-vllm.ipynb ADDED
@@ -0,0 +1,123 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# <h1 align=\"center\"><font color=\"red\">Introduction to using vLLM</font></h1>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<font color=\"pink\">Senior Data Scientist: Dr. Eddy Giusepe Chirinos Isidro</font>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Study links:\n",
+ "\n",
+ "* [vllm-project](https://github.com/vllm-project/vllm?tab=readme-ov-file)\n",
+ "\n",
+ "* [vllm: quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<font color=\"orange\">`vLLM` is a fast, easy-to-use library for `LLM` inference and serving.</font>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![](https://pypi-camo.freetls.fastly.net/78b171d927e29d3adc6067494d26adffc78c8532/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f766c6c6d2d70726f6a6563742f766c6c6d2f6d61696e2f646f63732f736f757263652f6173736574732f6c6f676f732f766c6c6d2d6c6f676f2d746578742d6c696768742e706e67)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<font color=\"orange\">Run the following command in a terminal and leave it running, as you would with `ollama`:</font>\n",
+ "\n",
+ "```bash\n",
+ "vllm serve Qwen/Qwen2.5-1.5B-Instruct\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from openai import OpenAI\n",
+ "\n",
+ "# Point the OpenAI client's API key and base URL at the local vLLM API server:\n",
+ "openai_api_key = \"EMPTY\"\n",
+ "openai_api_base = \"http://localhost:8000/v1\"\n",
+ "\n",
+ "client = OpenAI(\n",
+ "    api_key=openai_api_key,\n",
+ "    base_url=openai_api_base,\n",
+ ")\n",
+ "completion = client.completions.create(model=\"Qwen/Qwen2.5-1.5B-Instruct\",\n",
+ "                                       prompt=\"San Francisco is a\")\n",
+ "\n",
+ "print(\"Completion result:\", completion.choices[0].text)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from openai import OpenAI\n",
+ "\n",
+ "# Point the OpenAI client's API key and base URL at the local vLLM API server:\n",
+ "openai_api_key = \"EMPTY\"\n",
+ "openai_api_base = \"http://localhost:8000/v1\"\n",
+ "\n",
+ "client = OpenAI(\n",
+ "    api_key=openai_api_key,\n",
+ "    base_url=openai_api_base,\n",
+ ")\n",
+ "\n",
+ "chat_response = client.chat.completions.create(\n",
+ "    model=\"Qwen/Qwen2.5-1.5B-Instruct\",\n",
+ "    messages=[{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
+ "              {\"role\": \"user\", \"content\": \"Tell me a joke.\"}],\n",
+ ")\n",
+ "\n",
+ "print(\"Chat response:\", chat_response.choices[0].message.content)\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
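
Note on the notebook above: both code cells assume the `vllm serve` process is already listening on `http://localhost:8000/v1` and has finished loading the model. The sketch below is not part of the commit; it shows one way a cell could wait for that before calling the client. It assumes the server exposes the OpenAI-style `GET /v1/models` listing, and the helper names `fetch_models` and `wait_for_model` are hypothetical, introduced here only for illustration.

```python
import json
import time
import urllib.request
from typing import Callable, List


def fetch_models(base_url: str, timeout: float = 2.0) -> List[str]:
    """Return the model ids listed by an OpenAI-compatible `/models` endpoint.

    Assumption: the response body has the shape {"data": [{"id": ...}, ...]}.
    """
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def wait_for_model(
    model: str,
    base_url: str = "http://localhost:8000/v1",
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], List[str]] = fetch_models,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the server until `model` is listed or the attempts run out."""
    for attempt in range(attempts):
        try:
            if model in fetch(base_url):
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (URLError is a
            # subclass); ValueError covers a body that is not valid JSON.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A cell placed before the client calls could then run `wait_for_model("Qwen/Qwen2.5-1.5B-Instruct")` and stop early if it returns `False`. The `fetch` and `sleep` parameters exist only so the polling logic can be exercised without a running server.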