htahir1 committed on
Commit
551af5c
•
1 Parent(s): 17cac4f

Upload folder using huggingface_hub

Browse files
_assets/feature_engineering_pipeline.png ADDED
_assets/inference_pipeline.png ADDED
_assets/inference_pipeline.png:Zone.Identifier ADDED
File without changes
_assets/pipeline_overview.png ADDED
_assets/training_pipeline.png ADDED
requirements.txt CHANGED
@@ -3,4 +3,7 @@ notebook
  scikit-learn<1.3
  s3fs>2022.3.0,<=2023.4.0
  boto3<=1.26.76
- aws-profile-manager
+ aws-profile-manager
+ mlflow>=2.1.1,<=2.9.2
+ mlserver>=1.3.3
+ mlserver-mlflow>=1.3.3
run_deploy.ipynb ADDED
@@ -0,0 +1,1122 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "63ab391a",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Intro to MLOps using ZenML\n",
9
+ "\n",
10
+ "## 🌍 Overview\n",
11
+ "\n",
12
+ "This repository is a minimalistic MLOps project intended as a starting point to learn how to put ML workflows in production. It features: \n",
13
+ "\n",
14
+ "- A feature engineering pipeline that loads data and prepares it for training.\n",
15
+ "- A training pipeline that loads the preprocessed dataset and trains a model.\n",
16
+ "- A batch inference pipeline that runs predictions on the trained model with new data.\n",
17
+ "\n",
18
+ "Follow along with this notebook to understand how you can use ZenML to productionize your ML workflows!\n",
19
+ "\n",
20
+ "<img src=\"_assets/pipeline_overview.png\" width=\"50%\" alt=\"Pipelines Overview\">"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "markdown",
25
+ "id": "8f466b16",
26
+ "metadata": {},
27
+ "source": [
28
+ "## Run on Colab\n",
29
+ "\n",
30
+ "You can use Google Colab to see ZenML in action, no signup / installation\n",
31
+ "required!\n",
32
+ "\n",
33
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](\n",
34
+ "https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb)"
35
+ ]
36
+ },
37
+ {
38
+ "cell_type": "markdown",
39
+ "id": "66b2977c",
40
+ "metadata": {},
41
+ "source": [
42
+ "# 👶 Step 0. Install Requirements\n",
43
+ "\n",
44
+ "Let's install ZenML to get started. First we'll install the latest version of\n",
45
+ "ZenML as well as the `sklearn` integration of ZenML:"
46
+ ]
47
+ },
48
+ {
49
+ "cell_type": "code",
50
+ "execution_count": null,
51
+ "id": "ce2f40eb",
52
+ "metadata": {},
53
+ "outputs": [],
54
+ "source": [
55
+ "!pip install \"zenml[server]\""
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": null,
61
+ "id": "5aad397e",
62
+ "metadata": {},
63
+ "outputs": [],
64
+ "source": [
65
+ "from zenml.environment import Environment\n",
66
+ "\n",
67
+ "if Environment.in_google_colab():\n",
68
+ " # Install Cloudflare Tunnel binary\n",
69
+ " !wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb && dpkg -i cloudflared-linux-amd64.deb\n",
70
+ "\n",
71
+ " # Pull required modules from this example\n",
72
+ " !git clone -b main https://github.com/zenml-io/zenml\n",
73
+ " !cp -r zenml/examples/quickstart/* .\n",
74
+ " !rm -rf zenml\n"
75
+ ]
76
+ },
77
+ {
78
+ "cell_type": "code",
79
+ "execution_count": null,
80
+ "id": "f76f562e",
81
+ "metadata": {},
82
+ "outputs": [],
83
+ "source": [
84
+ "!zenml integration install sklearn -y\n",
85
+ "\n",
86
+ "import IPython\n",
87
+ "IPython.Application.instance().kernel.do_shutdown(restart=True)"
88
+ ]
89
+ },
90
+ {
91
+ "cell_type": "markdown",
92
+ "id": "3b044374",
93
+ "metadata": {},
94
+ "source": [
95
+ "Please wait for the installation to complete before running subsequent cells. At\n",
96
+ "the end of the installation, the notebook kernel will automatically restart."
97
+ ]
98
+ },
99
+ {
100
+ "cell_type": "markdown",
101
+ "id": "e3955ff1",
102
+ "metadata": {},
103
+ "source": [
104
+ "Optional: If you are using [ZenML Cloud](https://zenml.io/cloud), execute the following cell with your tenant URL. Otherwise ignore."
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": null,
110
+ "id": "e2587315",
111
+ "metadata": {},
112
+ "outputs": [],
113
+ "source": [
114
+ "zenml_server_url = \"PLEASE_UPDATE_ME\" # in the form \"https://URL_TO_SERVER\"\n",
115
+ "\n",
116
+ "!zenml connect --url $zenml_server_url"
117
+ ]
118
+ },
119
+ {
120
+ "cell_type": "code",
121
+ "execution_count": null,
122
+ "id": "081d5616",
123
+ "metadata": {},
124
+ "outputs": [],
125
+ "source": [
126
+ "# Initialize ZenML and set the default stack\n",
127
+ "!zenml init\n",
128
+ "\n",
129
+ "!zenml stack set default"
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "code",
134
+ "execution_count": null,
135
+ "id": "79f775f2",
136
+ "metadata": {},
137
+ "outputs": [],
138
+ "source": [
139
+ "# Do the imports at the top\n",
140
+ "from typing_extensions import Annotated\n",
141
+ "from sklearn.datasets import load_breast_cancer\n",
142
+ "\n",
143
+ "import random\n",
144
+ "import pandas as pd\n",
145
+ "from zenml import step, ExternalArtifact, pipeline, ModelVersion, get_step_context\n",
146
+ "from zenml.client import Client\n",
147
+ "from zenml.logger import get_logger\n",
148
+ "from uuid import UUID\n",
149
+ "\n",
150
+ "from typing import Optional, List\n",
151
+ "\n",
152
+ "from zenml import pipeline\n",
153
+ "\n",
154
+ "from steps import (\n",
155
+ " data_loader,\n",
156
+ " data_preprocessor,\n",
157
+ " data_splitter,\n",
158
+ " model_evaluator,\n",
159
+ " inference_preprocessor\n",
160
+ ")\n",
161
+ "\n",
162
+ "from zenml.logger import get_logger\n",
163
+ "\n",
164
+ "logger = get_logger(__name__)\n",
165
+ "\n",
166
+ "# Initialize the ZenML client to fetch objects from the ZenML Server\n",
167
+ "client = Client()"
168
+ ]
169
+ },
170
+ {
171
+ "cell_type": "markdown",
172
+ "id": "35e48460",
173
+ "metadata": {},
174
+ "source": [
175
+ "## 🥇 Step 1: Load your data and execute feature engineering\n",
176
+ "\n",
177
+ "We'll start off by importing our data. In this quickstart we'll be working with\n",
178
+ "[the Breast Cancer](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic) dataset\n",
179
+ "which is publicly available on the UCI Machine Learning Repository. The task is a classification\n",
180
+ "problem, to predict whether a patient is diagnosed with breast cancer or not.\n",
181
+ "\n",
182
+ "When you're getting started with a machine learning problem you'll want to do\n",
183
+ "something similar to this: import your data and get it in the right shape for\n",
184
+ "your training. ZenML mostly gets out of your way when you're writing your Python\n",
185
+ "code, as you'll see from the following cell.\n",
186
+ "\n",
187
+ "<img src=\"_assets/feature_engineering_pipeline.png\" width=\"50%\" alt=\"Feature engineering pipeline\" />"
188
+ ]
189
+ },
190
+ {
191
+ "cell_type": "code",
192
+ "execution_count": null,
193
+ "id": "3cd974d1",
194
+ "metadata": {},
195
+ "outputs": [],
196
+ "source": [
197
+ "@step\n",
198
+ "def data_loader_simplified(\n",
199
+ " random_state: int, is_inference: bool = False, target: str = \"target\"\n",
200
+ ") -> Annotated[pd.DataFrame, \"dataset\"]: # We name the dataset \n",
201
+ " \"\"\"Dataset reader step.\"\"\"\n",
202
+ " dataset = load_breast_cancer(as_frame=True)\n",
203
+ " inference_size = int(len(dataset.target) * 0.05)\n",
204
+ " dataset: pd.DataFrame = dataset.frame\n",
205
+ " inference_subset = dataset.sample(inference_size, random_state=random_state)\n",
206
+ " if is_inference:\n",
207
+ " dataset = inference_subset\n",
208
+ " dataset.drop(columns=target, inplace=True)\n",
209
+ " else:\n",
210
+ " dataset.drop(inference_subset.index, inplace=True)\n",
211
+ " dataset.reset_index(drop=True, inplace=True)\n",
212
+ " logger.info(f\"Dataset with {len(dataset)} records loaded!\")\n",
213
+ " return dataset\n"
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "markdown",
218
+ "id": "1e8ba4c6",
219
+ "metadata": {},
220
+ "source": [
221
+ "The whole function is decorated with the `@step` decorator, which\n",
222
+ "tells ZenML to track this function as a step in the pipeline. This means that\n",
223
+ "ZenML will automatically version, track, and cache the data that is produced by\n",
224
+ "this function as an `artifact`. This is a very powerful feature, as it means that you can\n",
225
+ "reproduce your data at any point in the future, even if the original data source\n",
226
+ "changes or disappears. \n",
227
+ "\n",
228
+ "Note the use of the `typing_extensions` module's `Annotated` type hint in the output of the\n",
229
+ "step. We're using this to give a name to the output of the step, which will make\n",
230
+ "it possible to access it via a keyword later on.\n",
231
+ "\n",
232
+ "You'll also notice that we have included type hints for the outputs\n",
233
+ "to the function. These are not only useful for anyone reading your code, but\n",
234
+ "help ZenML process your data in a way appropriate to the specific data types."
235
+ ]
236
+ },
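As a rough illustration of the naming mechanism described in this cell, the sketch below reads the name attached to a return annotation using only the standard library. It is not ZenML's implementation: `load_numbers` and `output_name` are hypothetical names chosen for this example, and it uses `typing.Annotated` (Python 3.9+) instead of the `typing_extensions` import in the notebook.

```python
from typing import Annotated, List, get_type_hints


def load_numbers() -> Annotated[List[int], "dataset"]:
    """Toy loader whose return value is labelled "dataset"."""
    return [1, 2, 3]


def output_name(func) -> str:
    """Return the first string attached to the return annotation of `func`."""
    hints = get_type_hints(func, include_extras=True)
    return_hint = hints["return"]
    # Annotated[...] stores its extra arguments in `__metadata__`.
    for extra in getattr(return_hint, "__metadata__", ()):
        if isinstance(extra, str):
            return extra
    raise ValueError("return annotation carries no name")


print(output_name(load_numbers))
```

A framework can use a label like this as a lookup key for the step output, which is the role `"dataset"` plays in `data_loader_simplified`.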
237
+ {
238
+ "cell_type": "markdown",
239
+ "id": "b6286b67",
240
+ "metadata": {},
241
+ "source": [
242
+ "ZenML is built in a way that allows you to experiment with your data and build\n",
243
+ "your pipelines as you work, so if you want to call this function to see how it\n",
244
+ "works, you can just call it directly. Here we take a look at the first few rows\n",
245
+ "of your training dataset."
246
+ ]
247
+ },
248
+ {
249
+ "cell_type": "code",
250
+ "execution_count": null,
251
+ "id": "d838e2ea",
252
+ "metadata": {},
253
+ "outputs": [],
254
+ "source": [
255
+ "df = data_loader_simplified(random_state=42)\n",
256
+ "df.head()"
257
+ ]
258
+ },
259
+ {
260
+ "cell_type": "markdown",
261
+ "id": "28c05291",
262
+ "metadata": {},
263
+ "source": [
264
+ "Everything looks as we'd expect and the values are all in the right format 🥳.\n",
265
+ "\n",
266
+ "We're now at the point where we can bring this step (and some others) together into a single\n",
267
+ "pipeline, the top-level organising entity for code in ZenML. Creating such a pipeline is\n",
268
+ "as simple as adding a `@pipeline` decorator to a function. This specific\n",
269
+ "pipeline doesn't return a value, but that option is available to you if you need it."
270
+ ]
271
+ },
272
+ {
273
+ "cell_type": "code",
274
+ "execution_count": null,
275
+ "id": "b50a9537",
276
+ "metadata": {},
277
+ "outputs": [],
278
+ "source": [
279
+ "@pipeline\n",
280
+ "def feature_engineering(\n",
281
+ " test_size: float = 0.3,\n",
282
+ " drop_na: Optional[bool] = None,\n",
283
+ " normalize: Optional[bool] = None,\n",
284
+ " drop_columns: Optional[List[str]] = None,\n",
285
+ " target: Optional[str] = \"target\",\n",
286
+ " random_state: int = 17\n",
287
+ "):\n",
288
+ " \"\"\"Feature engineering pipeline.\"\"\"\n",
289
+ " # Link all the steps together by calling them and passing the output\n",
290
+ " # of one step as the input of the next step.\n",
291
+ " raw_data = data_loader(random_state=random_state, target=target)\n",
292
+ " dataset_trn, dataset_tst = data_splitter(\n",
293
+ " dataset=raw_data,\n",
294
+ " test_size=test_size,\n",
295
+ " )\n",
296
+ " dataset_trn, dataset_tst, _ = data_preprocessor(\n",
297
+ " dataset_trn=dataset_trn,\n",
298
+ " dataset_tst=dataset_tst,\n",
299
+ " drop_na=drop_na,\n",
300
+ " normalize=normalize,\n",
301
+ " drop_columns=drop_columns,\n",
302
+ " target=target,\n",
303
+ " random_state=random_state,\n",
304
+ " )"
305
+ ]
306
+ },
307
+ {
308
+ "cell_type": "markdown",
309
+ "id": "7cd73c23",
310
+ "metadata": {},
311
+ "source": [
312
+ "We're ready to run the pipeline now, which we can do just as with the step - by calling the\n",
313
+ "pipeline function itself:"
314
+ ]
315
+ },
316
+ {
317
+ "cell_type": "code",
318
+ "execution_count": null,
319
+ "id": "1e0aa9af",
320
+ "metadata": {},
321
+ "outputs": [],
322
+ "source": [
323
+ "feature_engineering()"
324
+ ]
325
+ },
326
+ {
327
+ "cell_type": "markdown",
328
+ "id": "1785c303",
329
+ "metadata": {},
330
+ "source": [
331
+ "Let's run this again with a slightly different test size, to create more datasets:"
332
+ ]
333
+ },
334
+ {
335
+ "cell_type": "code",
336
+ "execution_count": null,
337
+ "id": "658c0570-2607-4b97-a72d-d45c92633e48",
338
+ "metadata": {},
339
+ "outputs": [],
340
+ "source": [
341
+ "feature_engineering(test_size=0.25)"
342
+ ]
343
+ },
344
+ {
345
+ "cell_type": "markdown",
346
+ "id": "64bb7206",
347
+ "metadata": {},
348
+ "source": [
349
+ "Notice the second time around, the data loader step was **cached**, while the rest of the pipeline was rerun. \n",
350
+ "This is because ZenML automatically determined that nothing had changed in the data loader step, \n",
351
+ "so it didn't need to rerun it."
352
+ ]
353
+ },
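To make the caching behaviour above more concrete, here is a minimal sketch of input-based caching written with the standard library only. It is an assumption-laden toy and not how ZenML computes cache keys internally (a real cache key may also account for things such as the step's code and its input artifacts). The names `cache_key`, `run_step` and `toy_loader` are hypothetical.

```python
import hashlib
import json

_cache = {}


def cache_key(step_name: str, params: dict) -> str:
    """Hash the step name together with its parameters."""
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name: str, params: dict, func):
    """Run `func(**params)` unless the same step and inputs were seen before."""
    key = cache_key(step_name, params)
    if key in _cache:
        return _cache[key], True  # served from the cache
    result = func(**params)
    _cache[key] = result
    return result, False


def toy_loader(random_state: int):
    # Stand-in for a data loading step; the body is irrelevant here.
    return list(range(random_state % 5 + 1))


_, cached_first = run_step("data_loader", {"random_state": 17}, toy_loader)
_, cached_second = run_step("data_loader", {"random_state": 17}, toy_loader)
_, cached_third = run_step("data_loader", {"random_state": 104}, toy_loader)
print(cached_first, cached_second, cached_third)
```

In this toy model, changing `test_size` would not affect the loader because it is not one of the loader's inputs, whereas changing `random_state` produces a new key and forces a rerun.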
354
+ {
355
+ "cell_type": "markdown",
356
+ "id": "5bc6849d-31ac-4c08-9ca2-cf7f5f35ccbf",
357
+ "metadata": {},
358
+ "source": [
359
+ "Let's run this again with the same test size but a different random state, which invalidates the cached data loader step and creates more datasets:"
360
+ ]
361
+ },
362
+ {
363
+ "cell_type": "code",
364
+ "execution_count": null,
365
+ "id": "1e1d8546",
366
+ "metadata": {},
367
+ "outputs": [],
368
+ "source": [
369
+ "feature_engineering(test_size=0.25, random_state=104)"
370
+ ]
371
+ },
372
+ {
373
+ "cell_type": "markdown",
374
+ "id": "6c42078a",
375
+ "metadata": {},
376
+ "source": [
377
+ "At this point you might be interested to view your pipeline runs in the ZenML\n",
378
+ "Dashboard. In case you are not using a hosted instance of ZenML, you can spin this up by executing the next cell. This will start a\n",
379
+ "server which you can access by clicking on the link that appears in the output\n",
380
+ "of the cell.\n",
381
+ "\n",
382
+ "Log into the Dashboard using default credentials (username 'default' and\n",
383
+ "password left blank). From there you can inspect the pipeline or the specific\n",
384
+ "pipeline run.\n"
385
+ ]
386
+ },
387
+ {
388
+ "cell_type": "code",
389
+ "execution_count": null,
390
+ "id": "8cd3cc8c",
391
+ "metadata": {},
392
+ "outputs": [],
393
+ "source": [
394
+ "from zenml.environment import Environment\n",
395
+ "from zenml.zen_stores.rest_zen_store import RestZenStore\n",
396
+ "\n",
397
+ "\n",
398
+ "if not isinstance(client.zen_store, RestZenStore):\n",
399
+ " # Only spin up a local Dashboard in case you aren't already connected to a remote server\n",
400
+ " if Environment.in_google_colab():\n",
401
+ " # run ZenML through a cloudflare tunnel to get a public endpoint\n",
402
+ " !zenml up --port 8237 & cloudflared tunnel --url http://localhost:8237\n",
403
+ " else:\n",
404
+ " !zenml up"
405
+ ]
406
+ },
407
+ {
408
+ "cell_type": "markdown",
409
+ "id": "e8471f93",
410
+ "metadata": {},
411
+ "source": [
412
+ "We can also fetch the pipeline from the server and view the results directly in the notebook:"
413
+ ]
414
+ },
415
+ {
416
+ "cell_type": "code",
417
+ "execution_count": null,
418
+ "id": "f208b200",
419
+ "metadata": {},
420
+ "outputs": [],
421
+ "source": [
422
+ "client = Client()\n",
423
+ "run = client.get_pipeline(\"feature_engineering\").last_run\n",
424
+ "print(run.name)"
425
+ ]
426
+ },
427
+ {
428
+ "cell_type": "markdown",
429
+ "id": "a037f09d",
430
+ "metadata": {},
431
+ "source": [
432
+ "We can also see the data artifacts that were produced by the last step of the pipeline:"
433
+ ]
434
+ },
435
+ {
436
+ "cell_type": "code",
437
+ "execution_count": null,
438
+ "id": "34283e89",
439
+ "metadata": {},
440
+ "outputs": [],
441
+ "source": [
442
+ "run.steps[\"data_preprocessor\"].outputs"
443
+ ]
444
+ },
445
+ {
446
+ "cell_type": "code",
447
+ "execution_count": null,
448
+ "id": "bceb0312",
449
+ "metadata": {},
450
+ "outputs": [],
451
+ "source": [
452
+ "# Read one of the datasets. This is the one with a 0.25 test split\n",
453
+ "run.steps[\"data_preprocessor\"].outputs[\"dataset_trn\"].load()"
454
+ ]
455
+ },
456
+ {
457
+ "cell_type": "markdown",
458
+ "id": "26d26436",
459
+ "metadata": {},
460
+ "source": [
461
+ "We can also get the artifacts directly. Each time you create a new pipeline run, a new `artifact version` is created.\n",
462
+ "\n",
463
+ "You can fetch these artifacts and their versions using the `client`: "
464
+ ]
465
+ },
466
+ {
467
+ "cell_type": "code",
468
+ "execution_count": null,
469
+ "id": "c8f90647",
470
+ "metadata": {},
471
+ "outputs": [],
472
+ "source": [
473
+ "# Get artifact version from our run\n",
474
+ "dataset_trn_artifact_version_via_run = run.steps[\"data_preprocessor\"].outputs[\"dataset_trn\"] \n",
475
+ "\n",
476
+ "# Get latest version from client directly\n",
477
+ "dataset_trn_artifact_version = client.get_artifact_version(\"dataset_trn\")\n",
478
+ "\n",
479
+ "# This should be true if our run is the latest run and no artifact has been produced\n",
480
+ "# in the intervening time\n",
481
+ "dataset_trn_artifact_version_via_run.id == dataset_trn_artifact_version.id"
482
+ ]
483
+ },
484
+ {
485
+ "cell_type": "code",
486
+ "execution_count": null,
487
+ "id": "3f9d3dfd",
488
+ "metadata": {},
489
+ "outputs": [],
490
+ "source": [
491
+ "# Fetch the rest of the artifacts\n",
492
+ "dataset_tst_artifact_version = client.get_artifact_version(\"dataset_tst\")\n",
493
+ "preprocessing_pipeline_artifact_version = client.get_artifact_version(\"preprocess_pipeline\")"
494
+ ]
495
+ },
496
+ {
497
+ "cell_type": "markdown",
498
+ "id": "7a7d1b04",
499
+ "metadata": {},
500
+ "source": [
501
+ "If you started with a fresh install, then you would have two versions corresponding\n",
502
+ "to the two pipelines that we ran above. We can even load an artifact version in memory: "
503
+ ]
504
+ },
505
+ {
506
+ "cell_type": "code",
507
+ "execution_count": null,
508
+ "id": "c82aca75",
509
+ "metadata": {},
510
+ "outputs": [],
511
+ "source": [
512
+ "# Load an artifact to verify you can fetch it\n",
513
+ "dataset_trn_artifact_version.load()"
514
+ ]
515
+ },
516
+ {
517
+ "cell_type": "markdown",
518
+ "id": "5963509e",
519
+ "metadata": {},
520
+ "source": [
521
+ "We'll use these artifacts from above in our next pipeline."
522
+ ]
523
+ },
524
+ {
525
+ "cell_type": "markdown",
526
+ "id": "8c28b474",
527
+ "metadata": {},
528
+ "source": [
529
+ "# ⌚ Step 2: Training pipeline"
530
+ ]
531
+ },
532
+ {
533
+ "cell_type": "markdown",
534
+ "id": "87909827",
535
+ "metadata": {},
536
+ "source": [
537
+ "Now that we have our data it makes sense to train some models to get a sense of\n",
538
+ "how difficult the task is. The Breast Cancer dataset is sufficiently large and complex \n",
539
+ "that it's unlikely we'll be able to train a model that behaves perfectly since the problem \n",
540
+ "is inherently complex, but we can get a sense of what a reasonable baseline looks like.\n",
541
+ "\n",
542
+ "We'll start with two simple models, an SGD Classifier and a Random Forest\n",
543
+ "Classifier, both batteries-included from `sklearn`. We'll train them both on the\n",
544
+ "same data and then compare their performance.\n",
545
+ "\n",
546
+ "<img src=\"_assets/training_pipeline.png\" width=\"50%\" alt=\"Training pipeline\">"
547
+ ]
548
+ },
549
+ {
550
+ "cell_type": "code",
551
+ "execution_count": null,
552
+ "id": "fccf1bd9",
553
+ "metadata": {},
554
+ "outputs": [],
555
+ "source": [
556
+ "import pandas as pd\n",
557
+ "from sklearn.base import ClassifierMixin\n",
558
+ "from sklearn.ensemble import RandomForestClassifier\n",
559
+ "from sklearn.linear_model import SGDClassifier\n",
560
+ "from typing_extensions import Annotated\n",
561
+ "from zenml import ArtifactConfig, step\n",
562
+ "from zenml.logger import get_logger\n",
563
+ "\n",
564
+ "logger = get_logger(__name__)\n",
565
+ "\n",
566
+ "\n",
567
+ "@step\n",
568
+ "def model_trainer(\n",
569
+ " dataset_trn: pd.DataFrame,\n",
570
+ " model_type: str = \"sgd\",\n",
571
+ ") -> Annotated[ClassifierMixin, ArtifactConfig(name=\"sklearn_classifier\", is_model_artifact=True)]:\n",
572
+ " \"\"\"Configure and train a model on the training dataset.\"\"\"\n",
573
+ " target = \"target\"\n",
574
+ " if model_type == \"sgd\":\n",
575
+ " model = SGDClassifier()\n",
576
+ " elif model_type == \"rf\":\n",
577
+ " model = RandomForestClassifier()\n",
578
+ " else:\n",
579
+ " raise ValueError(f\"Unknown model type {model_type}\") \n",
580
+ "\n",
581
+ " logger.info(f\"Training model {model}...\")\n",
582
+ "\n",
583
+ " model.fit(\n",
584
+ " dataset_trn.drop(columns=[target]),\n",
585
+ " dataset_trn[target],\n",
586
+ " )\n",
587
+ " return model\n"
588
+ ]
589
+ },
590
+ {
591
+ "cell_type": "markdown",
592
+ "id": "73a00008",
593
+ "metadata": {},
594
+ "source": [
595
+ "Our training step can return different kinds of `sklearn` classifier\n",
596
+ "models, so we use the generic `ClassifierMixin` type hint for the return type."
597
+ ]
598
+ },
599
+ {
600
+ "cell_type": "markdown",
601
+ "id": "a5f22174",
602
+ "metadata": {},
603
+ "source": [
604
+ "ZenML allows you to load any version of any dataset that is tracked by the framework\n",
605
+ "directly into a pipeline using the `ExternalArtifact` interface. This is very convenient\n",
606
+ "in this case, as we'd like to send our preprocessed dataset from the older pipeline directly\n",
607
+ "into the training pipeline."
608
+ ]
609
+ },
610
+ {
611
+ "cell_type": "code",
612
+ "execution_count": null,
613
+ "id": "1aa98f2f",
614
+ "metadata": {},
615
+ "outputs": [],
616
+ "source": [
617
+ "@pipeline\n",
618
+ "def training(\n",
619
+ " train_dataset_id: Optional[UUID] = None,\n",
620
+ " test_dataset_id: Optional[UUID] = None,\n",
621
+ " model_type: str = \"sgd\",\n",
622
+ " min_train_accuracy: float = 0.0,\n",
623
+ " min_test_accuracy: float = 0.0,\n",
624
+ "):\n",
625
+ " \"\"\"Model training pipeline.\"\"\" \n",
626
+ " if train_dataset_id is None or test_dataset_id is None:\n",
627
+ " # If we don't pass the IDs, this will run the feature engineering pipeline \n",
628
+ " dataset_trn, dataset_tst = feature_engineering()\n",
629
+ " else:\n",
630
+ " # Load the datasets from an older pipeline\n",
631
+ " dataset_trn = ExternalArtifact(id=train_dataset_id)\n",
632
+ " dataset_tst = ExternalArtifact(id=test_dataset_id) \n",
633
+ "\n",
634
+ " trained_model = model_trainer(\n",
635
+ " dataset_trn=dataset_trn,\n",
636
+ " model_type=model_type,\n",
637
+ " )\n",
638
+ "\n",
639
+ " model_evaluator(\n",
640
+ " model=trained_model,\n",
641
+ " dataset_trn=dataset_trn,\n",
642
+ " dataset_tst=dataset_tst,\n",
643
+ " min_train_accuracy=min_train_accuracy,\n",
644
+ " min_test_accuracy=min_test_accuracy,\n",
645
+ " )"
646
+ ]
647
+ },
648
+ {
649
+ "cell_type": "markdown",
650
+ "id": "88b70fd3",
651
+ "metadata": {},
652
+ "source": [
653
+ "The end goal of this quick baseline evaluation is to understand which of the two\n",
654
+ "models performs better. We'll use the `model_evaluator` step to compare the two\n",
655
+ "models. This step takes in the model from the trainer step, and computes its score\n",
656
+ "over the testing set."
657
+ ]
658
+ },
659
+ {
660
+ "cell_type": "code",
661
+ "execution_count": null,
662
+ "id": "c64885ac",
663
+ "metadata": {},
664
+ "outputs": [],
665
+ "source": [
666
+ "# Use a random forest model with the chosen datasets.\n",
667
+ "# We need to pass the IDs of the datasets into the function\n",
668
+ "training(\n",
669
+ " model_type=\"rf\",\n",
670
+ " train_dataset_id=dataset_trn_artifact_version.id,\n",
671
+ " test_dataset_id=dataset_tst_artifact_version.id\n",
672
+ ")\n",
673
+ "\n",
674
+ "rf_run = client.get_pipeline(\"training\").last_run"
675
+ ]
676
+ },
677
+ {
678
+ "cell_type": "code",
679
+ "execution_count": null,
680
+ "id": "4300c82f",
681
+ "metadata": {},
682
+ "outputs": [],
683
+ "source": [
684
+ "# Use an SGD classifier\n",
685
+ "training(\n",
686
+ " model_type=\"sgd\",\n",
687
+ " train_dataset_id=dataset_trn_artifact_version.id,\n",
688
+ " test_dataset_id=dataset_tst_artifact_version.id\n",
689
+ ")\n",
690
+ "\n",
691
+ "sgd_run = client.get_pipeline(\"training\").last_run"
692
+ ]
693
+ },
694
+ {
695
+ "cell_type": "markdown",
696
+ "id": "43f1a68a",
697
+ "metadata": {},
698
+ "source": [
699
+ "You can see from the logs already how our model training went: the\n",
700
+ "`RandomForestClassifier` performed considerably better than the `SGDClassifier`.\n",
701
+ "We can use the ZenML `Client` to verify this:"
702
+ ]
703
+ },
704
+ {
705
+ "cell_type": "code",
706
+ "execution_count": null,
707
+ "id": "d95810b1",
708
+ "metadata": {},
709
+ "outputs": [],
710
+ "source": [
711
+ "# The evaluator returns a float value with the accuracy\n",
712
+ "rf_run.steps[\"model_evaluator\"].output.load() > sgd_run.steps[\"model_evaluator\"].output.load()"
713
+ ]
714
+ },
715
+ {
716
+ "cell_type": "markdown",
717
+ "id": "e256d145",
718
+ "metadata": {},
719
+ "source": [
720
+ "# 💯 Step 3: Associating a model with your pipeline"
721
+ ]
722
+ },
723
+ {
724
+ "cell_type": "markdown",
725
+ "id": "927978f3",
726
+ "metadata": {},
727
+ "source": [
728
+ "You can see it is relatively easy to train ML models using ZenML pipelines. But it can be somewhat clunky to track\n",
729
+ "all the models produced as you develop your experiments and use-cases. Luckily, ZenML offers a *Model Control Plane*,\n",
730
+ "which is a central register of all your ML models.\n",
731
+ "\n",
732
+ "You can easily create a ZenML `Model` and associate it with your pipelines using the `ModelVersion` object:"
733
+ ]
734
+ },
735
+ {
736
+ "cell_type": "code",
737
+ "execution_count": null,
738
+ "id": "99ca00c0",
739
+ "metadata": {},
740
+ "outputs": [],
741
+ "source": [
742
+ "pipeline_settings = {}\n",
743
+ "\n",
744
+ "# Let's add some metadata to the model to make it identifiable\n",
745
+ "pipeline_settings[\"model_version\"] = ModelVersion(\n",
746
+ " name=\"breast_cancer_classifier\",\n",
747
+ " license=\"Apache 2.0\",\n",
748
+ " description=\"A breast cancer classifier\",\n",
749
+ " tags=[\"breast_cancer\", \"classifier\"],\n",
750
+ ")"
751
+ ]
752
+ },
753
+ {
754
+ "cell_type": "code",
755
+ "execution_count": null,
756
+ "id": "0e78a520",
757
+ "metadata": {},
758
+ "outputs": [],
759
+ "source": [
760
+ "# Let's train the SGD model and set the version name to \"sgd\"\n",
761
+ "pipeline_settings[\"model_version\"].version = \"sgd\"\n",
762
+ "\n",
763
+ "# the `with_options` method allows us to pass in pipeline settings\n",
764
+ "# and returns a configured pipeline\n",
765
+ "training_configured = training.with_options(**pipeline_settings)\n",
766
+ "\n",
767
+ "# We can now run this as usual\n",
768
+ "training_configured(\n",
769
+ " model_type=\"sgd\",\n",
770
+ " train_dataset_id=dataset_trn_artifact_version.id,\n",
771
+ " test_dataset_id=dataset_tst_artifact_version.id\n",
772
+ ")"
773
+ ]
774
+ },
775
+ {
776
+ "cell_type": "code",
777
+ "execution_count": null,
778
+ "id": "9b8e0002",
779
+ "metadata": {},
780
+ "outputs": [],
781
+ "source": [
782
+ "# Let's train the RF model and set the version name to \"rf\"\n",
783
+ "pipeline_settings[\"model_version\"].version = \"rf\"\n",
784
+ "\n",
785
+ "# the `with_options` method allows us to pass in pipeline settings\n",
786
+ "# and returns a configured pipeline\n",
787
+ "training_configured = training.with_options(**pipeline_settings)\n",
788
+ "\n",
789
+ "# Let's run it again to make sure we have two versions\n",
790
+ "training_configured(\n",
791
+ " model_type=\"rf\",\n",
792
+ " train_dataset_id=dataset_trn_artifact_version.id,\n",
793
+ " test_dataset_id=dataset_tst_artifact_version.id\n",
794
+ ")"
795
+ ]
796
+ },
797
+ {
798
+ "cell_type": "markdown",
799
+ "id": "09597223",
800
+ "metadata": {},
801
+ "source": [
802
+ "This time, running both pipelines has created two associated **model versions**.\n",
803
+ "You can list your ZenML model and its versions as follows:"
804
+ ]
805
+ },
806
+ {
807
+ "cell_type": "code",
808
+ "execution_count": null,
809
+ "id": "fbb25913",
810
+ "metadata": {},
811
+ "outputs": [],
812
+ "source": [
813
+ "zenml_model = client.get_model(\"breast_cancer_classifier\")\n",
814
+ "print(zenml_model)\n",
815
+ "\n",
816
+ "print(f\"Model {zenml_model.name} has {len(zenml_model.versions)} versions\")\n",
817
+ "\n",
818
+ "zenml_model.versions[0].version, zenml_model.versions[1].version"
819
+ ]
820
+ },
821
+ {
822
+ "cell_type": "markdown",
823
+ "id": "e82cfac2",
824
+ "metadata": {},
825
+ "source": [
826
+ "The interesting part is that ZenML went ahead and linked all artifacts produced by the\n",
827
+ "pipelines to that model version, including the two pickle files that represent our\n",
828
+ "SGD and RandomForest classifiers. We can see all artifacts directly from the model\n",
829
+ "version object:"
830
+ ]
831
+ },
832
+ {
833
+ "cell_type": "code",
834
+ "execution_count": null,
835
+ "id": "31211413",
836
+ "metadata": {},
837
+ "outputs": [],
838
+ "source": [
839
+ "# Let's load the RF version\n",
840
+ "rf_zenml_model_version = client.get_model_version(\"breast_cancer_classifier\", \"rf\")\n",
841
+ "\n",
842
+ "# We can now load our classifier directly as well\n",
843
+ "random_forest_classifier = rf_zenml_model_version.get_artifact(\"sklearn_classifier\").load()\n",
844
+ "\n",
845
+ "random_forest_classifier"
846
+ ]
847
+ },
848
+ {
849
+ "cell_type": "markdown",
850
+ "id": "53517a9a",
851
+ "metadata": {},
852
+ "source": [
853
+ "If you are a [ZenML Cloud](https://zenml.io/cloud) user, you can see all of this visualized in the dashboard:\n",
854
+ "\n",
855
+ "<img src=\".assets/cloud_mcp_screenshot.png\" width=\"70%\" alt=\"Model Control Plane\">"
856
+ ]
857
+ },
858
+ {
859
+ "cell_type": "markdown",
860
+ "id": "eb645dde",
861
+ "metadata": {},
862
+ "source": [
863
+ "There is a lot more you can do with ZenML models, including tracking\n",
863
+ "metrics by attaching metadata to them, or persisting them in a model\n",
864
+ "registry. These topics are covered in more depth in the\n",
865
+ "[ZenML docs](https://docs.zenml.io).\n",
867
+ "\n",
868
+ "For now, we will use the ZenML model control plane to promote our best\n",
869
+ "model to `production`. You can do this by simply setting the `stage` of\n",
870
+ "your chosen model version to `production`."
871
+ ]
872
+ },
873
+ {
874
+ "cell_type": "code",
875
+ "execution_count": null,
876
+ "id": "26b718f8",
877
+ "metadata": {},
878
+ "outputs": [],
879
+ "source": [
880
+ "# Set our best classifier to production\n",
881
+ "rf_zenml_model_version.set_stage(\"production\", force=True)"
882
+ ]
883
+ },
884
+ {
885
+ "cell_type": "markdown",
886
+ "id": "9fddf3d0",
887
+ "metadata": {},
888
+ "source": [
889
+ "Of course, one would normally promote a model only after comparing it to all other model\n",
889
+ "versions and running further tests. That is a more advanced use case, though. See the\n",
890
+ "[e2e_batch example](https://github.com/zenml-io/zenml/tree/main/examples/e2e) to get\n",
891
+ "more insight into that sort of flow!"
893
+ ]
894
+ },
895
+ {
896
+ "cell_type": "markdown",
897
+ "id": "2ecbc8cf",
898
+ "metadata": {},
899
+ "source": [
900
+ "<img src=\".assets/cloud_mcp.png\" width=\"60%\" alt=\"Model Control Plane\">"
901
+ ]
902
+ },
903
+ {
904
+ "cell_type": "markdown",
905
+ "id": "8f1146db",
906
+ "metadata": {},
907
+ "source": [
908
+ "Once the model is promoted, we can consume the right model version in our\n",
909
+ "batch inference pipeline directly. Let's see how that works."
910
+ ]
911
+ },
912
+ {
913
+ "cell_type": "markdown",
914
+ "id": "d6306f14",
915
+ "metadata": {},
916
+ "source": [
917
+ "# 🫅 Step 4: Consuming the model in production"
918
+ ]
919
+ },
920
+ {
921
+ "cell_type": "markdown",
922
+ "id": "b51f3108",
923
+ "metadata": {},
924
+ "source": [
925
+ "The batch inference pipeline simply takes the model marked as `production` and uses it to run inference\n",
925
+ "on live data. The critical step here is the `inference_predict` step, where we load the model into memory\n",
926
+ "and generate predictions:\n",
928
+ "\n",
929
+ "<img src=\"_assets/inference_pipeline.png\" width=\"45%\" alt=\"Inference pipeline\">"
930
+ ]
931
+ },
932
+ {
933
+ "cell_type": "code",
934
+ "execution_count": null,
935
+ "id": "92c4c7dc",
936
+ "metadata": {},
937
+ "outputs": [],
938
+ "source": [
939
+ "@step\n",
940
+ "def inference_predict(dataset_inf: pd.DataFrame) -> Annotated[pd.Series, \"predictions\"]:\n",
941
+ "    \"\"\"Predictions step\"\"\"\n",
941
+ "    # Get the model_version\n",
942
+ "    model_version = get_step_context().model_version\n",
943
+ "\n",
944
+ "    # run prediction from memory\n",
945
+ "    predictor = model_version.load_artifact(\"sklearn_classifier\")\n",
946
+ "    predictions = predictor.predict(dataset_inf)\n",
947
+ "\n",
948
+ "    predictions = pd.Series(predictions, name=\"predicted\")\n",
949
+ "\n",
950
+ "    return predictions\n"
952
+ ]
953
+ },
954
+ {
955
+ "cell_type": "markdown",
956
+ "id": "3aeb227b",
957
+ "metadata": {},
958
+ "source": [
959
+ "Apart from loading the model, we must also load the preprocessing pipeline that we ran in feature engineering,\n",
959
+ "so that we apply exactly the same steps at inference time as we did at training time. Let's bring it all together:"
961
+ ]
962
+ },
963
+ {
964
+ "cell_type": "code",
965
+ "execution_count": null,
966
+ "id": "37c409bd",
967
+ "metadata": {},
968
+ "outputs": [],
969
+ "source": [
970
+ "@pipeline\n",
971
+ "def inference(preprocess_pipeline_id: UUID):\n",
972
+ "    \"\"\"Model batch inference pipeline\"\"\"\n",
972
+ "    # random_state = client.get_artifact_version(id=preprocess_pipeline_id).metadata[\"random_state\"].value\n",
973
+ "    # target = client.get_artifact_version(id=preprocess_pipeline_id).run_metadata['target'].value\n",
974
+ "    random_state = 42\n",
975
+ "    target = \"target\"\n",
976
+ "\n",
977
+ "    df_inference = data_loader(\n",
978
+ "        random_state=random_state, is_inference=True\n",
979
+ "    )\n",
980
+ "    df_inference = inference_preprocessor(\n",
981
+ "        dataset_inf=df_inference,\n",
982
+ "        # We use the preprocess pipeline from the feature engineering pipeline\n",
983
+ "        preprocess_pipeline=ExternalArtifact(id=preprocess_pipeline_id),\n",
984
+ "        target=target,\n",
985
+ "    )\n",
986
+ "    inference_predict(\n",
987
+ "        dataset_inf=df_inference,\n",
988
+ "    )\n"
990
+ ]
991
+ },
992
+ {
993
+ "cell_type": "markdown",
994
+ "id": "c7afe7be",
995
+ "metadata": {},
996
+ "source": [
997
+ "To load the right model, we pass the `production` stage to the `ModelVersion` config this time.\n",
997
+ "This ensures that we always load the production model, decoupled from all other pipelines:"
999
+ ]
1000
+ },
1001
+ {
1002
+ "cell_type": "code",
1003
+ "execution_count": null,
1004
+ "id": "61bf5939",
1005
+ "metadata": {},
1006
+ "outputs": [],
1007
+ "source": [
1008
+ "pipeline_settings = {\"enable_cache\": False}\n",
1009
+ "\n",
1010
+ "# Let's add some metadata to the model to make it identifiable\n",
1010
+ "pipeline_settings[\"model_version\"] = ModelVersion(\n",
1011
+ "    name=\"breast_cancer_classifier\",\n",
1012
+ "    version=\"production\",  # We can pass in the stage name here!\n",
1013
+ "    license=\"Apache 2.0\",\n",
1014
+ "    description=\"A breast cancer classifier\",\n",
1015
+ "    tags=[\"breast_cancer\", \"classifier\"],\n",
1017
+ ")"
1018
+ ]
1019
+ },
1020
+ {
1021
+ "cell_type": "code",
1022
+ "execution_count": null,
1023
+ "id": "ff3402f1",
1024
+ "metadata": {},
1025
+ "outputs": [],
1026
+ "source": [
1027
+ "# the `with_options` method allows us to pass in pipeline settings\n",
1028
+ "# and returns a configured pipeline\n",
1029
+ "inference_configured = inference.with_options(**pipeline_settings)\n",
1030
+ "\n",
1031
+ "# Run the batch inference pipeline.\n",
1031
+ "# We need to pass in the ID of the preprocessing done in the feature engineering pipeline\n",
1032
+ "# in order to avoid training-serving skew\n",
1033
+ "inference_configured(\n",
1034
+ "    preprocess_pipeline_id=preprocessing_pipeline_artifact_version.id\n",
1036
+ ")"
1037
+ ]
1038
+ },
1039
+ {
1040
+ "cell_type": "markdown",
1041
+ "id": "2935d1fa",
1042
+ "metadata": {},
1043
+ "source": [
1044
+ "ZenML automatically links all artifacts to the `production` model version as well, including the predictions\n",
1045
+ "that were returned in the pipeline. This completes the MLOps loop from training to inference:"
1046
+ ]
1047
+ },
1048
+ {
1049
+ "cell_type": "code",
1050
+ "execution_count": null,
1051
+ "id": "e191d019",
1052
+ "metadata": {},
1053
+ "outputs": [],
1054
+ "source": [
1055
+ "# Fetch production model\n",
1056
+ "production_model_version = client.get_model_version(\"breast_cancer_classifier\", \"production\")\n",
1057
+ "\n",
1058
+ "# Get the predictions artifact\n",
1059
+ "production_model_version.get_artifact(\"predictions\").load()"
1060
+ ]
1061
+ },
1062
+ {
1063
+ "cell_type": "markdown",
1064
+ "id": "b0a73cdf",
1065
+ "metadata": {},
1066
+ "source": [
1067
+ "You can also see all predictions ever created as a complete history in the dashboard:\n",
1068
+ "\n",
1069
+ "<img src=\".assets/cloud_mcp_predictions.png\" width=\"70%\" alt=\"Model Control Plane\">"
1070
+ ]
1071
+ },
1072
+ {
1073
+ "cell_type": "markdown",
1074
+ "id": "594ee4fc-f102-4b99-bdc3-2f1670c87679",
1075
+ "metadata": {},
1076
+ "source": [
1077
+ "## Congratulations!\n",
1078
+ "\n",
1079
+ "You're a legit MLOps engineer now! You trained two models, evaluated them against\n",
1080
+ "a test set, registered the best one with the ZenML model control plane,\n",
1081
+ "and served some predictions. You also learned how to iterate on your models and\n",
1082
+ "data by using some of the ZenML utility abstractions. You saw how to view your\n",
1083
+ "artifacts and models via the client as well as the ZenML Dashboard.\n",
1084
+ "\n",
1085
+ "## Further exploration\n",
1086
+ "\n",
1087
+ "This was just the tip of the iceberg of what ZenML can do; check out the [**docs**](https://docs.zenml.io/) to learn more\n",
1088
+ "about the capabilities of ZenML. For example, you might want to:\n",
1089
+ "\n",
1090
+ "- [Deploy ZenML](https://docs.zenml.io/user-guide/production-guide/connect-deployed-zenml) to collaborate with your colleagues.\n",
1091
+ "- Run the same pipeline on a [cloud MLOps stack in production](https://docs.zenml.io/user-guide/production-guide/cloud-stack).\n",
1092
+ "- Track your metrics in an experiment tracker like [MLflow](https://docs.zenml.io/stacks-and-components/component-guide/experiment-trackers/mlflow).\n",
1093
+ "\n",
1094
+ "## What next?\n",
1095
+ "\n",
1096
+ "* If you have questions or feedback... join our [**Slack Community**](https://zenml.io/slack) and become part of the ZenML family!\n",
1097
+ "* If you want to quickly get started with ZenML, check out the [ZenML Cloud](https://zenml.io/cloud)."
1098
+ ]
1099
+ }
1100
+ ],
1101
+ "metadata": {
1102
+ "kernelspec": {
1103
+ "display_name": "Python 3 (ipykernel)",
1104
+ "language": "python",
1105
+ "name": "python3"
1106
+ },
1107
+ "language_info": {
1108
+ "codemirror_mode": {
1109
+ "name": "ipython",
1110
+ "version": 3
1111
+ },
1112
+ "file_extension": ".py",
1113
+ "mimetype": "text/x-python",
1114
+ "name": "python",
1115
+ "nbconvert_exporter": "python",
1116
+ "pygments_lexer": "ipython3",
1117
+ "version": "3.8.10"
1118
+ }
1119
+ },
1120
+ "nbformat": 4,
1121
+ "nbformat_minor": 5
1122
+ }