Spaces:

htahir1
/

zenml_breast_cancer_classifier

Runtime error

App Files Files Community

htahir1 commited on Jan 4, 2024

Commit

551af5c

1 Parent(s): 17cac4f

Upload folder using huggingface_hub

Browse files

Files changed (7) hide show

_assets/feature_engineering_pipeline.png +0 -0
_assets/inference_pipeline.png +0 -0
_assets/inference_pipeline.png:Zone.Identifier +0 -0
_assets/pipeline_overview.png +0 -0
_assets/training_pipeline.png +0 -0
requirements.txt +4 -1
run_deploy.ipynb +1122 -0

_assets/feature_engineering_pipeline.png ADDED Viewed

_assets/inference_pipeline.png ADDED Viewed

_assets/inference_pipeline.png:Zone.Identifier ADDED Viewed

File without changes

_assets/pipeline_overview.png ADDED Viewed

_assets/training_pipeline.png ADDED Viewed

requirements.txt CHANGED Viewed

@@ -3,4 +3,7 @@ notebook
 scikit-learn<1.3
 s3fs>2022.3.0,<=2023.4.0
 boto3<=1.26.76
-aws-profile-manager

 scikit-learn<1.3
 s3fs>2022.3.0,<=2023.4.0
 boto3<=1.26.76
+aws-profile-manager
+mlflow>=2.1.1,<=2.9.2
+mlserver>=1.3.3
+mlserver-mlflow>=1.3.3

run_deploy.ipynb ADDED Viewed

	@@ -0,0 +1,1122 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "63ab391a",
+   "metadata": {},
+   "source": [
+    "# Intro to MLOps using ZenML\n",
+    "\n",
+    "## 🌍 Overview\n",
+    "\n",
+    "This repository is a minimalistic MLOps project intended as a starting point to learn how to put ML workflows in production. It features: \n",
+    "\n",
+    "- A feature engineering pipeline that loads data and prepares it for training.\n",
+    "- A training pipeline that loads the preprocessed dataset and trains a model.\n",
+    "- A batch inference pipeline that runs predictions on the trained model with new data.\n",
+    "\n",
+    "Follow along this notebook to understand how you can use ZenML to productionalize your ML workflows!\n",
+    "\n",
+    "<img src=\"_assets/pipeline_overview.png\" width=\"50%\" alt=\"Pipelines Overview\">"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f466b16",
+   "metadata": {},
+   "source": [
+    "## Run on Colab\n",
+    "\n",
+    "You can use Google Colab to see ZenML in action, no signup / installation\n",
+    "required!\n",
+    "\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](\n",
+    "https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "66b2977c",
+   "metadata": {},
+   "source": [
+    "# 👶 Step 0. Install Requirements\n",
+    "\n",
+    "Let's install ZenML to get started. First we'll install the latest version of\n",
+    "ZenML as well as the `sklearn` integration of ZenML:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce2f40eb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install \"zenml[server]\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5aad397e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from zenml.environment import Environment\n",
+    "\n",
+    "if Environment.in_google_colab():\n",
+    "    # Install Cloudflare Tunnel binary\n",
+    "    !wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb && dpkg -i cloudflared-linux-amd64.deb\n",
+    "\n",
+    "    # Pull required modules from this example\n",
+    "    !git clone -b main https://github.com/zenml-io/zenml\n",
+    "    !cp -r zenml/examples/quickstart/* .\n",
+    "    !rm -rf zenml\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f76f562e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!zenml integration install sklearn -y\n",
+    "\n",
+    "import IPython\n",
+    "IPython.Application.instance().kernel.do_shutdown(restart=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3b044374",
+   "metadata": {},
+   "source": [
+    "Please wait for the installation to complete before running subsequent cells. At\n",
+    "the end of the installation, the notebook kernel will automatically restart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3955ff1",
+   "metadata": {},
+   "source": [
+    "Optional: If you are using [ZenML Cloud](https://zenml.io/cloud), execute the following cell with your tenant URL. Otherwise ignore."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e2587315",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "zenml_server_url = \"PLEASE_UPDATE_ME\"  # in the form \"https://URL_TO_SERVER\"\n",
+    "\n",
+    "!zenml connect --url $zenml_server_url"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "081d5616",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize ZenML and set the default stack\n",
+    "!zenml init\n",
+    "\n",
+    "!zenml stack set default"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "79f775f2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Do the imports at the top\n",
+    "from typing_extensions import Annotated\n",
+    "from sklearn.datasets import load_breast_cancer\n",
+    "\n",
+    "import random\n",
+    "import pandas as pd\n",
+    "from zenml import step, ExternalArtifact, pipeline, ModelVersion, get_step_context\n",
+    "from zenml.client import Client\n",
+    "from zenml.logger import get_logger\n",
+    "from uuid import UUID\n",
+    "\n",
+    "from typing import Optional, List\n",
+    "\n",
+    "from zenml import pipeline\n",
+    "\n",
+    "from steps import (\n",
+    "    data_loader,\n",
+    "    data_preprocessor,\n",
+    "    data_splitter,\n",
+    "    model_evaluator,\n",
+    "    inference_preprocessor\n",
+    ")\n",
+    "\n",
+    "from zenml.logger import get_logger\n",
+    "\n",
+    "logger = get_logger(__name__)\n",
+    "\n",
+    "# Initialize the ZenML client to fetch objects from the ZenML Server\n",
+    "client = Client()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "35e48460",
+   "metadata": {},
+   "source": [
+    "## 🥇 Step 1: Load your data and execute feature engineering\n",
+    "\n",
+    "We'll start off by importing our data. In this quickstart we'll be working with\n",
+    "[the Breast Cancer](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic) dataset\n",
+    "which is publicly available on the UCI Machine Learning Repository. The task is a classification\n",
+    "problem, to predict whether a patient is diagnosed with breast cancer or not.\n",
+    "\n",
+    "When you're getting started with a machine learning problem you'll want to do\n",
+    "something similar to this: import your data and get it in the right shape for\n",
+    "your training. ZenML mostly gets out of your way when you're writing your Python\n",
+    "code, as you'll see from the following cell.\n",
+    "\n",
+    "<img src=\".assets/feature_engineering_pipeline.png\" width=\"50%\" alt=\"Feature engineering pipeline\" />"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3cd974d1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@step\n",
+    "def data_loader_simplified(\n",
+    "    random_state: int, is_inference: bool = False, target: str = \"target\"\n",
+    ") -> Annotated[pd.DataFrame, \"dataset\"]:  # We name the dataset \n",
+    "    \"\"\"Dataset reader step.\"\"\"\n",
+    "    dataset = load_breast_cancer(as_frame=True)\n",
+    "    inference_size = int(len(dataset.target) * 0.05)\n",
+    "    dataset: pd.DataFrame = dataset.frame\n",
+    "    inference_subset = dataset.sample(inference_size, random_state=random_state)\n",
+    "    if is_inference:\n",
+    "        dataset = inference_subset\n",
+    "        dataset.drop(columns=target, inplace=True)\n",
+    "    else:\n",
+    "        dataset.drop(inference_subset.index, inplace=True)\n",
+    "    dataset.reset_index(drop=True, inplace=True)\n",
+    "    logger.info(f\"Dataset with {len(dataset)} records loaded!\")\n",
+    "    return dataset\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e8ba4c6",
+   "metadata": {},
+   "source": [
+    "The whole function is decorated with the `@step` decorator, which\n",
+    "tells ZenML to track this function as a step in the pipeline. This means that\n",
+    "ZenML will automatically version, track, and cache the data that is produced by\n",
+    "this function as an `artifact`. This is a very powerful feature, as it means that you can\n",
+    "reproduce your data at any point in the future, even if the original data source\n",
+    "changes or disappears. \n",
+    "\n",
+    "Note the use of the `typing` module's `Annotated` type hint in the output of the\n",
+    "step. We're using this to give a name to the output of the step, which will make\n",
+    "it possible to access it via a keyword later on.\n",
+    "\n",
+    "You'll also notice that we have included type hints for the outputs\n",
+    "to the function. These are not only useful for anyone reading your code, but\n",
+    "help ZenML process your data in a way appropriate to the specific data types."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b6286b67",
+   "metadata": {},
+   "source": [
+    "ZenML is built in a way that allows you to experiment with your data and build\n",
+    "your pipelines as you work, so if you want to call this function to see how it\n",
+    "works, you can just call it directly. Here we take a look at the first few rows\n",
+    "of your training dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d838e2ea",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = data_loader_simplified(random_state=42)\n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "28c05291",
+   "metadata": {},
+   "source": [
+    "Everything looks as we'd expect and the values are all in the right format 🥳.\n",
+    "\n",
+    "We're now at the point where can bring this step (and some others) together into a single\n",
+    "pipeline, the top-level organising entity for code in ZenML. Creating such a pipeline is\n",
+    "as simple as adding a `@pipeline` decorator to a function. This specific\n",
+    "pipeline doesn't return a value, but that option is available to you if you need."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b50a9537",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@pipeline\n",
+    "def feature_engineering(\n",
+    "    test_size: float = 0.3,\n",
+    "    drop_na: Optional[bool] = None,\n",
+    "    normalize: Optional[bool] = None,\n",
+    "    drop_columns: Optional[List[str]] = None,\n",
+    "    target: Optional[str] = \"target\",\n",
+    "    random_state: int = 17\n",
+    "):\n",
+    "    \"\"\"Feature engineering pipeline.\"\"\"\n",
+    "    # Link all the steps together by calling them and passing the output\n",
+    "    # of one step as the input of the next step.\n",
+    "    raw_data = data_loader(random_state=random_state, target=target)\n",
+    "    dataset_trn, dataset_tst = data_splitter(\n",
+    "        dataset=raw_data,\n",
+    "        test_size=test_size,\n",
+    "    )\n",
+    "    dataset_trn, dataset_tst, _ = data_preprocessor(\n",
+    "        dataset_trn=dataset_trn,\n",
+    "        dataset_tst=dataset_tst,\n",
+    "        drop_na=drop_na,\n",
+    "        normalize=normalize,\n",
+    "        drop_columns=drop_columns,\n",
+    "        target=target,\n",
+    "        random_state=random_state,\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7cd73c23",
+   "metadata": {},
+   "source": [
+    "We're ready to run the pipeline now, which we can do just as with the step - by calling the\n",
+    "pipeline function itself:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1e0aa9af",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feature_engineering()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1785c303",
+   "metadata": {},
+   "source": [
+    "Let's run this again with a slightly different test size, to create more datasets:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "658c0570-2607-4b97-a72d-d45c92633e48",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feature_engineering(test_size=0.25)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64bb7206",
+   "metadata": {},
+   "source": [
+    "Notice the second time around, the data loader step was **cached**, while the rest of the pipeline was rerun. \n",
+    "This is because ZenML automatically determined that nothing had changed in the data loader step, \n",
+    "so it didn't need to rerun it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5bc6849d-31ac-4c08-9ca2-cf7f5f35ccbf",
+   "metadata": {},
+   "source": [
+    "Let's run this again with a slightly different test size and random state, to disable the cache and to create more datasets:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1e1d8546",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feature_engineering(test_size=0.25, random_state=104)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c42078a",
+   "metadata": {},
+   "source": [
+    "At this point you might be interested to view your pipeline runs in the ZenML\n",
+    "Dashboard. In case you are not using a hosted instance of ZenML, you can spin this up by executing the next cell. This will start a\n",
+    "server which you can access by clicking on the link that appears in the output\n",
+    "of the cell.\n",
+    "\n",
+    "Log into the Dashboard using default credentials (username 'default' and\n",
+    "password left blank). From there you can inspect the pipeline or the specific\n",
+    "pipeline run.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8cd3cc8c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from zenml.environment import Environment\n",
+    "from zenml.zen_stores.rest_zen_store import RestZenStore\n",
+    "\n",
+    "\n",
+    "if not isinstance(client.zen_store, RestZenStore):\n",
+    "    # Only spin up a local Dashboard in case you aren't already connected to a remote server\n",
+    "    if Environment.in_google_colab():\n",
+    "        # run ZenML through a cloudflare tunnel to get a public endpoint\n",
+    "        !zenml up --port 8237 & cloudflared tunnel --url http://localhost:8237\n",
+    "    else:\n",
+    "        !zenml up"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e8471f93",
+   "metadata": {},
+   "source": [
+    "We can also fetch the pipeline from the server and view the results directly in the notebook:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f208b200",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = Client()\n",
+    "run = client.get_pipeline(\"feature_engineering\").last_run\n",
+    "print(run.name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a037f09d",
+   "metadata": {},
+   "source": [
+    "We can also see the data artifacts that were produced by the last step of the pipeline:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "34283e89",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "run.steps[\"data_preprocessor\"].outputs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bceb0312",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Read one of the datasets. This is the one with a 0.25 test split\n",
+    "run.steps[\"data_preprocessor\"].outputs[\"dataset_trn\"].load()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "26d26436",
+   "metadata": {},
+   "source": [
+    "We can also get the artifacts directly. Each time you create a new pipeline run, a new `artifact version` is created.\n",
+    "\n",
+    "You can fetch these artifact and their versions using the `client`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c8f90647",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get artifact version from our run\n",
+    "dataset_trn_artifact_version_via_run = run.steps[\"data_preprocessor\"].outputs[\"dataset_trn\"] \n",
+    "\n",
+    "# Get latest version from client directly\n",
+    "dataset_trn_artifact_version = client.get_artifact_version(\"dataset_trn\")\n",
+    "\n",
+    "# This should be true if our run is the latest run and no artifact has been produced\n",
+    "#  in the intervening time\n",
+    "dataset_trn_artifact_version_via_run.id == dataset_trn_artifact_version.id"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f9d3dfd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fetch the rest of the artifacts\n",
+    "dataset_tst_artifact_version = client.get_artifact_version(\"dataset_tst\")\n",
+    "preprocessing_pipeline_artifact_version = client.get_artifact_version(\"preprocess_pipeline\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a7d1b04",
+   "metadata": {},
+   "source": [
+    "If you started with a fresh install, then you would have two versions corresponding\n",
+    "to the two pipelines that we ran above. We can even load a artifact version in memory:   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c82aca75",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load an artifact to verify you can fetch it\n",
+    "dataset_trn_artifact_version.load()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5963509e",
+   "metadata": {},
+   "source": [
+    "We'll use these artifacts from above in our next pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8c28b474",
+   "metadata": {},
+   "source": [
+    "# ⌚ Step 2: Training pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87909827",
+   "metadata": {},
+   "source": [
+    "Now that we have our data it makes sense to train some models to get a sense of\n",
+    "how difficult the task is. The Breast Cancer dataset is sufficiently large and complex \n",
+    "that it's unlikely we'll be able to train a model that behaves perfectly since the problem \n",
+    "is inherently complex, but we can get a sense of what a reasonable baseline looks like.\n",
+    "\n",
+    "We'll start with two simple models, a SGD Classifier and a Random Forest\n",
+    "Classifier, both batteries-included from `sklearn`. We'll train them both on the\n",
+    "same data and then compare their performance.\n",
+    "\n",
+    "<img src=\".assets/training_pipeline.png\" width=\"50%\" alt=\"Training pipeline\">"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fccf1bd9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from sklearn.base import ClassifierMixin\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.linear_model import SGDClassifier\n",
+    "from typing_extensions import Annotated\n",
+    "from zenml import ArtifactConfig, step\n",
+    "from zenml.logger import get_logger\n",
+    "\n",
+    "logger = get_logger(__name__)\n",
+    "\n",
+    "\n",
+    "@step\n",
+    "def model_trainer(\n",
+    "    dataset_trn: pd.DataFrame,\n",
+    "    model_type: str = \"sgd\",\n",
+    ") -> Annotated[ClassifierMixin, ArtifactConfig(name=\"sklearn_classifier\", is_model_artifact=True)]:\n",
+    "    \"\"\"Configure and train a model on the training dataset.\"\"\"\n",
+    "    target = \"target\"\n",
+    "    if model_type == \"sgd\":\n",
+    "        model = SGDClassifier()\n",
+    "    elif model_type == \"rf\":\n",
+    "        model = RandomForestClassifier()\n",
+    "    else:\n",
+    "        raise ValueError(f\"Unknown model type {model_type}\")   \n",
+    "\n",
+    "    logger.info(f\"Training model {model}...\")\n",
+    "\n",
+    "    model.fit(\n",
+    "        dataset_trn.drop(columns=[target]),\n",
+    "        dataset_trn[target],\n",
+    "    )\n",
+    "    return model\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "73a00008",
+   "metadata": {},
+   "source": [
+    "Our two training steps both return different kinds of `sklearn` classifier\n",
+    "models, so we use the generic `ClassifierMixin` type hint for the return type."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a5f22174",
+   "metadata": {},
+   "source": [
+    "ZenML allows you to load any version of any dataset that is tracked by the framework\n",
+    "directly into a pipeline using the `ExternalArtifact` interface. This is very convenient\n",
+    "in this case, as we'd like to send our preprocessed dataset from the older pipeline directly\n",
+    "into the training pipeline."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1aa98f2f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@pipeline\n",
+    "def training(\n",
+    "    train_dataset_id: Optional[UUID] = None,\n",
+    "    test_dataset_id: Optional[UUID] = None,\n",
+    "    model_type: str = \"sgd\",\n",
+    "    min_train_accuracy: float = 0.0,\n",
+    "    min_test_accuracy: float = 0.0,\n",
+    "):\n",
+    "    \"\"\"Model training pipeline.\"\"\" \n",
+    "    if train_dataset_id is None or test_dataset_id is None:\n",
+    "        # If we dont pass the IDs, this will run the feature engineering pipeline   \n",
+    "        dataset_trn, dataset_tst = feature_engineering()\n",
+    "    else:\n",
+    "        # Load the datasets from an older pipeline\n",
+    "        dataset_trn = ExternalArtifact(id=train_dataset_id)\n",
+    "        dataset_tst = ExternalArtifact(id=test_dataset_id) \n",
+    "\n",
+    "    trained_model = model_trainer(\n",
+    "        dataset_trn=dataset_trn,\n",
+    "        model_type=model_type,\n",
+    "    )\n",
+    "\n",
+    "    model_evaluator(\n",
+    "        model=trained_model,\n",
+    "        dataset_trn=dataset_trn,\n",
+    "        dataset_tst=dataset_tst,\n",
+    "        min_train_accuracy=min_train_accuracy,\n",
+    "        min_test_accuracy=min_test_accuracy,\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "88b70fd3",
+   "metadata": {},
+   "source": [
+    "The end goal of this quick baseline evaluation is to understand which of the two\n",
+    "models performs better. We'll use the `evaluator` step to compare the two\n",
+    "models. This step takes in the model from the trainer step, and computes its score\n",
+    "over the testing set."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c64885ac",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Use a random forest model with the chosen datasets.\n",
+    "# We need to pass the ID's of the datasets into the function\n",
+    "training(\n",
+    "    model_type=\"rf\",\n",
+    "    train_dataset_id=dataset_trn_artifact_version.id,\n",
+    "    test_dataset_id=dataset_tst_artifact_version.id\n",
+    ")\n",
+    "\n",
+    "rf_run = client.get_pipeline(\"training\").last_run"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4300c82f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Use a SGD classifier\n",
+    "sgd_run = training(\n",
+    "    model_type=\"sgd\",\n",
+    "    train_dataset_id=dataset_trn_artifact_version.id,\n",
+    "    test_dataset_id=dataset_tst_artifact_version.id\n",
+    ")\n",
+    "\n",
+    "sgd_run = client.get_pipeline(\"training\").last_run"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43f1a68a",
+   "metadata": {},
+   "source": [
+    "You can see from the logs already how our model training went: the\n",
+    "`RandomForestClassifier` performed considerably better than the `SGDClassifier`.\n",
+    "We can use the ZenML `Client` to verify this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d95810b1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The evaluator returns a float value with the accuracy\n",
+    "rf_run.steps[\"model_evaluator\"].output.load() > sgd_run.steps[\"model_evaluator\"].output.load()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e256d145",
+   "metadata": {},
+   "source": [
+    "# 💯 Step 3: Associating a model with your pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "927978f3",
+   "metadata": {},
+   "source": [
+    "You can see it is relatively easy to train ML models using ZenML pipelines. But it can be somewhat clunky to track\n",
+    "all the models produced as you develop your experiments and use-cases. Luckily, ZenML offers a *Model Control Plane*,\n",
+    "which is a central register of all your ML models.\n",
+    "\n",
+    "You can easily create a ZenML `Model` and associate it with your pipelines using the `ModelVersion` object:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "99ca00c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline_settings = {}\n",
+    "\n",
+    "# Lets add some metadata to the model to make it identifiable\n",
+    "pipeline_settings[\"model_version\"] = ModelVersion(\n",
+    "    name=\"breast_cancer_classifier\",\n",
+    "    license=\"Apache 2.0\",\n",
+    "    description=\"A breast cancer classifier\",\n",
+    "    tags=[\"breast_cancer\", \"classifier\"],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0e78a520",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's train the SGD model and set the version name to \"sgd\"\n",
+    "pipeline_settings[\"model_version\"].version = \"sgd\"\n",
+    "\n",
+    "# the `with_options` method allows us to pass in pipeline settings\n",
+    "#  and returns a configured pipeline\n",
+    "training_configured = training.with_options(**pipeline_settings)\n",
+    "\n",
+    "# We can now run this as usual\n",
+    "training_configured(\n",
+    "    model_type=\"sgd\",\n",
+    "    train_dataset_id=dataset_trn_artifact_version.id,\n",
+    "    test_dataset_id=dataset_tst_artifact_version.id\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9b8e0002",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's train the RF model and set the version name to \"rf\"\n",
+    "pipeline_settings[\"model_version\"].version = \"rf\"\n",
+    "\n",
+    "# the `with_options` method allows us to pass in pipeline settings\n",
+    "#  and returns a configured pipeline\n",
+    "training_configured = training.with_options(**pipeline_settings)\n",
+    "\n",
+    "# Let's run it again to make sure we have two versions\n",
+    "training_configured(\n",
+    "    model_type=\"rf\",\n",
+    "    train_dataset_id=dataset_trn_artifact_version.id,\n",
+    "    test_dataset_id=dataset_tst_artifact_version.id\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "09597223",
+   "metadata": {},
+   "source": [
+    "This time, running both pipelines has created two associated **model versions**.\n",
+    "You can list your ZenML model and their versions as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fbb25913",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "zenml_model = client.get_model(\"breast_cancer_classifier\")\n",
+    "print(zenml_model)\n",
+    "\n",
+    "print(f\"Model {zenml_model.name} has {len(zenml_model.versions)} versions\")\n",
+    "\n",
+    "zenml_model.versions[0].version, zenml_model.versions[1].version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e82cfac2",
+   "metadata": {},
+   "source": [
+    "The interesting part is that ZenML went ahead and linked all artifacts produced by the\n",
+    "pipelines to that model version, including the two pickle files that represent our\n",
+    "SGD and RandomForest classifier. We can see all artifacts directly from the model\n",
+    "version object:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "31211413",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's load the RF version\n",
+    "rf_zenml_model_version = client.get_model_version(\"breast_cancer_classifier\", \"rf\")\n",
+    "\n",
+    "# We can now load our classifier directly as well\n",
+    "random_forest_classifier = rf_zenml_model_version.get_artifact(\"sklearn_classifier\").load()\n",
+    "\n",
+    "random_forest_classifier"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "53517a9a",
+   "metadata": {},
+   "source": [
+    "If you are a [ZenML Cloud](https://zenml.io/cloud) user, you can see all of this visualized in the dashboard:\n",
+    "\n",
+    "<img src=\".assets/cloud_mcp_screenshot.png\" width=\"70%\" alt=\"Model Control Plane\">"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eb645dde",
+   "metadata": {},
+   "source": [
+    "There is a lot more you can do with ZenML models, including the ability to\n",
+    "track metrics by adding metadata to it, or having them persist in a model\n",
+    "registry. However, these topics can be explored more in the\n",
+    "[ZenML docs](https://docs.zenml.io).\n",
+    "\n",
+    "For now, we will use the ZenML model control plane to promote our best\n",
+    "model to `production`. You can do this by simply setting the `stage` of\n",
+    "your chosen model version to the `production` tag."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "26b718f8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Set our best classifier to production\n",
+    "rf_zenml_model_version.set_stage(\"production\", force=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9fddf3d0",
+   "metadata": {},
+   "source": [
+    "Of course, normally one would only promote the model by comparing to all other model\n",
+    "versions and doing some other tests. But that's a bit more advanced use-case. See the\n",
+    "[e2e_batch example](https://github.com/zenml-io/zenml/tree/main/examples/e2e) to get\n",
+    "more insight into that sort of flow!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2ecbc8cf",
+   "metadata": {},
+   "source": [
+    "<img src=\".assets/cloud_mcp.png\" width=\"60%\" alt=\"Model Control Plane\">"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f1146db",
+   "metadata": {},
+   "source": [
+    "Once the model is promoted, we can now consume the right model version in our\n",
+    "batch inference pipeline directly. Let's see how that works."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d6306f14",
+   "metadata": {},
+   "source": [
+    "# 🫅 Step 4: Consuming the model in production"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b51f3108",
+   "metadata": {},
+   "source": [
+    "The batch inference pipeline simply takes the model marked as `production` and runs inference on it\n",
+    "with `live data`. The critical step here is the `inference_predict` step, where we load the model in memory\n",
+    "and generate predictions:\n",
+    "\n",
+    "<img src=\".assets/inference_pipeline.png\" width=\"45%\" alt=\"Inference pipeline\">"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "92c4c7dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@step\n",
+    "def inference_predict(dataset_inf: pd.DataFrame) -> Annotated[pd.Series, \"predictions\"]:\n",
+    "    \"\"\"Predictions step\"\"\"\n",
+    "    # Get the model_version\n",
+    "    model_version = get_step_context().model_version\n",
+    "\n",
+    "    # run prediction from memory\n",
+    "    predictor = model_version.load_artifact(\"sklearn_classifier\")\n",
+    "    predictions = predictor.predict(dataset_inf)\n",
+    "\n",
+    "    predictions = pd.Series(predictions, name=\"predicted\")\n",
+    "\n",
+    "    return predictions\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3aeb227b",
+   "metadata": {},
+   "source": [
+    "Apart from the loading the model, we must also load the preprocessing pipeline that we ran in feature engineering,\n",
+    "so that we can do the exact steps that we did on training time, in inference time. Let's bring it all together:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37c409bd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@pipeline\n",
+    "def inference(preprocess_pipeline_id: UUID):\n",
+    "    \"\"\"Model batch inference pipeline\"\"\"\n",
+    "    # random_state = client.get_artifact_version(id=preprocess_pipeline_id).metadata[\"random_state\"].value\n",
+    "    # target = client.get_artifact_version(id=preprocess_pipeline_id).run_metadata['target'].value\n",
+    "    random_state = 42\n",
+    "    target = \"target\"\n",
+    "\n",
+    "    df_inference = data_loader(\n",
+    "        random_state=random_state, is_inference=True\n",
+    "    )\n",
+    "    df_inference = inference_preprocessor(\n",
+    "        dataset_inf=df_inference,\n",
+    "        # We use the preprocess pipeline from the feature engineering pipeline\n",
+    "        preprocess_pipeline=ExternalArtifact(id=preprocess_pipeline_id),\n",
+    "        target=target,\n",
+    "    )\n",
+    "    inference_predict(\n",
+    "        dataset_inf=df_inference,\n",
+    "    )\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c7afe7be",
+   "metadata": {},
+   "source": [
+    "The way to load the right model is to pass in the `production` stage into the `ModelVersion` config this time.\n",
+    "This will ensure to always load the production model, decoupled from all other pipelines:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "61bf5939",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline_settings = {\"enable_cache\": False}\n",
+    "\n",
+    "# Lets add some metadata to the model to make it identifiable\n",
+    "pipeline_settings[\"model_version\"] = ModelVersion(\n",
+    "    name=\"breast_cancer_classifier\",\n",
+    "    version=\"production\", # We can pass in the stage name here!\n",
+    "    license=\"Apache 2.0\",\n",
+    "    description=\"A breast cancer classifier\",\n",
+    "    tags=[\"breast_cancer\", \"classifier\"],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff3402f1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# the `with_options` method allows us to pass in pipeline settings\n",
+    "#  and returns a configured pipeline\n",
+    "inference_configured = inference.with_options(**pipeline_settings)\n",
+    "\n",
+    "# Let's run it again to make sure we have two versions\n",
+    "# We need to pass in the ID of the preprocessing done in the feature engineering pipeline\n",
+    "# in order to avoid training-serving skew\n",
+    "inference_configured(\n",
+    "    preprocess_pipeline_id=preprocessing_pipeline_artifact_version.id\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2935d1fa",
+   "metadata": {},
+   "source": [
+    "ZenML automatically links all artifacts to the `production` model version as well, including the predictions\n",
+    "that were returned in the pipeline. This completes the MLOps loop of training to inference:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e191d019",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fetch production model\n",
+    "production_model_version = client.get_model_version(\"breast_cancer_classifier\", \"production\")\n",
+    "\n",
+    "# Get the predictions artifact\n",
+    "production_model_version.get_artifact(\"predictions\").load()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b0a73cdf",
+   "metadata": {},
+   "source": [
+    "You can also see all predictions ever created as a complete history in the dashboard:\n",
+    "\n",
+    "<img src=\".assets/cloud_mcp_predictions.png\" width=\"70%\" alt=\"Model Control Plane\">"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "594ee4fc-f102-4b99-bdc3-2f1670c87679",
+   "metadata": {},
+   "source": [
+    "## Congratulations!\n",
+    "\n",
+    "You're a legit MLOps engineer now! You trained two models, evaluated them against\n",
+    "a test set, registered the best one with the ZenML model control plane,\n",
+    "and served some predictions. You also learned how to iterate on your models and\n",
+    "data by using some of the ZenML utility abstractions. You saw how to view your\n",
+    "artifacts and models via the client as well as the ZenML Dashboard.\n",
+    "\n",
+    "## Further exploration\n",
+    "\n",
+    "This was just the tip of the iceberg of what ZenML can do; check out the [**docs**](https://docs.zenml.io/) to learn more\n",
+    "about the capabilities of ZenML. For example, you might want to:\n",
+    "\n",
+    "- [Deploy ZenML](https://docs.zenml.io/user-guide/production-guide/connect-deployed-zenml) to collaborate with your colleagues.\n",
+    "- Run the same pipeline on a [cloud MLOps stack in production](https://docs.zenml.io/user-guide/production-guide/cloud-stack).\n",
+    "- Track your metrics in an experiment tracker like [MLflow](https://docs.zenml.io/stacks-and-components/component-guide/experiment-trackers/mlflow).\n",
+    "\n",
+    "## What next?\n",
+    "\n",
+    "* If you have questions or feedback... join our [**Slack Community**](https://zenml.io/slack) and become part of the ZenML family!\n",
+    "* If you want to quickly get started with ZenML, check out the [ZenML Cloud](https://zenml.io/cloud)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}