{"id":341,"date":"2025-09-16T16:49:46","date_gmt":"2025-09-16T16:49:46","guid":{"rendered":"https:\/\/blog.ngocha.biz\/?p=341"},"modified":"2025-09-16T16:49:46","modified_gmt":"2025-09-16T16:49:46","slug":"deploy-ml-model-kubernetes-kserve","status":"publish","type":"post","link":"https:\/\/blog.ngocha.biz\/?p=341","title":{"rendered":"Deploy ML Model on Kubernetes with KServe (Step-by-Step Guide)"},"content":{"rendered":"<p>In this guide, you will learn how to deploy a machine learning model on a Kubernetes cluster using <strong>KServe<\/strong> model serving.<\/p>\n<p>Here is what we will cover in this guide.<\/p>\n<ul>\n<li>What is KServe?<\/li>\n<li>Deploying KServe on Kubernetes<\/li>\n<li>Deploying a sample scikit-learn ML model using KServe<\/li>\n<li>Testing the deployed model using its inference endpoint<\/li>\n<\/ul>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83c\udfaf<\/div>\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Learning Focus<\/strong><\/b>: The goal of this guide is to understand how to deploy ML models with KServe, not how to build ML models. <\/p>\n<p>We provide a simple model so you can concentrate on the Kubernetes and KServe deployment concepts.<\/p>\n<p>Anyone can follow this guide without an AI\/ML background.<\/p><\/div>\n<\/div>\n<h2 id=\"what-is-kserve\">What is KServe?<\/h2>\n<p>When we train a machine learning model, the next step is to <strong>serve<\/strong> it so others can use it for predictions. 
<\/p>\n<p>Serving means loading the trained model, running it inside an inference server, and exposing an endpoint for apps or users to send requests and get results.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Inference in machine learning means using a trained model to make predictions on new and unseen data.<\/div>\n<\/div>\n<p>You cannot deploy an ML model on Kubernetes the same way you deploy regular workloads. Models need an inference server that handles prediction requests. KServe makes this process simple.<\/p>\n<p><a href=\"https:\/\/kserve.github.io\/website\/?ref=devopscube.com\" rel=\"noreferrer\">KServe<\/a> is an open-source ML model serving tool for <a href=\"https:\/\/devopscube.com\/kubernetes-tutorials-beginners\/\" rel=\"noreferrer\">Kubernetes<\/a>, which helps you serve your ML models on a Kubernetes cluster with minimal effort.<\/p>\n<p>You can deploy KServe in two modes.<\/p>\n<ol>\n<li><strong>Knative (default mode):<\/strong> This mode requires Knative components and is well suited to advanced setups.<\/li>\n<li><strong>RawDeployment mode:<\/strong> If you plan to deploy your models in a simple setup, RawDeployment is the better fit.<\/li>\n<\/ol>\n<p>In this guide, we will focus on the KServe installation using the RawDeployment mode and show you how to serve a simple ML model on Kubernetes.<\/p>\n<h2 id=\"kserve-model-serving-workflow\">KServe Model Serving Workflow<\/h2>\n<p>To get started with KServe, we are going to use the following workflow to serve a model.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/09\/kserve-3.png\" class=\"kg-image\" alt=\"KServe model serving workflow diagram\" loading=\"lazy\" width=\"2000\" height=\"1564\" 
srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/09\/kserve-3.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/09\/kserve-3.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2025\/09\/kserve-3.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/09\/kserve-3.png 2036w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Here is how it works.<\/p>\n<ol>\n<li>The KServe controller running in the Kubernetes cluster continuously watches for newly created <strong><code>InferenceService<\/code><\/strong> resources.<\/li>\n<li>When a user creates the <strong><code>InferenceService<\/code><\/strong> resource, KServe detects it and creates the following required objects.\n<ol>\n<ul>\n<li>A Deployment with a <strong>Pod<\/strong> to run the model server.<\/li>\n<li>A <strong>Service<\/strong> to expose the Pod as an endpoint.<\/li>\n<li>An <strong>HPA<\/strong> to scale up\/down based on load.<\/li>\n<\/ul>\n<\/ol>\n<\/li>\n<li>The Pod then pulls a scikit-learn model server <strong>image<\/strong> from the <strong>container registry<\/strong>. This image contains the libraries needed to serve the model.<\/li>\n<li>The Pod then uses the PVC URI to access the model.pkl file.<\/li>\n<li>As explained in the second point, KServe automatically exposes a <strong>Kubernetes Service<\/strong> endpoint. 
This becomes the URL where clients can send <strong>API requests<\/strong> for predictions.<\/li>\n<li>Finally, you or an app can send data to the <strong>Model Endpoint<\/strong> for inference.<\/li>\n<\/ol>\n<p>In KServe, you can store your model in the following ways:<\/p>\n<ol>\n<li>Store it in object storage like <a href=\"https:\/\/devopscube.com\/configure-loki-s3\/\" rel=\"noreferrer\">AWS S3<\/a> or Azure Blob Storage.<\/li>\n<li>Store it as a <a href=\"https:\/\/devopscube.com\/build-docker-image-kubernetes-pod\/\" rel=\"noreferrer\">container image<\/a>.<\/li>\n<li>Store it in your cluster&#8217;s <a href=\"https:\/\/devopscube.com\/provsion-persistent-volume-on-eks\/\" rel=\"noreferrer\">Persistent Volume<\/a>.<\/li>\n<\/ol>\n<p>For other storage options, refer to the <a href=\"https:\/\/kserve.github.io\/website\/docs\/model-serving\/storage\/overview?ref=devopscube.com\">official documentation<\/a>.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">In this guide, we are going to store the <b><strong style=\"white-space: pre-wrap;\">model in a PVC<\/strong><\/b> and use it for the deployment.<\/div>\n<\/div>\n<h2 id=\"ml-model-details-context\">ML Model Details &amp; Context<\/h2>\n<p>For this setup, we will use a sample <a href=\"https:\/\/scikit-learn.org\/stable\/?ref=devopscube.com\" rel=\"noreferrer\"><strong>scikit-learn<\/strong><\/a><strong> text classification<\/strong> pipeline that categorizes words into three types.<\/p>\n<ul>\n<li><strong>Animals<\/strong> (label: 0)<\/li>\n<li><strong>Birds<\/strong> (label: 1)<\/li>\n<li><strong>Plants<\/strong> (label: 2)<\/li>\n<\/ul>\n<p>You give a word to the model and it tells you whether that word represents an animal, bird, or plant. 
This is a common type of machine learning problem called <strong>supervised learning classification<\/strong>.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">The pre-trained <code spellcheck=\"false\" style=\"white-space: pre-wrap;\">model.pkl<\/code> file is already included in the GitHub repository, so you don&#8217;t need to create or train any model yourself.<\/p>\n<p>This model is provided <b><strong style=\"white-space: pre-wrap;\">purely for learning purposes<\/strong><\/b> to demonstrate KServe deployment concepts.<\/div>\n<\/div>\n<h2 id=\"setup-prerequisites\">Setup Prerequisites<\/h2>\n<p>The following are the prerequisites for this setup.<\/p>\n<ol>\n<li><a href=\"https:\/\/devopscube.com\/setup-kubernetes-cluster-kubeadm\/\" rel=\"noreferrer\">Kubernetes Cluster<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/kubectl-set-context\/\" rel=\"noreferrer\">kubectl<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/what-is-docker\/\" rel=\"noreferrer\">Docker<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/install-configure-helm-kubernetes\/\" rel=\"noreferrer\">Helm<\/a><\/li>\n<\/ol>\n<p>Let&#8217;s get started.<\/p>\n<h2 id=\"install-kserve-in-kubernetes\">Install KServe in Kubernetes<\/h2>\n<p>Follow the steps below to install KServe on your cluster.<\/p>\n<h3 id=\"step-1-install-cert-manager\">Step 1: Install Cert Manager<\/h3>\n<p>Let&#8217;s start with installing Cert Manager, which is essential for creating and managing <a href=\"https:\/\/devopscube.com\/configure-ingress-tls-kubernetes\/\" rel=\"noreferrer\">TLS certificates<\/a> for KServe.<\/p>\n<p>You can refer to the official site for the <a href=\"https:\/\/cert-manager.io\/docs\/releases\/?ref=devopscube.com\">latest version<\/a>.<\/p>\n<p>Run the following command to install <a href=\"https:\/\/devopscube.com\/nginx-ingress-with-cert-manager\/\" rel=\"noreferrer\">Cert Manager<\/a>. 
All the components get deployed in the <strong><code>cert-manager<\/code><\/strong> namespace.<\/p>\n<pre><code class=\"language-bash\">kubectl apply -f https:\/\/github.com\/cert-manager\/cert-manager\/releases\/download\/v1.19.0\/cert-manager.yaml<\/code><\/pre>\n<p>Ensure all three cert-manager components are in the Running state.<\/p>\n<pre><code class=\"language-bash\">kubectl get po -n cert-manager<\/code><\/pre>\n<h3 id=\"step-2-install-kserve-crds\">Step 2: Install KServe CRDs<\/h3>\n<p>Once Cert Manager is installed, install KServe.<\/p>\n<p>First, start by installing the required KServe CRDs using Helm.<\/p>\n<p>To install the CRDs, run the following command. It also creates the <code>kserve<\/code> namespace where the <strong><code>kserve<\/code><\/strong> controller will be deployed.<\/p>\n<pre><code class=\"language-bash\">helm install kserve-crd oci:\/\/ghcr.io\/kserve\/charts\/kserve-crd --version v0.16.0 -n kserve --create-namespace<\/code><\/pre>\n<p>Now, verify the KServe CRDs.<\/p>\n<pre><code class=\"language-bash\">kubectl get crds | grep kserve<\/code><\/pre>\n<h3 id=\"step-3-deploy-kserve-controller\">Step 3: Deploy KServe Controller<\/h3>\n<p>Now run the following command to install the KServe controller.<\/p>\n<pre><code class=\"language-bash\">helm install kserve oci:\/\/ghcr.io\/kserve\/charts\/kserve --version v0.16.0 \\\n --set kserve.controller.deploymentMode=Standard \\\n -n kserve<\/code><\/pre>\n<p>In the above command, the <code>--set kserve.controller.deploymentMode=Standard<\/code> flag sets the deployment mode to <code>Standard<\/code>, which corresponds to the RawDeployment mode described earlier.<\/p>\n<p>Verify that the KServe controller is in the Running state. 
The controller pod runs the <strong><code>kube-rbac-proxy<\/code><\/strong> and controller containers, which is why the output shows <code>2\/2<\/code> in the READY column.<\/p>\n<pre><code class=\"language-bash\">$ kubectl get po -n kserve\n\nNAME                                        READY   STATUS    RESTARTS   AGE\nkserve-controller-manager-59d84566d-grswq   2\/2     Running   0          103s<\/code><\/pre>\n<p>Now that the setup is done, let&#8217;s move on to the model deployment.<\/p>\n<h2 id=\"kserve-sample-project-repository\">KServe Sample Project Repository<\/h2>\n<p>All the files and the model we are going to use in this guide are from our GitHub repository.<\/p>\n<p>Run the following command to clone the repository.<\/p>\n<pre><code class=\"language-bash\">git clone https:\/\/github.com\/devopscube\/predictor-model.git<\/code><\/pre>\n<p>You can see the following directory structure.<\/p>\n<pre><code class=\"language-bash\">predictor-model\n    \u251c\u2500\u2500 Dockerfile\n    \u251c\u2500\u2500 README.md\n    \u251c\u2500\u2500 inference.yaml\n    \u251c\u2500\u2500 job.yaml\n    \u2514\u2500\u2500 model\n        \u2514\u2500\u2500 model.pkl<\/code><\/pre>\n<ol>\n<li><strong>Dockerfile<\/strong>&nbsp;&#8211; For dockerizing the model.<\/li>\n<li><strong>inference.yaml<\/strong> &#8211; Manifest file to create a <a href=\"https:\/\/devopscube.com\/kubernetes-resoruces\/\" rel=\"noreferrer\">Kubernetes resource<\/a> that hosts the model on Kubernetes using KServe.<\/li>\n<li><strong>job.yaml<\/strong> &#8211; Manifest that creates a PVC and a Job that copies the model into the PVC.<\/li>\n<\/ol>\n<p>Change into the <code>predictor-model<\/code> directory and follow the steps below.<\/p>\n<pre><code class=\"language-bash\">cd predictor-model<\/code><\/pre>\n<h2 id=\"deploy-a-sample-ml-model-with-kserve\">Deploy a Sample ML Model with KServe<\/h2>\n<p>Follow the steps given below to deploy the model.<\/p>\n<h3 id=\"step-1-dockerize-the-model-optional\">Step 1: Dockerize the Model (Optional)<\/h3>\n<div class=\"kg-card 
kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\u26a0\ufe0f<\/div>\n<div class=\"kg-callout-text\">Use this section if you are going to create your own container image. <\/p>\n<p>If you don&#8217;t want to build your own image, use our <code spellcheck=\"false\" style=\"white-space: pre-wrap;\">devopscube\/predictor-model:1.0<\/code> image to follow the tutorial.<\/div>\n<\/div>\n<p>The Dockerfile used to dockerize the model is given below.<\/p>\n<pre><code class=\"language-Dockerfile\">FROM alpine:latest\nWORKDIR \/app\nCOPY model\/ .\/model\/<\/code><\/pre>\n<p>Here is the <a href=\"https:\/\/devopscube.com\/create-dockerfile-using-docker-init\/\" rel=\"noreferrer\">Dockerfile<\/a> explanation.<\/p>\n<ol>\n<li>The Dockerfile uses <code>alpine:latest<\/code> as the base image.<\/li>\n<li>It sets <code>\/app<\/code> as the working directory and copies the <code>model<\/code> directory, including the model file, to <code>\/app\/model<\/code>.<\/li>\n<\/ol>\n<p>Now, run the following command to dockerize the model.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\u26a0\ufe0f<\/div>\n<div class=\"kg-callout-text\">Update your Docker registry name in the command below or use our image.<\/div>\n<\/div>\n<pre><code class=\"language-bash\">docker build -t devopscube\/predictor-model:1.0 .\n<\/code><\/pre>\n<p>Once it&#8217;s built, run the following command to push the image to the registry.<\/p>\n<pre><code class=\"language-bash\">docker push devopscube\/predictor-model:1.0<\/code><\/pre>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">In real-world production setups, models are usually stored in <b><strong style=\"white-space: pre-wrap;\">cloud storage<\/strong><\/b>. 
This is because ML models can be very large (hundreds of MBs or even GBs), and storing them in object storage allows for better scalability and retrieval.<\/p>\n<p>Another Kubernetes-native feature called <a href=\"https:\/\/devopscube.com\/oci-image-volume-kubernetes-pods\/\" rel=\"noreferrer\">imageVolumes<\/a> lets you package models as container images and mount them as volumes in Pods.<\/div>\n<\/div>\n<h3 id=\"step-2-store-the-model-in-pvc\">Step 2: Store the Model in PVC<\/h3>\n<p>To copy the model into a PV, we are going to create a PVC and a <a href=\"https:\/\/devopscube.com\/create-kubernetes-jobs-cron-jobs\/\" rel=\"noreferrer\">Job<\/a> that copies the model to the PVC.<\/p>\n<p>Here is the <strong><code>job.yaml<\/code><\/strong> with the PVC manifest.<\/p>\n<pre><code class=\"language-yaml\">apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: predictor-model-pvc\nspec:\n  accessModes:\n  - ReadWriteOnce\n  resources:\n    requests:\n      storage: 5Gi\n---\napiVersion: batch\/v1\nkind: Job\nmetadata:\n  name: predictor-model-copy-job\nspec:\n  ttlSecondsAfterFinished: 10\n  backoffLimit: 1\n  template:\n    spec:\n      restartPolicy: OnFailure\n      containers:\n      - name: model-writer\n        image: devopscube\/predictor-model:1.0\n        command: [ \"\/bin\/sh\", \"-c\" ]\n        args:\n        - |\n          echo \"&gt;&gt;&gt; Copying model to PVC...\";\n          cp -r \/app\/model\/* \/mnt\/models\/;\n          echo \"&gt;&gt;&gt; Verifying contents in PVC...\";\n          ls -lh \/mnt\/models;\n          echo \"&gt;&gt;&gt; Verification complete. 
Job finished.\";\n        volumeMounts:\n        - name: model-storage\n          mountPath: \/mnt\/models\n      volumes:\n      - name: model-storage\n        persistentVolumeClaim:\n          claimName: predictor-model-pvc\n<\/code><\/pre>\n<p>This file creates a PVC and a Job that copies the model into it. Because of <code>ttlSecondsAfterFinished: 10<\/code>, the Job and its pod are deleted 10 seconds after the Job completes.<\/p>\n<p>Run the following command to apply the manifest.<\/p>\n<pre><code class=\"language-bash\">kubectl apply -f job.yaml<\/code><\/pre>\n<p>If you check the logs of the pod before it is cleaned up, you will get the following output.<\/p>\n<pre><code class=\"language-bash\">$ kubectl logs job\/predictor-model-copy-job\n\n&gt;&gt;&gt; Copying model to PVC...\n&gt;&gt;&gt; Verifying contents in PVC...\ntotal 4K     \n-rw-r--r--    1 root     root        1.7K Sep 16 06:57 model.pkl\n&gt;&gt;&gt; Verification complete. Job finished.<\/code><\/pre>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\u26a0\ufe0f<\/div>\n<div class=\"kg-callout-text\">If the Job pod is stuck in the Pending state, check the PVC and storage class, and ensure the PVC is created and in the Bound state.<\/div>\n<\/div>\n<h3 id=\"step-3-deploy-inferenceservice-resource\">Step 3: Deploy InferenceService Resource<\/h3>\n<p>To deploy the model, we are going to apply the <strong><code>inference.yaml<\/code><\/strong> file to create KServe&#8217;s InferenceService resource on the cluster.<\/p>\n<pre><code class=\"language-yaml\">apiVersion: serving.kserve.io\/v1beta1\nkind: InferenceService\nmetadata:\n  name: model\nspec:\n  predictor:\n    sklearn:\n      storageUri: pvc:\/\/predictor-model-pvc\n      resources:\n        requests:\n          cpu: 500m\n          memory: 1Gi\n<\/code><\/pre>\n<p>In the above manifest,<\/p>\n<ul>\n<li>The <code>spec.predictor<\/code> section defines how the model will be served.<\/li>\n<li>Since the model we are using is based on scikit-learn, we are using the <strong>sklearn 
<\/strong>framework.<\/li>\n<li><code>storageUri: pvc:\/\/predictor-model-pvc<\/code> means the model files are stored on a Persistent Volume Claim (PVC) named &#8220;predictor-model-pvc&#8221;.<\/li>\n<\/ul>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Since we are using the <code spellcheck=\"false\" style=\"white-space: pre-wrap;\">sklearn<\/code> predictor block, KServe knows that we want to use the <b><strong style=\"white-space: pre-wrap;\">built-in SKLearn model server image<\/strong><\/b>.<\/p>\n<p>KServe will pull the default <b><code spellcheck=\"false\" style=\"white-space: pre-wrap;\"><strong>kserve\/sklearnserver:&lt;version&gt;<\/strong><\/code><\/b> image and run the model in it.<\/div>\n<\/div>\n<p>Run the following command to apply the inference service manifest.<\/p>\n<pre><code class=\"language-bash\">kubectl apply -f inference.yaml<\/code><\/pre>\n<p>Then run the following command to check whether the related objects have been created for the InferenceService.<\/p>\n<pre><code class=\"language-bash\">kubectl get po,svc,hpa,inferenceservice<\/code><\/pre>\n<p>You will get the following output.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/09\/image-12.png\" class=\"kg-image\" alt=\"Verifying that the objects are created\" loading=\"lazy\" width=\"1272\" height=\"694\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/09\/image-12.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/09\/image-12.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/09\/image-12.png 1272w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<div 
class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\u26a0\ufe0f<\/div>\n<div class=\"kg-callout-text\">If the model-predictor pod is stuck in the Pending state, check the node CPU and memory resources. Ensure at least 1 GB of memory is available on the nodes.<\/div>\n<\/div>\n<p>The internal endpoint will be:<\/p>\n<pre><code class=\"language-bash\">http:\/\/model-predictor.default.svc.cluster.local\/v1\/models\/model:predict<\/code><\/pre>\n<p>In the above endpoint:<\/p>\n<ul>\n<li><strong>model-predictor.default.svc.cluster.local<\/strong> &#8211; Internal DNS name of the Service attached to the <a href=\"https:\/\/devopscube.com\/kubernetes-pod\/\" rel=\"noreferrer\">pod<\/a>.<\/li>\n<li><strong>v1\/models\/<\/strong> &#8211; API version and models path.<\/li>\n<li><strong>model<\/strong> &#8211; Name of the inference service.<\/li>\n<li><strong>predict<\/strong> &#8211; The standard endpoint for prediction requests.<\/li>\n<\/ul>\n<h2 id=\"test-the-kserve-inference-endpoint\">Test the KServe Inference Endpoint<\/h2>\n<p>To test the model, we are going to port-forward the predictor Service and send a request to it using curl.<\/p>\n<p>Run the following command to port-forward the service.<\/p>\n<pre><code class=\"language-bash\">kubectl port-forward service\/model-predictor 8000:80<\/code><\/pre>\n<p>Then, run the following command to send the request for prediction.<\/p>\n<pre><code class=\"language-bash\">curl -X POST \\\n     -H \"Content-Type: application\/json\" \\\n     -d '{\n           \"instances\": [\n             \"sparrow\",\n             \"elephant\",\n             \"sunflower\"\n           ]\n         }' \\\n     \"http:\/\/localhost:8000\/v1\/models\/model:predict\"\n<\/code><\/pre>\n<p>You will get the following output.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/09\/image-10.png\" class=\"kg-image\" 
alt=\"Output of the curl prediction request\" loading=\"lazy\" width=\"2000\" height=\"1343\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/09\/image-10.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/09\/image-10.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2025\/09\/image-10.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/09\/image-10.png 2032w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>In the prediction output, <code>0<\/code> is for animal, <code>1<\/code> is for bird, and <code>2<\/code> is for plant.<\/p>\n<p>The predictor output is correct for the given input.<\/p>\n<p>That&#8217;s it!<\/p>\n<p>You have deployed a model using KServe and made an inference request!<\/p>\n<h2 id=\"kserve-faqs\">KServe FAQs<\/h2>\n<p>The following are some of the frequently asked questions about KServe.<\/p>\n<h3 id=\"what-is-the-difference-between-kserve-rawdeployment-and-knative-mode\">What is the difference between KServe RawDeployment and Knative mode?<\/h3>\n<p>RawDeployment is a lightweight and easier deployment option for beginners, while Knative mode supports advanced use cases with its autoscaling and networking functionality.<\/p>\n<h3 id=\"how-can-i-test-a-deployed-kserve-model\">How can I test a deployed KServe model?<\/h3>\n<p>The easiest way is to port-forward the inference service and send test requests to the inference endpoint with <code>curl<\/code> or a program.<\/p>\n<h3 id=\"can-i-deploy-models-other-than-scikit-learn-with-kserve\">Can I deploy models other than scikit-learn with KServe?<\/h3>\n<p>Yes. 
KServe supports TensorFlow, PyTorch, XGBoost, ONNX, and more.<\/p>\n<h3 id=\"what-should-i-do-if-my-kserve-pod-is-stuck-in-pending-state\">What should I do if my KServe pod is stuck in Pending state?<\/h3>\n<p>Check PVC storage, node CPU and memory, and controller logs. Ensure at least 1 GB of memory is available while deploying simple models.<\/p>\n<h3 id=\"is-kserve-used-by-companies-in-production\">Is KServe used by companies in production?<\/h3>\n<p>Yes. Companies like Bloomberg, IBM, Red Hat, Gojek, and Cisco use KServe in their production environments.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>You have just taken your first step into MLOps, the practice of deploying and maintaining machine learning systems.<\/p>\n<p>In summary, you have learned how to serve ML models using KServe and how to check if the model is served successfully by sending a curl request to it.<\/p>\n<p>Give it a try and if you have any questions or face any issues, drop a comment below. We will help you out.<\/p>\n<p>Also, if you want to know more about the AI\/ML features of Kubernetes, refer to the <a href=\"https:\/\/devopscube.com\/kubernetes-ai-ml-features\/\" rel=\"noreferrer\">Kubernetes AI\/ML features<\/a> blog.<\/p>\n<hr>\n<p><strong>Source:<\/strong> <a href=\"https:\/\/devopscube.com\/deploy-ml-model-kubernetes-kserve\/\" target=\"_blank\" rel=\"noopener noreferrer\">Deploy ML Model on Kubernetes with KServe (Step-by-Step Guide) - DevOpsCube<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: 
https:\/\/devopscube.com\/deploy-ml-model-kubernetes-kserve\/<\/p>\n","protected":false},"author":1,"featured_media":342,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-341","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/341","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=341"}],"version-history":[{"count":0,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/341\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/media\/342"}],"wp:attachment":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=341"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=341"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}