{"id":299,"date":"2026-03-04T04:35:47","date_gmt":"2026-03-04T04:35:47","guid":{"rendered":"https:\/\/blog.ngocha.biz\/?p=299"},"modified":"2026-03-04T04:35:47","modified_gmt":"2026-03-04T04:35:47","slug":"setup-gpu-operator-kubernetes","status":"publish","type":"post","link":"https:\/\/blog.ngocha.biz\/?p=299","title":{"rendered":"Setup GPU Operator on Kubernetes (Detailed Guide)"},"content":{"rendered":"<p>Looking for a guide to set up the GPU Operator on Kubernetes, or to understand how to use GPUs with Kubernetes? This blog is for you.<\/p>\n<p>By the end of this blog, you will have a clear understanding of:<\/p>\n<ul>\n<li>Why Kubernetes needs a GPU operator<\/li>\n<li>How to set up the NVIDIA GPU Operator on a <a href=\"https:\/\/devopscube.com\/production-ready-kubernetes-cluster\/\" rel=\"noreferrer\">Kubernetes cluster<\/a><\/li>\n<li>How to verify that Kubernetes detects GPUs<\/li>\n<li>How to deploy a real GPU-based workload to validate the full stack<\/li>\n<\/ul>\n<p>Let&#8217;s get started.<\/p>\n<h2 id=\"why-kubernetes-cant-see-your-gpu\">Why Kubernetes Can&#8217;t See Your GPU<\/h2>\n<p>By default, <a href=\"https:\/\/devopscube.com\/kubernetes-tutorials-beginners\/\" rel=\"noreferrer\">Kubernetes<\/a> only knows about CPU and memory resources.<\/p>\n<p>If you provision GPU nodes for <a href=\"https:\/\/devopscube.com\/kubernetes-architecture-explained\/\" rel=\"noreferrer\">Kubernetes<\/a>, it has no way of knowing whether a GPU is attached to the node. 
The reason is that GPUs are vendor-specific hardware (NVIDIA, AMD, Intel, etc.).<\/p>\n<p><a href=\"https:\/\/devopscube.com\/kubernetes-ai-ml-features\/#kubernetes-device-plugins-stable\" rel=\"noreferrer\">Kubernetes Device Plugins<\/a> solve this by telling Kubernetes about the GPUs on a node. The device plugin framework lets hardware vendors (like NVIDIA or AMD) register their devices with the kubelet.<\/p>\n<p>Installing the device plugin and drivers on GPU nodes manually is a complex task. This is where Kubernetes GPU operators come in. A GPU operator automates the installation of device plugins, drivers, <a href=\"https:\/\/devopscube.com\/node-feature-discovery\/\" rel=\"noreferrer\">Node Feature Discovery<\/a>, container runtimes, etc.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-4.png\" class=\"kg-image\" alt=\"\" loading=\"lazy\" width=\"2000\" height=\"1294\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/image-4.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/03\/image-4.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/03\/image-4.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w2400\/2026\/03\/image-4.png 2400w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Here is an important point.<\/p>\n<p>There is no standard operator for GPUs. You have to choose the <strong>operator based on the hardware<\/strong> you are using. 
For example, if you are using NVIDIA GPUs, you have to install the <strong>NVIDIA GPU Operator<\/strong> (the one we use in this guide).<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Note<\/strong><\/b>: If you are using GPU nodes on managed Kubernetes services like <a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/ml-eks-optimized-ami.html?ref=devopscube.com#eks-amis-nvidia-al2023\" rel=\"noreferrer\">EKS<\/a>, <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/aks\/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool&#038;ref=devopscube.com#:~:text=By%20default%2C%20Microsoft%20automatically%20maintains%20the%20version%20of%20the%20NVIDIA%20drivers%20as%20part%20of%20the%20node%20image%20deployment%2C%20and%20AKS%20supports%20and%20manages%20it.%20While%20the%20NVIDIA%20drivers%20are%20installed%20by%20default%20on%20GPU%20capable%20nodes%2C%20you%20need%20to%20install%20the%20device%20plugin.\" rel=\"noreferrer\">AKS<\/a>, or <a href=\"https:\/\/docs.digitalocean.com\/products\/kubernetes\/?ref=devopscube.com\" rel=\"noreferrer\">DOKS<\/a>, the nodes may come with pre-installed GPU drivers.<\/p>\n<p>In that case, the NVIDIA GPU Operator will skip the driver installation.<\/p><\/div>\n<\/div>\n<h2 id=\"how-gpu-operator-works\">How GPU Operator Works<\/h2>\n<p>Before we get hands-on, let&#8217;s understand how a GPU Operator works.<\/p>\n<ul>\n<li>The kubelet collects the hardware details of the node (via cAdvisor) and reports them to the API server, where the scheduler reads them. 
At this stage, the kubelet has no awareness of the GPU.<\/li>\n<li>Once we install the GPU operator, it deploys a device plugin that discovers the GPUs and registers&nbsp;<code>nvidia.com\/gpu<\/code>&nbsp;as an extended resource with the kubelet.<\/li>\n<li>Once the GPUs are registered, you can deploy GPU-based applications that request GPU resources (for example, a Llama model).<\/li>\n<\/ul>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">&nbsp;The kubelet does NOT query GPU hardware directly. It relies entirely on the device plugin to discover and register GPU resources.&nbsp;<\/div>\n<\/div>\n<p>The following diagram illustrates how GPU nodes get exposed to Kubernetes.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/gpu-operator10.png\" class=\"kg-image\" alt=\"high level overview of how GPU nodes will be exposed to Kubernetes\" loading=\"lazy\" width=\"2000\" height=\"1189\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/gpu-operator10.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/03\/gpu-operator10.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/03\/gpu-operator10.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w2400\/2026\/03\/gpu-operator10.png 2400w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>The diagram shows only the most important operator components, not all of them.<\/p>\n<p>These are the components deployed by the NVIDIA GPU Operator.<\/p>\n<ul>\n<li><code>gpu-operator<\/code> &#8211; 
The main controller of the operator, which installs and manages all the other components.<\/li>\n<li><code>nvidia-driver-daemonset<\/code> &#8211; Installs the NVIDIA GPU drivers on each GPU node so the node can use the GPU.<\/li>\n<li><code>gpu-operator-node-feature-discovery.*<\/code> &#8211; Detects hardware details and adds them as node labels.<\/li>\n<li><code>nvidia-container-toolkit-daemonset<\/code> &#8211; Runs on each GPU node and installs the NVIDIA Container Toolkit so containers can use GPUs.<\/li>\n<li><code>nvidia-cuda-validator<\/code> &#8211; Runs tests on the node to verify that the drivers, plugins, and runtimes are installed and working.<\/li>\n<li><code>nvidia-dcgm-exporter<\/code> &#8211; Exports GPU metrics that can be scraped by Prometheus.<\/li>\n<li><code>nvidia-device-plugin-daemonset<\/code> &#8211; The component that registers the GPU details with Kubernetes.<\/li>\n<li><code>nvidia-operator-validator<\/code> &#8211; Checks the health of the GPU operator components.<\/li>\n<\/ul>\n<p>Let&#8217;s start the NVIDIA GPU Operator setup.<\/p>\n<h2 id=\"setup-prerequisites\">Setup Prerequisites<\/h2>\n<p>Below are the prerequisites for this guide.<\/p>\n<ul>\n<li><a href=\"https:\/\/devopscube.com\/kubernetes-kind-cluster-tutorial-setup-and-deploy-apps\/\" rel=\"noreferrer\">Kubernetes cluster<\/a> with GPU-enabled nodes<\/li>\n<li><a href=\"https:\/\/devopscube.com\/kubectl-set-context\/\" rel=\"noreferrer\">kubectl<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/helm-best-practices-essential-tips-to-know\/\" rel=\"noreferrer\">Helm<\/a><\/li>\n<\/ul>\n<h2 id=\"set-up-the-nvidia-gpu-operator\">Set Up the NVIDIA GPU Operator<\/h2>\n<p>Follow the steps given below to set up the operator.<\/p>\n<h3 id=\"step-1-label-and-taint-gpu-nodes\">Step 1: Label and Taint GPU Nodes<\/h3>\n<p>In mixed clusters (GPU + non-GPU nodes), you must prevent non-GPU workloads from consuming GPU node resources. 
We can use taints and labels to enforce this segregation.<\/p>\n<p>Use the following commands to taint and label the node.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Replace the placeholder with your GPU node&#8217;s name before running the commands below.<\/div>\n<\/div>\n<pre><code class=\"language-bash\">kubectl taint node &lt;your-gpu-node-name&gt; nvidia.com\/gpu=present:NoSchedule\n\nkubectl label node &lt;your-gpu-node-name&gt; node-type=gpu<\/code><\/pre>\n<p>This is one of the best practices for GPU nodes.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">On managed GPU nodes (EKS, AKS, etc.), the taint <code spellcheck=\"false\" style=\"white-space: pre-wrap;\">nvidia.com\/gpu=present:NoSchedule<\/code> may already be applied by default.<\/div>\n<\/div>\n<h3 id=\"step-2-add-nvidia-gpu-operator-helm-repo\">Step 2: Add NVIDIA GPU Operator Helm Repo<\/h3>\n<p>We are going to use Helm to install the NVIDIA GPU Operator, so let&#8217;s add the Helm repo for the chart.<\/p>\n<p>Run the following command to add the repo and update the Helm repos.<\/p>\n<pre><code class=\"language-bash\">helm repo add nvidia https:\/\/helm.ngc.nvidia.com\/nvidia \\\n    &amp;&amp; helm repo update<\/code><\/pre>\n<p>Then run the following command to verify that the NVIDIA repo has been added.<\/p>\n<pre><code class=\"language-bash\">helm search repo nvidia<\/code><\/pre>\n<p>You will see the following charts.<\/p>\n<pre><code class=\"language-bash\">NAME                               \tCHART VERSION\tAPP VERSION\tDESCRIPTION\nnvidia\/nvidia-device-plugin        \t0.9.0        \t0.9.0      \t\nnvidia\/nvidia-dra-driver-gpu       \t25.12.0      \t25.12.0    \t\nnvidia\/cybersecurity-dfp           \t0.2.1        \t23.07      \t\nnvidia\/cybersecurity-sp            \t0.1.0        \t23.07      \t\nnvidia\/deepstream-its           
   \t0.2.0        \t1.0        \t\nnvidia\/dps                         \t0.7.8        \t0.7.8      \t\nnvidia\/dps-bmc-simulator           \t0.7.8        \t0.7.8      \t\nnvidia\/ds-face-mask-detection      \t1.0.0        \t1.2        \t\nnvidia\/ds-lipactivity              \t0.0.1        \t0.0.1      \t\nnvidia\/fed-svr-3                   \t0.9.0        \t1.0        \t\nnvidia\/fed-wrk-3                   \t0.9.0        \t1.0        \t\nnvidia\/gpu-operator                \tv25.10.1     \tv25.10.1   \t\nnvidia\/harbor-reef-operator        \t1.0.1        \t1.0.0      \t\nnvidia\/isaac-lab-teleop            \t2.2.0        \t0.0.0      \t\nnvidia\/k8s-nim-operator            \t3.0.2        \t3.0.2      \t\nnvidia\/network-operator            \t25.10.0      \tv25.10.0   \t\nnvidia\/nspect_test_policy_org_chart\t1            \t1.16.0     \t\nnvidia\/nvsm                        \t1.0.1        \t1.0.1      \t\nnvidia\/tensorrt-inference-server   \t1.0.0        \t1.0        \t\nnvidia\/tensorrtinferenceserver     \t1.0.0        \t1.0        \t\nnvidia\/tritoninferenceserver_aws   \t0.1.0        \t1.16.0     \t\nnvidia\/video-analytics-demo        \t0.1.9        \t1.2        \t\nnvidia\/video-analytics-demo-l4t    \t0.1.3        \t0.1.3      \t<\/code><\/pre>\n<h3 id=\"step-3-install-nvidia-gpu-operator\">Step 3: Install Nvidia GPU Operator<\/h3>\n<p>Let&#8217;s install the Nvidia GPU Operator.<\/p>\n<p>The GPU operator handles everything like installing and managing drivers, plugins, runtimes, etc.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">The Nvidia GPU Operator Helm chart has toleration for the taint we added in the previous step by default. 
So there is no need for any additional configuration.<\/div>\n<\/div>\n<p>Now, use the following Helm command to install the operator.<\/p>\n<pre><code class=\"language-bash\">helm install --wait gpu-operator \\\n  -n gpu-operator --create-namespace \\\n  nvidia\/gpu-operator<\/code><\/pre>\n<p>Then, run the following command to check if the pods are up and running.<\/p>\n<pre><code class=\"language-bash\">kubectl get po -n gpu-operator<\/code><\/pre>\n<p>You will get the following output.<\/p>\n<pre><code class=\"language-bash\">gpu-feature-discovery-kbcth                                 1\/1     Running     0          43s\ngpu-operator-7569f8b499-dgzql                               1\/1     Running     0          57s\ngpu-operator-node-feature-discovery-gc-55ffc49ccc-6bvmc     1\/1     Running     0          57s\ngpu-operator-node-feature-discovery-master-6b5787f695-lg584 1\/1     Running     0          57s\ngpu-operator-node-feature-discovery-worker-f87lc            1\/1     Running     0          57s\ngpu-operator-node-feature-discovery-worker-nggs5            1\/1     Running     0          57s\nnvidia-driver-daemonset-dgzql                               1\/1     Running     0          57s\nnvidia-container-toolkit-daemonset-92vlv                    1\/1     Running     0          44s\nnvidia-cuda-validator-6pwtz                                 0\/1     Completed   0          37s\nnvidia-dcgm-exporter-l92vx                                  1\/1     Running     0          43s\nnvidia-device-plugin-daemonset-5t52d                        1\/1     Running     0          44s\nnvidia-operator-validator-4wcc6                             1\/1     Running     0          44s<\/code><\/pre>\n<h3 id=\"step-4-verify-gpu-detection\">Step 4: Verify GPU Detection <\/h3>\n<p>Now, let&#8217;s verify if the GPU nodes are detected by Kubernetes.<\/p>\n<p>Run the following command to check for the GPU capacity.<\/p>\n<div class=\"kg-card kg-callout-card 
kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Replace the placeholder with your GPU node&#8217;s name before running the command below.<\/div>\n<\/div>\n<pre><code class=\"language-bash\">kubectl describe node &lt;gpu-node-name&gt; | grep -A6 \"Capacity\"<\/code><\/pre>\n<p>You will get the following output.<\/p>\n<pre><code class=\"language-bash\">Capacity:\n  cpu:                24\n  ephemeral-storage:  742911020Ki\n  hugepages-1Gi:      0\n  hugepages-2Mi:      0\n  memory:             247413472Ki\n  nvidia.com\/gpu:     1<\/code><\/pre>\n<p>The output shows <code>nvidia.com\/gpu: 1<\/code>, which means the node exposes 1 GPU to Kubernetes. The NVIDIA GPU Operator setup is now complete.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Debugging tip:<\/strong><\/b>&nbsp;If&nbsp;<code spellcheck=\"false\" style=\"white-space: pre-wrap;\">nvidia.com\/gpu<\/code>&nbsp;is missing, the device plugin is likely not running correctly. Check its logs with:&nbsp;<code spellcheck=\"false\" style=\"white-space: pre-wrap;\">kubectl logs -n gpu-operator -l app=nvidia-device-plugin-daemonset<\/code><\/div>\n<\/div>\n<p>Next, we will deploy a GPU-based workload and schedule it on the GPU node.<\/p>\n<h2 id=\"deploy-a-gpu-workload-ollama-llama\">Deploy a GPU Workload: Ollama + Llama<\/h2>\n<p>Let&#8217;s validate the full stack by deploying a real GPU workload. 
<\/p>\n<p>We will use&nbsp;<a href=\"https:\/\/ollama.com\/?ref=devopscube.com\" rel=\"noreferrer\"><strong>Ollama<\/strong><\/a><strong>, <\/strong>a lightweight runtime for open-source LLMs, to run Llama 3.2 (3B parameters) on our GPU node.<\/p>\n<p>Let&#8217;s start the Ollama deployment.<\/p>\n<h3 id=\"step-1-add-ollama-helm-repo\">Step 1: Add Ollama Helm Repo<\/h3>\n<p>To install Ollama, we will use a community Helm chart.<\/p>\n<p>Run the following command to add the repo and update the Helm repos.<\/p>\n<pre><code class=\"language-bash\">helm repo add otwld https:\/\/helm.otwld.com\/ \\\n    &amp;&amp; helm repo update<\/code><\/pre>\n<p>Then run the following command to verify that the Ollama repo has been added.<\/p>\n<pre><code class=\"language-bash\">helm search repo ollama<\/code><\/pre>\n<p>You will get the following output.<\/p>\n<pre><code class=\"language-bash\">NAME        \tCHART VERSION\tAPP VERSION\notwld\/ollama\t1.43.0       \t0.16.1<\/code><\/pre>\n<h3 id=\"step-2-create-custom-helm-chart\">Step 2: Create a Custom Values File<\/h3>\n<p>To install Ollama, we need to override some settings in the chart&#8217;s values file: enabling the GPU, adding a toleration, selecting the GPU node, etc.<\/p>\n<p>Below is the custom values file we used to <a href=\"https:\/\/devopscube.com\/deploying-llama-with-docker-and-vllm\/\">deploy Ollama<\/a>.<\/p>\n<p>Create an <code>ollama.yaml<\/code> file and copy the following content.<\/p>\n<pre><code class=\"language-yaml\">ollama:\n  gpu:\n    enabled: true\n    type: nvidia\n    number: 1\n\n  # Model to pull for this deployment\n  models:\n    pull:\n      - llama3.2:3b\n\n# Matches the node-type=gpu label added to the GPU node\nnodeSelector:\n  node-type: gpu\n\n# Tolerates the nvidia.com\/gpu taint added to the GPU node\ntolerations:\n  - key: \"nvidia.com\/gpu\"\n    operator: \"Exists\"\n    effect: \"NoSchedule\"\n\npersistentVolume:\n  enabled: true\n  size: 30Gi\n\nresources:\n  limits:\n    nvidia.com\/gpu: 1<\/code><\/pre>\n<p>If you want to explore the other options available in the chart, use the following command to save the default values file.<\/p>\n<pre><code class=\"language-bash\">helm show values otwld\/ollama &gt; 
values.yaml<\/code><\/pre>\n<h3 id=\"step-3-deploy-ollama\">Step 3: Deploy Ollama<\/h3>\n<p>Let&#8217;s install Ollama using the custom values file.<\/p>\n<p>Use the following command to install it.<\/p>\n<pre><code class=\"language-bash\">helm install ollama otwld\/ollama \\\n  --namespace ollama \\\n  --create-namespace \\\n  -f ollama.yaml<\/code><\/pre>\n<p>Once the install completes, run the following command to verify that the pod is up and running.<\/p>\n<pre><code class=\"language-bash\">kubectl get po -n ollama<\/code><\/pre>\n<p>You will get the following output.<\/p>\n<pre><code class=\"language-bash\">NAME                     READY   STATUS    RESTARTS   AGE\nollama-cc9b55478-v9ddq   1\/1     Running   0          7m50s<\/code><\/pre>\n<h3 id=\"step-4-verify-gpu-usage\">Step 4: Verify GPU Usage<\/h3>\n<p>To verify that the pod can use the GPU, run the following command.<\/p>\n<pre><code class=\"language-bash\">kubectl exec -n ollama deployment\/ollama -- nvidia-smi<\/code><\/pre>\n<p>The output lists the GPU&#8217;s name, as shown below. 
<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-1.png\" class=\"kg-image\" alt=\"output to show that the pod is using GPU\" loading=\"lazy\" width=\"1288\" height=\"714\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/image-1.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/03\/image-1.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-1.png 1288w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<h3 id=\"step-5-query-the-model\">Step 5: Query the Model<\/h3>\n<p>Now that the Ollama pod is up and running, let&#8217;s send a query to the Llama model running on Ollama.<\/p>\n<p>Since the Ollama service is of type ClusterIP, we will reach it using port forwarding.<\/p>\n<p>Use the following command to port forward the service.<\/p>\n<pre><code class=\"language-bash\">kubectl port-forward svc\/ollama 11434:11434 -n ollama\n<\/code><\/pre>\n<p>With the port forward running, open another terminal and run the following command to send the query.<\/p>\n<pre><code class=\"language-bash\">curl -s http:\/\/localhost:11434\/api\/generate \\\n  -d '{\n    \"model\": \"llama3.2:3b\",\n    \"prompt\": \"What do you know about Kubernetes?\",\n    \"stream\": false\n  }' | jq -r '.response'<\/code><\/pre>\n<p>You will get an output similar to what is shown below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image.png\" class=\"kg-image\" alt=\"query output given by the LLM\" loading=\"lazy\" width=\"733\" height=\"418\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/image.png 
600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image.png 733w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>You now have a working picture of how GPUs are exposed to and scheduled by Kubernetes.<\/p>\n<p>You have set up the NVIDIA GPU Operator on Kubernetes and deployed a real LLM on a GPU node.<\/p>\n<p>When it comes to <a href=\"https:\/\/devopscube.com\/devops-to-mlops\/\" rel=\"noreferrer\">MLOps<\/a>, understanding GPU nodes on Kubernetes is very important, as many ML workloads need GPU-based clusters.<\/p>\n<p>Having said that, getting GPUs working is only step one. There are different strategies for managing GPUs in Kubernetes so that GPU resources are used efficiently, for example, <strong>GPU Time-Slicing and Multi-Instance GPU (MIG)<\/strong>.<\/p>\n<p>We will look at these in the upcoming hands-on blogs.<\/p>\n<p>If you are facing issues when setting up the GPU Operator, do let us know in the comments.<\/p>\n<hr>\n<p><strong>Source:<\/strong> <a href=\"https:\/\/devopscube.com\/setup-gpu-operator-kubernetes\/\" target=\"_blank\" rel=\"noopener noreferrer\">Setup GPU Operator on Kubernetes (Detailed Guide) \u2014 DevOpsCube<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: 
https:\/\/devopscube.com\/setup-gpu-operator-kubernetes\/<\/p>\n","protected":false},"author":1,"featured_media":300,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-299","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/299","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=299"}],"version-history":[{"count":0,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/299\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/media\/300"}],"wp:attachment":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=299"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=299"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}