{"id":244,"date":"2026-05-02T06:12:53","date_gmt":"2026-05-02T06:12:53","guid":{"rendered":"https:\/\/blog.ngocha.biz\/?p=244"},"modified":"2026-05-02T06:12:53","modified_gmt":"2026-05-02T06:12:53","slug":"kubeflow-docker-image-optmization","status":"publish","type":"post","link":"https:\/\/blog.ngocha.biz\/?p=244","title":{"rendered":"From 3.17 GB to 354 MB: How I Reduced My Kubeflow Docker Image by 89%"},"content":{"rendered":"<p>In this post, I share how I reduced a Kubeflow pipeline Docker image from <strong>3.17 GB down to 354 MB<\/strong> (an 89% reduction) and the reasoning behind every change I made.<\/p>\n<p>A little backstory.<\/p>\n<p>As part of our <a href=\"https:\/\/devopscube.com\/devops-to-mlops\/\" rel=\"noreferrer\">MLOps<\/a> workflow, I took over an existing Kubeflow pipeline developed by a different ML engineer.<\/p>\n<p>While working on the pipeline, I checked the image size and found it was 3.17 GB. After discussing it with the DevOps team, we agreed this was unacceptable for a pipeline that runs on every model retrain.<\/p>\n<p>So I had two goals:<\/p>\n<ul>\n<li><a href=\"https:\/\/devopscube.com\/reduce-docker-image-size\/\" rel=\"noreferrer\">Reduce the size of the Docker image<\/a> significantly<\/li>\n<li>Make sure the pipeline still runs without errors<\/li>\n<\/ul>\n<p>Here is what I learned along the way, including a few things I wish I had known before I started.<\/p>\n<h2 id=\"the-starting-point-a-bloated-317-gb-docker-image\">The Starting Point: A Bloated 3.17 GB Docker Image<\/h2>\n<p>To run my Kubeflow pipeline, I need to install the following packages and dependencies, which are listed in my <strong><code>requirements.txt<\/code><\/strong> file.<\/p>\n<p>This was the <strong><code>requirements.txt<\/code><\/strong> file before optimization.<\/p>\n<pre><code class=\"language-python\"># training\nscikit-learn\nnumpy\npandas\npandera\nmatplotlib\nseaborn\nboto3\npython-dotenv\n\n# (optional)\npyarrow\njoblib\n# 
Feast\/pandas can read s3:\/\/ URLs:\ns3fs\n\n# kubeflow\nkfp\nkfp-kubernetes\nkubernetes\nkubeflow\nkubeflow-training\n\n# mlflow\nmlflow\n\n# API \/ Serving \nflask\nflask-cors\n\n# feast-feature-store\nfeast\nfeast[redis]\nfeast[postgres]\n\ntorch<\/code><\/pre>\n<p>And this was the <strong><code>Dockerfile<\/code><\/strong> before optimization.<\/p>\n<pre><code class=\"language-YAML\">FROM python:3.11-slim\n\n# Install system dependencies\nRUN apt-get update &amp;&amp; apt-get install -y git &amp;&amp; apt-get clean\n\n# Set working directory\nWORKDIR \/app\n\n# Copy entire project\nCOPY . .\n\n# Install dependencies\nRUN pip install --no-cache-dir -r requirements.txt\n\n# Install your project as a package\nRUN pip install --no-cache-dir .<\/code><\/pre>\n<p>When I ran the Docker build command, it produced an image of <strong>3.17 GB<\/strong> and took nearly <strong>800 seconds (about 13 minutes)<\/strong> to build.<\/p>\n<pre><code class=\"language-bash\">$ docker images\n\nIMAGE                    ID              DISK USAGE    CONTENT SIZE\nkubeflow_pipeline:v1   09e88c766290        9.47GB        3.17GB<\/code><\/pre>\n<pre><code>[+] Building 797.2s (12\/12) FINISHED<\/code><\/pre>\n<h2 id=\"audit-packages-and-dependencies\">Audit Packages and Dependencies<\/h2>\n<p>Let&#8217;s look at the history of the image build so that we can <strong>understand which layer contributes most of the size<\/strong> and remove the packages we don&#8217;t need at runtime.<\/p>\n<p>For this, we run the following command.<\/p>\n<pre><code class=\"language-bash\">docker history kubeflow_pipeline:v1 --no-trunc --format \"table {{.Size}}\\t{{.CreatedBy}}\"<\/code><\/pre>\n<p>Here is the output.<\/p>\n<pre><code class=\"language-Bash\">$ docker history kubeflow_pipeline:v1 --no-trunc --format \"table {{.Size}}\\t{{.CreatedBy}}\"\n\nSIZE      CREATED BY\n319kB     RUN \/bin\/sh -c pip install --no-cache-dir . 
# buildkit\n5.98GB    RUN \/bin\/sh -c pip install --no-cache-dir -r requirements.txt # buildkit\n10.3MB    COPY . . # buildkit\n8.19kB    WORKDIR \/app\n138MB     RUN \/bin\/sh -c apt-get update &amp;&amp; apt-get install -y git &amp;&amp; apt-get clean # buildkit\n0B        CMD [\"python3\"]<\/code><\/pre>\n<p>As you can see, the layer that installs <code>requirements.txt<\/code> accounts for most of the image size.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Always remember: <\/strong><\/b> If your Docker image is huge, do not start with the <a href=\"https:\/\/devopscube.com\/create-dockerfile-using-docker-init\/\" rel=\"noreferrer\">Dockerfile<\/a>; start by optimizing your packages and dependencies.<\/div>\n<\/div>\n<p>Now, let&#8217;s go through the strategies for image optimization step by step:<\/p>\n<h2 id=\"two-strategies-for-image-optimization\">Three Strategies for Image Optimization<\/h2>\n<p>To reduce the size of my Docker image, I followed three key steps.<\/p>\n<ol>\n<li>Packages and dependencies optimization<\/li>\n<li>Multi-stage build (separate build artifacts from runtime)<\/li>\n<li><code>.dockerignore<\/code> file<\/li>\n<\/ol>\n<h3 id=\"step-1-packages-and-dependency-optimization\">Step 1: Package and Dependency Optimization<\/h3>\n<p>During development, the ML engineer added many dependencies to the <code>requirements.txt<\/code> file. 
After analyzing the project, I found that many packages are not required for the Kubeflow training pipeline.<\/p>\n<p>So, from the <code>requirements.txt<\/code> file, I removed many unwanted and duplicate packages and dependencies.<\/p>\n<p>Now you might wonder.<\/p>\n<p>How do I determine that a specific package is not needed in my project?<\/p>\n<p>There are two ways to find out.<\/p>\n<p>First, we can use the <code>grep<\/code> command to check whether a package listed in the requirements.txt file is actually imported anywhere in our project.<\/p>\n<pre><code class=\"language-Bash\">kubeflow-training-pipeline % grep -R \"import numpy\" .<\/code><\/pre>\n<p>You will get output like this:<\/p>\n<pre><code>.\/src\/model_development\/_09_evaluation.py:import numpy as np<\/code><\/pre>\n<p>This tells us that the <code>numpy<\/code> package is imported and used in that file.<\/p>\n<p>In this way, I identified and removed unused packages such as <code>torch<\/code> from the requirements.txt file.<\/p>\n<p>But we should not rely on <code>grep<\/code> alone, because when we run it for <code>kfp-kubernetes<\/code>, it shows no output. 
So don&#8217;t remove <code>kfp-kubernetes<\/code> from the requirements.txt file just because <code>grep<\/code> finds nothing.<\/p>\n<p>Some packages are not necessarily imported in <code>.py<\/code> files but are still needed for:<\/p>\n<ul>\n<li>pipeline compilation<\/li>\n<li>Kubernetes execution<\/li>\n<li>SDK behavior<\/li>\n<\/ul>\n<p>This is where the second method comes in.<\/p>\n<p><strong>Validation method:<\/strong> In this method, we remove a package from the <code>requirements.txt<\/code> file, rebuild the <a href=\"https:\/\/devopscube.com\/run-docker-containers-as-non-root-user\/\" rel=\"noreferrer\">container<\/a>, and run the pipeline.<\/p>\n<p>For example, I removed <code>kfp-kubernetes<\/code>, rebuilt the container, and ran the pipeline.<\/p>\n<p>I then watched for errors like:<\/p>\n<ul>\n<li>Component not found<\/li>\n<li>Kubernetes configuration issues<\/li>\n<li>Pipeline execution failed<\/li>\n<\/ul>\n<p>If any of these appear, I know that the <code>kfp-kubernetes<\/code> package is required for our Kubeflow pipeline. 
<\/p>\n<p>Now let&#8217;s see which packages and dependencies were removed for the optimization, and why.<\/p>\n<p>Let&#8217;s go through them one by one:<\/p>\n<ul>\n<li>I removed <code>torch<\/code> because Torch alone can take up roughly <strong>700 MB to 2 GB<\/strong>.<\/li>\n<li>If the pipeline component is only responsible for <strong>data preprocessing<\/strong>, <strong>model evaluation<\/strong>, or orchestration, a <a href=\"https:\/\/devopscube.com\/setup-gpu-operator-kubernetes\/\" rel=\"noreferrer\">GPU<\/a> training package like Torch is not needed inside the image.<\/li>\n<\/ul>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">You can check this <a href=\"https:\/\/download.pytorch.org\/whl\/cpu?ref=devopscube.com\">link<\/a> to see which GPU packages are installed by torch.<\/div>\n<\/div>\n<ul>\n<li><code>matplotlib<\/code> and <code>seaborn<\/code> are data visualization libraries, and visualizations belong in notebooks or a separate analysis step. Removing them saves space, since matplotlib pulls in several heavy dependencies.<\/li>\n<li><code>kubeflow<\/code> and <code>kubeflow-training<\/code> are also not needed, because <code>kfp<\/code> and <code>kfp-kubernetes<\/code> already give us everything we need to define, compile, and run Kubeflow pipelines. The <code>kubeflow<\/code> and <code>kubeflow-training<\/code> packages are high-level SDKs used for training operators like PyTorchJob or TFJob.<\/li>\n<li><code>flask<\/code> and <code>flask-cors<\/code> are web frameworks used for serving APIs. A Kubeflow training pipeline container is not an API server. It just runs a job and exits.<\/li>\n<li>Before optimization, <code>feast<\/code> was spread across three separate lines. 
<\/li>\n<\/ul>\n<pre><code>feast\nfeast[redis]\nfeast[postgres]<\/code><\/pre>\n<p>We combined them into one line, <code>feast[redis,postgres]<\/code>, because multiple lines for the same package are redundant and can cause conflicts.<\/p>\n<p>After optimization, my final requirements.txt file looks like this:<\/p>\n<pre><code>numpy\npandas\nscikit-learn\njoblib\npandera\npython-dotenv\ns3fs\nboto3\nbotocore\nkfp\nkfp-kubernetes\nkubernetes\nmlflow\nfeast[redis,postgres]\npyarrow\nfsspec<\/code><\/pre>\n<p>Then I checked my Docker image size:<\/p>\n<pre><code class=\"language-Bash\">$ docker images\n\nIMAGE                    ID              DISK USAGE    CONTENT SIZE\nkubeflow_pipeline:test   09e88c766290      1.8GB         410MB<\/code><\/pre>\n<p>The build time also dropped by nearly half, to about <strong>400 seconds (under 7 minutes)<\/strong>.<\/p>\n<pre><code>[+] Building 402.9s (12\/12) FINISHED<\/code><\/pre>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Key Takeaway: <\/strong><\/b> In an MLOps environment, package auditing is the first line of defense against bloated images in pipelines.<\/div>\n<\/div>\n<h3 id=\"step-2-implementing-multi-stage-build\">Step 2: Implementing Multi-Stage Build<\/h3>\n<p>That takes care of the unwanted dependencies. 
The next step is to create a <strong>multi-stage<\/strong>&nbsp;Dockerfile.<\/p>\n<p>It&#8217;s not an application, it&#8217;s just a base image for a pipeline, so why should I use a multi-stage build?<\/p>\n<p>The answer is that it is not mandatory, but it&#8217;s really useful.<\/p>\n<p>As you know, multi-stage means building in one stage, copying only what we need, and running on a clean stage.<\/p>\n<p>So I rewrote my Dockerfile as a multi-stage build:<\/p>\n<pre><code class=\"language-Docker file\"># Stage 1: Build\nFROM python:3.11-slim AS builder\n\nWORKDIR \/install\nCOPY requirements.txt .\n# Install dependencies under \/install instead of the system site-packages\nRUN pip install --prefix=\/install --no-cache-dir -r requirements.txt\n\n# Stage 2: Runtime (lighter)\nFROM python:3.11-slim\n\nWORKDIR \/app\n\n# Bring in only the installed packages from the builder stage\nCOPY --from=builder \/install \/usr\/local\n\n# Copy only the project files the pipeline needs\nCOPY setup.py .\nCOPY src\/ .\/src\/\nCOPY _feast\/ .\/_feast\/\nCOPY _kubeflow\/ .\/_kubeflow\/\nCOPY _mlflow\/ .\/_mlflow\/\n\n# Install the project itself as a package\nRUN pip install --no-cache-dir .\n<\/code><\/pre>\n<p>Stage 1 is a temporary build workspace.<\/p>\n<p>It installs the <a href=\"https:\/\/devopscube.com\/python-for-devops\/\" rel=\"noreferrer\">Python<\/a> dependencies from <code>requirements.txt<\/code> into a separate <code>\/install<\/code> folder. This stage is not part of the final image, so only what we explicitly copy out of it is kept.<\/p>\n<p>The second stage is the actual final image. It starts with a clean <a href=\"https:\/\/devopscube.com\/python-numpy-tutorial\/\" rel=\"noreferrer\">Python<\/a> environment and only copies the installed packages from Stage 1, which means no build tools, no cache, no junk.<\/p>\n<p>Then it copies your project folders into the <code>\/app<\/code> directory inside the container.<\/p>\n<p>Finally, it runs <code>pip install<\/code> on the project itself, which installs it as a package so that its modules can be imported from anywhere inside the container. 
<\/p>\n<pre><code class=\"language-bash\">$ docker images\n\nIMAGE                    ID              DISK USAGE    CONTENT SIZE\nkubeflow_pipeline:v2   09e88c766290        1.59GB        353MB<\/code><\/pre>\n<p>As you can see, after switching to a multi-stage build, the image size dropped from <strong>410 MB to 353 MB<\/strong>, a <strong>reduction of about 57 MB<\/strong>. That is not much, but still helpful.<\/p>\n<p>The build time, however, dropped to about <strong>200 seconds (roughly 3 minutes)<\/strong>.<\/p>\n<pre><code>[+] Building 197.4s (12\/12) FINISHED<\/code><\/pre>\n<h3 id=\"step-3-docker-ignore\">Step 3: The .dockerignore File<\/h3>\n<p>The <code>.dockerignore<\/code> file acts as a control center that decides which files and folders should not go into our Docker image.<\/p>\n<p>In our original Dockerfile, there is this line:<\/p>\n<pre><code class=\"language-Docker file\">COPY . .<\/code><\/pre>\n<p>This line does more than copy a few files into the image.<\/p>\n<p>To execute it, Docker has to send your entire project directory to the Docker daemon. This is called the build context.<\/p>\n<h3 id=\"what-if-there-is-no-dockerignore-file\">What if there is no .dockerignore file?<\/h3>\n<p>Without a <strong><code>.dockerignore<\/code><\/strong> file, Docker includes everything inside your project directory, such as:<\/p>\n<ul>\n<li><code>.git<\/code><\/li>\n<li>virtual environments (<code>venv\/<\/code>, <code>.venv\/<\/code>)<\/li>\n<li>environment files (<code>.env<\/code>)<\/li>\n<li>cache files (<code>__pycache__<\/code>)<\/li>\n<\/ul>\n<p>Even if we don&#8217;t use them, they will still get uploaded and copied to our image. 
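<\/p>\n<p>As a minimal sketch, a <code>.dockerignore<\/code> file that excludes the items listed above could look like the following. The entries are assumptions based on that list, not the exact file from this project, so adjust them to your own layout.<\/p>\n<pre><code># Version control history\n.git\n\n# Virtual environments\nvenv\n.venv\n\n# Local environment files\n.env\n\n# Python bytecode caches at any depth\n**\/__pycache__<\/code><\/pre>\n<p>The file goes in the root of the build context (the directory passed to <code>docker build<\/code>), and each pattern is matched relative to that root.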
<\/p>\n<h3 id=\"verify-dockerignore-is-working\">Verify That .dockerignore Is Working<\/h3>\n<p>We can verify the <strong><code>.dockerignore<\/code><\/strong> file with the following command.<\/p>\n<pre><code class=\"language-Bash\">docker build --no-cache --progress=plain -t test .<\/code><\/pre>\n<p>This command prints a line like this:<\/p>\n<pre><code>=&gt; =&gt; transferring context:249B<\/code><\/pre>\n<p>This is what we are looking for. If we instead see output like:<\/p>\n<pre><code>Sending build context to Docker daemon  847MB<\/code><\/pre>\n<p>This means large files are being sent, including the directories I mentioned earlier (<code>.git<\/code>, virtual environments, etc.).<\/p>\n<p>So we should recheck our <code>.dockerignore<\/code> file before <a href=\"https:\/\/devopscube.com\/build-docker-image\/\" rel=\"noreferrer\">building the Docker image.<\/a><\/p>\n<h2 id=\"final-results\">Final Results<\/h2>\n<p>The following table shows the final results before and after optimization.<\/p>\n<p><!--kg-card-begin: html--><\/p>\n<table>\n<thead>\n<tr>\n<th>What Improved<\/th>\n<th>Before<\/th>\n<th>After<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Image Size<\/td>\n<td>3.17 GB<\/td>\n<td>354 MB (\u2193 89%)<\/td>\n<\/tr>\n<tr>\n<td>Build Time<\/td>\n<td>13 minutes<\/td>\n<td>3 minutes (\u2193 75%)<\/td>\n<\/tr>\n<tr>\n<td>Dependencies<\/td>\n<td>Unused + duplicate packages<\/td>\n<td>Minimal required packages<\/td>\n<\/tr>\n<tr>\n<td>Dockerfile<\/td>\n<td>Single-stage<\/td>\n<td>Multi-stage<\/td>\n<\/tr>\n<tr>\n<td>Build Context<\/td>\n<td>Large<\/td>\n<td>Optimized with .dockerignore<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!--kg-card-end: html--><\/p>\n<h2 id=\"real-world-impact\">Real World Impact<\/h2>\n<p>The image is smaller, but does this matter?<\/p>\n<p>Yes, it does.<\/p>\n<ul>\n<li><strong>Reduced build time:<\/strong> As you can see, the smaller the image, the less time it takes to build. We reduced the build time <strong>from 13 minutes to just 3 
minutes.<\/strong><\/li>\n<li><strong>Faster pulls: <\/strong>New EKS nodes pull the image faster.<\/li>\n<li><strong>Lower storage costs: <\/strong>A smaller image avoids unnecessary registry storage costs.<\/li>\n<\/ul>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>By combining multi-stage Docker builds with a leaner, deduplicated <code>requirements.txt<\/code>, I reduced the image size from 3.17 GB to just 354 MB, <strong>almost an 89% reduction.<\/strong><\/p>\n<p>The key takeaway is simple: as a developer, <strong>only include what your pipeline actually needs<\/strong> at runtime. Image optimization is also a collaborative effort between data scientists, developers, and <a href=\"https:\/\/devopscube.com\/become-devops-engineer\/\" rel=\"noreferrer\">DevOps engineers.<\/a><\/p>\n<p>A smaller image means faster builds, quicker deployments, and a cleaner production environment, all without sacrificing functionality.<\/p>\n<hr>\n<p><strong>Source:<\/strong> <a href=\"https:\/\/devopscube.com\/kubeflow-docker-image-optmization\/\" target=\"_blank\" rel=\"noopener noreferrer\">From 3.17 GB to 354 MB: How I Reduced My Kubeflow Docker Image by 89% - DevOpsCube<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: 
https:\/\/devopscube.com\/kubeflow-docker-image-optmization\/<\/p>\n","protected":false},"author":1,"featured_media":245,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-244","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/244","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=244"}],"version-history":[{"count":0,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/244\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/media\/245"}],"wp:attachment":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=244"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=244"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}