{"id":291,"date":"2026-04-09T06:46:46","date_gmt":"2026-04-09T06:46:46","guid":{"rendered":"https:\/\/blog.ngocha.biz\/?p=291"},"modified":"2026-04-09T06:46:46","modified_gmt":"2026-04-09T06:46:46","slug":"apache-airflow-on-kubernetes","status":"publish","type":"post","link":"https:\/\/blog.ngocha.biz\/?p=291","title":{"rendered":"How to Deploy Apache Airflow on Kubernetes (Production Guide)"},"content":{"rendered":"<p>Apache Airflow is an open source workflow orchestration tool used for data workflows, and many organizations today run it on Kubernetes for scalability and flexibility.<\/p>\n<p>Also, <a href=\"https:\/\/airflow.apache.org\/blog\/airflow-three-point-oh-is-here\/?ref=devopscube.com\" rel=\"noreferrer\">Apache Airflow 3<\/a> is a total redesign that expands its capabilities to <strong>support complex AI, ML<\/strong>, and near real-time data workloads.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Key Insight <\/strong><\/b>\ud83d\udca1<br \/>80,000 organizations use Airflow, with over 30% of users running <a href=\"https:\/\/devopscube.com\/devops-to-mlops\/\" rel=\"noreferrer\">MLOps<\/a> workloads and 10% using it for GenAI workflows.<\/div>\n<\/div>\n<p>In this blog, we will deploy Apache Airflow on a Kubernetes cluster and configure it to automatically sync DAGs from GitHub.<\/p>\n<p>By the end of this blog, you will have learned:<\/p>\n<ol>\n<li>What Apache Airflow is<\/li>\n<li>What a DAG is<\/li>\n<li>How Airflow executors work<\/li>\n<li>How to install Apache Airflow on Kubernetes<\/li>\n<li>How to configure GitSync for DAGs and the Kubernetes executor<\/li>\n<li>A few Airflow Day 2 operations, and more<\/li>\n<\/ol>\n<p>Let&#8217;s get started.<\/p>\n<h2 id=\"what-is-apache-airflow\">What is Apache Airflow?<\/h2>\n<p>Apache Airflow is an open source application that helps build and manage complex workflows and <strong>data pipelines<\/strong>.<\/p>\n<p>A data 
pipeline is a series of steps to collect data from sources, process it, and then store it in a target system. These steps include collecting, cleaning, transforming, filtering, and storing data (the ETL process).<\/p>\n<p>In the data pipeline, each individual step (collecting, cleaning, etc.) is called a Task. A <strong>Task is simply a single unit of work<\/strong> that Airflow executes.<\/p>\n<p>Now, these tasks don&#8217;t run in isolation. They need to run in a specific order. <em>Why? <\/em>Well, the logic is simple. You can&#8217;t clean data before collecting it, and you can&#8217;t store data before transforming it.<\/p>\n<p>This is where a <strong>DAG (Directed Acyclic Graph)<\/strong> comes in. It is a foundational concept in data engineering and AI\/ML workflows. It is <strong>used to define how tasks depend on each other<\/strong> and in what order they should run.<\/p>\n<h2 id=\"directed-acyclic-graph-dag\">Directed Acyclic Graph (DAG)<\/h2>\n<p>A DAG is basically a definition of your entire pipeline <strong>written in Python<\/strong>. It describes all the tasks and <strong>the order in which they should run<\/strong>. Think of it as the blueprint of your data pipeline. Here is what each part of the name means.<\/p>\n<ul>\n<li><strong>Directed<\/strong>: Tasks flow in one direction (collect, clean, transform, and then store).<\/li>\n<li><strong>Acyclic<\/strong>: No cycles exist. Once a task is completed, the workflow does not go back to a previous step.<\/li>\n<li><strong>Graph<\/strong>: The pipeline is represented as a graph. Each task is a node and each dependency is an edge. This defines how tasks are connected and which task runs after which.<\/li>\n<\/ul>\n<p>So in simple terms, a <strong><em>DAG is your pipeline, a Task is each step inside it.<\/em><\/strong><\/p>\n<p>If you have worked with <strong>CI\/CD pipelines<\/strong>, this is the same concept. By default, a failed step stops everything downstream of it. 
<strong>Airflow applies that exact model<\/strong> to data pipelines.<\/p>\n<p>The following image illustrates how a sample DAG is structured in code. It shows the DAG definition, Tasks, Nodes, and Edges. This is the actual code you write in Airflow to build any pipeline.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-95.png\" class=\"kg-image\" alt=\"dag structure explained\" loading=\"lazy\" width=\"2000\" height=\"2431\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/image-95.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/03\/image-95.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/03\/image-95.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w2400\/2026\/03\/image-95.png 2400w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-text\"> \ud83d\udccc <b><strong style=\"white-space: pre-wrap;\">DevOps Insight <\/strong><\/b><br \/>Terraform already uses this concept. It internally builds a resource dependency graph (also a DAG) before running <code spellcheck=\"false\" style=\"white-space: pre-wrap;\">terraform apply<\/code>. That is how it knows to create the VPC before the subnet.<\/div>\n<\/div>\n<h2 id=\"airflow-kubernetes-executors\">Airflow Kubernetes Executors<\/h2>\n<p><em>So where does this Python code (DAG) actually run? <\/em><\/p>\n<p>That is what the <strong>Airflow Executor decides<\/strong>. 
As the name suggests, the Executor determines how and where the tasks in your DAG get executed.<\/p>\n<p>Just as Jenkins and GitHub Actions use agents or runners to execute CI\/CD pipelines, Airflow uses Executors.<\/p>\n<p>There are different types of Executors, and you choose one based on where you are running Airflow and how you want to execute your DAGs.<\/p>\n<p>In our case, we will use the <strong>KubernetesExecutor<\/strong>. This executor spins up a <a href=\"https:\/\/devopscube.com\/kubernetes-pod\/\" rel=\"noreferrer\">pod<\/a> in the <a href=\"https:\/\/devopscube.com\/setup-kubernetes-cluster-kubeadm\/\" rel=\"noreferrer\">Kubernetes cluster<\/a> for every task in the DAG when you trigger the pipeline.<\/p>\n<p>It is similar to how you select an agent or runner to execute your <a href=\"https:\/\/devopscube.com\/learning-ci-cd-tools\/\" rel=\"noreferrer\">CI\/CD<\/a> job in <a href=\"https:\/\/devopscube.com\/jenkins-2-tutorials-getting-started-guide\/\" rel=\"noreferrer\">Jenkins<\/a> or GitHub Actions. 
<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-94.png\" class=\"kg-image\" alt=\"how KubernetesExecutor runs tasks\" loading=\"lazy\" width=\"1899\" height=\"1402\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/image-94.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/03\/image-94.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/03\/image-94.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-94.png 1899w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Think of it as the dedicated environment where your DAG tasks actually run. The above image illustrates it better.<\/p>\n<p>There are multiple types of executors available for Airflow. We can choose one according to our needs. 
Here is a comparison table for the executors.<\/p>\n<p><!--kg-card-begin: html--><\/p>\n<table class=\"auto-wrap\" style=\"width: 100%;\">\n<tbody>\n<tr>\n<td><strong>Executor Types<\/strong><\/td>\n<td><strong>Local Executor<\/strong><\/td>\n<td><strong>Queued \/ Batch Executors<\/strong><\/td>\n<td><strong>Containerized Executors<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Available Executors<\/td>\n<td>LocalExecutor<\/td>\n<td>CeleryExecutor, BatchExecutor, EdgeExecutor<\/td>\n<td>KubernetesExecutor, EcsExecutor<\/td>\n<\/tr>\n<tr>\n<td>Where Tasks Run<\/td>\n<td>Inside scheduler pod<\/td>\n<td>Persistent worker pods<\/td>\n<td>Ephemeral pods \/ containers<\/td>\n<\/tr>\n<tr>\n<td>Workers Always Running<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td>Task Isolation<\/td>\n<td>No<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td>Startup Speed<\/td>\n<td>Instant<\/td>\n<td>Fast<\/td>\n<td>Slower (pod startup)<\/td>\n<\/tr>\n<tr>\n<td>Scales Horizontally<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td>Cost When Idle<\/td>\n<td>Low<\/td>\n<td>Higher<\/td>\n<td>Low<\/td>\n<\/tr>\n<tr>\n<td>Best For<\/td>\n<td>Dev \/ small setups<\/td>\n<td>High volume, speed matters<\/td>\n<td>Isolated, on-demand workloads<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!--kg-card-end: html--><\/p>\n<p>Now that you have a fair idea about Airflow, let&#8217;s get started with the setup.<\/p>\n<h2 id=\"prerequisites\">Prerequisites<\/h2>\n<p>To install Airflow on the Kubernetes cluster, you need the following.<\/p>\n<ol>\n<li><a href=\"https:\/\/devopscube.com\/create-aws-eks-cluster-eksctl\/\" rel=\"noreferrer\">EKS cluster<\/a> with a volume provisioner<\/li>\n<li><a href=\"https:\/\/devopscube.com\/kubectl-set-context\/\" rel=\"noreferrer\">kubectl<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/create-helm-chart\/\" rel=\"noreferrer\">Helm<\/a><\/li>\n<li>GitHub repository<\/li>\n<\/ol>\n<p>Once all the above prerequisites are in place, we 
can start the installation.<\/p>\n<h2 id=\"how-airflow-works-on-kubernetes\">How Airflow Works on Kubernetes<\/h2>\n<p>Before we move to the setup, let&#8217;s look at the core components of Airflow.<\/p>\n<p>Here are the components you should understand.<\/p>\n<ol>\n<li><strong>Scheduler<\/strong> &#8211; Decides which tasks are due to run and hands them to the executor.<\/li>\n<li><strong>Executor<\/strong> &#8211; The mechanism that gets the tasks run (for example, the Celery, Kubernetes, or Edge executor).<\/li>\n<li><strong>DAG Processor<\/strong> &#8211; Parses the DAG files synced from the configured repo.<\/li>\n<li><strong>Database<\/strong> &#8211; Stores DAG data, task state, and other metadata.<\/li>\n<\/ol>\n<p>The following diagram gives a high-level overview of running Airflow on Kubernetes.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/03\/image-93.png\" class=\"kg-image\" alt=\"Airflow Kubernetes Setup Overview\" loading=\"lazy\" width=\"2000\" height=\"2459\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/03\/image-93.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/03\/image-93.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/03\/image-93.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w2400\/2026\/03\/image-93.png 2400w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Here is what the overall workflow looks like.<\/p>\n<ul>\n<li>Developers push DAG files to GitHub through CI\/CD. The entire pipeline code lives in Git.<\/li>\n<li><strong>GitSync<\/strong> keeps Airflow in sync with GitHub. 
The DAG Processor runs a GitSync sidecar that continuously polls the repo. When a new DAG is pushed, Airflow picks it up automatically.<\/li>\n<li>The Scheduler reads the DAG and decides what tasks should run and when.<\/li>\n<li>The <strong>Kubernetes Executor<\/strong> creates a separate pod for each task. These pods are short-lived: once the task finishes, its pod is deleted.<\/li>\n<li>Worker pods execute the data-related tasks defined in the DAG.<\/li>\n<\/ul>\n<div class=\"kg-card kg-callout-card kg-callout-card-yellow\">\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Note:<\/strong><\/b> We won&#8217;t be using S3 in this guide. The diagram includes it only to show how a worker pod uses Pod Identity mapped to an IAM role to securely access S3.<\/div>\n<\/div>\n<h2 id=\"fork-the-github-repository\">Fork the GitHub Repository<\/h2>\n<p>Fork the following repository, which has all the files we are going to use.<\/p>\n<pre><code class=\"language-bash\">https:\/\/github.com\/techiescamp\/airflow-setup.git<\/code><\/pre>\n<p>Use the following command to clone your forked repository.<\/p>\n<pre><code class=\"language-bash\">git clone &lt;your-repo-url&gt;<\/code><\/pre>\n<p>Then move into the <code>airflow-setup<\/code> folder using the following command.<\/p>\n<pre><code class=\"language-bash\">cd airflow-setup<\/code><\/pre>\n<p>You can see the DAG file and the custom values file we are going to use in this guide, as shown below.<\/p>\n<pre><code class=\"language-bash\">airflow-setup\n    \u251c\u2500\u2500 README.md\n    \u251c\u2500\u2500 dags\n    \u2502   \u2514\u2500\u2500 etl-pipeline.py\n    \u2514\u2500\u2500 helm\n        \u2514\u2500\u2500 custom-values.yaml<\/code><\/pre>\n<h2 id=\"set-up-apache-airflow-using-helm\">Set up Apache Airflow using Helm<\/h2>\n<p>We are going to install Apache Airflow using the official Helm chart.<\/p>\n<p>Let&#8217;s get started.<\/p>\n<h3 id=\"add-airflow-helm-repository\">Add Airflow Helm Repository<\/h3>\n<p>To install Airflow 
using Helm, first we need to add its Helm repository.<\/p>\n<p>Use the following command to add the Airflow Helm repository.<\/p>\n<pre><code class=\"language-bash\">helm repo add apache-airflow https:\/\/airflow.apache.org<\/code><\/pre>\n<p>Then use the following command to verify that the repository was added.<\/p>\n<pre><code class=\"language-bash\">$ helm search repo apache-airflow\n\nNAME                  \tCHART VERSION\tAPP VERSION DESCRIPTION\n\napache-airflow\/airflow\t1.20.0       \t3.1.8       The official Helm chart to deploy Apache Airflo...<\/code><\/pre>\n<h3 id=\"connecting-to-private-git-repositories\">Connecting to Private Git Repositories<\/h3>\n<p>Enterprise projects typically keep their DAGs in private Git repositories. To access those from Airflow, you need to create a <strong>Kubernetes Secret with GitHub credentials.<\/strong><\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Note:<\/strong><\/b> You can get the token from GitHub &gt; Settings &gt; Developer settings &gt; Personal access tokens<\/div>\n<\/div>\n<p>Run the following command, <strong>replacing the placeholders with your GitHub username and token.<\/strong> The secret is created in the <code>airflow<\/code> namespace, so if that namespace does not exist yet, create it first with <code>kubectl create namespace airflow<\/code>. The credentials are stored under both the <code>GIT_SYNC_*<\/code> and <code>GITSYNC_*<\/code> key names so that the secret works with either naming convention used by different git-sync versions.<\/p>\n<pre><code class=\"language-bash\">kubectl -n airflow create secret generic git-credentials \\\n  --from-literal=GIT_SYNC_USERNAME=&lt;your-github-username&gt; \\\n  --from-literal=GIT_SYNC_PASSWORD=&lt;your-github-token&gt; \\\n  --from-literal=GITSYNC_USERNAME=&lt;your-github-username&gt; \\\n  --from-literal=GITSYNC_PASSWORD=&lt;your-github-token&gt;<\/code><\/pre>\n<h3 id=\"create-helm-custom-values-file\">Create Helm Custom Values File<\/h3>\n<p>We need to add the <strong>gitSync configuration<\/strong> to Airflow to sync the DAG files from GitHub. 
We also need to configure Airflow to use the <strong><code>KubernetesExecutor<\/code><\/strong> because the chart defaults to the <code>CeleryExecutor<\/code>.<\/p>\n<p>To do that, we need to customize the Helm values file.<\/p>\n<p>You can find the custom values file <code>custom-values.yaml<\/code> in the <code>airflow-setup\/helm<\/code> directory as shown below.<\/p>\n<pre><code class=\"language-yaml\">executor: KubernetesExecutor\n\ndags:\n  gitSync:\n    enabled: true\n    repo: https:\/\/github.com\/techiescamp\/airflow-setup.git\n    branch: main\n    subPath: \"dags\"\n    period: 60s\n    credentialsSecret: git-credentials<\/code><\/pre>\n<div class=\"kg-card kg-callout-card kg-callout-card-yellow\">\n<div class=\"kg-callout-emoji\">\u26a0\ufe0f<\/div>\n<div class=\"kg-callout-text\">Replace the repository URL under gitSync with the URL of your fork and push the change to your Git repository.<\/div>\n<\/div>\n<p>If you need more configuration details, download the default values file using the following command and check it.<\/p>\n<pre><code class=\"language-bash\">helm show values apache-airflow\/airflow &gt; values.yaml<\/code><\/pre>\n<h3 id=\"deploy-airflow\">Deploy Airflow<\/h3>\n<p>Now, we are ready to install Airflow on Kubernetes.<\/p>\n<p>Before running the installation command, make sure you are inside the <code>helm<\/code> folder.<\/p>\n<pre><code class=\"language-bash\">cd helm<\/code><\/pre>\n<p>Now, run the following command to install Airflow using the custom values file.<\/p>\n<pre><code class=\"language-bash\">helm install airflow apache-airflow\/airflow -n airflow --create-namespace -f custom-values.yaml\n<\/code><\/pre>\n<p>Once the command has run, check whether all the components are properly deployed.<\/p>\n<pre><code class=\"language-bash\">$ kubectl -n airflow get po\n\nNAME                                    READY   STATUS    RESTARTS   AGE\nairflow-api-server-8bb48f6b9-7wjvk      1\/1     Running   0          5m\nairflow-dag-processor-d7686b7b4-zq6fx   3\/3     Running   0          
5m\nairflow-postgresql-0                    1\/1     Running   0          5m\nairflow-scheduler-594986c799-bwjqp      2\/2     Running   0          5m\nairflow-statsd-587445b5b-8fcvg          1\/1     Running   0          5m\nairflow-triggerer-0                     3\/3     Running   0          5m<\/code><\/pre>\n<p>The output confirms that all the Airflow components are running without any issues.<\/p>\n<h3 id=\"access-apache-airflow-ui\">Access Apache Airflow UI<\/h3>\n<p>Now, we can open the Airflow UI. To reach it from a web browser on your local machine, set up port forwarding.<\/p>\n<p>Use the following command to port forward the Airflow API server service, which serves the UI.<\/p>\n<pre><code class=\"language-bash\">kubectl -n airflow port-forward svc\/airflow-api-server 8080:8080\n\nForwarding from 127.0.0.1:8080 -&gt; 8080\nForwarding from [::1]:8080 -&gt; 8080<\/code><\/pre>\n<p>Now, open a web browser and go to <code>http:\/\/localhost:8080<\/code>.<\/p>\n<p>You will be prompted to log in. By default, <strong>both the username and password are <\/strong><code>admin<\/code><strong>.<\/strong><\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Important Note:<\/strong><\/b> In production, the UI would be exposed via <a href=\"https:\/\/devopscube.com\/kubernetes-ingress-tutorial\/\" rel=\"noreferrer\">Ingress<\/a> or <a href=\"https:\/\/devopscube.com\/kubernetes-gateway-api\/\" rel=\"noreferrer\">Gateway API<\/a>. 
Also, authentication methods such as LDAP or OAuth, along with SSL\/TLS, are used to protect access to the Airflow web server.<\/div>\n<\/div>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/07\/image-273.png\" class=\"kg-image\" alt=\"The Apache Airflow login page\" loading=\"lazy\" width=\"1754\" height=\"1274\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/07\/image-273.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/07\/image-273.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2025\/07\/image-273.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/07\/image-273.png 1754w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Once you log in, you will see a home page similar to the one below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/07\/image-274.png\" class=\"kg-image\" alt=\"The Apache Airflow home page\" loading=\"lazy\" width=\"2000\" height=\"1284\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/07\/image-274.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/07\/image-274.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2025\/07\/image-274.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/07\/image-274.png 2090w\" sizes=\"auto, (min-width: 
720px) 720px\"><\/figure>\n<h3 id=\"access-the-dag\">Access the DAG<\/h3>\n<p>Now, go to the <strong>DAGs<\/strong> section in the Airflow UI. You should see the DAG listed there.<\/p>\n<p>Since we configured <code>gitSync<\/code> with the Github repository through Helm, Airflow <strong>automatically syncs the DAG<\/strong> from the repository as shown below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-16.png\" class=\"kg-image\" alt=\"dag files\" loading=\"lazy\" width=\"814\" height=\"451\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-16.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-16.png 814w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>If you click on the DAG and <strong>open the Graph View<\/strong>, you can see the defined tasks in order, as shown below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-26.png\" class=\"kg-image\" alt=\"airflow dag task order\" loading=\"lazy\" width=\"1224\" height=\"540\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-26.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/04\/image-26.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-26.png 1224w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<h3 id=\"trigger-the-dag\">Trigger the DAG<\/h3>\n<p>Now, let&#8217;s trigger the DAG using the Trigger button and select the single run option, and then click 
the Trigger button as shown below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-28.png\" class=\"kg-image\" alt=\"triggering a dag file manually\" loading=\"lazy\" width=\"1011\" height=\"485\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-28.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/04\/image-28.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-28.png 1011w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>It will <strong>take a few minutes for the entire pipeline<\/strong> to run. You can check the status of each task, as shown below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-27.png\" class=\"kg-image\" alt=\"dag tasks running one by one\" loading=\"lazy\" width=\"1250\" height=\"574\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-27.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/04\/image-27.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-27.png 1250w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\"><b><strong style=\"white-space: pre-wrap;\">Production DAG Triggering:<\/strong><\/b><br \/>In real-world data and MLOps pipelines, DAGs are not always triggered the 
same way. There are three common patterns: <b><strong style=\"white-space: pre-wrap;\">Scheduled<\/strong><\/b> (cron-based), <b><strong style=\"white-space: pre-wrap;\">Event-driven<\/strong><\/b> (for example, on data arrival), and <b><strong style=\"white-space: pre-wrap;\">API-triggered<\/strong><\/b> (ad-hoc runs from CI\/CD pipelines via the Airflow REST API).<\/div>\n<\/div>\n<p>Also, the DAG in this example uses the <code>KubernetesPodOperator<\/code>, which allows you to run tasks in separate pods with a custom Docker image.<\/p>\n<p>There are also other operators available for different use cases. For example:<\/p>\n<ol>\n<li>BashOperator<\/li>\n<li>PythonOperator<\/li>\n<li>EmailOperator<\/li>\n<li>and more<\/li>\n<\/ol>\n<p>You can check all available operators in the <a href=\"https:\/\/airflow.apache.org\/docs\/apache-airflow\/stable\/core-concepts\/operators.html?ref=devopscube.com\" rel=\"noreferrer\">official documentation.<\/a><\/p>\n<h2 id=\"cloning-repo-from-dags\">Cloning Repo From DAGs<\/h2>\n<p>There are use cases where you might want to clone a repo and push to it from the DAG itself. For example, if you integrate <a href=\"https:\/\/devopscube.com\/dvc-tutorial-for-beginners\/\" rel=\"noreferrer\">DVC<\/a> in Airflow, DVC will need push access to the repo from the DAG. 
The following image illustrates this use case.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-31.png\" class=\"kg-image\" alt=\"clone the repo and push to git repo from the DAG from Pod using git credentials bound to Kubernetes secret\" loading=\"lazy\" width=\"2000\" height=\"1715\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-31.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/04\/image-31.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/04\/image-31.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-31.png 2222w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>For that, in the DAG (using <code>KubernetesPodOperator<\/code>), you need to <strong>mount the Git secret<\/strong> into the worker pod as environment variables.<\/p>\n<p>Here is a code example.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-25.png\" class=\"kg-image\" alt=\"dag file example with git credentials\" loading=\"lazy\" width=\"630\" height=\"298\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-25.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-25.png 630w\"><\/figure>\n<h2 id=\"clean-up\">Clean Up<\/h2>\n<p>If you no longer need the Airflow setup, run the following commands to uninstall it.<\/p>\n<pre><code class=\"language-bash\">helm uninstall airflow -n 
airflow<\/code><\/pre>\n<p>The PVCs created by the Helm deployment remain after the uninstall. Use the following command to remove them all.<\/p>\n<pre><code class=\"language-bash\">kubectl delete pvc --all -n airflow<\/code><\/pre>\n<h2 id=\"troubleshooting-airflow-on-k8s\">Troubleshooting Airflow on K8s<\/h2>\n<p>The following are some of the issues you may face with Airflow on Kubernetes.<\/p>\n<h3 id=\"oomkills-for-tasks\">OOMKills for tasks<\/h3>\n<p>OOMKill is a common problem with Airflow worker pods. It happens when a container exceeds its memory limit, and Kubernetes terminates it to protect node stability.<\/p>\n<p>It can happen due to the following reasons.<\/p>\n<ol>\n<li>Memory-heavy code or data processing<\/li>\n<li>Incorrect resource request and limit configurations<\/li>\n<\/ol>\n<p>To fix this, set the memory request and limit for your tasks. If you are using <strong><code>KubernetesExecutor<\/code><\/strong>, you can <strong>set the memory request and limit for each task<\/strong> in the DAG file itself.<\/p>\n<p>For example, the following code snippet shows how the memory request and limit are set in the DAG file used in this guide.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-29.png\" class=\"kg-image\" alt=\"how resources are specified in the dag file\" loading=\"lazy\" width=\"549\" height=\"341\"><\/figure>\n<h3 id=\"dags-are-not-showing-in-ui\">DAGs are Not Showing in UI<\/h3>\n<p>You have created a DAG file and synced the Git repo to Airflow, but it&#8217;s not showing in the UI.<\/p>\n<p>To narrow down the cause, verify the following:<\/p>\n<ul>\n<li>The DAG directory you specified in GitSync is correct.<\/li>\n<li>The DAG file has no syntax errors and actually defines a DAG object.<\/li>\n<\/ul>\n<h2 id=\"using-multiple-repositories-for-dags\">Using Multiple Repositories for 
DAGs<\/h2>\n<p>GitSync does not support multiple repos. You have to keep all DAGs inside a single repo, and Airflow reads them from a <strong>single folder<\/strong>.<\/p>\n<p>But what if you want to use DAGs from multiple repositories?<\/p>\n<p>Well, you can use <a href=\"https:\/\/airflow.apache.org\/docs\/helm-chart\/stable\/manage-dag-files.html?ref=devopscube.com#synchronizing-multiple-git-repositories-with-git-sync\" rel=\"noreferrer\">Git Submodules<\/a>, which let you <strong>embed multiple repos inside one repo.<\/strong><\/p>\n<p>For example,<\/p>\n<pre><code>main-airflow-repo\/\n \u251c\u2500\u2500 dags\/\n \u2502    \u251c\u2500\u2500 team-a\/   (submodule repo A)\n \u2502    \u251c\u2500\u2500 team-b\/   (submodule repo B)\n \u2502    \u251c\u2500\u2500 ml\/       (submodule repo C)<\/code><\/pre>\n<p>This way, GitSync pulls <strong>only main-airflow-repo<\/strong>, but inside it you already have multiple repos configured using a <code>.gitmodules<\/code> file.<\/p>\n<h2 id=\"airflow-monitoring-observability\">Airflow Monitoring &amp; Observability<\/h2>\n<p>The Airflow Helm deployment comes with a metrics exporter pod named <a href=\"https:\/\/github.com\/statsd\/statsd?ref=devopscube.com\" rel=\"noreferrer\"><strong>statsd<\/strong><\/a>.<\/p>\n<p>The statsd pod collects metrics from the Airflow pods and exposes them. 
We can use <a href=\"https:\/\/devopscube.com\/setup-prometheus-monitoring-on-kubernetes\/\" rel=\"noreferrer\">Prometheus<\/a> to scrape the metrics and use <a href=\"https:\/\/devopscube.com\/integrate-visualize-prometheus-grafana\/\" rel=\"noreferrer\">Grafana to visualize<\/a> them as a dashboard.<\/p>\n<p>Here is an example <a href=\"https:\/\/grafana.com\/grafana\/dashboards\/23297-airflow-dags-overview\/?ref=devopscube.com\" rel=\"noreferrer\">Grafana template.<\/a><\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-30.png\" class=\"kg-image\" alt=\"\" loading=\"lazy\" width=\"1914\" height=\"1332\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2026\/04\/image-30.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2026\/04\/image-30.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2026\/04\/image-30.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2026\/04\/image-30.png 1914w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<h2 id=\"reducing-costs-with-spot-instances\">Reducing Costs With Spot Instances<\/h2>\n<p>If you want to save costs on Airflow worker pods in the cloud, you can make use of spot instances.<\/p>\n<p>You can add spot instances to the Kubernetes cluster and use <strong>nodeSelector, taints, and tolerations<\/strong> to schedule the worker pods on them.<\/p>\n<p>Also, keep in mind that spot nodes can be reclaimed at any time with very short notice. This means pods running on them can be killed in the middle of a task. 
So, if you plan to use spot instances, you must design for failure.<\/p>\n<h2 id=\"production-hardening-readiness\">Production Hardening &amp; Readiness<\/h2>\n<p>The following are some production hardening tips for Airflow.<\/p>\n<ol>\n<li><strong>External Metadata Databases:<\/strong> Don&#8217;t use the default <a href=\"https:\/\/devopscube.com\/deploy-postgresql-statefulset\/\" rel=\"noreferrer\">PostgreSQL<\/a> database. Use a managed database like <a href=\"https:\/\/devopscube.com\/terraform-aws-rds\/\" rel=\"noreferrer\">AWS RDS<\/a> for production environments.<\/li>\n<li><strong>Enable TLS for secure communication:<\/strong> Configure the <a href=\"https:\/\/devopscube.com\/setup-ingress-kubernetes-nginx-controller\/\" rel=\"noreferrer\">Ingress controller<\/a> with a TLS certificate to secure the communication between users and Airflow.<\/li>\n<li><strong>Configure SSO for user management:<\/strong> Replace the default login with SSO by integrating OAuth2 or OIDC.<\/li>\n<li><strong>Use private repos to store DAGs:<\/strong> DAGs contain the logic of your project, so keep them in a private repository and use the <a href=\"https:\/\/devopscube.com\/generate-ssh-key-pair\/\" rel=\"noreferrer\">SSH method<\/a> to authenticate with private Git repos.<\/li>\n<li><strong>Secret Management:<\/strong> In a production environment, avoid storing credentials and API keys directly in Kubernetes Secrets. Use external secret management tools like <a href=\"https:\/\/devopscube.com\/vault-in-kubernetes\/\" rel=\"noreferrer\">HashiCorp Vault<\/a> or the secret manager of the cloud platform you are running on (e.g., <a href=\"https:\/\/devopscube.com\/secrets-store-csi-dirver-eks\/\" rel=\"noreferrer\">AWS Secrets Manager<\/a>).<\/li>\n<li><strong>Accessing Cloud Resources: <\/strong>Use cloud-native identity mechanisms to access cloud resources from worker pods. 
For example, in AWS, you can use IAM Roles for Service Accounts (IRSA) or Pod Identity to associate an IAM role with a <a href=\"https:\/\/devopscube.com\/kubernetes-api-access-service-account\/\" rel=\"noreferrer\">Kubernetes service account<\/a>. The worker pods use this service account to access resources like S3 without storing credentials inside the container.<\/li>\n<\/ol>\n<p>Here is a quick production checklist for Airflow:<\/p>\n<ol>\n<li>Is persistence enabled for logs?<\/li>\n<li>Are resource limits set for all DAG tasks?<\/li>\n<li>Are you using a private registry for custom images?<\/li>\n<li>Is RBAC configured for the Airflow UI?<\/li>\n<\/ol>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>When it comes to MLOps, Airflow is a key tool used by many organizations. For <a href=\"https:\/\/devopscube.com\/become-devops-engineer\/\" rel=\"noreferrer\">DevOps engineers<\/a>, understanding Airflow&#8217;s infrastructure foundations can <strong>help you design and operate data &amp; ML pipelines<\/strong> in actual projects.<\/p>\n<p>This blog covered the installation and configuration of Apache Airflow on a Kubernetes cluster with GitSync &amp; the Kubernetes executor.<\/p>\n<p>Also, we focused specifically on Airflow&#8217;s Kubernetes deployment, its management, and some Day 2 operations. We have used Airflow for ETL processes. We are yet to explore it for MLOps and GenAI workflows.<\/p>\n<p>Over to you!<\/p>\n<p>Are you planning to use Airflow in any of your projects? 
What is your use case?<\/p>\n<p>Drop your insights in the comments below.<\/p>\n<\/p>\n<hr>\n<p><strong>Source:<\/strong> <a href=\"https:\/\/devopscube.com\/apache-airflow-on-kubernetes\/\" target=\"_blank\" rel=\"noopener noreferrer\">How to Deploy Apache Airflow on Kubernetes (Production Guide) \u2014 DevOpsCube<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: https:\/\/devopscube.com\/apache-airflow-on-kubernetes\/<\/p>\n","protected":false},"author":1,"featured_media":292,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-291","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=291"}],"version-history":[{"count":0,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/291\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/media\/292"}],"wp:attachment":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","template
d":true}]}}