{"id":454,"date":"2025-01-07T07:56:55","date_gmt":"2025-01-07T07:56:55","guid":{"rendered":"https:\/\/blog.ngocha.biz\/?p=454"},"modified":"2025-01-07T07:56:55","modified_gmt":"2025-01-07T07:56:55","slug":"setup-prometheus-helm-chart","status":"publish","type":"post","link":"https:\/\/blog.ngocha.biz\/?p=454","title":{"rendered":"How to Setup Prometheus Using Helm Chart? &#8211; Detailed Guide"},"content":{"rendered":"<p>In this guide, we will look at the Prometheus setup on Kubernetes using a helm chart with all the best practices.<\/p>\n<p><!--kg-card-begin: html--><br \/>\n<span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\">If you want to learn about all the&nbsp;<a href=\"https:\/\/devopscube.com\/kubernetes-objects-resources\/\" target=\"_blank\">Kubernetes objects<\/a>&nbsp;involved in the Prometheus setup, you can follow the&nbsp;<a href=\"https:\/\/devopscube.com\/setup-prometheus-monitoring-on-kubernetes\/\" target=\"_blank\">Prometheus on Kubernetes&nbsp;<\/a>guide, where we used plain YAML manifest to deploy Prometheus.<\/span><br \/>\n<!--kg-card-end: html--><\/p>\n<h2 id=\"prerequisites\">Prerequisites<\/h2>\n<p>For this setup, ensure you have the following prerequisites.<\/p>\n<ol>\n<li>Helm configured on your workstation or the CI server where you want to run the Helm commands. (v3.16.3 or higher)<\/li>\n<li>A working <a href=\"https:\/\/devopscube.com\/upgrade-kubernetes-cluster-kubeadm\/\" rel=\"noreferrer noopener\">Kubernetes cluster<\/a> (v1.30 or higher)<\/li>\n<\/ol>\n<h2 id=\"prometheus-helm-chart-repo\">Prometheus Helm Chart Repo<\/h2>\n<p>The Prometheus community maintains all the Prometheus related <a href=\"https:\/\/devopscube.com\/create-helm-chart\/\" rel=\"noreferrer noopener\">Helm<\/a> charts in the following GitHub repository.<\/p>\n<pre><code class=\"language-bash\">https:\/\/github.com\/prometheus-community\/helm-charts\/<\/code><\/pre>\n<p>This repo contains Prometheus stack, exporters, Pushgateways, etc. 
You can install the required charts as per your requirements.<\/p>\n<p>To get started, we will deploy the core Prometheus chart that installs the following.<\/p>\n<ol>\n<li>Prometheus server<\/li>\n<li><a href=\"https:\/\/devopscube.com\/prometheus-alert-manager\/\">Alertmanager<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/setup-kube-state-metrics\/\">Kube State Metrics<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/node-exporter-kubernetes\/\">Prometheus Node Exporter<\/a><\/li>\n<li><a href=\"https:\/\/devopscube.com\/setup-prometheus-pushgateway-on-kubernetes\/\" rel=\"noreferrer noopener\">Prometheus Pushgateway<\/a><\/li>\n<\/ol>\n<p>Except for the Prometheus server, other components are installed from the dependency charts (sub-charts). If you check the <code>Chart.yaml<\/code>, you will find the added chart dependencies below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-37-7.png\" class=\"kg-image\" alt=\"prometheus dependency charts\" loading=\"lazy\" width=\"745\" height=\"603\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-37-7.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-37-7.png 745w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>You can refer to this <a href=\"https:\/\/devopscube.com\/prometheus-architecture\/\">Prometheus Architecture blog<\/a> to learn the complete workflow of Prometheus and its components.<\/p>\n<h2 id=\"install-prometheus-stack-using-helm\">Install Prometheus Stack Using Helm<\/h2>\n<p>Now, let&#8217;s get started with the setup.<\/p>\n<p>Follow the steps below to set up Prometheus using the <strong>community Helm chart.<\/strong><\/p>\n<h3 id=\"step-1-add-prometheus-helm-repo\">Step 1: Add Prometheus Helm 
Repo<\/h3>\n<p>Add the Prometheus chart to your system using the following command.<\/p>\n<pre><code class=\"language-bash\">helm repo add prometheus-community https:\/\/prometheus-community.github.io\/helm-charts<\/code><\/pre>\n<p>You can list all the charts in the repo using the following command. We are going to use the <strong><code>Prometheus<\/code><\/strong> chart.<\/p>\n<pre><code class=\"language-bash\">helm search repo prometheus-community<\/code><\/pre>\n<p>Before you deploy the Prometheus Helm chart, you can view all the <a href=\"https:\/\/devopscube.com\/create-kubernetes-yaml\/\" rel=\"noreferrer\">YAML manifests<\/a> by converting the chart to plain YAML files using the following command.<\/p>\n<pre><code class=\"language-bash\">helm template prometheus-community prometheus-community\/prometheus --output-dir prometheus-manifests<\/code><\/pre>\n<p>Here is the tree view of all the associated charts with YAML and Prometheus YAML.<\/p>\n<pre><code class=\"language-bash\">\u279c  prometheus-manifests tree\n.\n\u2514\u2500\u2500 prometheus\n    \u251c\u2500\u2500 charts\n    \u2502   \u251c\u2500\u2500 alertmanager\n    \u2502   \u2502   \u2514\u2500\u2500 templates\n    \u2502   \u2502       \u251c\u2500\u2500 configmap.yaml\n    \u2502   \u2502       \u251c\u2500\u2500 serviceaccount.yaml\n    \u2502   \u2502       \u251c\u2500\u2500 services.yaml\n    \u2502   \u2502       \u2514\u2500\u2500 statefulset.yaml\n    \u2502   \u251c\u2500\u2500 kube-state-metrics\n    \u2502   \u2502   \u2514\u2500\u2500 templates\n    \u2502   \u2502       \u251c\u2500\u2500 clusterrolebinding.yaml\n    \u2502   \u2502       \u251c\u2500\u2500 deployment.yaml\n    \u2502   \u2502       \u251c\u2500\u2500 role.yaml\n    \u2502   \u2502       \u251c\u2500\u2500 service.yaml\n    \u2502   \u2502       \u2514\u2500\u2500 serviceaccount.yaml\n    \u2502   \u251c\u2500\u2500 prometheus-node-exporter\n    \u2502   \u2502   \u2514\u2500\u2500 templates\n    \u2502   \u2502      
 \u251c\u2500\u2500 daemonset.yaml\n    \u2502   \u2502       \u251c\u2500\u2500 service.yaml\n    \u2502   \u2502       \u2514\u2500\u2500 serviceaccount.yaml\n    \u2502   \u2514\u2500\u2500 prometheus-pushgateway\n    \u2502       \u2514\u2500\u2500 templates\n    \u2502           \u251c\u2500\u2500 deployment.yaml\n    \u2502           \u251c\u2500\u2500 service.yaml\n    \u2502           \u2514\u2500\u2500 serviceaccount.yaml\n    \u2514\u2500\u2500 templates\n        \u251c\u2500\u2500 clusterrole.yaml\n        \u251c\u2500\u2500 clusterrolebinding.yaml\n        \u251c\u2500\u2500 cm.yaml\n        \u251c\u2500\u2500 deploy.yaml\n        \u251c\u2500\u2500 pvc.yaml\n        \u251c\u2500\u2500 service.yaml\n        \u2514\u2500\u2500 serviceaccount.yaml<\/code><\/pre>\n<p>From the manifests, you can see that the Prometheus Helm chart deploys the following.<\/p>\n<ol>\n<li>Alertmanager (Statefulset)<\/li>\n<li>Kube State Metrics (Deployment)<\/li>\n<li>Prometheus Node Exporter (Daemonset)<\/li>\n<li>Prometheus Pushgateway (Deployment)<\/li>\n<li>Prometheus Server (Deployment)<\/li>\n<\/ol>\n<h3 id=\"step-2-customize-prometheus-helm-chart-configuration-values\">Step 2: Customize Prometheus Helm Chart Configuration Values<\/h3>\n<p>While deploying Prometheus, it is very important to know the default values that are part of the <strong><code>values.yaml <\/code><\/strong>file.<\/p>\n<p>If you are using the community chart for your project requirements, you should modify the <strong><code>values.yaml<\/code><\/strong> file as per your environment requirements.<\/p>\n<p>You can write all the default values to a <strong><code>values.yaml<\/code><\/strong> file using the following command.<\/p>\n<pre><code class=\"language-bash\">helm show values prometheus-community\/prometheus &gt; values.yaml<\/code><\/pre>\n<p>The following are the images used in this Prometheus Helm 
chart.<\/p>\n<ol>\n<li>quay.io\/prometheus-operator\/prometheus-config-reloader<\/li>\n<li>quay.io\/prometheus\/prometheus<\/li>\n<\/ol>\n<p>The subcharts use the following images.<\/p>\n<ol>\n<li>quay.io\/prometheus\/alertmanager<\/li>\n<li>registry.k8s.io\/kube-state-metrics\/kube-state-metrics<\/li>\n<li>quay.io\/prometheus\/node-exporter<\/li>\n<li>quay.io\/prometheus\/pushgateway<\/li>\n<\/ol>\n<p>You can customize the values to your needs. For example, the Prometheus Persistent Volume is set to 8Gi by default.<\/p>\n<blockquote><p><strong>Note:<\/strong> If you are deploying from inside a corporate network, you might not have access to these public images. In that case, push the images to your organization&#8217;s private registry first and then deploy the chart. Also, check whether your security guidelines allow community images to be pushed to private registries.<\/p><\/blockquote>\n<h3 id=\"step-3-deploy-prometheus-using-the-helm-chart\">Step 3: Deploy Prometheus using the Helm Chart<\/h3>\n<p>First, create a namespace named <strong><code>monitoring<\/code><\/strong>. 
We will deploy Prometheus in the monitoring namespace.<\/p>\n<pre><code class=\"language-bash\">kubectl create namespace monitoring<\/code><\/pre>\n<p>Now, let&#8217;s deploy Prometheus.<\/p>\n<p>Here, I am passing two parameters with <code>--set<\/code> to choose the storage class of the Persistent Volumes for Prometheus and Alertmanager.<\/p>\n<p>You can set the same values in the <code>values.yaml<\/code> file instead.<\/p>\n<p>I am using an EKS cluster for this demo, so the storage class used here, <code>gp2<\/code>, is the default of the <a href=\"https:\/\/devopscube.com\/backup-and-restore-eks-cluster-velero\/\" rel=\"noreferrer noopener\">EKS cluster<\/a>.<\/p>\n<pre><code class=\"language-bash\">helm upgrade -i prometheus prometheus-community\/prometheus \\\n    --namespace monitoring \\\n    --set alertmanager.persistence.storageClass=\"gp2\" \\\n    --set server.persistentVolume.storageClass=\"gp2\"<\/code><\/pre>\n<p>On a successful <a href=\"https:\/\/devopscube.com\/kubernetes-deployment-tutorial\/\" rel=\"noreferrer noopener\">deployment<\/a>, the release status is shown as deployed, as in the output below.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-78-6.png\" class=\"kg-image\" alt=\"the output of the prometheus helm installation\" loading=\"lazy\" width=\"902\" height=\"528\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-78-6.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-78-6.png 902w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Before accessing Prometheus, check that all the components are deployed and running properly.<\/p>\n<pre><code class=\"language-bash\">kubectl -n monitoring get all<\/code><\/pre>\n<figure class=\"kg-card kg-image-card\"><img 
decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-81-6.png\" class=\"kg-image\" alt=\"the list of prometheus components after the prometheus stack deployment\" loading=\"lazy\" width=\"1454\" height=\"839\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-81-6.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/03\/image-81-6.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-81-6.png 1454w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<h3 id=\"step-4-port-forward-the-prometheus-pod\">Step 4: Port Forward the Prometheus Pod<\/h3>\n<p>The above screenshot shows that every Prometheus stack component has a Service of type ClusterIP, so it can only be accessed from inside the cluster.<\/p>\n<p>To see the dashboard from our local machine, we use port forwarding.<\/p>\n<p>Start with Prometheus: identify the Prometheus Pod name and store it in an environment variable.<\/p>\n<pre><code class=\"language-bash\">export POD_NAME=$(kubectl get pods --namespace monitoring -l \"app.kubernetes.io\/name=prometheus,app.kubernetes.io\/instance=prometheus\" -o jsonpath=\"{.items[0].metadata.name}\")<\/code><\/pre>\n<p>To perform port forwarding, use the following command.<\/p>\n<pre><code class=\"language-bash\">kubectl --namespace monitoring port-forward \"$POD_NAME\" 9090<\/code><\/pre>\n<p>If the port forwarding succeeds, you will see the following output.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-82-7.png\" class=\"kg-image\" alt=\"the port forwarding 
of the prometheus pod to access it from the local\" loading=\"lazy\" width=\"1083\" height=\"399\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-82-7.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/03\/image-82-7.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-82-7.png 1083w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Don&#8217;t close the terminal; meanwhile, open any of the web browsers from the same machine and paste this URL <code>localhost:9090<\/code><\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-83-7.png\" class=\"kg-image\" alt=\"the prometheus dashboard \" loading=\"lazy\" width=\"1227\" height=\"1040\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-83-7.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/03\/image-83-7.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-83-7.png 1227w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>In the Target section, we can see the cluster resources that Prometheus is monitoring by default.<\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-84-8.png\" class=\"kg-image\" alt=\"the targets section of the prometheus dashboard\" loading=\"lazy\" width=\"1892\" height=\"974\" 
srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-84-8.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/03\/image-84-8.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1600\/2025\/03\/image-84-8.png 1600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-84-8.png 1892w\" sizes=\"auto, (min-width: 720px) 720px\"><\/figure>\n<p>Now, we can try the Alertmanager port forwarding to see the dashboard.<\/p>\n<pre><code class=\"language-bash\">export POD_NAME=$(kubectl get pods --namespace monitoring -l \"app.kubernetes.io\/name=alertmanager,app.kubernetes.io\/instance=prometheus\" -o jsonpath=\"{.items[0].metadata.name}\")<\/code><\/pre>\n<p>To perform a Port forward to the Alertmanager Pod, use the following command.<\/p>\n<pre><code class=\"language-bash\">kubectl --namespace monitoring port-forward $POD_NAME 9093<\/code><\/pre>\n<p>Same as Prometheus, use localhost URL from the browser, but this time use the Port <code>9093.<\/code><\/p>\n<figure class=\"kg-card kg-image-card\"><img decoding=\"async\" src=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-85-6.png\" class=\"kg-image\" alt=\"The dashboard of the prometheus alertmanager\" loading=\"lazy\" width=\"1221\" height=\"570\" srcset=\"https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w600\/2025\/03\/image-85-6.png 600w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/size\/w1000\/2025\/03\/image-85-6.png 1000w, https:\/\/storage.ghost.io\/c\/5f\/2f\/5f2f4d20-2abf-4534-8d40-7aa233aedd43\/content\/images\/2025\/03\/image-85-6.png 1221w\" sizes=\"auto, (min-width: 720px) 
720px\"><\/figure>\n<p>The Alertmanager dashboard also loads correctly, so the installation was successful.<\/p>\n<p>Here, we used port forwarding to reach the Prometheus application.<\/p>\n<blockquote><p><strong>Note:<\/strong> If you want a static endpoint to access Prometheus via internal or external DNS, you can use a Service of type NodePort or <a href=\"https:\/\/devopscube.com\/aws-load-balancer-controller-on-eks\/\" rel=\"noreferrer noopener\">LoadBalancer<\/a>. You can also use <a href=\"https:\/\/devopscube.com\/kubernetes-ingress-tutorial\/\" rel=\"noreferrer noopener\">ingress<\/a> to expose it via DNS. For TLS, use <a href=\"https:\/\/devopscube.com\/configure-ingress-tls-kubernetes\/\" rel=\"noreferrer noopener\">ingress TLS<\/a> configurations.<\/p><\/blockquote>\n<h2 id=\"prometheus-advanced-configuration\">Prometheus Advanced Configuration<\/h2>\n<p>The following are advanced configuration options and best practices that you can adjust based on your requirements.<\/p>\n<p>All of these options are set in the Helm values file.<\/p>\n<h3 id=\"storage\">Storage<\/h3>\n<p>By default, an 8 GiB (<code>8Gi<\/code>) persistent volume is attached to the Prometheus pod. 
If you want Prometheus to collect metrics from more workloads, increase the storage size.<\/p>\n<pre><code class=\"language-yaml\">server:\n  persistentVolume:\n    size: 20Gi<\/code><\/pre>\n<h3 id=\"retention-by-time-or-size\">Retention by Time or Size<\/h3>\n<p>By default, metrics are retained for 15 days; data older than that is deleted.<\/p>\n<p>You can configure the retention by time or by size.<\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Don&#8217;t mix time-based and size-based retention; pick one of the two.<\/div>\n<\/div>\n<p>You can modify the retention in the Helm values, as shown below.<\/p>\n<pre><code class=\"language-yaml\">server:\n  retention: 30d\n  # OR\n  retentionSize: 160GB<\/code><\/pre>\n<p>With time-based retention, this configuration deletes metrics older than 30 days. With size-based retention, the oldest data is deleted once the stored metrics reach 160 GB.<\/p>\n<h3 id=\"resource-limit-and-request\">Resource Limit and Request<\/h3>\n<p>By default, no resource requests or limits are set. Setting them is important so that the cluster knows the pod&#8217;s resource requirements.<\/p>\n<p>Below is an example of a resource request and limit.<\/p>\n<pre><code class=\"language-yaml\">server:\n  resources:\n    requests:\n      cpu: \"500m\"\n      memory: \"512Mi\"\n    limits:\n      cpu: \"500m\"\n      memory: \"512Mi\"<\/code><\/pre>\n<p>If your Prometheus is collecting metrics from more workloads, increase the limits accordingly; otherwise the pod may be OOMKilled and restarted because of insufficient resources.<\/p>\n<h3 id=\"basic-high-availability\">Basic High Availability<\/h3>\n<p>For high availability, you can do the following:<\/p>\n<ul>\n<li>Deploy the server as a StatefulSet<\/li>\n<li>Increase replicas to 2<\/li>\n<li>Change the update strategy to rolling update, which prevents the pods from 
terminating until a new pod is created.<\/li>\n<li>Set the pod anti-affinity to hard, which prevents replicas from being scheduled on the same node.<\/li>\n<\/ul>\n<pre><code class=\"language-yaml\">server:\n  replicaCount: 2\n  strategy:\n    type: RollingUpdate\n\n  podAntiAffinity: hard\n\n  statefulSet:\n    enabled: true<\/code><\/pre>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Each replica independently collects the same metrics from the workloads.<\/div>\n<\/div>\n<h2 id=\"how-to-calculate-prometheus-resource-requirements-for-real-projects\">How to Calculate Prometheus Resource Requirements for Real Projects?<\/h2>\n<p>We can estimate the resource requirements of a Prometheus stack from the project requirements.<\/p>\n<p>Let us take a real project scenario.<\/p>\n<p>Assume we are setting this up for a project in Kubernetes.<\/p>\n<h3 id=\"understand-your-project-environment\">Understand your Project Environment<\/h3>\n<p>The following are the requirements for the project:<\/p>\n<ul>\n<li>The cluster has 30 worker nodes<\/li>\n<li>About 400 pods are running in it<\/li>\n<li>We need the standard Prometheus exporters and metric sources, such as\n<ul>\n<li>Node Exporter<\/li>\n<li>Kubelet\/cAdvisor<\/li>\n<li>kube-state-metrics<\/li>\n<li>kube-apiserver<\/li>\n<li>etcd<\/li>\n<\/ul>\n<\/li>\n<li>Scrape interval: 15 seconds<\/li>\n<li>Data retention: 15 days<\/li>\n<\/ul>\n<p>Now, we need to know the volume of data that we are expecting.<\/p>\n<h3 id=\"estimation-of-the-active-time-series\">Estimation of the Active Time Series<\/h3>\n<p>Prometheus stores metrics as time series data.<\/p>\n<p>The following is a rough estimate of the number of series per source.<\/p>\n<p><!--kg-card-begin: html--><\/p>\n<table data-start=\"995\" data-end=\"1301\" class=\"w-fit min-w-(--thread-content-width)\">\n<thead data-start=\"995\" data-end=\"1044\">\n<tr data-start=\"995\" data-end=\"1044\">\n<th 
data-start=\"995\" data-end=\"1004\" data-col-size=\"sm\">Source<\/th>\n<th data-start=\"1004\" data-end=\"1022\" data-col-size=\"sm\">Series per Node<\/th>\n<th data-start=\"1022\" data-end=\"1044\" data-col-size=\"sm\">Total for 30 Nodes<\/th>\n<\/tr>\n<\/thead>\n<tbody data-start=\"1094\" data-end=\"1301\">\n<tr data-start=\"1094\" data-end=\"1128\">\n<td data-start=\"1094\" data-end=\"1110\" data-col-size=\"sm\">node-exporter<\/td>\n<td data-col-size=\"sm\" data-start=\"1110\" data-end=\"1118\">1,000<\/td>\n<td data-col-size=\"sm\" data-start=\"1118\" data-end=\"1128\">30,000<\/td>\n<\/tr>\n<tr data-start=\"1129\" data-end=\"1168\">\n<td data-start=\"1129\" data-end=\"1150\" data-col-size=\"sm\">kubelet + cAdvisor<\/td>\n<td data-col-size=\"sm\" data-start=\"1150\" data-end=\"1158\">2,500<\/td>\n<td data-col-size=\"sm\" data-start=\"1158\" data-end=\"1168\">75,000<\/td>\n<\/tr>\n<tr data-start=\"1169\" data-end=\"1219\">\n<td data-start=\"1169\" data-end=\"1205\" data-col-size=\"sm\">kube-state-metrics (cluster-wide)<\/td>\n<td data-col-size=\"sm\" data-start=\"1205\" data-end=\"1209\">\u2013<\/td>\n<td data-col-size=\"sm\" data-start=\"1209\" data-end=\"1219\">30,000<\/td>\n<\/tr>\n<tr data-start=\"1220\" data-end=\"1265\">\n<td data-start=\"1220\" data-end=\"1252\" data-col-size=\"sm\">kube-apiserver (cluster-wide)<\/td>\n<td data-col-size=\"sm\" data-start=\"1252\" data-end=\"1256\">\u2013<\/td>\n<td data-col-size=\"sm\" data-start=\"1256\" data-end=\"1265\">5,000<\/td>\n<\/tr>\n<tr data-start=\"1266\" data-end=\"1301\">\n<td data-start=\"1266\" data-end=\"1288\" data-col-size=\"sm\">etcd (cluster-wide)<\/td>\n<td data-col-size=\"sm\" data-start=\"1288\" data-end=\"1292\">\u2013<\/td>\n<td data-col-size=\"sm\" data-start=\"1292\" data-end=\"1301\">3,000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!--kg-card-end: html--><\/p>\n<p>Total = 30,000 + 75,000 + 30,000 + 5,000 + 3,000 = 143,000 series<\/p>\n<div class=\"kg-card kg-callout-card 
kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Adding a <b><strong style=\"white-space: pre-wrap;\">20%<\/strong><\/b> buffer for new metrics and recording rules gives 143,000 \u00d7 1.2 \u2248 172,000 active series.<\/div>\n<\/div>\n<h3 id=\"convert-to-samples-per-second-sps\">Convert to Samples Per Second (SPS)<\/h3>\n<p>Each time series receives one new sample per scrape interval.<\/p>\n<p>So the ingestion rate in samples per second is the number of active series divided by the scrape interval in seconds:<\/p>\n<p><strong>SPS = active_series \u00f7 scrape_interval<\/strong><\/p>\n<p>SPS = 172,000 \u00f7 15 <strong>\u2248 11,467 samples\/second<\/strong><\/p>\n<h3 id=\"estimate-disk-storage\">Estimate Disk Storage<\/h3>\n<p>Assume each sample takes around <strong>1.5 bytes <\/strong>after compression.<\/p>\n<p>So, the daily data per replica is 11,467 \u00d7 86,400 \u00d7 1.5 bytes \u2248 <strong>1.49 GB\/day.<\/strong><\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">86,400 is the number of seconds in 24 hours.<\/div>\n<\/div>\n<p>For 15 days, 2 replicas, and 20% overhead:<br \/>1.49 \u00d7 15 \u00d7 2 \u00d7 1.2 \u2248 <strong>~54 GB total<\/strong> across both replicas, or about 27 GB per replica.<\/p>\n<p>To leave headroom for growth, we provision <strong>60 to 80 GB disk<\/strong> per Prometheus instance.<\/p>\n<p>Now, we need to calculate the memory for Prometheus.<\/p>\n<h3 id=\"estimate-memory\">Estimate Memory<\/h3>\n<p>Prometheus keeps recent data in memory (the head block), which usually requires <strong>~2\u20133 KB per active series<\/strong>.<\/p>\n<p>172,000 \u00d7 2.5 KB \u2248 <strong>430 MB<\/strong> for active data.<\/p>\n<p>We also need memory for queries, the UI, and buffers.<br \/>So each Prometheus Pod should have a memory <strong>Request <\/strong>of <strong>2 GB<\/strong> and a <strong>Limit of 4 GB<\/strong>.<\/p>\n<h3 id=\"estimate-cpu\">Estimate CPU<\/h3>\n<p>Prometheus is not a very 
CPU-heavy application, but <strong>queries<\/strong> and <strong>rule evaluations<\/strong> require CPU.<\/p>\n<ul>\n<li><strong>Request:<\/strong> 2 vCPU<\/li>\n<li><strong>Limit:<\/strong> 4 vCPU<\/li>\n<\/ul>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">For larger setups, a 4 vCPU request and an 8 vCPU limit are recommended.<\/div>\n<\/div>\n<h3 id=\"recommended-prometheus-flags\">Recommended Prometheus Flags<\/h3>\n<p>For a production setup, the following flags are recommended.<\/p>\n<ul>\n<li><code>--storage.tsdb.retention.time=15d<\/code> &#8211; Sets how long metrics are kept.<\/li>\n<li><code>--storage.tsdb.retention.size=70GB<\/code> &#8211; Caps the disk space used by stored metrics. Once the threshold is reached, the oldest data is deleted.<\/li>\n<li><code>--query.max-concurrency=40<\/code> &#8211; Controls how many queries can run at the same time.<\/li>\n<li><code>--web.enable-admin-api=false<\/code> &#8211; Keeps the Prometheus admin API disabled for security. The admin API is disabled by default.<\/li>\n<li><code>--enable-feature=exemplar-storage<\/code> &#8211; Enables exemplar storage, which links traces with metrics.<\/li>\n<\/ul>\n<h3 id=\"validate-resource-usage-after-deployment\">Validate Resource Usage After Deployment<\/h3>\n<p>Once Prometheus is up and running, monitor the following metrics to see which resources need to be adjusted.<\/p>\n<p><!--kg-card-begin: html--><\/p>\n<table data-start=\"3205\" data-end=\"3574\" class=\"w-fit min-w-(--thread-content-width)\">\n<thead data-start=\"3205\" data-end=\"3225\">\n<tr data-start=\"3205\" data-end=\"3225\">\n<th data-start=\"3205\" data-end=\"3214\" data-col-size=\"md\">Metric<\/th>\n<th data-start=\"3214\" data-end=\"3225\" data-col-size=\"sm\">Purpose<\/th>\n<\/tr>\n<\/thead>\n<tbody data-start=\"3249\" data-end=\"3574\">\n<tr data-start=\"3249\" data-end=\"3310\">\n<td data-start=\"3249\" data-end=\"3281\" 
data-col-size=\"md\"><code data-start=\"3251\" data-end=\"3280\">prometheus_tsdb_head_series<\/code><\/td>\n<td data-col-size=\"sm\" data-start=\"3281\" data-end=\"3310\">Shows total active series<\/td>\n<\/tr>\n<tr data-start=\"3311\" data-end=\"3393\">\n<td data-start=\"3311\" data-end=\"3369\" data-col-size=\"md\"><code data-start=\"3313\" data-end=\"3368\">rate(prometheus_tsdb_head_samples_appended_total[5m])<\/code><\/td>\n<td data-col-size=\"sm\" data-start=\"3369\" data-end=\"3393\">Shows ingestion rate<\/td>\n<\/tr>\n<tr data-start=\"3394\" data-end=\"3455\">\n<td data-start=\"3394\" data-end=\"3435\" data-col-size=\"md\"><code data-start=\"3396\" data-end=\"3434\">prometheus_tsdb_storage_blocks_bytes<\/code><\/td>\n<td data-col-size=\"sm\" data-start=\"3435\" data-end=\"3455\">Shows disk usage<\/td>\n<\/tr>\n<tr data-start=\"3456\" data-end=\"3534\">\n<td data-start=\"3456\" data-end=\"3504\" data-col-size=\"md\"><code data-start=\"3458\" data-end=\"3503\">prometheus_rule_group_last_duration_seconds<\/code><\/td>\n<td data-col-size=\"sm\" data-start=\"3504\" data-end=\"3534\">Shows rule evaluation time<\/td>\n<\/tr>\n<tr data-start=\"3535\" data-end=\"3574\">\n<td data-start=\"3535\" data-end=\"3542\" data-col-size=\"md\"><code data-start=\"3537\" data-end=\"3541\">up<\/code><\/td>\n<td data-col-size=\"sm\" data-start=\"3542\" data-end=\"3574\">Confirms targets are healthy<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!--kg-card-end: html--><\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">If memory usage goes above 70\u201380% of the limit, we need to scale it.<\/div>\n<\/div>\n<h3 id=\"scaling-guidelines\">Scaling Guidelines<\/h3>\n<p>For smaller setups like <strong>10 nodes<\/strong> and <strong>~150 Pods<\/strong>, we can use the following estimation.<\/p>\n<ul>\n<li>40\u201360k series<\/li>\n<li>~0.5 GB\/day storage<\/li>\n<li>20\u201330 GB 
disk<\/li>\n<li>1\u20132 vCPU<\/li>\n<li>1\u20132 GB RAM<\/li>\n<\/ul>\n<p>For larger setups, such as 50\u201380 nodes and ~1,500\u20132,000 pods, consider the following:<\/p>\n<ul>\n<li>Scale the resources, such as CPU\/RAM\/disk, so that the Prometheus stack runs without issues.<\/li>\n<li>If you plan to store the metrics long term, consider <strong>Thanos, Cortex, or Mimir<\/strong> with Prometheus.<\/li>\n<\/ul>\n<h3 id=\"when-to-use-thanos-cortex-or-mimir\">When to Use Thanos, Cortex, or Mimir<\/h3>\n<p>Consider Thanos, Cortex, or Mimir with Prometheus if any of the following applies.<\/p>\n<ul>\n<li>You are expecting over <strong>1\u20132 million time series<\/strong><\/li>\n<li>You want to store the metrics longer than <strong>30\u201360 days<\/strong><\/li>\n<li>You have multiple clusters that need a single view<\/li>\n<\/ul>\n<p>In those cases, switch to <strong>Prometheus + Thanos\/Cortex\/Mimir<\/strong>.<br \/>You will get scalable, long-term, and highly available metrics.<\/p>\n<h3 id=\"final-recommendation\">Final Recommendation<\/h3>\n<p>For a <strong>small to medium<\/strong> production cluster, use the following resources for Prometheus.<\/p>\n<p><!--kg-card-begin: html--><\/p>\n<table data-start=\"4358\" data-end=\"4569\" class=\"w-fit min-w-(--thread-content-width)\">\n<thead data-start=\"4358\" data-end=\"4401\">\n<tr data-start=\"4358\" data-end=\"4401\">\n<th data-start=\"4358\" data-end=\"4369\" data-col-size=\"sm\">Resource<\/th>\n<th data-start=\"4369\" data-end=\"4401\" data-col-size=\"sm\">Recommendation (per replica)<\/th>\n<\/tr>\n<\/thead>\n<tbody data-start=\"4448\" data-end=\"4569\">\n<tr data-start=\"4448\" data-end=\"4466\">\n<td data-start=\"4448\" data-end=\"4454\" data-col-size=\"sm\">CPU<\/td>\n<td data-col-size=\"sm\" data-start=\"4454\" data-end=\"4466\">2\u20134 vCPU<\/td>\n<\/tr>\n<tr data-start=\"4467\" data-end=\"4486\">\n<td data-start=\"4467\" data-end=\"4476\" data-col-size=\"sm\">Memory<\/td>\n<td 
data-col-size=\"sm\" data-start=\"4476\" data-end=\"4486\">2\u20134 GB<\/td>\n<\/tr>\n<tr data-start=\"4487\" data-end=\"4512\">\n<td data-start=\"4487\" data-end=\"4494\" data-col-size=\"sm\">Disk<\/td>\n<td data-start=\"4494\" data-end=\"4512\" data-col-size=\"sm\">60\u201380 GB (SSD)<\/td>\n<\/tr>\n<tr data-start=\"4513\" data-end=\"4536\">\n<td data-start=\"4513\" data-end=\"4525\" data-col-size=\"sm\">Retention<\/td>\n<td data-col-size=\"sm\" data-start=\"4525\" data-end=\"4536\">15 days<\/td>\n<\/tr>\n<tr data-start=\"4537\" data-end=\"4569\">\n<td data-start=\"4537\" data-end=\"4555\" data-col-size=\"sm\">Scrape Interval<\/td>\n<td data-col-size=\"sm\" data-start=\"4555\" data-end=\"4569\">15 seconds<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!--kg-card-end: html--><\/p>\n<div class=\"kg-card kg-callout-card kg-callout-card-blue\">\n<div class=\"kg-callout-emoji\">\ud83d\udca1<\/div>\n<div class=\"kg-callout-text\">Start with these baseline resources, then observe metrics and adjust as your environment grows.<\/div>\n<\/div>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>This guide provides a basic installation of the Prometheus stack using Helm. You will need to configure it to monitor your applications or endpoints.<\/p>\n<p>We also covered additional configuration options and how to estimate Prometheus resource requirements.<\/p>\n<p>For advanced configuration or a production-level setup, you can use the <a href=\"https:\/\/devopscube.com\/setup-prometheus-operator\/\" rel=\"noreferrer noopener\">Prometheus Operator<\/a>, where every configuration is available as a Kubernetes CRD.<\/p>\n<hr>\n<p><strong>Source:<\/strong> <a href=\"https:\/\/devopscube.com\/setup-prometheus-helm-chart\/\" target=\"_blank\" rel=\"noopener noreferrer\">How to Setup Prometheus Using Helm Chart? 
&#8211; Detailed Guide \u2014 DevOpsCube<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: https:\/\/devopscube.com\/setup-prometheus-helm-chart\/<\/p>\n","protected":false},"author":1,"featured_media":455,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-454","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/454","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=454"}],"version-history":[{"count":0,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/posts\/454\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=\/wp\/v2\/media\/455"}],"wp:attachment":[{"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ngocha.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}