Azure Monitor for AKS (Container Insights)

Introduction to Azure Monitor for AKS

Monitoring is crucial for any application or infrastructure deployment, whether it runs on-premises or in the cloud. Applications and infrastructure can malfunction at any moment, so it's better to have a monitoring solution in place that lets you spot problems and take the relevant steps before such failures occur.

Azure Monitor for AKS is the monitoring solution the Azure team provides for in-depth monitoring of Azure managed Kubernetes clusters. Container monitoring is critical when you're running a production cluster at scale with multiple applications.

Azure Monitor for containers gives you performance visibility by collecting memory and processor metrics from controllers, nodes, and containers that are available in Kubernetes through the Metrics API. It also collects container logs and stores them in Azure Log Analytics.

Azure Container Monitor is now GA and ready to use in production AKS clusters!

What does Azure Container Monitor provide?

  • Identify AKS containers that are running on the node and their average processor and memory utilization. This knowledge can help you identify resource bottlenecks.
  • Identify where the container resides in a controller or a pod. This knowledge can help you view the controller’s or pod’s overall performance.
  • Review the resource utilization of workloads running on the host that are unrelated to the standard processes that support the pod.
  • Understand the behavior of the cluster under average and heaviest loads. This knowledge can help you identify capacity needs and determine the maximum load that the cluster can sustain.

Onboard Azure Container Monitoring 

Onboarding Azure Container Monitoring can be done in different ways, depending on how we deploy the AKS cluster to Azure. The best approach is to use infrastructure as code (IaC) to deploy the cluster.

Azure Portal

If the AKS cluster is deployed from the Azure portal, we can enable monitoring during cluster creation by turning on the monitoring settings.

ARM Template

It's possible to deploy an AKS cluster with an Azure ARM template and enable monitoring as part of the deployment. See the Step-By-Step Guide.

Onboarding Azure Container Monitoring for an Existing Cluster

If the AKS cluster is already deployed and we want to take advantage of monitoring, we can enable it afterwards.

Refer to the Azure documentation.
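
One way to do this for an existing cluster is the Azure CLI. The sketch below wraps the az aks enable-addons command in a small shell function. The function name enable_aks_monitoring and the placeholder values in the usage comment are assumptions made for illustration, so adjust them to your own cluster, resource group, and workspace.

```shell
#!/bin/sh
# Sketch: enable the monitoring add-on on an existing AKS cluster.
# enable_aks_monitoring is a hypothetical helper name, not part of the Azure CLI.
enable_aks_monitoring() {
    cluster_name="$1"
    resource_group="$2"
    workspace_id="$3"   # full resource ID of the Log Analytics workspace

    az aks enable-addons \
        --addons monitoring \
        --name "$cluster_name" \
        --resource-group "$resource_group" \
        --workspace-resource-id "$workspace_id"
}

# Usage (placeholders, replace before running):
# enable_aks_monitoring <CLUSTER_NAME> <RESOURCE_GROUP> \
#   "/subscriptions/<SUB_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.OperationalInsights/workspaces/<WORKSPACE_NAME>"
```

Passing the workspace resource ID explicitly keeps the cluster pointed at a workspace you control, rather than one created on your behalf.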

Azure Container Monitoring Enabled AKS Cluster

Let's walk through how to deploy an AKS cluster with monitoring enabled. For this I used Terraform templates. In a previous blog post, I explained how to deploy an AKS cluster using Terraform.

If you have gone through that previous post and want to enable Azure Container Monitoring on the same cluster, we need to add a few more resources and variables to the Terraform template. Let's go through them step by step.

Adding Azure Log Analytics Workspace

To work with Azure Container Monitoring, we first need a Log Analytics workspace. We can create one with the Terraform code below.

#Variables associated with this resource

variable "log_analytics_workspace_name" {

}
variable "log_analytics_workspace_location" {
    default = "eastus"
}
variable "log_analytics_workspace_sku" {
    default = "PerNode"
}


#Create Log Analytics Workspace
resource "azurerm_log_analytics_workspace" "aksterraform" {
    name                = "${var.log_analytics_workspace_name}"
    location            = "${var.log_analytics_workspace_location}"
    resource_group_name = "${azurerm_resource_group.k8terraform.name}"
    sku                 = "${var.log_analytics_workspace_sku}"
}

Enable Log Analytics Solution

#Enable Log Analytics Solution
resource "azurerm_log_analytics_solution" "aksterraformsolution" {
    solution_name         = "ContainerInsights"
    location              = "${azurerm_log_analytics_workspace.aksterraform.location}"
    resource_group_name   = "${azurerm_resource_group.k8terraform.name}"
    workspace_resource_id = "${azurerm_log_analytics_workspace.aksterraform.id}"
    workspace_name        = "${azurerm_log_analytics_workspace.aksterraform.name}"

    plan {
        publisher = "Microsoft"
        product   = "OMSGallery/ContainerInsights"
    }
}

Add the Container Monitoring agent for K8s Cluster Nodes

The following addon_profile block goes inside the azurerm_kubernetes_cluster resource.

    addon_profile{
        oms_agent{
            enabled = true
            log_analytics_workspace_id = "${azurerm_log_analytics_workspace.aksterraform.id}"
        }
    }

Add the variables to the <Name>.tfvars file.

log_analytics_workspace_name = "aksterraform"

log_analytics_workspace_location = "eastus"

log_analytics_workspace_sku = "PerNode"

Next, add the outputs to the .tf file. We use outputs to expose information or configuration data for our own use. Outputs are optional, but they are useful in some scenarios.

#Outputs
output "client_key" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.client_key}"
}

output "client_certificate" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.client_certificate}"
}

output "cluster_ca_certificate" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.cluster_ca_certificate}"
}

output "cluster_username" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.username}"
}

output "cluster_password" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.password}"
}

output "kube_config" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config_raw}"
}

output "host" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.host}"
}
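
Once the template has been applied, the kube_config output can be written to a file and used by kubectl. The following is a minimal sketch of that step. The helper name save_kubeconfig and the default file path are assumptions for illustration. Newer Terraform releases quote string outputs by default, so there you would likely need terraform output -raw kube_config instead.

```shell
#!/bin/sh
# Sketch: save the kube_config Terraform output to a file for kubectl.
# save_kubeconfig is a hypothetical helper; the default path is an assumption.
save_kubeconfig() {
    target="${1:-./kubeconfig}"

    # Write the raw kubeconfig produced by the "kube_config" output.
    terraform output kube_config > "$target" || return 1

    # The file contains cluster credentials, so restrict access to the owner.
    chmod 600 "$target"
    echo "$target"
}

# Usage:
# export KUBECONFIG="$(save_kubeconfig ./kubeconfig)"
# kubectl get nodes
```

Keep in mind that these outputs contain credentials, so treat both the output file and the Terraform state as secrets.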

The following is the full template for the .tf file and the .tfvars file.

#.tfvars

arm_subscription_id = "SUB_ID"

arm_client_id = "CLIENT_ID"

arm_client_secret = "CLIENT_SECRET"

arm_tenent_id = "Tenent_ID"

resource_group_name = "k8terraform"

log_analytics_workspace_name = "aksterraform"

log_analytics_workspace_location = "eastus"

log_analytics_workspace_sku = "PerNode"

location = "East US"

cluster_name = "k8terraform"

dns_prifix = "k8terraform1232"

ssh_public_key = "E:\\DevOps\\Terraform\\Azure\\AKS\\aksdeploy"

agent_count = 3

# .tf

#Variable
variable "arm_subscription_id" {
}

variable "arm_client_id" {
}

variable "arm_client_secret" {
}

variable "arm_tenent_id" {
}

variable "location" {
}

variable "cluster_name" {
}

variable "dns_prifix" {
}

variable "ssh_public_key" {
}

variable "agent_count" {
    default = 3
}

variable "resource_group_name" {
}

variable "log_analytics_workspace_name" {

}
variable "log_analytics_workspace_location" {
    default = "eastus"
}
variable "log_analytics_workspace_sku" {
    default = "PerNode"
}



#Add Azure Provider
provider "azurerm" {
}

#Create Resource Group
resource "azurerm_resource_group" "k8terraform" {
    name = "${var.resource_group_name}"
    location = "${var.location}"
}

#Create Log Analytics Workspace
resource "azurerm_log_analytics_workspace" "aksterraform" {
    name                = "${var.log_analytics_workspace_name}"
    location            = "${var.log_analytics_workspace_location}"
    resource_group_name = "${azurerm_resource_group.k8terraform.name}"
    sku                 = "${var.log_analytics_workspace_sku}"
}

#Enable Log Analytics Solution
resource "azurerm_log_analytics_solution" "aksterraformsolution" {
    solution_name         = "ContainerInsights"
    location              = "${azurerm_log_analytics_workspace.aksterraform.location}"
    resource_group_name   = "${azurerm_resource_group.k8terraform.name}"
    workspace_resource_id = "${azurerm_log_analytics_workspace.aksterraform.id}"
    workspace_name        = "${azurerm_log_analytics_workspace.aksterraform.name}"

    plan {
        publisher = "Microsoft"
        product   = "OMSGallery/ContainerInsights"
    }
}

#Create AKS Cluster
resource "azurerm_kubernetes_cluster" "k8cluster" {
    name = "${var.cluster_name}"
    location = "${azurerm_resource_group.k8terraform.location}"
    resource_group_name = "${azurerm_resource_group.k8terraform.name}"
    dns_prefix = "${var.dns_prifix}"

    linux_profile{
        admin_username = "localadmin"
        ssh_key{
            key_data = "${file("${var.ssh_public_key}")}"
        }
    }

    agent_pool_profile{
        name = "aksterraform"
        count = "${var.agent_count}"
        vm_size = "Standard_B2ms"
        os_type = "Linux"
        os_disk_size_gb = 30
    }
    addon_profile{
        oms_agent{
            enabled = true
            log_analytics_workspace_id = "${azurerm_log_analytics_workspace.aksterraform.id}"
        }
    }

    service_principal{
        client_id = "${var.arm_client_id}"
        client_secret = "${var.arm_client_secret}"
    }
    tags{
        Environment = "Development"
    }
}

#Outputs
output "client_key" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.client_key}"
}

output "client_certificate" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.client_certificate}"
}

output "cluster_ca_certificate" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.cluster_ca_certificate}"
}

output "cluster_username" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.username}"
}

output "cluster_password" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.password}"
}

output "kube_config" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config_raw}"
}

output "host" {
    value = "${azurerm_kubernetes_cluster.k8cluster.kube_config.0.host}"
}
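
With both files in place, the deployment itself follows the usual Terraform workflow. The sketch below shows one way to run it. The function name deploy_cluster and the plan file name aks.tfplan are assumptions for illustration, and the .tfvars file name is whatever you chose for <Name>.tfvars.

```shell
#!/bin/sh
# Sketch: initialise, plan, and apply the template with a variables file.
# deploy_cluster and aks.tfplan are hypothetical names.
deploy_cluster() {
    var_file="$1"

    # Download the azurerm provider and prepare the working directory.
    terraform init &&
    # Save the plan so that apply runs exactly what was reviewed.
    terraform plan -var-file="$var_file" -out=aks.tfplan &&
    terraform apply aks.tfplan
}

# Usage:
# deploy_cluster <Name>.tfvars
```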

After the AKS and Container Monitor deployment, it can take around 20 minutes before metrics and logs are available for analysis. So be patient if you don't see any metrics in the Container Monitor dashboard right away.

We can verify that the agent (the omsagent DaemonSet) has been deployed to the cluster by running the following command.

kubectl get ds omsagent --namespace=kube-system

NAME       DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
omsagent   3         3         3         3            3           beta.kubernetes.io/os=linux   21d
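
If you want to script this check instead of reading the table by eye, something like the sketch below could work. It reads the desired and ready pod counts from the DaemonSet status and succeeds only when they match. The function name omsagent_ready is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: succeed only when every desired omsagent pod is ready.
# omsagent_ready is a hypothetical helper name.
omsagent_ready() {
    # Prints "<desired> <ready>", for example "3 3".
    status=$(kubectl get ds omsagent --namespace=kube-system \
        -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') || return 1

    desired=${status%% *}   # text before the first space
    ready=${status##* }     # text after the last space
    echo "omsagent: $ready of $desired pods ready"

    [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" -eq "$ready" ]
}

# Usage:
# omsagent_ready && echo "agent is running on all nodes"
```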

Navigate to AKS Monitoring 

There are a few ways to open Container Insights. One is through Azure Monitor; the other is to navigate to Insights inside the AKS cluster blade.

Cluster Wide Metrics Monitoring

Cluster-wide metrics are important to have when we run a large deployment, because the cluster is more likely to run out of resources. It's therefore important to watch CPU, memory, node status over time, and pod count.

Cluster Wide Metrics 
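
For a quick command-line look at similar node-level numbers, kubectl top reads from the same Kubernetes Metrics API. The sketch below lists nodes whose CPU usage is at or above a threshold. It assumes the metrics server is running in the cluster and that kubectl top nodes prints the CPU percentage in its third column. The function name busy_nodes and the default threshold of 80 are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: print nodes whose CPU usage is at or above a percentage threshold.
# busy_nodes is a hypothetical helper; column 3 is assumed to be CPU%.
busy_nodes() {
    threshold="${1:-80}"

    kubectl top nodes |
        awk -v t="$threshold" 'NR > 1 {
            cpu = $3
            sub(/%/, "", cpu)              # strip the percent sign
            if (cpu + 0 >= t + 0) print $1, $3
        }'
}

# Usage:
# busy_nodes 90
```

This is only a point-in-time snapshot; the Container Insights charts remain the place to see trends over a time period.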

Drill down further into AKS Cluster

You can drill down to the performance grid view, which shows the health and performance of your nodes, controllers, and containers.

Performance Grid View

The performance grid view shows how much CPU capacity each container used. Each bar represents 15 minutes. This is very useful for measuring traffic or identifying faulty containers.

Multi-cluster View

It's common to have multiple AKS clusters to manage, and monitoring them all in one place is helpful to administrators. The multi-cluster view discovers all AKS clusters across subscriptions, resource groups, and workspaces, and provides a health roll-up view.

Live Debugging Log Output 

With live logs you get a real time, live stream of your container logs directly in your Azure portal. You can pause the live stream and search within the log file for errors or issues.
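
A rough command-line counterpart, for when you are already in a terminal, is to follow a pod's logs with kubectl and filter for errors. This is a sketch of that idea, not what the portal does internally. The function name follow_errors, the default namespace, and the search word are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: stream a pod's logs and show only lines that mention "error".
# follow_errors is a hypothetical helper name.
follow_errors() {
    pod="$1"
    namespace="${2:-default}"

    # Start from the last 100 lines, keep streaming, match case-insensitively.
    kubectl logs --follow --tail=100 "$pod" --namespace="$namespace" |
        grep -i 'error'
}

# Usage (press Ctrl+C to stop the stream):
# follow_errors <POD_NAME> <NAMESPACE>
```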

To learn more about Azure Monitor for containers, read our documentation, “Azure Monitor for containers overview.”

Thank You & Happy Monitoring !!