Cisco ISE Production Deployment on AWS

Purpose & Background

Cisco Identity Services Engine (ISE) provides network access control (NAC), RADIUS/TACACS authentication, authorization, accounting, and policy enforcement services.

This document describes the Production deployment of Cisco ISE on AWS, including:

  • infrastructure components
  • failover mechanism
  • health checks
  • operational guidance

High-Level Architecture

Two Cisco ISE nodes are deployed on EC2 forming a TACACS cluster across two availability zones.

RADIUS/TACACS traffic health is actively managed by a UDP Lambda probe, which performs a real RADIUS check on port 1812 and dynamically registers or deregisters nodes from the NLB target groups.

Failover of admin traffic on TCP/443 is not automated via Lambda. Instead, the Admin NLB has exactly one node registered at a time, and the active node is selected through Terraform.


Components

VPC

Network Operations Centre VPC

  • vpc-035cbba5e5264a6bc

EC2 Instances

Deployed across two Availability Zones.

TACACS Cluster

Two EC2 instances are used:

TACACS1

  • Instance name: MOJ-NS-NOC-ISE-TACACS001
  • Private IP: 172.27.3.63
  • Subnet: subnet-0e2b800aa0df1c3f0 (Private1-az1)

TACACS2

  • Instance name: MOJ-NS-NOC-ISE-TACACS002
  • Private IP: 172.27.4.63
  • Subnet: subnet-009e4b1fc6fb32cec (Private1-az2)

Key Pair

Stored in 1Password:

  • Vault: Network Operation Team - NOC
  • Item: CISCO ISE EC2 Prod KeyPair moj-aws-noc-ise-kp

Elastic Load Balancer

Network Load Balancers are deployed across two AZs with source IP stickiness enabled.

Subnets

  • subnet-0e2b800aa0df1c3f0 (Private1-az1)
  • subnet-009e4b1fc6fb32cec (Private1-az2)

Traffic Types

TACACS / RADIUS Ports

ISE-TACACS-NETWORK-NLB

  • Handles authentication traffic.

Port   Protocol   Purpose
49     TCP        TACACS
1812   UDP        RADIUS Auth
1813   UDP        RADIUS Accounting

Target health is controlled by:

  • AWS NLB health checks
  • Custom UDP Lambda probe

Admin Port

ISE-TACACS-ADMIN-NLB

  • Handles GUI traffic on TCP/443

Only one node per cluster is registered at a time.

Normal state:

ISE-TACACS-443 172.27.3.63

Failover state:

ISE-TACACS-443 172.27.4.63

Failover is controlled through Terraform configuration, not Lambda.


AWS Transfer Family and S3

AWS Transfer Family SFTP is configured to send backup files from Cisco ISE to Amazon S3.

To use AWS Transfer Family with SFTP, each user must authenticate with an SSH key pair. The LAN team generates the key pair and uploads the public key into Transfer Family.

ISE Transfer Family

S3 bucket for TACACS backups

tacacs1 = { home_directory = "/moj-noc-prod-ise-backup/tacacs" }

Secrets Manager

Two AWS Secrets Manager secrets are used:

ise_basic_auth

Basic authentication credentials used by the TACACS failover Lambda when calling the Cisco ISE API.

ise_shared_secret

Shared secret used by the UDP health-check Lambda for RADIUS probe requests.

Route53

Two alias records are created for HTTPS traffic routing to the Network Load Balancer.

ISE Route53

VPC Endpoints

VPC Endpoints allow the Lambda functions to communicate with AWS APIs without using the public internet.

ISE Transfer Family

Lambda

Lambda Source Files

lambda_files = {
  tacacs_udp = "ise_tacacs_udp_failover.py"
}

EventBridge Scheduler

Lambda functions are triggered every minute.

Previous Failover Automation Design - Admin (443) Lambda for ISE-TACACS-ADMIN-NLB target group

Originally, admin failover was designed to be handled by the Lambda function:

ise_tacacs_https_failover.py

This function queried the Cisco ISE API to determine which node was PrimaryAdmin, then dynamically registered the correct node.

Design Decision

Due to the limitations described in "Remove Admin Load Balancer Failover Lambda", the Lambda-based admin failover logic was removed.

Admin traffic failover is now handled by:

  • AWS NLB TCP health checks
  • Terraform-controlled active target

Health Check - UDP Lambda for ISE-TACACS-NETWORK-NLB target group

This Lambda acts as a custom health checker for the two TACACS nodes. It actively tests whether each node responds on UDP/1812 (RADIUS authentication) and then updates two load balancer target groups so that only healthy nodes receive traffic.

Native NLB health checks support only TCP, HTTP, and HTTPS, so they cannot perform a real RADIUS validation on UDP/1812.

TCP reachability alone does not confirm that the RADIUS service is responding correctly, so this Lambda performs an application-level check instead.

Function Overview

Fetch RADIUS Shared Secret

Retrieve the RADIUS shared secret from AWS Secrets Manager.
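
A minimal sketch of this step, assuming the secret ise_shared_secret is stored as a plain string rather than JSON (the document names the secret but not its format). The function name and the injected client parameter are illustrative and are not taken from the Lambda source.

```python
def fetch_shared_secret(client, secret_id="ise_shared_secret"):
    """Return the RADIUS shared secret as bytes.

    `client` is a boto3 Secrets Manager client, for example
    boto3.client("secretsmanager"). It is passed in so the function can be
    exercised with a stub outside AWS.
    Assumption: the secret is a plain SecretString, not a JSON document.
    """
    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"].encode("utf-8")
```

Inside the Lambda, the call would look like fetch_shared_secret(boto3.client("secretsmanager")), with the request going through the VPC endpoint rather than the public internet.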

Health Check

Probe port 1812 with a dummy Access-Request using radtest.

Treat any valid reply as a successful response:

  • Access-Accept
  • Access-Reject

Health Rules

  • Node is healthy if port 1812 responds
  • Node is unhealthy if port 1812 does not respond
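
The probe and health rules above can be sketched with the Python standard library alone. This is an illustration of what a radtest-style check does on the wire, not the Lambda's actual code: it builds a RADIUS Access-Request by hand (RFC 2865 packet layout) instead of invoking radtest. The username and password values are hypothetical dummies, and treating "valid" as "Response Authenticator verifies against the shared secret" is an assumption.

```python
import hashlib
import os
import socket
import struct

ACCESS_REQUEST, ACCESS_ACCEPT, ACCESS_REJECT = 1, 2, 3


def hide_password(password, secret, authenticator):
    """Obfuscate User-Password as in RFC 2865 section 5.2."""
    padded = password + b"\x00" * (-len(password) % 16)
    if not padded:
        padded = b"\x00" * 16
    result, previous = b"", authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        result += block
        previous = block
    return result


def attribute(attr_type, value):
    """Encode one attribute as type, length, value."""
    return bytes([attr_type, len(value) + 2]) + value


def build_access_request(secret, identifier, authenticator,
                         username=b"healthcheck", password=b"dummy"):
    """Build an Access-Request; username and password are dummy values."""
    attrs = attribute(1, username)  # 1 = User-Name
    attrs += attribute(2, hide_password(password, secret, authenticator))  # 2 = User-Password
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs


def is_valid_reply(reply, secret, identifier, request_authenticator):
    """True for an Access-Accept or Access-Reject signed with `secret`."""
    if len(reply) < 20:
        return False
    code, reply_id, length = struct.unpack("!BBH", reply[:4])
    if code not in (ACCESS_ACCEPT, ACCESS_REJECT) or reply_id != identifier:
        return False
    if length < 20 or length > len(reply):
        return False
    expected = hashlib.md5(
        reply[:4] + request_authenticator + reply[20:length] + secret
    ).digest()
    return reply[4:20] == expected


def probe(host, secret, port=1812, timeout=2.0):
    """Healthy (True) if the node sends a valid reply before the timeout."""
    identifier = os.urandom(1)[0]
    authenticator = os.urandom(16)
    packet = build_access_request(secret, identifier, authenticator)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and ICMP port-unreachable errors
            return False
    return is_valid_reply(reply, secret, identifier, authenticator)
```

Because an Access-Reject also counts as healthy, the dummy credentials do not need to exist in ISE; the check only proves that the RADIUS service is answering and shares the same secret.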

Target Group Registration

Desired state:

Nodes that successfully pass the 1812 health check.

Apply the same desired node set to both target groups:

  1812
  1813
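
As an illustration, the registration step could be sketched as below. It assumes IP-type target groups and a boto3 elbv2 client passed in by the caller; the function names and the (ARN, port) pairing are hypothetical and not taken from the Lambda source.

```python
def sync_target_group(elbv2, target_group_arn, port, desired_ips):
    """Register or deregister IP targets so the group matches desired_ips.

    `elbv2` is a boto3 client created with boto3.client("elbv2").
    Assumption: the target groups use IP targets, not instance IDs.
    """
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    # Targets that are already draining count as not registered.
    current = {
        d["Target"]["Id"]
        for d in health["TargetHealthDescriptions"]
        if d["TargetHealth"]["State"] != "draining"
    }
    to_add = sorted(set(desired_ips) - current)
    to_remove = sorted(current - set(desired_ips))
    if to_add:
        elbv2.register_targets(
            TargetGroupArn=target_group_arn,
            Targets=[{"Id": ip, "Port": port} for ip in to_add],
        )
    if to_remove:
        elbv2.deregister_targets(
            TargetGroupArn=target_group_arn,
            Targets=[{"Id": ip, "Port": port} for ip in to_remove],
        )
    return to_add, to_remove


def sync_all(elbv2, target_groups, desired_ips):
    """Apply the same desired node set to every (arn, port) pair."""
    return {
        arn: sync_target_group(elbv2, arn, port, desired_ips)
        for arn, port in target_groups
    }
```

Computing the difference first keeps each run idempotent: when the target groups already match the probe results, no register or deregister call is made.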

Failure Behaviour

  • If both nodes fail

    • Log an ALERT
    • Do not modify existing target groups
  • If a node starts responding again on 1812

    • It will be added back to both target groups during the next Lambda run
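
The failure behaviour can be expressed as a small guard that runs before any target group change. This is a sketch only; the function name, logger name, and the shape of probe_results (IP address mapped to a boolean) are assumptions.

```python
import logging

logger = logging.getLogger("ise_tacacs_udp_failover")


def desired_targets(probe_results):
    """Return the set of healthy IPs, or None when nothing should change.

    `probe_results` maps each node IP to True (replied on 1812) or False.
    """
    healthy = {ip for ip, ok in probe_results.items() if ok}
    if not healthy:
        # Both nodes failed: alert and leave the target groups untouched so
        # the NLB is never left without registered targets.
        logger.error("ALERT: no ISE node responded on UDP/1812; "
                     "target groups left unchanged")
        return None
    return healthy
```

A node that recovers simply reappears in the returned set on the next scheduled run, which is how it gets re-registered in both target groups.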

Operational Guidance

Normal Operations

  • Admin NLB should have only one registered target.

  • RADIUS/TACACS traffic dynamically adjusts based on probe results. If a node becomes unreachable, it will be temporarily deregistered and automatically re-added once healthy.

Admin NLB Failover Procedure

Admin NLB ISE-TACACS-ADMIN-NLB failover is performed through Terraform.

Terraform variable: admin_failover_to_secondary

Normal State

admin_failover_to_secondary = false

Active node: 172.27.3.63

Failover to Secondary

Change Terraform configuration:

admin_failover_to_secondary = true

Run pipeline / Terraform apply.

Terraform will:

deregister 172.27.3.63
register   172.27.4.63

Failback

Set:

admin_failover_to_secondary = false

Apply Terraform again.
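
The selection itself lives in Terraform (HCL). The Python sketch below only models the logic described above so that the result of an apply can be sanity-checked; the function names are hypothetical, and the IP addresses are the ones listed in this document.

```python
PRIMARY_ADMIN_IP = "172.27.3.63"    # TACACS1
SECONDARY_ADMIN_IP = "172.27.4.63"  # TACACS2


def admin_target(admin_failover_to_secondary):
    """Return the IP that should be the only Admin NLB target."""
    if admin_failover_to_secondary:
        return SECONDARY_ADMIN_IP
    return PRIMARY_ADMIN_IP


def admin_nlb_is_consistent(registered_ips, admin_failover_to_secondary):
    """True when exactly the expected node is registered, and nothing else."""
    expected = admin_target(admin_failover_to_secondary)
    return list(registered_ips) == [expected]
```

After a pipeline run, the list of registered targets for ISE-TACACS-443 (from the console or the AWS CLI) can be compared against admin_target for the value of admin_failover_to_secondary that was applied.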


Troubleshooting

UDP probe failing

Check the following:

  • RADIUS shared secret configuration
  • NLB target deregistration events
  • Security group rules
  • Cisco ISE service status

Persistent deregistration

If both nodes appear unhealthy:

  • Validate connectivity to both nodes
  • Verify credentials and shared secrets

Monitoring and Alerting

CloudWatch log groups are used for Lambda logging.

The current logging level is set to:

LOG_LEVEL=INFO

To enable more detailed troubleshooting logs, change the environment variable to:

LOG_LEVEL=DEBUG
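
How the Lambda reads this variable is not shown in this document. A common pattern, given here as an assumption rather than the actual implementation, is to map LOG_LEVEL onto the root logger at start-up and fall back to INFO for missing or unrecognised values.

```python
import logging
import os


def configure_logging(environ=os.environ):
    """Set the root logger level from LOG_LEVEL, defaulting to INFO."""
    level_name = environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unrecognised value: keep the default
    root = logging.getLogger()
    root.setLevel(level)
    return root
```

With this pattern, changing the environment variable on the function takes effect on the next cold start without a code change.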

CloudWatch Log Groups

tacacs_udp = "/aws/lambda/ise-tacacs-udp-failover"

This page was last reviewed on 16 March 2026. It needs to be reviewed again on 16 September 2026 by the page owner #nvvs-devops.