ADR 007: Remove Admin Load Balancer Failover Lambda - TBC
Status
✅ Accepted
Context
A Lambda function (ise_tacacs_https_failover.py) was introduced to manage failover for Cisco ISE Admin traffic behind the AWS Load Balancer.
The intent was to programmatically determine the active Primary PAN and update load balancer target registration accordingly.
Testing of ise_tacacs_https_failover.py confirmed the following:
- When the primary node is down, the secondary node's API takes just under 120 seconds to respond
- The Lambda timeout had to be increased to approximately 2 minutes for the function to complete successfully
- The EventBridge schedule runs every minute, so with each invocation taking up to approximately 2 minutes during a failure, consecutive Lambda invocations would overlap
- The secondary node continues to report the primary node as the active PAN in its API response when the primary is down
- Cisco confirmed this is currently expected behaviour and not considered a software defect
These findings create two key issues:
- The API response time is too slow for effective failover routing
- The API response cannot be relied upon to accurately determine the active primary PAN during failure conditions
As a result, the Lambda-based failover approach does not provide a reliable or operationally suitable mechanism for Admin traffic routing.
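The overlap described in the testing findings can be estimated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, 119 seconds stands in for "just under 120 seconds", and it assumes no concurrency limit is set on the function.

```python
import math


def max_concurrent_invocations(duration_s: float, schedule_interval_s: float) -> int:
    """Upper bound on invocations running at once when each run lasts
    duration_s and a new run starts every schedule_interval_s."""
    return math.ceil(duration_s / schedule_interval_s)


# Figures from the testing notes: runs of just under 120 s, schedule of 60 s.
# 119 is an assumed stand-in for "just under 120 seconds".
print(max_concurrent_invocations(119, 60))  # 2
```

With the primary node down, a second invocation would therefore start while the first is still waiting on the secondary node's API.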
Decision
The Lambda-based Admin failover logic implemented by ise_tacacs_https_failover.py will be removed.
Admin traffic routing will instead rely on the AWS Load Balancer's native health checks to determine node availability.
No custom Lambda logic will be used to identify the active primary PAN for Admin traffic.
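For comparison with the roughly 120-second API response observed in testing, the time a health check takes to mark a node unhealthy can be estimated as the check interval multiplied by the number of consecutive failures required. This is a rough sketch: the interval and threshold values below are hypothetical and not taken from the deployed target group, and the estimate ignores the per-check timeout.

```python
def detection_time_s(interval_s: int, unhealthy_threshold: int) -> int:
    """Rough time for a target to be marked unhealthy: the number of
    consecutive failed checks required, times the interval between checks."""
    return interval_s * unhealthy_threshold


# Hypothetical settings for illustration only.
HEALTH_CHECK_INTERVAL_S = 10
UNHEALTHY_THRESHOLD = 3

print(detection_time_s(HEALTH_CHECK_INTERVAL_S, UNHEALTHY_THRESHOLD))  # 30
```

Under these assumed settings, a failed node would be taken out of rotation in about 30 seconds, without depending on the ISE API response.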