In this article I would like to focus on virtual machines, in particular Cisco ISE virtual machines running on VMware. I will explain why virtual ISE deployments DO NOT support snapshots as well as the potential issues that you could face if snapshots are enabled.
So what is a snapshot?
A snapshot captures the state of a virtual machine's disk file (.vmdk) at a particular point in time. VMware allows you to take manual snapshots of a virtual machine, or to schedule snapshots so they are taken automatically at a specific time. Snapshots are useful when an operational device is rendered useless for whatever reason and you would like to restore it to a working state.
So why doesn't Cisco ISE support snapshots?
Cisco ISE comes with its own backup and restore utilities. More importantly, Cisco ISE doesn't support snapshots because the data within the nodes is constantly changing and being synchronised with the database.
What happens if snapshots are taken of ISE nodes?
If snapshots are taken of ISE nodes, the nodes will freeze and cause services to stop. To resume services, a reboot of the affected ISE node will be required.
Snapshots can seriously affect the deployment, to the point where the ISE database becomes corrupted and the node has to be completely reinstalled. I've also seen cases where an ISE node becomes corrupted and, even after an application reset, still doesn't work as it should.
I don't have access to the VMware environment, so how would I know whether snapshots are affecting my ISE deployment?
If the VMware infrastructure is managed by a third party, more often than not you won't have access to the back-end environment. This can make troubleshooting virtual ISE instances challenging, especially if you need to establish whether snapshots are the root cause of issues within your ISE deployment.
Nevertheless, we can often diagnose the issue from ISE. So if you find that you are troubleshooting a potential snapshot issue, take a look at the following points that have been observed on virtual ISE deployments when snapshots are enabled.
The ISE node is still reachable via ping; however, you cannot log in via SSH
AAA requests to ISE PSNs are failing
The GUI of the ISE node times out when you try to access it
You can access the primary PAN, but some of the nodes are shown as offline when you check the deployment status
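The first and third symptoms above can be roughly checked from any machine that can reach the node. The following is a minimal Python sketch, not a Cisco tool: the helper names `tcp_port_open` and `looks_like_frozen_node` are my own, and it assumes SSH is on TCP 22 and the GUI on TCP 443. A successful TCP connection is only a rough proxy for "the service is responding", so treat the result as a hint rather than a diagnosis.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False


def looks_like_frozen_node(ping_ok: bool, ssh_ok: bool, gui_ok: bool) -> bool:
    """Heuristic based on the symptoms listed above (an assumption, not a
    Cisco-defined test): the node answers ping, but neither SSH nor the GUI
    responds."""
    return ping_ok and not ssh_ok and not gui_ok


# Example usage (hostname is hypothetical; ping is checked separately):
#   host = "ise-psn-01.example.com"
#   ssh_ok = tcp_port_open(host, 22)
#   gui_ok = tcp_port_open(host, 443)
#   print(looks_like_frozen_node(ping_ok=True, ssh_ok=ssh_ok, gui_ok=gui_ok))
```

If the heuristic returns True for a node shortly after a scheduled snapshot window, that is a good reason to ask the VMware team whether a snapshot was taken of that node.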
How can I maintain backups while ensuring snapshots don't affect my ISE deployment?
Ensure automatic snapshots are disabled
Ensure the relevant teams are aware that snapshots shouldn't be taken of ISE nodes
Configure scheduled backups within ISE
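If the team that manages VMware can give you an export of the VM inventory, the first two points can be audited with a few lines of Python. This is only a sketch under stated assumptions: the inventory format (a list of dictionaries with `name` and `snapshot_count` keys) is hypothetical, as is the idea that ISE nodes can be recognised by "ise" appearing in the VM name. Adapt both to whatever your environment actually provides.

```python
def ise_vms_with_snapshots(inventory, name_hint="ise"):
    """Return the names of VMs that look like ISE nodes and have snapshots.

    `inventory` is a hypothetical export: a list of dicts such as
    {"name": "ISE-PAN-01", "snapshot_count": 2}. A missing
    "snapshot_count" key is treated as zero snapshots.
    """
    flagged = []
    for vm in inventory:
        is_ise = name_hint.lower() in vm["name"].lower()
        has_snapshots = vm.get("snapshot_count", 0) > 0
        if is_ise and has_snapshots:
            flagged.append(vm["name"])
    return flagged


# Example with made-up data:
sample_inventory = [
    {"name": "ISE-PAN-01", "snapshot_count": 2},
    {"name": "ISE-PSN-01", "snapshot_count": 0},
    {"name": "file-server-01", "snapshot_count": 5},
]
print(ise_vms_with_snapshots(sample_inventory))
```

Any VM name this returns is a node whose snapshots should be removed by the VMware team and excluded from future snapshot schedules.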