Automation and Smart Card Protected SSH
Ablative makes heavy use of Ansible. This allows us to push changes to servers quickly and ensure that all instances are hardened appropriately. By following a secure software development lifecycle, we can also automate all interaction with customer servers, meaning no human (beyond the customer themselves) ever has access to a server.
One of the biggest issues with automation is protecting the credentials that grant access to the customer servers. We’ve seen APTs target MSPs and we can’t ignore the possibility that the UK Government can interfere with our equipment or even seize our infrastructure.
We can’t protect the SSH keys with passwords as that requires a human to interact with the session.
So we turned to HSMs, namely the NitroKey HSM 2.
Hardware Security Modules
The primary benefit of an HSM is that the private key material is stored in an area of memory that prevents extraction. This security mechanism is similar to the ‘secure enclave’ in Apple products that annoys the FBI et al.
Using an HSM we can securely generate the private keys that allow Ansible to access servers without the risk that someone could steal the keys from ~/.ssh/.
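As a rough illustration of why nothing sensitive needs to live in ~/.ssh/, the sketch below points SSH at an agent socket instead of a key file. It assumes the HSM-backed key is offered through gpg-agent with SSH support enabled. The helper name `agent_ssh_env` is hypothetical, and the setup on a real automation host may differ.

```python
import os
import subprocess


def agent_ssh_env(base_env=None, run=subprocess.run):
    """Return a copy of the environment with SSH_AUTH_SOCK set to gpg-agent's SSH socket.

    Assumption: gpg-agent is running with SSH support enabled and fronts the HSM.
    """
    # gpgconf reports where the agent's SSH-compatible socket lives.
    result = run(
        ["gpgconf", "--list-dirs", "agent-ssh-socket"],
        capture_output=True,
        text=True,
        check=True,
    )
    env = dict(os.environ if base_env is None else base_env)
    env["SSH_AUTH_SOCK"] = result.stdout.strip()
    return env
```

A process started with this environment, for example via `subprocess.run(..., env=agent_ssh_env())`, asks the agent to sign on its behalf, so no private key file is ever read from disk.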
The HSM is exposed via GPG agent, so in theory someone with a shell on the automation hosts could still use the key. To thwart this we have several layered defences:
- No direct inbound connectivity
- Automation hosts use ansible-pull
- Tripwire logging to Splunk
- Syslog pushing to Splunk
- Signed git commits
Layered Defence
No direct inbound connectivity
Our automation servers are physically separate from the rest of the Ablative infrastructure and use both physical and host-based firewalls to prevent inbound access via normal means.
These servers do allow SSH access but only via an authenticated v3 .onion address.
If the servers become unreachable, someone from the team must travel to the location and log in locally.
Automation hosts use ansible-pull
ansible-pull inverts the usual paradigm where the management host connects over SSH to targets and runs the Ansible playbook. Instead, the host pulls a copy of the playbooks and executes them locally. It can also leverage verify-commit (see ‘Signed git commits’ below) to ensure the code is authentic.
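To make the pull model concrete, here is a minimal sketch that assembles an ansible-pull invocation with signature checking of the checked-out commit turned on. The repository URL, branch name, and helper name `build_ansible_pull_command` are placeholders invented for illustration. Only the ansible-pull flags themselves are real.

```python
import shlex


def build_ansible_pull_command(repo_url, playbook="local.yml", branch="main", workdir=None):
    """Build the argv for an ansible-pull run that verifies the checked-out commit."""
    argv = [
        "ansible-pull",
        "--url", repo_url,        # repository holding the playbooks
        "--checkout", branch,     # branch, tag, or commit to check out
        "--verify-commit",        # abort if the checked-out commit's GPG signature fails
    ]
    if workdir:
        argv += ["--directory", workdir]  # where the repository is checked out
    argv.append(playbook)
    return argv


if __name__ == "__main__":
    # Hypothetical URL; prints the command rather than running it.
    print(shlex.join(build_ansible_pull_command("https://git.example.invalid/automation.git")))
```

Such a command would typically be run from a scheduler such as cron or a systemd timer on the host itself, which is why no inbound management connection is needed.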
Tripwire logging to Splunk
Tripwire is a software suite that monitors a host for any unauthorized changes to the filesystem. If any files are changed (whether by ansible-pull or not), a notification is sent to Splunk, which in turn alerts the SOC that something untoward is happening.
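The general idea behind file integrity monitoring can be sketched in a few lines: record a hash of every file, then compare a later snapshot against that baseline. This is an illustration of the concept only, not how Tripwire is implemented, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each regular file under root to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            relative = str(path.relative_to(root))
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(baseline, current):
    """Return (added, removed, modified) lists of paths between two snapshots."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        path for path in set(baseline) & set(current)
        if baseline[path] != current[path]
    )
    return added, removed, modified
```

Any non-empty result from `diff_snapshots` is the kind of event that would be forwarded to a log collector for alerting.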
Syslog pushing to Splunk
Syslog gathers a lot of the day-to-day activity of the server (daemon lifecycle, crontabs, etc.) alongside important events such as a user logging in or a sudo/doas command being issued.
These events are relayed to Splunk, which can evaluate them and, if necessary, raise an alert to the SOC.
Signed git commits
The automation hosts perform a git pull before they perform their scheduled actions. If any new commits have been added since they last ran, they cycle through the commits with git verify-commit <hash>.
If any of the commits fail GPG verification, the scheduled tasks fail and an alert is raised to the SOC.
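The check described above can be sketched as follows. This is a minimal illustration, not the actual tooling: the function names are hypothetical, and it assumes the previous and new HEAD hashes are recorded around the git pull.

```python
import subprocess


def new_commits(old_head, new_head, run=subprocess.run):
    """List commits reachable from new_head but not from old_head, oldest first."""
    result = run(
        ["git", "rev-list", "--reverse", f"{old_head}..{new_head}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def unverified_commits(commits, run=subprocess.run):
    """Return the commits for which `git verify-commit` exits non-zero."""
    failed = []
    for commit in commits:
        # verify-commit fails when the signature is missing or cannot be verified.
        result = run(["git", "verify-commit", commit], capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(commit)
    return failed
```

A scheduled job would refuse to continue, and raise an alert, whenever `unverified_commits(new_commits(old, new))` returns anything. Note that verification only succeeds against public keys present in the host's GPG keyring, so that keyring should contain only the keys of authorised committers.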
Physical Seizure
The last weakness in our model is the possibility that the automation servers and their HSMs are seized as part of a Search and Seizure or Equipment Interference Warrant.
All servers utilise the Brass Horn Communications S53 hardware. If the server is moved, a USB device is plugged in, or any of a variety of other conditions is met, the s53 daemon will lock the HSM, destroy any Full Disk Encryption elements and, if necessary, shut the machine down.
We’ve gone to some effort to ensure that we can automate the management of customer servers without risking access by unauthorized parties. If this sort of security is what you’re looking for in a web host, drop by our website or the .onion and grab some shared hosting or a MultiHop VPS.