Automation and Smart Card Protected SSH

Ablative makes heavy use of Ansible. This allows us to push changes to servers quickly and ensure that all instances are hardened appropriately. By following a secure software development lifecycle we can also automate all interaction with customer servers, meaning no human (beyond the customers themselves) ever has access to a server.

One of the biggest issues with automation is protecting the credentials that grant access to the customer servers. We’ve seen APTs target MSPs and we can’t ignore the possibility that the UK Government can interfere with our equipment or even seize our infrastructure.

We can’t protect the SSH keys with passphrases, as that would require a human to interact with every session.

So we turned to HSMs, namely the Nitrokey HSM 2.

Hardware Security Modules

The primary benefit of a HSM is that the private key material is stored in an area of memory that prevents extraction. This security mechanism is similar to the ‘secure enclave’ in Apple products that annoys the FBI et al.

Using a HSM we can securely generate the private keys that allow Ansible to access servers without the risk that someone could steal the keys from ~/.ssh/.

The HSM is exposed via the GPG agent, so in theory someone with a shell on the automation hosts could still use the key. To thwart this we have several layered defences:

No direct inbound connectivity
Automation hosts use ansible-pull
Tripwire logging to Splunk
Syslog pushing to Splunk
Signed git commits

Layered Defence

No direct inbound connectivity

Our automation servers are physically separate from the rest of the Ablative infrastructure and use both physical and host-based firewalls to prevent inbound access via normal means.

These servers do allow SSH access but only via an authenticated v3 .onion address.

In the event the servers become unreachable, someone from the team must travel to the location and log in locally.

Automation Hosts Use Ansible Pull

ansible-pull inverts the usual paradigm, in which the management host connects to its targets over SSH and runs the Ansible playbook. Instead the host pulls a copy of the playbooks and executes them locally. The pull can leverage verify-commit (see ‘Signed git commits’ below) to ensure the code is authentic.
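
As a rough sketch of what a scheduled pull might look like, the Python wrapper below builds an ansible-pull command line with commit verification switched on. The repository URL, working directory and playbook name are placeholders invented for illustration and are not Ablative's real values; --url, --directory and --verify-commit are standard ansible-pull options.

```python
import subprocess


def build_pull_command(repo_url, workdir, playbook="local.yml"):
    """Return the argv list for an ansible-pull run that checks the commit signature."""
    return [
        "ansible-pull",
        "--url", repo_url,        # repository holding the playbooks (placeholder)
        "--directory", workdir,   # where the repository is checked out (placeholder)
        "--verify-commit",        # abort if the checked-out commit fails GPG verification
        playbook,                 # playbook to run locally (placeholder name)
    ]


def run_pull(repo_url, workdir, playbook="local.yml"):
    """Run ansible-pull and return its exit code; non-zero means the run failed."""
    return subprocess.run(build_pull_command(repo_url, workdir, playbook)).returncode
```

A cron job or systemd timer on the automation host could call run_pull() and treat any non-zero exit code as a reason to raise an alert.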

Tripwire logging to Splunk

Tripwire is a software suite that monitors a host for any unauthorized changes to the filesystem. If any files are changed (whether by ansible-pull or not), a notification is sent to Splunk, which in turn alerts the SOC that something untoward is happening.
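
The general idea behind file integrity monitoring can be illustrated with a small Python sketch: record a baseline of file hashes, then compare a later snapshot against it. This is a simplified illustration of the technique and not how Tripwire itself is implemented; the function names are made up for the example.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each regular file under root to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            relative = str(path.relative_to(root))
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(baseline, current):
    """Report which files were added, removed or modified since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

Any non-empty list in the result is the kind of change that would be forwarded to the log platform for the SOC to review.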

Syslog pushing to Splunk

Syslog gathers a lot of the day-to-day activity of the server (daemon lifecycle, crontabs, etc) alongside important events such as a user logging in or a sudo/doas command being issued.

These events are relayed to Splunk, which can evaluate them and, if necessary, raise an alert to the SOC.
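
To give a feel for the kind of evaluation involved, here is a minimal Python sketch that sorts traditional syslog lines into routine and noteworthy events. The rule set and category names are hypothetical; in practice this logic would live in Splunk searches and alerts, and real log formats vary between systems.

```python
import re

# Traditional syslog line: "Mon DD HH:MM:SS host program[pid]: message"
LINE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]+\s(?P<host>\S+)\s"
    r"(?P<prog>[\w./-]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)

# Hypothetical rule: every event from these programs deserves a closer look.
PRIVILEGE_PROGRAMS = {"sudo", "doas"}


def classify(line):
    """Return 'privilege', 'login', 'routine' or 'unparsed' for one log line."""
    match = LINE.match(line)
    if match is None:
        return "unparsed"
    prog, msg = match.group("prog"), match.group("msg")
    if prog in PRIVILEGE_PROGRAMS:
        return "privilege"
    # OpenSSH logs successful authentication as "Accepted <method> for <user> ..."
    if prog == "sshd" and msg.startswith("Accepted "):
        return "login"
    return "routine"
```

Events classified as 'privilege' or 'login' are the ones an alerting rule would most likely care about on a host that no human is supposed to use.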

Signed git commits

The automation hosts perform a git pull before they perform their scheduled actions. If any new commits have been added since they last ran, they cycle through the commits with git verify-commit <hash>.

If any of the commits fail GPG verification, the scheduled task fails and an alert is raised to the SOC.
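
The verification loop might look something like the following Python sketch. It assumes the host remembers the hash it last ran against (old_head here); the function names and that bookkeeping are illustrative assumptions, while git rev-list and git verify-commit are standard git commands.

```python
import subprocess


def new_commits(repo, old_head, new_head="HEAD"):
    """List commits reachable from new_head but not from old_head, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", f"{old_head}..{new_head}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def git_verify(repo, commit):
    """Return True if git verify-commit accepts the commit's GPG signature."""
    result = subprocess.run(
        ["git", "-C", repo, "verify-commit", commit],
        capture_output=True,
    )
    return result.returncode == 0


def first_unverified(commits, verify):
    """Return the first commit that fails verification, or None if all pass."""
    for commit in commits:
        if not verify(commit):
            return commit
    return None
```

A scheduled job could call first_unverified(new_commits(repo, last_seen), lambda c: git_verify(repo, c)) and, if the result is not None, stop before running any playbook and raise the alert.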

Physical Seizure

The last weakness in our model is if the automation servers and their HSMs are seized as part of a Search and Seizure or Equipment Interference Warrant.

All servers utilise the Brass Horn Communications S53 hardware. If the server is moved, a USB device is plugged in, or any of a variety of other conditions is met, the s53 daemon will lock the HSM, destroy any Full Disk Encryption elements and, if necessary, shut the machine down.

We’ve gone to some effort to ensure that we can automate the management of customer servers without risking access by unauthorized parties. If this sort of security is what you’re looking for in a web host, drop by our website or the .onion and grab some shared hosting or a MultiHop VPS.