Chapter 1
Planning it out
Introduction
APIs are normally offered as a library or a module to be used programmatically.
We identify different types of APIs: local, remote, REST, Ajax, Streamed and TCP.
Documentation
API Documentation, examples, starters, discussions and articles
Our framework is currently built mostly on PHP, but the Bots can be scripted in any shape or form. A lot of the code we generate also ties in nicely with shell scripting.
Choosing the right technologies
When designing cryptographically-enhanced systems, there are important details to consider: the purpose of the communication, what data is being transmitted, how long it will be stored, and who has the privileges to access it.
So, when choosing, it's really a matter of picking the right tech for the job. The technologies I present in this book are open-source, and already present in most environments. (Except maybe Windows… emf.)
And if we look at all this cynically, we could think that a simple SSH server would do the entire job of safeguarding our data. If that server doesn’t get owned, our data is safe. And yes, that is the mother of all fuckups today. Servers do get owned, and it’s just a matter of time: 30 seconds for most unprotected devices connected to the Internet, and 5 minutes for a typical Windows server without a firewall. If your servers are sitting on your internal network, still assume they can get owned. If you don’t, they surely will.
It is important to consider cryptographic tools that work at different levels, protecting more than just communications. A perfect balance can be achieved by selecting technologies that protect data in transit, data at rest and archived data, which are the three main areas of data ciphering that really interest us.
Data in Transit
Data in transit is data being actively transmitted, normally over a secure channel such as SSH or SSL (TLS really), or encrypted against prying eyes if transmitted over a cleartext channel such as email. Email is an exceptional issue today, given the advent of secure communications between email systems and the explosive growth of cloud email storage.
For that specific reason (the longer-term storage of emails in distributed backups and the cloud), any data sent over email should really be categorized as “data at rest”.
Otherwise, in our planning, we’ll consider the different transmission protocols and elect one that includes encryption of its communication channel between hosts. This allows us to easily secure all communications, and double-protect any encrypted data being transmitted in file format.
Data at Rest
Data at rest is data that is normally current, in use, and necessary for different internal processes. Such data normally poses minimal difficulty to protect with encryption, because the applications and services around it force it to follow the current evolutive track of encryption. That is, an enterprise will normally keep its access interfaces at their latest secure versions, and by doing so, the security requirements on the cryptography also trickle down to the data storage elements. This will make sense as you progress through your understanding of storage and cryptography. Point being, any ciphered data at rest should have its keys rotated periodically.
Archived Data
This type of data is usually written to a disk and forgotten about, until a need arises for it.
Tip: first off, this is the worst data backup practice ever, inspired by the success of tape backups in the ’80s. Yesterday this method was “useful” to safeguard a system against catastrophic failure; today, a 3-day restoration period is a catastrophic failure in itself. 😊
Seriously, if one considers encrypting backup data on tape, one has to consider storing the associated keys on a different medium, in a different location, and maintaining those keys for the lifetime of the backup’s usefulness. If using tapes in a secure location, then you might just dispense with the encryption and order a big safe to store your tapes instead. It’s just easier for the next guy who’ll attempt to restore a system when you’re gone. I would normally distrust any backup solution that professes encryption to tape and “no problems with key storage”, as they are either storing the keys alongside the data on the tape, or are just plain ignorant.
If storing sensitive data in a long-term fashion (like backups to disks or JBODs, for example), but maintaining some sort of digital access to it, then you must consider rotating the cryptographic keys in use once in a while. The more sensitive or risky the data, the more often those keys should be rotated.
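As a concrete illustration, rotating the key on a PGP-ciphered archive is essentially a decrypt-and-re-encrypt pass. A minimal sketch using GnuPG (the key IDs and file names here are hypothetical):

    # Re-encrypt an archived file from the retiring key to the new one.
    # Requires the *private* half of the old key and the public half of the new key.
    gpg --decrypt archive-2018.tar.gpg \
      | gpg --encrypt --recipient archive-2019@example.internal \
            --output archive-2019.tar.gpg
    # Once the new copy is verified, retire (or revoke) the old key material.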
SSH
SSH comes in different flavours today, but all of them appear to be compatible. SSH servers provide a remotely accessible login shell to users, bridging local and remote users in the Unix world. (If you still use, or even consider, telnet or FTP, please rip them out immediately.)
SSH is a proven methodology for protecting communication channels. SSH has built-in support for remote procedure calls and is quite easy to use in batch mode. SSH simply doesn’t include any facilities to cipher files directly, so SSH can be thought of as a technology to protect communications only. One important aspect of SSH is that it serves as a basis to grant user access and is normally already integrated with the OS’s user database (except maybe on Windows? Doh.)
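For example, a Bot can run a remote command non-interactively over SSH using key authentication and batch mode (the host, user and paths below are hypothetical):

    # BatchMode=yes makes ssh fail cleanly instead of prompting for a password,
    # which is exactly what an unattended Bot needs.
    ssh -o BatchMode=yes -i /home/bot/.ssh/id_ed25519 \
        bot@files.example.internal 'ls -1 /incoming'

    # sftp offers the same non-interactive behaviour for file transfers:
    sftp -b fetch-commands.txt bot@files.example.internal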
SSH’s secret sauce lies in the way it establishes its key exchange mechanism, which cannot normally be exploited, either at connection time or during the lifetime of the connection.
PGP
Grandfather of many things: open-source initiatives, public key cryptography and probably the blockchain revolution. PGP has never been widely used, for unknown reasons. Maybe the fact that the literature ties it mostly to email communications, or that it was made by nerds wanting to be hippies, who knows. (No offense!) Fact is, PGP is quite good at something, and that is public key cryptography. They invented it, and fought for it, and the alternatives, although quite numerous, only copy the original code. PGP’s history has made it an underlying element of most cryptography today; it’s quite an interesting read.
PGP can be used to protect communications between two individuals. It can also be used to cipher and sign documents (against tampering and, in some countries, as legally binding proof), and though its key management interfaces are basic, PGP provides one of the best (functioning) mechanisms for key signing, certification authorities and public key sharing.
In my personal experimentations, I’ve found PGP to be useful for long-term file storage as well. It can cipher a file using keys arranged as a master and a slave key, which is quite useful when recovery becomes essential because keys were lost.
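In practice this is done by encrypting to more than one recipient: the file can then be deciphered with either the day-to-day key or the master (recovery) key. A minimal sketch, with hypothetical key IDs:

    # Encrypt to the customer's working key AND to an offline recovery (master) key.
    # Either private key can later decrypt the file on its own.
    gpg --encrypt \
        --recipient customer-acme@example.internal \
        --recipient recovery-master@example.internal \
        --output report.csv.gpg report.csv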
Managing keys comes at a cost though, and this is where we must detail a plan to Automate Private Key management.
The PGP Windows and Mac OSX clients should be fully compatible with the Unix variants.
PGP for Windows can be found under: https://www.gpg4win.org
And PGP for Unix should be accessible in your favorite repository as gnupg.
Normally we consider there are two variants of GnuPG: the 1.x branch, which is more appropriate for automated setups, and the 2.x branch, which is more appropriate for user GUI systems. The differences are subtle, but the main one is that v2 is harder to use in automated processes, because it expects user interaction (through its pinentry agent) during key retrieval, which takes extra effort to work around.
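For reference, an unattended decryption with the 1.x branch looks roughly like the sketch below (the paths are hypothetical); with the 2.1+ branch the extra --pinentry-mode loopback option is usually needed to bypass the interactive pinentry.

    # GnuPG 1.x: fully non-interactive decryption for a Bot.
    gpg --batch --yes --passphrase-file /etc/bots/keys/acme.pass \
        --decrypt --output incoming.xml incoming.xml.gpg

    # GnuPG 2.1+: same idea, but pinentry must be pushed into loopback mode.
    gpg --batch --yes --pinentry-mode loopback \
        --passphrase-file /etc/bots/keys/acme.pass \
        --decrypt --output incoming.xml incoming.xml.gpg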
Also, the PGP protocol is quite vast, allowing for different trust models, applications and methodologies. In this book we’ll highlight the most common ones and present potential applications. We’ll definitely build our own PKI on top of GnuPG because, out of the box, it’s not meant for our exact purpose: it lacks a private key distribution mechanism.
SSL (well, TLS)
The tip of the great Internet iceberg. SSL was initially invented to protect online credit card transactions. The idea being that Internet connections are “temporary” in nature and web servers need to keep serving relatively fast, SSL came in as a plug-in proposal for web servers to protect these client requests. Browsers adopted the technology, and eventually all web servers did. Around 2014-2016 SSL came under fire and was proven broken from all sides; after hurried upgrades up to version 3.0, the whole industry switched to TLS (1.0, which got broken fairly quickly in turn; most now use TLS 1.1 or TLS 1.2). Most system administrators are familiar with this technology, and we all know it’s only meant to “serve” and “protect” content in transit. No point in arguing about its ability to protect files; too many glitches litter that path today.
After the latest exploit disasters, OpenSSL was forked into different code branches (LibreSSL among others) in order to combat the programming fatigue to which it had been succumbing.
In short, I personally don’t trust it further than as a web-serving tool. Even then, I think it’s going the way of WPA; some exploit-paper writers are hard at work on that one. And the industry is barely ready for TLS 1.3, let alone a future 2.0. So it’s a risky technology to rely upon at this point in time.
Sodium / NaCL
A new kid is on the block, oh wait, it’s the same old kid without clothes on. (pun intended.)
Sodium is basically a library of hand-tuned assembly routines meant to “normalize” and increase the speed of the ciphering and deciphering algorithms. Phew, a mouthful.
Sodium is gaining quick acceptance in the development world, and its tentacles are branching out into different languages through nifty APIs. Sodium is quite tempting to use: it provides ciphering methods at all levels for programmers to hang themselves with. Its functionality covers both short-term and long-term ciphering.
I was all for Sodium myself, back in 2016, but I have recently come to doubt and even regret that enthusiasm. My reasons are biased, perhaps, but my experiences and readings lead me to a simple conclusion: Sodium was made by a small number of professionals, reproducing known algorithms (read: copying and pasting) in assembly code for quick execution. The problem with this? Nobody reads or writes assembly code these days. (By that I mean the average programmer doesn’t carry Assembly in their toolbox anymore.)
So, in terms of reviewing capabilities, only Assembly programmers can review that code, and ask any Assembly programmer: they have much better things to do. (Analyzing Assembly code is just the bomb!)
Therefore, and I’m not the only one with that mindset, I would put Sodium on ice until someone in the industry, with solid credentials, can say they reviewed it and approve of it. Meanwhile, NaCL, Sodium and libsodium are NOT FIPS-140 certified, so if your enterprise is looking into certifications, keep looking.
And my personal bias has to do with Mr. Bernstein, who so often shows up rehashing existing technologies for his own benefit. There, I said it. The idea was good, but nobody sold me on the intentions. Therefore, I personally doubt the concept was properly implemented.
Alas, for now in 2019, Sodium seems to be the best and practically only choice if we want compatibility across different platforms and accessibility in our PHP integrations.
One thing of note though: Sodium has NOT been warmly accepted by the PHP community, as it was somehow forced down their throats. Therefore, in order to get libsodium included in the PHP environment, under Unix at least, one has to compile PHP from the local ports tree and include the libsodium libraries at configure time. See the Appendix “Compiling PHP under OpenBSD with libsodium”.
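As a rough idea of what that build looks like (the full, OpenBSD-specific recipe is in the Appendix; the package name and configure flags below are assumptions and may need adjusting for your system):

    # Install the libsodium library first (OpenBSD style; adjust for your OS).
    pkg_add libsodium

    # Then build PHP (7.2 or later) from source with the sodium extension enabled.
    ./configure --with-sodium
    make
    make install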
IPSEC
IP Security, or IP tunnelling, is a technology more commonly known as a VPN. (Tiny exception here: some manufacturers do sell SSL VPNs, but that’s not the same thing.)
IPSec works at the network layer really, encapsulating packet content with encryption and signature material, all the while managing a more permanent connection. (This is the difference.) The method it uses to achieve this is quite ingenious really: IPSec will renew the keying material once in a while to refresh the cryptographic material on both sides of the connection. This effectively limits the key’s exposure in time. Using PFS (Perfect Forward Secrecy) is another means of enhancing the security factor which has made its mark of late.
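To give an idea of how those knobs are typically exposed, here is a rough strongSwan-style connection sketch (purely illustrative; the addresses, lifetimes and proposals are assumptions): the lifetime values drive the periodic rekeying, and including a DH group in the ESP proposal is what requests PFS.

    conn branch-to-hq
        keyexchange=ikev2
        left=198.51.100.1
        leftsubnet=10.0.1.0/24
        right=203.0.113.2
        rightsubnet=10.0.2.0/24
        ike=aes256-sha256-modp2048!
        esp=aes256-sha256-modp2048!   # the DH group here requests PFS on the data channel
        ikelifetime=8h                # rekey the IKE SA every 8 hours
        lifetime=1h                   # rekey the ESP keys every hour
        authby=psk
        auto=start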
So, IPSec can generally be used to protect communication channels that stay open for undetermined periods of time. It’s probably the main reason Microsoft elected to implement IPSec at the file sharing level on its servers. IPSec is available on all OSes, has been proven compatible in many different scenarios (Windows to BSD, Linux to BSD, Windows to Linux), and it’s become a de facto standard with its pluggable architecture, which allows one to swap algorithms and parameters to satisfy specific requirements.
Most would think this improves security scores, but I dare say: think again. Deploying IPSec at the service level internally effectively blocks system administrators from being able to identify network traffic. It’s a double-edged sword that wasn’t really thought through. I believe it can pose a greater security risk on paper. Perhaps I’ll dive into those details in a separate chapter or book, probably entitled Applying the NIST security scoring system in order to put your executives to sleep.
Certificates, a quick dip into
Certificates come as a wrapper around keys in general. They allow the transmission and confirmation of authenticity, through public signatures and a public key service that validates the certificate’s contents.
A certificate’s content is made up of:
- Certificate header details (Organisation name, Issuer, signatures, Recipient and its purpose(s))
- Public Key of the owner
- And optionally a list of signatures that can be linked to other online public certificates testifying to the authenticity of the current certificate.
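You can eyeball those fields yourself. For an X.509 (SSL/TLS) certificate and for a PGP key respectively, assuming a hypothetical file named server.crt and a key ID of alice@example.com:

    # Dump the header details, public key and signatures of an X.509 certificate.
    openssl x509 -in server.crt -noout -text

    # List a PGP public key along with the signatures vouching for it.
    gpg --list-sigs alice@example.com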
Certificates are not specific to one technology; they are used by all. But they all differ in their header contents, purposes, formats and signature options. To make matters worse, they’re not interoperable, and each has its own certificate-signing authorities, servers and mechanisms.
The good news is that most of them can be implemented as-is at the user level, or integrated at the programming/API level, as we’ll detail in this book.
Pick your sauce, the specifics of algorithms
Once one or many technologies have been identified to build the base of your system, the next challenge lies in picking the “right” ciphering and hashing algorithms.
This is an ever-evolving field of research and development, so a system must include facilities for easily switching algorithms, and its administrators must stay keen on exploit developments surrounding their chosen algorithms (and their alternatives, indirectly).
Commonly known signature algorithms:
The broken ones: MD5 and SHA-1. Normally used for signing material; today we prefer SHA-256 or SHA-512, and I believe some people are going even larger. Some software still relies on SHA-1 for code signing, something that should be phased out of the enterprise at all costs.
The working, recommended ones: SHA-256 or SHA-512. My personal doubts are not founded yet, but I believe that with the explosion of crypto-currency mining hardware, the days of SHA-256 and even SHA-512 are numbered. When it’s possible to compute 14 tera-hashes per minute on low-cost hardware, I think it should become a serious concern. Currently the only impediment to this exploit development is the complication involved in reducing an entire file to its digest format on ASIC hardware; ASICs are more meant for little strings of characters.
Nevertheless, this only underlines the necessity for a pluggable engine where algorithms can be switched on a dime.
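For the record, producing a file digest with the current recommendations is a one-liner, which also makes swapping algorithms later a trivial change (the file name is hypothetical):

    # Two equivalent ways of producing a SHA-256 digest of a file.
    sha256sum backup-2019-02.tar.gz
    openssl dgst -sha256 backup-2019-02.tar.gz

    # Moving to SHA-512 is just a different switch.
    openssl dgst -sha512 backup-2019-02.tar.gz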
Blowfish recently became famous again; it’s useful for password hashing (the bcrypt scheme) with a configurable number of rounds to make brute-forcing more costly. Sodium also offers this facility using a different hashing method, and so do a number of programming languages.
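As an illustration, Apache’s htpasswd utility exposes exactly that knob; the sketch below (the username and password are obviously placeholders) produces a bcrypt hash with a cost factor of 12, i.e. 2^12 rounds:

    # -n: print the result, -b: take the password from the command line,
    # -B: use bcrypt, -C 12: cost factor (2^12 rounds).
    htpasswd -nbBC 12 alice 'correct-horse-battery-staple'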
Well-known ciphering algorithms:
Rijndael went quiet in recent years; it wasn’t exploited, it was simply standardized under another name, as we’re about to see.
And, the currently famous one: AES, in different key sizes such as AES-128, AES-192 and AES-256.
AES is currently the NIST standard, being relatively efficient and implemented at the hardware level on a number of devices, including Intel processors (through the AES-NI instructions). This allows VPNs using AES, for example, to increase their throughput. AES itself came under intense scrutiny during the NIST competition and has held up; the suspicion of “backdoored” material concerns some of the companion standards instead, notably elliptic-curve constructions whose seed constants are unexplained and therefore considered too predictable (the Dual_EC_DRBG random generator being the infamous case). Bernstein et al. have suggested the use of different curves, such as Curve25519 and the Ed25519 signature scheme built on it. If you’re a mathematician, this field should interest you; myself, I draw the line at trying to explain things beyond this level. Maybe for another book meant for advanced programmers. The only thing to know (as of Feb 2018): when available, you should use the modern elliptic-curve algorithms (ECDSA or, better, Ed25519); they’re usually available under SSH.
VPNs usually rely on AES for its faster throughput. Older VPNs that can only use older protocols should be quickly discarded; DES, 3DES and MD5 are examples of unreliable (easily exploitable) algorithms.
My current recommendations are to use AES-256 in CBC mode on most VPNs. ECDSA (or Ed25519) should be favored on SSH services, and I’m still investigating in detail how PGP handles its ciphering algorithms.
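On the SSH side, generating keys with those modern algorithms is straightforward; a quick sketch (the file names are up to you):

    # An Ed25519 user key, with 100 KDF rounds protecting the private key on disk.
    ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519

    # Or an ECDSA key on the largest NIST curve, if Ed25519 isn't available.
    ssh-keygen -t ecdsa -b 521 -f ~/.ssh/id_ecdsa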
Our proposed cryptographic picks
For the purpose of transferring files and remote accessing file spaces, we’ll opt for an SSH/SFTP approach.
For the purpose of storing encrypted files, we’ll focus on PGP.
This leaves us with two sets of keys to manage: SSH keys (both server public keys and user keysets), and PGP keys (external parties normally publish their public keys on the PGP network, and internally we’ll consider the deployment of our own public key infrastructure, along with a private key infrastructure for automating our Bots).
Yes, so we’ll be dealing with a lot of keys, keysets and certification.
In order to minimize the brainwork, and perhaps optimize our usage of the above technologies, we’re currently studying, in our development environment, the distribution of SSH host public keys through DNS for node authentication, and of user public keys through DNS as well for SSH authentication. PGP keys can also be used with the SSH services, another case for our R&D in the lab currently.
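The host-key side of this already has a standard form, the SSHFP DNS record; a quick sketch of how we test it in the lab (the hostnames are hypothetical):

    # On the server: emit SSHFP resource records for its host keys,
    # ready to be pasted into the DNS zone.
    ssh-keygen -r files.example.internal

    # On the client: let ssh validate the host key against those DNS records.
    ssh -o VerifyHostKeyDNS=yes bot@files.example.internal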
In our current framework, we’ll use PGP on two fronts. Externally, trading partners (clients or providers) who communicate with us using PGP will require access to our own public keys (we currently label these “Reception-Expedition” keys, for the purpose of exchanging with our partners). Internally, we’ll manage a different keyset hierarchy for each customer. This allows us to encrypt data that is proprietary to each customer with a key dedicated to that customer. Our internal Bots will then retrieve this key material to work on the corresponding data, and we’ll focus on securing this key-retrieval system against exploiters.
Customers and Partners integrate very well at the SSH and PGP levels for file sending and reception; external parties looking into automation will marvel at your setup using SSH or SFTP; and internally, everything remains the same, using the same reliable technologies, which are much more easily security-audited.
We apply an additional cryptographic layer at the application level, within our general framework that is, in order to protect keying material stored in the databases. This assures us that if a database server gets hacked, the keys remain on a different server (i.e. the service servers). This has a little side effect though: it obliges us to encapsulate a number of private-key retrieval processes behind REST-style APIs, so that the database keying material doesn’t get distributed across all servers, thus maintaining control over cryptographic access and better security. If re-using our proposed PHP class objects, the facilities to handle this ciphering of database content are already programmed. One only needs to supply a new keyset in the different systems to get the whole framework going, happily ciphering data.
Because we can also get extremely paranoid, we establish an application-level keyset for each Process/Node registered in our automation system. When sending signals to the target Process/System, the originating process should cipher the message using the target’s public key. This assures us that messages don’t get processed by the wrong node, guarantees that the data hasn’t been modified, and confirms that it originates from a valid node within our system. Our distributed APIs take care of this as well.
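With GnuPG this boils down to a combined sign-and-encrypt operation; a minimal sketch with hypothetical node key IDs and file names:

    # The originating node signs with its own key and encrypts to the target node.
    gpg --local-user bot-scheduler@example.internal \
        --sign --encrypt --recipient bot-archiver@example.internal \
        --output job.json.gpg job.json

    # The target node decrypts and, in the same pass, verifies the origin signature.
    gpg --decrypt job.json.gpg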
As of 2019, Microsoft Windows now includes an SSH client. We haven't reviewed its functionality yet; we might eventually include it in our solution. For the time being we rely on PuTTY under Windows, which has proven very functional for the past 20 years or so.