Automation

Basic methodologies to achieve automation.

Scheduled

Cron and the Windows Task Scheduler are two widely known task schedulers, and perfect examples of calendar-based scheduling. Bots can be launched from task schedulers, and it's quite appropriate for a number of Bots.

Event-Driven

Usually found in more responsive environments or graphical user interfaces, and not unusual in a file-driven application or a big data system.

Streaming

This is where veterans can test their mettle. Data streams pose numerous challenges to procedural and object-oriented programmers alike. Could we call it "stream-oriented" programming?

So, how do we really automate things? Well, there's a little layer of theory that we should first discuss.

Every time someone approaches me with automation questions for the first time, I feel compelled to take a couple of hours to size up the programmer asking the questions and guide them on a journey through the different concepts.

Automation can look very easy on the surface because, quite frankly, once we grasp one concept, we think we know it all. We start implementing and programming, we sweat on it for a couple of weeks until we hit a chicken-and-egg problem, or a theoretical wall in our concept, and are forced to return to the drawing board. I have done this myself, more than once, so I'll spare you some trouble.

Push & Pull

The first concept we have to learn when heading into automation is the Push & Pull theory. Not unlike starboard and port, they are just as important in maintaining our direction and vision throughout a project.

The act of "pushing" is a process that accesses a remote system and transfers local data to it (from the local system to a remote system).

The act of "pulling" is a process that accesses a remote system and reads remote data onto a local storage medium (from a remote system to the local system).
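The two directions can be sketched in a few lines. This is a hypothetical illustration only: a plain dict stands in for the "remote system", and all names here are made up.

```python
# Minimal sketch of the push/pull distinction. A dict stands in for a
# remote system; the local side always initiates the transfer.

local_store = {"report.txt": "local contents"}
remote_store = {"config.ini": "remote contents"}

def push(key: str) -> None:
    """Push: write local data to the remote system (local -> remote)."""
    remote_store[key] = local_store[key]

def pull(key: str) -> None:
    """Pull: read remote data into local storage (remote -> local)."""
    local_store[key] = remote_store[key]

push("report.txt")   # local -> remote
pull("config.ini")   # remote -> local
```

Note that the data flows in opposite directions, but in both cases the local system is the initiator.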

If you're a web programmer, a browser performs a pull (or a poll) when executing a client request against a server. In theory both pull and push occur over the same TCP connection, but the initiator of the connection is the client making a request to the server. The client pulls data from a remote node.

As a web programmer again, when you're updating a website, you're making a push. You transfer files over FTP or SFTP to the server, pushing local data to a remote node.

So, what needs to be understood is that both pushing and pulling have their specific applications, constraints and inapplicabilities. Only lazy programmers resolve all their issues by only pushing or only pulling, and it has dire consequences on the final security posture.

When is an appropriate time to push?

Ahahaha! Certainly not when you're copying configuration files (the reason why I disdain Chef and some others...). Sure, you can figure out ways to not use the root account remotely, through sudo or doas, but ultimately you're working around a misconception in the first place, with enormous efforts.

This becomes quite obvious only when we approach public-key cryptography in its finest details; when the time comes to generate a private key, it should always be generated on the host that will ultimately use it. A private key *should not* be moved between nodes. And it's essentially the same with certificates, as there is always a "seed" cryptographic key that must be secured against prying eyes at all costs. Only when the "seed" is properly protected can we rely on its protective effects; anything less is a statistical mistake.

With that explained, just keep an open mind to the push and pull benefits in different situations. @todo: We should attempt to provide a helper table here.

Scheduled tasks

By far the easiest to grasp and play with; we normally schedule things like backups or report generation.

We can also use the task scheduler (under Unix at least) to repeatedly launch a Bot, and let the Bot decide whether there's a job to do or not. If not, it simply exits back with a success exit code.

As you can see, these are by far the simplest of Bots, and I strongly suggest you always attempt to resolve your automation scheduling through a task scheduler like cron. Beyond that, we're entering the world of specialisation.
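A cron-launched Bot of this kind can be sketched as follows. Everything here is a hypothetical illustration: the inbox path, the `*.job` file convention and the worker are assumptions, not a prescribed layout.

```python
# Hypothetical cron-launched Bot: cron starts it on a calendar schedule,
# the Bot checks whether there is a job to do, and returns a success
# code when there is not.
import sys
from pathlib import Path

INBOX = Path("/var/spool/mybot")  # hypothetical job drop directory

def find_jobs(inbox: Path):
    """Return pending job files, oldest name first, or [] when idle."""
    return sorted(inbox.glob("*.job")) if inbox.is_dir() else []

def process(job: Path) -> None:
    """Hypothetical worker: here we just mark the job file as handled."""
    job.rename(job.with_suffix(".done"))

def main() -> int:
    jobs = find_jobs(INBOX)
    if not jobs:
        return 0  # nothing to do: exit back with a success exit code
    for job in jobs:
        process(job)
    return 0

# Entry point when launched by cron, e.g. every five minutes via a
# crontab line such as `*/5 * * * * /usr/local/bin/mybot.py`:
# sys.exit(main())
```

The scheduler handles the "when"; the Bot only decides "whether", which keeps it trivially simple.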

Event-driven

Just like in event-driven programming, which usually revolves around user click events, automated tasks can be driven by events, as long as the events and the executed tasks share a common framework. For this reason, I personally recommend using a Message Queue with scheduled Bots instead, which allows decoupling frameworks from the signaling system.

In particular situations, like the Windows user interface, even event-driven interfaces are managed through a message queue in the background, simply proving that Message Queues are just as important in responsive systems.

Streaming

Receiving and emitting streams of data involves a new thinking paradigm, and a completely different network architecture at times.

The new problem introduced by web streams, WebSockets and HTTP/2 in general revolves around the nature of what a stream really is. That is: a constant flow of data, predominantly in one specific direction, which rules out the ability for a system to acquire an "entire" set of data without resorting to time calculations.

Typically, Bots will be involved in setting up and tearing down streaming facilities. Anything beyond that becomes a bit rare in the programming field.

But we can identify a number of uses where Bots are necessary to handle stock data streams. They're particularly useful in this context because typical markets don't function 24 hours a day. Well, fiat markets for now...
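Since a stream never yields an "entire" data set, consumers typically cut it into windows. The sketch below uses a count-based window for simplicity, standing in for the time calculations mentioned above; the tick values and sizes are made up.

```python
# Sketch: grouping a (potentially endless) stream into fixed-size
# windows, so downstream code can work on bounded batches.
from typing import Iterable, Iterator, List

def windows(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size batches from a stream, flushing the remainder."""
    batch: List[float] = []
    for tick in stream:
        batch.append(tick)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush a partial window when the stream closes
        yield batch

ticks = [101.2, 101.4, 101.1, 101.8, 102.0]  # e.g. stock quotes
batches = list(windows(ticks, size=2))
```

A real market-data Bot would replace the list with a live feed and a time-based cut-off, but the batching logic stays the same.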

Orchestration

The act of composing your technical symphony and ensuring that all the elements fit together.

Typically this is where the real challenge begins, and too often fizzes out.

I imagine a lot of books and teachers still start with this concept before delving into automation itself, and I find it a bit of a paradox. The way I see it, basing a framework decision on the orchestration features a system offers is the perfect way to get locked in with a vendor, and its very limited view.

Originally orchestration was a matter of having one master node responsible for orchestrating the different roles necessary on a cluster, a farm or a network.

Nowadays, orchestration can also be conducted through a "script", executed on a controlling node, which results in instructions being dispatched to slave nodes (see pushing, above).
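Such a "script" can be sketched as an ordered plan walked by the controlling node. The node names, instructions and the `dispatch` callable below are hypothetical stand-ins for whatever transport (SSH, an agent, etc.) actually carries the instruction.

```python
# Sketch of script-style orchestration: a controlling node walks an
# ordered plan and dispatches one instruction per step to a slave node.
from typing import Callable, List, Tuple

plan: List[Tuple[str, str]] = [
    ("db01",  "install postgresql"),
    ("web01", "install nginx"),
    ("web01", "deploy site"),
]

def orchestrate(plan: List[Tuple[str, str]],
                dispatch: Callable[[str, str], None]) -> None:
    for node, instruction in plan:  # order matters: database before web
        dispatch(node, instruction)

log: List[str] = []
orchestrate(plan, lambda node, instr: log.append(f"{node}: {instr}"))
```

The controlling node owns the ordering; the slave nodes only execute what they're told, which is the essence of the push-style orchestration described above.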

For example, Icinga is a pure orchestration solution.

Message Queue

A message queue is basically a FIFO (First In, First Out) list of items, although most Message Queues can be configured to use different popping and pushing mechanisms, such as a priority system.
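The difference between plain FIFO popping and priority popping can be shown on the same set of messages; the priorities and message texts below are made up for illustration.

```python
# Sketch: FIFO retrieval versus priority retrieval of the same messages.
import heapq
from collections import deque

# (priority, body) pairs in arrival order; lower number = more urgent.
messages = [(2, "low-priority report"), (1, "urgent alert"), (2, "another report")]

# Plain FIFO: messages come back out in arrival order.
fifo = deque(body for _, body in messages)
fifo_order = [fifo.popleft() for _ in range(len(fifo))]

# Priority popping: most urgent first; the arrival sequence number
# breaks ties so equal priorities still behave FIFO.
heap = [(prio, seq, body) for seq, (prio, body) in enumerate(messages)]
heapq.heapify(heap)
prio_order = [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

Same messages, two retrieval mechanisms; the queue's configuration decides which one consumers see.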

Message queues are useful to decouple application components, in all directions and localities. Any piece of software that can access the message queue can share data with the other software reaching it.

Message queues allow us to organize dependencies and parent-child relationships, and to order our executions accordingly.

Numerous Message Queue solutions exist out there: RabbitMQ, Azure Service Bus, etc. RabbitMQ is probably the oldest and most mature of all.

But in our guide, we're proposing to build our own message queue, based on a simple SQL table. I'll show you the logic behind the different fields for starters, and how to extend your own message queue. We'll also delve into the theory behind the messaging and how we can apply the different message retrieval mechanisms.

Our message queue resembles the other ones enormously (mine was programmed in the 90s, a bit before RabbitMQ in fact), the main difference being the storage engine, which differs across the different implementations. But once you know the logic, data structures and API calls, it's all the same; very plug & play, actually.

Mine was built on the SQL storage concept because of its purpose: hosting automation, which fits much better with invoicing and customer service. It also minimizes implementation complexities by simply requiring an additional database on a probably already existing system, rather than a set of new (redundant) server instances to run a new piece of software.
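To make the idea concrete, here is a minimal sketch of a SQL-table message queue, using sqlite3 so it runs anywhere. The field names (`id`, `queue`, `status`, `body`, `created_at`) are one plausible layout for illustration, not the exact schema discussed in this guide.

```python
# Sketch: a message queue backed by a single SQL table. FIFO ordering
# comes from the auto-incrementing id; the status field lets consumers
# claim messages without deleting history.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE message_queue (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- FIFO ordering key
        queue      TEXT NOT NULL,                      -- logical queue name
        status     TEXT NOT NULL DEFAULT 'pending',    -- pending / done
        body       TEXT NOT NULL,                      -- the message payload
        created_at TEXT NOT NULL DEFAULT (datetime('now'))
    )
""")

def enqueue(queue: str, body: str) -> None:
    db.execute("INSERT INTO message_queue (queue, body) VALUES (?, ?)",
               (queue, body))

def dequeue(queue: str):
    """Pop the oldest pending message (FIFO) and mark it done."""
    row = db.execute(
        "SELECT id, body FROM message_queue "
        "WHERE queue = ? AND status = 'pending' ORDER BY id LIMIT 1",
        (queue,),
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE message_queue SET status = 'done' WHERE id = ?", (row[0],))
    return row[1]

enqueue("billing", "generate invoice 42")
enqueue("billing", "generate invoice 43")
first = dequeue("billing")
```

A priority retrieval mechanism would only change the `ORDER BY` clause, which is exactly why the SQL-table approach extends so easily.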