Wednesday, March 06, 2013

No snowflakes allowed

"Snowflake" is a term I learned from my colleague Jeff Roberts. It is used in the Chef community (and maybe in the configuration management community at large) to designate a server/node that is 'unique', i.e. not under configuration management control. In a Chef environment, it means that the node in question was never added to Chef and never had chef-client run on it.

We've all been in situations where it seems like overkill to go through the effort of automating the setup of a server. Maybe the server has a unique purpose within our infrastructure. Maybe we didn't feel like spending the time to create Chef recipes for that server. Whatever the reasoning, it seemed low-risk at the time.

Well, I am here to tell you there is danger in this way of thinking. Example: we deployed a server in EC2 manually. We installed the Sensu client on it manually and pointed it at our Sensu server. Everything seemed fine. Then one day we updated our Sensu configuration (via Chef) both on the Sensu server and on all the Sensu clients. Of course, the Sensu configuration on our snowflake server never got updated, since chef-client wasn't running on that server. As a result, the Sensu client wasn't checking in properly with the Sensu server, and the snowflake behaved as if it was falling off the map as far as our monitoring system was concerned. We had to manually update Sensu on the snowflake to bring it in sync with our configuration changes.

Basically, the result of having snowflake servers is that they do fall off the map as far as the overall automation of your infrastructure is concerned. They suffer bitrot, and you end up spending lots of time on their care and feeding, which wipes out whatever time you saved by not automating them in the first place.

That being said, it's hard to be disciplined enough to run chef-client periodically on every single server in your infrastructure. I've never been able to do that before, but we are doing it now, mostly at Jeff's insistence. I do see the advantages of this discipline, and I recommend it to everybody.
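
One way to keep yourself honest is to periodically compare the list of servers you actually run against the list of nodes that Chef knows about. Here is a minimal sketch of that idea in Python. It is not something we run in this exact form, and it makes a few assumptions: you can produce a list of host names from somewhere (for example your EC2 instance listing), and you can produce a mapping of Chef node names to the time of their last chef-client run. The function name find_snowflakes, its parameters and the 24-hour threshold are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_snowflakes(inventory, chef_checkins, max_age=timedelta(hours=24), now=None):
    """Return (unmanaged, stale) lists of host names, each sorted.

    inventory:     iterable of every host name you run (hypothetical input,
                   e.g. built from your cloud provider's instance listing).
    chef_checkins: dict of node name -> timezone-aware datetime of the last
                   chef-client run (hypothetical input, built however you like).
    max_age:       how long a node may go without a run before it is flagged.
    """
    now = now or datetime.now(timezone.utc)
    hosts = set(inventory)

    # Hosts that exist but were never added to Chef: the true snowflakes.
    unmanaged = sorted(hosts - set(chef_checkins))

    # Hosts that are in Chef but have not run chef-client recently:
    # snowflakes in the making.
    stale = sorted(
        name
        for name in hosts & set(chef_checkins)
        if now - chef_checkins[name] > max_age
    )
    return unmanaged, stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = ["web1", "web2", "db1", "legacy1"]
    checkins = {
        "web1": now - timedelta(minutes=20),
        "web2": now - timedelta(days=3),
        "db1": now - timedelta(hours=1),
    }
    unmanaged, stale = find_snowflakes(inventory, checkins, now=now)
    print("Not in Chef:", unmanaged)
    print("No recent chef-client run:", stale)
```

With the sample data above, legacy1 is reported as not being in Chef at all, and web2 is reported as not having run chef-client recently. Both lists should be empty if there are no snowflakes.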
