Tuesday, December 21, 2010

Using libcloud to manage instances across multiple cloud providers

More and more organizations are moving to ‘the cloud’ these days. In most cases, using ‘the cloud’ means buying compute and storage capacity from a public cloud vendor such as Amazon, Rackspace, GoGrid, Linode, etc. I believe that the next step in cloud usage will be deploying instances across multiple cloud providers, mainly for high availability, but also for performance reasons (for example if a specific provider has a presence in a geographical region closer to your user base).

All cloud vendors offer APIs for accessing their services -- if they don't, they're not a genuine cloud vendor, at least in my book. The onus is on you as a system administrator to learn how to use these APIs, which can vary wildly from one provider to another. Enter libcloud, a Python-based package that offers a unified interface to various cloud provider APIs. The list of supported vendors is impressive, and more are added all the time. Libcloud was started by Cloudkick but has since migrated to the Apache Software Foundation as an Incubator project.

One thing to note is that libcloud goes for breadth at the expense of depth, in that it only supports a subset of the available provider APIs -- things such as creating, rebooting, destroying an instance, and listing all instances. If you need to go in-depth with a given provider’s API, you need to use other libraries that cover all or at least a large portion of the functionality exposed by the API. Examples of such libraries are boto for Amazon EC2 and python-cloudservers for Rackspace.

Introducing libcloud

The current stable version of libcloud is 0.4.0. You can install it from PyPI via:

# easy_install apache-libcloud

The main concepts of libcloud are providers, drivers, images, sizes and locations.

A provider is a cloud vendor such as Amazon EC2 and Rackspace. Note that currently each EC2 region (US East, US West, EU West, Asia-Pacific Southeast) is exposed as a different provider, although they may be unified in the future.

The common operations supported by libcloud are exposed for each provider through a driver. If you want to add another provider, you need to create a new driver and implement the interface common to all providers (in the Python code, this is done by subclassing a base NodeDriver class and overriding/adding methods appropriately, according to the specific needs of the provider).
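The idea can be sketched in a few lines of plain Python (this is an illustration of the pattern, not libcloud's actual code; ToyProviderDriver and its in-memory node list are made up for the example):

```python
class BaseNodeDriver(object):
    """Common interface that every provider driver must implement."""

    def list_nodes(self):
        raise NotImplementedError('list_nodes not implemented for this driver')

    def create_node(self, name, image, size):
        raise NotImplementedError('create_node not implemented for this driver')


class ToyProviderDriver(BaseNodeDriver):
    """A fake provider used only to illustrate subclassing."""

    def __init__(self):
        self.nodes = []

    def list_nodes(self):
        return self.nodes

    def create_node(self, name, image, size):
        # a real driver would call the provider's HTTP API here
        node = {'name': name, 'image': image, 'size': size}
        self.nodes.append(node)
        return node


driver = ToyProviderDriver()
driver.create_node('test1', 'img-1', 'small')
print(len(driver.list_nodes()))  # 1
```

Code that only talks to the common interface can then treat any driver interchangeably, which is exactly what makes multi-provider deployments manageable.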

Images are provider-dependent, and generally represent the OS flavors available for deployment for a given provider. In EC2-speak, they are equivalent to an AMI.

Sizes are provider-dependent, and represent the amount of compute, storage and network capacity that a given instance will use when deployed. The more capacity, the more you pay and the happier the provider.

Locations correspond to geographical data center locations available for a given provider; however, they are not very well represented in libcloud. For example, in the case of Amazon EC2, they currently map to EC2 regions rather than EC2 availability zones. However, this will change in the near future (as I will describe below, proper EC2 availability zone management is being implemented). As another example, Rackspace is represented in libcloud as a single location, listed currently as DFW1; however, your instances will get deployed at a data center determined at your Rackspace account creation time (thanks to Paul Querna for clarifying this aspect).

Managing instances with libcloud

Getting a connection to a provider via a driver

All interactions with a given cloud provider happen in libcloud through a connection obtained via that provider's driver. Here is the canonical code snippet, using EC2 as an example:


from libcloud.types import Provider
from libcloud.providers import get_driver
EC2_ACCESS_ID = 'MY ACCESS ID'
EC2_SECRET_KEY = 'MY SECRET KEY'
EC2Driver = get_driver(Provider.EC2)
conn = EC2Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)


For Rackspace, the code looks like this:


USER = 'MyUser'
API_KEY = 'MyApiKey'
Driver = get_driver(Provider.RACKSPACE)
conn = Driver(USER, API_KEY)


Getting a list of images available for a provider

Once you get a connection, you can call a variety of informational methods on it, for example list_images, which returns a list of NodeImage objects. Be prepared for this call to take a while, especially with Amazon EC2, whose US East region currently returns no fewer than 6,982 images. Here is a code snippet that prints the number of available images, and the first 5 images returned in the list:


EC2Driver = get_driver(Provider.EC2)
conn = EC2Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)
images = conn.list_images()
print len(images)
print images[:5]
6982
[<NodeImage: id=aki-00806369, name=karmic-kernel-zul/ubuntu-kernel-2.6.31-300-ec2-i386-20091001-test-04.manifest.xml, driver=Amazon EC2 (us-east-1) ...>, <NodeImage: id=aki-00896a69, name=karmic-kernel-zul/ubuntu-kernel-2.6.31-300-ec2-i386-20091002-test-04.manifest.xml, driver=Amazon EC2 (us-east-1) ...>, <NodeImage: id=aki-008b6869, name=redhat-cloud/RHEL-5-Server/5.4/x86_64/kernels/kernel-2.6.18-164.x86_64.manifest.xml, driver=Amazon EC2 (us-east-1) ...>, <NodeImage: id=aki-00f41769, name=karmic-kernel-zul/ubuntu-kernel-2.6.31-301-ec2-i386-20091012-test-06.manifest.xml, driver=Amazon EC2 (us-east-1) ...>, <NodeImage: id=aki-010be668, name=ubuntu-kernels-milestone-us/ubuntu-lucid-i386-linux-image-2.6.32-301-ec2-v-2.6.32-301.4-kernel.img.manifest.xml, driver=Amazon EC2 (us-east-1) ...>]


Here is the output of the same code running against the Rackspace driver:


23
[<NodeImage: id=58, name=Windows Server 2008 R2 x64 - MSSQL2K8R2, driver=Rackspace ...>, <NodeImage: id=71, name=Fedora 14, driver=Rackspace ...>, <NodeImage: id=29, name=Windows Server 2003 R2 SP2 x86, driver=Rackspace ...>, <NodeImage: id=40, name=Oracle EL Server Release 5 Update 4, driver=Rackspace ...>, <NodeImage: id=23, name=Windows Server 2003 R2 SP2 x64, driver=Rackspace ...>]


Note that a NodeImage object for a given provider may carry provider-specific information, stored in most cases in a member variable called 'extra'. It pays to inspect NodeImage objects by printing their __dict__ member variable. Here is an example for EC2:


print images[0].__dict__
{'extra': {}, 'driver': <libcloud.drivers.ec2.EC2NodeDriver object at 0xb7eebfec>, 'id': 'aki-00806369', 'name': 'karmic-kernel-zul/ubuntu-kernel-2.6.31-300-ec2-i386-20091001-test-04.manifest.xml'}


In this case, the NodeImage object has an id, a name and a driver, with no ‘extra’ information.

Here is the same code running against Rackspace, with similar information being returned:


print images[0].__dict__
{'extra': {'serverId': None}, 'driver': <libcloud.drivers.rackspace.RackspaceNodeDriver object at 0x88b506c>, 'id': '4', 'name': 'Debian 5.0 (lenny)'}


Getting a list of sizes available for a provider

When you call list_sizes on a connection to a provider, you retrieve a list of NodeSize objects representing the available sizes for that provider.

Amazon EC2 example:


EC2Driver = get_driver(Provider.EC2)
conn = EC2Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)
sizes = conn.list_sizes()
print len(sizes)
print sizes[:5]
print sizes[0].__dict__
9
[<NodeSize: id=m1.large, name=Large Instance, ram=7680 disk=850 bandwidth=None price=.38 driver=Amazon EC2 (us-east-1) ...>, <NodeSize: id=c1.xlarge, name=High-CPU Extra Large Instance, ram=7680 disk=1690 bandwidth=None price=.76 driver=Amazon EC2 (us-east-1) ...>, <NodeSize: id=m1.small, name=Small Instance, ram=1740 disk=160 bandwidth=None price=.095 driver=Amazon EC2 (us-east-1) ...>, <NodeSize: id=c1.medium, name=High-CPU Medium Instance, ram=1740 disk=350 bandwidth=None price=.19 driver=Amazon EC2 (us-east-1) ...>, <NodeSize: id=m1.xlarge, name=Extra Large Instance, ram=15360 disk=1690 bandwidth=None price=.76 driver=Amazon EC2 (us-east-1) ...>]
{'name': 'Large Instance', 'price': '.38', 'ram': 7680, 'driver': <libcloud.drivers.ec2.EC2NodeDriver object at 0xb7f49fec>, 'bandwidth': None, 'disk': 850, 'id': 'm1.large'}


Same code running against Rackspace:


7
[<NodeSize: id=1, name=256 server, ram=256 disk=10 bandwidth=None price=.015 driver=Rackspace ...>, <NodeSize: id=2, name=512 server, ram=512 disk=20 bandwidth=None price=.030 driver=Rackspace ...>, <NodeSize: id=3, name=1GB server, ram=1024 disk=40 bandwidth=None price=.060 driver=Rackspace ...>, <NodeSize: id=4, name=2GB server, ram=2048 disk=80 bandwidth=None price=.120 driver=Rackspace ...>, <NodeSize: id=5, name=4GB server, ram=4096 disk=160 bandwidth=None price=.240 driver=Rackspace ...>]
{'name': '256 server', 'price': '.015', 'ram': 256, 'driver': <libcloud.drivers.rackspace.RackspaceNodeDriver object at 0x841506c>, 'bandwidth': None, 'disk': 10, 'id': '1'}


Getting a list of locations available for a provider

As I mentioned before, locations are somewhat ambiguous currently in libcloud.

For example, when you call list_locations on a connection to the EC2 provider (which represents the EC2 US East region), you get information about the region and not about the availability zones (AZs) included in that region:


EC2Driver = get_driver(Provider.EC2)
conn = EC2Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)
print conn.list_locations()
[<NodeLocation: id=0, name=Amazon US N. Virginia, country=US, driver=Amazon EC2 (us-east-1)>]


However, there is a patch sent by Tomaž Muraus to the libcloud mailing list which adds support for EC2 availability zones. For example, the US East region has 4 AZs: us-east-1a, us-east-1b, us-east-1c, us-east-1d. These AZs should be represented by libcloud locations, and indeed the code with the patch applied shows just that:


print conn.list_locations()
[<EC2NodeLocation: id=0, name=Amazon US N. Virginia, country=US, availability_zone=us-east-1a driver=Amazon EC2 (us-east-1)>, <EC2NodeLocation: id=1, name=Amazon US N. Virginia, country=US, availability_zone=us-east-1b driver=Amazon EC2 (us-east-1)>, <EC2NodeLocation: id=2, name=Amazon US N. Virginia, country=US, availability_zone=us-east-1c driver=Amazon EC2 (us-east-1)>, <EC2NodeLocation: id=3, name=Amazon US N. Virginia, country=US, availability_zone=us-east-1d driver=Amazon EC2 (us-east-1)>]


Hopefully the patch will make it soon into the libcloud github repository, and then into the next libcloud release.

(Update 02/24/11: the patch did make it into the latest libcloud release, which is 0.4.2 at this time.)

If you run list_locations on a Rackspace connection, you get back DFW1, even though your instances may actually get deployed at a different data center. Hopefully this too will be fixed soon in libcloud:


Driver = get_driver(Provider.RACKSPACE)
conn = Driver(USER, API_KEY)
print conn.list_locations()
[<NodeLocation: id=0, name=Rackspace DFW1, country=US, driver=Rackspace>]


Launching an instance

The API call for launching an instance with libcloud is create_node. It has 3 required parameters: a name for your new instance, a NodeImage and a NodeSize. You can also specify a NodeLocation (if you don’t, the default location for that provider will be used).

EC2 node creation example

A given provider driver may accept other parameters to the create_node call. For example, EC2 accepts an ex_keyname argument for specifying the EC2 key you want to use when creating the instance.

Note that to create a node, you have to know which image and which size you want to use for that node. This is where the code snippets shown above for retrieving a provider's images and sizes come in handy. You can either retrieve the full list and iterate through it until you find your desired image and size (by name or by id), or you can construct NodeImage and NodeSize objects from scratch, based on the desired id.

Example of a NodeImage object for EC2 corresponding to a specific AMI:


from libcloud.base import NodeImage
image = NodeImage(id="ami-014da868", name="", driver="")


Example of a NodeSize object for EC2 corresponding to an m1.small instance size:


from libcloud.base import NodeSize
size = NodeSize(id="m1.small", name="", ram=None, disk=None, bandwidth=None, price=None, driver="")


Note that in both examples the only parameter that needs to be set is the id, but all the other parameters must still be present in the call, even if they are set to None or the empty string.

In the case of EC2, for the instance to be actually usable via ssh, you also need to pass the ex_keyname parameter and set it to a keypair name that exists in your EC2 account for that region. Libcloud provides a way to create or import a keypair programmatically. Here is a code snippet that creates a keypair via the ex_create_keypair call (specific to the libcloud EC2 driver), then saves the private key in a file in /root/.ssh on the machine running the code:


import os
import sys

keyname = sys.argv[1]
resp = conn.ex_create_keypair(name=keyname)
key_material = resp.get('keyMaterial')
if not key_material:
    sys.exit(1)
private_key = '/root/.ssh/%s.pem' % keyname
f = open(private_key, 'w')
f.write(key_material + '\n')
f.close()
os.chmod(private_key, 0600)


You can also pass the name of an EC2 security group to create_node via the ex_securitygroup parameter. Libcloud also allows you to create security groups programmatically by means of the ex_create_security_group method specific to the libcloud EC2 driver.

Now, armed with the NodeImage and NodeSize objects constructed above, as well as the keypair name, we can launch an instance in EC2:


node = conn.create_node(name='test1', image=image, size=size, ex_keyname=keyname)


Note that we didn’t specify any location, so we have no control over the availability zone where the instance will be created. With Tomaž’s patch we can actually get a location corresponding to our desired availability zone, then launch the instance in that zone. Here is an example for us-east-1b:


locations = conn.list_locations()
for location in locations:
    if location.availability_zone.name == 'us-east-1b':
        break
node = conn.create_node(name='tst', image=image, size=size, location=location, ex_keyname=keyname)


Once the node is created, you can call the list_nodes method on the connection object and inspect the current status of the node, along with other information about that node. In EC2, a new instance is initially shown with a status of ‘pending’. Once the status changes to ‘running’, you can ssh into that instance using the private key created above.
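Waiting for the status change can be done with a small polling loop (a sketch; get_status stands for whatever lookup you do on top of list_nodes, and the timeout and interval values are arbitrary):

```python
import time

def wait_for_status(get_status, wanted='running', timeout=300, interval=10):
    """Poll get_status() until it returns `wanted`; give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status() == wanted:
            return True
        time.sleep(interval)
    return False

# against a real EC2 connection, something like:
#   wait_for_status(lambda: conn.list_nodes()[0].extra['status'])
```

Note that the 'status' key in 'extra' is EC2-specific; other drivers expose state differently.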

Printing node.__dict__ for a newly created instance shows it with ‘pending’ status:


{'name': 'i-f692ae9b', 'extra': {'status': 'pending', 'productcode': [], 'groups': None, 'instanceId': 'i-f692ae9b', 'dns_name': '', 'launchdatetime': '2010-12-14T20:25:22.000Z', 'imageId': 'ami-014da868', 'kernelid': None, 'keyname': 'k1', 'availability': 'us-east-1d', 'launchindex': '0', 'ramdiskid': None, 'private_dns': '', 'instancetype': 'm1.small'}, 'driver': <libcloud.drivers.ec2.EC2NodeDriver object at 0x9e088ec>, 'public_ip': [''], 'state': 3, 'private_ip': [''], 'id': 'i-f692ae9b', 'uuid': '76fcd974aab6f50092e5a637d6edbac140d7542c'}


Printing node.__dict__ a few minutes after the instance was launched shows the instance with ‘running’ status:


{'name': 'i-f692ae9b', 'extra': {'status': 'running', 'productcode': [], 'groups': ['default'], 'instanceId': 'i-f692ae9b', 'dns_name': 'ec2-184-72-92-114.compute-1.amazonaws.com', 'launchdatetime': '2010-12-14T20:25:22.000Z', 'imageId': 'ami-014da868', 'kernelid': None, 'keyname': 'k1', 'availability': 'us-east-1d', 'launchindex': '0', 'ramdiskid': None, 'private_dns': 'domU-12-31-39-04-65-11.compute-1.internal', 'instancetype': 'm1.small'}, 'driver': <libcloud.drivers.ec2.EC2NodeDriver object at 0x93f42cc>, 'public_ip': ['ec2-184-72-92-114.compute-1.amazonaws.com'], 'state': 0, 'private_ip': ['domU-12-31-39-04-65-11.compute-1.internal'], 'id': 'i-f692ae9b', 'uuid': '76fcd974aab6f50092e5a637d6edbac140d7542c'}


Note also that the ‘extra’ member variable of the node object shows a wealth of information specific to EC2 -- things such as security group, AMI id, kernel id, availability zone, private and public DNS names, etc. Another interesting thing to note is that the name member variable of the node object is now set to the EC2 instance id, thus guaranteeing uniqueness of names across EC2 node objects.

At this point (assuming the machine where you run the libcloud code is allowed ssh access into the default EC2 security group) you should be able to ssh into the newly created instance using the private key corresponding to the keypair you used to create the instance. In my case, I used the k1.pem private file created via ex_create_keypair and I ssh-ed into the private IP address of the new instance, because I was already on an EC2 instance in the same availability zone:

# ssh -i ~/.ssh/k1.pem domU-12-31-39-04-65-11.compute-1.internal


Rackspace node creation example

Here is another example of calling create_node, this time using Rackspace as the provider. Before running this code, I had already called list_images and list_sizes on the Rackspace connection object, so I know that I want the NodeImage with id 71 (which happens to be Fedora 14) and the NodeSize with id 1 (the smallest one). The code snippet below creates the node using the image, size and name I specify (the name needs to be different for each call to create_node):


Driver = get_driver(Provider.RACKSPACE)
conn = Driver(USER, API_KEY)
images = conn.list_images()
for image in images:
    if image.id == '71':
        break
sizes = conn.list_sizes()
for size in sizes:
    if size.id == '1':
        break
node = conn.create_node(name='testrackspace', image=image, size=size)
print node.__dict__


The code prints out:


{'name': 'testrackspace', 'extra': {'metadata': {}, 'password': 'testrackspaceO1jk6O5jV', 'flavorId': '1', 'hostId': '9bff080afbd3bec3ca140048311049f9', 'imageId': '71'}, 'driver': <libcloud.drivers.rackspace.RackspaceNodeDriver object at 0x877c3ec>, 'public_ip': ['184.106.187.226'], 'state': 3, 'private_ip': ['10.180.67.242'], 'id': '497741', 'uuid': '1fbf7c3fde339af9fa901af6bf0b73d4d10472bb'}


Note that the name variable of the node object was set to the name we specified in the create_node call. You don’t log in with a key (at least initially) to a Rackspace node, but instead you’re given a password you can use to log in as root to the public IP that is also returned in the node information:

# ssh root@184.106.187.226
root@184.106.187.226's password:
[root@testrackspace ~]#


Rebooting and destroying instances

Once you have a list of nodes in a given provider, it’s easy to iterate through the list and choose a given node based on its unique name -- which as we’ve seen is the instance id for EC2 and the hostname for Rackspace. Once you identify a node, you can call destroy_node or reboot_node on the connection object to terminate or reboot that node.

Here is a code snippet that performs a destroy_node operation for an EC2 instance with a specific instance id:


EC2Driver = get_driver(Provider.EC2)
conn = EC2Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)
nodes = conn.list_nodes()
for node in nodes:
    if node.name == 'i-66724d0b':
        conn.destroy_node(node)


Here is another code snippet that performs a reboot_node operation for a Rackspace node with a specific hostname:


Driver = get_driver(Provider.RACKSPACE)
conn = Driver(USER, API_KEY)
nodes = conn.list_nodes()
for node in nodes:
    if node.name == 'testrackspace':
        conn.reboot_node(node)


The Overmind project

I would be remiss if I didn't mention a new but very promising project started by Miquel Torres: Overmind. The goal of Overmind is to be a complete server provisioning and configuration management system. For the server provisioning portion, Overmind uses libcloud, while also offering a Django-based Web interface for managing providers and nodes. EC2 and Rackspace are supported currently, but it should be easy to add new providers. If you are interested in trying out Overmind and contributing code or tests, please send a message to the overmind-dev mailing list. Future versions of Overmind aim to add configuration management capabilities using Opscode Chef.

Friday, December 10, 2010

A Fabric script for striping EBS volumes

Here's a short Fabric script which might be useful to people who need to stripe EBS volumes in Amazon EC2. Striping is recommended if you want to improve the I/O of your EBS-based volumes. However, striping won't help if one of the member EBS volumes goes AWOL or suffers performance issues. In any case, here's the Fabric script:

from fabric.api import *

# Globals

env.project='EBSSTRIPING'
env.user = 'myuser'

DEVICES = [
    "/dev/sdd",
    "/dev/sde",
    "/dev/sdf",
    "/dev/sdg",
]

VOL_SIZE = 400 # GB

# Tasks

def install():
    install_packages()
    create_raid0()
    create_lvm()
    mkfs_mount_lvm()

def install_packages():
    run('DEBIAN_FRONTEND=noninteractive apt-get -y install mdadm')
    run('apt-get -y install lvm2')
    run('modprobe dm-mod')
    
def create_raid0():
    cmd = 'mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=4 '
    for device in DEVICES:
        cmd += '%s ' % device
    run(cmd)
    run('blockdev --setra 65536 /dev/md0')

def create_lvm():
    run('pvcreate /dev/md0')
    run('vgcreate vgm0 /dev/md0')
    run('lvcreate --name lvm0 --size %dG vgm0' % VOL_SIZE)

def mkfs_mount_lvm():
    run('mkfs.xfs /dev/vgm0/lvm0')
    run('mkdir -p /mnt/lvm0')
    run('echo "/dev/vgm0/lvm0 /mnt/lvm0 xfs defaults 0 0" >> /etc/fstab')
    run('mount /mnt/lvm0')

A few things to note:

  • I assume that you have already created and attached 4 EBS volumes to your instance, with device names /dev/sdd through /dev/sdg; if your device names or volume count are different, modify the DEVICES list (and the --raid-devices count in create_raid0) accordingly
  • The size of your target RAID0 volume is set in the VOL_SIZE variable
  • the helper functions are pretty self-explanatory: 
    1. we use mdadm to create a RAID0 device called /dev/md0 with a 256 KB chunk size; we also bump the device's read-ahead to 32 MB (65,536 512-byte sectors) via the blockdev --setra call
    2. we create a physical LVM volume on /dev/md0
    3. we create a volume group called vgm0 on /dev/md0
    4. we create a logical LVM volume called lvm0 of size VOL_SIZE, inside the vgm0 group
    5. we format the logical volume as XFS, then we mount it and also modify /etc/fstab
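As a sanity check on the numbers used in create_raid0 (mdadm expresses --chunk in KB, while blockdev --setra takes a count of 512-byte sectors):

```python
SECTOR_BYTES = 512         # blockdev --setra counts 512-byte sectors
chunk_kb = 256             # mdadm --chunk is expressed in KB
readahead_sectors = 65536  # the value passed to blockdev --setra

readahead_mb = readahead_sectors * SECTOR_BYTES // (1024 * 1024)
print('chunk size: %d KB, read-ahead: %d MB' % (chunk_kb, readahead_mb))
# chunk size: 256 KB, read-ahead: 32 MB
```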
That's it. Hopefully it will be useful to somebody out there.

Tuesday, November 30, 2010

Working with Chef attributes

It took me a while to really get how to use Chef attributes. It's fairly easy to understand what they are and where they are referenced in recipes, but it's not so clear from the documentation how and where to override them. Here's a quick note to clarify this.

A Chef attribute can be seen as a variable that:

1) gets initialized to a default value in cookbooks/mycookbook/attributes/default.rb

Examples:

default[:mycookbook][:swapfilesize] = '10485760'
default[:mycookbook][:tornado_version] = '1.1'
default[:mycookbook][:haproxy_version] = '1.4.8'
default[:mycookbook][:nginx_version] = '0.8.20'

2) gets used in cookbook recipes such as cookbooks/mycookbook/recipes/default.rb, or any other recipe file in the recipes directory; the syntax for referencing the attribute's value is of the form #{node[:mycookbook][:attribute_name]}

Example of using the haproxy_version attribute in a recipe called haproxy.rb:

# install haproxy from source
haproxy = "haproxy-#{node[:mycookbook][:haproxy_version]}"
haproxy_pkg = "#{haproxy}.tar.gz"

downloads = [
    "#{haproxy_pkg}",
]

downloads.each do |file|
    remote_file "/tmp/#{file}" do
        source "http://myserver.com/download/haproxy/#{file}"
    end
end

....etc.

Example of using the swapfilesize attribute in the default recipe default.rb (this shell fragment would live inside a Chef script/bash resource, where #{...} gets interpolated by Ruby):

# create swap file if it doesn't exist
SWAPFILE=/data/swapfile1

if [ -e "$SWAPFILE" ]
  then
   exit 0
fi
dd if=/dev/zero of=$SWAPFILE bs=1024 count=#{node[:mycookbook][:swapfilesize]}
mkswap $SWAPFILE
swapon $SWAPFILE
echo "$SWAPFILE swap swap defaults 0 0" >> /etc/fstab


3) can be overridden at either the role or the node level (and some other more obscure levels that I haven't used in practice)

I prefer to override attributes at the role level, because I want those overridden values to apply to all nodes pertaining to that role. 

For example, I have a role called appserver, which is defined in a file called appserver.rb in the roles directory:

# cat appserver.rb 
name "appserver"
description "Installs required packages and applications for an app server"
run_list "recipe[mycookbook::appserver]", "recipe[mycookbook::haproxy]", "recipe[memcached]"
override_attributes "mycookbook" => { "swapfilesize" => "4194304" }, "memcached" => { "memory" => "4096" }

Here I override the default swapfilesize value (an attribute from the mycookbook cookbook) of 10 GB and set it to 4 GB. I also override the default memcached memory value (an attribute from the Opscode memcached cookbook) of 128 MB and set it to 4 GB.

If however you want to override some attributes at the node level, you could do this in the chef.json file on the node if you wanted for example to set the swap size to 2 GB:

"run_list": [ "role[base]", "role[appserver]" ]
"mycookbook": { "swapfilesize": "2097152"}
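Since dd is invoked with bs=1024 in the recipe, the swapfilesize values are counts of 1 KB blocks; a quick conversion shows how the three values used above map to sizes:

```python
def kb_blocks_to_gb(count):
    """dd is run with bs=1024, so `count` is a number of 1 KB blocks."""
    return count * 1024 // (1024 ** 3)

print(kb_blocks_to_gb(10485760))  # 10 -- the cookbook default (10 GB)
print(kb_blocks_to_gb(4194304))   # 4  -- the role-level override (4 GB)
print(kb_blocks_to_gb(2097152))   # 2  -- the node-level override (2 GB)
```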

Another question is when you should use Chef attributes at all. My own observation is that wherever I use hardcoded values in my recipes, it's almost always better to use an attribute instead, setting its default to that hardcoded value so it can then be overridden as needed.

Tuesday, November 16, 2010

How to whip your infrastructure into shape

It's easy, just follow these steps:

Step 0. If you're fortunate enough to participate in the design of your infrastructure (as opposed to being thrown at the deep end and having to maintain some 'legacy' one), then try to aim for horizontal scalability. It's easier to scale out than to scale up, and failures in this mode will hopefully impact a smaller percentage of your users.

Step 1. Configure a good monitoring and alerting system

This is the single most important thing you need to do for your infrastructure. It's also a great way to learn a new infrastructure that you need to maintain.

I talked about different types of monitoring in another blog post. My preferred approach is to have 2 monitoring systems in place:
  • an internal monitoring system which I use to check the health of individual servers/devices
  • an external monitoring system used to check the behavior of the application/web site as a regular user would.

My preferred internal monitoring/alerting system is Nagios, but tools like Zabbix, OpenNMS, Zenoss, Monit etc. would definitely also do the job. I like Nagios because there is a wide variety of plugins already available (such as the extremely useful check_mysql_health plugin) and also because it's very easy to write custom plugins for your specific application needs. It's also relatively easy to generate Nagios configuration files automatically.
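A custom Nagios plugin mostly boils down to printing a one-line status and exiting with the right code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). Here is a minimal Python sketch; the metric name and thresholds are made up for the example:

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes (3 = UNKNOWN)

def check_threshold(value, warn, crit, label='metric'):
    """Return an (exit_code, message) pair in the usual Nagios plugin style."""
    if value >= crit:
        return CRITICAL, 'CRITICAL - %s is %s' % (label, value)
    if value >= warn:
        return WARNING, 'WARNING - %s is %s' % (label, value)
    return OK, 'OK - %s is %s' % (label, value)

# a real plugin would print the message and then call sys.exit(code)
code, message = check_threshold(75, warn=80, crit=95, label='mysql connections')
print(message)  # OK - mysql connections is 75
```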

For external monitoring I use a combination of Pingdom and Akamai alerts. Pingdom runs checks against certain URLs within our application, whereas Akamai alerts us whenever the percentage of HTTP error codes returned by our application is greater than a certain threshold.

I'll talk more about correlating internal and external alerts below.

Step 2. Configure a good resource graphing system

This is the second most important thing you need to do for your infrastructure. It gives you visibility into how your system resources are used. If you don't have this visibility, it's very hard to do proper capacity planning. It's also hard to correlate monitoring alerts with resource limits you might have reached.

I use both Munin and Ganglia for resource graphing. I like the graphs that Munin produces and also some of the plugins that are available (such as the munin-mysql plugin), and I also like Ganglia's nice graph aggregation feature, which allows me to watch the same system resource across a cluster of nodes. Munin has this feature too, but Ganglia was designed from the get-go to work on clusters of machines.

Step 3. Dashboards, dashboards, dashboards

Everybody knows that whoever has the most dashboards wins. I am talking here about application-specific metrics that you want to track over time. I gave an example before of a dashboard I built for visualizing the outgoing email count through our system.

It's very easy to build such a dashboard with the Google Visualization API, so there's really no excuse for not having charts for critical metrics of your infrastructure. We use queuing a lot internally at Evite, so we have dashboards for tracking various queue sizes. We also track application errors from nginx logs and chart them in various ways: by server, by error code, by URL, aggregated, etc.
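Tallying nginx errors by status code takes only a few lines of Python; this sketch assumes the common combined log format, where the status code is the 9th whitespace-separated field (the sample lines are made up):

```python
from collections import defaultdict

def count_error_codes(lines):
    """Tally HTTP 4xx/5xx status codes from combined-format access log lines."""
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip malformed lines
        status = fields[8]
        if status.startswith(('4', '5')):
            counts[status] += 1
    return dict(counts)

sample = [
    '1.2.3.4 - - [28/Oct/2010:10:00:00 -0700] "GET / HTTP/1.1" 200 1234',
    '1.2.3.4 - - [28/Oct/2010:10:00:01 -0700] "GET /api HTTP/1.1" 500 512',
    '1.2.3.4 - - [28/Oct/2010:10:00:02 -0700] "GET /x HTTP/1.1" 404 128',
]
print(count_error_codes(sample))
```

The resulting per-code counts are exactly the kind of series you can feed into a Google Visualization chart.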

Dashboards offered by external monitoring tools such as Pingdom/Keynote/Gomez/Akamai are also very useful. They typically chart uptime and response time for various pages, and edge/origin HTTP traffic in the case of Akamai.

Step 4. Correlate errors with resource state and capacity

The combination of internal and external monitoring, resource charting and application dashboards is very powerful. As a rule, whenever you have an external alert firing off, you should have one or more internal ones firing off too. If you don't, then you don't have sufficient internal alerts, so you need to work on that aspect of your monitoring.

Once you do have external and internal alerts firing off in unison, you will be able to correlate external issues (such as increased percentages of HTTP error codes, or timeouts in certain application URLs) with server capacity issues/bottlenecks within your infrastructure. Of course, the fact that you are charting resources over time, and that you have a baseline to go from, will help you quickly identify outliers such as spikes in CPU usage or drops in Akamai traffic.

A typical work day for me starts with me opening a few tabs in my browser: the Nagios overview page, the Munin overview, the Ganglia overview, the Akamai HTTP content delivery dashboard, and various application-specific dashboards.

Let's say I get an alert from Akamai that the percentage of HTTP 500 error codes is over 1%. I start by checking the resource graphs for our database servers. I look at in-depth MySQL metrics in Munin, and at CPU metrics (especially CPU I/O wait time) in Ganglia. If nothing is out of the ordinary, I look at our various application services (our application consists of a multitude of RESTful Web services). The nginx log dashboard may show increased HTTP 500 errors from a particular server, or it may show an increase in such errors across the board. This may point to insufficient capacity at our services layer. Time to deploy more services on servers with enough CPU/RAM capacity. I know which servers those are, because I keep tabs on them with Munin and Ganglia.

As another example, I know that if the CPU I/O wait on my database servers approaches 30%, the servers will start huffing and puffing, and I'll see an increased number of slow queries. In this case, it's time to either identify queries to be optimized, or reduce the number of queries to the database -- or if everything else fails, time to add more database servers. (BTW, if you haven't yet read @allspaw's book "The Art of Capacity Planning", add reading it as a task for Step 0)

My point is that all these alerts and metric graphs are interconnected, and without looking at all of them, you're flying blind.

Step 5. Expect failures and recover quickly and gracefully

It's not a question of whether failures will happen, it's a question of WHEN. When they do happen, you need to be prepared. Hopefully you designed your infrastructure in a way that allows you to bounce back quickly from failures. If a database server goes down, hopefully you have a slave that you can quickly promote to master, or even better a passive master ready to become the active server. Even better, maybe you have a fancy self-healing distributed database -- kudos to you then ;-)

One thing that you can do here is to have various knobs that turn on and off certain features or pieces of functionality within your application (again, John Allspaw has some blog posts and presentations on that from his days at Flickr and his current role at Etsy). These knobs allow you to survive an application server outage, or even (God forbid) a database outage, while still being able to present *something* to your end-users.

To quickly bounce back from a server failure, I recommend using automated deployment and configuration management tools such as Chef, Puppet, Fabric, etc. (see some posts of mine on this topic). I personally use a combination of Chef (to bootstrap a new machine and do things such as file system layout and pre-requisite installation) and Fabric (to actually deploy the application code).


Update #1:

Comment on Twitter:

@ericholscher:

"@griggheo Good stuff. Any thoughts on figuring out symptom's from causes? eg. load balancer is having issues which causes db load to drop?"

Good question, and something similar actually has happened to us. To me, it's a matter of knowing your baseline graphs. In our case, whenever I see an Akamai traffic drop, it's usually correlated to an increase in the percentage of HTTP 500 errors returned by our Web services. If I also see DB traffic dropping, then I know the bottleneck is at the application services layer. If the DB traffic is increasing, then the bottleneck is most likely the DB. Depending on the bottleneck, we need to add capacity at that layer, or to optimize code or DB queries at the respective layer.

Main accomplishment of this post

I'm most proud of the fact that I haven't used the following words in the post above: 'devops', 'cloud', 'noSQL' and 'agile'.

Thursday, October 28, 2010

MySQL load balancing with HAProxy

In an earlier blog post I was advising people to use HAProxy 1.4 and above if they need MySQL load balancing with health checks. It turns out that I didn't have much luck with that solution either. HAProxy shines when it load balances HTTP traffic, and its health checks are really meant to be run over HTTP and not plain TCP. So the solution I found was to have a small HTTP Web service (which I wrote using tornado) listening on a configurable port on each MySQL node.

For the health check, the Web service connects via MySQLdb to the MySQL instance running on a given port and issues a 'show databases' command. For more in-depth checking you can obviously run fancier SQL statements.

The code for my small tornado server is here. The default port it listens on is 31337.
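Stripped of the tornado plumbing, the check itself boils down to something like the sketch below. The names are hypothetical (the real handler lives in the linked code and connects via MySQLdb); the connection factory is injected so the logic can be exercised without a live server.

```python
# Sketch of the health-check logic (hypothetical helper names).
# `connect` stands in for MySQLdb.connect or any DB-API factory.

def mysql_health(connect, port=3306):
    """Probe MySQL on `port`; return an (http_status, body) pair
    that the HTTP handler can relay to HAProxy."""
    try:
        conn = connect(host='127.0.0.1', port=port)
        try:
            cur = conn.cursor()
            cur.execute('SHOW DATABASES')   # the cheap liveness statement
            cur.fetchall()
        finally:
            conn.close()
    except Exception as exc:
        return 500, 'MySQL check failed: %s' % exc
    return 200, 'MySQL on port %d is up' % port
```

HAProxy only cares about the status code here: with "option httpchk", anything outside the 2xx/3xx range counts as a failed check.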

Now on the HAProxy side I have a "listen" section for each collection of MySQL nodes that I want to load balance. Example:
listen mysql-m0 0.0.0.0:33306
  mode tcp
  option httpchk GET /mysqlchk/?port=3306
  balance roundrobin
  server db101 10.10.10.1:3306 check port 31337 inter 5000 rise 3 fall 3
  server db201 10.10.10.2:3306 check port 31337 inter 5000 rise 3 fall 3 backup
In this case, HAProxy listens on port 33306 and load balances MySQL traffic between db101 and db201, with db101 being the primary node and db201 the backup node (which means that traffic only goes to db101 unless it's considered down by the health check, in which case traffic is directed to db201). This scenario is especially useful when db101 and db201 are in a master-master replication setup and you want traffic to hit only one of them at any given time. Note also that I could have had HAProxy listen on port 3306, but I preferred to have it listen and be contacted by the application on port 33306, in case I also wanted to run a MySQL server on port 3306 on the same server as HAProxy.

I specify how to call the HTTP check handler via "option httpchk GET /mysqlchk/?port=3306". I specify the port the handler listens on via the "port" option in the "server" line. In my case the port is 31337. So HAProxy will do a GET against http://10.10.10.1:31337/mysqlchk/?port=3306. If the result is an HTTP error code, the health check will be considered failed.

The other options "inter 5000 rise 3 fall 3" mean that the health check is issued by HAProxy every 5,000 ms, and that the health check needs to succeed 3 times ("rise 3") in order for the node to be considered up, and it needs to fail 3 times ("fall 3") in order for the node to be considered down.
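One practical consequence worth spelling out: a dead node is only marked down after "fall" consecutive failures, spaced "inter" milliseconds apart. This is just arithmetic, not part of HAProxy, but it's useful when choosing the values:

```python
def state_flip_time_ms(inter_ms, threshold):
    """Approximate worst-case time for HAProxy to change a server's
    state: `threshold` consecutive checks, one every `inter_ms` ms."""
    return inter_ms * threshold

# With "inter 5000 fall 3", expect roughly 15 seconds before
# traffic fails over to the backup server.
```

Tighten "inter" and "fall" if 15 seconds of failed queries is too long for your application, at the cost of more sensitivity to transient blips.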

I hasten to add that the master-master load balancing has its disadvantages. It did save my butt one Sunday morning when db101 went down hard (after all, it was an EC2 instance), and traffic was directed by HAProxy to db201 in a totally transparent fashion to the application.

But....I have also seen the situation where db201, as a slave to db101, lagged in its replication, so when db101 was considered down and traffic was sent to db201, the data was stale from the application's point of view. I consider this disadvantage to outweigh the automatic failover advantage, so I actually ended up taking db201 out of HAProxy. If db101 ever goes down hard again, I'll just manually point HAProxy to db201, after making sure the state of the data on db201 is what I expect.

So all this being said, I recommend the automated failover scenario only when load balancing against a read-only farm of MySQL servers, which are probably all slaves of some master. In this case, although reads can also get out of sync, at least you won't attempt creates/updates/deletes against stale data.

The sad truth is that there is no good way of doing automated load balancing AND failover with MySQL without resorting to things such as DRBD which are not cloud-friendly. I am aware of Yves Trudeau's blog posts on "High availability for MySQL on Amazon EC2" but the setup he describes strikes me as experimental and I wouldn't trust it in a large-scale production setup.

In any case, I hope somebody will find the tornado handler I wrote useful for their own MySQL health checks, or actually any TCP-based health check they need to do within HAProxy.

Thursday, October 14, 2010

Introducing project "Overmind"

Overmind is the brainchild of Miquel Torres. In its current version, released today, Overmind is what is sometimes called a 'controller fabric' for managing cloud instances, based on libcloud. However, Miquel's Roadmap for the project is very ambitious, and includes things like automated configuration management and monitoring for the instances launched and managed via Overmind.

A little bit of history: Miquel contacted me via email in late July because he read my blog post on "Automated deployment systems: push vs. pull" and he was interested in collaborating on a queue-based deployment/config management system. The first step in such a system is to actually deploy the instances you need configured. Hence the need for something like Overmind.

I'm sure you're asking yourself -- why did these guys want to roll their own system? Why not use something like OpenStack? Note that in late July OpenStack had only just been announced, and to this day (mid-October 2010) they have yet to release their controller fabric code. In the meantime, we have a pretty functional version of a deployment tool in Overmind, supporting Amazon EC2 and Rackspace, with a Django Web interface and also with a REST API.

I am aware there are many other choices out there in terms of managing and deploying cloud instances -- Cloudkick, RightScale, Scalarium ...and the list goes on. The problem is that none of these is Open Source. They do have great ideas though that we can steal ;-)

I am also aware of Ruby-based tools such as Marionette Collective and its close integration with Puppet (which is now even closer since it has been acquired by Puppet Labs). The problem is that it's Ruby and not Python ;-)

In short, what Overmind brings to the table today is a Python-based, Django-based, libcloud-based tool for deploying (and destroying, but be careful out there) cloud instances. For the next release, Miquel and I are planning to add some configuration management capabilities. We're looking at kokki as a very interesting Python-based alternative to chef, although we're planning on supporting chef-solo too.

If you're interested in contributing to the project, please do! Miquel is an amazingly talented, focused and relentless developer, but he can definitely use more help (my contributions have been minimal in terms of actual code; I mostly tested Miquel's code and did some design and documentation work, especially in the REST API area).


Monday, September 27, 2010

Getting detailed I/O stats with Munin

Ever since Vladimir Vuksan pointed me to his Ganglia script for getting detailed disk stats, I've been looking for something similar for Munin. The iostat and iostat_ios Munin plugins, which are enabled by default when you install Munin, do show disk stats across all devices detected on the system. I wanted more in-depth stats per device though. In my case, the devices I'm interested in are actually Amazon EBS volumes mounted on my database servers.

I finally figured out how to achieve this, using the diskstat_ Munin plugin which gets installed by default when you install munin-node.

If you run

/usr/share/munin/plugins/diskstat_ suggest

you will see the various symlinks you can create for the devices available on your server.

In my case, I have 2 EBS volumes on each of my database servers, mounted as /dev/sdm and /dev/sdn. I created the following symlinks for /dev/sdm (and similar for /dev/sdn):


ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_latency_sdm
ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_throughput_sdm
ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_iops_sdm

Here's what metrics you get from these plugins:

  • from diskstat_iops: Read I/O Ops/sec, Write I/O Ops/sec, Avg. Request Size, Avg. Read Request Size, Avg. Write Request Size
  • from diskstat_latency: Device Utilization, Avg. Device I/O Time, Avg. I/O Wait Time, Avg. Read I/O Wait Time, Avg. Write I/O Wait Time
  • from diskstat_throughput: Read Bytes, Write Bytes
My next step is to follow the advice of Mark Seger (the author of collectl) and graph the output of collectl in real time, so that the stats are displayed in fine-grained intervals of 5-10 seconds instead of the 5-minute averages that RRD-based tools offer.

Tuesday, September 21, 2010

Quick note on installing and configuring Ganglia

I decided to give Ganglia a try to see if I like its metric visualizations and its plugins better than Munin's. I am still in the very early stages of evaluating it. However, I already banged my head against the wall trying to understand how to configure it properly. Here are some quick notes:

1) You can split your servers into clusters for ease of metric aggregation.

2) Each node in a cluster needs to run gmond. In Ubuntu, you can do 'apt-get install ganglia-monitoring' to install it. The config file is in /etc/ganglia/gmond.conf. More on the config file in a minute.

3) Each node in a cluster can send its metrics to a designated node via UDP.

4) One server in your infrastructure can be configured as both the overall metric collection server, and as the web front-end. This server needs to run gmetad, which in Ubuntu can be installed via 'apt-get install gmetad'. Its config file is /etc/gmetad.conf.

Note that you can have a tree of gmetad nodes, with the root of the tree configured to actually display the metric graphs. I wanted to keep it simple, so I am running both gmetad and the Web interface on the same node.

5) The gmetad server periodically polls one or more nodes in each cluster and retrieves the metrics for that cluster. It displays them via a PHP web interface which can be found in the source distribution.

That's about it in a nutshell in terms of the architecture of Ganglia. The nice thing is that it's scalable. You split nodes into clusters, you designate one or more nodes in a cluster to gather metrics from all the other nodes, and you have one or more gmetad node(s) collecting the metrics from the designated nodes.

Now for the actual configuration. I have a cluster of DB servers, each running gmond. I also have another server called bak01 that I keep around for backup purposes. I configured each DB server to be part of a cluster called 'db'. I also configured each DB server to send the metrics collected by gmond to bak01 (via UDP on the non-default port of 8650). To do this, I have these entries in /etc/ganglia/gmond.conf on each DB server:


cluster {
  name = "db"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  host = bak01
  port = 8650
}

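Since every DB node gets the same two stanzas, they're easy to template when you push configs out with fabric or Chef. A throwaway sketch (a hypothetical helper, not part of Ganglia -- only the cluster name, collector host and port vary per deployment):

```python
# Render the per-node gmond.conf stanzas from a template.
GMOND_TEMPLATE = """\
cluster {
  name = "%(cluster)s"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  host = %(collector)s
  port = %(port)d
}
"""

def gmond_stanzas(cluster, collector, port=8650):
    """Return the cluster and udp_send_channel stanzas for one node."""
    return GMOND_TEMPLATE % {'cluster': cluster,
                             'collector': collector,
                             'port': port}
```

You'd write the result into /etc/ganglia/gmond.conf on each node and restart gmond.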
On host bak01, I also defined a udp_recv_channel and a tcp_accept_channel:

udp_recv_channel {
  port = 8650
}

/* You can specify as many tcp_accept_channels as you like to share 
   an xml description of the state of the cluster */ 
tcp_accept_channel {
  port = 8649
}

The udp_recv_channel is necessary so bak01 can receive the metrics from the gmond nodes. The tcp_accept_channel is necessary so that bak01 can be contacted by the gmetad node.

That's it in terms of configuring gmond.

On the gmetad node, I made one modification to the default /etc/gmetad.conf file by specifying the cluster I want to collect metrics for, and the node where I want to collect the metrics from:

data_source "db" 60 bak01

I then restarted gmetad via '/etc/init.d/gmetad restart'.

Ideally, these instructions would get you to a state where you would be able to see the graphs for all the nodes in the cluster. 

I automated the process of installing and configuring gmond on all the nodes via fabric. Maybe it all happened too fast for the collecting node (bak01), because it wasn't collecting metrics correctly for some of the nodes. I noticed that if I did 'telnet localhost 8649' on bak01, some of the nodes had no metrics associated with them. My solution was to stop and start gmond on those nodes, and that kicked things off. Strange though...
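Instead of eyeballing the telnet output, you can also fetch and parse the XML that gmond serves on the tcp_accept_channel port to spot hosts reporting zero metrics. A sketch -- the element names (HOST, METRIC) match what gmond emits, but treat this as a starting point rather than a finished tool:

```python
import socket
import xml.etree.ElementTree as ET

def hosts_without_metrics(xml_text):
    """Given gmond's XML state dump, return the names of hosts
    that report no metrics at all."""
    root = ET.fromstring(xml_text)
    return [host.get('NAME')
            for host in root.iter('HOST')
            if len(host.findall('METRIC')) == 0]

def fetch_gmond_xml(host, port=8649):
    """Read the full XML state dump from a gmond collector node."""
    sock = socket.create_connection((host, port))
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return b''.join(chunks).decode('utf-8', 'replace')
```

Run hosts_without_metrics(fetch_gmond_xml('bak01')) from cron and you have a crude alert for the stuck-gmond situation described above.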

In any case, my next step is to install all kinds of Ganglia plugins, especially related to MySQL, but also for more in-depth disk I/O metrics.

Wednesday, September 15, 2010

Managing Rackspace CloudFiles with python-cloudfiles

I've started to use Rackspace CloudFiles as an alternate storage for database backups. I have the backups now on various EBS volumes in Amazon EC2, AND in CloudFiles, so that should be good enough for Disaster Recovery purposes, one would hope ;-)

I found the documentation for the python-cloudfiles package a bit lacking, so here's a quick post that walks through the common scenarios you encounter when managing CloudFiles containers and objects. I am not interested in the CDN aspect of CloudFiles for my purposes, so for that you'll need to dig on your own.

A CloudFiles container is similar to an Amazon S3 bucket, with one important difference: a container name cannot contain slashes, so you won't be able to mimic a file system hierarchy in CloudFiles the way you can do it in S3. A CloudFiles container, similar to an S3 bucket, contains objects -- which for CloudFiles have a max. size of 5 GB. So the CloudFiles storage landscape consists of 2 levels: a first level of containers (you can have an unlimited number of them), and a second level of objects embedded in containers. More details in the CloudFiles API Developer Guide (PDF).

Here's how you can use the python-cloudfiles package to perform CRUD operations on containers and objects.

Getting a connection to CloudFiles

First you need to obtain a connection to your CloudFiles account. You need a user name and an API key (the key can be generated via the Web interface at https://manage.rackspacecloud.com).

conn = cloudfiles.get_connection(username=USERNAME, api_key=API_KEY, serviceNet=True)

According to the docs, specifying serviceNet=True makes the connection use Rackspace's internal ServiceNet network to access Cloud Files, rather than the public network.

Listing containers and objects

Once you get a connection, you can list existing containers, and objects within a container:

containers = conn.get_all_containers()
for c in containers:
    print "\nOBJECTS FOR CONTAINER: %s" % c.name
    objects = c.get_objects()
    for obj in objects:
        print obj.name

Creating containers

container = conn.create_container(container_name)

Creating objects in a container

Assuming you have a list of filenames you want to upload to a given container:

for f in files:
    print 'Uploading %s to container %s' % (f, container_name)
    basename = os.path.basename(f)
    o = container.create_object(basename)
    o.load_from_filename(f)

(note that the overview in the python-cloudfiles index.html doc has a typo -- it specifies 'load_from_file' instead of the correct 'load_from_filename')

Deleting containers and objects

You first need to delete all objects inside a container, then you can delete the container itself:

print 'Deleting container %s' % c.name
print 'Deleting all objects first'
objects = c.get_objects()
for obj in objects:
    c.delete_object(obj.name)
print 'Now deleting the container'
conn.delete_container(c.name)

Retrieving objects from a container

Remember that you don't have a backup process in place until you've tested restores. So let's see how you retrieve objects that are stored in a CloudFiles container:

container_name = sys.argv[1]
containers = conn.get_all_containers()
c = None
for container in containers:
    if container_name == container.name:
        c = container
        break
if c is None:
    print "No container found with name %s" % container_name
    sys.exit(1)

target_dir = container_name
if not os.path.isdir(target_dir):
    os.makedirs(target_dir)
objects = c.get_objects()
for obj in objects:
    obj_name = obj.name
    print "Retrieving object %s" % obj_name
    target_file = "%s/%s" % (target_dir, obj_name)
    obj.save_to_filename(target_file)

Wednesday, September 01, 2010

MySQL InnoDB hot backups and restores with Percona XtraBackup

I blogged a while ago about MySQL fault-tolerance and disaster recovery techniques. At that time I was experimenting with the non-free InnoDB Hot Backup product. In the meantime I discovered Percona's XtraBackup (thanks Robin!). Here's how I tested XtraBackup for doing a hot backup and a restore of a MySQL database running Percona XtraDB (XtraBackup works with vanilla InnoDB too).

First of all, I use the following Percona .deb packages on a 64-bit Ubuntu Lucid EC2 instance:


# dpkg -l | grep percona
ii libpercona-xtradb-client-dev 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database development files
ii libpercona-xtradb-client16 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client library
ii percona-xtradb-client-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client binaries
ii percona-xtradb-common 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database common files (e.g. /etc
ii percona-xtradb-server-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database server binaries

I tried using the latest stable XtraBackup .deb package from the Percona downloads site but it didn't work for me. I started a hot backup with /usr/bin/innobackupex-1.5.1 and it ran for a while before dying with "InnoDB: Operating system error number 9 in a file operation." See this bug report for more details.

After unsuccessfully trying to compile XtraBackup from source, I tried XtraBackup-1.3-beta for Lucid from the Percona downloads. This worked fine.

Here's the scenario I tested against a MySQL Percona XtraDB instance running with DATADIR=/var/lib/mysql/m10 and a customized configuration file /etc/mysql10/my.cnf. I created and attached an EBS volume which I mounted as /xtrabackup on the instance running MySQL.

1) Take a hot backup of all databases under that instance:

/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --password=xxxxxx /xtrabackup

This will take a while and will create a timestamped directory under /xtrabackup, where it will store the database files from DATADIR. Note that the InnoDB log files are not created unless you apply step 2 below.

As the documentation says, make sure the output of innobackupex-1.5.1 ends with:

100901 05:33:12 innobackupex-1.5.1: completed OK!

2) Apply the transaction logs to the datafiles just created, so that the InnoDB logfiles are recreated in the target directory:

/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --password=xxxxxx --apply-log /xtrabackup/2010-09-01_05-21-36/

At this point, I tested a disaster recovery scenario by stopping MySQL and moving all files in DATADIR to a different location.

To bring the databases back to normal from the XtraBackup hot backup, I did the following:

1) Brought back up a functioning MySQL instance to be used by the XtraBackup restore operation:

i) Copied the contents of the default /var/lib/mysql/mysql database under /var/lib/mysql/m10/ (or you can recreate the mysql DB from scratch)

ii) Started mysqld_safe manually:

mysqld_safe --defaults-file=/etc/mysql10/my.cnf

This will create the data files and logs under DATADIR (/var/lib/mysql/m10) with the sizes specified in the configuration file. I had to wait until the messages in /var/log/syslog told me that the MySQL instance is ready and listening for connections.

2) Copied back the files from the hot backup directory into DATADIR

Note that the copy-back operation below initially errored out because it tried to copy the mysql directory too, and it found the directory already there under DATADIR. So the 2nd time I ran it, I moved /var/lib/mysql/m10/mysql to mysql.bak. The copy-back command is:

/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --copy-back /xtrabackup/2010-09-01_05-21-36/

You can also copy the files from /xtrabackup/2010-09-01_05-21-36/ into DATADIR using vanilla cp.

NOTE: verify the permissions on the restored files. In my case, some files in DATADIR were owned by root, so MySQL didn't start up properly because of that. Do a 'chown -R mysql:mysql DATADIR' to be sure.

3) If everything went well in step 2, restart the MySQL instance to make sure everything is OK.

At this point, your MySQL instance should have its databases restored to the point where you took the hot backup.

IMPORTANT: if the newly restored instance needs to be set up as a slave to an existing master server, you need to set the correct master_log_file and master_log_pos parameters via a 'CHANGE MASTER TO' command. These parameters are saved by innobackupex-1.5.1 in a file called xtrabackup_binlog_info in the target backup directory.

In my case, the xtrabackup_binlog_info file contained:

mysql-bin.000041 23657066

Here is an example of a CHANGE MASTER TO command I used:

STOP SLAVE;

CHANGE MASTER TO MASTER_HOST='masterhost', MASTER_PORT=3316, MASTER_USER='masteruser', MASTER_PASSWORD='masterpass', MASTER_LOG_FILE='mysql-bin.000041', MASTER_LOG_POS=23657066;

START SLAVE;
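The two values in xtrabackup_binlog_info map directly onto MASTER_LOG_FILE and MASTER_LOG_POS, so generating the statement can be scripted. A hypothetical helper (the connection parameters are placeholders for your own master's coordinates):

```python
def change_master_sql(binlog_info, host, port, user, password):
    """Turn the contents of xtrabackup_binlog_info
    ('<binlog file> <position>') into a CHANGE MASTER TO statement."""
    log_file, log_pos = binlog_info.split()[:2]
    return ("CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "
            "MASTER_USER='%s', MASTER_PASSWORD='%s', "
            "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;"
            % (host, port, user, password, log_file, log_pos))
```

Feed it the single line from the xtrabackup_binlog_info file in the backup directory and run the result on the restored instance (between STOP SLAVE and START SLAVE).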

Note that XtraBackup can also run in a 'stream' mode useful for compressing the files generated by the backup operation. Details in the documentation.

Tuesday, August 31, 2010

Poor man's MySQL disaster recovery in EC2 using EBS volumes

First of all, I want to emphasize that this is NOT a disaster recovery strategy I recommend. However, in a pinch, it might save your ass. Here's the scenario I have:

  • 2 m1.large instances running Ubuntu 10.04 64-bit and the Percona XtraDB MySQL builds (for the record, the exact version I'm using is "Server version: 5.1.43-60.jaunty.11-log (Percona SQL Server (GPL), XtraDB 9.1, Revision 60)")
  • I'll call the 2 servers db101 and db201
  • each server is running 2 MySQL instances -- I'll call them m1 and m2
  • instance m1 on db101 and instance m1 on db201 are set up in master-master replication (and similar for instance m2)
  • the DATADIR for m1 is /var/lib/mysql/m1 on each server; that file system is mounted from an EBS volume (and similar for m2)
  • the configuration files for m1 are in /etc/mysql1 on each server -- that directory was initially a copy of the Ubuntu /etc/mysql configuration directory, which I then customized (and similar for m2)
  • the init.d script for m1 is in /etc/init.d/mysql1 (similar for m2)
What I tested:
  • I took a snapshot of each of the 2 EBS volumes associated with each of the DB servers (4 snapshots in all)
  • I terminated the 2 m1.large instances
  • I launched 2 m1.xlarge instances and installed the same Percona distribution (this was done via a Chef recipe at instance launch time); I'll call the 2 new instances xdb101 and xdb201
  • I pushed the configuration files for m1 and m2, as well as the init.d scripts (this was done via fabric)
  • I created new volumes from the EBS snapshots (note that these volumes can be created in any EC2 availability zone)
  • On xdb101, I attached the 2 volumes created from the EBS snapshots on db101; I specified /dev/sdm and /dev/sdn as the device names (similar on xdb201)
  • On xdb101, I created /var/lib/mysql/m1 and mounted /dev/sdm there; I also created /var/lib/mysql/m2 and mounted /dev/sdn there (similar on xdb201)
  • At this point, the DATADIR directories for both m1 and m2 are populated with 'live files' from the moment when I took the EBS snapshot
  • I made sure syslog-ng accepts UDP traffic from localhost (by default it doesn't); this is because by default in Ubuntu mysql log messages are sent to syslog --> to do this, I ensured that "udp(ip(127.0.0.1) port(514));" appears in the "source s_all" entry in /etc/syslog-ng/syslog-ng.conf
At this point, I started up the first MySQL instance on xdb101 via "/etc/init.d/mysql1 start". This script most likely will show [fail] on the console, because MySQL will not start up normally. If you look in /var/log/syslog, you'll see entries similar to:

Aug 31 18:03:21 xdb101 mysqld: 100831 18:03:21 [Note] Plugin 'FEDERATED' is disabled.
Aug 31 18:03:21 xdb101 mysqld: InnoDB: The InnoDB memory heap is disabled
Aug 31 18:03:21 xdb101 mysqld: InnoDB: Mutexes and rw_locks use GCC atomic builtins
Aug 31 18:03:22 xdb101 mysqld: 100831 18:03:22  InnoDB: highest supported file format is Barracuda.
Aug 31 18:03:23 xdb101 mysqld: InnoDB: The log sequence number in ibdata files does not match
Aug 31 18:03:23 xdb101 mysqld: InnoDB: the log sequence number in the ib_logfiles!
Aug 31 18:03:23 xdb101 mysqld: 100831 18:03:23  InnoDB: Database was not shut down normally!
Aug 31 18:03:23 xdb101 mysqld: InnoDB: Starting crash recovery.

If you wait a bit longer (and if you're lucky), you'll see entries similar to:

Aug 31 18:04:20 xdb101 mysqld: InnoDB: Restoring possible half-written data pages from the doublewrite
Aug 31 18:04:20 xdb101 mysqld: InnoDB: buffer...
Aug 31 18:04:24 xdb101 mysqld: InnoDB: In a MySQL replication slave the last master binlog file
Aug 31 18:04:24 xdb101 mysqld: InnoDB: position 0 15200672, file name mysql-bin.000015
Aug 31 18:04:24 xdb101 mysqld: InnoDB: and relay log file
Aug 31 18:04:24 xdb101 mysqld: InnoDB: position 0 15200817, file name ./mysqld-relay-bin.000042
Aug 31 18:04:24 xdb101 mysqld: InnoDB: Last MySQL binlog file position 0 17490532, file name /var/lib/mysql/m1/mysql-bin.000002
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 InnoDB Plugin 1.0.6-9.1 started; log sequence number 1844705956
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Recovering after a crash using /var/lib/mysql/m1/mysql-bin
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Starting crash recovery...
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Crash recovery finished.

At this point, you can do "/etc/init.d/mysql1 restart" just to make sure that both stopping and starting that instance work as expected. Repeat for instance m2, and also repeat on server xdb201.

So....IF you are lucky and the InnoDB crash recovery process did its job, you should have 2 functional MySQL instances, one on each of xdb101 and xdb201. I tested this with several pairs of servers and it worked for me every time, but I hasten to say that YMMV, so DO NOT bet on this as your disaster recovery strategy!

At this point I still had to re-establish the master-master replication between m1 on xdb101 and m1 on xdb201 (and similar for m2). 

When I initially set up this replication between the original m1.large servers, I used something like this on both db101 and db201:

CHANGE MASTER TO MASTER_HOST='master1', MASTER_PORT=3306, MASTER_USER='masteruser', MASTER_PASSWORD='xxxxxx';

The trick for me is that master1 points to db201 in db101's /etc/hosts, and vice-versa.

On the newly created xdb101 and xdb201, there are no entries for master1 in /etc/hosts, so replication is broken. Which is a good thing initially, because you want to have the MySQL instances on each server be brought back up without throwing replication into the mix.

Once I added an entry for master1 in xdb101's /etc/hosts pointing to xdb201, and did the same on xdb201, I did a 'stop slave; start slave; show slave status\G' on the m1 instance on each server. In all cases I tested, one of the slaves was showing everything OK, while the other one was complaining about not being able to read from the master's log file. This was fairly simple to fix. Let's assume xdb101 is the one complaining. I did the following:
  • on xdb201, I ran 'show master status\G' and noted the file name (for example "mysql-bin.000017") and the file position (for example 106)
  • on xdb101, I ran the following command: "stop slave; change master to master_log_file='mysql-bin.000017', master_log_pos=106; start slave;"
  • now a 'show slave status\G' on xdb101 should show everything back to normal
Some lessons:
  • take periodic snapshots of your EBS volumes (at least 1/day)
  • for a true disaster recovery strategy, use at least mysqldump to dump your DB to disk periodically, or something more advanced such as Percona XtraBackup; I recommend dumping the DB to an EBS volume and taking periodic snapshots of that volume
  • the procedure I detailed above is handy when you want to grow your instance 'vertically' -- for example I went from m1.large to m1.xlarge
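The snapshot part of the first lesson is easy to put on a cron schedule. A sketch using dependency injection: `conn` is assumed to be something like the result of boto's connect_ec2(), or anything else exposing the same create_snapshot(volume_id, description) call.

```python
def snapshot_volumes(conn, volume_ids, description='nightly DB snapshot'):
    """Kick off an EBS snapshot for each volume in `volume_ids`;
    returns whatever create_snapshot gives back (snapshot objects
    in boto's case)."""
    return [conn.create_snapshot(vid, description) for vid in volume_ids]
```

EBS snapshots are incremental, so taking them daily (or more often) is cheap relative to the insurance they buy you.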

Friday, August 20, 2010

Visualizing MySQL metrics with the munin-mysql plugin

Munin is a great tool for resource visualization. Sometimes, though, installing a third-party Munin plugin is not as straightforward as you would like. I have been struggling a bit with one such plugin, munin-mysql, so I thought I'd spell out the steps for my future reference. My particular scenario is running multiple MySQL instances on various port numbers (3306 and up) on the same machine. I wanted to graph in particular the various InnoDB metrics that munin-mysql supports. I installed the plugin on various Ubuntu flavors such as Jaunty and Lucid.

Here are the steps:

1) Install 2 pre-requisite Perl modules for munin-mysql: IPC-ShareLite and Cache-Cache

2) git clone http://github.com/kjellm/munin-mysql

3) cd munin-mysql; edit Makefile and point PLUGIN_DIR to the directory where your munin plugins reside (if you installed Munin on Ubuntu via apt-get, that directory is /usr/share/munin/plugins)

4) make install --> this will copy the mysql_ Perl script to PLUGIN_DIR, and the mysql_.conf file to /etc/munin/plugin-conf.d

5) Edit /etc/munin/plugin-conf.d/mysql_.conf and customize it with your specific MySQL information.

For example, if you run 2 MySQL instances on ports 3306 and 3307, you could have something like this in mysql_.conf:


[mysql_3306_*]
env.mysqlconnection DBI:mysql:mysql;host=127.0.0.1;port=3306
env.mysqluser myuser1
env.mysqlpassword mypassword1

[mysql_3307_*]
env.mysqlconnection DBI:mysql:mysql;host=127.0.0.1;port=3307
env.mysqluser myuser2
env.mysqlpassword mypassword2

6) Run "/usr/share/munin/plugins/mysql_ suggest" to see what metrics are supported by the plugin. Then proceed to create symlinks in /etc/munin/plugins, adding the port number and the metric name as the suffix.

For example, to track InnoDB I/O metrics for the MySQL instance running on port 3306, you would create this symlink:

ln -s /usr/share/munin/plugins/mysql_ /etc/munin/plugins/mysql_3306_innodb_io

(replace 3306 with 3307 to track this metric for the other MySQL instance running on port 3307)

Of course, it's easy to automate this by a simple shell script.
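As a sketch of that automation, here's the equivalent in Python (the port list, metric names and directories below are examples specific to my setup -- run the suggest command to get the real metric list for your plugin version):

```python
import os

# Example values -- adjust to your environment
PLUGIN = "/usr/share/munin/plugins/mysql_"
TARGET_DIR = "/etc/munin/plugins"
PORTS = [3306, 3307]
METRICS = ["innodb_io", "innodb_bpool", "innodb_log"]

def create_symlinks(plugin, target_dir, ports, metrics):
    """Create one mysql_<port>_<metric> symlink per (port, metric) pair."""
    created = []
    for port in ports:
        for metric in metrics:
            link = os.path.join(target_dir, "mysql_%d_%s" % (port, metric))
            if not os.path.lexists(link):
                os.symlink(plugin, link)
            created.append(link)
    return created
```

Run it as root (or via sudo) so the symlinks can be created in /etc/munin/plugins, then restart munin-node as in step 7.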

7) Restart munin-node and wait 10-15 minutes for the munin master to receive the information about the new metrics.

Important! If you need to troubleshoot this plugin (and any Munin plugin), do not make the mistake of simply running the plugin script directly in the shell. If you do this, it will not read the configuration file(s) correctly, and it will most probably fail. Instead, what you need to do is to follow the "Debugging Munin plugins" documentation, and run the plugin through the munin-run utility. For example:


# munin-run mysql_3306_innodb_io
ib_io_read.value 34
ib_io_write.value 57870
ib_io_log.value 8325
ib_io_fsync.value 55476

One more thing: you should probably automate all these above steps. I have most of it automated via a fabric script. The only thing I do by hand is to create the appropriate symlinks for the specific port numbers I have on each server.

That's it! Enjoy staring for hours at your brand new MySQL metrics!


Monday, August 16, 2010

MySQL and AppArmor on Ubuntu

This is just a quick post that I hope will save some people some headache when they try to customize their MySQL setup on Ubuntu. I've spent some quality time with this problem over the weekend. I tried in vain for hours to have MySQL read its configuration files from a non-default location on an Ubuntu 9.04 server, only to figure out that it was all AppArmor's fault.

My ultimate goal was to run multiple instances of MySQL on the same host. In the past I achieved this with MySQL Sandbox, but this time I wanted to use MySQL installed from Debian packages and not from a tarball of the binary distribution, and MySQL Sandbox has some issues with that.

Here's what I did: I copied /etc/mysql to /etc/mysql0, then I edited /etc/mysql0/my.cnf and modified the location of the socket file, the pid file and the datadir to non-default locations. Then I tried to run:

/usr/bin/mysqld_safe --defaults-file=/etc/mysql0/my.cnf

At this point, /var/log/daemon.log showed this error:

mysqld[25133]: Could not open required defaults file: /etc/mysql0/my.cnf
mysqld[25133]: Fatal error in defaults handling. Program aborted

It took me as I said a few hours trying all kinds of crazy things until I noticed lines like these in /var/log/syslog:

kernel: [18593519.090601] type=1503 audit(1281847667.413:22): operation="inode_permission" requested_mask="::r" denied_mask="::r" fsuid=0 name="/etc/mysql0/my.cnf" pid=4884 profile="/usr/sbin/mysqld"

This made me realize it was AppArmor preventing mysqld from opening non-default files. I don't need AppArmor on my servers, so I just stopped it with 'service apparmor stop' and chkconfig-ed it off, at which point every customization I had made started working perfectly.
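If you'd rather keep AppArmor running, the alternative (which I didn't pursue) is to extend the mysqld profile so it allows the non-default paths, then reload it with 'service apparmor reload'. The paths below are illustrative for my /etc/mysql0 setup; adjust them to wherever your config files, datadir, socket and pid file actually live:

```
# added to /etc/apparmor.d/usr.sbin.mysqld (illustrative paths)
/etc/mysql0/ r,
/etc/mysql0/** r,
/var/lib/mysql0/ r,
/var/lib/mysql0/** rwk,
```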

At least 2 lessons:

1) when you see mysterious, hair-pulling errors, check security-related processes on your server: iptables, AppArmor, SELinux etc.

2) check all log files in /var/log -- I was focused on daemon.log and didn't notice the errors in syslog quickly enough

Google didn't help when I searched for "mysqld Could not open required defaults file". I couldn't find any reference to AppArmor, only to file permissions.

Tuesday, August 03, 2010

What automated deployment/config mgmt tools do you use?

I posted this question yesterday as a quick tweet. I got a bunch of answers already that I'll include here, but feel free to add your answers as comments to this post too. Or reply to @griggheo on Twitter.

I started by saying I have 2 favorite tools: Fabric for pushing app state (pure Python) and Chef for pulling/bootstrapping OS/package state (pure Ruby). For more discussions on push vs. pull deployment tools, see this post of mine.

Here are the replies I got on Twitter so far:

@keyist : Fabric and Chef for me as well. use Fabric to automate uploading cookbooks+json and run chef-solo on server

@vvuksan : mcollective for control, puppet for config mgmt/OS config. Some reasons why outlined here http://j.mp/cAKarI

@RackerHacker : There is another solution besides ssh and for loops? :-P

@chris_mahan : libcloud, to pop debian stable on cloud instance, fabric to set root passwd, install python2.6.5, apt-get nginx php django fapws.

@alfredodeza : I'm biased since I wrote it, but I use Pacha: http://code.google.com/p/pacha

@tcdavis : Fabric for remote calls; distribute/pip for python packaging and deps; Makefile to eliminate any repetition.

@competentgirl : Puppet for its expandability and integration w/ other tools (ie svn)

@yashh : Fabric. bundle assets, push to bunch of servers and restart in a shot.. love it

@bitprophet : I'm biased but I use Fab scripts for everything. Was turned off by daemons/declarative/etc aspects of Chef/Puppet style systems.

@kumar303 : @bitprophet we use Cap only because we do 2-way communication with remote SSH procs. Been meaning to look at patching Fab for this

Update with more Twitter replies:

@lt_kije : cfengine on work cluster (HPC/HTC), radmind on personal systems (OpenBSD)

@zenmatt : check out devstructure, built on puppet, more natural workflow for configuring servers.

@almadcz : paver for building, fabric or debian repo for deployment, buildbot for forgetting about it.

@geo_kollias: I use fabric for everything as @bitprophet, but i am thinking about making use of @hpk42 's execnet soon.

Wednesday, July 21, 2010

Bootstrapping EC2 instances with Chef

This is the third installment of my Chef post series (read the first and the second). This time I'll show how to use the Ubuntu EC2 instance bootstrap mechanism in conjunction with Chef and have the instance configure itself at launch time. I had a similar post last year, in which I was accomplishing a similar thing with puppet.

Why Chef this time, you ask? Although I am a Python guy, I prefer learning a smattering of Ruby rather than a proprietary DSL for configuration management. Also, when I upgraded my EC2 instances to the latest Ubuntu Lucid AMIs, puppet stopped working, so I was almost forced to look into Chef -- and I've liked what I've seen so far. I don't want to bad-mouth puppet though, I recommend you look into both if you need a good configuration management/deployment tool.

Here is a high-level view of the bootstrapping procedure I'm using:

1) You create Chef roles and tie them to cookbooks and recipes that you want executed on machines which will be associated with these roles.
2) You launch an EC2 Ubuntu AMI using any method you want (the EC2 Java-based command-line API, or scripts based on boto, etc.). The main thing here is that you pass a custom shell script to the instance via a user-data file.
3) When the EC2 instance boots up, it runs your custom user-data shell script. The script installs chef-client and its prerequisites, downloads the files necessary for running chef-client, runs chef-client once to register with the Chef server and to run the recipes associated with its role, and finally runs chef-client in the background so that it wakes up and executes every N minutes.

Here are the 3 steps in more detail.

1) Create Chef roles, cookbooks and recipes

I already described how to do this in my previous post.

For the purposes of this example, let's assume we have a role called 'base' associated with a cookbook called 'base' and another role called 'myapp' associated with a cookbook called 'myapp'.

The 'base' cookbook contains recipes that can do things like installing packages that are required across all your applications, creating users and groups that you need across all server types, etc.

The 'myapp' cookbook contains recipes that can do things specific to one of your particular applications -- in my case, things like installing and configuring tornado/nginx/haproxy.

As a quick example, here's how to add a user and a group both called "myoctopus". This can be part of the default recipe in the cookbook 'base' (in the file cookbooks/base/recipes/default.rb).

The home directory is /home/myoctopus, and we make sure that directory exists and is owned by the user and group myoctopus.

group "myoctopus" do
  action :create
end

user "myoctopus" do
  gid "myoctopus"
  home "/home/myoctopus"
  shell "/bin/bash"
end

%w{/home/myoctopus}.each do |dir|
  directory dir do
    owner "myoctopus"
    group "myoctopus"
    mode "0755"
    action :create
    not_if "test -d #{dir}"
  end
end


The role 'base' looks something like this, in a file called roles/base.rb:

name "base"
description "Base role (installs common packages)"
run_list("recipe[base]")


The role 'myapp' looks something like this, in a file called roles/myapp.rb:


name "myapp"
description "Installs required packages and applications for an app server"
run_list "recipe[memcached]", "recipe[myapp::tornado]"


Note that the role myapp specifies 2 recipes to be run: one is the default recipe of the 'memcached' cookbook (which is part of the Opscode cookbooks), and one is a recipe called tornado which is part of the myapp cookbook (the file for that recipe is cookbooks/myapp/recipes/tornado.rb). Basically, to denote a recipe, you either specify its cookbook (if the recipe is the default recipe of that cookbook), or you specify cookbook::recipe_name (if the recipe is non-default).

So far, we haven't associated any clients with these roles. We're going to do that on the client EC2 instance. This way the Chef server doesn't have to do any configuration operations during the bootstrap of the EC2 instance.

2) Launching an Ubuntu EC2 AMI with custom user-data

I wrote a Python wrapper around the EC2 command-line API tools. To launch an EC2 instance, I use the ec2-run-instances command-line tool. My Python script also takes a command line option called chef_role, which specifies the Chef role I want to associate with the instance I am launching. The main ingredient in the launching of the instance is the user-data file (passed to ec2-run-instances via the -f flag).

I use this template for the user-data file. My Python wrapper replaces HOSTNAME with an actual host name that I pass via a cmdline option. The Python wrapper also replaces CHEF_ROLE with the value of the chef_role cmdline option (which defaults to 'base').

The shell script which makes up the user-data file does the following:

a) Overwrites /etc/hosts with a version that has hardcoded values for chef.mycloud.com and mysite.com. The chef.mycloud.com box is where I run Chef server, and mysite.com is a machine serving as a download repository for utility scripts.

b) Downloads Eric Hammond's runurl script, which it uses to run other utility scripts.

c) Executes via runurl the script mysite.com/customize/hostname and passes it the real hostname of the machine being launched. The hostname script simply sets the hostname on the machine:

#!/bin/bash
hostname $1
echo $1 > /etc/hostname


d) Executes via runurl the script mysite.com/customize/hosts and passes it 2 arguments: add and self. Here's the hosts script:

#!/bin/bash
if [[ "$1" == "add" ]]; then
  IPADDR=`ifconfig | grep 'inet addr:' | grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1}'`
  HOSTNAME=`hostname`
  sed -i "s/127.0.0.1 localhost.localdomain localhost/127.0.0.1 localhost.localdomain localhost\n$IPADDR $HOSTNAME.mycloud.com $HOSTNAME\n/g" /etc/hosts
fi

This adds the internal IP of the machine being launched to /etc/hosts and associates it with both the FQDN and the short hostname. The FQDN entry is important for Chef configuration purposes, and it needs to come before the short form in /etc/hosts. I could obviously have used DNS instead, but at bootstrap time I prefer to deal with hardcoded host names for now.

Update 07/22/10

Patrick Lightbody sent me a note saying that it's easier to get the local IP address of the machine by using one of the handy EC2 internal HTTP queries.

If you run "curl -s http://169.254.169.254/latest/meta-data" on any EC2 instance, you'll see a list of variables that you can inspect that way. For the local IP, I modified my script above to use:

IPADDR=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`

e) Finally, and most importantly for this discussion, executes via runurl the script mysite.com/install/chef-client and passes it the actual value of the cmdline argument chef_role. The chef-client script does the heavy lifting in terms of installing and configuring chef-client on the instance being launched. As such, I will describe it in the next step.

3) Installing and configuring chef-client on the newly launched instance

Here is the chef-client script I'm using. The comments are fairly self-explanatory. Because I am passing CHEF_ROLE as its first argument, the script knows which role to associate with the client. It does it by downloading the appropriate chef.${CHEF_ROLE}.json. To follow the example, I have 2 files corresponding to the 2 roles I created on the Chef server.

Here is chef.base.json:

{
  "bootstrap": {
    "chef": {
      "url_type": "http",
      "init_style": "init",
      "path": "/srv/chef",
      "serve_path": "/srv/chef",
      "server_fqdn": "chef.mycloud.com"
    }
  },
  "run_list": [ "role[base]" ]
}

The only difference in chef.myapp.json is the run_list, which in this case contains both roles (base and myapp):

{
  "bootstrap": {
    "chef": {
      "url_type": "http",
      "init_style": "init",
      "path": "/srv/chef",
      "serve_path": "/srv/chef",
      "server_fqdn": "chef.mycloud.com"
    }
  },
  "run_list": [ "role[base]", "role[myapp]" ]
}

The chef-client script also downloads the client.rb file which contains information about the Chef server:



log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "http://chef.mycloud.com:4000"

validation_client_name "chef-validator"
validation_key "/etc/chef/validation.pem"
client_key "/etc/chef/client.pem"

file_cache_path "/srv/chef/cache"
pid_file "/var/run/chef/chef-client.pid"

Mixlib::Log::Formatter.show_time = false

Note that the client knows the IP address of chef.mycloud.com because we hardcoded it in /etc/hosts.

The chef-client script also downloads validation.pem, which is an RSA key file used by the Chef server to validate the client upon the initial connection from the client.

The last file downloaded is the init script for launching chef-client automatically upon reboots. I took the liberty of butchering this sample init script and I made it much simpler (see the gist here but beware that it contains paths specific to my environment).

At this point, the client is ready to run this chef-client command which will contact the Chef server (via client.rb), validate itself (via validation.pem), download the recipes associated with the roles specified in chef.json, and run these recipes:

chef-client -j /etc/chef/chef.json -L /var/log/chef.log -l debug

I run the command in debug mode and I specify a log file location (the default output is stdout) so I can tell what's going on if something goes wrong.

That's about it. At this point, the newly launched instance is busy configuring itself via the Chef recipes. Time to sit back and enjoy your automated bootstrap process!

The last lines in chef-client remove the validation.pem file, which is only needed during the client registration, and run chef-client again, this time in the background, via the init script. The process running in the background looks something like this in my case:
/usr/bin/ruby1.8 /usr/bin/chef-client -L /var/log/chef.log -d -j /etc/chef/chef.json -c /etc/chef/client.rb -i 600 -s 30

The -i 600 option means chef-client will contact the Chef server every 600 seconds (plus a random interval given by -s 30) and it will inquire about additions or modifications to the roles it belongs to. If there are new recipes associated with any of the roles, the client will download and run them.

If you want to associate the client to new roles, you can just edit the local file /etc/chef/chef.json and add the new roles to the run_list.
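That edit is easy to script too. Here's a minimal sketch (the file path matches this setup; the role name passed in is just an example, and whether you want to hand-edit or automate this is up to you):

```python
import json

def add_role(chef_json_path, role):
    """Append role[<role>] to the run_list in chef.json, if not already present."""
    with open(chef_json_path) as f:
        conf = json.load(f)
    entry = "role[%s]" % role
    if entry not in conf["run_list"]:
        conf["run_list"].append(entry)
    with open(chef_json_path, "w") as f:
        json.dump(conf, f, indent=2)
```

For example, add_role("/etc/chef/chef.json", "myapp") would add role[myapp] to the run_list, and chef-client would pick it up on its next run.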

Thursday, July 15, 2010

Tracking and visualizing mail logs with MongoDB and gviz_api

To me, nothing beats a nice dashboard for keeping track of how your infrastructure and your application are doing. At Evite, sending mail is a core part of our business. One thing we need to ensure is that our mail servers are busily humming away, sending mail out to our users. To this end, I built a quick outgoing email tracking tool using MongoDB and pymongo, and I also put together a dashboard visualization of that data using the Google Visualization API via the gviz_api Python module.

Tracking outgoing email from the mail logs with pymongo

Mail logs are sent to a centralized syslog. I have a simple Python script that tails the common mail log file every 5 minutes, counts the lines that conform to a specific regular expression (looking for a specific msgid pattern), then inserts that count into a MongoDB database. Here's the snippet of code that does that:

import datetime
from pymongo import Connection

conn = Connection(host="myhost.example.com")
db = conn.logs
maillogs = db.mail
d = {}
now = datetime.datetime.now()
d['insert_time'] = now
d['msg_count'] = msg_count  # count of matching log lines, computed earlier by the tailing code
maillogs.save(d)

I use the pymongo module to open a connection to the host running the mongod daemon, then I declare a database called logs and a collection called mail within that database (assigned to the maillogs variable). Note that both the database and the collection are created on the fly in case they don't exist.

I then instantiate a Python dictionary with two keys, insert_time and msg_count. Finally, I use the save method on the maillogs collection to insert the dictionary into the MongoDB logs database. Can't get any easier than this.
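For completeness, here's a sketch of where msg_count comes from. The msgid pattern below is made up for illustration -- the real regular expression is specific to our mail setup:

```python
import re

# Hypothetical pattern -- the real msgid regex is site-specific
MSGID_RE = re.compile(r'msgid=<\w+@mail\.example\.com>')

def count_outgoing(lines):
    """Count mail log lines matching the msgid pattern."""
    return sum(1 for line in lines if MSGID_RE.search(line))
```

The tailing script feeds it the lines appended to the mail log since the last run, and the resulting count is what gets saved to MongoDB every 5 minutes.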

Visualizing the outgoing email count with gviz_api

I have another simple Python script which queries the MongoDB logs database for all documents that have been inserted in the last hour. Here's how I do it:


MINUTES_AGO=60
conn = Connection()
db = conn.logs
maillogs = db.mail
now = datetime.datetime.now()
minutes_ago = now + datetime.timedelta(minutes=-MINUTES_AGO)
rows = maillogs.find({'insert_time': {"$gte": minutes_ago}})

As an aside, when querying MongoDB databases that contain documents with timestamp fields, the datetime module will become your intimate friend.

Just remember that you need to pass datetime objects when you put together a pymongo query. In the case above, I use the now() method to get the current timestamp, then I use timedelta with minutes=-60 to get the datetime object corresponding to 'now minus 1 hour'.
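For example, with a fixed timestamp instead of now() so the arithmetic is easy to see:

```python
import datetime

# Fixed timestamp for illustration; the real script uses datetime.datetime.now()
now = datetime.datetime(2010, 7, 15, 12, 30, 0)
hour_ago = now + datetime.timedelta(minutes=-60)
# hour_ago is datetime.datetime(2010, 7, 15, 11, 30, 0)
```

Passing hour_ago in a {"$gte": ...} clause then matches every document whose insert_time falls in that one-hour window.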

The gviz_api module has decent documentation, but it still took me a while to figure out how to use it properly (thanks to my colleague Dan Mesh for being the trailblazer and providing me with some good examples).

I want to graph the timestamps and message counts from the last hour. Using the pymongo query above, I get the documents inserted in MongoDB during the last hour. From that set, I need to generate the data that I am going to pass to gviz_api:


chart_data = []
for row in rows:
    insert_time = row['insert_time']
    insert_time = insert_time.strftime('%H:%M')
    msg_count = int(row['msg_count'])
    chart_data.append([insert_time, msg_count])

jschart("Outgoing_mail", chart_data)


In my case, chart_data is a list of lists, each list containing a timestamp and a message count.

I pass the chart_data list to the jschart function, which does the Google Visualization magic:

def jschart(name, chart_data):
    description = [
        ("time", "string"),
        ("msg_count", "number", "Message count"),
    ]

    data = []
    for insert_time, msg_count in chart_data:
        data.append((insert_time, msg_count))

    # Loading it into gviz_api.DataTable
    data_table = gviz_api.DataTable(description)
    data_table.LoadData(data)

    # Creating a JSON string
    json = data_table.ToJSon()

    name = "OUTGOING_MAIL"
    html = TEMPL % {"title" : name, "json" : json}
    open("charts/%s.html" % name, "w").write(html)

The important parts in this function are the description and the data variables. According to the docs, they both need to be of the same type, either dictionary or list. In my case, they're both lists. The description denotes the schema for the data I want to chart: a time column of type string, and a msg_count column of type number. For msg_count, I also specify a user-friendly label, 'Message count', which will be displayed in the chart legend.

After constructing the data list based on chart_data, I declare a gviz_api DataTable, I load the data into it, I call the ToJSon method on it to get a JSON string, and finally I fill in a template string, passing it a title for the chart and the JSON data.

The template string is an HTML + Javascript snippet that actually talks to the Google Visualization backend and tells it to create an Area Chart. Click on this gist to view it.

That's it. I run the gviz_api script every 5 minutes via crontab and I generate an HTML file that serves as my dashboard.

I can easily also write a Nagios plugin based on the pymongo query, which would alert me for example if the number of outgoing email messages is too low or too high. It's very easy to write a Nagios plugin by just having a script that exits with 0 for success, 1 for warnings and 2 for critical errors. Here's a quick example, where wlimit is the warning threshold and climit is the critical threshold:


def check_maillogs(wlimit, climit):
    # MongoDB
    conn = Connection()
    db = conn.logs
    maillogs = db.mail
    now = datetime.datetime.now()
    minutes_ago = now + datetime.timedelta(minutes=-MINUTES_AGO)
    count = maillogs.find({'insert_time': {"$gte": minutes_ago}}).count()
    rc = 0
    if count > wlimit:
        rc = 1
    if count > climit:
        rc = 2
    print "%d messages sent in the last %d minutes" % (count, MINUTES_AGO)
    return rc


Update #1
See Mike Dirolf's comment on how to properly insert and query timestamp-related fields. Basically, use datetime.datetime.utcnow() instead of now() everywhere, and convert to local time zone when displaying.

Update #2
Due to popular demand, here's a screenshot of the chart I generate. Note that the small number of messages is a very, very small percentage of our outgoing mail traffic. I chose to chart it because it's related to some new functionality, and I want to see if we're getting too few or too many messages in that area of the application.
