Sunday, July 26, 2015

Management infrastructure with DNS

You have several dozen servers in data centers in different parts of the world and a few large customers, with different software versions and settings for each cluster. This situation is familiar to many developers and operators. With hundreds of servers and network devices, managing the entire infrastructure can take a lot of time.

Updating software while accounting for the specifics of each platform, making backups, responding to incidents and so on all become time-consuming.

A configuration management system such as Chef, Puppet or SaltStack is ideal for these tasks. If your company does not use one yet, start.

But suppose the infrastructure has been growing for several years and the number of servers has increased gradually, or for some other reason you do not use a configuration management system. This article describes how to bring structure to your fleet of servers, network equipment and workstations.

Structure

Think about the structure of your company. Which groups of servers perform the same tasks? By which criteria can virtual machines be grouped? By type: Dev, Test, Prod. By function: VPN servers, web servers, DB servers. By location: BY, Client-name, Amazon. And so on.

[dev ]     +----+ +------+ +---+
[test] <=> |asia| |docker| |ec2|
[prod]     +----+ +------+ +---+

Then describe the structure and purpose of each server in its DNS name.
Examples:
vpn1.mts.devel.aptinfo.net
balancer2.us.dev.aptinfo.net
db3.azure.prod.aptinfo.net

app4.vmware.stage.aptinfo.net
sql5.ec2-west.preprod.aptinfo.net
www.aptinfo.net

Try to make the domain name carry as much information about the server as possible.
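Once names follow a fixed convention, scripts can read them back. The Python sketch below assumes the hypothetical layout `<role><number>.<location>.<environment>.aptinfo.net`; the number is optional (as in `bckp.us.prod`), and names that do not fit the pattern, such as `www.aptinfo.net`, yield `None`.

```python
import re
from typing import NamedTuple, Optional

DOMAIN = "aptinfo.net"

# Hypothetical convention: <role><number>.<location>.<environment>.<domain>
HOST_RE = re.compile(
    r"^(?P<role>[a-z]+)(?P<index>\d*)\.(?P<location>[a-z0-9-]+)\.(?P<env>[a-z]+)\."
    + re.escape(DOMAIN) + r"$"
)


class HostInfo(NamedTuple):
    role: str
    index: Optional[int]  # None when the name has no number, e.g. "bckp"
    location: str
    env: str


def parse_hostname(fqdn: str) -> Optional[HostInfo]:
    """Split an FQDN into role, index, location and environment.

    Returns None for names that do not follow the convention.
    """
    match = HOST_RE.match(fqdn.lower().rstrip("."))
    if match is None:
        return None
    return HostInfo(
        role=match["role"],
        index=int(match["index"]) if match["index"] else None,
        location=match["location"],
        env=match["env"],
    )


if __name__ == "__main__":
    for name in ("db3.azure.prod.aptinfo.net",
                 "sql5.ec2-west.preprod.aptinfo.net",
                 "www.aptinfo.net"):
        print(name, "->", parse_hostname(name))
```

The stricter the convention, the less guesswork such a parser needs; a name that returns `None` is a hint that it was registered outside the scheme.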

Converting DNS records into a structure


It is easy to derive a hierarchical structure from a set of DNS records. Start with the list of records in the zone (excerpt):

{
    "domain": "aptinfo.net",
    "records": [
        {
            "content": "10.0.101.223",
            "fqdn": "app1.nl.stage.aptinfo.net",
            "subdomain": "app1.nl.stage",
            "type": "A"
        },
        {
            "content": "10.0.101.224",
            "fqdn": "app2.nl.stage.aptinfo.net",
            "subdomain": "app2.nl.stage",
            "type": "A"
        }
    ]
}

Transform the list of A, CNAME and other records into an associative array: strip the domain part and group the records by the common part of their subdomains.

'nl.stage': ['app1.nl.stage.aptinfo.net',
             'app2.nl.stage.aptinfo.net',
             'db1.nl.stage.aptinfo.net']
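The grouping step might look like the following Python sketch. It assumes the JSON layout of the records excerpt above and treats A, AAAA and CNAME records as hosts (an assumption; adjust to your zone). Besides `nl.stage` it also emits the broader suffix `stage`, one possible way to obtain the hierarchy rather than a single flat level.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set

# Excerpt in the same shape as the JSON records shown above.
RECORDS_JSON = """
{"domain": "aptinfo.net",
 "records": [
  {"content": "10.0.101.223", "fqdn": "app1.nl.stage.aptinfo.net",
   "subdomain": "app1.nl.stage", "type": "A"},
  {"content": "10.0.101.224", "fqdn": "app2.nl.stage.aptinfo.net",
   "subdomain": "app2.nl.stage", "type": "A"}
 ]}
"""

# Record types that name a host; MX, TXT and the like are skipped.
HOST_TYPES = {"A", "AAAA", "CNAME"}


def group_records(data: dict) -> Dict[str, List[str]]:
    """Map every subdomain suffix to the sorted FQDNs that share it.

    "app1.nl.stage" puts app1 into both "nl.stage" and "stage", so a group
    can be selected at any level of the hierarchy.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for record in data["records"]:
        if record["type"] not in HOST_TYPES:
            continue
        labels = record["subdomain"].split(".")
        # labels[0] is the host itself; each remaining suffix is a group.
        for i in range(1, len(labels)):
            groups[".".join(labels[i:])].add(record["fqdn"])
    return {name: sorted(hosts) for name, hosts in groups.items()}


if __name__ == "__main__":
    groups = group_records(json.loads(RECORDS_JSON))
    print(json.dumps(groups, indent=4, sort_keys=True))
```

The resulting dict maps a role name to a list of hosts, which is the shape Fabric 1.x expects in `env.roledefs`; that is what `fab -R nl.stage` selects from.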


After that, it is easy to get information about a group of servers with bash commands. It is also easy to manage groups of servers with tools such as Fabric:

$ fab  -R nl.stage  --  who
[app1.nl.stage.aptinfo.net] Executing task ''
[app1.nl.stage.aptinfo.net] run: who
[app1.nl.stage.aptinfo.net] out: root     pts/1        Jul 23 21:30 (10.50.124.15)
[app1.nl.stage.aptinfo.net] out: 

[app2.nl.stage.aptinfo.net] Executing task ''
[app2.nl.stage.aptinfo.net] run: who
[app2.nl.stage.aptinfo.net] out: root     pts/3        Jul 23 21:30 (10.50.124.15)
[app2.nl.stage.aptinfo.net] out: 

[db1.nl.stage.aptinfo.net] Executing task ''
[db1.nl.stage.aptinfo.net] run: who
No handlers could be found for logger "paramiko.transport"

Fatal error: Error reading SSH protocol banner

Underlying exception:
    Error reading SSH protocol banner

Aborting.
Disconnecting from root@app1.nl.stage.aptinfo.net... done.
Disconnecting from root@app2.nl.stage.aptinfo.net... done.
Error reading SSH protocol banner

Underlying exception:
    Error reading SSH protocol banner

But Fabric has drawbacks. By default, it aborts as soon as a command returns a non-zero exit code, and it also aborts when one of the hosts is unreachable. Ansible is better suited for this:

$ ansible  us.prod  -i ~/ansible/dynamic.py  -m shell  -a uptime

db2.us.prod.aptinfo.net | success | rc=0 >>
 15:33:38 up 1116 days, 21:55,  2 users,  load average: 0.04, 0.05, 0.07

app2.us.prod.aptinfo.net | success | rc=0 >>
 15:33:46 up 1117 days, 27 min,  2 users,  load average: 0.42, 0.53, 0.53

bckp.us.prod.aptinfo.net | success | rc=0 >>
 15:33:51 up 1152 days, 27 min,  1 user,  load average: 0.08, 0.43, 0.49

app1.us.prod.aptinfo.net | success | rc=0 >>
 15:33:52 up 1006 days,  7:07,  2 users,  load average: 2.19, 2.25, 2.20

Quick and convenient.
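The Ansible command above points at `~/ansible/dynamic.py`, which is not shown here. Below is a hypothetical Python sketch of such a script, assuming the DNS records are exported to a JSON file in the layout shown earlier; the path `~/ansible/records.json` and the `DNS_RECORDS_FILE` variable are made up for illustration. It follows Ansible's dynamic inventory convention: `--list` prints every group as JSON, and `--host <name>` prints that host's variables.

```python
#!/usr/bin/env python3
"""Hypothetical ~/ansible/dynamic.py: an Ansible dynamic inventory built from
DNS records exported as JSON (same layout as the records example above)."""
import json
import os
import sys
from collections import defaultdict

# Assumption: where the exported records live; override with DNS_RECORDS_FILE.
RECORDS_FILE = os.environ.get(
    "DNS_RECORDS_FILE", os.path.expanduser("~/ansible/records.json"))

# Record types treated as hosts (assumption).
HOST_TYPES = {"A", "AAAA", "CNAME"}


def build_inventory(data: dict) -> dict:
    """Return the structure Ansible expects from `--list`."""
    groups = defaultdict(set)
    for record in data["records"]:
        if record["type"] not in HOST_TYPES:
            continue
        labels = record["subdomain"].split(".")
        # "db2.us.prod" -> groups "us.prod" and "prod".
        for i in range(1, len(labels)):
            groups[".".join(labels[i:])].add(record["fqdn"])
    inventory = {name: {"hosts": sorted(hosts)} for name, hosts in groups.items()}
    # An (empty) _meta.hostvars tells Ansible it need not call --host per host.
    inventory["_meta"] = {"hostvars": {}}
    return inventory


def main(argv) -> int:
    if len(argv) == 2 and argv[1] == "--list":
        with open(RECORDS_FILE) as f:
            print(json.dumps(build_inventory(json.load(f)), indent=2))
    elif len(argv) == 3 and argv[1] == "--host":
        # This sketch defines no per-host variables.
        print(json.dumps({}))
    else:
        sys.stderr.write("usage: dynamic.py --list | --host <hostname>\n")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Make the script executable (`chmod +x ~/ansible/dynamic.py`) so that Ansible runs it as an inventory script. Recent Ansible releases may warn that dots are not valid characters in group names such as `us.prod`.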
I hope this article will be useful.