blue-green deployment? This means upgrading an application by creating VMs that run the new version, switching the load balancer over to them, and only then destroying the old VMs. Are you kidding?
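To make the blue-green idea concrete, here is a minimal, self-contained bash sketch. It is only an analogy under stated assumptions: the two pools of VMs are simulated by two directories, the load balancer by a symlink named `live`, and the helpers `deploy_pool` and `switch_traffic` are hypothetical names, not part of any real tool.

```bash
#!/usr/bin/env bash
# Blue-green sketch: directories stand in for VM pools and the "live"
# symlink stands in for the load balancer (both are assumptions).
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical helper: "create the VMs" of a pool running a given version.
deploy_pool() {
  local color=$1 version=$2
  mkdir -p "$color"
  echo "$version" > "$color/VERSION"
}

# Hypothetical helper: point the "load balancer" at the given pool.
switch_traffic() {
  ln -sfn "$1" live
}

# Current state: the blue pool serves version 1.0.
deploy_pool blue 1.0
switch_traffic blue

# Upgrade: build green with 2.0 next to blue, toggle, then destroy blue.
deploy_pool green 2.0
switch_traffic green
rm -rf blue

echo "live pool now serves version $(cat live/VERSION)"
```

Flipping a single pointer is what keeps the switch fast and easy to undo: as long as the old pool has not been destroyed, rolling back is the same one-line toggle.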
We have to face facts: this approach is devilishly effective. Manual, tedious tasks shrink, and some of the slip-ups caused by heterogeneous configurations disappear. Time is saved, and it becomes easy to manage hundreds or even thousands of machines. The automaton becomes your armed wing: by construction, it is the one that laboriously performs all the repetitive operations you have coded. One small snag, however: when you ask it to do something stupid, it obediently does it, on 1,000 machines at a time...
Then comes the moment of questioning the meaning of your action:
These questions start piling up in your head. You also find yourself the owner of a small body of code that describes your infrastructure and that quietly keeps growing...
fix_mysql, third age version:
```shell
# Edit the Ansible template for the MySQL configuration
$ vim roles/mysql/templates/my.cnf
# Edit the Ansible MySQL role
$ vim roles/mysql/tasks/main.yml
# Dry run: list the changes that would be applied to all production database servers
$ ansible-playbook -i inv/prod -l db-servers configure.yml --check --diff
# Real run on all production database servers
$ ansible-playbook -i inv/prod -l db-servers configure.yml --diff -vvv
# Ouch, a manual check reveals a mistake that now has to be fixed on every server...
$ vim roles/mysql/templates/my.cnf
# Run again on all production database servers
$ ansible-playbook -i inv/prod -l db-servers configure.yml --diff -vvv
# Phew, it's fixed, and no one has noticed...
```
In this example, a tool (Ansible, one choice among many) is used to deploy a configuration change to all the database servers. Since there is no guardrail, any mistake you make is dutifully rolled out to every machine concerned. All that remains is to go back over all of them, quickly, to repair the damage...
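The missing guardrail can be illustrated with a minimal bash sketch, under simplifying assumptions: each "server" is just a directory holding a copy of `my.cnf`, and `validate_cnf` and `deploy_all` are hypothetical helpers. The check is deliberately naive (every non-blank, non-comment line must be a `[section]` header, a bare option, or a `key = value` pair) and is not how Ansible works internally. It only shows the idea of checking a change once, before it reaches any machine, instead of repairing every machine afterwards. Note that a dry run with `--check --diff` only shows what would change; it does not tell you whether the change is right.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p db1 db2 db3   # three hypothetical database servers

# Naive syntax check (assumption, not MySQL's real parser): succeeds only if
# every line is blank, a comment, a [section], a bare option or key = value.
# Offending lines are printed to stdout.
validate_cnf() {
  [ -r "$1" ] || return 1
  ! grep -Ev '^[[:space:]]*($|#|\[[A-Za-z0-9_]+]$|[A-Za-z0-9_-]+([[:space:]]*=[[:space:]]*[^[:space:]].*)?$)' "$1"
}

# Push the config to every server, but only if it passes the check.
deploy_all() {
  local cnf=$1 host
  if ! validate_cnf "$cnf"; then
    echo "refusing to deploy $cnf: syntax check failed" >&2
    return 1
  fi
  for host in db1 db2 db3; do
    cp "$cnf" "$host/my.cnf"
  done
}

# A typo (missing "=") is caught before it reaches any server.
printf '[mysqld]\nkey_buffer_size 256M\n' > bad.cnf
deploy_all bad.cnf || echo "nothing deployed"

# The corrected file goes out to all three servers.
printf '[mysqld]\nkey_buffer_size = 256M\n' > good.cnf
deploy_all good.cnf
```

The `256M` value is arbitrary; what matters is that the typo is stopped once, centrally, rather than replicated three (or 1,000) times.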
#### Third age: comfort level

| Why it works | Why it does not work |
| --- | --- |
| As with golden images: what worked yesterday will keep working tomorrow.<br><br>A rolling update amounts to changing values in my code.<br><br>The code represents "the truth" about what is supposed to happen in production, and everyone can refer to it to make a decision. | I do not write tests: I have code, but I do not know whether it works at any given moment.<br><br>I do not trust my code to be idempotent, so I cannot re-apply it in production to fix things.<br><br>I manually modify machines behind my automaton's back, and this breaks everything the next time it runs. |
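Idempotence, which the third age struggles to trust, means that applying the same code twice leaves the machine in the same state as applying it once. Here is a minimal bash sketch with a hypothetical `ensure_line` helper (similar in spirit to Ansible's `lineinfile` module, but not its implementation), contrasted with a naive `append_line`. The `max_connections = 500` setting is an arbitrary example.

```bash
#!/usr/bin/env bash
set -euo pipefail

conf=$(mktemp)
naive=$(mktemp)

# Idempotent: append the line only if an identical line is not already there,
# so running it any number of times gives the same file as running it once.
ensure_line() {
  local line=$1 file=$2
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Not idempotent: every run appends another copy.
append_line() {
  printf '%s\n' "$1" >> "$2"
}

for _ in 1 2 3; do
  ensure_line 'max_connections = 500' "$conf"
  append_line 'max_connections = 500' "$naive"
done

echo "idempotent version after 3 runs: $(grep -c 'max_connections' "$conf") line(s)"
echo "naive version after 3 runs: $(grep -c 'max_connections' "$naive") line(s)"
```

Only code that behaves like `ensure_line` can safely be re-run in production to fix things; the naive version drifts a little further on every run.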
As for the code you feed your automaton, a transformation is taking place, and it could be your salvation. Looking at what application developers do, you realize that a new and sometimes little-known world is opening up to you.
Above all, there is this irritating habit of systematically writing tests and running them automatically (and continuously) to catch every error, present or future (regressions), that could be lurking in your work. A quick search on the web turns up a plethora of tools for testing your infrastructure (bats, serverspec, test-kitchen…).
Developers have built a whole ecosystem of tools and practices to support their work.
By talking to them, you realize that they have done this because they aspire to something that resonates deliciously within you. They like to do quality work, make their code beautiful, expressive, maintainable, in short, treat it with the greatest care... They even have a term for it: software craftsmanship.
Fourth age version of fix_mysql:
```shell
$ git checkout master
# Update from the main repository
$ git pull --rebase
# Create a branch for the fix
$ git checkout -b amz
# Recreate a dev platform from scratch
$ ansible-playbook -i inv/amz vm-reinit.yml
# Run on the dev platform
$ ansible-playbook -i inv/amz -l db-servers configure.yml --diff -vvv
# Edit the MySQL serverspec test to check the new expected behavior
$ vim tests/spec/nodetypes/db/mysql_spec.rb
# Check that the test fails (the change has not been made yet)
$ ENV=amz rake -f tests/Rakefile spec:mysql
# Edit the Ansible template for the MySQL configuration
$ vim roles/mysql/templates/my.cnf.j2
# Edit the Ansible MySQL role
$ vim roles/mysql/tasks/main.yml
# Lint the Ansible MySQL role
$ ansible-lint roles/mysql
[ANSIBLE0012] Commands should not change things if nothing needs doing
/home/amz/projets/trucs/infra-as-code/roles/mysql/tasks/main.yml:17
Task/Handler: Ch{mod,own} file
# Ouch, I made a mistake, and ansible-lint caught it before I ran anything
$ vim roles/mysql/tasks/main.yml
# Lint the Ansible MySQL role again
$ ansible-lint roles/mysql
# This time it passes: dry run on the dev platform
$ ansible-playbook -i inv/amz -l db-servers configure.yml --check --diff
# Real run on the dev platform
$ ansible-playbook -i inv/amz -l db-servers configure.yml --diff -vvv
# Run the tests, which should all pass now
$ ENV=amz rake -f tests/Rakefile spec:mysql
# Check the modified files
$ git status
# Stage the modified files for the next commit
$ git add tests/spec/nodetypes/db/mysql_spec.rb roles/mysql/templates/my.cnf.j2 roles/mysql/tasks/main.yml
# Commit, referencing the ticket number
$ git commit -m "#678 add key_buffer_size management"
# Push the branch to open a merge request and get a peer review;
# the CI/CD platform takes it from there automatically
$ git push origin amz
# Tear down the temporary dev platform
$ ansible-playbook -i inv/amz vm-destroy.yml
```
In this example, the Ansible code is backed by a best-practices linter (ansible-lint) and by tests (written with serverspec) that are, whenever possible, written before the implementation. Being able to spin up disposable environments on demand (with test-kitchen or any other solution) makes it possible to validate changes on a pre-production environment. The branching and code-review strategy contributes to overall quality and to shared ownership of the code repository. In particular, it is an opportunity to have the code and its settings reviewed by a newcomer to the project, or by a DBA who will check the parameters. A continuous integration platform re-runs all the tests, and a promotion mechanism (manual or automatic) rolls the changes out to production.
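The "test first" order of this workflow can be sketched in plain bash. The real project uses serverspec (Ruby) against a live dev platform; here, as a simplifying assumption, the "server" is just a rendered `my.cnf` file, `check_setting` is a hypothetical helper, and the `256M` value is arbitrary. The point is the sequence: the check is written first and fails, then the configuration is changed until the same check passes.

```bash
#!/usr/bin/env bash
set -euo pipefail

cnf=$(mktemp)
printf '[mysqld]\nmax_connections = 500\n' > "$cnf"

# Hypothetical test helper: succeed only if a "key = value" line is present
# (spaces around "=" are optional).
check_setting() {
  local key=$1 expected=$2 file=$3
  grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*${expected}[[:space:]]*$" "$file"
}

# 1. Red: the test exists before the change, so it must fail.
if check_setting key_buffer_size 256M "$cnf"; then
  echo "unexpected: test passes before the change" >&2
  exit 1
fi
echo "red: key_buffer_size is not configured yet"

# 2. Implementation: change the configuration (the Jinja2 template, in real life).
printf 'key_buffer_size = 256M\n' >> "$cnf"

# 3. Green: the very same test now passes (set -e aborts if it does not).
check_setting key_buffer_size 256M "$cnf"
echo "green: key_buffer_size = 256M"
```

Seeing the test fail first proves that it actually checks something; a test that passes before the change would pass whatever the change does.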
#### Fourth age: comfort level

| Why it works | Why it does not work |
| --- | --- |
| Several levels of testing contribute to the quality of the code.<br><br>I make the most of the versioning provided by the SCM to version my whole infrastructure. | Stop it now, it works. |
For sysadmins and DevOps engineers, then, the deployment tool becomes a kind of sheepdog. Ultimately, it is the deployment tool that becomes the new pet: the one you really need and take the greatest care of, the one whose every line of code is written with love and with all the quality you can muster, since it is what lets you handle such a large fleet of servers and applications.
Ultimately, the transformation of the ops profession does not call its intrinsic values into question. Rather, it changes what ops focus on. Instead of growing attached to machines, they now grow attached to the automaton (and to the code that guides it) that keeps those machines alive, while keeping the same concern for a job well done, the very thing that makes it a pleasure to get up every morning.
What about tomorrow?
You are proud of your work, yet new challenges lie ahead: cloud (IaaS, auto-scaling, PaaS), containerization (Swarm, Kubernetes...), application clusters (MongoDB, Cassandra, Kafka, Spark...). The difference? Even shorter container lifetimes, and clustering mechanisms that themselves take part in the life of the livestock. In the end, it does not matter: you should be ready, since you have put in place all the good practices that prepare you for change (in tools and technologies).