Just One Tiny Fix

Imagine a time when you just had to make a change. You know the one: the really small one that fixes that very special issue… in production.

One may think of a thousand reasons not to. Continuous integration, and properly separated yet near-identical environments, would probably top the list.

That works very well on a greenfield project that starts out with all of this in place, or where such a transformation has already happened and the environments really are comparable, not only in code but also in data (both quantity and quality). But maybe you just need to make a change here and now, because it is the last question standing between you and a hypothetical million. Or, more realistically, because you believe you will then be able to go to sleep on a Friday night, proud of your achievement.

Reasoning with oneself

Sometimes one thinks it best to take the path of the lone wolf, to capture and fix a problem that only happens in production. Even when multiple environments are set up, some cases are somewhere between hard and impossible to replicate: rare race conditions (history knows of some that took years to identify and fix, take a look here, or here), or minute data differences (which might actually amount to terabytes of data). First and foremost: if you fall into this category, please close this article now. You are better off using heuristics to decide where to put logging, and going the path of code reviews, deployments and so on.

Another seemingly good reason is an issue happening here and now, affecting the system in a significant way. And maybe it just so happens that your knowledge of version control, verification and validation techniques, and collaboration with peers suddenly escapes you. What follows is advice for those blessed souls fixing the system on a proverbial Christmas night from their parents’ basement, tethered to a phone on the last few percent of battery.

Protecting against oneself

There is usually a long chain of choices, made throughout all layers of an organisation, that leads to somebody making a manual change. Neither the size of a company nor its (technological) maturity protects against it. Browse the postmortems published by major cloud providers and fast-growing startups, and you will find a typo or some other “human” factor behind an outage. The real answer is “lack of tooling and automation”. But hopefully the readers are just PHP developers who need to fix that one bug, so some precautions may be taken.

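Make sure you have a handy backup of any code you change. If a version control tool is available on the server, start by putting the live code under version control. In most cases that’s a very quick operation. Then, if a fix breaks the system completely and you realise you have changed 20 files instead of 2, you can easily find and revert them all. With git, that means running git init, then git add -A, then git commit -m "before the fix" once inside the code directory (git may first ask you to set user.name and user.email). git init on its own only creates an empty repository and records nothing. If the code directory is the web root, remove the .git directory afterwards or make sure the web server does not serve it.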

But maybe the server has a very limited set of tools. That’s when it’s time to remember how to use rsync (rsync -a /var/www/production /opt/backup/production-as-it-was-on-the-4th-of-july). Or, at the very least, copy (cp -R /var/www/website /var/backup/working-website).
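If neither git nor rsync is at hand and PHP is the only thing you can run on the host, a minimal sketch like the one below copies a directory tree using only PHP’s standard library. The function name backup_tree() is hypothetical, and unlike rsync -a it does not preserve ownership, permissions, timestamps or symbolic links.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: recursively copies $source into $destination and
 * returns the number of files copied. Permissions, owners, timestamps and
 * symbolic links are NOT preserved (copy() writes plain files).
 */
function backup_tree(string $source, string $destination): int
{
    $source = rtrim($source, DIRECTORY_SEPARATOR);
    if (!is_dir($destination) && !mkdir($destination, 0700, true)) {
        throw new RuntimeException("Cannot create $destination");
    }

    // SELF_FIRST yields each directory before its contents, so parents exist first.
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($source, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST
    );

    $copied = 0;
    foreach ($items as $item) {
        // Path of the current entry relative to $source.
        $relative = substr($item->getPathname(), strlen($source) + 1);
        $target = $destination . DIRECTORY_SEPARATOR . $relative;

        if ($item->isDir()) {
            if (!is_dir($target) && !mkdir($target, 0700)) {
                throw new RuntimeException("Cannot create $target");
            }
            continue;
        }
        if (!copy($item->getPathname(), $target)) {
            throw new RuntimeException("Cannot copy $relative");
        }
        $copied++;
    }
    return $copied;
}
```

Called as backup_tree('/var/www/website', '/var/backup/working-website'), it returns the number of files copied, which gives a quick sanity check against what you expected to back up.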

Any of the above takes very little time, and yet may save a lot of precious time if events take a turn for the worse.

Now, this article is about changing the code in production. It goes without saying that if you are also planning to run SQL queries, a database backup is the first step. The next is making sure you haven’t left the WHERE clause out of that DELETE or UPDATE.
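One extra safeguard on the SQL side is to run the statement inside a transaction and commit only if it touched the number of rows you expected, so a forgotten WHERE clause gets rolled back instead of committed. The sketch below assumes PDO in exception error mode and a storage engine with transaction support (InnoDB rather than MyISAM on MySQL, for example); guarded_write() and its expected-row-count parameter are illustrative names, not part of PDO. It complements the backup rather than replacing it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs one UPDATE/DELETE in a transaction and commits
 * only if exactly $expectedRows rows were affected; otherwise rolls back.
 * Assumes $db uses PDO::ERRMODE_EXCEPTION (the default since PHP 8.0).
 */
function guarded_write(PDO $db, string $sql, array $params, int $expectedRows): int
{
    $db->beginTransaction();
    try {
        $statement = $db->prepare($sql);
        $statement->execute($params);
        $affected = $statement->rowCount();

        if ($affected !== $expectedRows) {
            throw new RuntimeException(
                "Expected $expectedRows row(s), got $affected; rolling back."
            );
        }
        $db->commit();
        return $affected;
    } catch (Throwable $e) {
        // Undo everything the statement did, then let the caller see why.
        if ($db->inTransaction()) {
            $db->rollBack();
        }
        throw $e;
    }
}
```

For example, guarded_write($db, 'UPDATE orders SET status = ? WHERE id = ?', ['paid', 42], 1) commits, while the same UPDATE without its WHERE clause on a table with many rows throws and leaves the data untouched.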

Getting back to oneself

Part of the magic of some interpreted languages such as PHP, and arguably their biggest benefit, is that in the majority of deployments nothing is implicitly shared or persisted between requests. So you can write to the global scope if you wish (let’s admit it, we are so far down the dark path that this would be merely a misdemeanour), and no other request will be affected.

With that in mind, it’s a really good idea to start by writing a function for oneself:

function is_it_me(): bool {
    // REMOTE_ADDR is absent in CLI runs; ?? avoids an undefined-key warning.
    return '192.0.2.1' === ($_SERVER['REMOTE_ADDR'] ?? '');
}

The IP address above is only an example (192.0.2.1 belongs to a range reserved for documentation); replace it with your own address as the server sees it. Behind a load balancer or reverse proxy, REMOTE_ADDR may hold the proxy’s address rather than yours. The function is trivial and could certainly be improved. Consider checking an HTTP header that you set manually in the browser before sending the request, to switch on your new functionality. But beware: keep it simple. If you find yourself considering adding it to your core library, you are likely not even trying to make a change for the better.
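A header-based variant could look like the sketch below. The header name X-Debug-Me and the token value are hypothetical placeholders; PHP exposes a request header such as X-Debug-Me in $_SERVER under the key HTTP_X_DEBUG_ME, and hash_equals() compares the two strings in constant time. Unlike the IP check, it keeps working when your address changes, for instance when tethering over a phone.

```php
<?php
declare(strict_types=1);

function is_it_me(): bool
{
    // Hypothetical secret: replace with your own long random value.
    $expected = 'replace-with-a-long-random-string';

    // A request header "X-Debug-Me: <token>" arrives as HTTP_X_DEBUG_ME.
    $sent = $_SERVER['HTTP_X_DEBUG_ME'] ?? '';

    // hash_equals() takes the known string first and compares in constant time.
    return $sent !== '' && hash_equals($expected, $sent);
}
```

The header can then be sent from a browser extension that edits request headers, or from the command line with curl -H 'X-Debug-Me: <your token>' followed by the URL.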

With these simple safeguards in place, you can start making the change. Where the issue most likely originates, add a guarded include:

if (is_it_me()) {
    require 'path/to/new/code.php';
}

Here, even if you make a mistake, a typo or whatnot inside that new code, you know it will affect only you. You know your system, your code, and what to do. And you have the safeguards that let you return to where it all began if need be, or remember what was done, come Monday.
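To keep even your own mistakes from leaving you with a broken page, the guarded include can be wrapped so that any exception falls back to the existing behaviour and is logged for Monday. This is a sketch: try_fix() is a hypothetical helper that takes the is_it_me() result as a flag. Since PHP 7, a syntax error in a required file is thrown as a ParseError, which the Throwable catch handles; genuinely fatal errors such as memory exhaustion or timeouts are not catchable, and any side effects the new code had before failing are not undone.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $experimental only when $enabled is true
 * (e.g. the result of is_it_me()), and falls back to $original if the
 * experimental code throws anything, including a ParseError.
 */
function try_fix(bool $enabled, callable $original, callable $experimental)
{
    if (!$enabled) {
        // Everyone else keeps getting the current behaviour.
        return $original();
    }
    try {
        return $experimental();
    } catch (Throwable $e) {
        // Leave a trace to read on Monday, then serve the old behaviour.
        error_log('Experimental fix failed: ' . $e->getMessage());
        return $original();
    }
}
```

In the spot described above, that could read try_fix(is_it_me(), fn() => old_code(), function () { return require 'path/to/new/code.php'; }), where old_code() stands for whatever the current path does and the included file returns its result.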

Proud of your fix

Hopefully, all the steps above allow you to finish the experiment, identify the bug, and get back to the table where your significant other was silently fuming about your work habits.

There are a few important takeaways here.

The first one is obvious. Hopefully, you will never have to use any of this. If you find yourself reading this article on a perfectly normal day, thinking it must be added to a contingency plan, please don’t. It’s better to start working on that test environment you have been thinking about since last year. Or, after all, to automate the deployments.

The second one is even better. If some of this advice has helped you, or if you can relate to it, remember to identify the real reason in the postmortem. The reason wasn’t a typo you made. The reason wasn’t something you forgot to add. The reason was a lack of processes and tools, which left you fixing things by hand. And what you need are the tools and processes.

It’s great if you have successfully made a change in production and, without harm, saved the company a good chunk of money. That might help you understand why superheroes seem to wear PJs, or underwear on top of their suits. Now it’s time to rest. Next week there will hopefully be a fruitful discussion, leading to decisions that pave the way for fewer sleepless nights. And Modus is always here to help, with tooling, processes, and ideas.

Posted in DevOps

Justas Butkus

Justas Butkus was a software engineer with over ten years of experience in a multitude of different fields. His interests include solving problems in areas where security and stability are of utmost importance, helping teams organise, and figuring out what might be the next problem best resolved.
