Solution IDs

Having worked in the land of "tech" for the greater part of the past 15 years, I've experienced a great number of theories about how work should be done. Here, I present a small backstory on one such workflow that I enjoyed using and that I believe benefits any person or team that works from an issue system or ticketing queue.

My story

When I first worked in tech I worked for a private hosting company. It was a juvenile company with a young staff, and it played most things extremely fast-and-loose. When I was hired I had been using Linux for about 8 years and FreeBSD for ~2-3 years. I was hired onto evening shifts and worked 12PM-12AM. Another person near my own age worked a very slightly overlapping shift - something like 11PM-7AM.

I would often get issues from customers that I did not know how to deal with. The local customizations to their servers were often not documented, and I was not confident enough to declare problems or make corrections. So during our shift overlap I would ask this other person to help me with some of these issues. "Oh that? Don't worry about it, I'll fix it up tonight," he'd say. True to his word, he usually did just that. I'd eagerly check the issue the next day to see if he'd left company-view-only comments about the debugging process or solution - and to see what he'd told the customer. Unfortunately the ticket-update was usually: "Resolved. Let me know if it happens again."

Breaking the Wheel

Needless to say: I was not pleased with this. How was I to learn from this? I asked my peer to tell me how to fix these things so that the next time I would not need to bother him, but it just never happened.

We hired more people. Never did they learn some of the obscure knowledge from our co-worker. He thought what he was doing made him valuable. He thought the company would want to always keep the tenured person who knew all the fixes.

That co-worker eventually drowned as the company grew: He could no longer hand-fix all of these similar issues himself within the time of his own shift. He needed the help and no one could help him.

He was no longer valuable - because he was not solving new problems. He was having to waste time re-solving the same problems.

Meanwhile, I knew that I am forgetful. Anything I fixed once - whether I could fix it with code (and did) or could not - I wrote down. My 'company-view' comments on tickets got to be repetitive for simple tasks. So I instead published how I recognized the problem, what it affected, and how to quick-fix it in my own ~/html/ homedir. The new staff would see tickets and search our ticket system for ones with similar symptoms. They would find my tickets and my comments. Eventually they'd find where I'd just left a link to one of my own notes-pages.

Enter Solution IDs

When a URL could not be found related to a problem, some of the newer people would scramble. Eventually they'd write their own document similar to mine. This eventually grew into a department wiki. With the high searchability of a wiki, any of us (including support staff now) could receive an issue, find a problem output/bug/error, search it, and likely find how to fix it quickly. Heck, some things that were customer-affecting even had statements.

Within the wiki, we began to assign unique identifiers to each article. Within the System Administration team, we began to require that any ticket - before being set to closed status - must have a linked SolutionID. If you fixed something that hadn't yet been documented: You got to document it. By using a wiki - if you used a SolutionID page but it wasn't as clear/friendly as you'd have liked: you could improve it right then and there.

Then what?

As you know, fixing things manually sucks. It is stupid. It is not why we work with computers and code. Unfortunately it was the nature of that company. If something was worth fixing... it was uhh worth fixing a thousand times over again: manually. :(

Fortunately, with httpd log analysis (as well as a built-in counter on the wiki) we were able to see which SolutionIDs were being visited the most. Boom! Input feedback for our improvements queue! By looking at the traffic to our SolutionIDs wiki (as well as the number of times in a week/month/year that a given ID was referenced in ticket-resolutions) we were able to see where it was best to have developers and administrators invest real time to actually squash bugs.
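To make the log-analysis idea concrete, here is a minimal sketch of counting visits per SolutionID from an access log. It is not the tooling we actually ran. The URL scheme (/wiki/SolutionID-<number>), the function name, and the sample log lines are all made up for illustration, and the regex assumes a common-log-format request field.

```python
import re
from collections import Counter

# Assumption: articles live at /wiki/SolutionID-<number>. Adjust to your wiki.
SOLUTION_PATH = re.compile(r'"(?:GET|HEAD) /wiki/(SolutionID-\d+)[ ?/]')


def count_solution_hits(log_lines):
    """Count requests per SolutionID across access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = SOLUTION_PATH.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Invented sample lines, only here to exercise the function.
sample_log = [
    '10.0.0.5 - - [01/Mar/2010:22:14:07 +0000] "GET /wiki/SolutionID-17 HTTP/1.1" 200 5120',
    '10.0.0.9 - - [01/Mar/2010:22:20:41 +0000] "GET /wiki/SolutionID-4 HTTP/1.1" 200 2048',
    '10.0.0.5 - - [01/Mar/2010:23:02:13 +0000] "GET /wiki/SolutionID-17?action=edit HTTP/1.1" 200 6144',
    '10.0.0.7 - - [01/Mar/2010:23:05:00 +0000] "GET /index.html HTTP/1.1" 200 512',
]

# Most-visited articles first: these are the candidates for a real fix.
for solution_id, visits in count_solution_hits(sample_log).most_common():
    print(solution_id, visits)
```

Feeding it a real log is just a matter of passing an open file object instead of the sample list.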

How's this really work?

If you fix something: You document it. Whether it was a manual fix (rm /stupidhugefilefillingthedisk) or whether you fixed it in code: You document it. Not just "I removed this for you." Not just "git commit -m ' added handler to remove /stupidhugefilefillingthedisk after nightly job has finished'". How did you know this was your problem? How does someone know they're on the right article to follow? What are the symptoms? How did you fix it? Do you think your fix was short-term and will have to be repeated by others or do you think it should be a one-time thing? Any information will help the future.
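The questions above map fairly naturally onto a small record. The following is only a sketch of what an article might capture. The class name, field names, and the SolutionID-<number> format are my own assumptions here, not the layout of the original wiki.

```python
from dataclasses import dataclass, field


@dataclass
class SolutionArticle:
    """Hypothetical shape of one SolutionID article."""
    solution_id: str
    symptoms: list = field(default_factory=list)  # what the reader will see
    identification: str = ""    # how you knew this was the problem
    fix: str = ""               # what you actually did
    expected_to_recur: bool = True  # quick-fix (True) or one-time fix (False)

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        missing = []
        if not self.symptoms:
            missing.append("symptoms")
        if not self.identification.strip():
            missing.append("identification")
        if not self.fix.strip():
            missing.append("fix")
        return missing


# A new employee can fill in the symptoms even before the fix is known.
article = SolutionArticle(
    solution_id="SolutionID-17",
    symptoms=["Disk-full alerts on /", "Nightly job fails to write output"],
    fix="rm /stupidhugefilefillingthedisk",
)
print(article.missing_sections())
```

An article with gaps is still useful. The point of a check like missing_sections() is to show the next visitor what is left to write.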

After you document it - correlate it with the issue/ticket. Link the URL, mention the unique-ID of the page, something so that another person can find out and clearly know what you did.

Require every issue/ticket to have such an ID linked to it. This necessarily sucks at first; however, the more documents you have, the more often you can simply search-and-link instead of authoring a new one. Also, you'll find that creating the document is extremely fast.
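If your ticket system allows hooks, the "no ID, no close" rule can be enforced rather than merely requested. This is a hedged sketch only: the ticket is modeled as a plain dict, the SolutionID-<number> format is assumed, and close_ticket and MissingSolutionID are hypothetical names rather than any real ticket system's API.

```python
import re

# Assumption: IDs appear in ticket comments as SolutionID-<number>.
SOLUTION_ID = re.compile(r"\bSolutionID-\d+\b")


class MissingSolutionID(Exception):
    """Raised when someone tries to close a ticket with no linked SolutionID."""


def linked_solution_ids(comments):
    """Return the distinct SolutionIDs mentioned in a ticket's comments."""
    found = set()
    for comment in comments:
        found.update(SOLUTION_ID.findall(comment))
    return sorted(found)


def close_ticket(ticket):
    """Close the ticket only if at least one SolutionID is linked."""
    ids = linked_solution_ids(ticket["comments"])
    if not ids:
        raise MissingSolutionID(
            f"Ticket {ticket['id']}: link or write a SolutionID before closing."
        )
    ticket["status"] = "closed"
    return ids


ticket = {
    "id": 1042,
    "status": "open",
    "comments": ["Customer reports disk full.", "Fixed per SolutionID-17."],
}
print(close_ticket(ticket), ticket["status"])
```

Raising an error instead of silently refusing gives the person closing the ticket a reminder of exactly what is missing.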

Set up a feedback loop to re-analyze the decisions that were made and use it as input for future improvements/fixes. How many things that were marked as a quickfix ever needed to be run again? How many things that were flagged as a permanent/code fix weren't? How often are your employees visiting particular SolutionIDs?

Open new issues (either on some type of schedule - or any time that a solution is not marked as permanent/bug killed) to improve/fix the most impactful issues.
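The feedback loop can start out very small. The sketch below assumes you can export one SolutionID per closed ticket and that you know which IDs were flagged as permanent fixes. The function name and inputs are hypothetical, and treating "referenced more than once" as "the permanent fix did not hold" is a deliberate simplification.

```python
from collections import Counter


def improvement_candidates(resolutions, permanent_ids, top_n=3):
    """Rank SolutionIDs for the improvements queue.

    resolutions: iterable of SolutionID strings, one per closed ticket.
    permanent_ids: set of IDs whose fix was flagged as permanent.
    Returns (most-used quick-fixes, permanent fixes that came back).
    """
    counts = Counter(resolutions)
    # Quick-fixes people keep re-running: the biggest time wasters.
    recurring = [
        (sid, n) for sid, n in counts.most_common() if sid not in permanent_ids
    ]
    # "Permanent" fixes that were needed again anyway (simplified check).
    came_back = sorted(sid for sid in permanent_ids if counts[sid] > 1)
    return recurring[:top_n], came_back


# Invented sample data: one SolutionID per closed ticket.
closed = [
    "SolutionID-17", "SolutionID-9", "SolutionID-17",
    "SolutionID-4", "SolutionID-9", "SolutionID-17",
]
quick_fixes, not_so_permanent = improvement_candidates(closed, {"SolutionID-9"})
print(quick_fixes)
print(not_so_permanent)
```

Each entry in either list is a reasonable candidate for a new issue in the improvements queue.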

So, document stuff and make others do it?

Yes, but document it with the intent of continuous improvement. Document it with the intention of some future person who doesn't know you needing to follow it. Document it as if you'll acquire amnesia and be required to fix it again yourself. Don't waste time fixing things if they are decidedly small and without ill effects. Do ensure the biggest time wasters are fixed urgently.

The Pay-off

New employees can be your greatest resource. They are not yet jaded by company politics and are usually more optimistic about the future than anyone else you have. Giving them a way to help on almost any issue (after all, they can write the "How do I identify this problem" portion of an article - even if they are unable to write the solution) gets them involved, and fast! It also gives them fantastic insight into past problems - as well as who to discuss similar problems with (hint: take a look at that wiki page's contributors!).

For your senior employees: Now they don't have to repeat the same story again. When they do interact with newer employees they can have more meaningful conversations.

For management, project managers, etc: What a great way to not waste time re-solving problems, while also knowing where the efforts of your team are best invested to get the most return-on-investment/time!

For your career: Never again forget how you fixed something. Be able to grow and go on to fix new things - instead of spending more time re-fixing the old. Make yourself valuable for what you are going to accomplish for your employer: Not for what you've already done.