“I wish we had dedicated project resources. I am so busy with operations that I just don’t have time for projects.”
“Why does every IT issue get escalated to my top network and security people?”
“I don’t care if you have enough time. I need this stuff done now.”
“It takes me more time to fill out a change request than to make the actual change.”
“We spend over 70% of our time doing operations, which only leaves me 10-15% to work on projects after I read my email.”
I love the provocative statement that Goldratt made (I paraphrase):
“Technology CAN (not does) provide value IF and only IF it diminishes a business constraint.”
Before you go off emailing me that technology has many other values, please reflect deeply on this statement. Please reflect deeply on what business value IT provides.
I love the notion of continuity that the word “diminishes” brings to the statement. In other words, the constraint must be continually diminished, as opposed to vanquished once and for all.
In order for the constraint to be continually diminished, the technology must operate without ceasing. Otherwise the constraint is re-introduced and the business is forced to deal with the issue once more, usually without any warning.
One could argue that not all applications in service deliver the best business value. But for the purposes of this discussion on IT operations, let us assume that everything we are running in our datacenters provides the business with some critical constraint-busting capability.
By doing everything we can to ensure that those systems continue to operate free of interruption, we are performing IT operations that support business operations.
The basic definition of Operations is “The act of harvesting value from resources”. More specific to IT, I believe that IT operations represent our collective approach (strategy) and tactics (tasks, instructions, and programs) designed to prevent outages and interruptions to the IT services that existing business operations depend on.
Since the whole point of business operations at large is to produce profit (or achieve the mission, for you non-profits) from its resources, when we perform IT operations successfully we are protecting business revenue generation. I like to call it “protecting revenue” for short.
You may notice that firefighting, support or outages did not appear in my definition or mission of operations.
This is very important.
When we are troubleshooting, or firefighting an outage, operations have ceased and we are now in recovery mode (attempting to recover operations). Even if the issue is service impairment rather than a full-blown IT blackout, we are attempting to recover from the situation and therefore are not in nominal operation. Both scenarios interrupt or affect business operations and can put revenue at risk in many ways.
So if you were really spending 70% of your time in operations, do you think there would be so much chaos?
In my next post I will talk more about defining and measuring operations and the value it can provide.
When asked to improve, what is IT’s obsession with the dream of total automation of everything?
Is it the ultimate exercise in black-and-white thinking? Given the reality of countless failed automation efforts marked by dead bodies, is there a better middle road?
The basis of my ensuing a posteriori logic trap becomes especially apparent when one begins to peel away the thin veneer that obscures the complete failure of IT at large to operate in the realm of the scientific method.
In my dictionary, automation merely consists of defined practices, values and expected outcomes committed to code. But I have found that in the crucial connective fascia between Project Management, Dev, QA and Ops, IT often lacks substantive specification and accurate documentation, which often yields non-deterministic, untraceable outcomes (we cannot show that actions x and y caused effect z).
Ok, this may be a pretty harsh statement, but it is one borne out by over 20 years in IT, working with hundreds of IT organizations that struggle to accurately articulate either the goal or the definition of IT operations. If nearly every implicit destination defined below those two map coordinates is off by even single-digit percentages, and we consider the length and breadth of the IT journey, not to mention the height of the weeds we can get caught up in, well, let’s just say chronic ITFAIL pain is both aft and on the horizon.
In the great book Why Smart Executives Fail: And What You Can Learn from Their Mistakes, the classic story of GM’s “must replace labor with robots” fail is told in brutal hindsight. The lessons are clear, but they are an order of magnitude harsher when one considers what history has since taught us. Not only was the money spent on those automation efforts lost to GM’s deficient consistency of practice (automation just accelerated the rate of failure), but they could have purchased Nissan, Toyota, Honda and maybe even Mazda with the money they squandered on robots.
Only after years of trial and error have manufacturers struck a balance between automation and human involvement. Shigeo Shingo, the first person to document the Toyota Production System and the author of many amazing books, including Kaizen and the Art of Creative Thinking - The Scientific Thinking Mechanism, formalized this approach by refining the Japanese concept of Jidoka, or Autonomation. Simply put, Autonomation is automation with human intelligence. This is the direction we need to explore in IT for our command and control systems.
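To make the idea a little more concrete, here is a minimal sketch in Python of what Autonomation might look like for a routine IT change. It is only an illustration under my own assumptions: the names Step, RunResult and run_with_jidoka are hypothetical, and the "actions" are stand-in lambdas rather than real infrastructure calls. Each step carries a defined practice and an expected outcome, and the run stops and hands over to a human at the first abnormal result instead of pressing on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One routine task: the defined practice plus its expected outcome."""
    name: str
    action: Callable[[], object]        # hypothetical stand-in for the real work
    expected: Callable[[object], bool]  # returns True when the outcome is normal


@dataclass
class RunResult:
    completed: List[str]
    halted_at: Optional[str] = None  # name of the step where a human must take over


def run_with_jidoka(steps: List[Step]) -> RunResult:
    """Run steps in order and stop the line at the first abnormal outcome."""
    result = RunResult(completed=[])
    for step in steps:
        outcome = step.action()
        if not step.expected(outcome):
            # Abnormality detected: do not continue automatically.
            result.halted_at = step.name
            break
        result.completed.append(step.name)
    return result


# Illustrative run: the patch step returns an unexpected outcome.
steps = [
    Step("drain node", lambda: "drained", lambda out: out == "drained"),
    Step("apply patch", lambda: "error", lambda out: out == "patched"),
    Step("restore node", lambda: "restored", lambda out: out == "restored"),
]
result = run_with_jidoka(steps)
print(result.completed, result.halted_at)
```

In this run the first step completes, the second halts the line, and the third never executes. The point of the design is that the expected outcome is written down next to the action, so a failure is traceable to a specific step and a person decides what happens next.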
To help define just where the intersection and division of labor should best occur, there are several Toyota Production System terms that are worth investigating as a path to improving your shop’s performance:
Muri - Overburden - Is every day an exercise in futility? Do the emails pile up, along with escalated issues, phone calls from execs, and standing daily or weekly outage conference calls? Is your IT organization behind or stuck on projects? How many of these precious business projects are missing their commitment dates, over budget, or under-resourced because your team is overwhelmed with unplanned firefighting and drive-bys?
Mura - Inconsistency - Are routine tasks and changes like roulette, with two out of ten ending in unexplained failure that consumes your brightest staff for hours or days? Is patching or upgrading a fearful event, marked by everyone knocking on wooden (or even wood-veneered) objects and the presence of a shaman or holy person to ward off evil fail spirits?
Muda - Waste - Are all of those operating expense dollars lost to firefighting, audit corrective action drive-bys, shadow IT projects, unauthorized changes and root cause analysis meetings that take weeks to recommend the same trifecta: we need more budget, more staff and more time to focus on proactive tasks?
Right now many IT organizations are looking at Muda, or waste, in order to drive down costs. I posit that understanding Muri and Mura would be a much more valuable use of time and ultimately will reduce waste and increase IT throughput.
Inevitably the solutions recommended by IT teams to these issues will involve automation or tools. This is not all bad, but the focus should be on building more deterministic ways of working for humans and considering where automation may help humans interact with their IT infrastructure more consistently.
This process of self-reflection, or Hansei, is important fuel for Kaizen (continuous improvement). It becomes essential to distill all of the undesirable symptoms of overburden, inconsistency and waste, and to understand the few root causes that drive them all.
Did you know that the “Just In Time” concept was pioneered by a group of Toyota employees? These Toyota team members were led by Taiichi Ohno on a trip to the US in the 1950s to visit US auto manufacturers. They journeyed to Michigan and walked through Ford plants, and were generally unimpressed by the high amounts of inventory the plants required to operate and the variance in labor output from day to day.
During the visit they stopped by a Piggly Wiggly grocery store and were amazed by its inventory replenishment system, which only requested new items when they were sold. From this focus on Kaizen and Hansei they developed what later became the famous Just-In-Time philosophy that has become a pillar of the Toyota Production System.
Merely improving is not enough in this economy. We are faced with the imperative of improving only the most important functions, so as to quickly improve execution and IT throughput. As we set out on our journeys and investigate other practices, let’s make sure we are attacking the true root causes of overburden and rework, not merely their undesirable effects.
I think that the Toyota Production System offers us many valuable insights into building better IT. I find the thinking behind the system to be more enlightening than the practices. I encourage you to view all TPS, Lean and “Best Practices” in this light. Often the answer to “why” is more important than the “what”, IMHO.
I will be writing more about the intersections of TPS and IT in the coming weeks. I will be focusing on universal principles that draw from Goldratt’s Theory of Constraints work, Steven Spear, Taiichi Ohno, Shigeo Shingo, Deming and the 10 years of research Gene Kim and I have done around IT high performance.