What We Learned about Information Technology Management
By Eser Kandogan, Paul P. Maglio, Eben M. Haber & John Bailey
Information technology (IT) customers now spend more money on people to manage systems than they spend for hardware and software. To understand why, we conducted ethnographic studies, observing a variety of IT service operations, examining collaboration and communication, social and technical complexity, standard and ad hoc practices, standards and processes, organizations, and communities of practice. We found that to cope with the complexity and idiosyncrasies of modern IT systems, administrators (1) coordinate activities among multiple specialists, systems, and organizations; and (2) build specialized tools, practices, and organizational structures custom fit to specific circumstances to support their work and improve productivity.
*The article is an excerpt from Kandogan, E., Maglio, P. P., Haber, E. M., & Bailey, J. (2012). “Taming Information Technology: Lessons from Studies of System Administrators.” New York: Oxford University Press. Used by permission.
On a hot July day, in a windowless conference room in a huge IT delivery center in Boulder, Colorado, ten system administrators who had flown in from all over the country worked together with many others on the phone and in instant messaging chat rooms. They were troubleshooting a stubborn, intermittent failure in a sales-support web application that was critical to an important customer. No one had any idea what was wrong. They had been there for several days already. They huddled together over their laptops, discussing theories, conjectures, and ideas, hoping they could discover the problem, fix it, and go home. Yet this continued for several more weeks, until the problem was fixed to the customer’s satisfaction. And we were there, watching this group tackle a critical situation, or crit-sit, an “all hands on deck” sort of process for solving a difficult customer problem.
After witnessing many scenes like this, it is no longer any surprise to us that information technology (IT) customers today spend more money for people to manage computer systems than they spend for the hardware and software alone. IT staff costs make up more than two thirds of the total cost of ownership of IT, and that share continues to grow every year,1 largely because of disproportionate gains in computer speed and capacity compared to labor and productivity. Moreover, modern IT systems are so complex that no single individual can possibly have a complete picture of how all the pieces fit together, not even those who manage them.2 This growing system complexity increases the total cost of ownership as well, with more components interacting in more ways, more configuration settings, and more things to go wrong. When things do go wrong, costs can add up to millions in lost revenue and make news headlines, and in the end people are often blamed.3
We started with some basic questions: What do the people who manage IT systems actually do? Where do they spend time? How can we help them to be more efficient and effective?
To find answers, we observed computer system administrators in their natural environments. Equipped with camcorders, cameras, tapes, computers, and notebooks, we spent a total of two months over five years watching the activities, processes, and practices of real system administrators at six different sites in the U.S. We were doing ethnography, though not the kind that describes a foreign culture in some remote part of the world. We were studying tribes of IT specialists: database administrators, web administrators, system security experts, storage designers, infrastructure architects, and system operators. Whatever their specific titles, we refer to them all as sysadmins. We were not cultural anthropologists, but ethnographic research was exactly what we needed to do. Using methods from ethnography and ethnomethodology, our goal was to develop a “thick description,” an understanding of practices, traditions, and ways of life within their social and historical context, by examining how tools, practices, processes, and organizations work and evolve.4
Our study of system administrators was inspired not only by the importance of information technology management and its associated high cost, but also by two other important trends. First, sysadmin work is increasingly done in the context of IT service delivery, in which IT management is done by a service provider for a client based on a formal business agreement.5 Service providers are very focused on understanding and improving practices to increase quality and reduce costs. Second, automation is increasingly used to reduce the human costs of IT management,6 but an understanding of the tasks to be automated is needed to know whether automation is feasible and then how to go about automating tasks effectively.7
We studied IT management at a large IT service provider, a large university computing center, and a large government data center. In all, we made 16 field visits, observing and interviewing more than 30 administrators, operators, team leads, and managers.
Our book, Taming Information Technology, chronicles our experiences, our stories, and ultimately what we learned.8 Our analyses are based on (1) how people work together and with technologies in grounding their understanding and in coordinating their actions and activities,9 (2) how constellations of people and technologies together comprise the cognitive systems capable of managing complex information technology,10 and (3) how the tools and techniques people develop evolve incrementally to create structures that support effective and productive work in a complex and continually changing environment.11
The stories described in the book include examples of planning, deployment, monitoring, and troubleshooting complex systems. They show social structure, historical context, interaction, collaboration, and communication among administrators. They expose the tremendous scale and complexity of modern IT systems, the practices that administrators adopt to cope, and the consequences when administrators fail to grasp how their systems really work. They demonstrate the use of various management tools — automated and manual, off-the-shelf and self-created. They show the proliferation of custom tools, which highlights the mismatch between many off-the-shelf tools and the needs of administrators. They provide evidence of how tools are shared and evolve over time. They describe both standard processes and ad hoc procedures, which many administrators develop and adapt over time to improve efficiency. They describe the organizations in which IT staff work, interactions and friction between groups, and informal roles and organizational structures people invented over time to get work done more effectively. They also describe broad communities and how system administrators work and collaborate beyond organizational boundaries. Throughout, we found people coping with complex systems, and in most cases they coped effectively. That is what people do after all. People are creative, they adapt to changes, build their own tools, discover ways to overcome problems, and organize themselves to be more effective.
Yet the ever-increasing human cost of IT systems management suggests that they may not be managing well enough. We think real productivity gains could be achieved if this ecosystem of people and technology were more broadly understood by designers, managers, and others with an interest in IT, and we hope our results will help. Here, we summarize just a few of the observations and lessons from the book.
The book tells eleven stories drawn from our observations of real sysadmins at work. These include the story of George, a web administrator who worked closely with many others in the course of debugging a new web server. This story demonstrates the importance of collaboration, communication, coordination, and situational awareness as different people brought together their expertise in an attempt to solve a difficult problem. It also highlights the need for trust and understanding; we show how it was not sufficient to find a solution to the problem. George had to trust his collaborators to develop an understanding of the problem and of its solution. We see similar issues of collaboration arising in the story of Dot, which shows the numerous steps and pitfalls involved in deploying a “simple” web application in an enterprise environment, highlighting not only the complexity involved, but also coping strategies used to integrate information from systems and other people.
In the story of the critical situation we described at the beginning, a large group of experts was brought together in a single room to solve an intermittent, unpredictable web application failure. We describe how an ad hoc team spent many weeks together, trying to understand the subtle interactions between system components, collecting and exchanging information and ideas, and reconfiguring different components. This story demonstrates the heights of technical complexity, in which a problem could be so subtle as to require months of effort to solve. It also shows many techniques people use to understand such problems, and provides some evidence as to why the human cost of IT management keeps increasing.
In another story, database administrators Christine and Mike considered data loss to be the worst possible disaster, so they and their group developed a number of practices to avoid this, including keeping a central repository of step-by-step instruction documents for important operations, rehearsing all significant changes on a series of increasingly realistic test systems, and working side-by-side at the most critical moments to ensure a second pair of eyes oversaw those steps.
In several stories, we see that vendor-provided tools are often insufficient given the complex and idiosyncratic nature of many IT systems, and so administrators use their own creativity to fill the gaps. For example, Shawn, an operating system administrator responsible for keeping 120 Unix servers up-to-date with appropriate patches, relied heavily on home-grown tools and methods for coordinating his team’s activities and interacting with the client. Diana and Mark, storage administrators at a large government facility, created numerous custom tools as part of managing a massive robotic data tape repository.
System administration is more than just the work of individuals: it requires groups to work closely together and with other groups to effectively manage complex systems. The story of Henry and Ryan, who work in the operations and architecture groups of a managed storage service organization, examines how the groups were organized internally and how their practices made for effective interaction with other groups and clients, showing the importance of organizational bridges, people who translate and transform information between groups. In the story of Joe and Aaron, security administrators at a large university data center, we see formal and informal collaboration between departments in the university, ad hoc collaboration between different universities facing a widespread security incident, and global communities that collectively maintain information sites and open-source tools.
What We Learned
Our observations describe highly complex, large-scale, and often idiosyncratic environments in which people perform lengthy and risky operations given dynamic requirements and dynamic configurations. System administrators cope through specialization, innovation in tools and practices, and standardization. But these coping strategies interact: (1) technical complexity leads to specialization, which demands communication and coordination between individuals and teams, as workers and organizations spend considerable time establishing common ground with each other and with their systems; (2) technical complexity, technical change, and idiosyncratic systems demand innovations from a variety of actors, with system administrators creating tools and practices optimized for their own environments, communities of practice developing broadly applicable tools and best practices, and organizations engaging in standardization; and (3) information technology management practices — tools, organizations, policies, standards — and information technologies co-evolve, driven by individual and organizational needs, incrementally adapting and institutionalizing those adaptations that enable effective management as technology and business change.
In the end, we argue that effective management lies in supporting improved communication and grounding between individuals and organizations, and in establishing an ecosystem in which evolution of tools, practices, and organizations can flourish, where local innovations can grow into community standards and where standards can be adapted to local needs. System administration depends on (1) collaboration among people, (2) adaptation of tools and practices, (3) orchestration of information and work across space and time, (4) communities of practice working together, and (5) automation that is appropriate to the human work of IT management.
The book tells stories of people, the system administrators — about how individuals, groups, organizations, and communities work to configure, troubleshoot, optimize, and protect the computer infrastructure on which modern society depends. These stories show that system administrators work in a socially and technically complex environment. These stories also show how individuals, organizations, and communities create, adopt, and evolve tools and practices over time in a never-ending quest to improve the effectiveness of the complex socio-technical systems of which they are part. System administrators are knowledge workers, critical actors within their complex social and technical systems. They cope as best as they can with the complexity they face, improving themselves and their environments given the resources at hand. More precisely, across different types of system administrators and different contexts of work, we found that (1) systems are complex, administration roles are specialized, and system administrators are always involved in a process of grounding to achieve a common understanding of the state of their systems; (2) IT systems are idiosyncratic combinations of many interacting components, meaning appropriate tools for managing whole systems are often not available and requiring sysadmins to innovate, creating their own tools, practices, and organizational structures to support grounding and improve productivity; and (3) automation is a critical tool for helping system administrators manage the ever-increasing complexity of IT systems, but system administrators are needed to ensure that IT systems achieve human ends.
About the Authors
Eser Kandogan is a research staff member at IBM Research-Almaden and manages a group conducting research on visual interfaces to data. He served as the general chair and program chair for the ACM CHIMIT symposium and was a member of the program committee for several conferences including ACM CHI, USENIX LISA, and IEEE Policy. He holds a B.Sc. degree in computer engineering and information sciences from Bilkent University, Turkey, and a Ph.D. degree from the University of Maryland, Computer Science Department. Dr. Kandogan has over 50 publications in areas such as human-computer interaction and information visualization.
Paul P. Maglio is a research staff member at IBM Research–Almaden, and a professor of technology management at the University of California, Merced. He holds a bachelor’s degree in computer science and engineering from MIT and a Ph.D. in cognitive science from the University of California, San Diego. One of the founders of the field of service science, Dr. Maglio serves on the editorial boards of the Journal of Service Research and of Service Science, and was lead editor of the Handbook of Service Science. He has published more than 100 papers in computer science, cognitive science, and service science.
Eben Haber is a research staff member at IBM Research-Almaden, where he has worked on topics including IT System Administration (including studies of sysadmins, developing prototype administration tools, and designing new features for middleware management products), as well as research on end-user programming and information visualization. He holds an A.B. in computer science/physics from Dartmouth College, and an M.S. and Ph.D. in computer science from the University of Wisconsin-Madison. As the only person on earth so named, a wealth of additional information about him can be found using any web search engine.
John Bailey is a Director of Product Design at CA Technologies, where he creates leading-edge product user experiences for management of information technology. Previously, Dr. Bailey was a research scientist at IBM Research-Almaden, working on service systems and specializing in human factors in information technology service engagement and delivery. Prior to that, he was Lead User Experience Architect for IBM WebSphere Application Server and manager for user-centered design. He holds a Ph.D. in Human Factors Psychology from the University of Central Florida, and has published in virtual reality, human-computer interaction, automation, simulation and training, systems administration, and service science.
1. Bozman, J. S., & Perry, R. (2010). The business value of large-scale server consolidation. IDC Whitepaper. Retrieved from http://www-05.ibm.com/innovation/uk/leadership/meter/capabilities_support/documents/IDC_White_paper.pdf
2. Barrett, R. (2004, June). People and policies: Transforming the human-computer partnership. In Proceedings of the 5th IEEE International Workshop on Policies for Distributed Systems and Networks (Policy 2004) (pp. 111-114).
3. Stone, B. (2008, July 6). As web traffic grows crashes take bigger toll. New York Times. Retrieved from http://www.nytimes.com/2008/07/06/technology/06outage.html
4. Geertz, C. (1973). Thick description: Toward an interpretive theory of culture. In The interpretation of cultures: Selected essays (pp. 3-30). New York, NY: Basic Books.
5. BusinessWeek (2006, January 30). The future of outsourcing. BusinessWeek. Retrieved from http://www.businessweek.com/magazine/content/06_05/b3969401.htm
6. IBM. (2001). Autonomic Computing: IBM’s Perspective on the State of Information Technology. IBM. Retrieved from http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf
7. Brown, A. B., & Hellerstein, J. L. (2005). Reducing the cost of IT operations – Is automation always the answer? In Proceedings of the 10th conference on Hot Topics in Operating Systems, 10, USENIX Association.
8. Kandogan, E., Maglio, P. P., Haber, E. M., & Bailey, J. (2012). Taming Information Technology: Lessons from Studies of System Administrators. New York: Oxford University Press.
9. Clark, H. H. (1996). Using language. Cambridge, England: Cambridge University Press.
10. Hutchins, E. (1996). Cognition in the wild. Cambridge, MA: MIT Press.
11. Arthur, W. B. (2009). The nature of technology: What it is and how it evolves. Free Press.