IT Infrastructure Update Tool Re-design

Lifecycle Manager

As an enterprise UX designer at an IT infrastructure company, I often work with stakeholders who push for velocity but bypass understanding user needs.

 

Lifecycle Manager (LCM) is a tool that many Nutanix customers use for infrastructure updates, but it has drawn many complaints about its instability and legacy design. I initiated this re-design and worked with the PM on it for 6 months. After its release, the team received great customer feedback.

Company/ 

Nutanix

 

Role / 

Sole UX/UI Designer

 

Collaboration / 

PM and Engineers

 

Year / 

Oct 2018 - Mar 2019

Context

What is LCM?

Companies manage their data in data centers that contain a certain number of servers (hosts). All these servers need to be maintained and updated by corporate IT admins every once in a while. Lifecycle Manager (LCM) is a tool that IT admins use to manage and update their environment.

Boot Device (SATA DOM)

SATA disk on a module, the hypervisor boot drive for Nutanix platforms up 

Disks

A disk server allows users to access specific data, or to store and maintain data on the server.

NIC

A Network Interface Card is a computer hardware component that allows a computer to connect to a network.

BMC

Baseboard management controller, the micro-controller that manages the motherboard

HBA

Host bus adapter, a device that manages communication between storage media and other system components

BIOS

Basic input/output system, the firmware that initializes the motherboard and runtime services on startup

What updates does LCM support?

LCM started as a simple firmware update feature that sat within the settings of Nutanix’s infrastructure management software, Prism. LCM 2.1 only supports updates for several firmware components.

 

Since 2018, the engineering team has decoupled LCM from Prism and given it its own independent release cycle. By doing this, the team has taken its first step toward making LCM a full-fledged enterprise update tool that supports both firmware and software upgrades.

My task was to come up with a scalable design solution that supports not only firmware updates, but also the software updates that will be added in LCM 2.2.

Problem Statement

Defining Design Problems for LCM 2.1

When I first started, I was given no major change requests, only a few improvement tickets, such as addressing customer feedback that there was too much white space and that dependency information was not clear. These tickets seemed like simple requests until I had worked on a few and realized that, given the current design, it was quite difficult to make the requested changes.

Click

Lands on second page

HBA updates depend on certain models of the Disks component. This dependency information is displayed in a tooltip, so users can't see it in the same view.

Customers complain about having too much white space and would like to view more information at a time.

After several explorations, none of which satisfied me, I realized what made it difficult: the information is architected on a component-version basis. Viewing information at the host level, which is hidden in a second layer, is a tedious process for the user. The fundamental question arises:

"How do users deploy updates?"

When a BIOS update is available, do they update BIOS across their whole environment at once (the current component-based view), or do they update BIOS host by host (a host-based view)?
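
The difference between the two views can be thought of as two groupings of the same update records. The following is a minimal sketch only: the record layout, host names, and version numbers are hypothetical and are not taken from LCM itself.

```python
from collections import defaultdict

# Hypothetical update records: (host, component, installed, available).
# These values are illustrative and do not come from a real LCM inventory.
UPDATES = [
    ("host-1", "BIOS", "1.0", "1.1"),
    ("host-1", "HBA", "2.0", "2.1"),
    ("host-2", "BIOS", "1.0", "1.1"),
]


def group_by(records, key_index, value_index):
    """Group one field of each record under another field of the same record."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key_index]].append(record[value_index])
    return dict(grouped)


# Component-based view: for each component, which hosts can be updated.
component_view = group_by(UPDATES, key_index=1, value_index=0)

# Host-based view: for each host, which components can be updated.
host_view = group_by(UPDATES, key_index=0, value_index=1)
```

Both views hold the same information. The design question is which grouping should be the first layer a user sees.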
User Study

Understanding the process and needs

A user study is the best source of guidance in such an ambiguous situation. I created a few mockups showing how updating differs between the host-based view and the component-based view. I printed the two views as paper prototypes and used them as prompts in the .Next UX lab.

Prototype 1: component-based view

Users are notified when a new version is available, and click on a component to select the hosts that they want to update.

Prototype 2: host-based view

Displays hosts, and the user clicks into each host to select which components need to be updated.

Feedback from 12 participants showed that update behaviors vary. 8 people update their environment at the host level. This includes updating everything on a host at once, and updating one component on a host and then moving on to the next component until everything on that host is updated. These admins tend to have a large environment to manage.

 

Host view is considered a must-have for the following reasons:

Identify Issues

The biggest concern for IT admins is that an update goes wrong and causes downtime. To avoid this, they tend to update bit by bit, so that if there’s an issue, they can quickly pinpoint the problem.

Stay in Maintenance Window

Many companies limit how long each maintenance window can last. Since updating a host with LCM takes a long time, admins tend not to update everything at once in a large environment.

Estimate Update Time

Since LCM can’t provide a time estimate, users update one component on a host to see how long it takes, and then plan their maintenance window accordingly.
Design

Design for Scale

A few other participants have different update behaviors. Since it’s impossible to cater to everyone, I decided to prioritize the design for host view, which I considered the primary use case. However, I did not want to dismiss secondary user needs either. Iterations led to this combined final view, which addresses user needs and makes several other improvements.

Quick Select

The final design highlights the host-based information structure. It also allows any user to quickly select what they want to update, regardless of their update behavior.

Dependency Taken Care Of

The new design consolidates everything on one page, which solves the problem of users not having a clear idea of which components need to be updated together when there’s an auto-selected dependency.
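
As an illustration of what auto-selecting a dependency means, here is a minimal sketch. It assumes a hypothetical dependency map in which an HBA update requires a Disks update, mirroring the example earlier in this case study. The map and the function name are illustrative and are not LCM's actual logic.

```python
# Hypothetical dependency map: updating the key also requires updating
# every item in its list. Illustrative only, not LCM's real rules.
DEPENDS_ON = {"HBA": ["Disks"]}


def resolve_selection(selected, depends_on):
    """Return the user's selection plus every dependency it pulls in."""
    resolved = []
    pending = list(selected)
    while pending:
        item = pending.pop(0)
        if item in resolved:
            continue  # already selected, by the user or by a dependency
        resolved.append(item)
        pending.extend(depends_on.get(item, []))
    return resolved


# Selecting HBA pulls in Disks; selecting BIOS pulls in nothing else.
hba_selection = resolve_selection(["HBA"], DEPENDS_ON)
bios_selection = resolve_selection(["BIOS"], DEPENDS_ON)
```

Showing the resolved selection on the same page as the user's own choice is what makes the dependency visible without a tooltip.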

Before and After

Before, software and firmware updates were combined in one view with no information architecture. This creates a learning curve for new users, who have to work out what each component is. The same applies to firmware updates. Certain components, such as disks, can have multiple models in a host, but there was no visual hierarchy showing that some disks belong to the same model and should be updated to the same version.

Grouped Entities

Rather than exposing each disk and allowing users to update each one to a different version, I grouped disks that belong to the same model, which encourages users to keep the version consistent within a model.
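
The grouping idea can be sketched as follows. This is a hedged example under stated assumptions: the disk records, slot numbers, model names, and versions are all hypothetical, and the two helper functions are illustrative rather than part of LCM.

```python
from collections import defaultdict

# Hypothetical disks in one host. Slots, models, and versions are made up.
DISKS = [
    {"slot": 1, "model": "model-A", "version": "1.0"},
    {"slot": 2, "model": "model-A", "version": "1.0"},
    {"slot": 3, "model": "model-B", "version": "3.2"},
]


def group_disks_by_model(disks):
    """Map each disk model to the slots that hold a disk of that model."""
    groups = defaultdict(list)
    for disk in disks:
        groups[disk["model"]].append(disk["slot"])
    return dict(groups)


def update_model(disks, model, new_version):
    """Return a copy of the disks with one whole model moved to new_version."""
    return [
        dict(disk, version=new_version) if disk["model"] == model else disk
        for disk in disks
    ]


groups = group_disks_by_model(DISKS)
updated = update_model(DISKS, "model-A", "1.1")
```

Because the update is applied per model rather than per disk, disks of the same model cannot drift to different versions.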

Task Re-Design

The task view is re-designed to surface the overall progress, what is going on during an update operation, and, if things go wrong, where they went wrong.

Design for Clarity

Tile view looks modern, but it is not the best for readability of data-rich content. Also, based on the user study and conversations with the PM, users want more information about an update so that they can understand it fully and avoid risk.

Inventory & LCM History

Users often struggled to quickly tell which version is installed in an environment during support calls. Inventory view allows users to identify the installed version.

LCM History gives transparency into which update was deployed at what time in a multi-tenant environment.

Business Impact & Feedback

LCM 2.2 was released in June. Customers had complained that upgrade quality had been going down over the years, but 2.2 is a turnaround thanks to the engineering effort and the re-design. Support calls have dropped 20% since the release.

June 11, 2019

Great job with LCM 2.2! I wanted to provide some positive feedback from Exelons’ Dave Starcher... He’s since upgraded to 2.2 and has been very complimentary on the reliability while running inventory.

 

He also complimented the other feature enhancements, specifically the ability to upgrade a single host at once. The actual quote was “I feel they took all the things I said I wanted and put it in”. This is a great example of engineering listening to the customer.

June 14, 2019

Rating: 5/5

Reason: I got what I was looking for.

I wanted to say thank you for the new LCM 2.2 - I like the new look and feel - with a small button I can even see the versions of NIC drivers and versions/quantity of disks easily! I think it's visually better looking interface than before... I know many people have been working hard on this - it's appreciated!

June 10, 2019

2.2 is very impressive and every customer/prospect that has seen it on slides or when I show them live has had nothing but great things to say about the UI/UX. This new UX plus the increased backend LCM fixes will go a long way.

Future Strategy

Although we got great feedback after release, we also look for things that we can iterate on further. Design is never done, and there are always things that can be improved. One issue is that the support team often gets phone calls with questions such as what the recommended update order is. Although there are already KB articles, they may be hard to discover. Also, reading a long KB article to find a simple answer might be intimidating.

Onboarding & Tooltips

To solve this, we want to give new users an onboarding experience that provides a brief product walkthrough.

In-product Instruction

Another future strategy is to implement in-product instructions in the next major release, so that users do not have to switch context.

What I've Learned

  • Take initiative. In an engineering-driven company, design requests are often oversimplified and miss the real problem. I’ve learned to take the initiative to start discussions, set goals, and define the UX roadmap.

  • Use research to back up design. Internal stakeholders are wary of change, even when a product is not well designed. Being in contact with customers and knowing their real needs helps me explain why we need to change and how the change will be an improvement.

 

  • Keep asking why; otherwise the root cause of a problem might not surface in a user study. People might not tell you what they need directly. In user research, the first reaction many people had was that the two views essentially reach the same goal. While this is true, it’s important to follow up, ask which they prefer, and understand why.

— THANKS FOR WATCHING —

© 2020 by Alison Cheng