IT Infrastructure Update Tool Re-design
As an enterprise UX designer at an IT infrastructure company, I often work with stakeholders who push for velocity but skip understanding user needs.
Lifecycle Manager (LCM) is a tool that many Nutanix customers use for infrastructure updates, but it has drawn many complaints about its instability and legacy design. I initiated this re-design and worked with the PM on the project for 6 months. After its release, the team received great customer feedback.
Sole UX/UI Designer
PM and Engineers
Oct 2018 - Mar 2019
What is LCM?
Companies manage their data in data centers that contain a number of servers (hosts). All of these servers need to be maintained and updated by corporate IT admins from time to time. Lifecycle Manager (LCM) is a tool that IT admins use to manage and update their environment.
Boot Device (SATA DOM)
SATA disk on a module, the hypervisor boot drive for Nutanix platforms up
A disk server allows users to access specific data, or to store and maintain data on the server
A Network Interface Card is a computer hardware component that allows a computer to connect to a network.
Baseboard management controller (BMC), the microcontroller that manages the motherboard
Host bus adapter (HBA), a device that manages communication between storage media and other system components
Basic input/output system (BIOS), the firmware that initializes the motherboard and runtime services on startup
What updates does LCM support?
LCM started as a simple firmware update feature that sat within the settings of Nutanix's infrastructure management software, Prism. LCM 2.1 supports only a few firmware component updates.
Since 2018, the engineering team has decoupled LCM from Prism and given it its own independent release cycle. By doing this, the team took its first step toward making LCM a full-fledged enterprise update tool that supports both firmware and software upgrades.
My task was to devise a scalable design solution that supports not only firmware updates but also the software updates to be added in LCM 2.2.
Defining Design Problems for LCM 2.1
When I first started, I was given no major change requests, only a few improvement tickets, such as addressing customer feedback about too much white space and unclear dependency information. These tickets seemed like simple requests until I worked on a few of them and found that, given the current design, the requested changes were quite difficult to make.
Lands on second page
HBA updates depend on certain models of the Disk component. This dependency information is displayed in a tooltip, so users can't see it in the same view.
Customers complain about too much white space and would like to view more information at a time.
After several explorations, none of which satisfied me, I realized that the difficulty came from the information being structured by component version. Viewing information at the host level, which is hidden in a second layer, is a tedious process for users. The fundamental question arose:
"How do users deploy updates?"
When a BIOS update is available, do they update BIOS across their whole environment at once (the current component-based view), or do they update BIOS host by host (a host-based view)?
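To make the two views concrete, here is a minimal sketch of how the same set of available updates could be organized either way. It is only an illustration, not LCM's actual data model: the host names, version strings, and the `group_updates` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (host, component, installed_version, available_version).
# All names and versions are invented for illustration.
UPDATES = [
    ("host-1", "BIOS", "1.0", "1.2"),
    ("host-1", "BMC", "3.1", "3.4"),
    ("host-2", "BIOS", "1.0", "1.2"),
    ("host-2", "HBA", "7.0", "7.1"),
]


def group_updates(rows, key_index, label_index):
    """Group rows under one column and list another column beneath it."""
    view = defaultdict(list)
    for row in rows:
        view[row[key_index]].append(row[label_index])
    return dict(view)


# Component-based view: pick a component first, then choose hosts.
component_view = group_updates(UPDATES, key_index=1, label_index=0)

# Host-based view: pick a host first, then choose components.
host_view = group_updates(UPDATES, key_index=0, label_index=1)

print(component_view)
print(host_view)
```

Both views hold the same information. What changes is which question gets answered at the top level: "which hosts need this component?" or "what needs updating on this host?".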
Understanding the process and needs
A user study is the best source of guidance in such an ambiguous state. I created a few mockups showing how updating differs between the host-based view and the component-based view. I printed the two views as paper prototypes and used them as prompts in the .Next UX lab.
Prototype 1: component-based view
Users are notified when a new version is available, and click on a component to select the hosts they want to update.
Prototype 2: host-based view
Displays hosts, and the user clicks into each host to select which components need to be updated.
Feedback from 12 participants showed that update behaviors vary. Eight people update their environment at the host level. This includes updating everything on a host at once, as well as updating one component on a host and then moving on to the next component until everything on that host is updated. These admins tend to have a large environment to manage.
Host view is considered a must-have for the following reasons:
The biggest concern for IT admins is that an update goes wrong and causes downtime. To avoid this, they tend to update bit by bit, so that if there's an issue, they can quickly pinpoint the problem.
Stay in Maintenance Window
Many companies limit how long each maintenance window can last. Since updating a host with LCM takes a long time, admins tend not to update everything at once in a large environment.
Estimate Update Time
Since LCM can't provide a time estimate, users update one component on a host to see how long it takes, and then plan their maintenance window accordingly.
Design for Scale
A few other people have different update behaviors. Since it's impossible to cater to everyone, I decided to prioritize the design for the host view, which I considered the primary use case. However, I did not want to dismiss secondary user needs. Iterations led to this combined final view, which addresses user needs and makes several other improvements.
The final design highlights the host-based information structure. It also allows any user to quickly select what they want to update, regardless of their update behavior.
Dependency Taken Care Of
The new design consolidates everything on one page, which solves the problem of not knowing which components need to be updated together when a dependency is auto-selected.
Before and After
Before, software and firmware updates were combined in one view, with no information architecture. This created a learning curve for new users trying to understand what each component is. The same applied to firmware updates. Certain components, such as disks, can have multiple models in a host, but there was no visual hierarchy showing that some disks belong to the same model and should be updated to the same version.
Rather than exposing each disk and allowing users to update each disk to a different version, I grouped disks that belong to the same model, which encourages users to keep the version consistent within a model.
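The grouping idea can be sketched in a few lines. This is an illustration under assumptions, not how LCM is implemented: the slot names, model names, versions, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical disks on one host: (slot, model, installed_version).
DISKS = [
    ("slot-1", "model-A", "1.0"),
    ("slot-2", "model-A", "1.0"),
    ("slot-3", "model-B", "2.3"),
]


def group_disks_by_model(disks):
    """Return {model: [slots]} so the UI can show one row per model."""
    groups = defaultdict(list)
    for slot, model, _version in disks:
        groups[model].append(slot)
    return dict(groups)


def plan_update(disks, target_versions):
    """Apply one target version per model to every disk of that model."""
    return {
        slot: target_versions[model]
        for slot, model, _version in disks
        if model in target_versions
    }


groups = group_disks_by_model(DISKS)
plan = plan_update(DISKS, {"model-A": "1.1"})

print(groups)
print(plan)
```

Because the target version is chosen per model rather than per disk, two disks of the same model cannot end up on different versions in the same plan.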
Task was re-designed to surface the overall progress, what is happening during an update operation, and, if things go wrong, where they went wrong.
Design for Clarity
Tile view looks modern, but it is not the most readable format for data-rich content. Also, based on the user study and conversations with the PM, users want more information about an update so they can understand it fully and avoid risk.
Inventory & LCM History
Users often struggled to quickly tell which version is installed in their environment during support calls. The Inventory view allows users to identify installed versions.
LCM History gives transparency into which updates were deployed at a given time in a multi-tenant environment.
Business Impact & Feedback
LCM 2.2 was released in June. While customers had complained that upgrade quality had been declining over the years, 2.2 marked a turnaround thanks to the engineering effort and the re-design. Support calls have dropped 20% since the release.
June 11, 2019
Great job with LCM 2.2! I wanted to provide some positive feedback from Exelons’ Dave Starcher... He’s since upgraded to 2.2 and has been very complimentary on the reliability while running inventory.
He also complimented the other feature enhancements, specifically the ability to upgrade a single host at once. The actual quote was “I feel they took all the things I said I wanted and put it in”. This is a great example of engineering listening to the customer.
June 14, 2019
Reason: I got what I was looking for.
I wanted to say thank you for the new LCM 2.2 - I like the new look and feel - with a small button I can even see the versions of NIC drivers and versions/quantity of disks easily! I think it's visually better looking interface than before... I know many people have been working hard on this - it's appreciated!
June 10, 2019
2.2 is very impressive and every customer/prospect that has seen it on slides or when I show them live has had nothing but great things to say about the UI/UX. This new UX plus the increased backend LCM fixes will go a long way.
Although we got great feedback after release, we also look for things to iterate on. Design is never done, and there are always things that can be improved. One issue is that the support team often gets phone calls with questions such as what the recommended update order is. Although there are already KB articles, they may have discoverability issues. Also, reading a long KB article to find a simple answer might be intimidating.
Onboarding & Tooltips
To solve this, we want new users to have an onboarding experience that provides a brief product walkthrough.
Another future strategy is to implement in-product instructions in the next major release, so that users don't have to switch context.
What I've Learned
Take initiative. In an engineering-driven company, design requests are often oversimplified without recognizing the real problem. I've learned to take the initiative to start discussions, set goals, and define a UX roadmap.
Use research to back up design. Internal stakeholders are wary of change, even when a product is not well designed. Having contact with customers and knowing their real needs helps me provide a rationale for why we need to change and how the change will be an improvement.
Keep asking why; otherwise the root cause of a problem might not surface in a user study. People might not tell you what they need directly. In user research, the first reaction many people had was that the two views essentially reach the same goal. While this is true, it's important to follow up, ask for their preference, and understand why.