A growing number of research areas now depend on high-performance computing (HPC), which requires both scalable computing and high-performance storage. Despite the rapid growth in HPC, storage remains a bottleneck to scientific discovery. Price is commonly the primary factor when purchasing HPC storage, but an HPC storage system must also meet several other key requirements.
Reliability – The importance of file system reliability cannot be overstated. Uptime is key to meeting project timetables, reducing staff frustration, and maintaining research continuity.
Scalability – Storage must scale in capacity, performance, and data protection.
Performance – The performance level must meet researcher needs. Some file systems scale well yet still fall short of the performance that computational research demands.
If these needs go unmet, the money saved upfront ends up costing research in the long term.
How Storage is Growing
Data storage needs in the life sciences continue to grow in capacity, performance, and reliability. An inability to provide consistent uptime in the face of these needs threatens the success of important research experiments.
In most cases, high maintenance requirements and regular service interruptions continue to get in the way of research progress. In some cases, system downtime and poor performance don’t just interfere with project timetables; they can actually delay time to a cure.
[Figure: NIH Biowulf cluster active users per month. In the past 5 years, monthly users have nearly tripled.]
Pain Points in HPC Storage
Over the years, users came to accept that HPC storage deployments were notoriously hard to manage. Organizations had to devote considerable resources to employing people who could master the intricacies of operating these complicated storage systems. But that’s not a scalable model.
We can no longer assume that HPC data center managers will be ready – or able – to expend time, money, and staff to buy and maintain clunky, complex HPC storage systems.
Change is needed in the HPC storage industry. Researchers are using HPC in a growing number of disciplines, yet storage downtime remains a key issue. Consider these findings from a Hyperion survey of data managers commissioned by Panasas:
Nearly 50% of the respondents experienced storage system failures once a month, with users coming to expect downtime as the norm in HPC storage.
After a system failure, 40% of HPC sites typically require more than two days to restore their storage system to full functionality.
More than 75% of respondents experienced reduced productivity in the past year due to storage-related issues, and 12% of sites experienced this more than 10 times in the past 12 months.
Some outages last as long as a week. A single day of downtime can cost anywhere from $100,000 to more than $1 million.
The most common challenges for HPC storage operations are recruiting and hiring qualified staff, followed by the time and cost needed to tune and optimize the storage systems.
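To put the downtime figures above in perspective, the per-day cost range can be multiplied out over the length of an outage. The following is a minimal sketch of that arithmetic, not a costing model: the function name downtime_cost_range and the sample outage lengths are assumptions made for illustration, and only the $100,000 and $1 million per-day figures come from the survey findings.

```python
# Illustrative arithmetic only. The per-day figures come from the survey
# findings quoted above; the function name and the sample outage lengths
# are assumptions made for this sketch.

def downtime_cost_range(days, low_per_day=100_000, high_per_day=1_000_000):
    """Return (low, high) cost in dollars for an outage lasting `days` days.

    The high figure is a floor rather than a ceiling, because the source
    says a single day can cost "more than $1 million".
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return days * low_per_day, days * high_per_day


for days in (1, 2, 7):
    low, high = downtime_cost_range(days)
    print(f"{days}-day outage: ${low:,} to more than ${high:,}")
```

Under these assumptions, a week-long outage works out to somewhere between $700,000 and more than $7 million, which is why reliability can outweigh the purchase price when comparing storage systems.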