As enterprises continue to stockpile massive amounts of information generated by people, businesses, vehicles, and a virtually endless list of other sources, many are wondering where they can store all of that data accessibly, safely, securely, and cost-effectively.
The data storage business has changed significantly over the past five years, and that transformation continues to broaden. The big difference today is that storage, once dominated by hardware concerns such as solid-state drives, faster read/write speeds, and capacity expansion, is now driven by the cloud and other software breakthroughs.
"For most organizations, storage is more about software, including software-defined storage, software managing virtualization, and integrating AI and ML to improve storage optimization,” said Scott Golden a managing director in the enterprise data and analytics practice at global business and technology consulting firm Protiviti.
Here's a quick rundown of five promising storage technologies that can now, or at some point in the foreseeable future, help enterprises cope with growing data storage needs.
1. Data lakes
When it comes to handling and getting value from large data sets, most customers still start with data lakes, then layer cloud services and software solutions on top to extract more value from them, Golden said. "Data lakes, like Azure Data Lake Storage and Amazon’s S3, provide the ability to gather large volumes of structured, semi-structured, and unstructured data and store them in blobs (binary large objects) or Parquet files for easy retrieval."
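To make that concrete, here is a minimal sketch of landing semi-structured records in an S3-backed lake as a Parquet file with pandas. The bucket and path are hypothetical placeholders, and the snippet assumes the pyarrow and s3fs packages are installed and AWS credentials are configured:

    import pandas as pd

    # Semi-structured records, e.g. parsed from JSON event logs
    records = [
        {"device_id": "a1", "ts": "2021-01-04T09:00:00", "temp_c": 21.5},
        {"device_id": "b2", "ts": "2021-01-04T09:00:05", "temp_c": 19.8},
    ]
    df = pd.DataFrame(records)

    # pandas writes Parquet via pyarrow; s3fs resolves the s3:// URL.
    # The bucket "example-data-lake" is a placeholder, not a real endpoint.
    df.to_parquet("s3://example-data-lake/raw/sensors/2021-01-04.parquet",
                  index=False)

Columnar Parquet files like this can later be scanned in place by query engines, which is what makes a lake cheap to fill and easy to retrieve from.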
2. Data virtualization
Data virtualization allows users to query data across many systems without being forced to copy and replicate it. It can also make analytics simpler, timelier, and more accurate, since users are always querying the latest data at its source. "This means that the data only needs to be stored once, and different views of the data [can be presented] for transactions, analytics, etcetera ... versus copying and restructuring the data for each use," explained David Linthicum, chief cloud strategy officer at business and technology advisor Deloitte Consulting.
Data virtualization has been around for some time, but with increasing data usage, complexity, and redundancy, the approach is gaining traction. On the downside, data virtualization can drag down performance if the abstractions, or data mappings, are too complex and require extra processing, Linthicum noted. There is also a steeper learning curve for developers, which often means more training.
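The idea is easier to see in code. The sketch below uses purely hypothetical classes (no real virtualization product works this way verbatim) to show one logical view resolving a query against two live sources at read time, with no copies made:

    class Source:
        """Stand-in for a live system: a database, a data lake, a SaaS API."""
        def __init__(self, rows):
            self.rows = rows  # a real connector would query the remote system

        def scan(self, predicate):
            return [row for row in self.rows if predicate(row)]

    class VirtualView:
        """One logical view over many sources; data stays at its source."""
        def __init__(self, *sources):
            self.sources = sources

        def query(self, predicate):
            # Fan out to every source and merge live results. Consumers
            # always see current data; the cost is extra processing on each
            # query, which is the performance drag Linthicum describes.
            results = []
            for src in self.sources:
                results.extend(src.scan(predicate))
            return results

    orders_db = Source([{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}])
    lake = Source([{"id": 3, "region": "EU"}])
    view = VirtualView(orders_db, lake)
    print(view.query(lambda row: row["region"] == "EU"))  # rows 1 and 3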
3. Hyper-converged storage
While not exactly a cutting-edge technology, hyper-converged storage is also being adopted by a growing number of organizations. The technology typically arrives as a component within a hyper-converged infrastructure in which storage is combined with computing and networking in a single system, explained Yan Huang, an assistant professor of business technologies at Carnegie Mellon University's Tepper School of Business.
Huang noted that hyper-converged storage streamlines and simplifies data storage, as well as the processing of the stored data. "It also allows independently scaling computing and storage capacity in a disaggregated way," she said. Another big plus is that enterprises can create a hyper-converged storage solution using the increasingly popular NVMe over Fabrics (NVMe-oF) network protocol. "Due to the pandemic, remote working became the new normal," Huang said. "As some organizations make part of their workforce permanently remote, hyper-converged storage is attractive because it is well-suited for remote work."
4. Computational storage
An early-stage technology, computational storage combines storage and processing, allowing applications to run directly on the storage media. "Computational storage embeds low-power CPUs and ASICs onto the SSD, lowering data access latency by removing the need to move data," said Nick Heudecker, senior director of strategy for technology services provider Cribl.
Computational storage can benefit virtually any data-intensive use case. Observability data sources, such as logs, metrics, traces, and events, dwarf other data sources in most companies, Heudecker noted. Today, searching and processing such data is a challenge even at small volumes. "It's easy to see applications for computational storage in observability, where complex searches are pushed directly to the SSD, lowering latency while also improving performance and carbon efficiency," he observed.
The technology's main drawback is that applications must be rewritten to take advantage of the new model. "It will take time and, before that happens, the space has to mature," Heudecker said. Additionally, the technology is currently dominated by small startups, and standards haven’t emerged, making it difficult to move past early proofs of concept. "If organizations want to get involved, they can follow the work of the Storage Networking Industry Association’s Computational Storage Technical Working Group to monitor the development of standards," he suggested.
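To illustrate why applications need rewriting, here is a toy sketch contrasting the two models. The ComputationalSSD class and its push_filter method are invented for illustration; as noted above, no standard device API exists yet:

    class ComputationalSSD:
        """Toy stand-in for a drive with an embedded low-power processor."""
        def __init__(self, lines):
            self._lines = lines

        def read_all(self):
            # Conventional model: every record crosses the host bus.
            return list(self._lines)

        def push_filter(self, predicate):
            # Computational-storage model: the predicate runs "on the device"
            # and only matching records are returned to the host.
            return [line for line in self._lines if predicate(line)]

    drive = ComputationalSSD(
        ["GET /health 200", "GET /cart 500", "POST /pay 200"])

    # Host-side search: read everything, then filter on the CPU.
    errors_host = [l for l in drive.read_all() if " 500" in l]

    # On-device search: ship the filter to the drive instead of moving data.
    errors_device = drive.push_filter(lambda line: " 500" in line)
    assert errors_host == errors_device

The application logic is identical; what changes is where the filter executes, and expressing that relocation is exactly what existing code was never written to do.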
5. DNA data storage
Farthest out on the time horizon, yet a potentially game-changing technology, is DNA-based data storage. Synthetic DNA promises unprecedented data storage density. A single gram of DNA can store well over 200PB of data. And that data is durable. "When stored in appropriate conditions, DNA can easily last for 500 years," Heudecker stated.
In DNA data storage, digital bits (0s and 1s) are translated into nucleobase codes and then converted into synthetic DNA; no material from a living organism is involved. The DNA is then stored. "If you need to replicate it, you can do this cheaply and easily with PCR (polymerase chain reaction), making millions of copies of data," Heudecker said. When it's time to read the data back, existing sequencing technology converts the nucleobases into 0s and 1s.
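As a rough illustration of that translation step, the sketch below packs two bits into each base under one common mapping (00→A, 01→C, 10→G, 11→T). Real encoders also add error correction and avoid troublesome sequences such as long runs of the same base; this shows only the core idea:

    BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
    BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

    def encode(data: bytes) -> str:
        # Flatten the bytes into a bit string, then map each 2-bit pair
        # to one nucleobase.
        bits = "".join(f"{byte:08b}" for byte in data)
        return "".join(BITS_TO_BASE[bits[i:i+2]]
                       for i in range(0, len(bits), 2))

    def decode(strand: str) -> bytes:
        # Reverse the mapping: bases back to bits, bits back to bytes.
        bits = "".join(BASE_TO_BITS[base] for base in strand)
        return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))

    strand = encode(b"hi")        # two bytes become eight bases: 'CGGACGGC'
    assert decode(strand) == b"hi"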
Processing can even happen in the DNA domain, where enzymes operate on the data in its DNA representation. "Just as computational storage takes the processing to the data, you can introduce enzymes into the DNA data, giving you massive processing parallelization over massive amounts of data," he noted. "The enzymes write new DNA strands as the result, which are then sequenced and converted back into digital data."
DNA data storage also offers the benefit of carbon efficiency. "Because these are all-natural biological processes, there is minimal carbon impact," Heudecker said. The technology's drawbacks, however, are significant. Creating enough synthetic DNA for a meaningful DNA drive is currently prohibitively expensive, but companies such as CATALOG are working on the problem, he noted.
Meanwhile, multiple firms seeking to advance DNA storage technology, including Microsoft, Illumina, and Twist Bioscience, are working to make it practical enough for routine use. "I forecast the earliest DNA drives will be available in a cloud delivery model within four years," Heudecker said.