More data is being generated and transferred globally than ever before. IDC analysts expect the global datasphere to grow to 163 zettabytes (ZB) by 2025. That is roughly ten times the 16.1 ZB of data generated in 2016. There are several reasons for this vast increase:
Far more sources and devices are generating data than before: embedded systems and devices gather data and transfer it to Big Data applications and solutions for real-time analysis. The ongoing trend toward using mobile devices, social media platforms, online shopping, and all kinds of applications whenever and wherever is producing huge amounts of data every day. In addition, companies are transforming themselves to deliver data to their customers, whose demand for news and real-time data is higher than ever before.
According to a new Gartner forecast, more than half of major business processes and systems will incorporate some element of the IoT (Internet of Things) by 2020. As a result, the amount of data that is generated, transferred, and analyzed by Big Data applications, and that has to be stored either on premises or off premises, will grow enormously.
Because of this, demand from management and IT departments has increased drastically for storage solutions that can handle and archive more digital content than ever before, and keep it for a long time.
However, from a hardware perspective it is not only a larger number of storage devices (hard disks, SSDs, or SSHDs, for example) that is needed. A proper file system that can handle the results of this Big Data growth is required as well. Even though not all data will be kept on storage devices, the most important data and the analysis results will be, and this will drive up demand for storage space. In addition, enterprises will handle much of this storage demand both internally and in the Cloud, using services such as Amazon's S3 or Microsoft Azure.
The older concepts of File storage and Block storage will not be able to cope with this data growth in the future, neither for enterprises nor for Cloud providers. The solution for storing the huge amounts of data to come is Object storage (also called Object-based storage). But what are the differences, and what makes Object storage better suited to the data explosion than the older concepts?
To understand the benefits that Object storage offers, one first has to look at the older concepts of File storage and Block storage, since they differ greatly from it.
The differences between File, Block, and Object storage
File and Block storage are methods to store data on NAS and SAN storage systems.
A NAS (Network Attached Storage) system exposes its storage as a network file system. When devices attach to a NAS, they see a mountable file system, and users with the proper access rights can access their files. For that reason, a NAS system has to manage user privileges, file locking, and other security measures so that several users can access the same files. Access to the NAS is handled via the NFS and SMB/CIFS protocols. As with any server or storage solution, a file system is responsible for placing the files on the NAS. This works very well for hundreds of thousands or even millions of files, but not for billions.
Block storage works in a similar way, but unlike File storage, where data is managed at the file level, data is stored in data blocks. Several blocks (for example in a SAN system) make up a file. Each block has an address, and the SAN application retrieves the block by sending a SCSI request to this address. The storage application then decides where the data blocks are stored inside the system and on which specific disk or storage medium. It also decides how the blocks are combined in the end and how they are accessed. Blocks in a SAN do not carry metadata that relates them to the storage system or application. In other words, blocks are data segments without a description, an association, or an owner as far as the storage solution is concerned. Everything is handled and controlled by the SAN software. For that reason, SAN and Block storage are often used for performance-hungry applications such as databases or transaction processing, because the data can be accessed, modified, and saved.
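To make the idea of addressed, description-free blocks more tangible, here is a minimal sketch in Python. It is not how a real SAN works internally: the BlockDevice class, the tiny block size, and the helper functions are hypothetical names chosen for illustration only. The point it shows is that the device itself knows nothing about files; the layer above decides which addresses belong together.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically small so the example stays readable


class BlockDevice:
    """Hypothetical stand-in for a SAN volume: numbered blocks, no metadata."""

    def __init__(self, num_blocks):
        # Every block starts out filled with zero bytes.
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, address, data):
        # Pad short data with zero bytes so each block has a fixed size.
        self.blocks[address] = data.ljust(BLOCK_SIZE, b"\x00")

    def read_block(self, address):
        return self.blocks[address]


def write_file(device, addresses, payload):
    """Split the payload into blocks and store them at the given addresses."""
    for i, address in enumerate(addresses):
        chunk = payload[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        device.write_block(address, chunk)


def read_file(device, addresses, length):
    """Reassemble a file from its blocks; only the caller knows the order."""
    data = b"".join(device.read_block(a) for a in addresses)
    return data[:length]


device = BlockDevice(num_blocks=8)
payload = b"hello SAN!"
layout = [5, 2, 7]  # chosen by the layer above the device, not by the device
write_file(device, layout, payload)
restored = read_file(device, layout, len(payload))
```

Without the layout list, the three blocks are just anonymous byte segments. That is the sense in which blocks have no description, association, or owner.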
Both methods of storing data have worked fine for years. So why is there a need for another concept? The reason is that solutions for both concepts have to implement user access rights and the ability for users to make changes to the data.
What we now see is that much of the data being produced is immutable or unstructured: content or material that will never be changed again. This is where Object storage comes into play:
Objects in Object storage are "bundled data" (in other words, a file) with corresponding metadata. Each object gets a unique ID (identifier) that is calculated from the file content and the metadata. Applications identify the object via this ID. The many objects inside an Object storage system are distributed across all the available storage disks. In its pure form, Object storage can "only" save one version of a file (object). If a user makes a change, another version of the same file is stored as a new object. For this reason, Object storage is a perfect fit for backup or archive solutions, or for storage that holds vast amounts of video that is only watched and never changed, as on online movie streaming sites or YouTube.
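The following Python sketch illustrates this principle under a few assumptions. The ObjectStore class and its put and get methods are hypothetical, the store lives in memory only, and SHA-256 is used as the ID function purely as an example; real products may calculate their IDs differently.

```python
import hashlib
import json


class ObjectStore:
    """Hypothetical in-memory object store with a flat ID namespace."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data, metadata):
        # Serialize the metadata with sorted keys so that identical
        # metadata always produces identical bytes.
        meta_bytes = json.dumps(metadata, sort_keys=True).encode("utf-8")
        # Assumption: the ID is a SHA-256 hash over content and metadata.
        hasher = hashlib.sha256()
        hasher.update(hashlib.sha256(data).digest())
        hasher.update(meta_bytes)
        object_id = hasher.hexdigest()
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


store = ObjectStore()
meta = {"name": "report.txt", "owner": "alice"}
v1 = store.put(b"quarterly report, draft", meta)
# A changed file is not modified in place; it becomes a second object.
v2 = store.put(b"quarterly report, final", meta)
```

After these calls the store holds two independent objects, and the draft can still be retrieved via v1. Storing identical content with identical metadata again would yield the same ID, so nothing is overwritten and nothing is stored twice.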
The main difference from the other concepts is that objects are managed by the application itself, which has to support Object storage. That means no real file system is needed here; this layer becomes obsolete. An application that uses Object storage sends a storage request to the solution asking where to store the object. The object is then given an address inside the huge storage space and saved there by the application itself.
Because data management is much simpler, with no real file system in place, Object storage solutions can be scaled up much more easily than File storage or Block storage based systems. You simply add disks to the solution, and little further management is needed to gain more storage space. That is a major benefit, especially in times of exponential data growth.
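How a given product places objects on its disks is not covered in this article and differs between vendors. As one possible illustration only, the Python sketch below uses rendezvous hashing (a technique chosen for this example, not one the article attributes to any product) to show why a flat namespace of object IDs makes adding a disk cheap. The disk names, object IDs, and function names are all made up.

```python
import hashlib


def score(disk, object_id):
    """Deterministic pseudo-random score for a (disk, object) pair."""
    digest = hashlib.sha256(f"{disk}:{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_disk(disks, object_id):
    # Rendezvous hashing: the disk with the highest score holds the object.
    return max(disks, key=lambda disk: score(disk, object_id))


object_ids = [f"object-{n}" for n in range(1000)]
disks = ["disk-a", "disk-b", "disk-c"]

before = {oid: pick_disk(disks, oid) for oid in object_ids}
after = {oid: pick_disk(disks + ["disk-d"], oid) for oid in object_ids}

# Objects whose location changed when the fourth disk was added.
moved = [oid for oid in object_ids if before[oid] != after[oid]]
```

In this scheme, every object that changes location moves to the newly added disk; objects on the existing disks are never shuffled among each other. No directory tree or block map has to be rebuilt, which is the kind of low-management growth described above.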
So Object storage is a perfect solution for huge amounts of data and is therefore widely used by big cloud service providers like Amazon, Google, and others. But what about data protection and data recovery? We answer these questions in the second part of this article.
Gabi Schoenemann