A growing number of high-throughput, low-latency business data centres that have relied on hard disk drives (HDDs) in their servers are now facing performance bottlenecks. Many of them are turning to Solid State Drives (SSDs) as a viable storage solution that can increase the performance, efficiency, and reliability of their data centres while lowering operating expenditure (OpEx).
To understand the differences between each SSD class, we first need to distinguish between the two key components of an SSD, the Flash Storage Controller and the nonvolatile NAND flash memory used to store data.
In today's market, SSD and NAND flash memory consumption is divided into three main groups:
- Consumer devices (tablets, cameras, mobile phones)
- Client systems (netbooks, notebooks, ultrabooks, AIOs, desktop PCs) and embedded / commercial systems (gaming kiosks, purpose-built systems, digital signage)
- Enterprise computing platforms (HPC, data centre servers)
Choosing the right SSD storage device for a company's data centre can be a tedious learning process, in which a variety of different SSD vendors and product types need to be tested for suitability, as not all SSDs and NAND flash memories are made in the same way.
SSDs are built to be an easy-to-install replacement for, or complement to, magnetic hard disk drives (HDDs). They come in many different form factors, including 2.5 inches, and use communication protocols / interfaces such as Serial ATA (SATA), Serial Attached SCSI (SAS) and, more recently, PCIe to transfer data to and from the central processing unit (CPU) of a server.
Although SSDs are easy to install, there is no guarantee that every SSD will remain suitable over the long term for the application it was selected for. If SSDs wear out prematurely because they are written too heavily, deliver significantly lower sustained write performance over their expected lifetime, or add latency to the storage array and therefore need to be replaced early, the cost of a mis-selected SSD can often negate all of the original cost savings and performance benefits.
To help you decide on your next purchase of replacement or additional storage for a corporate data centre, this study looks at the three key features that distinguish an enterprise-class SSD from a client-class SSD: performance, reliability, and endurance.
Performance
By using a multi-channel architecture and parallel access from the SSD controller to the NAND flash chips, SSDs can achieve very high read and write speeds for both sequential and random data requests from the CPU.
A typical data centre scenario involves processing millions of bytes of random company data, such as technical CAD drawings and seismic analysis data (e.g., Big Data), or serving customers worldwide who access banking transactions (e.g., OLTP). The storage devices must be accessed with the lowest possible latency, and many customers may need access to the same data at the same time without any increase in response times. User experience depends on low latency, which in turn increases user productivity.
A client application affects only one user or application, and the tolerance limit between the minimum and the maximum response time (or latency) for user or system activities is higher.
Mismatched performance can adversely affect complex SSD storage arrays (such as Network Attached Storage, Direct Attached Storage, or Storage Area Network) and wreak havoc on storage array latency, sustained performance and, ultimately, the quality of service perceived by users.
Unlike client SSDs, enterprise-class SSDs are optimized not only for peak performance in the first few seconds of access but also for more stable performance over longer periods of time, which they achieve by using a larger over-provisioned (OP) area. This ensures that the performance of the storage array is consistent with the organisation's expected quality of service (QoS), even at peak loads.
For more information about each drive, visit the Enterprise SSDs section of the Kingston website.
Reliability
There are a number of issues associated with NAND flash memory. The two most important are its limited life expectancy, as NAND flash cells wear out during repeated writes, and a naturally occurring bit error rate.
During the manufacturing process, each NAND flash die on the silicon wafer is tested and characterised with a raw bit error rate (BER or RBER). The BER defines the rate at which naturally occurring bit errors occur in the NAND flash without the benefit of Error Correction Code (ECC). The SSD controller corrects these errors on the fly using advanced ECC (which different SSD controller manufacturers usually call BCH ECC, Strong ECC or LDPC) without interrupting user or system access.
The ability of the SSD controller to correct these bit errors is expressed by the Uncorrectable Bit Error Ratio (UBER), "a data corruption rate metric corresponding to the number of data errors per bit read after the usage of certain error correction methods". [1]
As defined and standardised by the industry standards body JEDEC in 2010 in the documents JESD218A: Solid State Drive (SSD) Requirements and Endurance Test Method and JESD219: Solid State Drive (SSD) Endurance Workloads, enterprise-class SSDs differ from client-class SSDs in a number of ways, including, but not limited to, their ability to support heavier write workloads, to withstand more extreme environmental conditions, and to recover from a higher BER than a client SSD. [2] [3]
| Application class | Workload (see JESD219) | Active use (powered on) | Data retention (powered off) | UBER requirement |
| --- | --- | --- | --- | --- |
| Client | Client | 40 °C, 8 hours/day | 30 °C, 1 year | ≤10^-15 |
| Enterprise | Enterprise | 55 °C, 24 hours/day | 40 °C, 3 months | ≤10^-16 |
Table 1 - JESD218A Solid State Drive (SSD) Requirements and Endurance Test Method
Copyright JEDEC. Reprinted with permission from JEDEC.
Under the UBER requirements that JEDEC proposes for SSDs, an enterprise SSD is expected to experience no more than 1 unrecoverable bit error for every 10 quadrillion bits read (~ 1.11 petabytes), whereas a client SSD may experience 1 unrecoverable bit error for every 1 quadrillion bits read (~ 0.11 petabytes).
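The conversion from an UBER limit to a volume of data can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes the petabyte figures quoted above are binary petabytes (2^50 bytes), since that is the unit that reproduces them.

```python
BITS_PER_BYTE = 8
BINARY_PETABYTE = 2 ** 50  # bytes; assumed unit behind the ~1.11 / ~0.11 figures


def petabytes_per_unrecoverable_error(uber_exponent: int) -> float:
    """Average data read per unrecoverable bit error for an UBER of 10^-exponent."""
    bits_per_error = 10 ** uber_exponent
    bytes_per_error = bits_per_error / BITS_PER_BYTE
    return bytes_per_error / BINARY_PETABYTE


client_pb = petabytes_per_unrecoverable_error(15)      # client limit: 10^-15
enterprise_pb = petabytes_per_unrecoverable_error(16)  # enterprise limit: 10^-16

print(f"client: ~{client_pb:.2f} PB, enterprise: ~{enterprise_pb:.2f} PB")
```

In decimal petabytes (10^15 bytes) the same limits work out to 0.125 PB and 1.25 PB, so the tenfold gap between the two classes is unchanged whichever unit is used.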
Kingston Enterprise SSDs also include additional technologies that allow corrupted data blocks to be recovered using parity data stored on other NAND dies (similar to RAID, where a block can be rebuilt from the parity data stored in other blocks).
To complement the redundant data block recovery technologies in Kingston Enterprise SSDs, periodic checkpointing, Cyclic Redundancy Check (CRC), and ECC error correction are also implemented in an end-to-end internal data protection scheme to ensure the integrity of the data from the host, through the flash, and back to the host. End-to-end data protection means that data received from the host is checked for integrity when it is stored in the internal cache of the SSD and when it is written to or read back from the NAND storage areas.
Just as enterprise-class SSDs add stronger ECC protection against bit errors, they may also include power loss detection circuitry that manages power storage capacitors on the SSD. Hardware power-fail support monitors the incoming power to the SSD and, during an unexpected power loss, temporarily powers the SSD circuits from tantalum capacitors so that pending internal or external writes can be completed before the SSD turns off. Power-fail protection circuits are typically required for applications where lost data cannot be recovered.
Power-fail protection can also be implemented in the SSD firmware by frequently flushing data held in the SSD controller's cache areas (e.g., its Flash Translation Layer table) to the NAND memory. While this does not guarantee that no data is lost during a power outage, it attempts to minimize the effects of an unsafe power loss. Firmware power-fail protection also reduces the likelihood that the SSD becomes inoperable after an unsafe shutdown.
In many situations, using Software Defined Storage or server clustering can reduce the need for hardware-based power-fail support because all data is replicated to a separate, stand-alone storage device on one or more different servers. Web-scale data centers often forgo power-fail support and use software-defined storage across RAID servers to store redundant copies of the same data.
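The RAID-like idea of rebuilding a block from parity can be illustrated with simple XOR parity. This is a minimal sketch under stated assumptions: the article does not describe the actual parity scheme used inside the drives, so single XOR parity across three hypothetical dies is used here purely as the simplest example of the principle.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of the same block position on three NAND dies.
die_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# Parity block stored on a further die.
parity = xor_blocks(die_blocks)

# Suppose the block on die 1 becomes unreadable: XOR the survivors with parity.
survivors = [die_blocks[0], die_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == die_blocks[1])
```

Single parity of this kind can rebuild one lost block per stripe; recovering from several simultaneous losses would need a stronger code.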
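The integrity check at each stage can be pictured as a checksum that travels with the data. The sketch below uses CRC-32 from Python's standard `zlib` module as a stand-in; the polynomial, the helper names and the sample payload are assumptions for illustration, not a description of the drive firmware.

```python
import zlib


def tag_with_crc(payload: bytes) -> int:
    """Compute a CRC-32 over the payload as it arrives from the host."""
    return zlib.crc32(payload)


def is_intact(payload: bytes, expected_crc: int) -> bool:
    """Re-compute the CRC at a later stage and compare it with the stored one."""
    return zlib.crc32(payload) == expected_crc


host_data = b"ledger entry 0042"
crc_at_ingress = tag_with_crc(host_data)

cached_copy = host_data              # stage 1: copy held in the SSD cache
read_back = b"ledger entry 0043"     # stage 2: simulated corruption on read-back

print(is_intact(cached_copy, crc_at_ingress))  # True
print(is_intact(read_back, crc_at_ingress))    # False
```

A CRC only detects corruption. Repairing the data is left to ECC or to the parity-based recovery described earlier.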
- Kingston Technology
- Uncorrectable Bit Error Rate (UBER), JEDEC dictionary
- JEDEC Committee, JESD218A: Solid State Drive (SSD) Requirements and Endurance Test Method
Text Copyright: Kingston Technology
In the second and final part of this article, we show the differences in endurance between the two SSD classes and give a short summary of the results found.
Enterprise and Client SSDs compared Part 2
The first part of this article discussed the differences between enterprise SSDs and client SSDs in terms of performance and reliability. This part continues with their differences in endurance, followed by a short summary of the findings of this study. Let's start with:
Endurance
For all NAND flash memory in flash storage devices, the ability to store data bits reliably decreases with each program or erase (P/E) cycle of a NAND flash memory cell, until the NAND flash block can no longer reliably store data. At this point, the degraded or weak block is retired from the user-addressable storage pool and its logical block address (LBA) is remapped to a new physical address in the NAND flash memory array. A new block replaces the bad one, taken from the spare block pool that is part of the over-provisioned (OP) memory on the SSD.
As a cell is repeatedly programmed and erased, its BER also increases linearly. A complex set of management techniques must therefore be implemented on the enterprise SSD controller to manage the cells' ability to store data reliably over the expected life of the SSD. [4]
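The retire-and-remap step can be sketched as a small lookup table plus a spare pool. This is a deliberately simplified model with hypothetical names: real flash translation layers typically map at a finer granularity and combine remapping with wear levelling and garbage collection.

```python
class BlockRemapper:
    """Toy model: logical block addresses backed by physical blocks plus spares."""

    def __init__(self, user_blocks: int, spare_blocks: int):
        # Initially every LBA maps to the physical block with the same number.
        self.lba_to_physical = {lba: lba for lba in range(user_blocks)}
        # Spare (over-provisioned) blocks sit after the user-addressable ones.
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.retired = []

    def retire(self, lba: int) -> int:
        """Retire the weak block behind an LBA and remap the LBA to a spare."""
        if not self.spares:
            raise RuntimeError("spare block pool exhausted")
        self.retired.append(self.lba_to_physical[lba])
        self.lba_to_physical[lba] = self.spares.pop(0)
        return self.lba_to_physical[lba]


ssd = BlockRemapper(user_blocks=4, spare_blocks=2)
new_physical = ssd.retire(1)
print(new_physical, ssd.lba_to_physical[1], len(ssd.spares))
```

The host keeps using the same LBA throughout. Only the physical location changes, and the size of the spare pool bounds how many blocks can be retired.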
The P / E lifetime of a particular NAND flash memory can be significantly different, depending on the current lithographic manufacturing process and the type of NAND flash produced.
| NAND flash memory type | TLC | MLC | SLC |
| --- | --- | --- | --- |
| Architecture | 3 bits per cell | 2 bits per cell | 1 bit per cell |
| Capacity | Highest capacity | High capacity | Lowest capacity |
| Lifespan (P/E) | Lowest lifespan | Medium lifespan | Highest lifespan |
| Price | $ | $$ | $$$$ |
| Approximate NAND bit error rate (BER) | 10^4 | 10^7 | 10^9 |
Table 2 - NAND flash memory types [5] [6]
Enterprise SSDs also differ from client SSDs in terms of their duty cycle. An enterprise-class SSD must be able to handle the heavy read and write activity typical of data center server scenarios, which require access to data 24 hours a day, every day of the week. A client-class SSD, by contrast, is typically fully utilized for only 8 hours a day during the week. Enterprise SSDs have a 24x7 duty cycle, whereas client SSDs have a 20/80 duty cycle (active 20% of the time, idle or in sleep mode 80% of the time during computer use).
Understanding the write endurance of an application or an SSD can be very complex. The JEDEC Committee has therefore proposed an endurance metric, TeraBytes Written (TBW), which states the amount of raw data that can be written to an SSD before the NAND flash in the SSD becomes an unreliable storage medium and the drive should be retired.
The JESD218A test method and the JESD219 enterprise-class workloads proposed by JEDEC make it easier to interpret SSD manufacturers' endurance calculations based on TBW and to extrapolate a more understandable lifetime figure that can be applied to data centers.
As noted in documents JESD218 and JESD219, workloads in different application classes may also suffer from a Write Amplification Factor (WAF) that is orders of magnitude higher than the writes actually submitted by the host. This can easily lead to unmanageable NAND flash wear, to a higher NAND flash BER from excessive writes over time, and to slower performance caused by invalid pages scattered throughout the SSD.
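A TBW rating becomes easier to reason about once it is turned into drive writes per day (DWPD) or into years at a given write rate. The following is a sketch of that arithmetic, not a method taken from the JEDEC documents; the function names and the example drive (876 TBW, 0.96 TB capacity, 5-year service period) are hypothetical.

```python
DAYS_PER_YEAR = 365


def drive_writes_per_day(tbw: float, capacity_tb: float, service_years: float) -> float:
    """Full-capacity writes per day that would use up the TBW over the service period."""
    return tbw / (capacity_tb * DAYS_PER_YEAR * service_years)


def years_until_tbw(tbw: float, host_writes_tb_per_day: float) -> float:
    """Years until the rated TBW is reached at a constant daily host write volume."""
    return tbw / (host_writes_tb_per_day * DAYS_PER_YEAR)


# Hypothetical drive: 876 TBW, 0.96 TB usable capacity, 5-year service period.
dwpd = drive_writes_per_day(tbw=876, capacity_tb=0.96, service_years=5)
years = years_until_tbw(tbw=876, host_writes_tb_per_day=0.24)

print(f"{dwpd:.2f} DWPD, {years:.1f} years at 0.24 TB/day")
```

Both figures assume a constant workload, and a TBW rating is only comparable between drives when it was measured against the same workload.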
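Write amplification is usually expressed as the ratio of data physically written to the NAND to data written by the host. The sketch below applies that ratio to show how a higher WAF shrinks the host-visible endurance of the same NAND. All names and figures are hypothetical.

```python
def write_amplification_factor(nand_bytes_written: float, host_bytes_written: float) -> float:
    """WAF = bytes physically written to NAND / bytes written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


def host_writes_supported(nand_write_budget_tb: float, waf: float) -> float:
    """Host-level terabytes that fit in a fixed NAND write budget at a given WAF."""
    return nand_write_budget_tb / waf


# Hypothetical counters: the host wrote 100 GB, the NAND absorbed 300 GB.
waf = write_amplification_factor(nand_bytes_written=300e9, host_bytes_written=100e9)

# The same hypothetical 3000 TB NAND budget under two workloads.
sequential_tb = host_writes_supported(3000, waf=1.5)
random_tb = host_writes_supported(3000, waf=waf)

print(waf, sequential_tb, random_tb)
```

In this example the drive with the more write-amplifying workload supports half the host writes, which is why workload definitions matter when comparing endurance ratings.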
While TBW is an important point of comparison between enterprise-class and client-class SSDs, it is only a prediction model for the endurance of the NAND flash itself. Mean Time Between Failures (MTBF), by contrast, is a component-level prediction of endurance and reliability, based on the reliability of the components used in the device. Enterprise-class SSD components are also expected to last longer and work harder at managing the voltages across all of the NAND flash memory over the life expectancy of the SSD. All enterprise SSDs should be rated for at least one million hours MTBF, which is more than 114 years. Kingston specifies its SSDs very conservatively, and it is not uncommon to see higher MTBF specifications on SSDs. The important point is that 1 million hours is more than a good starting point for enterprise SSDs.
With S.M.A.R.T. monitoring and reporting on enterprise-class SSDs, the device can easily be queried before failure for its life expectancy, based on the current Write Amplification Factor (WAF) and wear level. Predictive warnings of failures such as a power failure, bit errors occurring on the physical interface, or uneven wear are also commonly supported. The Kingston SSD Manager utility can be downloaded from the Kingston website and used to view the status of a drive.
On client-class SSDs, only a minimal set of S.M.A.R.T. features may be available for monitoring the SSD during standard use or after a failure.
Depending on the application class and capacity of the SSD, additional NAND flash capacity may also be set aside as over-provisioned (OP) reserve capacity. OP capacity is hidden from the user and the operating system. It can be used as a temporary write buffer for higher sustained performance and as a source of replacements for defective flash memory cells over the life of the SSD, which increases the reliability and endurance of the SSD (through a larger number of spare blocks).
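The "114 years" figure, and what it means for a fleet of drives, can be reproduced with a short calculation. The sketch below assumes a constant failure rate (an exponential failure model), which is a common simplification and not something the article states; the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def mtbf_in_years(mtbf_hours: float) -> float:
    """Express an MTBF rating in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability of failure within one year of 24x7 use (constant failure rate assumed)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


years = mtbf_in_years(1_000_000)
afr = annualized_failure_rate(1_000_000)

print(f"{years:.1f} years, AFR about {afr:.2%}")
```

Read this way, a one-million-hour MTBF does not promise that a single drive lasts 114 years. Under the stated assumption it corresponds to somewhat under 1% of a large population of drives failing per year of continuous operation.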
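Over-provisioning is often quoted as a percentage of the user-visible capacity. The sketch below uses that commonly cited definition, which the article itself does not spell out; the capacities are hypothetical and must be given in the same unit.

```python
def over_provisioning_percent(physical_capacity: float, user_capacity: float) -> float:
    """OP % = (physical capacity - user capacity) / user capacity * 100."""
    if not 0 < user_capacity <= physical_capacity:
        raise ValueError("user capacity must be positive and not exceed physical capacity")
    return (physical_capacity - user_capacity) / user_capacity * 100


# Hypothetical drives built on the same 512 units of raw NAND.
client_op = over_provisioning_percent(physical_capacity=512, user_capacity=480)
enterprise_op = over_provisioning_percent(physical_capacity=512, user_capacity=400)

print(f"client-style: {client_op:.1f}%, enterprise-style: {enterprise_op:.1f}%")
```

Giving up user capacity in this way leaves more spare blocks for buffering and block replacement, which is the trade-off the paragraph above describes.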
Summary
The differences between enterprise-class and client-class SSDs are significant, ranging from the program/erase cycle endurance of their NAND flash memory to the complex management techniques they use to accommodate workloads across different application classes.
Understanding these differences between application classes can be an effective tool for minimizing and managing disruptive downtime in demanding and often mission-critical business environments when it comes to performance, reliability, and endurance. For further questions, please contact your Kingston representative or use the "Ask An Expert" or the Tech Support Chat feature on Kingston.com.
[4] JEDEC Committee, JESD219: Solid State Drive (SSD) Endurance Workloads
[5] The Bleak Future of NAND Flash Memory, University of California
[6] Characterization and Error-Correcting Codes for TLC Flash Memories, University of California
[7] NAND Flash Qualification Guideline, California Institute of Technology