This article is about a technical note from Micron on how to properly calculate the write amplification factor (WAF) of SSDs (solid-state drives) using S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) attributes. This short paper may be useful if you want to better understand how SMART values can be used to estimate an SSD's current risk of failure.
If you're not familiar with it, the Write Amplification Factor (WAF) measures how much more data an SSD physically writes to its NAND flash than the host actually asked it to store, which makes it a multiplier on NAND wear. All else being equal, a higher WAF means an SSD will wear out sooner. Accurately calculating a drive's WAF affects how long a warranty a manufacturer will feel comfortable offering, and it is a good indicator of how reliable a particular SSD is likely to be. The S.M.A.R.T. attributes with IDs 247 and 248 are used to track WAF and its impact on an SSD. This technical brief discusses how to build a realistic algorithm that uses those SMART values to determine when an SSD is at increased risk of failure.
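As a minimal sketch, the WAF can be derived from the two attribute counters. This assumes the Micron convention, in which attribute 247 counts NAND pages programmed on behalf of host writes and attribute 248 counts pages programmed by the drive's own background activity; other vendors may assign these IDs differently, so check the drive's documentation. The function name and the sample counter values are hypothetical.

```python
def write_amplification_factor(host_pages: int, background_pages: int) -> float:
    """Estimate WAF from two raw SMART counters.

    Assumption: host_pages is the raw value of attribute 247 (NAND pages
    programmed for host writes) and background_pages is the raw value of
    attribute 248 (NAND pages programmed by the drive itself, for example
    during garbage collection and wear leveling).
    """
    if host_pages <= 0:
        raise ValueError("attribute 247 must be positive to compute a ratio")
    if background_pages < 0:
        raise ValueError("attribute 248 cannot be negative")
    # Total physical page programs divided by the programs the host requested.
    return (host_pages + background_pages) / host_pages


# Hypothetical raw values read from a drive's SMART table.
print(write_amplification_factor(host_pages=1_000_000, background_pages=450_000))  # 1.45
```

Because both counters are cumulative, subtracting an earlier reading from a later one and passing the differences to the same function gives the WAF for just the workload in between, which is often more informative than the lifetime average.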
WAF values are only useful for building algorithms that accurately predict SSD failure risk when they are correlated with detailed knowledge of the drive's characteristics, such as the rated endurance of its NAND.
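One simplified way to combine WAF with a drive's characteristics is to estimate how much host data the drive can absorb before its NAND reaches its rated program/erase (P/E) cycle limit. The sketch below assumes perfectly even wear, ignores over-provisioned capacity, treats WAF as constant over the drive's life, and uses hypothetical capacity and endurance figures; real endurance ratings come from the manufacturer.

```python
def estimated_host_write_budget_tb(capacity_tb: float, rated_pe_cycles: int, waf: float) -> float:
    """Rough total host writes (TB) before the NAND reaches its rated P/E limit.

    Simplifying assumptions: perfectly even wear, no over-provisioning,
    and a WAF that stays constant over the drive's life.
    """
    if waf <= 0:
        raise ValueError("WAF must be positive")
    nand_budget_tb = capacity_tb * rated_pe_cycles  # total physical writes the NAND can take
    return nand_budget_tb / waf                     # each host TB costs `waf` TB of NAND writes


def fraction_of_life_used(host_written_tb: float, capacity_tb: float,
                          rated_pe_cycles: int, waf: float) -> float:
    """Share of the estimated write budget already consumed (0.0 means new)."""
    budget = estimated_host_write_budget_tb(capacity_tb, rated_pe_cycles, waf)
    return host_written_tb / budget


# Hypothetical 1 TB drive rated for 3,000 P/E cycles, observed WAF of 1.5,
# with 200 TB written by the host so far.
print(estimated_host_write_budget_tb(1.0, 3000, 1.5))  # 2000.0
print(fraction_of_life_used(200.0, 1.0, 3000, 1.5))    # 0.1
```

The key design point is the division by WAF: the same drive running a workload with twice the WAF exhausts its rated endurance after half as much host data.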
Basically, NAND memory cells have a limited life expectancy. Each time a NAND block is erased and rewritten (a program/erase cycle), the electrical integrity of its cells degrades slightly. Over time, this wears down the cells' ability to retain correct information. Eventually, one or more cells begin to have trouble holding their state, and data can be lost. The SSD itself runs checksum (error-checking) processes that watch for this situation. When it is detected, the affected NAND block is marked as bad, its data is relocated, and that block of memory is effectively dead from then on.
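The wear-out-and-retire cycle described above can be modeled as a toy simulation. Real drives detect failing cells through error checking rather than a fixed counter, so the hard endurance limit below is a hypothetical stand-in for that detection, and the class, field, and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NandBlock:
    block_id: int
    endurance: int               # hypothetical P/E cycles before the block turns unreliable
    pe_cycles: int = 0           # program/erase cycles performed so far
    bad: bool = False
    data: List[bytes] = field(default_factory=list)

    def passes_check(self) -> bool:
        # Stand-in for the drive's error checking: the block is considered
        # failing once it has been cycled more often than its assumed endurance.
        return self.pe_cycles <= self.endurance


def rewrite(block: NandBlock, payload: bytes, spares: List[NandBlock]) -> NandBlock:
    """Erase and reprogram `block`; if it then fails the check, retire it.

    Returns the block that now holds the data.
    """
    block.pe_cycles += 1         # each erase/program cycle wears the cells slightly
    block.data = [payload]
    if block.passes_check():
        return block
    if not spares:
        raise RuntimeError("no spare blocks left to relocate data into")
    replacement = spares.pop()
    replacement.pe_cycles += 1   # programming the spare is itself a physical write
    replacement.data = block.data
    block.data = []
    block.bad = True             # the worn block is now effectively dead
    return replacement


original = NandBlock(block_id=0, endurance=3)
spares = [NandBlock(block_id=99, endurance=3)]
current = original
for version in range(4):
    current = rewrite(current, f"v{version}".encode(), spares)
print(original.bad, current.block_id, current.data)  # True 99 [b'v3']
```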
What is Write Amplification (WA)?
Write Amplification is the ratio of data physically written to the NAND to the amount of logical data the host asked to write. Lower ratios are better; without on-drive data compression, the ratio can never be less than 1.
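Written as a formula, with $W_{\text{NAND}}$ standing for the data physically programmed into the flash and $W_{\text{host}}$ for the data the host wrote, plus a hypothetical worked example:

```latex
\mathrm{WA} = \frac{W_{\text{NAND}}}{W_{\text{host}}},
\qquad \text{e.g.}\quad
W_{\text{host}} = 10\ \text{GB},\; W_{\text{NAND}} = 15\ \text{GB}
\;\Rightarrow\; \mathrm{WA} = \frac{15}{10} = 1.5
```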
Write Amplification describes the fact that the cells undergo more physical wear over time than the amount of logical data being stored would suggest. This is normal: by design, SSDs copy and move data electronically around the entire drive for reasons of longevity and data integrity. A WAF of exactly 1.0 is rarely, if ever, seen in real workloads; theoretically, it would mean a one-to-one correspondence between logical writes (data) and physical writes to the SSD.
Imagine a file is re-saved and has grown since the previous save. On a hard drive, the new data would be placed in the next free space on the drive with sufficient room, not necessarily next to the rest of the file. This results in a state called fragmentation, in which a file becomes dispersed, or "fragmented", across the physical disk. Fragmentation yields less-than-optimal read/write times as the file is read or updated, because the drive heads must seek between the scattered pieces. A hard drive is more efficient when a file's data is physically grouped together.
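A toy allocator that always takes the next free block shows how a growing file ends up fragmented. The block layout and file names are hypothetical, and real file systems use more sophisticated allocation policies that try to limit fragmentation.

```python
from typing import List, Optional


def allocate(disk: List[Optional[str]], owner: str, count: int) -> None:
    """Assign `count` blocks to `owner`, always taking the next free slot."""
    for i, slot in enumerate(disk):
        if count == 0:
            break
        if slot is None:
            disk[i] = owner
            count -= 1
    if count:
        raise RuntimeError("disk full")


def fragments(disk: List[Optional[str]], owner: str) -> int:
    """Count the contiguous runs of blocks belonging to `owner`."""
    runs = 0
    previous = None
    for slot in disk:
        if slot == owner and previous != owner:
            runs += 1
        previous = slot
    return runs


disk: List[Optional[str]] = [None] * 12
allocate(disk, "report.doc", 3)   # first save
allocate(disk, "photo.jpg", 4)    # another file lands right after it
allocate(disk, "report.doc", 2)   # re-save grows the file; new blocks go after photo.jpg
print(fragments(disk, "report.doc"))  # 2
```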
SSDs, on the other hand, are largely unaffected by this problem. Physical placement makes little difference to an SSD's speed, because reading data does not involve mechanically seeking to its location the way a hard drive does. However, for other reasons related to reliability, SSDs do move data around frequently.
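SSD controllers commonly keep a logical-to-physical mapping table, usually called the flash translation layer (FTL), so the host keeps using the same logical address while the drive moves the data between physical pages. The toy class below is a hypothetical illustration of that indirection, not any vendor's firmware; it also counts physical writes, so it shows that each relocation costs an extra NAND write.

```python
from typing import Dict, List


class ToyFTL:
    """Hypothetical logical-to-physical page map, with no garbage collection."""

    def __init__(self, physical_pages: int) -> None:
        self.free: List[int] = list(range(physical_pages))  # unused physical pages
        self.map: Dict[int, int] = {}      # logical page number -> physical page
        self.store: Dict[int, bytes] = {}  # physical page -> contents
        self.physical_writes = 0

    def _program(self, data: bytes) -> int:
        # NAND pages are written out of place: always take a fresh page.
        page = self.free.pop(0)
        self.store[page] = data
        self.physical_writes += 1
        return page

    def write(self, lpn: int, data: bytes) -> None:
        old = self.map.get(lpn)
        self.map[lpn] = self._program(data)
        if old is not None:
            # The old copy is now stale; this sketch simply drops it instead
            # of reclaiming it through block erases as a real drive would.
            del self.store[old]

    def relocate(self, lpn: int) -> None:
        """Move a logical page to a new physical page; the host address is unchanged."""
        self.write(lpn, self.store[self.map[lpn]])

    def read(self, lpn: int) -> bytes:
        return self.store[self.map[lpn]]


ftl = ToyFTL(physical_pages=8)
ftl.write(0, b"hello")
before = ftl.map[0]
ftl.relocate(0)
print(before, ftl.map[0], ftl.read(0), ftl.physical_writes)  # 0 1 b'hello' 2
```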
It's a bit of a chicken-and-egg problem. To ensure the longevity of the device as a whole, write operations need to be distributed evenly across the SSD's NAND blocks. However, some files are updated frequently, such as system files and RAM disk files. If each new version of such a file were allowed to overwrite the same physical NAND blocks the prior version was stored on, it would exhaust those particular cells much more rapidly than the remaining cells on the disk. To combat this, the SSD moves data around the whole disk instead of concentrating it in one spot, averaging out write operations across all of its blocks and increasing its longevity as a whole. The catch is that every relocation is itself a physical write, which is one of the reasons write amplification exceeds 1.
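The effect of spreading writes can be sketched by comparing two placement policies for a single, frequently rewritten file: always reusing the same block versus writing each new version to the least-worn block. This is a simplified model of wear leveling with hypothetical block counts; real controllers also relocate rarely changed data and must pay for those extra writes.

```python
from typing import List


def rewrite_in_place(erase_counts: List[int], rewrites: int) -> List[int]:
    """Naive policy: every new version of the file goes back into block 0."""
    counts = erase_counts.copy()
    for _ in range(rewrites):
        counts[0] += 1
    return counts


def rewrite_least_worn(erase_counts: List[int], rewrites: int) -> List[int]:
    """Wear-leveling policy: each new version goes to the least-erased block."""
    counts = erase_counts.copy()
    for _ in range(rewrites):
        target = counts.index(min(counts))  # ties go to the lowest block number
        counts[target] += 1
    return counts


blocks = [0] * 8
print(max(rewrite_in_place(blocks, 1000)))    # 1000
print(max(rewrite_least_worn(blocks, 1000)))  # 125
```

With eight blocks, the most-worn block under the leveling policy has seen one eighth of the erases it would see under the naive policy, which is the "averaging out" effect described above.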