Introduction to ECC Memory (Error-Correcting Code Memory) [MiniTool Wiki]
Introduction to ECC Memory
What is ECC memory? ECC memory is short for error-correcting code memory. It is a type of computer data storage that can detect and correct the most common kinds of internal data corruption. This post from MiniTool covers how ECC memory behaves, what research says about memory error rates, and its advantages and disadvantages.
ECC memory is used in most computers where data corruption cannot be tolerated under any circumstances, such as those used for scientific or financial computing.
Typically, ECC memory keeps a memory system immune to single-bit errors: even if one of the bits stored in a word has been flipped, the data read from that word is always the same as the data that was written to it.
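To make the single-bit guarantee concrete, here is a minimal sketch of a Hamming(7,4) code, which protects 4 data bits with 3 check bits. This is an illustrative assumption: the text does not say which code a given ECC module uses, and real modules protect much wider words. The function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (an int 0..15) as a 7-bit codeword.

    Returns a list of bits for codeword positions 1..7.
    Positions 1, 2 and 4 hold check bits; 3, 5, 6 and 7 hold data.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 is unused padding
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome) after correcting at most one flipped bit.

    The syndrome is 0 for a clean word, otherwise it is the 1-based
    position of the bit that was flipped.
    """
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1             # flip the faulty bit back
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome


stored = hamming74_encode(0b1011)
stored[4] ^= 1                          # simulate a flip at position 5
value, position = hamming74_decode(stored)
print(f"read back {value:#06b}, corrected position {position}")
```

Whichever single bit is flipped, including a check bit, the decoder returns the value that was originally written. Two flips in the same word are beyond what this small code can repair.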
Some non-ECC memory with parity support can detect errors but cannot correct them, and most non-ECC memory cannot detect errors at all. ECC prevents undetected memory data corruption, and it also reduces the number of crashes.
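The difference between detecting and correcting can be shown with a single even-parity bit per word. The sketch below is a simplified model with hypothetical function names, not a description of any particular memory module.

```python
def parity_bit(word):
    """Even parity: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def store(word):
    """Return the word together with the parity bit saved alongside it."""
    return word, parity_bit(word)


def check(word, parity):
    """True if the word still matches its stored parity bit."""
    return parity_bit(word) == parity


word, parity = store(0b10110010)

one_flip = word ^ 0b00000100      # a single bit changed in memory
two_flips = word ^ 0b00000110     # two bits changed in memory

print("clean word passes:", check(word, parity))
print("single flip passes:", check(one_flip, parity))
print("double flip passes:", check(two_flips, parity))
```

A failed check says only that an odd number of bits changed. It does not say which bit, so the word cannot be repaired, and an even number of flips passes the check unnoticed.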
Research on ECC Memory
Work published between 2007 and 2009 reported widely varying error rates, spanning more than 7 orders of magnitude, from 10⁻¹⁰ error/bit·h (about 1 bit error per hour per gigabyte of memory) to 10⁻¹⁷ error/bit·h (about 1 bit error per millennium per gigabyte of memory).
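The per-gigabyte figures in parentheses follow from the per-bit rates by simple multiplication. The sketch below reproduces that arithmetic, assuming 1 GB means 2^30 bytes and a year of 365.25 days; the function names are made up for illustration.

```python
BITS_PER_GB = 8 * 2**30            # assumption: 1 GB = 2**30 bytes
HOURS_PER_YEAR = 24 * 365.25       # assumption: average year length


def errors_per_gb_hour(rate_per_bit_hour):
    """Expected bit errors per hour in one gigabyte of memory."""
    return rate_per_bit_hour * BITS_PER_GB


def years_between_errors_per_gb(rate_per_bit_hour):
    """Average number of years between bit errors in one gigabyte."""
    return 1 / (errors_per_gb_hour(rate_per_bit_hour) * HOURS_PER_YEAR)


high = errors_per_gb_hour(1e-10)
low = years_between_errors_per_gb(1e-17)
print(f"1e-10 error/bit·h -> {high:.2f} errors per GB per hour")
print(f"1e-17 error/bit·h -> one error per GB every {low:.0f} years")
```

The high rate works out to a little under one error per hour per gigabyte, and the low rate to one error roughly every thirteen centuries, consistent with the rounded figures quoted in the text.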
A large-scale study based on Google's very large number of servers was presented at the SIGMETRICS/Performance '09 conference. The actual error rate found was several orders of magnitude higher than in previous small-scale or laboratory studies, with between 25,000 (about 2.5 × 10⁻¹¹ error/bit·h) and 70,000 (about 7 × 10⁻¹¹ error/bit·h, or 5 bit errors per 8 GB of RAM per hour) errors per billion device hours per megabit. More than 8% of DIMM memory modules were affected by errors each year.
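The unit "errors per billion device hours per megabit" can be converted to the error/bit·h unit used elsewhere in this section. The sketch below does that conversion, assuming a megabit is 10^6 bits and a gigabyte is 2^30 bytes; the helper names are hypothetical.

```python
import math

BITS_PER_MEGABIT = 1e6             # assumption: decimal megabit
BITS_PER_GB = 8 * 2**30            # assumption: 1 GB = 2**30 bytes


def per_bit_hour(errors_per_billion_hours_per_mbit):
    """Convert errors per 10^9 device hours per megabit to error/bit·h."""
    return errors_per_billion_hours_per_mbit / 1e9 / BITS_PER_MEGABIT


def errors_per_hour(errors_per_billion_hours_per_mbit, gigabytes):
    """Expected bit errors per hour for a given amount of RAM."""
    rate = per_bit_hour(errors_per_billion_hours_per_mbit)
    return rate * BITS_PER_GB * gigabytes


low_rate = per_bit_hour(25_000)
high_rate = per_bit_hour(70_000)
print(f"25,000 -> {low_rate:.1e} error/bit·h")
print(f"70,000 -> {high_rate:.1e} error/bit·h")
print(f"70,000 on 8 GB -> {errors_per_hour(70_000, 8):.1f} errors per hour")
```

The upper figure comes to just under 5 bit errors per hour for 8 GB of RAM, which matches the rounded value quoted above.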
The consequences of a memory error depend on the system. In systems without ECC memory, errors can cause crashes or data corruption. In large-scale production sites, memory errors are one of the most common hardware causes of machine crashes.
Memory errors can lead to security vulnerabilities. A memory error has no consequences only if the bit it changes neither causes an observable failure nor affects data that is used in calculations or saved.
A simulation study in 2010 showed that, for a web browser, only a small percentage of memory errors caused data corruption. However, because many memory errors are intermittent and correlated, the impact of memory errors was greater than would be expected for independent soft errors.
Some tests have concluded that the isolation of DRAM memory cells can be circumvented by unintended side effects of specially crafted accesses to adjacent cells. Because of the high cell density in modern memory, accessing data stored in DRAM causes memory cells to leak charge and interact electrically, which can change the contents of nearby memory rows that were not actually addressed in the original memory access.
This effect is known as row hammer, and it has been used in some privilege-escalation computer security exploits.
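The mechanism can be illustrated with a deliberately simplified toy model. Everything in the sketch below is hypothetical: the class, the `flip_threshold` value, and the rule that a victim row flips one bit after a fixed number of neighbouring accesses are stand-ins for physical behaviour that varies by chip, and no real measurements are implied.

```python
class ToyDram:
    """Toy model of row disturbance. NOT a physical simulation."""

    def __init__(self, rows, flip_threshold):
        self.data = [0xFF] * rows              # one small word per row
        self.disturb = [0] * rows              # disturbance since last refresh
        self.flip_threshold = flip_threshold   # hypothetical value

    def activate(self, row):
        """Read a row; its physical neighbours accumulate disturbance."""
        for victim in (row - 1, row + 1):
            if 0 <= victim < len(self.data):
                self.disturb[victim] += 1
                if self.disturb[victim] == self.flip_threshold:
                    self.data[victim] ^= 1     # a bit in the victim row flips
        self.disturb[row] = 0                  # activation recharges the row itself
        return self.data[row]

    def refresh(self):
        """Periodic refresh recharges every row."""
        self.disturb = [0] * len(self.data)


def hammer(dram, aggressors, rounds, refresh_every=None):
    """Repeatedly activate the aggressor rows, optionally refreshing."""
    for i in range(rounds):
        for row in aggressors:
            dram.activate(row)
        if refresh_every and (i + 1) % refresh_every == 0:
            dram.refresh()


dram = ToyDram(rows=5, flip_threshold=1000)
hammer(dram, aggressors=(1, 3), rounds=500)
print("row 2 after hammering rows 1 and 3:", hex(dram.data[2]))
```

Row 2 is never addressed in this run, yet its contents change because both of its neighbours are accessed repeatedly. In the model, refreshing often enough keeps the accumulated disturbance below the threshold and the flip does not occur.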
Advantages and Disadvantages of ECC Memory
There is a trade-off between protection against unusual data loss and higher cost.
ECC memory usually costs more than non-ECC memory, because ECC memory modules require additional hardware and because ECC memory and the related system hardware are produced in lower volumes. Motherboards, chipsets, and processors that support ECC may also be more expensive.
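The additional hardware is mostly extra storage for check bits. As a rough sketch, assuming a Hamming-style code with one extra overall parity bit (single-error correction, double-error detection), the number of check bits for a given word width can be computed as follows. The text does not say which code a particular module uses, so treat the scheme and the function name as assumptions.

```python
def secded_check_bits(data_bits):
    """Check bits needed to correct 1 error and detect 2 in a data word.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1;
    one more overall parity bit adds double-error detection.
    """
    r = 0
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1


for width in (8, 16, 32, 64):
    extra = secded_check_bits(width)
    overhead = 100 * extra / width
    print(f"{width}-bit word: {extra} check bits, {overhead:.1f}% extra storage")
```

For a 64-bit word this gives 8 check bits, which is why ECC modules are commonly 72 bits wide rather than 64. The relative overhead shrinks as the protected word gets wider.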
ECC support varies among motherboard manufacturers, so ECC memory may not be recognized at all by an ECC-incompatible motherboard. Most motherboards and processors for less critical applications are not designed to support ECC, which keeps their prices lower.
Some boards and processors that support ECC can use unbuffered (unregistered) ECC memory but also work with non-ECC memory; if ECC RAM is installed, the system firmware enables ECC.
ECC may reduce memory performance by 2-3% on some systems, depending on the application and implementation, because the ECC memory controller requires extra time to perform error checking.
However, modern systems integrate ECC checking into the CPU, so as long as no errors are detected, memory access incurs no additional latency. ECC memory may also consume additional power because of its error-correcting circuitry.
This post has focused on ECC memory: what it is, what research shows about memory error rates, and its advantages and disadvantages.