For various reasons I need to better understand how modern hardware actually works, especially NUMA systems. The first stop on my tour is memory, specifically all the gory details of DDR3 memory (I’ll start worrying about DDR4 when I can buy it).
This is a quick and dirty summary I can check when I get confused. It’s not intended to be a full reference. I found this article to be a good overview of memory history and terminology.
DIMM (Dual In-line Memory Module) A memory module with a 64-bit data path, so each transfer delivers 64 bits of data.
MT/s (Megatransfers per second) A measure of how many million transfers happen each second. To translate this into something more directly useful like MB/s one has to know how much data moves in a single transfer. For DIMMs a single transfer (ignoring dual/triple channels) is 64 bits of data. See the next entry for an example of using MT/s.
DDR3 SDRAM (Double Data Rate type three Synchronous Dynamic Random Access Memory) The third generation of the DDR standard. Double Data Rate refers to the fact that the DDR family transfers data on both the rising and falling edges of each clock cycle. To find the peak data rate for a single DIMM, multiply the clock in MHz by 2 (or just start from the MT/s figure), multiply that by the bus width (64 bits) and divide by 8 to get bytes. For example 800 MHz, aka 1600 MT/s, * 64 / 8 = 12800 MB/s = 12.8 GB/s (or about 12.5 GB/s if one divides by 1024 instead of 1000).
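The arithmetic in the DDR3 entry can be written out as a minimal sketch. The function names are my own, and the 64-bit bus width is the single-DIMM figure used in this post; it gives a theoretical peak, not a measured throughput.

```python
BUS_WIDTH_BITS = 64  # data width of one DIMM, as described in this post


def megatransfers(clock_mhz: int) -> int:
    """DDR moves data twice per clock cycle, so MT/s = 2 * MHz."""
    return clock_mhz * 2


def peak_mb_per_s(mt_per_s: int, bus_width_bits: int = BUS_WIDTH_BITS) -> int:
    """Peak rate in MB/s: transfers per second times bytes per transfer."""
    return mt_per_s * bus_width_bits // 8


mt = megatransfers(800)
print(mt, peak_mb_per_s(mt))  # 1600 12800
```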
x4, x8 and x16 Refers to how many bits each physical chip on a DIMM can transfer. So if a DIMM is x4 then to transfer 64 bits one needs 16 chips. ECC DIMMs (discussed later) require more chips to provide the extra 8 bits of ECC information. Most systems get really unhappy if you try to mix x4/x8/x16 DIMMs, even in different banks.
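As a rough sketch of the chip-count arithmetic, assuming a 64-bit data path plus 8 extra bits for ECC (the function name is hypothetical, and rounding up for widths that do not divide evenly is my own simplification):

```python
def chips_per_rank(chip_width_bits: int, ecc: bool = False) -> int:
    """Chips needed to fill one 64-bit (or 72-bit with ECC) rank."""
    bus_bits = 64 + (8 if ecc else 0)
    # Ceiling division: a partially used chip still has to be present.
    return -(-bus_bits // chip_width_bits)


print(chips_per_rank(4))            # 16
print(chips_per_rank(4, ecc=True))  # 18
print(chips_per_rank(8, ecc=True))  # 9
```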
bank All DIMMs which are connected to the same memory controller are talked about as being in the same bank. Confusingly, the term bank can also refer to matched DIMMs in dual or triple channel configurations as discussed below. In this article I use bank just to mean DIMMs that are connected to the same memory controller. In NUMA systems it’s common to have multiple banks.
channel There are unfortunately two different definitions for channel in memory parlance. One refers to the connection between a bank of DIMMs and their CPU. The other refers to the connection between the memory controller (which is what the CPU talks to over its channel; note that in modern CPUs the memory controller is part of the CPU itself) and the DIMMs. We are discussing the latter here. In a single channel system there are 64 data wires running between a memory bank and the memory controller, so if there are, say, 3 DIMMs in the bank then only one of them can talk to the memory controller at a time. In a dual channel system there are 64*2 = 128 data wires, so the memory controller can transfer 128 bits at a time, which doubles the peak memory rate. Note that DIMMs are still 64 bits wide, so you need two DIMMs to take advantage of a dual channel architecture, and the DIMMs must be identical. Server systems now also support triple channel with 64*3 = 192 wires or 192 bits; just make sure to have three matched DIMMs in each bank. The bottom line is that dual channel doubles the peak memory bandwidth and triple channel triples it. Here is a good article on channels.
ranks A single ranked (SR) DIMM has just enough chips to produce a single 64-bit value. A double ranked (DR) DIMM has enough chips to produce two 64-bit values, so chip select signalling is used to tell the DIMM which one of its ranks to use. And yes, a quadruple ranked (QR) DIMM can produce four 64-bit values and needs correspondingly more chip select signalling to choose which of the four ranks responds. Ranks can be a cheap way to build big DIMMs: one just takes a bunch of smaller chips (say, enough for an 8 GB DIMM), puts them in two ranks, and voila, a 16 GB DIMM. The reason ranks are especially important is that most memory controllers can only handle so many chip select signals, a limit that is expressed in ranks. Let’s say a memory controller can only handle 8 ranks and it has 3 DIMMs in a bank. If one buys three DIMMs that are each quadruple ranked then one has 12 ranks, and the system probably won’t boot or at least will not work correctly.
ECC (Error-Correcting Code) This is a Hamming code style function that lets one correct single-bit errors in words. The idea is that an 8-bit check value is computed from every 64-bit value and stored alongside it. If, say, a cosmic ray causes a bit to flip then the ECC logic will detect and fix the problem automatically. ECC can also detect (but not fix) 2-bit errors. ECC DIMMs work by adding 8 bits to the data path (72 bits instead of 64) and extra chips to hold them. Any of the three types of DIMMs described below can have ECC added to it.
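To make the channel arithmetic concrete, here is a small sketch that scales the 64-bit figure by the number of channels. The function names are made up for illustration, and the results are theoretical peaks that assume every channel is populated with a matched DIMM.

```python
def channel_width_bits(channels: int) -> int:
    """Each channel contributes its own 64 data wires."""
    return 64 * channels


def channel_peak_mb_per_s(mt_per_s: int, channels: int) -> int:
    """Peak MB/s across all channels at a given transfer rate."""
    return mt_per_s * channel_width_bits(channels) // 8


for n in (1, 2, 3):
    print(n, channel_width_bits(n), channel_peak_mb_per_s(1600, n))
# 1 64 12800
# 2 128 25600
# 3 192 38400
```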
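The rank budget check is simple enough to sketch. The limit of 8 ranks is the hypothetical controller from the example in this entry, and the names below are my own.

```python
RANKS_PER_DIMM = {"SR": 1, "DR": 2, "QR": 4}


def total_ranks(dimms: list) -> int:
    """Sum the ranks of every DIMM in one bank."""
    return sum(RANKS_PER_DIMM[kind] for kind in dimms)


def fits(dimms: list, max_ranks: int) -> bool:
    """True if the bank stays within the controller's rank limit."""
    return total_ranks(dimms) <= max_ranks


print(total_ranks(["QR", "QR", "QR"]), fits(["QR", "QR", "QR"], 8))  # 12 False
print(total_ranks(["QR", "QR"]), fits(["QR", "QR"], 8))              # 8 True
print(total_ranks(["DR", "DR", "DR"]), fits(["DR", "DR", "DR"], 8))  # 6 True
```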
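To show how 8 extra bits can correct one flipped bit and detect two, here is a sketch of a textbook extended Hamming code over 64 data bits (7 check bits plus one overall parity bit, 72 bits total). This is an illustration of the idea only; the bit layout is my assumption, and real memory controllers may well use a different code with the same properties.

```python
DATA_BITS = 64
CHECK_BITS = 7                 # smallest r with 2**r >= 64 + r + 1
LAST = DATA_BITS + CHECK_BITS  # Hamming positions are numbered 1..71


def is_data_position(pos: int) -> bool:
    """Power-of-two positions hold check bits; all others hold data."""
    return (pos & (pos - 1)) != 0


def encode(data: int) -> list:
    """Return 72 bits: index 0 is overall parity, 1..71 the codeword."""
    code = [0] * (LAST + 1)
    bit = 0
    for pos in range(1, LAST + 1):
        if is_data_position(pos):
            code[pos] = (data >> bit) & 1
            bit += 1
    for p in range(CHECK_BITS):
        mask = 1 << p
        parity = 0
        for pos in range(1, LAST + 1):
            if pos & mask:
                parity ^= code[pos]
        code[mask] = parity        # makes each parity group even
    code[0] = sum(code[1:]) % 2    # makes the whole word even
    return code


def decode(code: list):
    """Return (data, status); data is None if the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, LAST + 1):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome <= LAST:
        code[syndrome] ^= 1        # single flip: syndrome is its position
        status = "corrected"
    else:
        return None, "uncorrectable"  # even parity with nonzero syndrome
    data = 0
    bit = 0
    for pos in range(1, LAST + 1):
        if is_data_position(pos):
            data |= code[pos] << bit
            bit += 1
    return data, status


word = 0xDEADBEEFCAFEF00D
stored = encode(word)
stored[37] ^= 1                    # simulate a cosmic ray
print(decode(stored) == (word, "corrected"))  # True
```

Flipping two bits instead of one leaves the overall parity even while the syndrome is nonzero, which is how the decoder knows it has an error it cannot fix.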
UDIMMs (Unbuffered/Unregistered DIMMs) These are DIMMs that do not use registered or load reduced features. Just plain ole DIMMs. Note, as mentioned above, that ECC can be added to UDIMMs.
RDIMMs (Registered DIMMs) These are DIMMs that use a register for the control and address parts of the memory pins but not the data pins. The purpose of the register is to reduce the electrical/capacitive load on the pins by isolating them from the rest of the DIMM. This allows more memory to be packed together without causing interference issues. RDIMMs can be slower than UDIMMs because moving things in and out of the register adds a clock cycle, but in practice this doesn’t seem to be a big deal.
LRDIMMs (Load-Reduced DIMMs) These are DIMMs that put a full buffer before all the pins on the DIMM (as opposed to RDIMMs, which just put a register before the control and address parts but not the data). This allows even more memory to be packed in by reducing interference issues. See here for more details.
UDIMMs vs RDIMMs vs LRDIMMs I was randomly looking at the specs for the Intel Server Board 2600JF and wanted to see what was the biggest DIMM I could put in at the fastest speed for that DIMM type. The results are in the table below. So what DIMM type one picks has definite consequences. Also keep in mind that the board can only handle 8 ranks per 3 DIMM bank, so if we bought, say, a 32GB QRx4 LRDIMM we could only fill two slots, whereas with the 32GB DRx8 RDIMM we could fill all three slots. On the other hand the LRDIMM can run at 1.35V and the RDIMM at 1.5V.
| Memory type | Rank/# of chips | Maximum memory per DIMM | Maximum MT/s for that amount of memory | Minimum voltage at that speed |
Memory modes There are tons of memory modes that usually reduce available memory in order to increase resiliency. Examples range from holding a DIMM in reserve, to pairing DIMMs and duplicating all data in both, to more advanced Lockstep memory modes which essentially use Hamming codes across all DIMMs at once at the cost of cutting the system’s available memory to about a third of normal.
SDRAM latency There are four measures of how long it takes an SDRAM to do its job (usually quoted as CL, tRCD, tRP and tRAS, each counted in clock cycles). In theory different SDRAMs can have higher or lower latencies. The relevant Wikipedia article was short and readable so I just scan it when I need to remember the details.
Data burst length A burst read is one in which a request is made for an address location and that location plus N-1 other locations are read. A burst write involves writing to a location and then following with N-1 other locations. For DDR3 N is normally 8 (a “burst chop” mode can cut it to 4); in other words, a full burst read or burst write will read/write 64 bytes (64 bits * 8) of data. This is effectively pipelining: since the bus is just 64 bits wide, 64 bits get sent and then immediately another 64 bits and so on until all eight 64-bit blocks are sent. The advantage of bursting is that normally one has to wait multiple clock cycles (see previous entry) before another read/write can be done. A burst lets one keep the data flowing on every transfer slot, so things are faster. DDR3 supports two burst patterns. The obvious one is sequential: one writes to location A and then bursts the next 7 blocks to A+1, A+2, ... A+7. The other pattern is interleaved, whose purpose I don’t grok.
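Since latencies are quoted in clock cycles, it can help to convert them to nanoseconds when comparing DIMMs that run at different speeds. A minimal sketch follows; the function name is mine, and the 9-cycle figure is just an illustrative value, not one taken from any particular DIMM.

```python
def cycles_to_ns(cycles: int, mt_per_s: int) -> float:
    """Convert a latency in clock cycles to nanoseconds.

    The clock runs at half the MT/s figure because DDR transfers twice
    per cycle, and one cycle at f MHz lasts 1000 / f nanoseconds.
    """
    clock_mhz = mt_per_s / 2
    return cycles * 1000 / clock_mhz


print(cycles_to_ns(9, 1600))  # 11.25
```

The same cycle count at a slower clock means a longer wait in real time, which is why a higher cycle number on a faster DIMM is not automatically worse.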
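A quick sketch of the burst numbers, assuming a burst length of 8 on a 64-bit bus. The function names are hypothetical. The interleaved ordering shown is the usual description (starting offset XORed with a counter); treat that as my assumption rather than something taken from the spec text, and note that it only shows the order, not why one would want it.

```python
BUS_WIDTH_BITS = 64
BURST_LENGTH = 8


def burst_bytes() -> int:
    """Bytes moved by one full burst."""
    return BUS_WIDTH_BITS * BURST_LENGTH // 8


def burst_ns(mt_per_s: int) -> float:
    """Time on the data bus for one burst, in nanoseconds."""
    return BURST_LENGTH * 1000 / mt_per_s


def interleaved_order(start: int) -> list:
    """Offsets within the 8-block burst, visited as start XOR counter."""
    return [start ^ i for i in range(BURST_LENGTH)]


print(burst_bytes())         # 64
print(burst_ns(1600))        # 5.0
print(interleaved_order(1))  # [1, 0, 3, 2, 5, 4, 7, 6]
```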