Implementation Options

From PEL Wiki


Ideally, the final implementation of DiskRAM would use a memory module standard such as FB-DIMM, DDR, DDR2, or DDR3. Unfortunately, none of the current standards can handle the latency of a hard drive. I guess FB-DIMM might be able to in future revisions, but it's pretty dumb right now: it waits for the worst-case timing on every access instead of letting the data go as soon as it arrives.
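To put rough numbers on the worst-case timing problem, here is a small sketch. It assumes a hypothetical mix of 99 DRAM hits at 100 ns and one disk access at 8 ms; these figures and the function name are made up for illustration and are not measurements.

```python
def mean_latency_ns(latencies_ns, pad_to_worst_case):
    """Mean access latency in nanoseconds.

    pad_to_worst_case=True models an interface that always waits for the
    slowest possible access; False models data returned on arrival.
    """
    if pad_to_worst_case:
        return float(max(latencies_ns))
    return sum(latencies_ns) / len(latencies_ns)


# Hypothetical access mix: 99 DRAM hits (100 ns) and 1 disk access (8 ms).
accesses = [100] * 99 + [8_000_000]

padded = mean_latency_ns(accesses, pad_to_worst_case=True)
on_arrival = mean_latency_ns(accesses, pad_to_worst_case=False)
print(f"padded: {padded:.0f} ns, on arrival: {on_arrival:.0f} ns")
```

Under these assumptions every access pays the disk latency when padded, while returning data on arrival keeps the mean about two orders of magnitude lower.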

Since I can't do a memory module implementation, I'll do a memory controller implementation. (I'm not sure that a memory controller is the best way, but it is the most portable.)

That limits me to using an open spec. Intel Pentium Pro was open, but I've been there, and it wasn't pretty. The newer Intel processors use proprietary interfaces to their memory controllers.

That means that I have a few other options:

  1. Use an AMD Opteron DP system with one of the processors acting as a paging device
    1. What happens when the processor gets an unmapped physical address from the other?
    2. How slow would it be?
  2. Use an AMD Opteron DP system and replace the second processor with one of the following (note: this adds latency to every memory request):
    1. A MIPS processor chip from Broadcom or PMC-Sierra
      1. Integrated HyperTransport and DDR interface
      2. Easily interface to a hard drive
      3. What happens with large addresses that arrive on the HyperTransport bus?
    2. An FPGA implementation, which would include:
      1. A HyperTransport interface
      2. A DDR interface
      3. Mapping tables
      4. A Hard Drive interface
    3. An FPGA and a processor
  3. Use a PCI-Express add-in card
    1. FPGA-based
      1. Use with a PCI-Express SATA card
      2. Buy a SATA core
      3. Use a USB 2.0 drive (see Drive Speeds)
    2. PowerQUICC III or XScale
  4. Use another processor and say goodbye to x86 ugliness
    1. What applications for benchmarks?
    2. What processor?
      1. PowerPC 970 (G5)
        1. Have to implement Elastic I/O controller
      2. MIPS - floating point support?
      3. Alpha?
      4. I don't know!
  5. Software fudging
    1. Is there some way to use a sampling technique for performance?
    2. What if we increase the frequency of the clock algorithm and subtract the extra time? Can we use normal paging at high frequency and get any reasonable performance numbers?
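For reference, here is a minimal sketch of the clock (second-chance) replacement policy, assuming that "clock algorithm" in the software fudging option means the usual page-replacement policy. The class and attribute names are hypothetical, and the dictionary stands in for the mapping tables; a real paging system tracks far more state than this.

```python
class ClockCache:
    """Fixed number of frames managed by the clock (second-chance) policy."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames  # page held by each frame
        self.ref = [False] * num_frames    # reference bit per frame
        self.where = {}                    # page -> frame index (mapping table)
        self.hand = 0                      # current clock hand position
        self.faults = 0                    # number of page faults so far

    def access(self, page):
        """Touch a page; return True on a hit, False on a fault."""
        if page in self.where:
            self.ref[self.where[page]] = True
            return True

        self.faults += 1
        # Sweep the hand, clearing reference bits, until a frame whose
        # bit is already clear is found. That frame is the victim.
        while self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)

        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]

        self.frames[self.hand] = page
        self.ref[self.hand] = True
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
        return False


cache = ClockCache(3)
for page in [1, 2, 3, 1, 4, 2]:
    cache.access(page)
print("faults:", cache.faults)
```

Counting faults this way is the kind of number that would have to be scaled by a per-fault cost before any time could be subtracted; how to pick that cost is the open question above.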

A semi-random thought: "Can you do better performance approximation if you allow the queue to fill before satisfying any requests? Especially if they come fast. Then can you subtract the latency more easily?" Probably not.

The HyperTransport link is configurable enough that I can play with outstanding transactions by lowering buffer depths, etc.
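As a rough illustration of why the number of outstanding transactions matters: by Little's law, sustained throughput is bounded by the number of requests in flight divided by the latency of each one. The sketch below assumes 64-byte requests and an 8 ms access time; both numbers and the function name are hypothetical and are not taken from the HyperTransport specification.

```python
def max_read_bandwidth(outstanding, bytes_per_request, latency_us):
    """Upper bound on read bandwidth in bytes per second (Little's law).

    outstanding:       requests allowed in flight at once
    bytes_per_request: payload size of each request
    latency_us:        time to satisfy one request, in microseconds
    """
    return outstanding * bytes_per_request * 1_000_000 / latency_us


# Hypothetical figures: 64-byte reads, 8 ms (8000 us) per disk access.
for depth in (1, 4, 32):
    bw = max_read_bandwidth(depth, 64, 8000)
    print(f"{depth:2d} outstanding -> {bw:.0f} bytes/s")
```

Lowering the buffer depth lowers `outstanding`, so the bound falls in direct proportion.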

PCI-X won't work because its transactions are not all split, and all our reads need to be able to take longer than the fixed 16 cycles.
