Disk on DiskRAM
From PEL Wiki
Comparison
One of the things I need to do is use the disk as a disk through the FPGA, so that the two systems are comparable. I haven't decided on the best interface to the disk. Should I export a memory-mapped interface and write a Linux driver that handles it, or should I expose an AHCI interface to the FPGA? Right now I'm leaning toward the memory-mapped interface.
Options
- ATA vs. SATA
  - I think that since the command sets are the same, it really doesn't matter
  - Performance is a reason to go with SATA, though: I get 40 MB/s with ATA and 64 MB/s with SATA.
- PIO vs. DMA
  - DMA seems like the way to go
- On-board vs. add-in card?
  - On-board Nvidia CK804 has no open driver supporting Native Command Queuing (NCQ)
    - Nvidia does a lot of binary-only drivers for their "advanced features"
    - The open source Nvidia driver is needlessly complex due to lack of documentation
  - I bought a Silicon Image card that supports NCQ (in case it's important)
    - It is billed as a "nice, open spec"
    - That might be true if you can get the documentation
  - I bought an AHCI-compliant (AHCI is an open Intel spec) JMicron-based board
Simple driver
Maybe the best thing for me to do is to write a device driver that does in software what I would like the FPGA to do, so that I can debug it more easily. One trick will be allowing my driver to map the same regions as the real driver. Will I have to remove the real driver? Will I have to modify it?
I just want to do DMA read and DMA write. The rest is inconsequential.
A better approach is probably to add some simple debugging statements to the existing drivers. Then, when I run dd if=/dev/sda1 of=myfile.hex, I can see which reads and writes are issued and, from those, which structures I need.
It turns out that the Nvidia driver is needlessly complex, due to lack of documentation.
I bought a Rosewill card based on the Silicon Image 3132, on the strength of Jeff Garzik's assertion that it was "a nice open spec". I have yet to see a document on how to program it. I was hoping to get one soon, but it looks like I probably won't, so I also ordered a JMicron 360-based board that implements AHCI (a standard interface from Intel). I already have the spec in hand, so now I'm just waiting for the adapter cable (eSATA to SATA). eSATA is external SATA and supports hot-plug, hence the need for a different cable.
The JMicron board only has one port, but one should be enough.
It turns out that I can't get either JMicron 360 board to work (I bought two to check whether the first one was simply faulty). I guess the problem must be the card, because the cables work with other cards.
I bought a 363 board, which has two internal SATA ports and a PATA port. Unfortunately, the Linux IDE and SATA drivers both claim the card. I spent a day migrating everything to SATA so that I can disable IDE completely. (There are lots of errors from the card on the IDE side, and I don't want to be competing with another driver while I'm controlling the card.)
Now that I'm booting from SATA, I built SATA support into the kernel (but left AHCI support as a module). This keeps me from debugging the same drivers I'm using to access my working drives. Debugging messages added lots of time and clutter to booting.
Driver Status
I was trying to take a driver and write my own version without the levels of indirection. I've decided to simplify the existing driver instead. If I strip it to the bare bones while keeping it functional, that should reduce the effort needed.
At least temporarily, I'm switching back. There is just too much indirection for me to sort through. Now that I have the AHCI spec and an AHCI-compliant board, I'm going to try again.
I've converted the bare bones of the AHCI driver and found a way to allocate contiguous memory from user space. I allocate it, pin it, and check whether it is contiguous. If not, I try again until I succeed, freeing the rejected allocations only once I'm successful. It's a hack, but not too bad. I won't have to worry about it when I control the board from my FPGA.
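The contiguity check can be done from user space through /proc/self/pagemap. That file holds one 64-bit entry per virtual page: bit 63 means the page is present, and bits 0-54 hold the page frame number (PFN). Below is a minimal sketch of the allocate, pin, check, retry hack described above. The function names are hypothetical, not the ones in my program. On recent kernels the PFN field reads as zero unless the process has CAP_SYS_ADMIN, so this has to run as root. mlock() keeps the pages resident, but it is not an absolute guarantee that the kernel will never migrate them.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)          /* pagemap: page is in RAM         */
#define PM_PFN_MASK ((1ULL << 55) - 1)    /* pagemap: bits 0-54 hold the PFN */

/* 1 if pfns[] is one ascending run of page frames, else 0. */
int pfns_contiguous(const uint64_t *pfns, size_t n)
{
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
        if (pfns[i] != pfns[0] + i)
            return 0;
    return 1;
}

/* Fill pfns[] with the frame behind each page of buf; -1 on any failure. */
static int lookup_pfns(const void *buf, size_t npages, uint64_t *pfns)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    for (size_t i = 0; i < npages; i++) {
        uint64_t e;
        uintptr_t va = (uintptr_t)buf + i * psz;
        off_t off = (off_t)(va / psz) * (off_t)sizeof e;
        if (pread(fd, &e, sizeof e, off) != (ssize_t)sizeof e || !(e & PM_PRESENT)) {
            close(fd);
            return -1;
        }
        pfns[i] = e & PM_PFN_MASK;
    }
    close(fd);
    return 0;
}

/* Allocate and pin npages until one buffer is physically contiguous.
 * Rejected buffers stay allocated so the next attempt gets other frames,
 * and are released only at the end. Returns NULL if max_tries runs out. */
void *alloc_contiguous(size_t npages, size_t max_tries, uint64_t *first_pfn)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void **rejects = calloc(max_tries ? max_tries : 1, sizeof *rejects);
    uint64_t *pfns = calloc(npages ? npages : 1, sizeof *pfns);
    void *found = NULL;
    size_t nrej = 0;

    while (rejects && pfns && !found && nrej < max_tries) {
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            break;
        memset(buf, 0, len);                      /* fault every page in */
        if (mlock(buf, len) == 0 &&               /* pin it */
            lookup_pfns(buf, npages, pfns) == 0 &&
            pfns_contiguous(pfns, npages) && pfns[0] != 0) {
            found = buf;
            *first_pfn = pfns[0];
        } else {
            rejects[nrej++] = buf;                /* hold it and try again */
        }
    }
    for (size_t i = 0; i < nrej; i++)            /* only now free the rejects */
        munmap(rejects[i], len);
    free(rejects);
    free(pfns);
    return found;
}
```

The physical address to hand to the HBA is then first_pfn multiplied by the page size.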
The driver works for reading and writing! I only use it in DMA mode, reading and writing a page at a time. I haven't bothered with interrupts, because they will not be used: they would interrupt the processor, which is not what I want. My FPGA will see the transfers from the disk directly, so it doesn't need an external interrupt.
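Without interrupts, software has to poll for completion instead. The sketch below uses the following facts from the AHCI register map: PORT_SCR_ACT is at port offset 0x34 (PxSACT), and PORT_CMD_ISSUE is at 0x38 (PxCI). An NCQ command is finished once its tag bit clears in both registers. PORT_IRQ_STAT (0x10) still latches status bits when interrupts are disabled, and bit 30 there flags a task-file error. The function name and the polling budget are hypothetical. A page-sized transfer with 512-byte sectors is 8 blocks.

```c
#include <stdint.h>

#define PORT_IRQ_STAT    0x10          /* PxIS   */
#define PORT_SCR_ACT     0x34          /* PxSACT */
#define PORT_CMD_ISSUE   0x38          /* PxCI   */
#define PORT_IRQ_TF_ERR  (1u << 30)    /* PxIS task-file error status */

#define SECTOR_SIZE  512u
#define PAGE_BLOCKS  (4096u / SECTOR_SIZE)   /* one 4 KiB page = 8 blocks */

/* 32-bit read of a port register; port points at the port's block. */
static inline uint32_t rd(volatile uint32_t *port, uint32_t off)
{
    return port[off / 4];
}

/* Spin until the NCQ command in slot `tag` completes.
 * Returns 0 when done, -1 on a task-file error, -2 if max_polls runs out. */
int ahci_poll_tag(volatile uint32_t *port, unsigned tag, unsigned long max_polls)
{
    uint32_t bit = 1u << tag;
    for (unsigned long i = 0; i < max_polls; i++) {
        if (rd(port, PORT_IRQ_STAT) & PORT_IRQ_TF_ERR)
            return -1;
        if (!((rd(port, PORT_SCR_ACT) | rd(port, PORT_CMD_ISSUE)) & bit))
            return 0;
    }
    return -2;
}
```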
I implemented it in a driver and a user-level program. Here are their responsibilities:
- (The AHCI spec names things differently than the ahci driver does. I've chosen the driver's names because they are a little more verbose, and therefore more readable.)
- HOST_* addresses are relative to the base offset of device memory (BAR 5)
- PORT_* addresses are relative to the port address (base + 0x100 + port_num*0x80)
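With these two addressing rules, the user-level program needs only a pointer to BAR 5 and a little offset arithmetic. A minimal sketch follows, using the register offsets listed on this page. The helper names (port_offset, mmio_read, mmio_write) are hypothetical. One way to get the bar5 pointer from user space is to mmap the device's sysfs resource5 file; that is an assumption, not something described above.

```c
#include <stdint.h>

/* HOST_* offsets are relative to the start of BAR 5. */
#define HOST_CAP       0x00
#define HOST_CTL       0x04
/* PORT_* offsets are relative to the port's own register block. */
#define PORT_IRQ_STAT  0x10
#define PORT_CMD       0x18
#define PORT_SCR_STAT  0x28
#define PORT_SCR_ERR   0x30

#define PORT_BLOCK_BASE    0x100
#define PORT_BLOCK_STRIDE  0x80

/* BAR-5-relative offset of register `reg` in port `port`. */
static inline uint32_t port_offset(unsigned port, uint32_t reg)
{
    return PORT_BLOCK_BASE + port * PORT_BLOCK_STRIDE + reg;
}

/* 32-bit register accessors; bar5 is wherever BAR 5 is mapped. */
static inline uint32_t mmio_read(volatile uint8_t *bar5, uint32_t off)
{
    return *(volatile uint32_t *)(bar5 + off);
}

static inline void mmio_write(volatile uint8_t *bar5, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(bar5 + off) = val;
}
```

For example, "read and clear PORT_SCR_ERR by writing it back" on port p becomes `mmio_write(bar5, port_offset(p, PORT_SCR_ERR), mmio_read(bar5, port_offset(p, PORT_SCR_ERR)));`.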
Driver Responsibilities minus Linux bookkeeping
- Initialize PCI space
  - Write Cfg 0x41 with 0xa1 (specific to this card)
  - Enable PCI device
  - Set device as bus master
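These PCI steps can also be approximated from user space by writing the device's config space directly. That is handy for checking what the BIOS will eventually have to do. A minimal sketch follows, assuming fd is an open, writable config-space file (for example the device's sysfs config entry, which needs root). The function name pci_init_cfg is hypothetical. It sets only the Memory Space and Bus Master bits of the PCI command register (offset 0x04). That is a subset of what the kernel's pci_enable_device() and pci_set_master() do; they also handle power state and resource setup.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>

#define PCI_COMMAND         0x04
#define PCI_COMMAND_MEMORY  0x2    /* decode memory-space accesses (BAR 5) */
#define PCI_COMMAND_MASTER  0x4    /* allow the device to DMA              */
#define CARD_QUIRK_REG      0x41   /* card-specific register, see above    */
#define CARD_QUIRK_VAL      0xa1

/* Returns 0 on success, -1 on any short read or write. */
int pci_init_cfg(int fd)
{
    uint8_t quirk = CARD_QUIRK_VAL;
    uint8_t cmd[2];
    uint16_t val;

    if (pwrite(fd, &quirk, 1, CARD_QUIRK_REG) != 1)
        return -1;
    if (pread(fd, cmd, 2, PCI_COMMAND) != 2)
        return -1;
    val = (uint16_t)(cmd[0] | (cmd[1] << 8));     /* config space is little-endian */
    val |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
    cmd[0] = (uint8_t)(val & 0xff);
    cmd[1] = (uint8_t)(val >> 8);
    return pwrite(fd, cmd, 2, PCI_COMMAND) == 2 ? 0 : -1;
}
```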
User-Level Code
- Allocate memory for command lists
- Reset and initialize AHCI
  - Read HOST_CTL (0x4), OR in HOST_RESET (bit 0), then write it back to HOST_CTL
  - Wait 1 second
  - Read HOST_CTL and make sure that HOST_RESET was cleared
  - Write HOST_CTL to set HOST_AHCI_EN (bit 31)
  - Write 0xf to HOST_PORTS_IMPL
  - Read HOST_CAP (0x0): (val & 0x1f) + 1 = number of ports
  - For each port
    - Read PORT_CMD (0x18) and clear PORT_CMD_LIST_ON (15), PORT_CMD_FIS_ON (14), PORT_CMD_FIS_RX (4), and PORT_CMD_START (0) if necessary
    - Sleep 500 ms if any bits changed
    - Write PORT_CMD_SPIN_UP (bit 1) to PORT_CMD (0x18)
    - Check PORT_SCR_STAT (0x28) to make sure that the drive is there (the lower 4 bits should be 0x3)
    - Read PORT_SCR_ERR (0x30) and clear it by writing the value back
    - Read PORT_IRQ_STAT (0x10) and clear it by writing the value back
    - Read PORT_CMD (0x18) and clear PORT_CMD_LIST_ON (15), PORT_CMD_FIS_ON (14), PORT_CMD_FIS_RX (4), and PORT_CMD_START (0) if necessary
  - Enable IRQs on the HOST
- Enable each port with a drive
  - Allocate and zero memory
  - Set PORT_LST_ADDR, PORT_LST_ADDR_HI, PORT_FIS_ADDR, and PORT_FIS_ADDR_HI to point to the memory region you allocated
  - Set PORT_CMD_ICC_ACTIVE | PORT_CMD_FIS_RX | PORT_CMD_POWER_ON | PORT_CMD_SPIN_UP | PORT_CMD_START in PORT_CMD
- Read or write something
  - Find an idle command slot in PORT_SCR_ACT; the number of that bit is the TAG value used below
  - Fill the fields of the command table at base_addr + tag * AHCI_CMD_TBL_SZ
    - Byte 0 = 0x27, 1 = 1<<7, 2 = ATA_CMD_FPDMA_{READ,WRITE}, 3 = num_blocks & 0xff
    - Byte 4 = lbal, 5 = lbam, 6 = lbah, 7 = 1<<6
    - Byte 8 = lbal_hi, 9 = lbam_hi, 10 = lbah_hi, 11 = num_blocks >> 8
    - Byte 12 = tag<<3, 13 = 0, 14 = 0, 15 = 0
    - Bytes 16-19 = 0
  - Fill the scatter-gather list at the command table address + AHCI_CMD_TABLE_HDR_SZ
    - addr = lower 32 bits of the buffer address
    - addr_hi = upper 32 bits of the buffer address
    - flags_size = size in bytes of the transfer - 1
  - Fill the command slot information
    - opts (word 0) = cmd_fis_len (5) | (number of scatter-gather entries) << 16; for a write, also OR in AHCI_CMD_WRITE (bit 6)
    - status (word 1) = 0
    - tbl_addr (word 2) = lower 32 bits of cmd_table + tag * AHCI_CMD_TBL_SZ
    - tbl_addr_hi (word 3) = upper 32 bits of cmd_table + tag * AHCI_CMD_TBL_SZ
  - Issue the command
    - Write 1 << tag to PORT_SCR_ACT
    - Write 1 << tag to PORT_CMD_ISSUE
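The "read or write" steps are mostly byte packing, which makes them easy to get wrong. The sketch below models the three structures in C, assuming one scatter-gather (PRD) entry per command. AHCI structures are little-endian, so 32-bit fields are stored byte by byte. The function names are hypothetical. 0x60 and 0x61 are the ATA READ/WRITE FPDMA QUEUED opcodes, and the PRD table starts 0x80 bytes into a command table. For writes the sketch sets AHCI_CMD_WRITE (bit 6 of opts), as the ahci driver does.

```c
#include <stdint.h>
#include <string.h>

#define ATA_CMD_FPDMA_READ   0x60
#define ATA_CMD_FPDMA_WRITE  0x61
#define FIS_TYPE_REG_H2D     0x27
#define CMD_TBL_HDR_SZ       0x80   /* PRD table offset inside a command table */
#define CMD_FIS_LEN          5      /* 20-byte H2D FIS = 5 dwords */
#define AHCI_CMD_WRITE       (1u << 6)

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Command FIS at the start of this slot's command table. */
void build_fpdma_fis(uint8_t *tbl, int write, uint64_t lba,
                     uint16_t nblocks, unsigned tag)
{
    memset(tbl, 0, 20);                    /* also zeroes bytes 13-19 */
    tbl[0]  = FIS_TYPE_REG_H2D;
    tbl[1]  = 1u << 7;                     /* C bit: this FIS carries a command */
    tbl[2]  = write ? ATA_CMD_FPDMA_WRITE : ATA_CMD_FPDMA_READ;
    tbl[3]  = (uint8_t)(nblocks & 0xff);   /* NCQ: block count goes in FEATURES */
    tbl[4]  = (uint8_t)lba;                /* lbal */
    tbl[5]  = (uint8_t)(lba >> 8);         /* lbam */
    tbl[6]  = (uint8_t)(lba >> 16);        /* lbah */
    tbl[7]  = 1u << 6;                     /* device register: LBA mode */
    tbl[8]  = (uint8_t)(lba >> 24);        /* lbal_hi */
    tbl[9]  = (uint8_t)(lba >> 32);        /* lbam_hi */
    tbl[10] = (uint8_t)(lba >> 40);        /* lbah_hi */
    tbl[11] = (uint8_t)(nblocks >> 8);
    tbl[12] = (uint8_t)(tag << 3);         /* NCQ tag lives in COUNT[7:3] */
}

/* Single PRD entry describing one physically contiguous buffer. */
void build_prd(uint8_t *tbl, uint64_t buf_phys, uint32_t nbytes)
{
    uint8_t *sg = tbl + CMD_TBL_HDR_SZ;
    put_le32(sg + 0, (uint32_t)buf_phys);          /* addr */
    put_le32(sg + 4, (uint32_t)(buf_phys >> 32));  /* addr_hi */
    put_le32(sg + 8, 0);                           /* reserved */
    put_le32(sg + 12, nbytes - 1);                 /* flags_size */
}

/* 32-byte command header for `tag` in the command list. */
void build_cmd_header(uint8_t *cmd_list, unsigned tag, int write,
                      unsigned n_sg, uint64_t tbl_phys)
{
    uint8_t *h = cmd_list + tag * 32;
    uint32_t opts = CMD_FIS_LEN | (n_sg << 16) | (write ? AHCI_CMD_WRITE : 0);
    memset(h, 0, 32);
    put_le32(h + 0, opts);
    put_le32(h + 4, 0);                            /* status (PRD byte count) */
    put_le32(h + 8, (uint32_t)tbl_phys);           /* tbl_addr */
    put_le32(h + 12, (uint32_t)(tbl_phys >> 32));  /* tbl_addr_hi */
}
```

After these three calls, the command is issued by writing 1 << tag to PORT_SCR_ACT and then to PORT_CMD_ISSUE, as in the list.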
TODO:
- Get geometry of the disk (LBA)
- ahci_dev_classify?
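For the geometry TODO, one option is to read the disk's IDENTIFY DEVICE data (ATA command 0xEC), which returns 256 16-bit words. The sector count is in words 60-61 for 28-bit addressing. It is in words 100-103 when word 83 both carries the valid signature (bits 15:14 = 01) and advertises the 48-bit feature set (bit 10). The sketch below is hypothetical. It assumes the IDENTIFY buffer has already been transferred and is in host byte order; how to issue the command through this driver is left open.

```c
#include <stdint.h>

#define ATA_CMD_ID_ATA  0xEC   /* IDENTIFY DEVICE */

/* Number of user-addressable sectors from the 256 IDENTIFY words. */
uint64_t ata_capacity_sectors(const uint16_t *id)
{
    int word83_valid = (id[83] & 0xC000) == 0x4000;
    int lba48 = word83_valid && (id[83] & (1u << 10));

    if (lba48)
        return (uint64_t)id[100] | ((uint64_t)id[101] << 16) |
               ((uint64_t)id[102] << 32) | ((uint64_t)id[103] << 48);
    return (uint64_t)id[60] | ((uint64_t)id[61] << 16);
}
```

With 512-byte sectors, the disk size in bytes is this count times 512.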
Blacklist
I added the libata and sata_sil24 drivers to the blacklist (/etc/modprobe.d/blacklist) so they won't get loaded automatically, because I don't want my changes to break rebooting! I'm not sure it worked anyway. Now that I have libata and sata_nv compiled into the kernel and I'm only debugging ahci, I've removed my entries from the blacklist.
VHDL version
I'm starting to convert it to VHDL. Between the BIOS and the VHDL, I have to do everything the driver and user-level code did. I haven't decided how to split the work, but I think it will be obvious once I start.
Worries
- Deadlock
  - When I am controlling the disk, how do I make sure that the disk's responses don't get stuck behind the processor's traffic? In other words, if I'm not careful, the read queue could fill up while the disk is being accessed, and then the disk won't be able to communicate back with me.
  - I think the solution is to control the disk only through writes, so that the disk never reads from the FPGA. The easiest way to do this is to use memory connected to the Opteron as scratch RAM (or memory on an add-in card, if you can't use the Opteron's DIMMs).
    - Then I have three memory areas:
      - The area to store commands
      - The buffer for writes to disk
      - The buffer for reads from disk
    - I think I can put the read buffer on the FPGA.
    - The sequence of events is then:
      - Write the command to the command area
      - If the command is a write, write the data to the data area
      - Tell the disk where to find the command (and data)
      - Wait for the responses (if it is a read)
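Under this write-only scheme, everything the disk has to read (command list, command tables, and write data) lives in the Opteron-attached scratch RAM, while the read buffer sits on the FPGA. Below is a sketch of how that scratch region might be carved up using the AHCI alignment rules: the command list is 32 slots of 32 bytes and 1 KiB aligned, the received-FIS area is 256 bytes and 256-byte aligned, and each command table is 128-byte aligned. The single-region layout, the page-aligned write buffer, and the names are all hypothetical. Placing the received-FIS area here is also an assumption: the HBA writes into it, so it could equally live on the FPGA next to the read buffer. Offsets are relative to a scratch base that is assumed to be at least 4 KiB aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_SLOTS       32
#define CMD_HDR_SZ      32     /* one command-list entry */
#define RX_FIS_SZ       256
#define CMD_TBL_HDR_SZ  0x80   /* command FIS etc. before the PRD table */
#define PRD_SZ          16

struct scratch_layout {
    size_t cmd_list;     /* 1 KiB aligned (PORT_LST_ADDR)  */
    size_t rx_fis;       /* 256 B aligned (PORT_FIS_ADDR)  */
    size_t cmd_tbl;      /* 128 B aligned, one table per slot */
    size_t tbl_stride;   /* bytes between consecutive tables */
    size_t write_buf;    /* page aligned; data the disk will read */
    size_t total;
};

static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

struct scratch_layout plan_scratch(unsigned prds_per_cmd, size_t write_buf_len)
{
    struct scratch_layout l;
    l.cmd_list   = 0;
    l.rx_fis     = align_up(l.cmd_list + CMD_SLOTS * CMD_HDR_SZ, 256);
    l.tbl_stride = align_up(CMD_TBL_HDR_SZ + (size_t)prds_per_cmd * PRD_SZ, 128);
    l.cmd_tbl    = align_up(l.rx_fis + RX_FIS_SZ, 128);
    l.write_buf  = align_up(l.cmd_tbl + CMD_SLOTS * l.tbl_stride, 4096);
    l.total      = l.write_buf + write_buf_len;
    return l;
}
```

With one PRD per command and a 64 KiB write buffer, the whole region is 77824 bytes. The command tables start at offset 1280 with a 256-byte stride, and the write buffer starts at 12288.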
BIOS Responsibilities
- Initialize PCI space
  - Write Cfg 0x41 with 0xa1 (specific to this card)
  - Enable PCI device
  - Set device as bus master
- Allocate memory for command lists
- Reset and initialize AHCI
  - Read HOST_CTL (0x4), OR in HOST_RESET (bit 0), then write it back to HOST_CTL
  - Wait 1 second
  - Read HOST_CTL and make sure that HOST_RESET was cleared
  - Write HOST_CTL to set HOST_AHCI_EN (bit 31)
  - Write 0xf to HOST_PORTS_IMPL
  - Read HOST_CAP (0x0): (val & 0x1f) + 1 = number of ports
  - For each port
    - Read PORT_CMD (0x18) and clear PORT_CMD_LIST_ON (15), PORT_CMD_FIS_ON (14), PORT_CMD_FIS_RX (4), and PORT_CMD_START (0) if necessary
    - Sleep 500 ms if any bits changed
    - Write PORT_CMD_SPIN_UP (bit 1) to PORT_CMD (0x18)
    - Check PORT_SCR_STAT (0x28) to make sure that the drive is there (the lower 4 bits should be 0x3)
    - Read PORT_SCR_ERR (0x30) and clear it by writing the value back
    - Read PORT_IRQ_STAT (0x10) and clear it by writing the value back
    - Read PORT_CMD (0x18) and clear PORT_CMD_LIST_ON (15), PORT_CMD_FIS_ON (14), PORT_CMD_FIS_RX (4), and PORT_CMD_START (0) if necessary
  - Enable IRQs on the HOST
- Enable each port with a drive
  - Allocate and zero memory
  - Set PORT_LST_ADDR, PORT_LST_ADDR_HI, PORT_FIS_ADDR, and PORT_FIS_ADDR_HI to point to the memory region you allocated
  - Set PORT_CMD_ICC_ACTIVE | PORT_CMD_FIS_RX | PORT_CMD_POWER_ON | PORT_CMD_SPIN_UP | PORT_CMD_START in PORT_CMD
- Write drive information to DiskRAM so that it can find the correct port addresses
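The host and port setup that the BIOS takes over reduces to a few bit manipulations on BAR 5. The sketch below uses names that mirror the ahci driver's, but the functions themselves are hypothetical. It polls for HOST_RESET to clear rather than sleeping a fixed second; the AHCI spec treats a reset still pending after 1 second as a hung HBA. The PORT_CMD bit positions follow the driver's header: START is bit 0, SPIN_UP bit 1, POWER_ON bit 2, FIS_RX bit 4, and ICC_ACTIVE is the value 1 in bits 31:28. The port-address registers sit at port offsets 0x00/0x04 (command list) and 0x08/0x0c (FIS area).

```c
#include <stdint.h>

#define HOST_CTL            0x04
#define HOST_RESET          (1u << 0)
#define HOST_AHCI_EN        (1u << 31)

#define PORT_LST_ADDR       0x00
#define PORT_LST_ADDR_HI    0x04
#define PORT_FIS_ADDR       0x08
#define PORT_FIS_ADDR_HI    0x0c
#define PORT_CMD            0x18

#define PORT_CMD_START      (1u << 0)
#define PORT_CMD_SPIN_UP    (1u << 1)
#define PORT_CMD_POWER_ON   (1u << 2)
#define PORT_CMD_FIS_RX     (1u << 4)
#define PORT_CMD_FIS_ON     (1u << 14)
#define PORT_CMD_LIST_ON    (1u << 15)
#define PORT_CMD_ICC_ACTIVE (1u << 28)

/* host points at BAR 5. Set HOST_RESET, poll until the HBA clears it,
 * then set HOST_AHCI_EN. Returns 0 on success, -1 if it never clears. */
int host_reset(volatile uint32_t *host, unsigned long max_polls)
{
    host[HOST_CTL / 4] |= HOST_RESET;
    for (unsigned long i = 0; i < max_polls; i++) {
        if (!(host[HOST_CTL / 4] & HOST_RESET)) {
            host[HOST_CTL / 4] |= HOST_AHCI_EN;
            return 0;
        }
    }
    return -1;
}

unsigned ahci_nr_ports(uint32_t cap) { return (cap & 0x1f) + 1; }

int port_has_drive(uint32_t scr_stat) { return (scr_stat & 0xf) == 0x3; }

/* PORT_CMD with the four "engine running" bits cleared. If the result equals
 * the value read, nothing changed and the 500 ms wait can be skipped. */
uint32_t port_stop_value(uint32_t cmd)
{
    return cmd & ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON |
                   PORT_CMD_FIS_RX | PORT_CMD_START);
}

/* port points at one port's block: aim it at its memory, then start it. */
void port_start(volatile uint32_t *port, uint64_t cl_phys, uint64_t fis_phys)
{
    port[PORT_LST_ADDR / 4]    = (uint32_t)cl_phys;
    port[PORT_LST_ADDR_HI / 4] = (uint32_t)(cl_phys >> 32);
    port[PORT_FIS_ADDR / 4]    = (uint32_t)fis_phys;
    port[PORT_FIS_ADDR_HI / 4] = (uint32_t)(fis_phys >> 32);
    port[PORT_CMD / 4] = PORT_CMD_ICC_ACTIVE | PORT_CMD_FIS_RX |
                         PORT_CMD_POWER_ON | PORT_CMD_SPIN_UP | PORT_CMD_START;
}
```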
VHDL
- Read or write something
  - Find an idle command slot in PORT_SCR_ACT; the number of that bit is the TAG value used below
  - Fill the fields of the command table at base_addr + tag * AHCI_CMD_TBL_SZ
    - Byte 0 = 0x27, 1 = 1<<7, 2 = ATA_CMD_FPDMA_{READ,WRITE}, 3 = num_blocks & 0xff
    - Byte 4 = lbal, 5 = lbam, 6 = lbah, 7 = 1<<6
    - Byte 8 = lbal_hi, 9 = lbam_hi, 10 = lbah_hi, 11 = num_blocks >> 8
    - Byte 12 = tag<<3, 13 = 0, 14 = 0, 15 = 0
    - Bytes 16-19 = 0
  - Fill the scatter-gather list at the command table address + AHCI_CMD_TABLE_HDR_SZ
    - addr = lower 32 bits of the buffer address
    - addr_hi = upper 32 bits of the buffer address
    - flags_size = size in bytes of the transfer - 1
  - Fill the command slot information
    - opts (word 0) = cmd_fis_len (5) | (number of scatter-gather entries) << 16; for a write, also OR in AHCI_CMD_WRITE (bit 6)
    - status (word 1) = 0
    - tbl_addr (word 2) = lower 32 bits of cmd_table + tag * AHCI_CMD_TBL_SZ
    - tbl_addr_hi (word 3) = upper 32 bits of cmd_table + tag * AHCI_CMD_TBL_SZ
  - Issue the command
    - Write 1 << tag to PORT_SCR_ACT
    - Write 1 << tag to PORT_CMD_ISSUE
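Before writing the VHDL, it may help to have a small software reference model of the slot bookkeeping to test the state machine against. The sketch below is one such model; its names are hypothetical. Unlike the list above, it treats a slot as busy if its bit is set in either PORT_SCR_ACT or PORT_CMD_ISSUE, which is a more conservative check that I'm adding. It reads the number of implemented slots from HOST_CAP bits 12:8. Offsets 0x34 and 0x38 are PxSACT and PxCI in the AHCI port register map.

```c
#include <stdint.h>

#define PORT_SCR_ACT    0x34   /* PxSACT: outstanding NCQ tags      */
#define PORT_CMD_ISSUE  0x38   /* PxCI: commands issued to the HBA  */

/* HOST_CAP bits 12:8 hold (number of command slots - 1). */
unsigned slots_from_cap(uint32_t cap)
{
    return ((cap >> 8) & 0x1f) + 1;
}

/* Lowest tag free in both PxSACT and PxCI, or -1 if every slot is busy. */
int find_free_tag(uint32_t sact, uint32_t ci, unsigned n_slots)
{
    uint32_t busy = sact | ci;
    for (unsigned t = 0; t < n_slots && t < 32; t++)
        if (!(busy & (1u << t)))
            return (int)t;
    return -1;
}

/* Hand a prepared NCQ command to the HBA: PORT_SCR_ACT first, then CI. */
void issue_ncq(volatile uint32_t *port, unsigned tag)
{
    port[PORT_SCR_ACT / 4]   = 1u << tag;
    port[PORT_CMD_ISSUE / 4] = 1u << tag;
}
```

In the hardware, the equivalent of find_free_tag is a priority encoder over the inverted OR of the two registers. The two issue writes must stay in the order shown.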
Categories: Pel | LinuxKernel | Myles
