TCP Offload 128 (TOE128) is an FPGA-based IP core that receives and transmits Ethernet/IP/TCP packets on Ethernet networks across 128 simultaneous sessions. TOE128 delivers payload data, in order, to the user’s application with:
• Extra TCP/IP packet fields removed
• No missing data
• Verified by appropriate CRCs and checksums
• Flow control
The purpose is to offload the TCP/IP function from the CPU and perform it directly in FPGA-based hardware. TOE dramatically reduces input-to-output response time and jitter by eliminating the need for host processor intervention when analyzing data packets. This IP is designed for FPGA-based, high-frequency, low-latency Wall Street trading applications. Input-to-output packet latency of less than 1 μs can be achieved. Assuming a 100-byte payload (164-byte packet), the theoretical minimum input-to-output latency is about 500 ns.
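As a rough cross-check of these latency figures, the sketch below adds the wire time of a 164-byte packet at 10 Gb/s to the approximate pipeline depths quoted in the feature list (about 13 RX and 15 TX clock cycles at 156.25 MHz). It assumes one full store-and-forward delay in each direction and ignores MAC and PHY latency. These are illustrative assumptions, not DINI Group's own latency budget.

```python
# Back-of-the-envelope latency estimate (illustrative assumptions only).
LINE_RATE_BPS = 10_000_000_000   # 10 GbE data rate
CLOCK_HZ = 156_250_000           # TOE128 target clock, 156.25 MHz
PACKET_BYTES = 164               # 100-byte payload example from the text
RX_CYCLES = 13                   # approximate RX latency from the feature list
TX_CYCLES = 15                   # approximate TX latency from the feature list


def wire_time_ns(num_bytes: int) -> float:
    """Time to serialize num_bytes at the line rate, in nanoseconds."""
    return num_bytes * 8 * 1e9 / LINE_RATE_BPS


def cycles_to_ns(cycles: int) -> float:
    """Convert a number of clock cycles at CLOCK_HZ to nanoseconds."""
    return cycles * 1e9 / CLOCK_HZ


def estimate_latency_ns(packet_bytes: int = PACKET_BYTES) -> float:
    # Assumption: the packet is fully received before processing starts and
    # fully buffered before transmission (store-and-forward in each direction).
    store_and_forward = 2 * wire_time_ns(packet_bytes)
    pipeline = cycles_to_ns(RX_CYCLES + TX_CYCLES)
    return store_and_forward + pipeline


if __name__ == "__main__":
    print(f"wire time per packet: {wire_time_ns(PACKET_BYTES):.1f} ns")
    print(f"estimated input-to-output latency: {estimate_latency_ns():.1f} ns")
```

Under these assumptions the estimate comes to roughly 440 ns, which is in the same range as the quoted figure of about 500 ns.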
The TOE128 was developed internally at DINI Group. TCP offload is a required function in low-latency networking applications. Data-critical functions are executed directly in the FPGA. Infrequent, non-data TCP/IP functions (such as setup/teardown, ARP, ping, DHCP, et al.) are passed through to a standard Linux driver. Other, software-based TCP sessions run normally with no changes required. At the intended target frequency of 156.25 MHz, the TOE128 operates at the full 10 GbE line rate, generating no Ethernet pause frames.
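The line-rate claim follows from simple arithmetic, assuming the 64-bit datapath listed among the interface features: one 64-bit word per clock at 156.25 MHz matches the 10 Gb/s Ethernet data rate. A minimal check, with illustrative names:

```python
BUS_WIDTH_BITS = 64        # datapath width taken from the feature list
CLOCK_HZ = 156_250_000     # 156.25 MHz target clock


def bus_throughput_bps(width_bits: int = BUS_WIDTH_BITS,
                       clock_hz: int = CLOCK_HZ) -> int:
    """Raw datapath throughput in bits per second (one word per clock)."""
    return width_bits * clock_hz


def keeps_up_with(line_rate_bps: int) -> bool:
    """True if the datapath can carry the given line rate without stalling."""
    return bus_throughput_bps() >= line_rate_bps


if __name__ == "__main__":
    print(bus_throughput_bps())            # bits per second
    print(keeps_up_with(10_000_000_000))   # 10 GbE
```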
1. FPGA Resource Utilization
10 GbE Media Access Controller from Xilinx
In minimum-latency designs, external PHYs must be avoided because they add significant latency. This IP assumes that a PHY inside the FPGA is used. The Xilinx PHY needs a MAC, and the Xilinx 10 Gigabit Ethernet Media Access Controller (10GEMAC) is required to use this TOE IP. The 10GEMAC is not included; you purchase it separately from Xilinx. Note that Xilinx has a free version that disables itself after a few hours. This free version contains all of the functionality of the full version and can be used for evaluation.
The TOE128 IP can also connect to the slower 1 GbE MAC, and we can make modifications here in La Jolla to interface the TOE128 to different MACs. Contact DINI sales for more information.
This Xilinx core is compatible with the Virtex-6 HXT FPGAs and works fine on both Virtex-7 and Kintex-7.
Features of the Xilinx 10GEMAC include:
• Designed to IEEE 802.3-2005 specification
• Configured and monitored through an independent microprocessor-neutral interface
• Optional statistics counters
• Configurable flow control through MAC Control pause frames; symmetrically or asymmetrically enabled
• Generate customized core using the CORE Generator™ technology
• Cut-through operation with minimum buffering for maximum flexibility in 64-bit client bus interfacing
• Ability to generate core with no physical interface to allow users to connect the PHY-side interface of the core to user logic
• Powerful EtherStats-based statistics gathering
• Programmable Interframe Gap
• Custom preamble preservation mode
• Supports Deficit Idle Control (DIC) for maximum data throughput
• Maintains minimum IFG under all conditions and line rate performance
• Remote Fault/Local Fault signaling at the Reconciliation Sublayer
We use the AXI4-S bus interface option. Our testing and debugging were performed using the unrestricted version of this core.
1. PCIe Bridge (GEN1/GEN2)
A host interface is required to handle a number of functions related to the TOE128, with configuration being the most important. A PCIe bridge is supplied in encrypted netlist format (.ngc) for this purpose. The PCIe bridge has 4 lanes of GEN1/GEN2 and is a full-function PCIe core. Configuration, BARs (base address registers), and bus-master DMA engines are included. Drivers with 'C' source for Linux are included.
The TOE128 implements the TCP function directly in FPGA gates. No external FPGA memory is required. TOE128s can be cascaded to support more sessions; mixing and matching with TOE1 is also supported. The TOE128 IP is intended to be clocked at the standard Ethernet interface frequency of 156.25 MHz, allowing fully synchronous, lowest-latency data exchange with the MAC. At 156.25 MHz, the TOE128 operates at the full 10 GbE line rate, generating no Ethernet pause frames. The IP is supplied either as an encrypted .ngc netlist for implementation in Xilinx-based FPGAs or as Verilog source to do with as you see fit. Intel/Altera Stratix V and ASIC versions will follow shortly. A host interface is required, and this IP package includes an integrated 4-lane GEN1/GEN2 PCIe bridge. Simulation models and test fixtures are included.
This IP is optimized for low latency: the host CPU is NOT involved in payload data transfer. Not all TCP functions are handled in the IP. High-complexity, low-importance network features such as setup/teardown, ARP, ping, DHCP, et al. are passed to a Linux driver via the PCIe interface. 'C' source for this driver is included, allowing customization.
All of the functions associated with OSI layers 2, 3, 4, and 5 (datalink, network, transport, and session) are implemented. The user is responsible for presentation layer 6 and application layer 7, which can be implemented in the FPGA or elsewhere. The maximum transmission unit (MTU) is 1536 bytes. CRC validation, checksum validation, and reordering of out-of-order packets are done directly in the FPGA, along with packet retransmission when a packet is received in error, lost, or out of order. The TX and RX replay buffers are configurable from 4 KB to 64 KB. Protection Against Wrapped Sequences (PAWS) is handled in the FPGA.
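To illustrate the kind of validation described here, the following Python sketch models the two checks in software: the Ethernet frame check sequence (CRC-32, the same polynomial that `zlib.crc32` uses) and the 16-bit ones' complement Internet checksum used by IP and TCP (RFC 1071). It is a software reference under those assumptions, not the TOE128 hardware implementation, and the function names are illustrative.

```python
import zlib


def ethernet_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the 4 FCS bytes in transmission order (least significant byte first)."""
    return zlib.crc32(frame_without_fcs).to_bytes(4, "little")


def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the frame body and compare with the trailing FCS."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == fcs


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_is_valid(data_with_checksum: bytes) -> bool:
    """Data that already contains a correct checksum field sums to zero."""
    return internet_checksum(data_with_checksum) == 0


if __name__ == "__main__":
    words = bytes.fromhex("0001f203f4f5f6f7")   # example words from RFC 1071
    print(hex(internet_checksum(words)))
```

For TCP, the checksum input also includes a pseudo-header built from the IP addresses, protocol number, and TCP length; that step is omitted here for brevity.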
3. TOE128 IP Distribution Model
The TOE IP is distributed in two different ways:
• Encrypted .ngc file
• Complete Verilog source
Model 1: Xilinx .ngc file
An .ngc file enables integration at the place and route stage into the Xilinx FPGA tools. Source is not provided, but full simulation libraries are supplied. You get this version when you get the optional FIX support package for our FPGA boards: DN_FBSP. Required operating system driver functions and APIs are supplied, with source, in 'C' for Linux.
The TOE128 IP, supplied as part of the optional DN_FBSP, is restricted to DINI products and will not operate on other FPGA-based boards. You are welcome to deploy this IP free of royalties or restrictions on DINI Group products. A single DN_FBSP license is required for your company and allows your company to use it worldwide in any number of DINI boards and any number of applications.
Model 2: Verilog Source
Verilog is our native language. This second distribution option gets you the complete source. You are not allowed to redistribute the source. The license agreement has all the details and the information in the license agreement supersedes what is written here.
Under extreme duress and only under extreme duress, we will convert to VHDL. Should we do this conversion, please note that new features and bug fixes will be first available in Verilog. We really don’t like VHDL and all reputable synthesis tools accept mixed language RTL anyway.
A maintenance contract for bug fixes and feature enhancements is probably a good idea. One year is required at the time of purchase, with optional extensions sold on a yearly basis. Contact DINI sales for more details.
• FPGA TCP Offload Engine (TOE) IP for networking applications requiring minimum and deterministic latency
• Supplied as encrypted .ngc (Xilinx) or optional Verilog source
• Integrated PCIe bridge (required) provided in encrypted .ngc format
• Complete simulation models and test fixtures
• Host CPU NOT involved in payload data transfer
  ◦ 0% CPU load during middle of TCP session
  ◦ TCP data packets handled by TOE not passed to CPU
• Full 10 GbE line rate
  ◦ No Ethernet pause frames generated
• CPU required only for high-complexity, low-importance network features:
  ◦ Setup/teardown of TCP session
  ◦ ARP, ping, DHCP, SMTP, et al.
  ◦ Linux driver with 'C' source included
• Layers 2, 3, 4, 5 (datalink, network, transport, and session)
• Layers 6, 7 (presentation, application) are the user’s responsibility in the FPGA
• MTU of 1536 bytes
• CRC validation and checksum validation
  ◦ Ethernet CRC validation
  ◦ IP and TCP checksum validation
• Reordering of out-of-order packets
• Nagle algorithm
• Fast retransmit
• Congestion avoidance
• Packet retransmission upon error/lost/out-of-order packet reception
• 128 TCP/IP sessions per instantiated TOE
• Additional TOEs can be cascaded to support more sessions
  ◦ Limited only by FPGA resources
• Client or server mode
• Configurable TX and RX replay buffer
  ◦ 4 KB to 64 KB
• Protection Against Wrapped Sequences (PAWS)
• Configurable port number
• IPv4 with future upgrade paths to IPv6/IPng
  ◦ TBD (consult factory)
• TCP timestamps for congestion avoidance (optional)
• Configurable timeouts
• Targeted to the DINI Group boards:
  ◦ DNPCIe_10G_HXT_LL with Virtex-6 HXT
  ◦ Cost-reduced Kintex-7 version available in two models:
  ◦ Kintex UltraScale version available in two models:
• Direct interface to the 10 Gigabit Ethernet Media Access Controller (10GEMAC) (required).
• Tested also with the free 10G MAC from OpenCores
• 64-bit bus interface:
  ◦ Synchronous FIFO clocked at 156.25 MHz
  ◦ Optional asynchronous FIFO interface with 4 to 6 clock cycles of added latency
• Optimized for lowest receive latency (about 13 clock cycles) AND transmit latency (about 15 clock cycles) at 156.25 MHz (roughly: 2 RX longer latency, 5 TX longer latency), with store-and-forward latency in each direction.
• Handles 128 TCP/IP sessions in a single module (each session is a connection to one other computer).
• Takes less than 50% of the FPGA (Kintex 410).
• Xilinx or Intel/Altera versions available.
• User-selectable amount of internal FPGA RAM for replay buffers (4 KB to 256 KB).
• Optional external memory (DRAM) for larger retransmit buffers.
• Can achieve >90% of the 10 GbE bandwidth (in both transmit and receive) with a single module. Multiple modules can achieve 100% of the 10 GbE bandwidth.
• Multiple TOE128s can be connected to the same Ethernet if 128 sessions isn’t enough. It may be possible to support more than 128 sessions in a TOE128; contact factory with your requirements.
• Multiple TOE1 and TOE128 modules can be connected to the same Ethernet.
• Multiple sessions can be a server on the same port number (like a web server), or on different port numbers.
• User control over TX packetization (i.e., you can control where the packet boundaries occur).
• Single RX and TX bus to the user design (with an N-to-1 multiport mux example for TX to keep the bus interface simple for different requestors).
• ZERO software overhead after setting up the connection information (IP and PORT numbers).
• TCP/IP setup (SYN) and teardown (FIN) all done in hardware.
• RX packets not claimed by TOE128 get sent to the NIC.
• 64 KB transmit buffer available on ALL sessions at the same time without using that much internal FPGA memory.
• One TX session hogging too much memory won’t stop the other sessions from making TX progress.
• Each session can be on a different VLAN (if needed).
• TX datapath has a byte packer for ease of connection to your design.
• RX early datapath available if you can do something useful with the packet data before it’s been validated.
• TCP/IP features:
  ◦ RTT/persist/replay timers
  ◦ RX reordering (optional)
  ◦ Server and client mode supported on each concurrent TCP/IP session
  ◦ Up to 64 KB per session for retransmit buffering
  ◦ Congestion control
• FPGA resources required:
  ◦ SLICES: assume about the same as TOE1 (approx. 3% of V6-565).
  ◦ RAMB36s: 28 to 105 depending on TX buffer size (3.5% to 13% of Kintex 410, or 3% to 12% of V6-565)
• Netlist for TOE128 and PCIe interface, Verilog source for all other pieces
• Entire TOE128 design runs at 156.25 MHz. Clock-domain-crossing FIFOs are available on the user interface side so you can run it at a slower or faster frequency
• Coming soon to a DINI board near you!
• Other items coming soon (contact factory for details):
  ◦ DINI low-latency 10 GbE MAC
  ◦ 40 GbE support
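For readers unfamiliar with PAWS, listed among the features above: RFC 7323 describes it as rejecting a segment whose TCP timestamp is older than the most recent timestamp accepted on the connection, with the comparison done in 32-bit modular arithmetic. The sketch below shows that comparison in Python. It is a software illustration with hypothetical names, not a description of the TOE128 logic, and it omits details such as invalidating the stored timestamp after a long idle period.

```python
MOD32 = 1 << 32
HALF32 = 1 << 31


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    # A modular difference in the upper half of the range means a is behind b.
    return (a - b) % MOD32 >= HALF32


def paws_accept(seg_tsval: int, ts_recent: int) -> bool:
    """Accept the segment unless its timestamp is older than ts_recent."""
    return not ts_before(seg_tsval, ts_recent)


if __name__ == "__main__":
    print(paws_accept(1001, 1000))        # newer timestamp
    print(paws_accept(999, 1000))         # older timestamp
    print(paws_accept(5, 0xFFFFFFF0))     # newer, after the counter wrapped
```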