VIII Workshop PisaTel - December, 6th 2005 - SSSUP DESIGN AND IMPLEMENTATION OF A MULTI-DIMENSIONAL TITOLO PACKETTESI CLASSIFIER FOR NETWORK PROCESSORS Ing. Fabio Vitucci Gruppo RETI di TELECOMUNICAZIONI Dipartimento di Ingegneria dell’Informazione - Università di Pisa 1 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Outline • Resume of previous activities • Implementation of classification module • Programming problems • Measurements • Future works • Conclusions 2 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Resume of previous activities/1 • Detailed analysis of the Intel® IXP2400 Network Processor and the available board (Radysis ENP-2611) • Choice of a proper application to be implemented on NPs: a packet classification Source Address Layer 4 Destination Layer 4 Protocol ... Rule 11.14.2.21 www TCP ... R1 13.11.23.* gt 1023 TCP ... R2 112.*.*.* www UDP ... R3 • Comparative analysis among many research algorithms 3 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Resume of previous activities/2 Comparative analysis among many research algorithms Algorithm Worst case Time Worst Case Storage O(N) O(N) Hierarchical tries O(WD) O(NDW) Set-pruning tries O(WD) O(ND) Grid-of-tries O(WD-1) O(NDW) Cross-producting O(DW) O(ND) Area-Based Quadtree O(NW) O(W) O((L+1)W) O(LN1+1/L) O(D) O(ND) O(DW+N/W) O(DN2) HiCuts O(D) O(ND) Ternary CAMs O(1) O(N) Linear Search FIS-tree RFC Bitmap-intersection N = number of entries D = number of fields to be processed 4 W = maximum number of bit for level L = number of level of data structure Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Resume of previous activities/3 Multidimensional Multibit Trie SA Trie • Fields: – IP Source Address and IP Destination Address – Layer 4 Source Port and Destination Port – Layer 4 Protocol Type • Hierarchical trie: a tree per dimension – Many levels for dimension – A fixed number of bits for level DA Trie SP Trie DP Trie • Performance parameters: – Research speed: 5×O(W/K) – Memory accesses: 12 – Storage complexity: 5×O(2(k-1)×N×W/K) 5 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP PR Trie Resume of previous activities/4 • Main bound: – Memory consumption – Rules with unspecified fields (e.g. 131.114.*.*) need explosion of all possible rules • Modifications: – A level transition in case of wild-cards • Less number of nodes • Sometimes more memory accesses • More complexity • Validation tests with a C simulator – Large saving in memory consumption (table in SRAM) – Small increase in instruction store size 6 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Implementation of module/1 IPv4 Forwarder Intel uE 1:0 Reflector Bus Scheduler uE 1:3 Eth_Decap_Classify IPv4 Fwd uE 0:0 SCHEDULER_TO_QM_SCR_RING alias QM_RING_IN_1 alias DEQ_RING_NUMBER NN_RING ID = 6 (x 4) BASE_ADDRESS = 6144 SIZE = 512 L2_Validate uE 0:3 uE 0:2 uE 1:1 MSF MSF Packet _RX Eth_Decap_Classify IPv4 Fwd ETH_RX_TO_IPV4_SRC_RING ID = 4 (x 4) BASE_ADDRESS = 0 SIZE = 1024 7 L2_Validate uE 0:1 Eth_Decap_Classify IPv4 Fwd uEL2_Validate 0:3 Packet_QM IPV4_TO_QM_SCR_RING alias QM_RING_IN alias QM_RING_IN_0 alias ENQ_RING_NUMBER ID = 5 (x 4) BASE_ADDRESS = 4096 SIZE = 512 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Sphy_mphy4_tx QM_TO_PACKET_TX_SCR_RING_0 alias PACKET_TX_IN_0 ID = 7 (x 4) BASE_ADDRESS = 8192 SIZE = 128 Implementation of module/1 IPv4 Forwarder Intel uE 1:0 Reflector Bus Scheduler uE 1:3 Eth_Decap _Classify Classify IPv4 Fwd uE 0:0 SCHEDULER_TO_QM_SCR_RING alias QM_RING_IN_1 alias DEQ_RING_NUMBER NN_RING ID = 6 (x 4) BASE_ADDRESS = 6144 SIZE = 512 L2_ Validate uE 0:3 uE 0:2 uE 1:1 MSF Eth_Decap _Classify Packet _RX Classify IPv4 Fwd ETH_RX_TO_IPV4_SRC_RING ID = 4 (x 4) BASE_ADDRESS = 0 SIZE = 1024 8 L2_ Validate uE 0:1 Eth_Decap _Classify Classify IPv4 Fwd uE 0:3 L2_ Validate MSF Packet_QM IPV4_TO_QM_SCR_RING alias QM_RING_IN alias QM_RING_IN_0 alias ENQ_RING_NUMBER ID = 5 (x 4) BASE_ADDRESS = 4096 SIZE = 512 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Sphy_mphy4_tx QM_TO_PACKET_TX_SCR_RING_0 alias PACKET_TX_IN_0 ID = 7 (x 4) BASE_ADDRESS = 8192 SIZE = 128 Implementation of module/2 • Functions of XScale (implemented in C language): – Receiving classification rules – Building multidimensional trie according to received rules to calculate the number of nodes per level and SRAM addresses – Rebuilding multidimensional trie to put data in SRAM to precalculated addresses • Functions of Microengines: – – – – 9 Receiving packets Retrieving proper fields to packet headers Finding matching rules using data structure in SRAM Modifying TOS fields Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Implementation of module/2 • Functions of XScale (implemented in C language): – Receiving classification rules – Building multidimensional trie according to received rules to calculate the number of nodes per level and SRAM addresses – Rebuilding multidimensional trie to put data in SRAM to precalculated addresses • Functions of Microengines: – – – – 10 Receiving packets Retrieving proper fields to packet headers Finding matching rules using data structure in SRAM Modifying TOS fields Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Implementation of module/2 • Functions of XScale (implemented in C language): – Receiving classification rules – Building multidimensional trie according to received rules to calculate the number of nodes per level and SRAM addresses – Rebuilding multidimensional trie to put data in SRAM to precalculated addresses • Functions of Microengines: – – – – 11 Receiving packets Retrieving proper fields to packet headers Finding matching rules using data structure in SRAM Modifying TOS fields Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Implementation of module/3 SRAM Data Table long word index of node * index of node of 2nd level index of node of 2nd level index of node of 2nd level index of node of 2nd level index of node of 2nd level index of node of 2nd level index of node of 2nd level value of field index of next node value of field index of next node value of field index of next node value of field index of next node index of node * value of field index of next node index of node * value of field index of next node index of node * index of next node minimum value maximum index of node * index of next node minimum value index of node * value of field 12 value number of rule maximum value value of field number of rule value of field number of rule Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Implementation of module/4 • Functions of µ-engines (implemented in µ-code assembler): – – – – Receiving packets Retrieving proper fields to packet headers Finding matching rules using data structure in SRAM Modifying TOS fields • Number of added cycles: 1600 – – – – – – 13 50 = memory registers initialization 180 = reading first node 150 × 2 = reading nodes of ports 145 × 7 = reading other nodes 15 = final matching 40 = writing TOS field Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Programming problems/1 uE 1:0 Reflector Bus Scheduler uE 1:3 Eth_Decap _Classify Classify IPv4 Fwd uE 0:0 SCHEDULER_TO_QM_SCR_RING alias QM_RING_IN_1 alias DEQ_RING_NUMBER NN_RING ID = 6 (x 4) BASE_ADDRESS = 6144 SIZE = 512 L2_ Validate uE 0:3 uE 0:2 uE 1:1 MSF Eth_Decap _Classify Packet _RX Classify IPv4 Fwd ETH_RX_TO_IPV4_SRC_RING ID = 4 (x 4) BASE_ADDRESS = 0 SIZE = 1024 L2_ Validate uE 0:1 Eth_Decap _Classify Classify IPv4 Fwd uE 0:3 L2_ Validate MSF Packet_QM IPV4_TO_QM_SCR_RING alias QM_RING_IN alias QM_RING_IN_0 alias ENQ_RING_NUMBER ID = 5 (x 4) BASE_ADDRESS = 4096 SIZE = 512 Sphy_mphy4_tx QM_TO_PACKET_TX_SCR_RING_0 alias PACKET_TX_IN_0 ID = 7 (x 4) BASE_ADDRESS = 8192 SIZE = 128 • Main problems: – Number of SRAM accesses – Rate of SRAM accesses 14 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Programming problems/2 Multithreaded Programming running thread context swap µe control idle thread idle µe memory access latency thread 0 thread 1 thread 2 thread 3 thread 4 thread 5 thread 6 thread 7 time We want to reduce the idle time 15 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Programming problems/3 Stalling running thread context swap µe control idle thread idle µe memory access latency thread 0 thread 1 thread 2 thread 3 time thread 0 thread 1 thread 2 thread 3 time • Decrease the number of active threads for µ-engine 16 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Programming problems/4 • Consolidate adjacent memory accesses • Filling running thread context swap µe control idle thread idle µe memory access latency thread 0 thread 1 thread 2 thread 3 time thread 0 thread 1 thread 2 thread 3 time 17 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Measurements/1 Developers’ Workbench (Microengines Programming) Cross-Compiler (XScale programming) Serial Cable AdTech AX4000 18 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Measurements/2 ADTech AX4000 19 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Measurements/3 • • • • 20 Max packet rate: 2033000 pkt/s (0 lost packets) Number of supported rules: 10000 Performance indipendent from number of rules A fundamental feature: robustness Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Measurements/4 • Packet delay 100 μsec 1130 μsec 35 μsec 21 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Future Works: Resources/Link Scheduler Scratchpad Memory uE 0:1 Eth_Decap Classify IPv4 Fwd L2_Valida te FB uE 0:2 uE 0:0 uE 1:0 Eth_Decap Classify IPv4 Fwd IPv4L2_Valida L2_Vali Fwd te date FB FB uE 1:1 uE 1:2 MSF Packet _RX Classifier Resource Scheduler Link Scheduler uE 0:3 Eth_Decap Classify IPv4 Fwd IPv4L2_Valida L2_Vali Fwd te date FBFB uE1:3 1:3 uE Eth_Decap Classify 22 IPv4 Fwd IPv4L2_Valida L2_Vali Fwd te date FBFB Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP MSF Packet_TX Conclusions • • • • Analyse the Intel® IXP2400 hardware architecture Select a proper algorithm of packet classification for the IXP2400 Modify the algorithm to capitalize properties of our hardware Build a C Simulator to test the new version • Implement XScale functions in C language (building rule table) • Implement μ-engines functions in µ-code (finding matching rule) • Analyse multithreaded programming • Study stalling, filling, and other “phenomenons” • Test working and performance of the classifier • Characteristics: 1600 added cycles, 2 Mpkt/s, 10000 rules supported, scalability, robustness in case of congestion 23 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP Workshop PisaTel - December 6th 2005 - SSSUP DESIGN AND IMPLEMENTATION OF A MULTI-DIMENSIONAL TITOLO PACKETTESI CLASSIFIER FOR NETWORK PROCESSORS Ing. Fabio Vitucci Gruppo RETI di TELECOMUNICAZIONI Dipartimento di Ingegneria dell’Informazione - Università di Pisa 24 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP