VIII Workshop PisaTel - December, 6th 2005 - SSSUP
DESIGN AND IMPLEMENTATION
OF A MULTI-DIMENSIONAL
TITOLO
PACKETTESI
CLASSIFIER
FOR NETWORK PROCESSORS
Ing. Fabio Vitucci
Gruppo RETI di TELECOMUNICAZIONI
Dipartimento di Ingegneria dell’Informazione - Università di Pisa
1
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Outline
• Resume of previous activities
• Implementation of classification module
• Programming problems
• Measurements
• Future works
• Conclusions
2
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Resume of previous activities/1
• Detailed analysis of the Intel® IXP2400 Network Processor and the
available board (Radysis ENP-2611)
• Choice of a proper application to be implemented on NPs: a packet
classification
Source
Address
Layer 4
Destination
Layer 4
Protocol
...
Rule
11.14.2.21
www
TCP
...
R1
13.11.23.*
gt 1023
TCP
...
R2
112.*.*.*
www
UDP
...
R3
• Comparative analysis among many research algorithms
3
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Resume of previous activities/2
Comparative analysis among many research algorithms
Algorithm
Worst case Time
Worst Case Storage
O(N)
O(N)
Hierarchical tries
O(WD)
O(NDW)
Set-pruning tries
O(WD)
O(ND)
Grid-of-tries
O(WD-1)
O(NDW)
Cross-producting
O(DW)
O(ND)
Area-Based Quadtree
O(NW)
O(W)
O((L+1)W)
O(LN1+1/L)
O(D)
O(ND)
O(DW+N/W)
O(DN2)
HiCuts
O(D)
O(ND)
Ternary CAMs
O(1)
O(N)
Linear Search
FIS-tree
RFC
Bitmap-intersection
N = number of entries
D = number of fields to be processed
4
W = maximum number of bit for level
L = number of level of data structure
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Resume of previous activities/3
Multidimensional Multibit Trie
SA Trie
• Fields:
– IP Source Address and IP Destination Address
– Layer 4 Source Port and Destination Port
– Layer 4 Protocol Type
• Hierarchical trie: a tree per dimension
– Many levels for dimension
– A fixed number of bits for level
DA Trie
SP Trie
DP Trie
• Performance parameters:
– Research speed: 5×O(W/K)
– Memory accesses: 12
– Storage complexity: 5×O(2(k-1)×N×W/K)
5
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
PR Trie
Resume of previous activities/4
• Main bound:
– Memory consumption
– Rules with unspecified fields (e.g. 131.114.*.*) need
explosion of all possible rules
• Modifications:
– A level transition in case of wild-cards
• Less number of nodes
• Sometimes more memory accesses
• More complexity
• Validation tests with a C simulator
– Large saving in memory consumption (table in SRAM)
– Small increase in instruction store size
6
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Implementation of module/1
IPv4 Forwarder Intel
uE 1:0
Reflector Bus
Scheduler
uE 1:3
Eth_Decap_Classify
IPv4 Fwd
uE
0:0
SCHEDULER_TO_QM_SCR_RING
alias QM_RING_IN_1
alias DEQ_RING_NUMBER
NN_RING
ID = 6 (x 4)
BASE_ADDRESS = 6144
SIZE = 512
L2_Validate
uE 0:3
uE 0:2
uE 1:1
MSF
MSF
Packet _RX
Eth_Decap_Classify
IPv4 Fwd
ETH_RX_TO_IPV4_SRC_RING
ID = 4 (x 4)
BASE_ADDRESS = 0
SIZE = 1024
7
L2_Validate
uE 0:1
Eth_Decap_Classify
IPv4 Fwd
uEL2_Validate
0:3
Packet_QM
IPV4_TO_QM_SCR_RING
alias QM_RING_IN
alias QM_RING_IN_0
alias ENQ_RING_NUMBER
ID = 5 (x 4)
BASE_ADDRESS = 4096
SIZE = 512
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Sphy_mphy4_tx
QM_TO_PACKET_TX_SCR_RING_0
alias PACKET_TX_IN_0
ID = 7 (x 4)
BASE_ADDRESS = 8192
SIZE = 128
Implementation of module/1
IPv4 Forwarder Intel
uE 1:0
Reflector Bus
Scheduler
uE 1:3
Eth_Decap
_Classify
Classify
IPv4
Fwd
uE
0:0
SCHEDULER_TO_QM_SCR_RING
alias QM_RING_IN_1
alias DEQ_RING_NUMBER
NN_RING
ID = 6 (x 4)
BASE_ADDRESS = 6144
SIZE = 512
L2_
Validate
uE 0:3
uE 0:2
uE 1:1
MSF
Eth_Decap
_Classify
Packet _RX
Classify
IPv4
Fwd
ETH_RX_TO_IPV4_SRC_RING
ID = 4 (x 4)
BASE_ADDRESS = 0
SIZE = 1024
8
L2_
Validate
uE 0:1
Eth_Decap
_Classify
Classify
IPv4
Fwd
uE
0:3
L2_
Validate
MSF
Packet_QM
IPV4_TO_QM_SCR_RING
alias QM_RING_IN
alias QM_RING_IN_0
alias ENQ_RING_NUMBER
ID = 5 (x 4)
BASE_ADDRESS = 4096
SIZE = 512
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Sphy_mphy4_tx
QM_TO_PACKET_TX_SCR_RING_0
alias PACKET_TX_IN_0
ID = 7 (x 4)
BASE_ADDRESS = 8192
SIZE = 128
Implementation of module/2
• Functions of XScale (implemented in C language):
– Receiving classification rules
– Building multidimensional trie according to received rules to
calculate the number of nodes per level and SRAM addresses
– Rebuilding multidimensional trie to put data in SRAM to
precalculated addresses
• Functions of Microengines:
–
–
–
–
9
Receiving packets
Retrieving proper fields to packet headers
Finding matching rules using data structure in SRAM
Modifying TOS fields
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Implementation of module/2
• Functions of XScale (implemented in C language):
– Receiving classification rules
– Building multidimensional trie according to received rules to
calculate the number of nodes per level and SRAM addresses
– Rebuilding multidimensional trie to put data in SRAM to
precalculated addresses
• Functions of Microengines:
–
–
–
–
10
Receiving packets
Retrieving proper fields to packet headers
Finding matching rules using data structure in SRAM
Modifying TOS fields
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Implementation of module/2
• Functions of XScale (implemented in C language):
– Receiving classification rules
– Building multidimensional trie according to received rules to
calculate the number of nodes per level and SRAM addresses
– Rebuilding multidimensional trie to put data in SRAM to
precalculated addresses
• Functions of Microengines:
–
–
–
–
11
Receiving packets
Retrieving proper fields to packet headers
Finding matching rules using data structure in SRAM
Modifying TOS fields
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Implementation of module/3
SRAM Data Table
long word
index of node *
index of node of 2nd level
index of node of 2nd level
index of node of 2nd level
index of node of 2nd level
index of node of 2nd level
index of node of 2nd level
index of node of 2nd level
value of field
index of next node
value of field
index of next node
value of field
index of next node
value of field
index of next node
index of node *
value of field
index of next node
index of node *
value of field
index of next node
index of node *
index of next node
minimum
value
maximum
index of node *
index of next node
minimum
value
index of node *
value of field
12
value
number of rule
maximum
value
value of field
number of rule
value of field
number of rule
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Implementation of module/4
• Functions of µ-engines (implemented in µ-code assembler):
–
–
–
–
Receiving packets
Retrieving proper fields to packet headers
Finding matching rules using data structure in SRAM
Modifying TOS fields
• Number of added cycles: 1600
–
–
–
–
–
–
13
50 = memory registers initialization
180 = reading first node
150 × 2 = reading nodes of ports
145 × 7 = reading other nodes
15 = final matching
40 = writing TOS field
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Programming problems/1
uE 1:0
Reflector Bus
Scheduler
uE 1:3
Eth_Decap
_Classify
Classify
IPv4
Fwd
uE
0:0
SCHEDULER_TO_QM_SCR_RING
alias QM_RING_IN_1
alias DEQ_RING_NUMBER
NN_RING
ID = 6 (x 4)
BASE_ADDRESS = 6144
SIZE = 512
L2_
Validate
uE 0:3
uE 0:2
uE 1:1
MSF
Eth_Decap
_Classify
Packet _RX
Classify
IPv4
Fwd
ETH_RX_TO_IPV4_SRC_RING
ID = 4 (x 4)
BASE_ADDRESS = 0
SIZE = 1024
L2_
Validate
uE 0:1
Eth_Decap
_Classify
Classify
IPv4
Fwd
uE
0:3
L2_
Validate
MSF
Packet_QM
IPV4_TO_QM_SCR_RING
alias QM_RING_IN
alias QM_RING_IN_0
alias ENQ_RING_NUMBER
ID = 5 (x 4)
BASE_ADDRESS = 4096
SIZE = 512
Sphy_mphy4_tx
QM_TO_PACKET_TX_SCR_RING_0
alias PACKET_TX_IN_0
ID = 7 (x 4)
BASE_ADDRESS = 8192
SIZE = 128
• Main problems:
– Number of SRAM accesses
– Rate of SRAM accesses
14
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Programming problems/2
Multithreaded Programming
running thread
context swap
µe control
idle thread
idle µe
memory access latency
thread 0
thread 1
thread 2
thread 3
thread 4
thread 5
thread 6
thread 7
time
We want to reduce the idle time
15
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Programming problems/3
Stalling
running thread
context swap
µe control
idle thread
idle µe
memory access latency
thread 0
thread 1
thread 2
thread 3
time
thread 0
thread 1
thread 2
thread 3
time
• Decrease the number of active threads for µ-engine
16
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Programming problems/4
• Consolidate adjacent memory accesses
• Filling
running thread
context swap
µe control
idle thread
idle µe
memory access latency
thread 0
thread 1
thread 2
thread 3
time
thread 0
thread 1
thread 2
thread 3
time
17
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Measurements/1
Developers’ Workbench
(Microengines
Programming)
Cross-Compiler
(XScale programming)
Serial Cable
AdTech AX4000
18
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Measurements/2
ADTech AX4000
19
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Measurements/3
•
•
•
•
20
Max packet rate: 2033000 pkt/s (0 lost packets)
Number of supported rules: 10000
Performance indipendent from number of rules
A fundamental feature: robustness
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Measurements/4
• Packet delay
100 μsec
1130 μsec
35 μsec
21
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Future Works: Resources/Link Scheduler
Scratchpad Memory
uE 0:1
Eth_Decap
Classify
IPv4
Fwd
L2_Valida
te
FB
uE 0:2
uE 0:0
uE 1:0
Eth_Decap
Classify
IPv4
Fwd
IPv4L2_Valida
L2_Vali
Fwd te date
FB
FB
uE 1:1
uE 1:2
MSF
Packet _RX
Classifier
Resource
Scheduler
Link
Scheduler
uE 0:3
Eth_Decap
Classify
IPv4
Fwd
IPv4L2_Valida
L2_Vali
Fwd te date
FBFB
uE1:3
1:3
uE
Eth_Decap
Classify
22
IPv4
Fwd
IPv4L2_Valida
L2_Vali
Fwd te date
FBFB
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
MSF
Packet_TX
Conclusions
•
•
•
•
Analyse the Intel® IXP2400 hardware architecture
Select a proper algorithm of packet classification for the IXP2400
Modify the algorithm to capitalize properties of our hardware
Build a C Simulator to test the new version
• Implement XScale functions in C language (building rule table)
• Implement μ-engines functions in µ-code (finding matching rule)
• Analyse multithreaded programming
• Study stalling, filling, and other “phenomenons”
• Test working and performance of the classifier
• Characteristics: 1600 added cycles, 2 Mpkt/s, 10000 rules
supported, scalability, robustness in case of congestion
23
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Workshop PisaTel - December 6th 2005 - SSSUP
DESIGN AND IMPLEMENTATION
OF A MULTI-DIMENSIONAL
TITOLO
PACKETTESI
CLASSIFIER
FOR NETWORK PROCESSORS
Ing. Fabio Vitucci
Gruppo RETI di TELECOMUNICAZIONI
Dipartimento di Ingegneria dell’Informazione - Università di Pisa
24
Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Scarica

TITOLO TESI