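[Figure: CPU/GPU execution timeline. On the CPU, serial code alternates with the launches kernel1<<<…>>>() and kernel2<<<…>>>(); on the GPU, each launch runs as a grid (Grid 1, Grid 2) of thread blocks B0 to B5. Other labels: Thread Block, Thread synchronization.]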
[Figure: CPU/GPU memory transfer. Labels: Host Memory, Device Memory, Host Pinned Memory, Shared Memory CPU/GPU.]
[Figure: CPU/GPU execution timeline with pinned memory. Labels: serial code on the CPU, kernel1<<<…>>>() running as Grid 1 (thread blocks B0 to B5) on the GPU, Pinned Memory.]
[Figure: CPU/GPU execution timeline with CUQU. Labels: serial code on the CPU, kernel1<<<…>>>() running as Grid 1 (thread blocks B0 to B5) on the GPU, CUQU::push(), CUQU::fetch(), Pinned Memory.]
Marco Esposito Micenin
Academic year 2010/2011
[Figure: CPU/GPU execution timeline with CUQU and a barrier. Labels: serial code, kernel1<<<…>>>(), Grid 1, CUQU::fetch(), barrier_wait() (shown twice), Pinned Memory.]
[Figure: time breakdown. Labels: Offload Time, Computation Time, Synchronization Time.]
GPU thread synchronization
1. KSM-implicit
for(…) {
    kernel<<<…>>>();
}

2. KSM-explicit
for(…) {
    kernel<<<…>>>();
    cudaThreadSynchronize();  // deprecated; cudaDeviceSynchronize() in current CUDA
}

3. CSM-oneloop & CSM-lockfree
__global__ void csm_kernel() {
    for(…) {
        compute();
        barrier_wait();
    }
}
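The two KSM loops can be sketched as a small runnable program. This is only an illustration under stated assumptions: it assumes that KSM-implicit relies on kernels launched on the default stream executing in order, so that each kernel boundary behaves as a grid-wide barrier, and that KSM-explicit adds a host-side wait after every launch. The names step_kernel and run_ksm are hypothetical and do not come from the slides, and cudaDeviceSynchronize() stands in for the host-side synchronization call. The CSM variants are not covered, because barrier_wait() is not defined in the slides.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments one element per step.
__global__ void step_kernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Runs `steps` kernel launches over n elements.
// explicit_sync == false: KSM-implicit style (rely on in-order launches).
// explicit_sync == true:  KSM-explicit style (host waits after each launch).
// Returns `steps` if every element was incremented `steps` times, else -1.
int run_ksm(int n, int steps, bool explicit_sync) {
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_data, 0, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        step_kernel<<<blocks, threads>>>(d_data, n);
        if (explicit_sync) cudaDeviceSynchronize();
    }

    // The blocking copy on the default stream waits for all earlier kernels.
    std::vector<int> h(n);
    cudaMemcpy(h.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    for (int v : h)
        if (v != steps) return -1;
    return steps;
}

int main() {
    printf("KSM-implicit: %d\n", run_ksm(1024, 10, false));
    printf("KSM-explicit: %d\n", run_ksm(1024, 10, true));
    return 0;
}
```

On a working CUDA device both calls should return the step count. The only difference between the two modes in this sketch is whether the host blocks after each launch, which is the cost the explicit variant adds on every iteration.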

Performance analysis of CUQU: a library for the exchange of