Parallel Architecture and Programming (7): Interconnection Networks

This lecture is about networks within a single chip (on-chip interconnects). We care about performance, energy efficiency, and scalability.

General rule: latency increases with load.

  • interconnects connect:
    • processor cores with other cores
    • processors and memories
    • processor cores and caches
    • caches and caches
    • I/O devices
  • How are hardware components interconnected?
  • What is the basic cost of communication?
  • How does data traverse the network?

1. terminology

  • network node: a network endpoint connected to a router/switch
  • network interface
    • connects nodes to the network
  • switch/router
    • connects a fixed number of input links to a fixed number of output links
  • link
    • a bundle of wires carrying a signal

2. design issues

  • topology: how switches are connected via links
  • routing: how a message gets from its source to its destination in the network
  • buffering and flow control
    • what data is stored in the network? packets, partial packets?
    • how does the network manage buffer space?

3. properties of interconnect topology

  • routing distance
    • diameter: the maximum routing distance between two nodes
    • average distance: average routing distance over all valid routes
  • direct vs. indirect networks
    • direct network: nodes generate traffic from inside the network, acting as both endpoints and switches
    • indirect network: nodes generate traffic only at the edge of the network; switching is handled by separate internal switches

  • bisection bandwidth: cut network in half, sum bandwidth of all severed links
  • blocking vs. non-blocking
    • non-blocking: the network is non-blocking if any one-to-one pairing of communicating nodes can be routed simultaneously without conflicts.
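
As a concrete illustration of these properties (a minimal sketch with assumed names, not part of the original notes), diameter, average distance, and bisection bandwidth can be computed for a small bidirectional ring:

```python
# Minimal sketch (assumed illustration): routing-distance and bisection-bandwidth
# properties of an n-node bidirectional ring.

def ring_distance(a, b, n):
    """Shortest hop count between nodes a and b on an n-node ring."""
    d = abs(a - b)
    return min(d, n - d)

def diameter(n):
    """Maximum routing distance between any two nodes (n // 2 for a ring)."""
    return max(ring_distance(a, b, n) for a in range(n) for b in range(n))

def average_distance(n):
    """Average routing distance over all pairs of distinct nodes."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(ring_distance(a, b, n) for a, b in pairs) / len(pairs)

def bisection_bandwidth(link_bw=1.0):
    """Cutting a ring in half severs exactly 2 links regardless of n,
    which is why a ring's bisection bandwidth does not scale."""
    return 2 * link_bw

for n in (4, 8, 16):
    print(n, diameter(n), round(average_distance(n), 2), bisection_bandwidth())
```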

4. topology

4.1. bus interconnect

  • pros
    • simple design
    • cost effective for a small number of nodes
    • easy to implement coherence
  • cons
    • contention: all nodes contend for shared bus
    • limited bandwidth: all nodes communicate over the same wires
    • high electrical load

4.2. crossbar

  • every node is connected to every other node (non-blocking, indirect)
  • pros
    • O(1) latency and high bandwidth
  • cons
    • O(N^2) switches
    • high cost
    • difficult to arbitrate at scale

4.3. ring

  • pros
    • simple
    • cheap: O(N) cost
  • cons
    • high latency: O(N)
    • scalability issue: bisection bandwidth remains constant as nodes are added
  • ex
    • Core i7

4.4. mesh

  • direct network
  • O(N) cost
  • average latency: O(sqrt(N))
  • easy to lay out on chip: fixed-length links
  • path diversity: many ways for message to travel from one node to another
  • ex
    • Tilera processors
    • prototype Intel chips: Xeon Phi
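
As an assumed sketch (not from the notes), dimension-order (XY) routing illustrates how messages traverse a 2D mesh; the hop count is the Manhattan distance, which averages O(sqrt(N)) on a sqrt(N) x sqrt(N) mesh:

```python
# Minimal sketch of dimension-order (XY) routing on a 2D mesh.
# Coordinates and function names are illustrative assumptions.

def xy_route(src, dst):
    """Return the list of hops from src to dst on a 2D mesh:
    first walk along X until the column matches, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                      # route in the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then route in the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# Hop count equals the Manhattan distance between the two coordinates.
print(xy_route((0, 0), (3, 2)))   # 5 hops: 3 in X, then 2 in Y
```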

4.5. torus

  • in a mesh, nodes differ depending on whether they sit near the edge or in the middle of the network; a torus adds wrap-around links so every node looks the same
  • O(N) cost
  • higher path diversity and bisection BW than mesh
  • higher complexity
    • difficult to lay out on chip
    • unequal link lengths

4.6. tree

  • latency: O(lg N)
  • “fat trees”: alleviate root bandwidth problem (higher bandwidth links near root)

4.7. hypercube

  • low latency: O(lg N)
  • radix: O(lg N)
  • number of links: O(N lg N)
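
In a hypercube with N = 2^d nodes, adjacent node IDs differ in exactly one bit, so the routing distance between two nodes is the number of differing bits (at most lg N). A minimal sketch with assumed names:

```python
# Minimal sketch: hypercube distance and a simple dimension-order route.
# Node IDs are d-bit integers; all names are illustrative assumptions.

def hypercube_distance(a, b):
    """Hop count = number of bit positions in which the IDs differ."""
    return bin(a ^ b).count("1")

def hypercube_route(a, b):
    """Flip differing bits from the lowest to the highest dimension."""
    path, cur = [a], a
    diff = a ^ b
    dim = 0
    while diff:
        if diff & 1:
            cur ^= (1 << dim)   # cross one dimension of the cube
            path.append(cur)
        diff >>= 1
        dim += 1
    return path

# 16-node (4-dimensional) hypercube: at most lg(16) = 4 hops.
print(hypercube_distance(0b0000, 0b1011))   # 3
print(hypercube_route(0b0000, 0b1011))      # [0, 1, 3, 11]
```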

4.8. multi-stage logarithmic

  • indirect network: multiple stages of switches sit between the endpoints (e.g., omega and butterfly networks)
  • cost: O(N lg N), latency: O(lg N)

5. buffering and flow control

  • circuit switching: preallocate all resources
    • higher bandwidth
    • low utilization
    • once set up, no contention
  • packet switching
    • higher utilization
    • overhead due to dynamic switching logic

5.1. communication granularity

  • message: unit of transfer between network clients; may consist of multiple packets
  • packet: unit of transfer in the network
    • header
    • payload
    • tail
  • flit (flow control digit): a unit of flow control
    • packets broken into smaller units called “flits”
    • minimum granularity of routing/buffering
  • packet format
    • header: routing and control information
    • payload/body
    • tail: additional control information, error code
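
To make the granularity hierarchy concrete, here is an assumed toy model (field names and sizes are illustrative, not a real NoC format) that splits a message's bytes into head/body/tail flits, with the destination carried by the header:

```python
# Toy sketch of the message -> packet -> flit hierarchy (assumed illustration).
from dataclasses import dataclass, field

@dataclass
class Flit:
    kind: str        # "head", "body", or "tail"
    payload: bytes   # a small slice of the packet's bytes

@dataclass
class Packet:
    dst: int                                  # routing info lives in the header
    flits: list = field(default_factory=list)

def packetize(message: bytes, dst: int, flit_size: int = 4) -> Packet:
    """Split one message into a packet of flits (flow control units)."""
    chunks = [message[i:i + flit_size] for i in range(0, len(message), flit_size)]
    pkt = Packet(dst=dst)
    for i, chunk in enumerate(chunks):
        kind = "head" if i == 0 else ("tail" if i == len(chunks) - 1 else "body")
        pkt.flits.append(Flit(kind, chunk))
    return pkt

print(packetize(b"hello, interconnect!", dst=7))
```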

5.2. handling contention

contention: when two packets need to be routed onto the same outbound link at the same time

solution:

  • buffer one packet, send it over link later
  • drop one packet
  • reroute one packet (deflection)
  • circuit switching
    • no contention due to preallocation, so no need for buffering
    • arbitrary message sizes

5.3. flow control

  • store and forward
    • buffer: channels and buffers are allocated per packet; each switch receives and stores the entire packet before forwarding it
    • pro: a packet occupies only one link at a time
    • flow control unit: packet
    • latency: because each switch must store the whole packet before forwarding it, a packet of #flits takes #flits * #links (ignoring contention)
  • cut-through flow control: transmission is pipelined over the links
    • a switch starts forwarding the packet on the next link as soon as the packet header is received
    • buffer: channels and buffers are still allocated per packet
    • reduced latency
      • best latency = #links + #flits, i.e., the time for the header to reach the destination plus the number of flits in the packet
      • worst latency: under high contention, degrades to store-and-forward behavior
  • wormhole flow control
    • buffer: buffers are allocated per flit, significantly reducing buffer requirements
    • routing information only in head flit
    • policy:
      • body flits follow the head flit and the tail flit follows the body; if the head flit blocks, the rest of the packet stalls
    • completely pipelined transmission
      • for long messages, latency is independent of network distance
      • latency = #links + #flits; for long messages, #flits >> #links (see the latency sketch below)
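
A minimal sketch (assuming one flit crosses one link per cycle, with router overhead and contention ignored) comparing the store-and-forward and cut-through/wormhole latency formulas above:

```python
# Minimal sketch of no-contention latency models (in cycles); assumptions:
# one flit crosses one link per cycle, router overheads are ignored.

def store_and_forward_latency(num_links, num_flits):
    """The whole packet is buffered at every hop: #flits * #links."""
    return num_flits * num_links

def cut_through_latency(num_links, num_flits):
    """Pipelined over links: the header takes #links cycles to reach the
    destination, then the remaining flits stream in: #links + #flits."""
    return num_links + num_flits

for links, flits in [(3, 4), (8, 16), (4, 128)]:
    print(links, flits,
          store_and_forward_latency(links, flits),
          cut_through_latency(links, flits))
```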

problem: head-of-line blocking, where packets that arrive later in the same buffer are stalled because the packet at the head of the buffer is blocked, even if those later packets are destined for idle output ports.

  • virtual channel flow control
    • divide switch’s input buffer into multiple buffers sharing a single physical channel
    • deadlock avoidance
    • prioritization of traffic classes
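
As an assumed toy model (names invented here, not from the notes), the sketch below contrasts a single FIFO input buffer, which exhibits head-of-line blocking, with the same port split into virtual channels, which lets a packet destined for an idle output proceed:

```python
# Toy sketch (assumed): a switch input port with either one FIFO buffer or
# several virtual channels sharing the same physical link.
from collections import deque

class InputPort:
    def __init__(self, num_vcs=1):
        # num_vcs=1 models a single FIFO; >1 models virtual channels.
        self.vcs = [deque() for _ in range(num_vcs)]

    def enqueue(self, pkt, vc=0):
        self.vcs[vc].append(pkt)

    def dispatch(self, output_free):
        """Forward at most one packet whose output port is free.
        With one FIFO, only that FIFO's head packet is eligible, so a
        blocked head packet blocks everything queued behind it."""
        for q in self.vcs:
            if q and output_free(q[0]["out"]):
                return q.popleft()
        return None

def free(out):
    return out == 1          # output 0 is busy; output 1 is idle

single = InputPort(num_vcs=1)
single.enqueue({"id": "A", "out": 0})   # head packet, blocked
single.enqueue({"id": "B", "out": 1})   # wants the idle output
print(single.dispatch(free))            # None: head-of-line blocking

vc = InputPort(num_vcs=2)
vc.enqueue({"id": "A", "out": 0}, vc=0)
vc.enqueue({"id": "B", "out": 1}, vc=1)
print(vc.dispatch(free))                # B proceeds via its own virtual channel
```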