PCI and PCI Express Bus Architecture
Computer Science & Engineering Department
Arizona State University
Tempe, AZ 85287
Dr. Yann-Hang Lee
yhlee@asu.edu
(480) 727-7507
7/23

Buses in PC-XT and PC-AT
ISA (Industry Standard Architecture)
  IBM-PC and PC-XT: 8 bits at 4.77 MHz, directly connected to the 8088, 2-stage bus cycle (2.38 Mbyte/sec bus bandwidth)
  AT bus: extension slot + 8-bit ISA
  16 bits at 8.33 MHz for the 80286
[Figure: PC-XT/PC-AT block diagram with CPU, BIOS, timer and interrupt controller, DRAM, DRAM controller, DMA, bus buffer, ISA bus, and expansion slots]

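The 2.38 Mbyte/sec figure follows from the bus width, the clock rate, and the two clocks per transfer implied by the "2-stage bus cycle". The sketch below only reproduces that arithmetic; the function name and its parameters are illustrative and do not come from the slides.

```python
def peak_bandwidth_bytes_per_sec(clock_hz, width_bits, clocks_per_transfer):
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    bytes_per_transfer = width_bits // 8
    transfers_per_sec = clock_hz / clocks_per_transfer
    return bytes_per_transfer * transfers_per_sec


# PC-XT: 8-bit bus at 4.77 MHz, one transfer every 2 clocks (per the slide).
xt_bandwidth = peak_bandwidth_bytes_per_sec(4.77e6, 8, 2)
print(f"PC-XT peak: {xt_bandwidth / 1e6:.3f} Mbyte/sec")
```

The same function can be reused for other buses once their width, clock, and clocks per transfer are known.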
Buses in PC (486)
16-bit ISA cannot support Windows applications --- video data
VESA LB (local bus) -- linked to 486 local bus, 33 MHz, 32 bits
[Figure: 486 system block diagram with CPU, L2 cache, DRAM, and bus buffer on the 486 local bus; video card, LAN adapter, and HDD controller on the local bus; an ISA bridge leading to the ISA bus and expansion slots]

Buses in PC (Pentium)
Backside Bus
Frontside Bus
PCI
  Direct access to system memory for connected devices
  Uses a bridge to connect to the frontside bus and therefore to the CPU
ISA

Advantages and Disadvantages of Buses
Versatility:
  New devices can be added easily
  Peripherals can be moved between computer systems that use the same bus standard
Low Cost:
  A single set of wires is shared in multiple ways
Manage complexity by partitioning the design
It creates a communication bottleneck
  The bandwidth of that bus can limit the maximum I/O throughput
The maximum bus speed is largely limited by:
  The length of the bus and the number of devices on the bus
  The need to support a range of devices with varying latencies and data transfer rates

Master versus Slave in a Bus
[Figure: the bus master issues a command to the bus slave; data can go either way]
Control lines: Signal requests and acknowledgments
Data/address lines carry information between the source and the destination
A bus transaction includes three parts:
  Arbitration – which master can use the bus
  Issuing the command (and address) – request
  Transferring the data – action
Master is the one who starts the bus transaction by:
  issuing the command (and address)
Slave is the one who responds to the command by:
  Sending data to the master if the master asks for data
  Receiving data from the master if the master wants to send data

Types of Buses
Processor-Memory Bus (design specific)
  Short and high speed
  Only needs to match the memory system
  Maximizes memory-to-processor bandwidth
  Connects directly to the processor
  Optimized for cache block transfers
I/O Bus (industry standard)
  Usually lengthy and slower
  Needs to match a wide range of I/O devices
  Connects to the processor-memory bus or backplane bus
Backplane Bus (standard or proprietary)
  Backplane: an interconnection structure within the chassis
  Allows processors, memory, and I/O devices to coexist
  Cost advantage: one bus for all components

Synchronous and Asynchronous Bus
Synchronous Bus:
  Includes a clock in the control lines
  A fixed protocol for communication that is relative to the clock
  Advantage: involves very little logic and can run very fast
  Disadvantages:
    Every device on the bus must run at the same clock rate
    To avoid clock skew, they cannot be long if they are fast
Asynchronous Bus:
  It is not clocked
  It can accommodate a wide range of devices
  It can be lengthened without worrying about clock skew
  It requires a handshaking protocol

Simple Synchronous Protocol
[Figure: timing diagram with BusReq, BusGrant, R/W Address (Cmd+Addr), and Data (Data1, Data2) signals]
All agents operate synchronously – all source / sink data at same rate
A simple protocol to manage the source and target
Even memory busses are more complex than this
  memory (slave) may take time to respond
  it needs to control data rate

Asynchronous Handshake
A read transaction
[Figure: timing diagram over t0 to t5 with Address (master asserts address, then next address), Data (slave asserts data), Read, Req, and Ack signals]
t0: Master has obtained control and asserts address, direction, and data, then waits a specified amount of time for slaves to decode the target
t1: Master asserts request line
t2: Slave asserts ack, indicating ready to transmit data
t3: Master releases req, data received
t4: Slave releases ack

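The t0 to t4 steps can be walked through in software. The following is a minimal sketch, assuming the bus is modeled as a dictionary of signal states and the slave as a lookup table; `async_read`, the signal names, and the trace strings are all hypothetical and only mirror the order of events listed above.

```python
def async_read(memory, address):
    """Walk one asynchronous read through steps t0..t4 and return (data, trace)."""
    bus = {"addr": None, "read": False, "req": False, "ack": False, "data": None}
    trace = []

    # t0: master drives address and direction, then waits for slaves to decode.
    bus["addr"], bus["read"] = address, True
    trace.append("t0: master asserts address and read")

    # t1: master asserts the request line.
    bus["req"] = True
    trace.append("t1: master asserts req")

    # t2: slave sees req, drives the data lines, and asserts ack.
    if bus["req"]:
        bus["data"] = memory[bus["addr"]]
        bus["ack"] = True
    trace.append("t2: slave asserts ack and data")

    # t3: master sees ack, latches the data, and releases req.
    data = bus["data"] if bus["ack"] else None
    bus["req"] = False
    trace.append("t3: master latches data, releases req")

    # t4: slave sees req released, releases ack, and stops driving data.
    if not bus["req"]:
        bus["ack"] = False
        bus["data"] = None
    trace.append("t4: slave releases ack")
    return data, trace


value, steps = async_read({0x10: 0xAB}, 0x10)
```

Each step waits on the other side's signal rather than on a shared clock, which is why the scheme tolerates devices of different speeds.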
Multiple Bus Masters: the Need for Arbitration
To obtain access to the bus
Bus arbitration scheme:
  A bus master wanting to use the bus asserts the bus request
  A bus master cannot use the bus until its request is granted
  A bus master must signal to the arbiter after it finishes using the bus
Bus arbitration schemes usually try to balance two factors:
  Bus priority
  Fairness and starvation
Bus arbitration schemes can be divided into four broad classes:
  Daisy chain arbitration: single device with all request lines.
  Centralized, parallel arbitration
  Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.
  Distributed arbitration by collision detection: Ethernet uses this.

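The tension between priority and fairness can be seen by comparing two grant policies for a centralized arbiter. This is a sketch under simplifying assumptions (one request flag per master, one grant per round); both function names are made up for illustration and neither policy is claimed to be what any particular bus uses.

```python
def fixed_priority_grant(requests):
    """Grant the lowest-numbered requesting master (index 0 = highest priority)."""
    for master, wants_bus in enumerate(requests):
        if wants_bus:
            return master
    return None


def round_robin_grant(requests, last_granted):
    """Grant the next requesting master after last_granted, wrapping around."""
    n = len(requests)
    for offset in range(1, n + 1):
        master = (last_granted + offset) % n
        if requests[master]:
            return master
    return None


# Masters 0 and 2 request the bus continuously; master 1 is idle.
requests = [True, False, True]

# Fixed priority: master 0 wins every round, so master 2 starves.
fixed_grants = [fixed_priority_grant(requests) for _ in range(4)]

# Round robin: the grant rotates, so both requesters make progress.
rr_grants = []
last = -1
for _ in range(4):
    last = round_robin_grant(requests, last)
    rr_grants.append(last)

print(fixed_grants, rr_grants)
```

Fixed priority gives the best service to the most important master at the risk of starvation; round robin removes starvation but ignores priority.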
Increasing the Bus Bandwidth
Separate versus multiplexed address and data lines:
  Address and data can be transmitted in one bus cycle if separate address and data lines are available
  Cost: (a) more bus lines, (b) increased complexity
Data bus width:
  By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  Example: SPARCstation 20’s memory bus is 128 bits wide
  Cost: more bus lines
Block transfers:
  Allow the bus to transfer multiple words in back-to-back bus cycles
  Only one address needs to be sent at the beginning
  The bus is not released until the last word is transferred
  Cost: (a) increased complexity, (b) decreased response time for request

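A rough cycle count shows how the three techniques interact. The model below is a deliberate simplification and an assumption of this sketch, not something the slides specify: each address costs one extra cycle on a multiplexed bus and zero extra cycles on separate lines, and each bus-width chunk of data costs one cycle.

```python
import math


def bus_cycles(words, word_bits, bus_bits, multiplexed, block_transfer):
    """Count bus cycles to move `words` words under the simplified model."""
    data_cycles = math.ceil(words * word_bits / bus_bits)
    # A block transfer sends one address; otherwise each data cycle needs its own.
    addresses = 1 if block_transfer else data_cycles
    # Only a multiplexed bus spends a separate cycle on each address.
    address_cycles = addresses if multiplexed else 0
    return data_cycles + address_cycles


# Moving four 32-bit words:
baseline = bus_cycles(4, 32, 32, multiplexed=True, block_transfer=False)
separate = bus_cycles(4, 32, 32, multiplexed=False, block_transfer=False)
block = bus_cycles(4, 32, 32, multiplexed=True, block_transfer=True)
wide_block = bus_cycles(4, 32, 128, multiplexed=True, block_transfer=True)
print(baseline, separate, block, wide_block)
```

Under these assumptions, separate lines and block transfers both remove address overhead, while a wider data bus reduces the number of data cycles themselves.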
Increasing Bus Transaction Rate
Overlapped operations (pipelined)
  perform arbitration for next transaction during current transaction
  initiate next address phase during current data phase
Bus parking
  master holds onto bus and performs multiple transactions as long as no other master makes a request
Split-phase (or packet switched) bus
  completely separate address and data phases
  arbitrate separately for each
  address phase yields a tag which is matched with data phase
“All of the above” in most modern processor-memory busses

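Tag matching on a split-phase bus can be illustrated with a small table of outstanding requests. This is a sketch only: the class `SplitPhaseBus`, its methods, and the sequential tag numbering are hypothetical choices for the example, and real buses differ in how tags are allocated.

```python
class SplitPhaseBus:
    """Toy split-phase bus: address phases return tags, data phases redeem them."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = {}   # tag -> address still waiting for its data phase
        self.next_tag = 0

    def address_phase(self, address):
        """Record the request and hand back a tag; the bus is then free again."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = address
        return tag

    def data_phase(self, tag):
        """Deliver the data for an earlier address phase, identified by its tag."""
        address = self.pending.pop(tag)
        return tag, self.memory[address]


bus = SplitPhaseBus({0x100: "A", 0x200: "B"})
tag_a = bus.address_phase(0x100)
tag_b = bus.address_phase(0x200)

# The data phases arrive in the opposite order; the tags keep them matched.
replies = [bus.data_phase(tag_b), bus.data_phase(tag_a)]
results = dict(replies)
```

Because the requester matches replies by tag rather than by order, a slow slave does not hold the bus while it prepares its data.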
PCI Bus
Release 2.1 -- 66 MHz, 32-bit and 64-bit connectors.
3.3V or 5V based on PCI chip set’s buffer/drivers
[Figure: PCI connector pin layout marking positions 1, 12,13 (3.3V key), 50,51 (5V key), 62, and 94, with the 64-bit portion at the far end]
Agent, bus master (initiator) and slave (target)
Bus transaction:
  bus masters issue requests -> arbitration -> bus grant
  issues address and command and begins a cycle frame (transaction)
  memory, I/O, configuration read/write commands
  a target is selected (device select)
  it is ready to complete the data transfer phase

PCI Bus Operation
Address phase
  At the same time, the initiator identifies the target device and the type of transaction
  The initiator asserts the FRAME# signal
  Every PCI target device latches the address and decodes it
Data phase
  The number of data bytes to be transferred is determined by the number of Command/Byte Enable signals asserted by the initiator
  Both initiator and target must be ready to complete the data phase
  IRDY# and TRDY# are used
Transaction completion and return of bus to idle state
  By deasserting FRAME# but asserting IRDY#
  When the last data transfer has completed, the initiator returns the PCI bus to idle state by deasserting IRDY#

PCI Read/Write Transactions
All signals sampled on rising edge
Centralized Parallel Arbitration
  overlapped with previous transaction
All transfers are (unlimited) bursts
Address phase starts by asserting FRAME#
  Next cycle “initiator” asserts cmd and address
Data transfers happen when
  IRDY# asserted by master when ready to transfer data
  TRDY# asserted by target when ready to transfer data
  transfer when both asserted on rising edge
FRAME# deasserted when master intends to complete only one more data transfer

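The transfer rule (data moves on a rising clock edge only when IRDY# and TRDY# are both asserted) can be checked with a per-clock tally. In this sketch the active-low signals are represented as booleans where True means "asserted"; `count_transfers` and the sample waveforms are hypothetical.

```python
def count_transfers(irdy_asserted, trdy_asserted):
    """Return (transfers, wait_states) for per-clock IRDY#/TRDY# samples.

    Each list holds one boolean per rising clock edge of the data phases,
    with True meaning the signal is asserted (driven low on the real bus).
    """
    transfers = 0
    wait_states = 0
    for irdy, trdy in zip(irdy_asserted, trdy_asserted):
        if irdy and trdy:
            transfers += 1      # both ready: one data transfer completes
        else:
            wait_states += 1    # at least one side not ready: no transfer
    return transfers, wait_states


# The initiator is always ready; the target needs one extra clock at the start.
irdy = [True, True, True, True, True]
trdy = [False, True, True, True, True]
transfers, waits = count_transfers(irdy, trdy)
print(transfers, waits)
```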
PCI Bus Signals
[Figure: A typical PCI read transaction]
[Figure: PCI master device signal diagram. One side: CLK, FRAME, IRDY, TRDY, DEVSEL, STOP, C/BE[3:0], AD[31:0], error reporting, REQ, GNT, RST. Other side: AD[63:32], C/BE[7:4], REQ64, ACK64, misc control, INT REQ, BIST signals]

PCI Lines (1)
CLK – PCI input clock
  All signals sampled on the rising edge; allowed to vary from 0 to 33 MHz
RST# -- asynchronous reset
  PCI device must tri-state all I/Os during reset
TRDY# – Target ready
  When the target asserts this signal, it tells the initiator that it is ready to send or receive data
STOP# – Stop
  Used by target to indicate that it needs to terminate the transaction
DEVSEL# – Device select
  When a target recognizes its address, it asserts DEVSEL# to claim the corresponding transaction
FRAME# – Signals the start and end of a transaction
IRDY# – Initiator ready
  Assertion by initiator indicates that it is ready to send or receive data

Address and Data Signals
AD[31:0] – I/O
  32-bit address/data bus
  PCI is little endian (lowest numeric index is LSB)
C/BE#[3:0] – I/O
  4-bit command/byte enable bus
  Defines the PCI command during address phase
  Indicates byte enable during data phases, for example, C/BE#[0] is the byte enable for AD[7:0]
PAR – I/O
  Parity bit, used to verify correct transmittal of address/data and command/byte-enable
  The XOR of AD[31:0], C/BE#[3:0], and PAR should return zero (even parity)

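The even-parity rule for PAR can be written out directly: PAR is chosen so that the XOR over all 36 AD and C/BE# bits plus PAR itself is zero. A minimal sketch follows; the helper names `pci_par` and `parity_ok` are made up for this example.

```python
def pci_par(ad, cbe):
    """Return the PAR bit giving even parity over AD[31:0], C/BE#[3:0], and PAR."""
    bits = (ad & 0xFFFFFFFF) | ((cbe & 0xF) << 32)
    # PAR equals the XOR of the 36 covered bits, so the total XOR becomes zero.
    return bin(bits).count("1") & 1


def parity_ok(ad, cbe, par):
    """Receiver-side check: XOR of AD, C/BE#, and PAR must be zero."""
    return (pci_par(ad, cbe) ^ (par & 1)) == 0


par = pci_par(0x00000001, 0x0)        # one bit set, so PAR must be 1
good = parity_ok(0x00000001, 0x0, par)
bad = parity_ok(0x00000001, 0x0, par ^ 1)
print(par, good, bad)
```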
Arbitration and Error Signals
REQ# – O
  Asserted by initiator to request bus ownership
  Point-to-point connection to arbiter – each initiator has its own REQ# line
GNT# – I
  Asserted by system arbiter to grant bus ownership to the initiator
  Point-to-point connection from arbiter – each initiator has its own GNT# line
PERR# – I/O
  Indicates that a data parity error has occurred
  An agent that can report parity errors can have its PERR# turned off during PCI configuration
SERR# – I/O
  Indicates a serious system error has occurred, such as address parity error
  May invoke NMI (non-maskable interrupt, i.e., a restart) in some systems

Example – Basic Write
A four-DWORD burst from an initiator to a target
Addressing, handshaking, and data transfer phases

Write Example – Things to Note
The initiator has a phase profile of 3-1-1-1
  First data can be transferred in three clock cycles (idle + address + data = “3”)
  The 2nd, 3rd, and last data are transferred one cycle each (“1-1-1”)
If the profile is 5-1-1-1
  Medium decode – DEVSEL# asserted on 2nd clock after FRAME#
  One clock period of latency (or wait state) in the beginning of the transfer
  DEVSEL# asserted on clock 3, but TRDY# not asserted until clock 4
Total of 4 data phases, but required 8 clocks
  Only 50% efficiency

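The 50% figure is the number of data phases divided by the total clocks in the profile. The short sketch below repeats that arithmetic for both profiles mentioned above; `profile_efficiency` is an illustrative helper, not a standard tool.

```python
def profile_efficiency(profile):
    """Return (data_phases, total_clocks, efficiency) for a profile like '5-1-1-1'."""
    clocks = [int(n) for n in profile.split("-")]
    data_phases = len(clocks)       # one entry per data phase
    total_clocks = sum(clocks)      # clocks spent on the whole burst
    return data_phases, total_clocks, data_phases / total_clocks


slow = profile_efficiency("5-1-1-1")
fast = profile_efficiency("3-1-1-1")
print(slow, fast)
```

With the 3-1-1-1 profile the same four data phases take six clocks, so efficiency rises to about 67%.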
Target Address Decoding
PCI uses distributed address decoding
  A transaction begins over the PCI bus
  Each potential target on the bus decodes the transaction’s PCI address to determine whether it belongs to that target’s assigned address space
    One target may be assigned a larger address space than another, and would thus respond to more addresses
  The target that owns the PCI address then claims the transaction by asserting DEVSEL#

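Distributed decoding means every target compares the address against its own range, and only the owner claims the transaction. The sketch below assumes each target owns one contiguous range described by a base and a size; the `Target` class, the device names, and the addresses are invented for illustration.

```python
class Target:
    """A PCI-style target that owns one contiguous address range (assumed)."""

    def __init__(self, name, base, size):
        self.name = name
        self.base = base
        self.size = size

    def decodes(self, address):
        """True if the address falls inside this target's assigned space."""
        return self.base <= address < self.base + self.size


def claim(targets, address):
    """Return the name of the target that would assert DEVSEL#, or None."""
    claimants = [t.name for t in targets if t.decodes(address)]
    if len(claimants) > 1:
        raise ValueError("overlapping address spaces: " + ", ".join(claimants))
    return claimants[0] if claimants else None


targets = [
    Target("video", 0xE0000000, 0x01000000),   # larger space, more addresses
    Target("nic", 0xF0000000, 0x00001000),
]
owner = claim(targets, 0xE0001234)
nobody = claim(targets, 0x10000000)
print(owner, nobody)
```

If no target claims the address, the result is None, which corresponds to a transaction that no device selects.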
More Terms
Turnaround cycle
  “Dead” bus cycle to prevent bus contention
Wait state
  A bus cycle where it is possible to transfer data, but no data transfer occurs
  Wait states may be inserted dynamically by the initiator or target
  Target deasserts TRDY# to signal it is not ready
  Initiator deasserts IRDY# to signal it is not ready
Target termination
  Either agent may signal the end of a transaction
  The target signals termination by asserting STOP#
  The initiator signals completion by deasserting FRAME#

Zero and One Wait State
A one-wait-state agent inserts a wait state at the beginning of each data phase
  This is done if an agent – built in older, slower silicon – needs to pipeline critical paths internally
  Reduces bandwidth by 50%
The need to insert a wait state is typically an issue only when the agent is sourcing data (initiator write or target read)
  This is because such an agent would have to sample its counterpart’s xRDY# signal to see if that agent accepted data, then fan out to 36 or more clock enables (for AD[31:0] and possibly C/BE#[3:0]) to drive the next piece of data onto the PCI bus . . . all within 11 ns!
