Introduction to Reconfigurable Computing
• Configurable Computing (CC) Attempts To Increase Performance And Silicon Utilization Efficiency Through Logic Recycling using FPGA and FPGA-like Devices
• Hardware Algorithms Can Be “Paged” Into/Out Of CC Modules Much As Operating Systems Perform Software Paging
• Factors Impacting the Performance
  - Logic Speed
  - Speed Of Reconfiguration
  - Flexibility Of Configuration
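The paging analogy can be made concrete with a small model. The sketch below is illustrative only: the function name, the slot count, the timing values, and the least-recently-used eviction policy are all assumptions made for the example, not details taken from the slides. It shows how the speed of reconfiguration enters the total run time whenever a needed hardware algorithm is not already resident.

```python
from collections import OrderedDict


def run_schedule(requests, slots, reconfig_us, exec_us):
    """Simulate paging hardware algorithms into a fixed number of CC slots.

    requests    : algorithm names, in the order they are needed
    slots       : how many algorithms can be resident at once (hypothetical)
    reconfig_us : time to load one configuration, microseconds (hypothetical)
    exec_us     : time to run one request once loaded, microseconds (hypothetical)
    Returns (number_of_reconfigurations, total_time_us).
    """
    resident = OrderedDict()  # least recently used entry comes first
    reconfigs = 0
    total_us = 0.0
    for name in requests:
        if name in resident:
            resident.move_to_end(name)        # hit: already configured
        else:
            if len(resident) >= slots:
                resident.popitem(last=False)  # evict least recently used (assumed policy)
            resident[name] = True
            reconfigs += 1
            total_us += reconfig_us           # pay the reconfiguration cost
        total_us += exec_us
    return reconfigs, total_us


requests = ["fir", "fft", "fir", "crc", "fft", "fir"]
print(run_schedule(requests, slots=2, reconfig_us=30.0, exec_us=10.0))  # (5, 210.0)
print(run_schedule(requests, slots=3, reconfig_us=30.0, exec_us=10.0))  # (3, 150.0)
```

In this toy model, adding one slot removes two reconfigurations. Shortening `reconfig_us` would reduce the penalty of each miss instead.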
Resource Utilization
• Standard Microprocessor
  - Specialized Unit For Each Essential Task
  - Unit Functionality Fixed
  - Idle Units Lower Silicon Utilization
  - Basic Algorithms Fixed
• Reconfigurable Processor
  - Each Unit Specialized To Fit Task
  - Unit Functionality Alterable At Run Time
  - Idle Units Reconfigured For New Tasks
  - Basic Algorithms Can Be Tailored To Application
[Figure: processor block diagram; labels: Micro Code, Address Generation, Clock Gen., ALU, Registers, Cache, FPU, I/O]
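The utilization contrast can be illustrated with a toy calculation. Everything in the sketch below is assumed for illustration (the unit mix, the workload phases, and both function names), and it ignores reconfiguration time entirely. It only shows why fixed-function units sit idle when a phase does not need them, while interchangeable units can be reassigned.

```python
def fixed_utilization(units, phases):
    """Average fraction of fixed-function units kept busy.

    units  : dict mapping unit kind -> number of units of that kind
    phases : list of dicts mapping unit kind -> units demanded in that phase
    """
    total = sum(units.values())
    busy = 0
    for phase in phases:
        # A fixed unit can only serve demand for its own kind.
        busy += sum(min(units.get(kind, 0), need) for kind, need in phase.items())
    return busy / (total * len(phases))


def reconfigurable_utilization(total_units, phases):
    """Average fraction of units kept busy when any unit can take any role."""
    busy = sum(min(total_units, sum(phase.values())) for phase in phases)
    return busy / (total_units * len(phases))


# Hypothetical unit mix and workload, chosen only to make the contrast visible.
units = {"alu": 2, "fpu": 1, "io": 1}
phases = [{"alu": 4}, {"fpu": 2, "io": 1}, {"io": 3}]

print(round(fixed_utilization(units, phases), 3))        # 0.417
print(round(reconfigurable_utilization(4, phases), 3))   # 0.833
```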
FPGAs vs. DSPs
• FPGAs can support multiple memory ports
• FPGAs outperform DSPs:
  - Parallelism in the algorithm
  - Simple operations in a fixed sequence
  - FPGAs provide greater computational density using less power
  - Large data sets, low resolution (8 - 12 bits)
  - Simple control
• DSPs outperform FPGAs
  - MAC operations
  - Complex arithmetic
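For readers unfamiliar with the term, a MAC (multiply-accumulate) operation is the inner step of a filter such as an FIR. The sketch below is a plain software model with a hypothetical function name, not code for any particular DSP or FPGA. Each pass through the inner loop is one MAC. The products that make up a single output do not depend on one another, which is one example of the algorithm-level parallelism listed above.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter: each output is a chain of multiply-accumulates."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


# Two-tap moving sum over small integer samples.
print(fir_mac([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```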
Colt Integrated Circuit
Colt Prototype
HP 0.5um 3 Metal, PGA-132 (MOSIS)
16 FUs, XBar, DPs
5.5mm x 6.1mm
50 MHz
Full-scale device: Stallion
2nd Generation Processor--The Stallion
• Successor of the Colt chip
• Six data ports achieving basic pipelined data-flow control
• Smart crossbar for the purpose of passing programming and data words to and from data-ports and meshes
• Two IFU meshes and 4 multipliers
• Ready for fabrication
The Stallion Organization
Allocable Resources
[Figure: block diagram; labels: Programmable Data Ports, “Smart” Crossbar Network, IFU Mesh (computational), Integer Multipliers (allocable), Stream I/O]
Example Sub-Mesh Mapping
• 4x4 sub matrix of IFUs
• Factorial computation
• Demonstrates conditional execution capabilities
• Configured in < 30 usec
[Figure: factorial computation mapped onto the sub-mesh; recoverable labels include Port Left, Port Right, Multiplier, Dec, Select, Load, Pass, Delay, flags F1 and F2, and the outputs Overflow and Result]
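The conditional behaviour of the factorial example can be modelled in software. The sketch below does not reproduce the actual IFU configuration, which is not recoverable from the figure. The function name and the 16-bit word width are assumptions. It only mirrors the elements named in the figure: a decrementing counter, a multiply step that runs while the counter is still positive, and separate result and overflow outputs.

```python
def factorial_stream(n, word_bits=16):
    """Software model of a decrement-and-multiply factorial loop.

    word_bits is a hypothetical word width used only to show an overflow flag.
    Returns (result, overflow); the result wraps to word_bits on overflow.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    limit = (1 << word_bits) - 1
    counter, result, overflow = n, 1, False
    while counter > 0:           # conditional execution: multiply only while counter > 0
        result *= counter
        if result > limit:
            overflow = True      # sticky flag, like a dedicated overflow output
            result &= limit      # keep only the low word_bits bits
        counter -= 1             # decrement stage
    return result, overflow


print(factorial_stream(5))      # (120, False)
print(factorial_stream(9)[1])   # True: 9! does not fit in 16 bits
```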
System Board Layout
[Figure: system board layout; labels: Crossbar, Slot]
Features
• Each slot contains a single port
• Clusters connected using a module to bridge adjacent slots
• Bridging extendible to other system boards
• System is inherently scalable
Core Computing Component
• XILINX FPGA (currently used in test-bed)
• Problem: Pipeline processing fast but not readily modified with current ASIC design practice
• Solution:
• Colt chip (fabricated and tested)
  - 0.8 um HP CMOS process fabricated by MOSIS
  - Run time configurable
  - 50 MHz clock
• Stallion chip (designed but not yet fabricated)
  - 0.5 um HP CMOS process
  - 64 functional units in mesh
  - Dedicated multiplier
  - Six data ports
  - 100 MHz clock
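The unit counts and clock rates quoted for the two chips allow a rough upper-bound comparison. The sketch below assumes each functional unit completes one operation per clock cycle. That assumption is not stated in the slides, so the figures are idealized peaks rather than measured performance. The Colt unit count (16 FUs) comes from the Colt prototype slide.

```python
def peak_ops_per_second(functional_units, clock_hz, ops_per_unit_per_cycle=1):
    """Idealized peak rate: every unit busy on every cycle (an assumption)."""
    return functional_units * clock_hz * ops_per_unit_per_cycle


colt = peak_ops_per_second(16, 50_000_000)        # 16 FUs at 50 MHz
stallion = peak_ops_per_second(64, 100_000_000)   # 64 FUs at 100 MHz

print(colt)             # 800000000
print(stallion)         # 6400000000
print(stallion / colt)  # 8.0
```

Under this assumption the Stallion design has eight times the peak rate of Colt: four times the units at twice the clock.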