
Introduction to Reconfigurable Computing: Configurable Computing (CC) attempts to increase performance and silicon utilization efficiency through logic recycling using FPGA and FPGA-like devices.

Published by , 2017-03-13 03:20:03

Introduction to Reconfigurable Computing - Spread spectrum

Introduction to Reconfigurable Computing

• Configurable Computing (CC) Attempts To Increase Performance And Silicon Utilization Efficiency Through Logic Recycling Using FPGA and FPGA-like Devices
• Hardware Algorithms Can Be “Paged” Into/Out Of CC Modules Much As Operating Systems Perform Software Paging
• Factors Impacting the Performance
  - Logic Speed
  - Speed Of Reconfiguration
  - Flexibility Of Configuration
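The paging analogy can be illustrated with a small simulation. This is a minimal sketch, not a model of any real device: the class `HardwarePager`, the function `simulate`, the least-recently-used replacement policy, and the timing values are all assumptions made for illustration. It shows how reconfiguration speed and the number of configurations that fit on the device together determine total run time.

```python
from collections import OrderedDict


class HardwarePager:
    """Toy model of paging hardware algorithms into a CC module (LRU policy assumed)."""

    def __init__(self, slots, reconfig_us, exec_us):
        self.slots = slots              # configurations that fit on the device at once
        self.reconfig_us = reconfig_us  # assumed cost of loading one configuration
        self.exec_us = exec_us          # assumed cost of running a resident configuration
        self.resident = OrderedDict()   # loaded configurations, least recently used first
        self.faults = 0                 # number of reconfigurations ("page faults")
        self.elapsed_us = 0.0

    def run(self, algorithm):
        if algorithm in self.resident:
            # Already configured: just mark it as most recently used.
            self.resident.move_to_end(algorithm)
        else:
            # Not resident: pay the reconfiguration cost, evicting if full.
            self.faults += 1
            self.elapsed_us += self.reconfig_us
            if len(self.resident) >= self.slots:
                self.resident.popitem(last=False)
            self.resident[algorithm] = True
        self.elapsed_us += self.exec_us


def simulate(trace, slots, reconfig_us=30.0, exec_us=10.0):
    """Run a sequence of algorithm names and return (faults, elapsed microseconds)."""
    pager = HardwarePager(slots, reconfig_us, exec_us)
    for name in trace:
        pager.run(name)
    return pager.faults, pager.elapsed_us
```

With the hypothetical trace `["fir", "fft", "fir", "crc", "fft"]`, two slots cause four reconfigurations while three slots cause only three, so a device that holds more configurations, or reconfigures faster, spends less time paging.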

Resource Utilization

• Standard Microprocessor
  - Specialized Unit For Each Essential Task
  - Unit Functionality Fixed
  - Idle Units Lower Silicon Utilization
  - Basic Algorithms Fixed
• Reconfigurable Processor
  - Each Unit Specialized To Fit Task
  - Unit Functionality Alterable At Run Time
  - Idle Units Reconfigured For New Tasks
  - Basic Algorithms Can Be Tailored To Application

[Figure: processor block diagram with units labeled Micro Code, Address Generation, Clock Gen., ALU, Registers, Cache, FPU, and I/O]

FPGAs vs. DSPs

• FPGAs can support multiple memory ports
• FPGAs outperform DSPs:
  - Parallelism in the algorithm
  - Simple operations in a fixed sequence
  - FPGAs provide greater computational density using less power
  - Large data sets, low resolution (8 - 12 bits)
  - Simple control
• DSPs outperform FPGAs
  - MAC operations
  - Complex arithmetic
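To make "MAC operations" and "low resolution" concrete, the following is a minimal sketch of a direct-form FIR filter written as a chain of multiply-accumulate steps, plus a helper that rounds a value to a signed fixed-point word of a chosen width. The function names `quantize` and `fir_mac` are illustrative, not taken from the slides. The inner loop is the kind of serial MAC sequence a DSP is built for; on an FPGA the same taps could in principle be laid out as parallel multipliers.

```python
def quantize(x, bits):
    """Round x in [-1, 1) to a signed fixed-point integer of the given bit width."""
    scale = 1 << (bits - 1)
    q = round(x * scale)
    # Saturate to the representable range instead of wrapping around.
    return max(-scale, min(scale - 1, q))


def fir_mac(samples, coeffs):
    """Direct-form FIR filter: each output is a sum of multiply-accumulate steps."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one MAC operation per tap
        out.append(acc)
    return out
```

For instance, `quantize(0.5, 8)` maps to the 8-bit value 64, and a two-tap filter with unit coefficients sums each sample with its predecessor.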

Colt Integrated Circuit

Colt Prototype
• HP 0.5um 3 Metal, PGA-132 (MOSIS)
• 16 FUs, XBar, DPs
• 5.5mm x 6.1mm
• 50 MHz

Full-scale device: Stallion

2nd Generation Processor-- The Stallion

• Successor of the Colt chip
• Six data ports achieving basic pipelined data-flow control
• Smart crossbar for the purpose of passing programming and data words to and from data-ports and meshes
• Two IFU meshes and 4 multipliers
• Ready for fabrication

The Stallion Organization

[Figure: Stallion block diagram of allocable resources, with blocks labeled Programmable Data Ports, “Smart” Crossbar Network, IFU Mesh (computational), Stream I/O, and Integer Multipliers (allocable)]

Example Sub-Mesh Mapping

• 4x4 sub matrix of IFUs
• Factorial computation
• Demonstrates conditional execution capabilities
• Configured in < 30 usec

[Figure: Factorial mapped onto a 4x4 IFU sub-mesh. Cell functions shown include Pass, Load, Dec, Multiplier, Select, Delay, and Output, gated by data-valid conditions and flags F1 and F2. Left and right ports feed the mesh; the ports labeled 3 and 4 carry Overflow and Result.]
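The behavior of the factorial mapping can be sketched in software. This is a minimal behavioral sketch, not the actual cell-by-cell IFU configuration: the function name `factorial_dataflow` and the 16-bit word width are assumptions. It mirrors the elements named in the figure: a decrementing counter, a multiplier that is used only while the counter is still valid, a Result output, and an Overflow flag.

```python
def factorial_dataflow(n, width=16):
    """Decrement/multiply loop returning (result, overflow) for an assumed word width."""
    limit = (1 << width) - 1
    acc, counter, overflow = 1, n, False
    while counter > 0:        # conditional execution: multiply only while counter is valid
        acc *= counter
        if acc > limit:       # product no longer fits in the assumed word width
            overflow = True
            acc &= limit      # keep only the low-order bits, as fixed-width hardware would
        counter -= 1          # the "Dec" step
    return acc, overflow
```

With a 16-bit word, `factorial_dataflow(8)` still fits (40320), while `factorial_dataflow(9)` raises the overflow flag and returns only the low 16 bits of the product.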

System Board Layout

[Figure: system board showing slots interconnected by crossbars]

Features
• Each slot contains a single port
• Clusters connected using a module to bridge adjacent slots
• Bridging extendible to other system boards
• System is inherently scalable

Core Computing Component

• XILINX FPGA (currently used in test-bed)
• Problem: Pipeline processing fast but not readily modified with current ASIC design practice
• Solution:
• Colt chip (fabricated and tested)
  - 0.8 um HP CMOS process fabricated by MOSIS
  - Run time configurable
  - 50 MHz clock
• Stallion chip (designed but not yet fabricated)
  - 0.5 um HP CMOS process
  - 64 functional units in mesh
  - Dedicated multiplier
  - Six data ports
  - 100 MHz clock
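The figures quoted for the two chips allow a rough upper-bound comparison. The sketch below assumes every functional unit completes one operation per clock cycle, which the slides do not state, and uses the 16 functional units listed for the Colt prototype and the 64 listed for Stallion. The function name `peak_ops_per_second` is illustrative.

```python
def peak_ops_per_second(functional_units, clock_hz, ops_per_unit_per_cycle=1):
    """Upper bound on throughput, assuming every unit is busy on every cycle."""
    return functional_units * clock_hz * ops_per_unit_per_cycle


colt = peak_ops_per_second(16, 50_000_000)       # 16 FUs at 50 MHz
stallion = peak_ops_per_second(64, 100_000_000)  # 64 FUs at 100 MHz
ratio = stallion / colt                          # relative peak throughput
```

Under that assumption the peak rates are 0.8 and 6.4 billion operations per second, an eightfold increase: four times the units at twice the clock. Real throughput would depend on how well an application keeps the mesh occupied.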

