OVERVIEW OF OVERSAMPLING CLOCK AND DATA
RECOVERY CIRCUITS
S. I. Ahmed Tad A. Kwasniewski
Carleton University Carleton University
Department of Electronics Department of Electronics
Ottawa ON K1S 5B6 Ottawa ON K1S 5B6
email: [email protected] email: [email protected]
Abstract NRZ data Symbol period = Bit period
Phase-Locked Loop (PLL) based Clock and Data Recovery Recovered clock T
(CDR) circuits use a 2x-oversampling (2XO) of the incoming
Non-Return to Zero (NRZ) data stream to recover the data. As falling edge aligned rising clock edge
an extension of the idea, 3x-oversampling (3XO) CDR circuits with data transitions makes a decision
provide improved performance in the presence of total asym-
metric jitter. This paper presents an overview of the oversam- Fig. 1. Conventional 2x-oversampling concept
pling CDR circuits with an emphasis on digital architectures.
These include, but are not limited to, the 3XO jitter-tolerant improve CDR performance. We will review the mixed-signal,
variable-interval 3XO architecture, the 3XO eye-tracking archi- digital and all-digital implementations of 3XO-CDR circuits
tecture, and the blind oversampling architecture. We propose a that have been presented in the literature during the last decade.
modified architecture that utilizes multiple rotating phases to We will present a modified CDR architecture that improves one
improve the performance of the 3XO eye-tracking architecture. aspect of an existing 3XO design. Simulation results are pre-
sented that show the equivalence of the new architecture with
Keywords: CDR, PLL, DR, Oversampling, Jitter Tolerance the existing one. Using this result, some other research possibil-
ities are also mentioned.
1. INTRODUCTION
2. FILTER-TYPE CDR CIRCUIT
This paper concentrates on CDR architectures used at the
receiver end of a Serializer-Deserializer (SERDES) chain. A The main task of CDR circuits is to recover the clock that is
CDR circuit has to counter the amplitude and phase degrada- not transmitted with the NRZ data in order to save power and
tions induced by the transmitter, channel and the receiver as it avoid skew at the transmitter end. The block diagram of a filter-
recovers the clock and retimes the data [1]. The channel can be style or a microwave-style CDR is shown in Fig. 2 [3]. The
a satellite-link, a fiber-optic cable, a coaxial cable, a backplane clock component has to be created in the received data spec-
Copper trace or an integrated circuit (IC) interconnect over a trum using an edge-detector block (d/dt) followed by a non-lin-
semiconductor substrate each with its own challenges and limi- earity (x2) and, subsequently, to be isolated using a high-Q
tations [2]. With the IC technologies undergoing constant fea- Bandpass Filter (BPF) or an external Surface-Acoustic Wave
ture-size reduction, it is in the interest of portability and reduced
time-to-market that digital, adaptive and automatically synthe- Variable Retimed
sizable CDR architectures be used and re-used. delay data
We will briefly discuss the filter-type CDR circuits to elabo- NRZ data DQ
rate the point that a PLL is not necessarily required to imple- + jitter
ment a CDR circuit. Nevertheless, PLLs provide a cost-
effective and a well-studied CDR implementation alternative CK
despite the numerous circuit- and system-design challenges
associated with them. As shown in Fig. 1, one of the clock d/dt x2 Recovered
edges is aligned with the data edges using a PLL. The other BPF clock
clock edge samples the incoming data to implement a 2XO-
CDR circuit with a maximum timing margin of 0.5 Unit Inter- Fig. 2. Filter-type CDR circuit [3]
vals peak-to-peak (UI p-p). A logical extension of this idea
would be to acquire more samples per period in order to
0-7803-8886-0/05/$20.00 ©2005 IEEE 1876
CCECE/CCGEI, Saskatoon, May 2005
Authorized licensed use limited to: Carleton University. Downloaded on July 17, 2009 at 14:01 from IEEE Xplore. Restrictions apply.
(SAW) filter. A high-Q on-chip filter that is implemented using Full rate NRZ data Symbol period = Bit period
a PLL with a narrow loop bandwidth is also being actively
researched in conjunction with novel receiver architectures [4]. Recovered clock uses both edges T
A filter-type CDR circuit using MESFET-based off-the-shelf
chips has been reported [5] with a bit rate of 35.1 Gb/s. By D+ve D+ve and D-ve to be
using this as a simulation technique, however, a large number of D-ve multiplexed
CDR circuits in a system can be replaced by black boxes to
reduce simulation times. Fig. 4. Half-rate PLL-type CDR concept
3. PLL-BASED CDR ARCHITECTURE cycle problems, data feedthrough, increased VCO jitter (due to
high-VCO gain resulting from supply voltage reduction) and
In a conventional PLL-based CDR circuit shown in Fig. 3, poor performance in the presence of asymmetric jitter. In order
the clock is recovered using a PLL. The PLL acts like high-Q to achieve high data rates while maintaining an acceptable per-
bandpass filter that can switch from one frequency to another formance, reduced-rate architectures are employed [13]. A
electronically, if there exists a frequency divider (not shown), in Novel 1/8th-rate PD implementation is reported in [14].
the feedback loop. A D-type flipflop forms the Data Recovery
(DR) or the decision circuit. The operating principle for this 3.2. Reduced-Rate CDR Architectures
CDR was introduced earlier in Fig. 1. In practice, one would
have to carefully analyze jitter, systematic skews, effect of long In order to sample the full-rate data stream with a half-rate
run-lengths and acquisition of lock problems [6]. PLL-type VCO, one has to use both the edges of the recovered clock as
CDR circuits are roughly classified according to the type of shown in Fig. 4 and later multiplex the two data streams
Phase Detector (PD) used. The most important ones are the lin- labelled D+ve and D-ve. The PD could either be linear [15] or
ear and the non-linear CDR structures. A linear CDR circuit non-linear [16]. If further reduction in clock rate is sought, one
uses a Hogge’s Phase Detector [7] or one of its variants [8], can use additional equi-distant phases provided by a locked
whereas a non-linear CDR circuit uses a Bang-Bang Phase oscillator. The main problem in such a case becomes the gener-
Detector (BBPD). It is also known as ‘Alexander PD’ [9] or an ation of equi-distant phases, modeling and capacitance issues
‘Early-Late PD’. Bang-Bang PDs provide high gain and there- associated with multiplexer and de-multiplexer circuit design.
fore require no charge-pump (CP), no limiting pre-amplifier,
automatic compensation of timing offsets since these PDs use 4. VARIABLE-INTERVAL OVERSAMPLING
sampled data, the possibility of multi-bit sampling in one clock CDR ARCHITECTURE
cycle to implement a reduced-rate architecture and no fre-
quency drifts during long run-lengths [10]. A sample-and-hold
style PD that combines the advantages of a linear-PD and a
BBPD has been reported in [11]. For more information on mod-
ern PD architectures, the reader can refer to [12].
3.1. Disadvantages A mixed-signal, variable-interval 3XO CDR circuit has
been proposed in [17][18]. This 3XO concept [17] is shown in
Traditional PLL-based CDR circuits suffer from device Fig. 5 . Using a VCO and a DLL combination, the architecture
speed limitations with increasing data rates, degradation of on- tracks the data edges and then puts the sampling strobe exactly
chip Q for inductors (if an LC-VCO is used), 50 percent duty- in the centre of the two data edges. In this quarter-rate architec-
ture operating at 5 Gb/s (shown in Fig. 6), four NRZ data bits
Retimed data
NRZ data + jitter DQ
Phase CK
Detector Recovered
VCO clock
Timing CP LPF
Recovery
PLL
Fig. 3. Linear PLL-based CDR architecture Fig. 5. Jitter pdf tracking 3XO concept [17]
1877
Authorized licensed use limited to: Carleton University. Downloaded on July 17, 2009 at 14:01 from IEEE Xplore. Restrictions apply.
NRZ sampler Edge-Detect / DSP
data Buffer(FIFO) control
block
3XO Interpolate
clock missing Pick
edges Data
Fig. 8. Blind Oversampling Architecture block diagram
Fig. 6. Variable Interval Oversampling Architecture with Motorola’s 16x-oversampling Universal Synchronous and
block diagram [18] Asynchronous Transceiver (UART) chips and later used in
other similar industrial derivatives and variants [21]. An odd-
are recovered every clock cycle. The ring oscillator locks to number of samples (three or five) are acquired per bit. The data
1.25 GHz using a ‘reference loop’. Once the frequency-lock is edges are detected using an XOR gate (and the missing ones are
signaled by the ‘lock-detect’ circuit, the control voltages for the interpolated digitally). The sample picked using the center-
DLL and the PLL become independent of each other. The DLL phase is declared as the correct data bit. Majority-voting can
tracks the jitter probability density function (pdf) of the incom- also be employed but is less superior than center-picking [20].
ing data edges using the ‘eye-measuring loop’ and puts the sam- The block diagram of the architecture appears in Fig. 8. Its main
pling clock edge in the centre of the two edge-clocks. The BER advantage is the all-digital nature and the main disadvantage is
is improved and a high-frequency jitter tolerance of 0.65 UI p-p the extra power and increased latency due to the DSP core and
in the presence of asymmetric jitter is achieved. the algorithm. This operation was named ‘phase-picking’ [20],
although a data bit acquired by a particular phase is picked and
As opposed to the 2XO architectures, this 3XO circuit does not the phase itself. The ambiguity can be resolved by the con-
not acquire equidistant samples except at the end of tracking text of the discussion.
stage, so it is not strictly a 3XO CDR circuit. The authors call it
a ‘variable-interval’ oversampling circuit. It is possible to dis- 6. EYE-TRACKING DR ARCHITECTURE
card the extra information (or rather not collect it all) due to the
presence of a three loops that track the data edges so that three An all-digital DR circuit core has been presented in [22] as
samples are collected only where these are the most relevant. shown in Fig. 9. This is a 3XO architecture that tracks the data
eye instead of the data edges. The edge-detection interval is
5. PHASE-PICKING OR BLIND OVERSAM- realized using CMOS style delay elements in the BBPD. The
PLING CDR ARCHITECTURE decisions are accumulated in a serial shift register and a rotating
clock phase pointer either accelerates or decelerates the phase
The blind oversampling [19] (also called ‘phase-picking’ in addition to providing an unlimited phase-range to the DR cir-
[20]) concept is shown in Fig. 7. This oversampling concept has cuit. The paper provides excellent design information and equa-
been around since the early 1970s when it was originally used tions. Fig. 10 shows the 3XO concept for this DR circuit. One
has to detect the edges and keep the recovered clock away from
NRZ these edges. The sampling occurs at the centre of the eye and
data provides a low BER and a high-frequency jitter tolerance of 0.7
3XO
clock
Edge
Detect
Data 1 0 0 1 1 0 1 0 1 0 1 Fig. 9. Eye-Tracking DR circuit block diagram [22]
Picking
Fig. 7. Blind oversampling concept
1878
Authorized licensed use limited to: Carleton University. Downloaded on July 17, 2009 at 14:01 from IEEE Xplore. Restrictions apply.
Din BBPD
CKD_Early
CKD_Center DQ
CKD_Late
CK Dout DR circuit
DQ
CK
DQ
CK
Fig. 10. Eye-Tracking 3XO concept Phase DLL-based local
Switcher INC/DEC Phase fref
UI p-p. The architecture is compliant with the SFI-5 specifica-
tion. Of all the architectures reviewed, this consumes the least N-phases Generator
power and area as well.
Fig. 11. Block Diagram of the Modified Architecture
7. OTHER OVERSAMPLING CDR ARCHI-
TECTURES data frequency. The BBPD architecture shown in [22] (also see
Fig. 9) is retained, but the delay elements are removed and the
A true phase-picking or phase-selection CDR architecture Early, Centre and Late Phases as shown in Fig. 11 are used to
would pick and use one of the phases directly [23][24] or feed it clock the three front-end flipflops. This implements a 3XO all-
back to another PLL in order to achieve further dithering of the digital CDR architecture that doesn’t suffer from PVT induced
jitter [23]. Some of the other important publications in this field jitter tolerance shifts due to CMOS type delay blocks.
are [25][26][27][28]. One noteworthy entry is a quad 3.125
transceiver chip featuring an analog phase rotator [29]. Due to The phase resolution would depend on the speed of the tech-
limited space, we now limit this overview and present our mod- nology and the design requirements for the DLL. For achieving
ified DR architecture.
Jitter
8. MODIFIED DR ARCHITECTURE Frequency
In our previous paper [30], we swept three critical Digital IN – 1 Recovered
PLL (DPLL) parameters for the eye-tracking architecture [22]; Clock Phase
i.e., the edge-detection interval of the BBPD, the number of
clock phases and the phase update interval, in order to investi- I0
gate their effects on the jitter tolerance of the DR circuit. The
edge-detection interval for the DR circuit is a critical parameter I1 (a)
of this design. Its value has a conflicting influence on the high-
and low- frequency jitter tolerance of the DR circuit. If the I2
edge-detection window is too narrow, data edges cannot be
detected effectively and the low-frequency tracking ability of Jitter
the DPLL suffers. If the edge-detection window is too wide, the Frequency Center
high-frequency jitter tolerance deteriorates but the tracking
improves. The simulations were performed in Matlab/Simulink. Phase
Our modified architecture ameliorates this problem by Early IN – 1 I0 Late
removing the CMOS style delay blocks in the BBPD altogether. Phase Phase
Instead we rely on the presence of a DLL-based phase generator
[31] that can provide equidistant phases with much less Process, I1 (b)
Voltage and Temperature (PVT) dependence, perhaps for many
on-chip data lanes. In addition, we not only use one rotating I2
phase as shown in Fig. 12(a) [22], but three rotating phases, a
fixed distance apart (as seen in Fig. 12 (b)). The difference Fig. 12. Phase Rotation Concept (a) from [22] (b)
between this circuit and a conventional VCO type circuit would The modified architecture
be the unlimited phase range due to three rotating phases and
the fact that the phases generated by the DLL are not synchro-
nized to the data stream, although they have to be close to the
1879
Authorized licensed use limited to: Carleton University. Downloaded on July 17, 2009 at 14:01 from IEEE Xplore. Restrictions apply.
finer resolutions with slower CMOS technologies, a phase- • Changing from the full-rate architecture to half-rate or
interpolator could be used. With the improvement in the speed quarter-rate architecture and having a multiplicity of flip-
of the technology, the entire design could be implemented using flops in the BBPD clocked by DLL generated phases. The
digital-style delay cells in the DLL with sufficient resolution. multiple phases would still keep rotating.
8.1. Simulation results This requires the addition of some digital circuitry and
would produce a DR architecture that is all-digital, and mea-
Matlab simulation results available at the time were sures the eye-opening digitally with low power dissipation.
reported in [30]. In order to maintain continuity, we report the
comparison of the jitter tolerance for our version of the refer- 9. Conclusions
ence architecture and our modified architecture using the Mat-
lab/Simulink platform. The number of clock phases is eight. • Following the basic clock recovery concept behind a filter-
The phase update interval is 16 bits and the clock phases main- type CDR circuit, a brief review of PLL-style 2x-oversam-
tain a 0.25 UI separation. Fig. 13 shows that the two approaches pling CDR architectures was presented. Several 3x-over-
produce equivalent results for the jitter tolerance. A typical sim- sampling CDR architectures were reviewed including the
ulation of 2 Ps is shown in Fig. 14 (5000 bits are not shown). eye-tracking, jitter-tolerant variable-oversampling and the
The DR circuit uses three clock phases, a fixed distance (0.25 blind oversampling architectures.
UI) apart and comfortably tracks a 0.5 MHz, 8.5 UI p-p jitter
sinusoid as it recovers the NRZ data at 2.5 Gb/s. The simulation • A modified architecture was presented that minimizes the
is self-explanatory, except that any spike reaching a +1 thresh- shifts in jitter tolerance performance of the eye-tracking
old would have meant a bit error in Fig. 14(e) [30]. architecture using a 3XO architecture, with three rotating-
8.2. Future Research
If we take a closer look, the eye-tracking architecture and (a)
the jitter-tolerant variable-oversampling architecture can both (b)
be merged using the modified architecture. One would have to (c)
make two significant adjustments. These would be:
• The static distance between the early, center and late
phases could become dynamic, similar to the one presented
in [17], limited by the available phase resolution from the
DLL circuitry. This would allow one to measure the eye-
opening digitally. This information can be used to control
equalizers in the overall SERDES architecture.
(d)
(e)
Fig. 13. Jitter Tolerance Comparison: Reference Fig. 14. Typical Simulation Results (a) Early Clock
Design vs Modified design Phase (b) Center Clock Phase (c) Late Clock Phase
(d) Amplitude of added sinusoidal jitter (e) Residual
Phase Error
1880
Authorized licensed use limited to: Carleton University. Downloaded on July 17, 2009 at 14:01 from IEEE Xplore. Restrictions apply.
phases instead of one rotating phase, and utilizing a DLL- [15] J. Savoj, B. Razavi, “A 10-Gb/s CMOS clock and data recovery
based phase generator. circuit with a half-rate linear phase detector,” IEEE J. Solid-
State Circuits, vol. 36, no. 5, May 2001, pp. 761-768
• The two architectures were found equivalent with respect
to their jitter tolerance performance. [16] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery
Circuit with a half-rate binary Phase/Frequency Detector,” IEEE
• Some possibilities about work in progress were also men- J. Solid-State Circuits, vol. 38, no. 1, Jan. 2003, pp. 13-21
tioned. Current work is being done using the Verilog-A
platform, the added benefits of which will be described in [17] S-H. Lee, M.-S. Hwang et al, “A 5-Gb/s 0.25-Pm CMOS Jitter-
another publication along with a pertinent comparison of Tolerant Variable-Interval Oversampling Clock/Data Recovery
the simulation results. Circuit,” IEEE J. Solid-State Circuits, vol. 37, no. 12, Dec.
2002, pp. 1822-1830
Acknowledgements
[18] S-H. Lee, M.-S. Hwang et al, “A 5-Gb/s 0.25-Pm CMOS Jitter-
The authors would like to thank the Ministry of Science and Tolerant Variable Interval Oversampling Clock/Data Recovery
Technology, Government of Ontario Canada, NSERC, Altera Circuit,” ISSCC Dig., 2002, paper 15.5
Canada and Carleton University for their financial support.
Thanks to the many colleagues and friends for sharing their [19] J. Kim, D-K. Jeong, “Multi-Gigabit-Rate Clock and Data Recov-
expertise and for their patience during long discussions. ery Based on Blind Oversampling,” IEEE Communications
Mag., Dec. 2003, pp. 68-74
References
[20] C-K. K. Yang, “Design of High-Speed Serial Links in CMOS”,
[1] B. Razavi, in Monolithic Phase-Locked Loops and Clock Recov- Ph.D. Dissertation, CSL-TR-98-775, December 1998, Stanford
ery Circuits, Theory and Design, IEEE Press 1996 University, California
[2] M. Horowitz, C-K. K. Yang, S. Sidiropoulos, “High-Speed Elec- [21] R. Wagner, DAN 108, Data Comm. Application Note, EXAR Cor-
trical Signaling: Overview and Limitations,” IEEE Micro, 1998, poration, California
pp. 12-24
[22] Y. Miki et. al, “A 50-mW/ch 2.5-Gb/s/ch Data Recovery Circuit
[3] R. Walker, “Clock and Data Recovery for Serial Digital Commu- for the SFI-5 Interface with Digital Eye-Tracking,” IEEE J.
nications,” ISSCC Short Course, Feb. 2002 Solid-State Circuits, vol. 39, no. 4, Apr. 2004, pp 613-621
[4] C. DeVries, R. Mason, “A 0.18 Pm CMOS 900 MHz receiver [23] P. Larsson, “A 2-1600 MHz CMOS Clock Recovery PLL with
front-end using RF Q-enhanced filters,” ISCAS 2004, May 2004, Low-Vdd Capability,” IEEE J. Solid-State Circuits, vol. 34, no.
vol. 4, pp. IV-325-328 12, Dec. 1999, pp. 1951-1960
[5] K. Murata, T. Otsuji, “A Novel Clock Recovery Circuit for Fully [24] J. M. Khoury, K. R. Lakshmikumar, “High-Speed Serial Trans-
Monolithic Integration,” IEEE Trans. on Microwave Theory and ceivers for Data Communication Systems,” IEEE Comms. Mag.,
Tech., vol. 47, no. 12, Dec. 1999, pp. 2528-2533 Jul. 2001, pp. 160-165
[6] B. Razavi, “Challenges in the Design of High-Speed Clock and [25] Y. Choi, D-K. Jeong, W. Kim, “Jitter Transfer Analysis of Tracked
Data Recovery Circuits,” IEEE Communications Magazine, Oversampling Techniques for Multigigabit Clock and Data
Aug. 2002, pp. 94-101 Recovery,” IEEE Trans. Circuits and Systems II, Analog and
Digital Signal Processing, vol. 50, no. 11, Nov. 2003
[7] C. R. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE
J. Lightwave Tech.,vol. 3, Dec. 1985, pp. 1312-1314 [26] Y-H. Lin, S.H.-L. Tu, “A pipelined serial data receiver with over-
sampling techniques for high-speed data communications,”,
[8] T. H. Lee, J. F. Bulzacchelli, “A 155-MHz Clock Recovery Delay- IEEE Conf. Electron Devices and Solid-State Circuits, 16-18th
and Phase-Locked Loop,” IEEE J. Solid-State Circuits, vol. 27, Dec. 2003
no. 12, Dec. 1992, pp. 1736-1745
[27] S. Kim, K. Lee, D-K. Jeong, D. D. Lee, A. G. Nowatzyk, “An 800
[9] J. D. H. Alexander, “Clock Recovery from Random Binary Data,” Mbps multi-channel CMOS serial link with 3x oversampling,”
Elect. Lett., vol. 11, Oct. 1975, pp. 541-542 IEEE Proc. CICC ‘95, pp. 451-455
[10] J. Lee, K. S. Kundert, B. Razavi, “Analysis and Modeling of [28] C-K. K. Yang, R. Farjad-Rad, M. A. Horowitz, “A 0.5-Pm CMOS
Bang-Bang Clock and Data Recovery Circuits,” IEEE J. Solid 4.0-Gbit/s serial link transceiver with data recovery using over-
State Circuits, vol 39, no. 4, Apr. 2004, pp. 613-621 sampling,” IEEE J. Solid-State Circuits, vol. 33, no. 5, May
1998, pp. 713-722
[11] S. B. Anand, B. Razavi, “A CMOS Clock Recovery Circuit for
2.5Gb/s NRZ Data,” IEEE J. Solid-State Circuits, vol. 36, no. 3, [29] D. Zheng, X. Jin, E. Cheung, M. Rana, G. Song, Y. Jiang, Y-H.
Mar. 2001, pp. 432-439 Sutu, B. Wu, “A Quad 3.125 Gb/s/Channel Transceiver withAn-
alog Phase Rotators,” ISSCC Dig. Tech. papers, 2002, paper 4.2
[12] S. Soliman, F. Yuan, K. Raahemifar, “An Overview Of Design
Techniques For CMOS Phase Detectors,” ISCAS 2002, vol. 5, [30] S. I. Ahmed, T. A. Kwasniewski, “An All-Digital Clock and Data
pp V-457 - V460, May 2002 Recovery Circuit Optimization Using Matlab/Simulink,” IEEE
ISCAS, May 2005 (Accepted)
[13] J. Lee, B. Razavi, “A 40-Gb/s clock and data recovery circuit in
0.18 Pm CMOS technology,” IEEE J. Solid-State Circuits, vol. [31] R. Zhang, G. S. La Rue, “Clock and Data Recovery Circuits with
38, no. 12, Dec 2003, pp. 2181-2190 Fast Acquisition and Low Jitter,” IEEE Workshop on Microelec-
tronics and Electron Devices, 2004, pp. 48-51
[14] A. Seedher, G. E. Sobelman, “Fractional-rate phase detectors for
clock and data recovery,” Proc. IEEE SoC Conf., 17-20 Sept.
2003, pp. 313 - 316
1881
Authorized licensed use limited to: Carleton University. Downloaded on July 17, 2009 at 14:01 from IEEE Xplore. Restrictions apply.