# Serializer/Deserializer Component Design and Test

Kahn Li Lim



Department of Electrical & Computer Engineering, McGill University, Montreal, Canada

June 2006

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Engineering

© Kahn Li Lim, 2006



Library and Archives Canada

Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

> ISBN: 978-0-494-28607-4

#### NOTICE:

The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.




# Abstract


Serializer/Deserializers (SerDes), commonly used in telecommunication networks, are now becoming widespread in computer and embedded systems to meet higher data bandwidth demands and support higher peripheral device performance requirements. These input/output (I/O) peripherals are designed to provide reliable high-speed data transfer to computers and embedded devices. This thesis presents a novel phase detector design for the clock recovery system of a multilevel SerDes component. The multilevel phase detector incorporates a front end built from an array of high-speed Flash analog-to-digital converters (ADCs) and an all-digital phase detection block. The all-digital design allows lower power consumption and easy transfer between technologies. This thesis also presents a Built-In Self-Test (BIST) component that enhances existing SerDes BIST measurements of duty cycle jitter. The measurement method provides better accuracy and is scalable owing to its digital implementation. We provide a histogram analysis of the enhanced BIST measurement for both constant and varying test inputs, and examine its repeatability and the effect of histogram bin size.

# Résumé

The Serializer/Deserializer (SerDes) enables high-rate data transmission with high efficiency. It is commonly used in telecommunication networks, and its adoption has recently grown in computing and embedded systems to provide greater transfer capacity. This Master's thesis presents a new phase detector architecture for clock recovery in a SerDes module. The multilevel phase detector incorporates an array of Flash analog-to-digital converters (Flash ADCs) as its front end, together with an all-digital phase detector. This architecture allows reduced power consumption and easier transfer to other technologies. This thesis also presents a built-in self-test (BIST) component that enables better measurements of SerDes performance. The presented method offers better accuracy and greater flexibility thanks to its all-digital architecture. A histogram analysis is presented to improve the BIST measurement for constant and varying test inputs. Test repeatability and the effect of histogram bin size are also studied.

# Acknowledgement

I would like to take this opportunity to thank and show my sincere appreciation to my supervisor, Zeljko Zilic. He provided me with several interesting research opportunities and side projects while I was pursuing my Master's degree, which gave me valuable practical experience for undertaking challenges in the real working world.

I would like to extend my appreciation to my peers in the McGill Microelectronics and Computing Systems (MACS) lab. I would like to thank Jean-Samuel Chenard for sharing his wealth of technical expertise and for his help in completing my thesis. I am very grateful to Sadok Aouini for helping me with the process of submitting my thesis while I am in Vancouver. I would also like to thank Milos Prokic, Usman Khalid and Atanu Chattopadhyay for working with me on the McGill MicroProcessor Systems Board (McGumps).

I am grateful towards Micronet and Canadian Microelectronics Corporation (CMC) for providing financial support during my Master's studies. Again, I am grateful to my supervisor for his dedication in securing funding for me and the rest of his graduate students.

# Contents

| No.     | Title                                     |
|---------|-------------------------------------------|
| 1       | Introduction                              |
| 1.1     | Serializer/Deserializer Design            |
| 1.2     | Serializer/Deserializer Testing           |
| 1.3     | Thesis Contribution                       |
| 1.4     | Thesis Outline                            |
| 2       | Background                                |
| 2.1     | Serializer/Deserializer Components        |
| 2.2     | Clock Recovery Architecture               |
| 2.2.1   | Phase-picking Architecture                |
| 2.2.2   | Feedback Loop Architecture                |
| 2.2.3   | Multilevel Clock Recovery Architecture    |
| 2.3     | Jitter Types                              |
| 2.4     | Random Jitter                             |
| 2.4.1   | Random Jitter Sources                     |
| 2.5     | Deterministic Jitter                      |
| 2.5.1   | Periodic Jitter                           |
| 2.5.2   | Data Dependent Jitter                     |
| 2.5.2.1 | Duty Cycle Distortion                     |
| 2.5.2.2 | Inter-Symbol Interference                 |
| 2.6     | BIST – Complement of ATE                  |
| 2.6.1   | Loopback Test                             |
| 2.6.2   | Circular BIST                             |
| 2.6.3   | Vernier Delay Line                        |
| 2.6.4   | Undersampling BIST                        |
| 3       | Multilevel Phase Detector                 |
| 3.1     | Multilevel Signaling                      |
| 3.2     | Phase Detector for Multilevel CDR         |
| 3.2.1   | Structure and Operation                   |
| 3.2.2   | Transition Detection and Decomposition    |
| 3.2.3   | Early/Late Signal Generation              |
| 3.3     | 2-Bit Flash ADC                           |
| 4       | Serializer/Deserializer BIST              |
| 4.1     | Sampling Theorem and Aliasing             |
| 4.2     | Law of Large Numbers                      |
| 4.3     | Duty Cycle Measurement BIST               |
| 4.3.1   | Structure and Operation                   |
| 4.3.2   | Phase Lock Loops                          |
| 4.3.3   | FPGA Design                               |
| 4.3.4   | Duty Cycle Counter Controller             |
| 4.3.5   | PERL Processing                           |
| 4.4     | Results                                   |
| 4.4.1   | Histogram Theory of Operation             |
| 4.4.2   | Bin Size                                  |
| 4.4.3   | Single vs. Multiple PLL Sampling          |
| 4.4.4   | Repeatability                             |
| 4.4.5   | Varying Duty Cycle                        |
| 5       | Conclusion                                |
|         | References                                |


# **List of Figures**

| Figure       | Title                                                                                                   |
|--------------|---------------------------------------------------------------------------------------------------------|
| Figure 1.1   | System Performance vs. Bus Architecture                                                                 |
| Figure 1.2   | Relationship between Eye Diagram and Bathtub Curve                                                      |
| Figure 2.1   | SerDes Transceiver                                                                                      |
| Figure 2.2   | SerDes Functional Diagram                                                                               |
| Figure 2.3a  | Feedback Clock Data Recovery Architecture                                                               |
| Figure 2.3b  | Phase-picking Clock Data Recovery Architecture                                                          |
| Figure 2.4   | Digital Phase Detectors                                                                                 |
| Figure 2.5   | CDR PLL Designs over Time                                                                               |
| Figure 2.6a  | SSMMSE-based Phase Detection                                                                            |
| Figure 2.6b  | Proportional Phase Tracking Detection Method                                                            |
| Figure 2.7   | Jitter and Unit Interval                                                                                |
| Figure 2.8   | Jitter Decomposition                                                                                    |
| Figure 2.9   | Periodic Jitter Effects on Ideal Clock                                                                  |
| Figure 2.10  | Duty Cycle Distortion due to DC Offset                                                                  |
| Figure 2.11a | Uneven Data Width Output from Distorted Clock                                                           |
| Figure 2.11b | Low Timing Margin Data Sampling using Distorted Clock                                                   |
| Figure 2.12  | Circular BIST Flip-Flop                                                                                 |
| Figure 3.1   | PAM-2 and PAM-4 Signaling                                                                               |
| Figure 3.2   | Improved SNR using PAM-4 Signaling                                                                      |
| Figure 3.3   | Multilevel PAM-4 Clock Recovery with Multiple Clock Phases                                              |
| Figure 3.4   | Operation of Multilevel Phase Detector                                                                  |
| Figure 3.5   | Decomposition of Multilevel PAM-4 Signal for Clock Recovery                                             |
| Figure 3.6   | Digital Transition Detection and Decomposition Circuit                                                  |
| Figure 3.7   | Early/Late Signal Generation                                                                            |
| Figure 3.8   | Sample-and-Hold Preamplifier                                                                            |
| Figure 3.9   | Track and Latch Stage                                                                                   |
| Figure 3.10  | Regeneration Circuit Simulation                                                                         |
| Figure 3.11  | Flash ADC Output                                                                                        |
| Figure 3.12  | 3-Phase Flash ADC Sampling Output                                                                       |
| Figure 3.13  | Early/Late Signal Generation and Signal Decomposition Output                                            |
| Figure 3.14  | Sampling of Data Symbol at Different Offsets from Symbol Center                                         |
| Figure 3.15  | Phase Detection Output Sampled Across the Symbol Period                                                 |
| Figure 4.1   | Frequency Domain Effects                                                                                |
| Figure 4.2   | Time Domain Effects of Aliasing                                                                         |
| Figure 4.3   | Application of Duty Cycle Measurement                                                                   |
| Figure 4.4   | Structure and Operation of Duty Cycle Measurement BIST                                                  |
| Figure 4.5   | Undersampling Example of Recovered Clock Signal                                                         |
| Figure 4.6   | Multiple PLL Undersampling                                                                              |
| Figure 4.7   | Detailed FPGA Design of BIST                                                                            |
| Figure 4.8   | State Machine Implementation of Duty Cycle Controller                                                   |
| Figure 4.9   | Histogram of Cycle Offset Measurement with Bin Size of 200 ps                                           |
| Figure 4.10  | Histogram of Cycle Offset Measurement with Bin Size of 20 ps                                            |
| Figure 4.11  | Histogram of Cycle Offset Measurement with Bin Size of 10 ps                                            |
| Figure 4.12  | Histogram of Duty Cycle Measurement from SDA6000 using 20 bins                                          |
| Figure 4.13  | Histogram of Duty Cycle Measurement from SDA6000 using 100 bins                                         |
| Figure 4.14  | Histogram of Duty Cycle Measurement from SDA6000 using 2000 bins                                        |
| Figure 4.15  | Histogram of Cycle Offset Measurement using 1 PLL with 3 outputs                                        |
| Figure 4.16  | Histogram of Cycle Offset Measurement using 2 PLLs with 6 outputs                                       |
| Figure 4.17  | Histogram of Cycle Offset Measurement using 3 PLLs with 9 outputs                                       |
| Figure 4.18  | Histogram of Cycle Offset Measurement using 4 PLLs with 12 outputs                                      |
| Figure 4.19  | Histogram Mean Offset for Range of Duty Cycle Inputs                                                    |
| Figure 4.20  | Histogram of cycle offset measurement of a signal with randomly varying duty cycles                     |
| Figure 4.21  | Histogram of duty cycle measurement of a signal with randomly varying duty cycles using LeCroy SDA6000 with 2000 bins |
| Figure 4.22  | Histogram of duty cycle measurement of a signal with randomly varying duty cycles using LeCroy SDA6000 with 100 bins  |

# **List of Tables**

| Table     | Title                                                             |
|-----------|-------------------------------------------------------------------|
| Table 1.1 | Disadvantages of Existing Laboratory Tools and Production Testers |
| Table 4.1 | Altera Stratix PLL Features                                       |
| Table 4.2 | Logic and Memory Usage                                            |
| Table 4.3 | Histogram Parameters using Different Measurement Methods          |

# **Chapter 1**

# Introduction

#### **1.1 Serializer/Deserializer Design**

New information technology and networks, including powerful microprocessors and multimedia appliances with enormous bandwidth requirements, are pushing the limits of system performance and data transfer. Traditional techniques for increasing the performance of shared multi-drop buses, such as increasing frequency, widening the bus interface, pipelining transactions, splitting transactions and allowing out-of-order completion, create several design issues. Switched backplane topologies, in which boards are interconnected through a central switch fabric residing on a separate card, have become the topology of choice in today's high-speed digital systems [1]. Multiple line cards can connect to a switch fabric card through point-to-point serial links. Point-to-point links use high-speed serial Input/Output technology, or SerDes (Serializers/Deserializers), which allows lower pin counts on ASICs, saves board real estate by reducing the number of PCB traces, and optimizes signal integrity.

High-speed SerDes, commonly used in telecommunication and storage applications (SONET, Ethernet and Fibre Channel), are now also being adopted in computer applications. In computer and embedded systems, the need for higher bus performance is driven by the need for higher raw data bandwidth to support higher peripheral device performance requirements, and by the need for more system concurrency. Overall system bandwidth has also increased because of the growing use of DMA (direct memory access), smart processor-based peripherals and multiprocessing. To meet these bandwidth requirements, bus architectures are moving to packet-switched, point-to-point technology using both mesh and fabric backplane architectures, and new protocols such as RapidIO, PCI Express and Serial ATA are emerging (Figure 1.1). According to a statistical survey [2], the most common SerDes operates at 3.125 Gbps, and within two years the operating speed of next-generation systems will triple into the 10 Gbps range. Most SerDes come from off-the-shelf standard chips, FPGAs and in-house designs. The top concern with high-speed interconnect is signal integrity, including crosstalk, electromagnetic interference (EMI), jitter, reflections, package noise, skew and static.



Figure 1.1: System Performance vs. Bus Architecture. Higher system performance levels require adoption of point-to-point switched interconnects.

As next-generation systems adopt 10 Gbps backplane signaling, current low-cost backplane materials and connectors do not provide sufficient bandwidth to support this transmission rate. The bandwidth limitation is caused by dielectric loss, skin effect and impedance discontinuities. Above 2 GHz, channels vary significantly depending on the signaling layer (and thus the through/stub ratio of the vias), the trace length (and thus the dielectric loss) and the dielectric material [3]. Approaches to increasing backplane transmission speed fall into two main categories: passive and active techniques. Passive solutions use high-quality microwave substrate materials, innovative via-hole techniques and new connector technology [4, 5]. However, the passive approach requires costly microwave substrates and special high-bandwidth backplane connectors that may still have unacceptable transmission characteristics for long trace lengths.

Active solutions for increasing backplane throughput include adaptive (decision feedback, feed-forward) equalization [6], pre-emphasis, multilevel signaling, or a combination thereof [7, 8]. The active approach works for long trace lengths and is cost effective, requiring only line card replacements rather than replacement of the whole system. Rambus, Accelerant Networks and Lucent Technologies are among the commercial companies pursuing multilevel signaling schemes to increase bandwidth on backplane systems.
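As a rough illustration of why multilevel signaling relieves channel bandwidth, the Python sketch below maps bit pairs onto four PAM-4 levels and compares the symbol rate needed for 10 Gb/s against binary NRZ. The Gray mapping, the normalized levels ±1 and ±3, and the function names are assumptions chosen for illustration; they are not taken from the commercial schemes or the receivers cited above.

```python
# Gray-coded PAM-4 mapping (assumed for illustration): adjacent levels differ
# in one bit, so a one-level decision error corrupts only one of the two bits.
PAM4_GRAY = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def pam4_encode(bits):
    """Map a bit sequence of even length to PAM-4 symbol levels."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [PAM4_GRAY[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def symbol_rate(bit_rate_bps, bits_per_symbol):
    """Symbol (baud) rate needed to carry bit_rate_bps."""
    return bit_rate_bps / bits_per_symbol

def nyquist_frequency(bit_rate_bps, bits_per_symbol):
    """Fundamental of the fastest alternating pattern: half the symbol rate."""
    return symbol_rate(bit_rate_bps, bits_per_symbol) / 2

if __name__ == "__main__":
    print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))  # [-3, -1, 1, 3]
    for name, k in (("NRZ (PAM-2)", 1), ("PAM-4", 2)):
        print(name, symbol_rate(10e9, k) / 1e9, "GBd,",
              nyquist_frequency(10e9, k) / 1e9, "GHz Nyquist")
```

At 10 Gb/s this gives 10 GBd and a 5 GHz Nyquist frequency for NRZ versus 5 GBd and 2.5 GHz for PAM-4, which is the bandwidth saving that makes multilevel signaling attractive on lossy backplanes; the cost is a smaller eye height per level.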

Among multilevel signaling schemes, different receiver architectures with different phase detection schemes have been proposed [9], [10]. These receivers require complex analog phase detection blocks in their clock recovery architectures. In this work, a novel digital phase detector was designed for PAM-4 clock recovery [11]. The phase detector design is based on the Alexander phase detector commonly used in binary-signaling clock recovery systems, so the benefits inherent to the Alexander phase detector are carried over to the multilevel signaling domain. The all-digital implementation allows low-voltage operation and portability between technologies.
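For reference, the binary Alexander (bang-bang) phase detector that the multilevel design builds on can be modelled behaviourally as below. This is a sketch of the textbook two-level case only: it takes the previous bit sample, the edge sample between the two bits, and the current bit sample. The +1 (late) / -1 (early) sign convention and the function names are assumptions, and the sketch does not reproduce the multilevel decomposition logic of the proposed detector.

```python
def alexander_pd(prev_bit, edge, curr_bit):
    """Early/late decision of a binary Alexander phase detector.

    prev_bit: sample at the centre of the previous bit
    edge:     sample taken midway between the two bit centres
    curr_bit: sample at the centre of the current bit
    Returns +1 (clock late), -1 (clock early) or 0 (no transition, no information).
    """
    if prev_bit == curr_bit:
        return 0  # without a data transition the phase error is unobservable
    if edge == curr_bit:
        return +1  # edge sample already shows the new bit: sampling clock is late
    return -1      # edge sample still shows the old bit: sampling clock is early

def pd_decisions(centre_samples, edge_samples):
    """edge_samples[n] is taken between centre_samples[n] and centre_samples[n + 1]."""
    return [alexander_pd(centre_samples[n], edge_samples[n], centre_samples[n + 1])
            for n in range(len(centre_samples) - 1)]

if __name__ == "__main__":
    print(pd_decisions([0, 1, 1, 0], [1, 1, 1]))
```

In a clock recovery loop these +1/-1 decisions would typically be filtered and used to adjust the sampling phase; the zero output for runs of identical bits is why the line code must guarantee enough transitions.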


#### **1.2 Serializer/Deserializer Testing**

As SerDes transceivers are increasingly used for high-speed data communication and their data rates continue to climb, testing faces growing difficulties regarding the accuracy, complexity and cost of I/O characterization. At multi-Gbps operating speeds, probe contacts can disturb the signal integrity of the line. An eye diagram captured through external probing does not reproduce what the actual receiver captures, because the channel also includes PCB traces, vias, packages and pad capacitance. Besides the move to multi-Gbps data rates, next-generation computer chipset architectures will have devices with many ports (on the order of tens) of various high-bandwidth types, bringing major challenges to the high volume manufacturing (HVM) test environment [12]. Conventional per-pin Bit Error Rate (BER) testing of low-cost commodity multi-port logic devices results in high Automated Test Equipment (ATE) test cost and long test times when the BER requirement is below $10^{-12}$.

Bit Error Rate (BER) is the most fundamental figure of merit for communication system performance [13]; it is the number of bits received in error divided by the total number of bits received. Bit Error Rate Testers (BERTs) play a significant role in testing, as their measurement results are often used as the standard. A BERT consists of a Pattern Generator (PG) and an Error Detector (ED). The BERT Scan technique varies the placement of the data edge with respect to the clock edge to obtain a series of BER values, which together produce a bathtub curve. The bathtub curve measures the eye opening as a function of BER and also allows random and deterministic jitter to be separated [13]. The drawback of using BERTs for jitter testing is the long test time: for example, a bathtub curve down to a BER of $10^{-12}$ takes on the order of 2-8 hours to complete. In Figure 1.2, the jitter histogram shows the distribution of data transitions accumulated in the eye diagram. The jitter probability density function (PDF) is the jitter histogram rescaled so that its integral is unity. Figure 1.2 also shows the relationship between the eye diagram and the bathtub curve: the bathtub curve is obtained by integrating the jitter PDF, that is, from the cumulative distribution function (CDF).




Figure 1.2: Relationship between Eye Diagram and Bathtub Curve. The bathtub curve is the integral of the jitter PDF.
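The relationship between the jitter PDF and the bathtub curve can be reproduced numerically. The sketch below assumes purely Gaussian random jitter with a hypothetical RMS value of 0.03 UI on both eye crossings and a transition density of 0.5; real bathtub curves also contain deterministic jitter, which this model ignores. Each branch of the bathtub is a tail integral (one minus the CDF) of the jitter PDF beyond the sampling instant.

```python
import math

def q_function(x):
    """Gaussian tail probability P(X > x) for a standard normal X."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bathtub_ber(t_ui, sigma_ui, transition_density=0.5):
    """BER when sampling at t_ui (0 to 1 UI) with Gaussian edge jitter of RMS sigma_ui.

    The left data edge is centred at 0 UI and the right edge at 1 UI. Each term is
    a tail integral (1 - CDF) of the jitter PDF beyond the sampling instant, and an
    error can only occur when a transition is present.
    """
    late_left_edge = q_function(t_ui / sigma_ui)            # left edge lands after t
    early_right_edge = q_function((1.0 - t_ui) / sigma_ui)  # right edge lands before t
    return transition_density * (late_left_edge + early_right_edge)

if __name__ == "__main__":
    for t in (0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 1.0):
        print(f"t = {t:.1f} UI   BER = {bathtub_ber(t, sigma_ui=0.03):.2e}")
```

The printed values trace the bathtub shape: 0.25 at the crossings (half the transitions, half of those on the wrong side) and a steep fall toward the eye centre. Reading off where the curve crosses a target BER such as $10^{-12}$ gives the eye opening at that BER.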

Digital Automated Test Equipment (ATE) systems are effective for testing high pin count, low-frequency complex logic because of the availability of many parallel buses in the 100 Mbps range and automation infrastructure. However, ATE systems are inadequate when data rates go beyond 1 Gbps. High-performance, accurate laboratory instruments such as BERTs are commonly used to test high-speed SerDes; these instruments, on the other hand, lack the automation and flexibility to test high pin-count complex IC devices. A hybrid test system that combines ATEs and laboratory equipment to exploit the strengths of each was proposed in [14] to test multi-gigabit complex devices. However, as more and more ASICs and FPGAs have multiple SerDes ports, the limitations of the hybrid approach become apparent: instrument cost, physical size and test time constraints reduce its effectiveness for complex ASIC devices with multi-port SerDes.

Existing jitter test equipment and digital ATEs fall short in an HVM test environment for the reasons listed in Table 1.1.

| Category           | Tool                           | Disadvantages                                                                                                                               |
|--------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| Laboratory tools   | BERTs                          | Test time of $\sim$ 8 hours for a BER $10^{-12}$ measurement at 2.125 Gbps; not flexible for production testing                             |
| Laboratory tools   | Oscilloscope                   | High bandwidth (30-50 GHz) but slow acquisition speed; test time >250 years for a BER $10^{-12}$ measurement at 2.125 Gbps                  |
| Laboratory tools   | Time Interval Analyzers (TIA)  | Expensive and bulky; not flexible for production testing                                                                                    |
| Production testers | Automated Test Equipment (ATE) | Slow serial I/Os operating at only 400 Mbps-1.6 Gbps; lack of differential high-speed ports per test head; interconnect signal degradation at high speeds |

Table 1.1: Disadvantages of existing laboratory tools and production testers

According to the International Technology Roadmap for Semiconductors (ITRS), in the near term ATE manufacturers are required to design multi-port, gigabit-data-rate instruments and integrate them into test systems [15]. For example, in [16] an interface macro providing up to 16 TX and/or RX channels, capable of transmitting from DC up to an aggregate 34.1 Gbps (16 channels × 2.13 Gbps), was developed for interconnections in high-speed memory test systems. Interim solutions currently exist for differential link testing at serial data rates of 4.25 Gbps with up to 200 ports. Beyond 4.25 Gbps, no manufacturable solutions are known, and Design for Test (DFT) and Built-In Self-Test (BIST) are seen as potential solutions. In the long term, DFT features need to extend beyond pattern generation and error detection to provide more parametric coverage [15]. Combining DFT/BIST methodology with external test instruments will minimize test cost and execution time while effectively testing high-speed multi-port devices.

DFT/BIST offers an attractive test solution because it addresses several areas of future concern for production testing of SerDes [15], among them the increasing high-speed serial port count, cost, test fixture bandwidth, and parametric DFT versus logic DFT. With high port counts, the traditional rack-and-stack approach with multiple lab instruments becomes impractical, and multi-port ATEs are required to handle the increasing number of serial ports on a single device. Combining DFT/BIST with external test equipment can lower test cost and enhance functionality. DFT techniques tend to reduce cost and the need for expensive test equipment; as a result, they can enable cost-efficient testing of high-speed serial I/Os as more gigabit transceivers are integrated into high-volume, low-cost multi-port devices. As data rates and port counts increase, the high-frequency signals delivered to the device suffer significant loss and distortion; for data rates of 10 Gbps, the fixture bandwidth requirement reaches 20 GHz. Integrating the front-end DUT interface into the ATE test head will alleviate this problem, since sockets and wafer probes will become the bottlenecks of gigahertz testing.

BIST/DFT strategies based on delay lines [17] and vernier oscillators [18] provide picosecond-resolution jitter measurements. In [19], BIST circuitry for on-chip eye-diagram generation and jitter characterization was integrated into a transceiver to perform diagnostics on the interconnect and transceiver circuits. The diagnostics were performed by capturing periodic waveforms sent across the channel, analogous to an equivalent-time sampling oscilloscope. Undersampling techniques were used in [20, 21] to provide jitter testing for high-speed SerDes. In [22], circular BIST was used to test the digital components of a SerDes.
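The vernier principle behind such picosecond-resolution schemes can be illustrated with an idealized vernier delay line: a START edge travels through a chain of slow stages while a STOP edge, launched later by the interval under test, travels through slightly faster stages and gains on it by the difference of the two stage delays at every stage. The stage delays and line length below are hypothetical, and the sketch ignores the stage mismatch and jitter that limit real implementations; it is not the circuit of [17] or [18].

```python
def vernier_stage(delta_t_ps, tau_slow_ps, tau_fast_ps, stages):
    """Stage index at which the STOP edge catches up with the START edge.

    START propagates through the slow line (tau_slow_ps per stage); STOP is
    launched delta_t_ps later through the fast line (tau_fast_ps per stage).
    After k stages START has arrived at k*tau_slow and STOP at
    delta_t + k*tau_fast, so STOP gains (tau_slow - tau_fast) per stage.
    Returns None if the line is too short to resolve delta_t_ps.
    """
    if tau_fast_ps >= tau_slow_ps:
        raise ValueError("the fast line must have the shorter stage delay")
    for k in range(1, stages + 1):
        if delta_t_ps + k * tau_fast_ps <= k * tau_slow_ps:  # latch at stage k flips
            return k
    return None

def vernier_measure(delta_t_ps, tau_slow_ps, tau_fast_ps, stages):
    """Quantized interval estimate: stage count times the vernier resolution."""
    k = vernier_stage(delta_t_ps, tau_slow_ps, tau_fast_ps, stages)
    return None if k is None else k * (tau_slow_ps - tau_fast_ps)

if __name__ == "__main__":
    # Hypothetical stage delays of 55 ps and 50 ps give a 5 ps resolution.
    print(vernier_measure(23, 55, 50, stages=64))
```

With these values a 23 ps interval is reported as 25 ps: the resolution is set by the small difference between stage delays rather than by either delay itself, at the cost of needing many stages to cover a long interval.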

An improvement to the existing BIST was implemented and tested on the Altera Stratix FPGA. To obtain better measurement results, multiple PLLs already available in the FPGA were used for sampling. With additional sampling PLLs, the variance of the measurement decreases and the results lie closer to the mean. The measured quantity is the duty cycle jitter of an alternating data signal or a clock signal, and the result is displayed as a histogram; histograms are commonly used because they give a clear view of the distribution of a measured parameter.
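The variance argument can be illustrated with a small Monte Carlo sketch. It assumes that an asynchronous, much slower sampling clock lands at effectively uniform random phases of the clock under test, so the fraction of samples that read '1' estimates the duty cycle, and that extra PLL outputs simply contribute more samples per measurement. The 40% duty cycle, the 500 samples per PLL and the trial count are made-up values; the sketch only shows the $1/\sqrt{N}$ shrinkage predicted by the law of large numbers, not the FPGA implementation or its measured results.

```python
import random
import statistics

def sample_clock(phase, duty_cycle):
    """Logic level of an ideal clock at a phase in [0, 1) of its period."""
    return 1 if phase < duty_cycle else 0

def estimate_duty_cycle(duty_cycle, n_samples, rng):
    """Fraction of '1' samples when each sample lands at a uniformly random phase.

    By the law of large numbers this fraction converges to the duty cycle.
    """
    highs = sum(sample_clock(rng.random(), duty_cycle) for _ in range(n_samples))
    return highs / n_samples

def estimate_spread(duty_cycle, n_samples, trials=200, seed=1):
    """Mean and standard deviation of repeated duty-cycle estimates."""
    rng = random.Random(seed)
    estimates = [estimate_duty_cycle(duty_cycle, n_samples, rng) for _ in range(trials)]
    return statistics.mean(estimates), statistics.pstdev(estimates)

if __name__ == "__main__":
    # Hypothetical: each additional PLL contributes 500 more samples per measurement.
    for n_plls in (1, 2, 3, 4):
        mean, std = estimate_spread(0.40, n_samples=500 * n_plls)
        print(f"{n_plls} PLL(s): mean = {mean:.4f}, std = {std:.4f}")
```

The standard deviation should track $\sqrt{D(1-D)/N}$, so quadrupling the number of samples roughly halves the spread of the histogram around the true duty cycle.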

The contribution of this thesis is two-fold. In SerDes component design, an all-digital multilevel phase detector for clock data recovery in a multilevel SerDes was designed and simulated. An array of high-speed 2-bit Flash ADCs (Analog to Digital Converters) was implemented to sample and convert the multilevel signal. The phase detection scheme uses only digital components, together with multiple phases of an oscillator, to provide early/late information about the incoming signal. In SerDes component testing, a BIST component was designed and implemented in an FPGA to measure duty cycle jitter. The measurement method improves on conventional methods by exploiting the Law of Large Numbers and undersampling. The improvement of multi-PLL sampling over single-PLL sampling is shown for both fixed and varying input signals. The results also show the effect of bin size on the measurements and on the shape of the histogram. The following section outlines the organization of the thesis.

#### **1.3 Thesis Outline**

Following this introduction, Chapter 2 provides background on SerDes components and jitter types. It also presents a literature review of the phase detectors used in different SerDes architectures and of the BIST/DFT techniques used in high-speed SerDes testing.

Chapter 3 presents multilevel signaling and the conditions under which it is useful. A novel all-digital approach to multilevel phase detection is then presented, along with simulations of the phase detector.

Chapter 4 describes how the undersampling concept and the law of large numbers can be applied to testing. Next, we present a duty cycle jitter measurement BIST for high-speed SerDes implemented on an Altera Stratix FPGA. The accuracy and repeatability of single and multiple PLL sampling are also presented.

Finally, Chapter 5 summarizes the contributions in this thesis.

# **Chapter 2**

# Background

In this chapter, we provide background on the main components of a SerDes and review published work on the different phase detectors used in SerDes. We then describe the different types of jitter and how they affect SerDes performance. Finally, we provide a literature review of BIST/DFT techniques for testing SerDes performance.

#### **2.1 Serializer/Deserializer Components**



Figure 2.1: SerDes Transceiver

SerDes stands for <u>Ser</u>ializer/<u>Des</u>erializer. The Serializer takes parallel data and converts it into a serial bit stream in which the clock is embedded. The 8-bit parallel data is encoded into a 10-bit format that is transmitted over a serial output link. The 10-bit format ensures that the data stream contains enough transitions for the receiver to recover the embedded clock accurately. At the other end of the serial link, the Deserializer takes the serial data, recovers the clock from it, and uses the recovered clock to decode the serial data and convert it back to parallel data, as shown in Figure 2.1. The functional blocks of a SerDes are shown in Figure 2.2.



Figure 2.2: SerDes Functional Diagram

On the transmit side, the SerDes device has a parallel digital interface, a FIFO, an 8B/10B encoder and a serializer. The transmitter output drives a differential signal into a $50\Omega$ transmission medium. Common output drivers used in SerDes are Low Voltage Differential Signaling (LVDS) for speeds up to 800 MHz and Emitter-Coupled Logic (ECL) for transmission speeds of 3 GHz. Differential drive has the advantage of common-mode noise rejection, since noise that appears equally on both signals cancels out in the difference. 8B/10B encoding is used to DC-balance the parallel data and minimize errors. For example, 8B/10B encoding maps an all-zero byte into a DC-balanced 10-bit word made of zeros and ones, decreasing the amount of data-dependent jitter.
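To make the DC-balance argument concrete, the sketch below compares the disparity (number of ones minus number of zeros) and the longest run of identical bits for an all-zero byte and for a 10-bit code word. We assume the code group `100111 0100`, which to our understanding is the one 8B/10B assigns to the all-zero byte D.0.0 at negative running disparity. The helper names `disparity` and `max_run_length` are hypothetical.

```python
from itertools import groupby

def disparity(bits):
    """Number of ones minus number of zeros; 0 means the word is DC-balanced."""
    return sum(1 if b else -1 for b in bits)

def max_run_length(bits):
    """Length of the longest run of identical consecutive bits."""
    return max((len(list(run)) for _, run in groupby(bits)), default=0)

raw_byte = [0] * 8                            # all-zero byte sent without coding
d00_word = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]     # assumed 8B/10B code group for D.0.0 (RD-)

for name, word in (("raw byte", raw_byte), ("8B/10B word", d00_word)):
    print(name, "disparity:", disparity(word), "max run:", max_run_length(word))
```

The uncoded byte shifts the DC level and gives the receiver no transitions for eight bit times. The encoded word is balanced and never holds one level for more than three bits.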

On the receive side, the SerDes has a transition tracking loop that performs clock and data recovery, along with byte alignment, an 8B/10B decoder, a word alignment FIFO and a parallel digital interface. The clock data recovery block is the most essential block in the receiver, because it recovers the clock from the input serial data stream. The clock signal has to be recovered before accurate byte/word alignment can occur. The clock data recovery block can recover the clock and data only if the input data stream has an adequate data "eye", the maximum run length is not exceeded, and the average DC component of the signal is zero. Once the clock is recovered, byte alignment occurs. The byte aligner looks for a particular 7-bit sequence in the 8B/10B encoded data stream that occurs in the comma characters K28.1, K28.5 and K28.7. Byte alignment occurs immediately when the alignment sequence is detected. Word alignment occurs when the application-specific (Fibre Channel or XAUI) state machine reaches its synchronized state. Word alignment aligns the output data so that the comma character is in the most significant byte.
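The byte-alignment step can be sketched as a sliding search for the comma pattern. This sketch assumes the 7-bit comma sequence is `0011111` or its complement `1100000`, which are the patterns commonly given for K28.1, K28.5 and K28.7. It also assumes the 10-bit code groups start where the comma starts. The word-alignment state machine is not modeled, and the function names are hypothetical.

```python
# Assumed 7-bit comma patterns (bits abcdeif) found in K28.1, K28.5 and K28.7.
COMMAS = ((0, 0, 1, 1, 1, 1, 1), (1, 1, 0, 0, 0, 0, 0))

def find_comma(bits):
    """Return the index of the first comma pattern in a serial bit list, or -1."""
    for i in range(len(bits) - 6):
        if tuple(bits[i:i + 7]) in COMMAS:
            return i
    return -1

def align_to_words(bits):
    """Split the stream into 10-bit code groups starting at the detected comma."""
    start = find_comma(bits)
    if start < 0:
        return []
    return [bits[i:i + 10] for i in range(start, len(bits) - 9, 10)]

k28_5_neg = [0, 0, 1, 1, 1, 1, 1, 0, 1, 0]   # K28.5, negative running disparity
stream = [1, 0, 1] + k28_5_neg + k28_5_neg   # three bits of misalignment in front
print(find_comma(stream))                    # 3
print(align_to_words(stream))
```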

There are some additional common functional blocks within each chip. These are the configuration logic, functional test logic, and system phase-locked loop (PLL). Other important diagnostic features include Built-In-Self-Test (BIST) functions and JTAG test interface. Devices can also be put into internal loop-back mode for system testing, even when the links are open or shorted.

## 2.2 Clock Recovery Architecture

As described in the previous section, the clock recovery block plays a critical role on the receiver side of the SerDes. It ensures that the recovered clock edge is aligned at the centre of the data eye, i.e., midway between the two closest data transitions. The recovered clock is used by both the multiphase sampler and the receiver front-end blocks in Figure 2.2. The clock is recovered from the data by detecting the edge transitions of the incoming data with a phase detector and adjusting the clock phase accordingly. Depending on the phase detection method, the timing recovery architecture is either a feed-forward (phase-picking) architecture or a feedback loop architecture, as shown in Figure 2.3. The difference between the two is the tracking rate. The former tracks phase changes at the rate of the decision logic. The latter contains a phase-locked loop (PLL) with a finite loop bandwidth. If the frequency of the transmitter jitter is higher than the receiver PLL loop bandwidth, the PLL cannot track it, and the transmitter phase noise appears as peak-to-peak timing error at the receiver.
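To illustrate the bandwidth argument, the sketch below computes how much sinusoidal jitter a loop fails to track. It assumes a first-order linear loop with jitter transfer $H(f) = 1/(1 + jf/f_{bw})$, so the sampler sees the untracked part $(1-H)$ of the input jitter. This is a simplifying assumption: bang-bang loops are nonlinear and practical CDR PLLs are often second order, so the numbers are only indicative.

```python
import math

def residual_jitter(amplitude, jitter_freq, loop_bw):
    """Untracked timing error for sinusoidal jitter through a first-order loop.

    With H(f) = 1 / (1 + j f/f_bw), the error seen by the sampler is (1 - H)
    times the input jitter; this returns its magnitude.
    """
    x = jitter_freq / loop_bw
    return amplitude * x / math.sqrt(1.0 + x * x)

for f in (1e4, 1e6, 1e8):   # jitter frequencies in Hz; 1 MHz loop bandwidth assumed
    print(f"{f:.0e} Hz -> {residual_jitter(10.0, f, 1e6):.3f} ps of 10 ps")
```

Jitter well below the loop bandwidth is almost fully tracked. Jitter well above it passes through almost unchanged as timing error at the sampler.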



Figure 2.3: (a) Feedback Clock Data Recovery Architecture; (b) Phase-picking Clock Data Recovery Architecture

### 2.2.1 Phase-picking Architecture

The feed-forward or phase-picking architecture is commonly used in UARTs. Higher-bandwidth applications of this architecture are shown in [23, 24]. In general, data is oversampled using a multiphase clock, and a decision algorithm picks the sample with the best timing margin as the correct data bit. The timing margin is largest when the data is sampled at the center of the data eye. Different decision algorithms are used for different applications and bandwidths. In [23], the correct sampled data bit is determined using an average of the sampled values over various bit window positions. That algorithm makes a new decision on the input phase every clock cycle. In [24], the decision on the correct sample is made by delaying the samples by the number of clock cycles required for the decision. That algorithm detects the data transitions over 24 samples and uses the transition information to select eight samples as the correctly sampled data byte. Its decision logic makes a new decision per byte of data.
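A simplified phase-picking decision can be sketched as follows. This is not the algorithm of [23] or [24]. The sketch histograms the oversampling phase at which data transitions occur, then picks the phase half a bit away from the most frequent edge. Four samples per bit and the function names are assumptions for illustration.

```python
def pick_phase(samples, osr):
    """Choose the sampling phase farthest from the most frequent data edge.

    samples: oversampled 0/1 values with `osr` samples per bit (multiphase clock).
    """
    edge_counts = [0] * osr
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_counts[i % osr] += 1          # transition observed at this phase
    edge_phase = max(range(osr), key=edge_counts.__getitem__)
    return (edge_phase + osr // 2) % osr       # half a bit away from the edge

def recover_bits(samples, osr):
    """Return the picked phase and the bits sampled at that phase."""
    phase = pick_phase(samples, osr)
    return phase, samples[phase::osr]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
ideal = [b for b in bits for _ in range(4)]    # 4x oversampling, edges on phase 0
skewed = [ideal[0]] + ideal[:-1]               # data delayed: edges now on phase 1
print(recover_bits(skewed, 4))                 # (3, [1, 0, 1, 1, 0, 0, 1, 0])
```

A new decision here uses the whole sample window. A hardware version would update the edge histogram over a sliding window, as the cited designs do per clock cycle or per byte.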

The main advantage of the feed-forward architecture is that it tracks phase movements of the data with respect to the clock without an intrinsic bandwidth limitation. The maximum tracking rate is limited only by the decision logic and the transitions in the data. Also, since there is no feedback loop, there are no stability issues and hence no settling or locking time. A disadvantage is that this architecture cannot distinguish a phase change greater than half a bit time from a phase shift in the opposite direction. In addition, phase quantization causes an inherent static phase error. There is a trade-off between finer phase quantization (smaller static phase error) and design complexity. A more complex design leads to higher power consumption. Also, finer phase quantization requires more samplers, which results in higher input capacitance and lower bandwidth.
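The quantization trade-off can be put in numbers. Assume $N$ equally spaced sampling phases per UI. The ideal sampling point can then fall at most half a phase step from the nearest available phase, so the worst-case static phase error is $\text{UI}/(2N)$. The 320 ps UI (a 3.125 Gb/s link) and the function names below are assumptions for illustration.

```python
import math

def worst_static_phase_error(ui_ps, phases_per_ui):
    """Worst-case static error: the ideal sampling point lies midway between two phases."""
    return ui_ps / (2 * phases_per_ui)

def phases_needed(ui_ps, max_error_ps):
    """Smallest number of equally spaced phases per UI that meets a static-error budget."""
    return math.ceil(ui_ps / (2 * max_error_ps))

ui = 320.0   # ps, i.e. an assumed 3.125 Gb/s link
for n in (2, 4, 8):
    print(n, "phases ->", worst_static_phase_error(ui, n), "ps")
print(phases_needed(ui, 10.0), "phases for a 10 ps budget")
```

Halving the static error doubles the number of samplers, and with it the input capacitance discussed above.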

### 2.2.2 Feedback Loop Architecture



Figure 2.4: Digital Phase Detectors

In the feedback loop architecture, digital phase detectors such as Hogge's phase detector (HPD) and the Alexander (APD) or bang-bang phase detector are commonly used [26]. The operation of the APD and HPD is shown in Figure 2.4. These digital phase detectors use flip-flops and XOR gates to detect data transitions. A variant of Hogge's phase detector in [27] was used for high-bandwidth clock recovery in [28]. A fully symmetric half-rate Alexander detector was used for wide capture range and wide input bit rate clock data recovery [29]. Digital phase detectors require only simple processing of digital values. As a result, they generalize easily to multi-phase sampling structures, allowing the clock recovery operation to run many times faster than the maximum speed of a flip-flop. Another advantage of digital phase detectors is that the clock recovery path is combined with the data retiming path, which avoids any inherent sampling offset from the center of the data eye. The HPD output provides both the direction and the magnitude of the phase error, and the output varies linearly with the magnitude of the phase offset. A disadvantage of the HPD is that it continues to output the last valid phase error when there are no data transitions. The APD, however, is immune to the effects of run length and data transition density because it has a "hold" state when there are no data transitions. A survey of clock data recovery (CDR) designs presented at the International Solid-State Circuits Conference from 1988 to 2001 found that the majority of the designs use a combination of multiphase sampling structures and bang-bang PLLs, as shown in Figure 2.5. In addition, all CDR designs operating at data rates greater than $0.4 f_T$ are bang-bang designs [30].
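The Alexander detector's decision can be sketched from three consecutive samples: the previous data bit, an edge sample taken near the expected transition, and the current data bit. If the edge sample already equals the new bit, the clock is late. If it still equals the old bit, the clock is early. Without a transition, the detector holds. The function names and the string outputs are assumptions for illustration.

```python
def alexander_pd(prev_bit, edge_sample, curr_bit):
    """Bang-bang (Alexander) decision from three consecutive samples.

    prev_bit and curr_bit are taken at the data-eye centres; edge_sample is
    taken half a bit after prev_bit, i.e. near the expected transition.
    """
    if prev_bit == curr_bit:
        return "hold"            # no transition: no phase information
    if edge_sample == curr_bit:
        return "late"            # edge sample already shows the new bit
    return "early"               # edge sample still shows the old bit

def pd_stream(data, edges):
    """Apply the detector to data samples d[n] and edge samples e[n] between d[n-1] and d[n]."""
    return [alexander_pd(data[n - 1], edges[n], data[n]) for n in range(1, len(data))]

print(pd_stream([0, 1, 1, 0], [None, 1, 1, 1]))   # ['late', 'hold', 'early']
```

The explicit "hold" output is what makes the APD insensitive to long runs without transitions, unlike the HPD described above.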

| Year of Publication | Linear CDR (no. of publications) | Bang-Bang CDR (no. of publications) |
|---------------------|----------------------------------|-------------------------------------|
| 1990-1991           | 2                                | 1                                   |
| 1992-1993           | 2                                | 2                                   |
| 1994-1995           | 3                                | 2                                   |
| 1996-1997           | 0                                | 5                                   |
| 1998-1999           | 1                                | 3                                   |
| 2000-2002           | 1                                | 3                                   |

| $\frac{\text{Data Rate}}{f_{T}}$ | Linear CDR (no. of publications) | Bang-Bang CDR (no. of publications) |
|----------------------------------|----------------------------------|-------------------------------------|
| 0.01-0.05                        | 4                                | 1                                   |
| 0.05-0.1                         | 1                                | 3                                   |
| 0.1-0.5                          | 4                                | 10                                  |
| 0.5-1.0                          | 0                                | 4                                   |

Figure 2.5: CDR PLL designs over time. The number of publications is tabulated against the year of publication and against the ratio of link speed to the effective process transit frequency $f_T$. Multi-phase bang-bang PLLs predominate as the data rate approaches the process transit frequency ($f_T$) limit [30].

The flip-flops used in digital phase detectors have non-zero setup times, which can cause a static phase offset and reduce timing margins at high data rates. Another disadvantage is that the clock duty cycle has to be 50%; otherwise, it also results in a static phase error. These issues can be avoided by using samplers in the phase detector, as shown in [31].

Multiple channel samplers are used in [31] where sampling times are separated by the bit period. Clock recovery was then implemented using a decision directed minimum-likelihood algorithm.

### 2.2.3 Multilevel Clock Recovery Architecture

The clock recovery architectures for multilevel signals found in the literature use the feedback architecture. However, their phase detection schemes are slightly different because the incoming data signals are no longer binary. All multilevel clock recovery schemes require the multilevel signals to be converted into binary signals using high-speed flash ADCs.



Figure 2.6: Different Multilevel Phase Detector Schemes. (a) SSMMSE-based Phase Detection; (b) Proportional Phase Tracking Detection Method

In [9], a clock recovery scheme was presented that uses a bang-bang phase detector to generate early/late pulses based on the sign-sign minimum mean squared error (SSMMSE). The decision to advance or delay the sampling phase is based on i) the error between the sampled value and the reference signal value and ii) the slope of the received signal at the sampling instant. In Figure 2.6(a), when A is sampled at $t_1$, the sampled value of A is larger than that of the desired signal at point B, and the (negative) slope of the sampled signal differs from that of point C. From these two quantities, the sampling phase at $t_1$ is deduced to be *early*. For the clock phase sampling at point B, the sampled signal is also larger than the desired signal at B; however, the (positive) slope is opposite to that of A, so the sampling clock phase at $t_3$ is deduced to be *late*. An advantage of [9] is that it requires only baud-rate sampling of the received waveform, which eliminates the need for quadrature clocks in a half-rate system.
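The rule described above can be reduced to a comparison of two signs. A positive error with a negative slope means *early*, and a positive error with a positive slope means *late*. The other sign combinations follow by symmetry. This is a minimal sketch in the spirit of [9], not its circuit. How the slope is estimated (for example, from neighbouring samples) is not modeled, and the function name is hypothetical.

```python
def ssmmse_decision(sample, reference, slope):
    """Early/late decision from sign(error) and sign(slope).

    If the signal is rising (slope > 0) and the sample is above the reference,
    the sampling instant is past the ideal point (late); a falling signal
    sampled above the reference has been sampled too early.
    """
    error = sample - reference
    if error == 0 or slope == 0:
        return "hold"
    return "late" if (error > 0) == (slope > 0) else "early"

print(ssmmse_decision(1.2, 1.0, -0.5))   # early: above reference on a falling edge
print(ssmmse_decision(1.2, 1.0, 0.5))    # late: above reference on a rising edge
```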

In [8], a linear phase detection scheme is used for clock data recovery. It uses both the transition edges of the data and the sampled value at the center of the data eye. The sample value $S_e$ shown in Figure 2.6(b) provides the phase error information. Because $S_e$ is proportional to the phase error $\Delta \Phi$, the generated loop control voltage varies linearly with $S_e$. When the clock is locked, the $S_e$ sample is zero.

The multilevel phase detection schemes above require additional analog components besides the high speed Flash ADC. Analog components are usually more susceptible to noise sources and process variation. Analog components also tend to be larger and have higher power consumption. An all-digital approach for multilevel clock recovery is proposed in Chapter 3 to avoid some of these issues.

## 2.3 Jitter Types

Jitter is the deviation of a timing event of a signal from its ideal position. Jitter effects become more significant as data rates increase, because data and clock pulse widths become shorter. Jitter is usually expressed in unit intervals (UI). A UI is the ideal time duration of a single bit or clock period, as shown in Figure 2.7.
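Converting jitter to UI shows why the same absolute jitter matters more at higher data rates. The sketch assumes one UI equals one bit period, $1/\text{bit rate}$. The bit rates and the 32 ps jitter value are chosen purely for illustration.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one bit (1 UI) in picoseconds."""
    return 1000.0 / bit_rate_gbps

def jitter_in_ui(jitter_ps, bit_rate_gbps):
    """Express an absolute jitter value as a fraction of the unit interval."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)

for rate in (1.25, 3.125, 10.0):
    print(rate, "Gb/s: 1 UI =", unit_interval_ps(rate), "ps; 32 ps =",
          round(jitter_in_ui(32, rate), 3), "UI")
```

The same 32 ps of jitter consumes 0.04 UI at 1.25 Gb/s but 0.32 UI at 10 Gb/s.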



Figure 2.7: Jitter and Unit Interval

Jitter can be decomposed into two main components, deterministic jitter (DJ) and random jitter (RJ) [32]. The deterministic component of the total jitter is bounded, and the random component is unbounded. The unbounded component appears as the tails of the jitter histogram, as shown in Figure 2.7. DJ can be further separated into data dependent jitter (DDJ), periodic jitter (PJ) and bounded uncorrelated jitter (BUJ), as shown in Figure 2.8.



Figure 2.8: Jitter Decomposition

The total jitter in a system is the convolution of all independent jitter components' probability density functions (PDFs).

$$P_{TJ}(\Delta t) = P_{DJ}(\Delta t) * P_{RJ}(\Delta t) = \int_{-\infty}^{\infty} P_{DJ}(\tau) P_{RJ}(\Delta t - \tau) d\tau$$
$$P_{DJ}(\Delta t) = P_{DDJ}(\Delta t) * P_{PJ}(\Delta t) * P_{BUJ}(\Delta t)$$
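The convolution above can be evaluated numerically on a time grid. This sketch makes three assumptions: DJ is modeled as two equal impulses $W$ apart (the dual-Dirac form of Section 2.5.2.1), RJ is a Gaussian truncated at $\pm 6\sigma$, and the 0.1 ps grid is fine enough. As a sanity check, the variances of independent components add, so the result should have variance $(W/2)^2 + \sigma^2$. All function names are hypothetical.

```python
import math

def convolve(p, q):
    """Discrete convolution of two probability mass functions on the same time grid."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                out[i + j] += pi * qj
    return out

def gaussian_pmf(sigma, step, half_width):
    """RJ: Gaussian PDF sampled on [-half_width, half_width], normalised to unit mass."""
    n = round(half_width / step)
    weights = [math.exp(-((k * step) ** 2) / (2 * sigma ** 2)) for k in range(-n, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def dual_dirac_pmf(w_pp, step):
    """DJ: two equal impulses at -w_pp/2 and +w_pp/2."""
    n = round(w_pp / 2 / step)
    pmf = [0.0] * (2 * n + 1)
    pmf[0] += 0.5
    pmf[-1] += 0.5
    return pmf

def mean_and_variance(pmf, step):
    """Moments of a PMF whose centre bin corresponds to time zero."""
    c = (len(pmf) - 1) // 2
    mean = sum((k - c) * step * v for k, v in enumerate(pmf))
    var = sum(((k - c) * step - mean) ** 2 * v for k, v in enumerate(pmf))
    return mean, var

step = 0.1                                   # ps per bin (assumed)
dj = dual_dirac_pmf(10.0, step)              # 10 ps peak-to-peak DJ (assumed)
rj = gaussian_pmf(1.0, step, 6.0)            # 1 ps rms RJ, truncated at 6 sigma
tj = convolve(dj, rj)
print(mean_and_variance(tj, step))           # variance close to 5**2 + 1**2 = 26
```

Going in the other direction (deconvolving a measured $P_{TJ}$ by $P_{RJ}$) is numerically much less forgiving. That is one reason the text stresses how hard it is to recover an arbitrary DJ shape.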

The DJ PDF can have an arbitrary shape, which makes approximating it with a model error-prone. To obtain the DJ PDF, $P_{TJ}$ and $P_{RJ}$ are measured first. The deterministic PDF can then be determined through deconvolution, and all the appropriate statistical parameters (mean, rms, peak-to-peak) can be calculated.

## 2.4 Random Jitter

Random jitter is commonly modeled by the Gaussian distribution function:

$$J_{RJ}(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\left(\frac{x^2}{2\sigma^2}\right)}$$

where $J_{RJ}(x)$ denotes the RJ PDF, $\sigma$ is the standard deviation of the Gaussian distribution, and $x$ is the time displacement relative to the ideal position. RJ is characterized by its mean and rms values.

### 2.4.1 Random Jitter Sources

Common sources of RJ include shot noise, flicker noise and thermal noise [33, 34]. Shot noise is broadband white noise with a Gaussian distribution. Shot noise requires a direct current flow and a potential barrier over which the charge carriers hop. For example, both the base and collector currents are sources of shot noise in a bipolar transistor, and the DC leakage current of FETs also contributes shot noise. The shot noise current is given by

$$\overline{i_n^2} = 2qI_{DC}\Delta f$$

where $\overline{i_n}$ is the rms noise current, $q$ is the electronic charge (about $1.6 \times 10^{-19}$ C), $I_{DC}$ is the DC current in amperes, and $\Delta f$ is the noise bandwidth in hertz.
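Evaluating the shot noise formula for a representative case gives a sense of its size. The 1 mA bias and 1 GHz noise bandwidth below are assumed values, not figures from the text. The check also confirms the square-root dependence on current.

```python
import math

Q_E = 1.602176634e-19   # elementary charge in coulombs

def shot_noise_rms(i_dc, bandwidth_hz):
    """RMS shot-noise current sqrt(2 q I_DC delta_f), in amperes."""
    return math.sqrt(2 * Q_E * i_dc * bandwidth_hz)

# Assumed example: 1 mA bias current observed over a 1 GHz noise bandwidth.
print(f"{shot_noise_rms(1e-3, 1e9) * 1e6:.3f} uA rms")
```

The result is roughly 0.57 µA rms. Quadrupling the DC current only doubles the rms noise current.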


Flicker noise has a spectral density proportional to $1/f^{\alpha}$, where $\alpha$ is generally close to unity. In resistors, 1/f noise appears only when a DC current flows through them. Thus, minimizing the DC bias minimizes this noise term. Some have explained flicker noise in carbon composition resistors as the result of random formation and extinction of "micro-arcs" among neighboring granules. The mean-square noise voltage for resistors is given by:

$$\overline{e_n^2} = \frac{K}{f} \cdot \frac{R_s}{A} \cdot V^2 \Delta f$$

where $A$ is the area of the resistor, $R_s$ is the sheet resistance, $V$ is the voltage across the resistor, and $K$ is a material-specific parameter.

In MOSFETs, flicker noise is a surface effect. It is caused by fluctuations in carrier density as electrons are randomly captured and emitted by oxide interface traps. The mean-square 1/f drain noise current is given by:

$$\overline{i_n^2} = \frac{K}{f} \cdot \frac{g_m^2}{WLC_{ox}^2} \cdot \Delta f \approx \frac{K}{f} \cdot \omega_T^2 \cdot A \cdot \Delta f$$

where A is the area of the gate and K is a device specific constant. Thus, for a fixed transconductance, a larger gate area and thinner dielectric can reduce this noise term.

Thermal noise, like shot noise, can be represented by broadband white noise with a flat spectral density. Thermally agitated charge carriers in a conductor constitute a randomly varying current that gives rise to a random voltage. Electron scattering due to imperfections in the lattice structure causes RJ. Even at very low temperatures, intrinsic defects such as impurities, missing atoms, or discontinuities in the lattice structure caused by an interface create localized scattering centers that result in RJ. Thermal noise power is proportional to temperature; the available noise power is given by

$$P_{NA} = kT\Delta f = \frac{\overline{e_n^2}}{4R}$$

where $k$ is Boltzmann's constant (about $1.38 \times 10^{-23}$ J/K), $T$ is the absolute temperature in kelvins, $\Delta f$ is the noise bandwidth in hertz, and $\overline{e_n}$ is the rms noise voltage generated by resistor $R$ over the bandwidth at a given temperature. The mean-square noise voltage is thus $\overline{e_n^2} = 4kTR\Delta f$, so the rms noise voltage is proportional to the square root of the bandwidth and of the resistance.
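The thermal noise expression can be checked numerically for a typical SerDes termination. The 50 Ω resistor, 300 K temperature and 1 GHz bandwidth are assumed for illustration. The second call confirms the square-root dependence on resistance.

```python
import math

K_B = 1.380649e-23   # Boltzmann's constant in J/K

def thermal_noise_rms(resistance, temperature_k, bandwidth_hz):
    """RMS thermal-noise voltage sqrt(4 k T R delta_f), in volts."""
    return math.sqrt(4 * K_B * temperature_k * resistance * bandwidth_hz)

# Assumed example: a 50-ohm termination at 300 K over a 1 GHz bandwidth.
v50 = thermal_noise_rms(50, 300, 1e9)
v200 = thermal_noise_rms(200, 300, 1e9)    # 4x the resistance -> 2x the rms voltage
print(f"{v50 * 1e6:.1f} uV rms, ratio {v200 / v50:.2f}")
```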

## 2.5 Deterministic Jitter

Unlike random jitter, deterministic jitter is described by a bounded PDF. The PDF needs to be sampled beyond its extreme (peak) values to avoid a "loss of information" problem. Because the PDF is bounded, extra samples do not affect its shape. Major causes of DJ include electromagnetic interference, crosstalk, signal reflection, driver slew rate, skin effect and dielectric loss. DJ can be further decomposed into periodic jitter (PJ), duty cycle distortion (DCD) and intersymbol interference (ISI). DCD and ISI are categorized as data dependent jitter (DDJ).

### 2.5.1 Periodic Jitter

Periodic, or sinusoidal, jitter results from electromagnetic interference (EMI) or power supply noise. PJ repeats at a fixed frequency and can be quantified by its frequency and peak-to-peak magnitude. As shown in Figure 2.9, an ideal clock signal modulated by a sinusoidal noise signal becomes a clock with periodic jitter. When the modulating sinusoidal signal is low, the jittered clock has a smaller period, and when the sinusoidal signal is high, the jittered clock period is larger.



Figure 2.9: Periodic Jitter Effects on Ideal Clock

PJ can be modeled as [35]:

$$PJ_{Total}(t) = \sum_{i=1}^{N} A_i \cos(\omega_i t + \theta_i)$$

where $PJ_{Total}(t)$ denotes the total periodic jitter, $N$ is the number of cosine components, $A_i$ is the corresponding amplitude, $\omega_i$ is the corresponding angular frequency, $t$ is the time, and $\theta_i$ is the corresponding phase.
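The PJ model can be applied directly to clock edges to reproduce the behaviour sketched in Figure 2.9. In this sketch, the $k$-th edge of an ideal clock is displaced by $PJ_{Total}(t)$ evaluated at its ideal time. The 1 ns clock and the single 0.1 ns tone at a quarter of the clock rate are assumed values, and the function names are hypothetical.

```python
import math

def periodic_jitter(t, components):
    """Sum of cosine terms A_i cos(w_i t + theta_i); components = [(A, w, theta), ...]."""
    return sum(a * math.cos(w * t + theta) for a, w, theta in components)

def jittered_edges(period, n_edges, components):
    """Edge times of a clock whose k-th edge is displaced by the PJ at its ideal time."""
    return [k * period + periodic_jitter(k * period, components) for k in range(n_edges)]

# Assumed example: 1 ns clock with one 0.1 ns PJ tone at a quarter of the clock rate.
edges = jittered_edges(1.0, 5, [(0.1, 2 * math.pi * 0.25, 0.0)])
periods = [b - a for a, b in zip(edges, edges[1:])]
print([round(p, 3) for p in periods])    # [0.9, 0.9, 1.1, 1.1]
```

The edge displacement swings by 0.2 ns peak-to-peak, and the clock period alternately shrinks and stretches.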

### 2.5.2 Data Dependent Jitter

As the name suggests, data dependent jitter, which consists of DCD and ISI, is a function of the data history and occurs when the transition density changes. Transmission media generally have frequency-dependent loss characteristics that can also cause DDJ.

Above a certain transmission frequency, the transmission medium experiences skin effect loss and dielectric loss. When high-frequency current flows through a conductor, its magnetic field induces eddy currents that redistribute the current, forcing it to flow in a thin layer just below the surface of the conductor. The resulting increase in the transmission medium's resistance, and the attenuation of the propagating signal's high-frequency components, is known as the skin effect. This results in longer rise and fall times. Dielectric loss results from the delay of polarization in a dielectric material when it is subjected to a changing field. Above a certain frequency, dielectric loss dominates skin effect loss, because dielectric loss is proportional to frequency while skin effect loss is proportional to the square root of frequency [36]. Because both losses depend on frequency, skin effect and dielectric loss are causes of DDJ.
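The crossover between the two loss mechanisms follows from setting $a\sqrt{f} = b f$, which gives $f_c = (a/b)^2$. The coefficients below are hypothetical round numbers chosen for illustration, not measured FR-4 data. Only the $\sqrt{f}$ and $f$ scaling comes from the text.

```python
import math

def loss_components_db(f_ghz, a_skin, b_diel):
    """Return (skin, dielectric) loss per unit length at f_ghz: a*sqrt(f) and b*f."""
    return a_skin * math.sqrt(f_ghz), b_diel * f_ghz

def crossover_ghz(a_skin, b_diel):
    """Frequency above which dielectric loss exceeds skin loss: a*sqrt(f) = b*f."""
    return (a_skin / b_diel) ** 2

a, b = 0.1, 0.05          # hypothetical coefficients, not measured FR-4 data
print(crossover_ghz(a, b))
for f in (1.0, 4.0, 9.0):
    skin, diel = loss_components_db(f, a, b)
    print(f"{f} GHz: skin {skin:.3f} dB, dielectric {diel:.3f} dB")
```

With these coefficients, skin loss dominates at 1 GHz, the two are equal at 4 GHz, and dielectric loss dominates at 9 GHz.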

#### 2.5.2.1 Duty Cycle Distortion

Jitter resulting from duty cycle distortion can be represented by the sum of two  $\delta$  functions [35].

$$J_{DCD}(x) = \frac{1}{2}\delta\left(x - \frac{W}{2}\right) + \frac{1}{2}\delta\left(x + \frac{W}{2}\right)$$

where $J_{DCD}(x)$ is the DCD PDF, $W$ is the peak-to-peak DCD magnitude, and $x$ is the time displacement relative to the ideal position. The weight of each $\delta$ function is $\frac{1}{2}$ because the equation assumes equal numbers of rising and falling transitions in the transmitted signal.



Figure 2.10: Duty cycle distortion due to DC offset

The duty cycle of an alternating bit sequence is given by

$$\frac{t_{high}}{(t_{high} + t_{low})} \times 100\%$$

where $t_{high}$ is the duration of the high pulse and $t_{low}$ is the duration of the low pulse.

A symmetric data or clock signal has a 50% duty cycle, where $t_{high}=t_{low}$. Duty cycle distortion results from any difference in the mean time allocated to each logic state in an alternating bit sequence. Sources of DCD include rise and fall time discrepancies, DC offset in the data signal and device mismatch in the signal path.

Electromagnetic interference from other devices or systems can also induce currents on signal wires and power rails and affect the signal bias and reference voltages. Figure 2.10 shows a symmetric clock signal with a 50% duty cycle. It also shows the same clock signal with a duty cycle greater than 50% as a result of a positive DC offset. The new duty cycle is a function of the slew rate of the clock transition edges.
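The dependence on slew rate can be made explicit under a simplifying assumption: linear edges, with the offset small compared with the signal swing. A positive offset $V_{os}$ makes the rising edge cross the threshold earlier and the falling edge later, each by $V_{os}/SR$, so $t_{high}$ grows by $2V_{os}/SR$. The clock period, offset and slew rates below are assumed values.

```python
def duty_cycle_with_offset(period_ps, v_offset_mv, slew_mv_per_ps):
    """Duty cycle of a nominally 50% clock with a DC offset and linear edges.

    A positive offset makes the rising edge cross the threshold earlier and the
    falling edge later, each by v_offset / slew, so t_high grows by twice that.
    """
    shift = v_offset_mv / slew_mv_per_ps
    t_high = period_ps / 2 + 2 * shift
    return t_high / period_ps

# Assumed example: 1 GHz clock, 20 mV offset, 10 mV/ps edges -> 2 ps per edge.
print(duty_cycle_with_offset(1000, 20, 10))   # 0.504
print(duty_cycle_with_offset(1000, 20, 5))    # 0.508, slower edges distort more
```

The same offset produces twice the distortion when the edges are half as steep. This is why a receiver's duty cycle can reveal its voltage offset, as discussed next.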

Duty cycle distortion is also known as pulse width distortion. Data transmitted as an alternating bit sequence should behave like an ideal clock. However, the pulse width of the data may be distorted, as mentioned earlier. Duty cycle or pulse width measurements are useful for SerDes testing and diagnostics in many ways. Besides measuring the pulse width of an alternating data bit stream, they can also be used to deduce the amount of voltage offset in the receiver: when a positive voltage offset is injected, a duty cycle greater than 50% is observed, as shown in Figure 2.10. The phase delay between two signals can also be determined by XOR-ing the two signals and measuring the duty cycle of the XOR output. These measurements are performed on incoming data to a SerDes. The duty cycle of a receiver's sampling clock must also be tested. A sampling clock with a varying duty cycle prevents the clock from sampling the data at the center of the bit, where the timing margin is largest. Also, when both clock edges are used, a 50% duty cycle is essential to avoid any phase offset.
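The duty cycle and XOR phase measurements can be sketched statistically. The fraction of randomly timed samples that read high converges to the duty cycle (the law of large numbers). For two 50% square waves offset by up to half a period, the XOR output's duty cycle $D$ corresponds to a phase difference of $180^\circ \times D$. Uniform random sampling times are our hypothetical stand-in for an asynchronous sampling clock; this is not the BIST of Chapter 4, and the function names are assumptions.

```python
import random

def square_wave(t, period, duty=0.5, delay=0.0):
    """Ideal clock: high during the first `duty` fraction of each period after `delay`."""
    return 1 if ((t - delay) % period) < duty * period else 0

def estimate_duty(signal, period, n_samples, rng):
    """Fraction of randomly timed samples that read high (law of large numbers)."""
    return sum(signal(rng.uniform(0.0, period)) for _ in range(n_samples)) / n_samples

def xor_phase_deg(period, delay, n_samples, rng):
    """Phase of a delayed copy from the XOR duty cycle; unambiguous for 0-180 degrees."""
    def xor(t):
        return square_wave(t, period) ^ square_wave(t, period, delay=delay)
    return 180.0 * estimate_duty(xor, period, n_samples, rng)

rng = random.Random(1)
print(estimate_duty(lambda t: square_wave(t, 1.0, duty=0.6), 1.0, 100_000, rng))
print(xor_phase_deg(1.0, 0.125, 100_000, rng))   # close to 45 degrees
```

The estimate's standard error falls as $1/\sqrt{n}$. With 100,000 samples, the duty cycle estimate is typically within about 0.2% of the true value.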

#### Duty Cycle in SerDes

As SerDes data rates continue to increase, the characteristics of the clocks operating in a SerDes become even more important. Clock characteristics include duty cycle, clock frequency and the matching of clock phases. Duty cycle is especially important for SerDes logic that uses both rising and falling clock edges. Also, a SerDes uses multiple phases of a clock to achieve parallelism for high-speed operation, so mismatch between clock phases on either the transmitter or the receiver side degrades the performance of the SerDes. The clocks used in a SerDes therefore need to be examined as part of the manufacturing test process, or as part of the diagnostic process after a SerDes operation failure.





As shown in Figure 2.11(a), data is transmitted using both rising and falling clock edges. Data transmitted using a clock with an uneven duty cycle has uneven data widths at the output. Uneven data widths are more susceptible to bit errors over lossy transmission channels or at high transmission frequencies. At the receiver end, ideal data sampling should occur at the centre of the data width on both the rising and falling edges of the clock, as shown in Figure 2.11(b). With an uneven duty cycle at the receiver, the data are sampled off-center, which lowers the timing margin. When multiple clock phases are used to achieve a higher data rate, uneven clock duty cycles cause more bit errors and deteriorate timing margins.

#### 2.5.2.2 Inter-Symbol Interference

ISI has three main causes [35]:

1. Bandwidth limitation of the transmission medium can cause effects on a single bit that come from the sequence of preceding bits.
2. Nonlinear phase response of the transmission medium can cause frequency-dependent group delay. This nonlinear response causes edge shifts that depend on the transition density within the data stream.
3. Reflections can arise from imperfect transmission line terminations, resulting in effects on a single bit that come from the sequence of preceding bits.

Transmission media generally attenuate the high-frequency components of a signal more than its low-frequency components. To compensate for these frequency-dependent losses, transmitters use either pre-emphasis or de-emphasis. Pre-emphasis increases the amplitude of the first bit after each transition before transmission. De-emphasis reduces the amplitude of the lower-frequency components (repeated bits) before transmission. Adaptive equalization techniques can also be used on the receiver side to compensate for channel insertion loss by attenuating low-frequency components with respect to the high-frequency components of the signal.
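A common way to implement transmit de-emphasis is a two-tap FIR filter on the bipolar symbol stream, $y[n] = c_0 x[n] + c_1 x[n-1]$ with a negative post-cursor tap $c_1$. The tap weights 0.75 and -0.25 are hypothetical. So is the choice to treat the first bit as a repeat.

```python
def de_emphasis(bits, main=0.75, post=-0.25):
    """Two-tap transmit FIR: y[n] = main*x[n] + post*x[n-1], with x in {-1, +1}.

    The first bit is treated as a repeat of itself (no preceding transition).
    """
    symbols = [1 if b else -1 for b in bits]
    prev = symbols[0]
    out = []
    for x in symbols:
        out.append(main * x + post * prev)
        prev = x
    return out

print(de_emphasis([1, 1, 0, 0, 1]))   # [0.5, 0.5, -1.0, -0.5, 1.0]
```

With these weights, bits that follow a transition keep the full swing of 1.0, while repeated bits drop to 0.5. This is about 6 dB of de-emphasis, which pre-compensates a channel that attenuates the high-frequency transitions more.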

## 2.6 Built-In Self-Test (BIST) - Complement for ATE

As mentioned in Chapter 1, next-generation ASICs will have multi-port high-speed SerDes I/Os to take advantage of the device's computation speed and multi-functionality. An example of such a device under development is Sun Microsystems' next-generation throughput computing system [37]. These systems use non-standard communication protocols and will have hundreds of SerDes I/Os operating at multi-Gbps rates. Current ATE hardware and software cannot adequately test and debug these systems [12, 38]. Existing ATEs provide test functionality for tens of I/Os per chip, using standard protocols and source-synchronous applications. In this section, we present different BIST/DFT techniques used to complement existing ATEs and make up for their inadequacies.

### 2.6.1 Loopback Test

The most commonly used BIST method for testing I/O functionality without an external tester is to provide internal loopback configurations. In [39], AC IO loopback, a method that relies on loopback in the I/O buffer for AC timing parameter testing, was used to generate an eye diagram by combining timing and voltage stress with an accuracy of hundreds of picoseconds. In [37], two types of internal loopback were used: i) a pad loopback path, which includes the CML drivers and receiver sense amps, and ii) an inner loopback path, which is fully digital and excludes the CML drivers and receiver sense amps. These loopbacks were used for (i) testing the clock recovery by introducing a pseudo-asynchronous loopback mode, (ii) testing the receive equalizer, (iii) BER measurement and (iv) mapping the data eye. Although loopback BIST is commonly used, it has several drawbacks. Process variations and defect mechanisms affecting both the transmitter and the receiver can be masked, fault coverage is low, and functional tests take a long time. Both the transmitter and the receiver I/O in the SerDes macro have to work and be available for the loopback configuration. In addition, in loopback mode the transmitter and the receiver CDR operate synchronously, unlike in asynchronous in-field applications. Hence, loopback testing provides a reasonable functional test but still lacks coverage because of these drawbacks.
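The BER measurement mentioned for loopback testing needs a known test pattern and a checker. The text does not specify the pattern. We assume PRBS7 (polynomial $x^7 + x^6 + 1$), a common choice, generated by a Fibonacci LFSR and compared bit by bit after the loopback path. The function names and the single injected error are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """PRBS7 (x^7 + x^6 + 1) Fibonacci LFSR; seed must be a non-zero 7-bit value."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero LFSR state never leaves zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out

def bit_error_rate(sent, received):
    """Fraction of positions where the looped-back bits differ from the reference."""
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)

pattern = prbs7(254)                 # two full periods of 127 bits
looped = list(pattern)
looped[10] ^= 1                      # inject one bit error on the loopback path
print(bit_error_rate(pattern, looped))
```

Because the pattern is a maximal-length sequence, every non-zero 7-bit word appears once per 127-bit period. This exercises a wide range of run lengths and transition densities with very little hardware.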

### 2.6.2 Circular BIST

Circular BIST was briefly mentioned in [37] and presented in more detail in [22]. In [22], circular BIST was chosen over options such as functional BIST, scan ATPG and STUMPS BIST because of its conciseness, simplicity of implementation, high fault coverage and at-speed test capability. Circular BIST is used to test the digital logic in SerDes macros. Flip-flops are converted into circular BIST flip-flops, as shown in Figure 2.12. They are then connected in a circular path and follow this sequence of operations: reset all flip-flops, enable circular BIST mode, clock for N cycles, and compare the values in a subset of the flip-flops with the expected values.



Figure 2.12: Circular BIST flip flop

Circular BIST has some disadvantages. Fault grades can be low due to limit cycling, where an inappropriate starting state for the circular BIST path leads the path to cycle repeatedly through a limited number of states [40]. Fault grades can also be low due to the register adjacency problem. This occurs when adjacent cells in the BIST path have the property that the output of the first cell is in the functional input cone of the second. As a result, the XOR gate of the second cell can always output zero and hence block fault propagation [41].
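Both the circular BIST operation and limit cycling can be seen in a toy model. Each BIST cell captures its functional input XOR the output of the previous cell in the ring, which is how we understand the converted flip-flop of Figure 2.12. The three-flip-flop logic and the stuck-at fault below are hypothetical examples, not circuits from [22].

```python
def logic_good(q):
    """Toy next-state logic of the circuit under test (hypothetical)."""
    q0, q1, q2 = q
    return (q1 & q2, q0 | q2, q0 ^ q1)

def logic_faulty(q):
    """Same logic with the OR output stuck at 0."""
    q0, q1, q2 = q
    return (q1 & q2, 0, q0 ^ q1)

def circular_bist(logic, state, cycles):
    """Each BIST cell captures its functional input XOR the previous cell in the ring."""
    for _ in range(cycles):
        d = logic(state)
        state = tuple(d[i] ^ state[i - 1] for i in range(len(state)))  # state[-1] closes the ring
    return state

print(circular_bist(logic_good, (1, 0, 0), 4))     # (1, 1, 0)
print(circular_bist(logic_faulty, (1, 0, 0), 4))   # (0, 0, 0): fault detected
print(circular_bist(logic_good, (0, 0, 0), 100))   # (0, 0, 0): stuck, limit cycle
```

After reset to (1, 0, 0), the fault-free ring settles into a two-state cycle, (0, 0, 1) and (1, 1, 0). The faulty ring collapses to all zeros, so the final signatures differ and the fault is detected. Starting from all zeros, however, this logic never leaves zero. That is the limit-cycling problem: no fault that requires a 1 to be excited can ever be detected.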

### 2.6.3 Vernier Delay Line

Many different techniques exist for on-chip jitter measurement [17-19, 42-44]. A component-invariant vernier delay line (VDL) structure was used for jitter characterization in [18]. It replaces the multi-stage delay chain of a conventional VDL with a single-stage delay element placed in a loop, operating in an oscillator-like fashion. Unlike a conventional VDL, this design does not depend on the matching of delay elements, so it avoids the matching errors that lead to differential non-linearity timing errors. However, the delay elements in this VDL are still not immune to power supply and substrate noise coupled in from the analog and digital grounds of adjacent components. The resolution is reported to be as low as 19 ps.
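The vernier principle behind this structure can be sketched with ideal, noiseless delays in integer picoseconds. On each pass, the start edge goes through a slow delay and the stop edge through a fast one, so the gap between them shrinks by the difference of the two delays. Counting passes until the stop edge catches up measures the interval with that difference as the resolution. Reusing the same two elements on every pass is what removes stage-to-stage matching. The 119 ps and 100 ps delays are assumptions, chosen so that the resolution matches the reported 19 ps.

```python
def vernier_count(interval_ps, slow_ps, fast_ps):
    """Passes through slow/fast delay loops until the stop edge catches the start edge."""
    if slow_ps <= fast_ps:
        raise ValueError("the start loop must be slower than the stop loop")
    start, stop, passes = 0, interval_ps, 0
    while stop > start:
        start += slow_ps      # start edge goes around the slow loop once more
        stop += fast_ps       # stop edge goes around the fast loop once more
        passes += 1
    return passes

# Assumed delays chosen so that the resolution slow - fast is 19 ps.
slow, fast = 119, 100
n = vernier_count(100, slow, fast)
print(n, "passes ->", (n - 1) * (slow - fast), "to", n * (slow - fast), "ps")   # 6 passes -> 95 to 114 ps
```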

### 2.6.4 Undersampling BIST

Structural tests detect structures that deviate from defect-free behaviour, and they have a narrower objective than specification-based tests. Scan, AC scan, logic BIST, memory BIST and $I_{ddq}$ are common examples of structural tests that provide good *functional* test coverage. However, for high-speed SerDes I/O testing, common BIST/DFT techniques for parametric test coverage (input/output jitter and voltage offsets) are still lacking [15]. In [43], a suite of structural tests is used to measure parameters that affect jitter tolerance in multi-Gbps receivers. The tests measure high-frequency jitter (RMS value and histogram) in the received signal and in the recovered clock. They also measure transition-density-dependent phase shift, mean sampling position in the signal eye, sampling clock phase error and pin-to-pin skew, all with near-picosecond resolution. The method requires few, if any, changes to existing SerDes macro designs. Using undersampling, an UnLimited Time Resolution Analysis (ULTRA) module was created for testing jitter, phase delay and pulse width on chip. For example, the high-frequency jitter of the recovered clock was measured by analyzing the unstable bits relative to the median edge of the transition region. Low-frequency jitter was measured by analyzing the variation in the time interval between the median edges of successive transitions.
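The undersampling idea can be illustrated with coherent sampling of a periodic signal. In the example, the sampling period (1601 ps) is one picosecond longer than four signal periods (4 × 400 ps). Each new sample therefore lands 1 ps later in the waveform, and after 400 samples a full period has been swept with 1 ps equivalent-time resolution. The numbers and the function name are hypothetical, and this is not the ULTRA implementation of [43].

```python
import math

def undersampled_duty(signal_period_ps, sample_period_ps, high_time_ps):
    """Duty cycle measured by coherent undersampling of a periodic square wave.

    Sample k lands at phase (k * Ts) mod T; the phases form a grid whose spacing
    (the effective time resolution) is gcd(Ts, T).
    """
    resolution = math.gcd(sample_period_ps, signal_period_ps)
    n = signal_period_ps // resolution                 # samples for one full sweep
    highs = sum(1 for k in range(n)
                if (k * sample_period_ps) % signal_period_ps < high_time_ps)
    return highs / n, resolution

# Assumed example: 2.5 GHz clock (400 ps) with 55% duty, sampled every 1601 ps.
print(undersampled_duty(400, 1601, 220))   # (0.55, 1)
```

The slow sampling clock never needs to run at the signal rate. Only the small offset between the two periods sets the time resolution, which is what makes undersampling attractive for on-chip measurement.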

In [44], undersampling-based measurements were also used to analyze random jitter in high-speed SerDes I/Os. Random jitter was measured by analyzing unstable bits using the *mean* edge of the transition region. This approach yields different results because it filters out variation in the mean position of each edge. It also keeps low-frequency periodic jitter and data dependent jitter in the data from appearing in the random jitter measurements. Accurate random jitter (RJ) measurement is crucial because its value is used to determine the total jitter (TJ) of the system. The TJ in a system operating at a BER of $10^{-12}$ is:

$$TJ = DJ + 14.069 \times RJ$$

where $DJ$ is the deterministic jitter and $RJ$ is the rms value (standard deviation) of the random jitter.

Since RJ is multiplied by 14, inaccurate measurements of RJ can have a large impact on the TJ estimate.
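The factor 14.069 can be reproduced as $2Q$, where $Q$ is the number of standard deviations at which a one-sided Gaussian tail has probability $10^{-12}$. This sketch assumes that dual-Dirac convention. Some formulations also include a transition-density factor, which changes the multiplier slightly. The 20 ps DJ and 2 ps rms RJ are illustrative values.

```python
from statistics import NormalDist

def q_factor(ber):
    """Number of standard deviations whose one-sided Gaussian tail probability is `ber`."""
    return -NormalDist().inv_cdf(ber)

def total_jitter(dj_ps, rj_rms_ps, ber=1e-12):
    """Dual-Dirac estimate TJ = DJ + 2 * Q(BER) * RJ_rms."""
    return dj_ps + 2 * q_factor(ber) * rj_rms_ps

print(round(2 * q_factor(1e-12), 3))       # about 14.069
print(round(total_jitter(20.0, 2.0), 2))   # 20 ps DJ, 2 ps rms RJ (assumed values)
```

In this example, a 0.5 ps error in the RJ estimate would shift TJ by about 7 ps, far more than the same error in DJ.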

In Chapter 4, a duty cycle jitter measurement BIST is proposed and implemented on an FPGA. The BIST is used to evaluate the duty cycle distribution of a clock, or of data with an alternating bit sequence. By using the PLLs available on FPGAs, measurement results with lower variance and higher accuracy can be obtained. The duty cycle distribution of the signal can be clearly observed from the shape of the resulting histogram.

# **Chapter 3**

# **Multilevel Phase Detector**

As mentioned in Chapter 1, as transmission data rates increase beyond a Nyquist frequency of 2 GHz, different techniques have been used to maintain signal integrity in high-speed backplane transmission. One active approach is multilevel signaling. This chapter introduces multilevel signaling and shows how it can improve the signal-to-noise ratio in high-speed serial links. A novel all-digital approach to multilevel phase detection is then presented, along with simulations of the phase detector.

# 3.1 Multilevel Signaling

Multilevel Pulse Amplitude Modulation (PAM) signaling transmits, in each clock cycle, a symbol that carries k bits of binary information using one of $2^k$ signal levels. Information is therefore transmitted at a rate of

$$R_{PAM} = \frac{k}{T_b} \text{ bits/s}$$

where $T_b$ is the symbol interval, which equals the bit interval of the original binary transmission.

For the same bit rate, the new signaling frequency with multilevel signaling is

$$f_{NEW} = \frac{f_{OLD}}{k}$$

where $f_{OLD}$ is the original (binary) signaling frequency.

For a given interval $T_b$, the bit rate is $R_{PAM} = kR_B$, where $R_B = 1/T_b$ is the binary bit rate; that is, k times faster than the original transmission using binary pulses. For PAM-4, each symbol carries 2 bits of binary information, so the same amount of data can be transmitted at half the signaling frequency. These advantages come at the expense of reduced spacing between signal levels. For PAM-4 signals, the height of the eye diagram is reduced by a factor of 3, as shown in Figure 3.1; the resulting signal-to-noise ratio (SNR) loss is approximately $20\log_{10}3 \approx 9.5$ dB.
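The following sketch evaluates the rate and eye-height relations above for binary and PAM-4 signaling. It treats $T_b$ as the symbol interval and assumes the peak-to-peak swing is held constant, so the eye height shrinks by a factor of $2^k - 1$; the 1.5 GHz signaling rate is a hypothetical example.

```python
import math


def pam_bit_rate(k, symbol_interval_s):
    """Bit rate R_PAM = k / T_b for PAM with 2**k levels and symbol interval T_b."""
    return k / symbol_interval_s


def pam_eye_snr_penalty_db(k):
    """SNR loss of PAM-2**k relative to binary at the same peak-to-peak swing.

    The swing is split into 2**k - 1 stacked eyes, so each eye is
    (2**k - 1) times shorter and the penalty is 20*log10(2**k - 1).
    """
    return 20.0 * math.log10(2 ** k - 1)


t_b = 1 / 1.5e9  # hypothetical 1.5 GHz signaling (symbol) rate
print(pam_bit_rate(1, t_b) / 1e9, "Gb/s binary")
print(pam_bit_rate(2, t_b) / 1e9, "Gb/s PAM-4")
print(round(pam_eye_snr_penalty_db(2), 2), "dB")  # 9.54 dB
```

For k = 2 the penalty evaluates to about 9.54 dB, matching the 9.5 dB figure quoted above.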



Figure 3.1: PAM-2 and PAM-4 Signaling [45]

Multilevel signaling is warranted only if it reduces transmission loss by at least 10 dB, enough to offset its roughly 9.5 dB SNR penalty. PAM-4 signaling can replace NRZ signaling for transmission above 3 GHz, because on FR-4, the most common PCB fabrication material, the difference in transmission loss between 1.5 GHz (PAM-4) and 3 GHz (NRZ) is at least 10 dB, as shown in Figure 3.2.



Figure 3.2: Improved SNR using PAM-4 Signaling [46]

# **3.2 Phase Detector for Multilevel CDR**

Phase detectors are essential because they determine the type of clock recovery architecture used. The multilevel phase detector presented below is a non-linear phase detector for feedback clock recovery architectures. Its design is based on the Alexander phase detector for binary signaling.

# **3.2.1 Structure and Operation**

Alexander phase detectors [25] are commonly used non-linear phase detectors for conventional clock and data recovery. The structure and operation of the phase detector are shown in Figure 4 in Chapter 1. It uses three consecutive clock edges to sample the data and compares the samples at the last clock edge to obtain the phase error information. Its attractive characteristics are that it retimes the data during phase error detection and that it maintains the oscillator control voltage when no transition occurs. However, the phase detector becomes unstable for incoming data with large jitter components.



Figure 3.3: Multilevel PAM-4 Clock Recovery using Multiple Clock Phases.

For the multilevel phase detector, we apply a 2X oversampling phase detection scheme as in the Alexander phase detector. Figure 3.3 shows the structure of this all-digital multilevel phase detector. High-speed flash ADCs convert the incoming analog multilevel signals to digital data. The digital data is then passed through a transition detection and decomposition block that determines the transitions occurring in the data. Its output selects the appropriate early/late signal used to drive the charge pump. The Early/Late signals are generated according to the type of transition occurring in the data.



Figure 3.4: Operation of Multilevel Phase Detector

The operation of the multilevel phase detector is shown in Figure 3.4. Three clock phases are used to sample the incoming data stream: $\varphi 3$ is delayed by one bit time from $\varphi 1$, and $\varphi 2$ is placed exactly halfway between $\varphi 1$ and $\varphi 3$. The signal sampled on $\varphi 2$ is used to align the clock edge with the center of the bit. When the signal sampled at $\varphi 2$ is the same as the signal sampled at $\varphi 1$, the clock is early; when the signal sampled at $\varphi 2$ is the same as the signal sampled at $\varphi 3$, the clock is late. When the loop is locked, the early and late signals toggle and are averaged out by the charge pump and loop filter. For input data with high jitter, the average charge-pump output actually becomes smaller because the data edge distribution is wider, making the phase tracking unstable. This is an inherent characteristic of Alexander-based phase detectors. In [47], an alternating phase detection scheme is demonstrated to overcome this stabilization issue.
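The early/late rule above can be written as two XORs on the three samples, which is how the binary Alexander detector is usually realized. This is a behavioral sketch rather than the thesis circuit; `s1`, `s2` and `s3` are hypothetical names for the samples taken at $\varphi 1$, $\varphi 2$ and $\varphi 3$.

```python
def alexander_pd(s1, s2, s3):
    """Binary Alexander (bang-bang) phase-detector decision.

    s1, s2, s3: data samples (0/1) at phi1, phi2 (halfway) and phi3
    (one bit time after phi1). Returns (early, late).
    """
    early = s2 ^ s3  # phi2 agrees with phi1 but not phi3: edge after phi2, clock early
    late = s1 ^ s2   # phi2 agrees with phi3 but not phi1: edge before phi2, clock late
    return early, late


print(alexander_pd(0, 0, 1))  # (1, 0): clock early
print(alexander_pd(0, 1, 1))  # (0, 1): clock late
print(alexander_pd(1, 1, 1))  # (0, 0): no transition, hold
```

When $\varphi 1$ and $\varphi 3$ agree, both outputs are normally 0, so the loop holds its control voltage; the rare pattern in which only the $\varphi 2$ sample differs asserts both outputs, whose charge-pump currents roughly cancel.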

# **3.2.2 Transition Detection and Decomposition**



Figure 3.5: Decomposition of Multilevel PAM-4 Signal for Clock Recovery

Multilevel signal transitions are of three types, as shown in Figure 3.5. Type 1 transitions occur between adjacent levels, Type 2 transitions span one intermediate signal level, and Type 3 transitions occur between the maximum and minimum signal levels. In terms of the flash ADC output, a Type 1 transition produces only a one-bit change, since it moves between adjacent levels; Type 2 and Type 3 transitions produce two- and three-bit changes, respectively, in the thermometer-coded output. Using this fact, the transition detection and decomposition circuit is designed as shown in Figure 3.6. The signal sampled at $\varphi$1 is delayed and compared to the signal sampled at $\varphi$3. Digital logic decomposes the transitions into the three types (Type1, Type2, Type3) and drives the multiplexer accordingly. A low logic depth for the early/late decision ensures that the timing constraints are met.
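Because each comparator output switches once when the signal crosses its reference, the transition type equals the number of thermometer bits that differ between the $\varphi$1 and $\varphi$3 samples. A minimal sketch of that classification, assuming well-formed (bubble-free) thermometer codes ordered (T0, T1, T2) from the lowest reference upward; the helper names are hypothetical:

```python
def thermometer(level, bits=3):
    """Thermometer code (T0, T1, T2) of a PAM-4 level 0..3: T_i = 1 if level > i."""
    return tuple(1 if level > i else 0 for i in range(bits))


def transition_type(t_phi1, t_phi3):
    """0 = no transition, 1/2/3 = Type1/Type2/Type3 (number of changed bits)."""
    return sum(a ^ b for a, b in zip(t_phi1, t_phi3))


print(transition_type(thermometer(0), thermometer(3)))  # 3: minimum to maximum level
```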



Figure 3.6: Digital Transition Detection and Decomposition Circuit

# 3.2.3 Early/Late Signal Generation

The Early/Late signal generation follows the same scheme as the Alexander phase detector. It consists of XORs that compare the delayed $\varphi 2$ sampled value with the $\varphi 1$ and $\varphi 3$ sampled values at clock phase $\varphi 3$, as shown in Figure 3.7. When no transitions occur, the outputs of the Early/Late signal generation block are low. This allows the phase detector to maintain the oscillator control voltage, so no minimum transition density is required in the incoming data stream.



Figure 3.7: Early/Late Signal Generation

Each thermometer-coded output of the flash ADC is compared with its own XORed samples to generate an early/late signal. Early<sub>T0</sub> corresponds to the T0 flash ADC output whose sample at $\varphi$2 differs from its sample at $\varphi$3 (that is, it matches the sample at $\varphi$1). The same holds for Early<sub>T1</sub> and Early<sub>T2</sub> for their respective ADC outputs. To generate the Early/Late signal for each transition type, the logic that separates the transition types in the transition decomposition block can be reused. This also allows the timing path of the Early/Late signal generation to be matched to that of the transition detection and decomposition path.
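A behavioral sketch of per-bit Early/Late flags for the three thermometer outputs follows. It uses the early/late rule stated in Section 3.2.1 ($\varphi$2 equal to $\varphi$1 means early, $\varphi$2 equal to $\varphi$3 means late). How the circuit combines the per-bit flags for each transition type is set by the multiplexer of Figure 3.6; the combining rule used here (only bits that changed vote, and a decision is issued only when they all agree) is a hypothetical stand-in.

```python
def early_late_bits(t1, t2, t3):
    """Per-bit Early_Tn / Late_Tn flags from thermometer codes sampled at phi1, phi2, phi3.

    Bits that do not change between phi1 and phi3 raise no flag.
    """
    early = [int(a != c and b == a) for a, b, c in zip(t1, t2, t3)]
    late = [int(a != c and b == c) for a, b, c in zip(t1, t2, t3)]
    return early, late


def multilevel_pd(t1, t2, t3):
    """Combine per-bit flags into (early_out, late_out) using a hypothetical rule."""
    changed = [a != c for a, c in zip(t1, t3)]
    if not any(changed):
        return 0, 0  # no transition: outputs low, loop filter voltage held
    early, late = early_late_bits(t1, t2, t3)
    early_votes = [e for e, ch in zip(early, changed) if ch]
    late_votes = [lt for lt, ch in zip(late, changed) if ch]
    return int(all(early_votes)), int(all(late_votes))


# Type3 transition (level 0 -> 3) with phi2 still at level 0: clock early
print(multilevel_pd((0, 0, 0), (0, 0, 0), (1, 1, 1)))  # (1, 0)
```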

## 3.3 2-Bit Flash ADC

The 2-bit flash ADC comprises two stages: a preamplifier stage and a track-and-latch stage. The preamplifier provides a small gain to increase the resolution of the comparator, while the track-and-latch stage further amplifies the preamplifier output to full-swing digital values.

### **3.3.1 Preamplifier Stage**

The preamplifier is essentially a switched differential amplifier. When the clock is high, it operates as a differential amplifier and amplifies the input signal. When the clock is low, the differential pair is disabled as transistor $M_{P3}$ pulls up to Vdd. A small gain of 2 is used, since a larger gain would add capacitive loading and reduce the operating bandwidth. Because the preamplifier gain is small, its input offset voltage becomes significant. The preamplifier offset voltage is caused by mismatches in the differential pair. Statistically, the offset can be expressed as a function of device parameters:

$$\sigma(V_{off}) = \frac{k_P}{g_m \cdot L^2} (V_{GS} - V_t) \cdot \left[ A_{VT}^2 + \frac{A_{\beta}^2}{4} (V_{GS} - V_t)^2 \right]$$

where $A_{VT}$ and $A_{\beta}$ are process parameters.

Hence, using device lengths L larger than the minimum and a larger $g_m$ can reduce the preamplifier offset.



Figure 3.8: Sample-and-Hold Preamplifier

The advantage of using a preamplifier is that it reduces kickback noise. Since the inputs are applied directly to the transistor gates, there is little charge transfer to the input and the reference ladder. Kickback noise arises from charge transferred into or out of the inputs when the track-and-latch stage switches from track mode to latch mode. This charge transfer is caused by the charge needed to turn on the transistors in the positive-feedback circuitry and by the charge that must be removed to turn off the transistors in the tracking circuitry [48]. Without a preamplifier stage, this charge transfer would cause large glitches at the inputs, especially when the input impedances are mismatched.

### 3.3.2 Track and Latch Stage

The track-and-latch stage amplifies the small difference in the preamplifier output through positive feedback, as shown in Figure 3.9. The regenerative circuit is based on the latch used in the StrongARM processor [49]. A reset PMOS, $M_{P5}$, is added to the circuit to reset the output, for reasons discussed below.

There is an exponential increase in the latch output as the result of a small difference in the input,

$$\Delta V = \Delta V_o e^{\frac{t}{\tau_{latch}}}$$

where $\Delta V_o$ is the initial difference at the beginning of the latch phase and $\tau_{latch}$ is the latch-mode time constant.

From a linearized model analysis in [48], the latch time constant $\tau_{latch}$ is

$$\tau_{latch} = K_3 \frac{L^2}{\mu_n V_{eff}}$$

where $K_3$ is a proportionality constant between 2 and 4, and $V_{eff} = \sqrt{\frac{2I_D}{\mu_n C_{ox}(W/L)}}$.

The analysis implies that the  $\tau_{latch}$  will depend primarily on technology and not on the design, given a reasonable design that maximizes  $V_{eff}$  and minimizes capacitive loading.
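The two latch expressions can be combined to estimate how long regeneration takes: solving $\Delta V = \Delta V_o e^{t/\tau_{latch}}$ for $t$ gives $t = \tau_{latch}\ln(\Delta V/\Delta V_o)$. The device values below (0.18 µm channel length, 300 cm²/V·s electron mobility, 0.2 V effective overdrive, $K_3 = 3$, a 10 mV initial imbalance regenerating to 1.8 V) are illustrative assumptions, not values from this design.

```python
import math


def latch_time_constant(k3, length_m, mobility_m2_per_vs, v_eff):
    """tau_latch = K3 * L**2 / (mu_n * V_eff), linearized model from [48]."""
    return k3 * length_m ** 2 / (mobility_m2_per_vs * v_eff)


def regeneration_time(tau, dv0, dv_final):
    """Time for dV = dV0 * exp(t / tau) to grow from dv0 to dv_final."""
    return tau * math.log(dv_final / dv0)


# Hypothetical, illustrative device values (not taken from this design)
tau = latch_time_constant(k3=3.0, length_m=0.18e-6, mobility_m2_per_vs=0.03, v_eff=0.2)
t_regen = regeneration_time(tau, dv0=10e-3, dv_final=1.8)
print(f"tau = {tau * 1e12:.1f} ps, regeneration time = {t_regen * 1e12:.1f} ps")
```

Because the time grows only logarithmically with $1/\Delta V_o$, halving the initial imbalance costs just $\tau_{latch}\ln 2$ of extra regeneration time.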



Figure 3.9: Track and Latch Stage

When the clock is high, a ground path is established (Mn5 is on). For a small difference in the input, the cross-coupled inverters regeneratively amplify the output to full swing. If $I_+$ is high, $O_+$ is discharged through Mn3, Mn1 and Mn6, turning Mn4 off and Mp4 on. If $I_-$ is high, $O_-$ is discharged through Mn4, Mn2 and Mn5, turning Mn3 off and Mp3 on. When the clock is low, the latch resets and both outputs are high (Figure 3.10). The reset eliminates hysteresis, which occurs when the comparator output tends to stay in the state of the previous decision; resetting ensures that no memory is carried from one decision cycle to the next. The reset also places the comparator at its *trip point*, which speeds up operation when it resolves small input signals.



Figure 3.10: Regeneration Circuit Simulation

The 3-bit thermometer-coded outputs of the 2-bit flash converter are converted to Gray code for further processing. Gray code avoids intermediate states during transitions between adjacent levels, limiting the error to a single bit. High-speed flash comparators are susceptible to bubbles in the thermometer-coded output as a result of comparator offset voltage, comparator misfires and insufficient response time. Bubbles occur when one or more zeros appear below a one in the thermometer code. By placing AND gates between adjacent comparator outputs, each output is forced to '1' when the adjacent comparator output above it is '1' [50]. This method does not correct second-order bubbles, which in practice have a low probability of occurring.
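The sketch below models the bubble correction and Gray encoding behaviorally, assuming the thermometer code is ordered (T0, T1, T2) from the lowest reference upward. Following the description above, a position is forced to '1' whenever the comparator directly above it reads '1'; the gate-level arrangement in [50] may differ, and the function names are hypothetical.

```python
def correct_first_order_bubbles(therm):
    """therm = (T0, T1, T2), T0 = lowest comparator.

    A position is forced to 1 when the comparator directly above it reads 1,
    which removes a single zero below a one (first-order bubble). Two
    consecutive zeros below a one (second-order bubble) are not removed.
    """
    n = len(therm)
    return tuple(therm[i] | (therm[i + 1] if i + 1 < n else 0) for i in range(n))


def thermometer_to_gray(therm):
    """Count the ones (binary level b), then convert to Gray code g = b ^ (b >> 1)."""
    b = sum(correct_first_order_bubbles(therm))
    return b ^ (b >> 1)


for code in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 0)]:
    print(code, "->", format(thermometer_to_gray(code), "02b"))
```

The Gray sequence 0, 1, 3, 2 changes by one bit between adjacent levels, so a decision error on an adjacent-level transition corrupts only one bit.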

# **3.4 Simulation Results**



Figure 3.11: Flash ADC Output

A simple linear ramp test was used to check the flash ADC output in Figure 3.11. The input ranged from 1.2 V to 1.8 V, with references set at 1.3 V, 1.5 V and 1.7 V. The number of occurrences of each code word is the same. The multilevel signal levels are spaced 0.2 V apart, from 1.2 V up to $V_{DD}$.



Figure 3.12: 3-Phase Flash ADC Sampling

In Figure 3.12, a clock with three phases, $clka(\Phi_1)$, $clkb(\Phi_2)$ and $clkc(\Phi_3)$, was used to sample the multilevel signal and convert it to a thermometer-coded output. Adjacent clock phases are spaced 350 ps apart. The converted thermometer-coded output is used for phase detection. Various multilevel signals are applied to produce the different transition types, and the thermometer code is annotated for clarity. When the $\Phi_1$ and $\Phi_2$ outputs are the same, the clock is early; when the $\Phi_2$ and $\Phi_3$ outputs are the same, the clock is late.



Figure 3.13: Early/Late Signal Generation and Signal Decomposition Output

Figure 3.13 shows the output of the digital Early/Late signal generation and the signal decomposition. The analog multilevel input signal has already been encoded by the 2-bit flash ADC into the thermometer codes *T0*, *T1* and *T2*. *Type1*, *Type2*, *Type3* and *No\_transition\_select* select which early/late transition signal is passed to the phase detector outputs, *Early\_out* and *Late\_out*. The first three transitions are Type1 transitions, the 4<sup>th</sup> and 5<sup>th</sup> are Type3 transitions, the 7<sup>th</sup> and 8<sup>th</sup> are Type2 transitions, and the 6<sup>th</sup> and final intervals contain no transition.

To test the performance of the phase detector, clock phase $\Phi_1$ was used to sample at different offsets from the center of the symbol, as shown in Figure 3.14. This was done from -350 ps to +350 ps relative to the symbol center. Clock phases $\Phi_2$ and $\Phi_3$ are 350 ps and 700 ps after $\Phi_1$, respectively.



Figure 3.14: Sampling of Data Symbol at Different Offset from Symbol Center

It was observed that the performance of the phase detector was limited by the digital phase detection logic rather than by the analog flash ADC. Although the data symbols were correctly sampled by the flash ADC, the flip-flops and digital logic for phase detection were not fast enough when the sampling clock $\Phi_1$ was within about ±100 ps of the symbol center. In Figure 3.15, the *Late* signal for a Type1 transition was not generated until clock phase $\Phi_1$ was about 100 ps past the center of the symbol.



Figure 3.15: Phase Detection Output Sampled Across the Symbol Period

# **Chapter 4**

# Serializer/Deserializer BIST

Duty cycle symmetry in SerDes is important both for the *clocks* operating in the SerDes and for *data* transmitted as an alternating bit sequence. The duty cycle of an alternating data signal can be used to measure the rise/fall time and signal amplitude of the transmitted signal [20]. A duty cycle jitter measurement BIST that applies both the theory of undersampling and the Law of Large Numbers is presented; together, these allow higher measurement accuracy. The implementation is purely digital and area efficient. It measures the duty cycle of a signal and generates a histogram that gives a clear view of the distribution of the measurement results.

The following sections first present these two concepts and how they can be applied to duty cycle testing. Next, the overall design, its components and its operation are presented. Finally, the results show the histogram of the measured duty cycle and the increase in measurement accuracy obtained from these concepts.

# 4.1 Sampling Theorem and Aliasing

The sampling theorem was introduced by Shannon in 1949 for application in communication systems; however, it can be traced back to Nyquist in 1928 and to mathematical literature as far back as 1915. The sampling theorem can be stated in two separate but equivalent ways [51]:

1. A continuous-time signal with frequencies no higher than $F_{max}$ is completely described by specifying the values of the signal at instants of time separated by $1/(2F_{max})$ seconds.
2. A continuous-time signal with frequencies no higher than $F_{max}$ may be completely recovered from its samples taken at the rate of $2F_{max}$ per second.

The sampling rate $2F_{max}$ is called the Nyquist rate; it is the minimum sampling rate allowed by the sampling theorem. Confusingly, the term Nyquist frequency refers to $F_{max}$. Aliasing occurs when the frequency $F_{signal}$ of the input signal is above the Nyquist frequency ($F_{signal} > F_{max}$) for the sampling rate used. This technique is called undersampling, and the signal generated by undersampling is called an alias. Figure 4.1 shows the frequency-domain effects of aliasing when the input frequency is (a) within the Nyquist bandwidth, (b) equal to the Nyquist frequency and (c) above the Nyquist frequency.



Figure 4.1: Frequency-Domain Effects of Aliasing. (a) Signal Frequency ($F_{signal}$) < Nyquist Frequency ($F_{max}$) (b) $F_{signal} = F_{max}$ (c) $F_{signal} > F_{max}$

The effects of undersampling can be observed in the time domain in Figure 4.2, which shows the same three conditions of $F_{signal}$ relative to $F_{max}$ as Figure 4.1. The period between sampling points is $1/F_{sampling}$. Since $F_{max} = \frac{1}{2} F_{sampling}$, $F_{max}$ decreases proportionally as $F_{sampling}$ decreases. In Figure 4.2(b), $F_{signal} = F_{max}$ represents an ambiguous limiting condition: if the sampling points fell at the zero crossings instead of the peaks of the signal wave, all information would be lost. In Figure 4.2(c), $F_{signal} > F_{max}$ represents the condition for undersampling and aliasing. As a result of undersampling, the frequency of the alias signal falls within the Nyquist bandwidth. As $F_{sampling}$ decreases further and $F_{signal}$ approaches $F_{sampling}$, the alias signal approaches DC.
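The folding described above can be expressed compactly: the apparent frequency of a sampled tone is its distance to the nearest integer multiple of the sampling rate. A minimal sketch, assuming an ideal sampler and a single sinusoidal tone (`alias_frequency` is a hypothetical helper name):

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a tone at f_signal after ideal sampling at f_sample.

    The tone folds into the Nyquist band [0, f_sample / 2].
    """
    f = f_signal % f_sample       # fold into [0, f_sample)
    return min(f, f_sample - f)   # reflect into [0, f_sample / 2]


print(alias_frequency(30, 100))  # 30: below Nyquist, unchanged
print(alias_frequency(70, 100))  # 30: above Nyquist, folded back
print(alias_frequency(100, 99))  # 1: signal just above the sampling rate
```

The last case illustrates how the alias approaches DC as $F_{signal}$ approaches $F_{sampling}$.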




Figure 4.2: Time Domain Effects of Aliasing. (a)  $F_{signal} < F_{max}$  (b)  $F_{signal} = F_{max}$ (c)  $F_{signal} > F_{max}$ 

The alias produced by undersampling an input signal is useful once the properties of an alias are understood. The following aliasing theorem provides a general background on how it can be used to measure jitter [52]:

1. For an ideal repetitive time-domain alias, time-related features on the alias (e.g., rise time) are scaled by the ratio of the repetition frequency of the alias to the repetition frequency of the original waveform.
2. The number of samples on each cycle of the alias, N, is

   $$N = \frac{1}{0.5 - \left| 0.5 - \left( \frac{F_{SIGNAL}}{F_{SAMPLE}} \right) \mod 1 \right|}$$

   For example, for $F_{SIGNAL} = 100$ Hz and $F_{SAMPLE} = 99$ Hz, the alias waveform has N = 99 samples per alias cycle, and each alias cycle spans 100 cycles of the original signal.
3. For a repetitive time-domain signal, ideal positive undersampling loses the repetition frequency information but does not materially change the shape of the signal, provided that there are at least 10 points on the aliased rise time and the waveform artifacts are not smaller than 1/5 of the rise time.
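Item 2 can be evaluated directly. A sketch of the formula follows; note that it is undefined (division by zero) when $F_{SIGNAL}$ is an exact integer multiple of $F_{SAMPLE}$, since the samples then never move relative to the waveform.

```python
def samples_per_alias_cycle(f_signal, f_sample):
    """N = 1 / (0.5 - |0.5 - (f_signal / f_sample mod 1)|), aliasing theorem item 2."""
    r = (f_signal / f_sample) % 1.0  # fractional frequency ratio
    return 1.0 / (0.5 - abs(0.5 - r))


print(round(samples_per_alias_cycle(100, 99), 6))
```

For $F_{SIGNAL} = 100$ Hz and $F_{SAMPLE} = 99$ Hz the formula evaluates to 99: the 1 Hz alias is sampled 99 times per cycle, and each alias cycle spans 100 cycles of the original 100 Hz signal.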

The alias, a time-scaled version of the original waveform, can be used to measure the jitter of the original signal because *minute* timing features are scaled up. Another advantage of using the alias is that the measurement resolution is limited only by two factors: (1) the difference between the sampling frequency and the signal frequency ($F_{sampling} - F_{signal}$), and (2) the metastability of the flip-flop used to sample the input signal. For these reasons, undersampling has been exploited by analog sampling oscilloscopes to display waveforms with repetition rates of gigahertz and above while sampling at only 100K samples per second.

# 4.2 Law of Large Numbers

The Law of Large Numbers can be exploited for statistical estimation in the duty cycle measurement. It was first proved by the Swiss mathematician James Bernoulli in 1713. Bernoulli's original proof is more complex; however, the law can be proved more simply using Chebyshev's inequality [53].

#### Chebyshev's Inequality Theorem:

Let X be a discrete random variable with expected value  $\mu = E(X)$ , and let  $\varepsilon > 0$  be any positive real number. Then Chebyshev's Inequality states

$$P(|X-\mu| \ge \varepsilon) \le \frac{V(X)}{\varepsilon^2}$$

The following will present Law of Large Numbers and how Chebyshev's Inequality can be useful.

The Law of Large Numbers:

Let $X_1, \dots, X_n$ be a sequence of independent and identically distributed random variables, each with mean $\mu$ and standard deviation $\sigma$, and let $\bar{X}_n = \frac{X_1 + \dots + X_n}{n}$ denote their sample mean. By linearity of expectation, the expected value of the sample mean equals the population mean for every $n$:

$$E(\bar{X}_n) = E\left(\frac{X_1 + \dots + X_n}{n}\right) = \frac{1}{n}\left[E(X_1) + \dots + E(X_n)\right] = \frac{n\mu}{n} = \mu$$

In addition, because the $X_i$ are independent, their variances add:

$$V(\bar{X}_n) = V\left(\frac{X_1 + \dots + X_n}{n}\right) = V\left(\frac{X_1}{n}\right) + \dots + V\left(\frac{X_n}{n}\right) = \frac{\sigma^2}{n^2} + \dots + \frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Therefore, applying Chebyshev's inequality to $\bar{X}_n$, for all $\varepsilon > 0$,

$$P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{V(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$$

As $n \to \infty$, the right-hand side tends to zero, so $P(|\bar{X}_n - \mu| \ge \varepsilon) \to 0$. Equivalently, the probability that $|\bar{X}_n - \mu| < \varepsilon$, for an arbitrary positive $\varepsilon$, approaches 1 as $n \to \infty$. The proposed method increases $n$ by increasing the number of sampling clocks, which yields higher measurement accuracy.
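The Chebyshev bound above can be tabulated for the sample counts discussed later in this chapter (12 and 32 PLL outputs). In this sketch, `sigma` and `eps` are hypothetical values in picoseconds; the bound is capped at 1 because a probability bound above 1 carries no information.

```python
def chebyshev_bound(sigma, n, eps):
    """Upper bound on P(|mean of n samples - mu| >= eps): sigma**2 / (n * eps**2), capped at 1."""
    return min(1.0, sigma ** 2 / (n * eps ** 2))


for n in (1, 12, 32):
    print(n, round(chebyshev_bound(sigma=10.0, n=n, eps=5.0), 3))
```

With σ = 10 ps and ε = 5 ps, the bound drops from uninformative (1.0) for a single sample to 1/3 for 12 samples and 0.125 for 32 samples.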

Measurements in previous work were obtained by cumulative-edge and individual processing of the transition region width [43, 44]. The cumulative-edge method measures high-frequency jitter by analyzing the unstable bits relative to the median edge of the transition region, while the individual method measures jitter by analyzing the unstable bits relative to the mean edge of the transition regions. The accuracy of both methods can be improved by increasing the number of sampling clocks used. In the proposed method, in addition to using a cumulative-edge method to measure duty cycle jitter, the number of sampling clocks, $n$, is larger than 1. As $n$ increases, the measurement accuracy increases, since by the Law of Large Numbers the probability that

$$\left|\frac{X_1 + \dots + X_n}{n} - \mu\right| < \varepsilon$$

approaches 1 as $n$ increases to infinity.

The proposed improved measurement was made possible by the numerous PLLs, and the multiple clock outputs per PLL, found in recent FPGAs. The implementation consists of purely digital components. This approach is resource efficient and independent of temperature and process variation, compared with mixed-signal approaches built from components such as operational amplifiers, comparators and delay lines. The digital approach also allows easier integration into other systems and different technologies.

# **4.3 Duty Cycle Measurement BIST**

In VLSI circuits such as DRAMs, dynamic/pipelined circuits, pipelined analog-to-digital converters (ADCs) and high-speed SerDes, operations are synchronized by both transitions of the clock. In dynamic/domino logic circuits, one phase of the clock cycle precharges and the other evaluates, making the clock duty cycle crucial at high operating speeds. In memory systems, one phase of the clock cycle is used to precharge the bit lines and the other for read/write operations. In data communications, circuits that reach speeds in the Gbps range employ parallelism, which requires multiple clock phases both to serialize data and for clock recovery. In digital phase detectors for clock recovery, a duty cycle mismatch causes a static phase error and reduces the timing margin of the system.



Figure 4.3: Application of Duty Cycle Measurement

An all-digital BIST was implemented on an Altera Stratix FPGA to verify that the recovered clock maintains a 50% duty cycle, so that the clock edge that outputs the retimed data samples at the center of the data bit for maximum timing margin. Figure 4.3 shows a general CDR in a SerDes link; the BIST receives the recovered clock from the CDR as its input signal.


# 4.3.1 Structure and Operation

Figure 4.4: Structure and Operation of Duty Cycle Measurement BIST

The structure and operation of the duty cycle measurement BIST are shown in Figure 4.4. The BIST consists mainly of Phase-Locked Loops (PLLs), sampling flip-flops (FFs), up/down counter arrays and a dual-port RAM for storing results. A controller controls this data path and also handles interaction with the user. For the user interface, a parallel-to-serial UART interface was also implemented in the FPGA. The measured data is received by a computer, and the raw measurement results are analyzed using a Perl script. The script produces a histogram of the measured duty cycle, from which parameters such as the mean, standard deviation and range of values can be determined.

As stated in the aliasing theorem, the alias produced by undersampling has its timing characteristics scaled by the ratio of the repetition frequency of the alias to that of the original waveform. This effect allows on-chip, higher-resolution examination of the clock transition regions, and therefore a more accurate measurement of the clock duty cycle. During undersampling, the PLL clock period is slightly larger than that of the recovered clock, allowing the PLL clock edge to "step" across the recovered clock period, as shown in Figure 4.5. Each sampling clock edge steps by the absolute difference between the two clock periods. The resolution of the undersampling method is therefore set by the smallest difference between the sampling PLL clock period and the recovered clock period that is allowable before metastability failure of the flip-flops.
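The stepping behavior can be simulated with integer picosecond timing. In this sketch, the measured clock has period `period_ps` and stays high for `high_ps`, and the sampling clock period is `period_ps + step_ps`; sample k therefore lands at phase k·`step_ps` modulo the clock period. It assumes `period_ps` is an exact multiple of `step_ps` and ignores jitter and metastability; the 3200 ps (312.5 MHz) clock with a 10 ps step is a hypothetical example.

```python
def undersample(period_ps, high_ps, step_ps):
    """Samples (0/1) of a clock taken by a clock of period period_ps + step_ps.

    One alias period contains period_ps // step_ps samples; sample k lands at
    phase (k * step_ps) mod period_ps within the measured clock period.
    """
    n = period_ps // step_ps
    return [1 if (k * step_ps) % period_ps < high_ps else 0 for k in range(n)]


def duty_cycle_from_alias(samples):
    """Fraction of high samples over one alias period."""
    return sum(samples) / len(samples)


samples = undersample(period_ps=3200, high_ps=1700, step_ps=10)
print(len(samples), duty_cycle_from_alias(samples))  # 320 0.53125
```

Counting the fraction of high samples over one alias period recovers the 1700/3200 duty cycle exactly, with a resolution of one step (10 ps).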



Figure 4.5: Undersampling Example of Recovered Clock Signal

# 4.3.2 Phase Lock Loops

The Altera Stratix devices have two types of PLL: Enhanced and Fast PLLs. Enhanced PLLs are feature-rich, general-purpose PLLs supporting advanced features such as external feedback, clock switchover, PLL reconfiguration, spread-spectrum clocking and programmable bandwidth. Fast PLLs are optimized for high-speed differential I/O interfaces and can also be used for general-purpose PLL clocking. The features of the Enhanced and Fast PLLs are compared in Table 4.1.

| Features                                                | Enhanced PLL      | Fast PLL       |
|---------------------------------------------------------|-------------------|----------------|
| Input Frequency Range                                   | 3 - 462 MHz       | 30 - 644.5 MHz |
| Output Frequency Range                                  | 0.6 - 462 MHz     | 9 - 644.5 MHz  |
| Programmable Phase Shift                                | 160 ps            | 160 ps         |
| Programmable Delay Shift                                | 250-ps increments |                |
| Clock Switchover                                        | ✓                 |                |
| PLL Reconfiguration                                     | ✓                 |                |
| Programmable Bandwidth                                  | ✓                 |                |
| Spread-Spectrum Clocking                                | ✓                 |                |
| Number of Dedicated External Differential Clock Outputs | 8                 |                |
| Number of Feedback Clock Outputs                        | 4                 |                |
| Number of PLLs per Device                               | Up to 4           | Up to 8        |

Table 4.1: Altera Stratix PLL Features [54]

Using Altera Fast PLLs, as many as 12 PLL outputs (4 Fast PLLs × 3 outputs each) can be instantiated to perform multi-sampling of the recovered clock. More PLLs could be instantiated; however, since the CMC Altera Stratix DSP Development Platform has only a single oscillator input to the FPGA, only four PLLs were used. According to the datasheet, up to 32 PLL outputs (8 Fast PLLs with 4 clock outputs each) should be usable. These PLL outputs increase the number of samples taken of the measured signal and provide a better mean measurement of the duty cycle, since the variance of the mean measurement decreases as n increases (Law of Large Numbers). Figure 4.6 shows multiple-clock sampling of the measured signal, as opposed to the single-clock sampling of Figure 4.5.
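The benefit of averaging over several PLL outputs can be illustrated with a small Monte Carlo run. The noise model is an assumption: each PLL's measurement error is treated as an independent Gaussian with a 3-step standard deviation around a hypothetical true offset of 20 steps. In hardware, PLLs driven from the single board oscillator may have correlated errors, which would reduce the improvement below the ideal $1/\sqrt{n}$.

```python
import random
import statistics


def averaged_measurement(true_offset, noise_sigma, n_plls, rng):
    """Mean of n_plls independent noisy measurements of the same cycle offset.

    Hypothetical noise model: independent Gaussian error per PLL, in counter steps.
    """
    return statistics.fmean(rng.gauss(true_offset, noise_sigma) for _ in range(n_plls))


rng = random.Random(1)  # fixed seed so the run is repeatable
spread = {}
for n in (1, 4, 12):
    trials = [averaged_measurement(20.0, 3.0, n, rng) for _ in range(20000)]
    spread[n] = statistics.stdev(trials)
    print(f"n = {n:2d}: std of mean = {spread[n]:.3f} (ideal {3.0 / n ** 0.5:.3f})")
```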



Figure 4.6: Multiple PLL Undersampling

# 4.3.3 FPGA design



Figure 4.7: Detailed FPGA design of BIST

Figure 4.7 shows a detailed slice of the implementation in Figure 4.4. The BIST design is separated into two clock domains: the sampling clock domain and the controller clock domain. The input clock signal is undersampled by the sampling PLLs in the sampling clock domain, and the signal at the output of the cascaded flip-flops is an alias of the input signal. The alias signal has a much lower frequency, $|F_{signal} - F_{sampling}|$, than the input and sampling PLL clocks. To output the measurement results, the dual-port RAM interfaces between the controller clock domain and the UART. Crossings between clock domains are handled by cascaded flip-flops, which ensure that the measurements obtained by the BIST are not affected by metastability failure: they allow the sampled data to settle to a stable logic value before it is forwarded to other logic. The design is highly modular and can easily be expanded to include more sampling PLLs.

The implementation uses an edge detector, a single up/down counter and some control logic. The duty cycle of the recovered clock is measured by using the up/down counter to measure the difference in length between the two phases of the clock cycle. Each measurement result stored in the RAM indicates how much the high pulse of the clock exceeds the low pulse. For example, a cycle offset of "0" means that a 50% (50-50) duty cycle was measured. Using a 16-bit up/down counter allows a range of up to $2(2^8-1)$ offsets, divided between positive and negative cycle offsets. However, care must be taken that the cycle offset to be measured does not exceed the allowable number of offsets; otherwise, a positive cycle offset becomes indistinguishable from a negative one. A simple hand calculation before implementing the BIST avoids this issue. Given the known absolute difference between the PLL sampling clock period and the recovered clock period, $|t_{PLL}-t_{clock}|$, the allowable measurement range of the cycle offset (in seconds) is

$$\left|t_{PLL}-t_{clock}\right|\times(2^{\frac{N}{2}}-1)$$

where N is the width of the up/down counter in bits.

For example, if the  $|t_{PLL} - t_{clock}| = 10$  ps, then using a 16-bit up/down counter will give us the cycle offset range of ±2.55 ns from the ideal 50-50 duty cycle.
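The counter behavior and the range formula can be checked numerically. This sketch models the up/down counter as adding 1 for each high sample and subtracting 1 for each low sample within one alias period, and converts the resulting offset back to a duty cycle; it assumes an even counter width N so that $2^{N/2}$ is an integer. The 170/150 sample split is a hypothetical example.

```python
def up_down_offset(samples):
    """Up/down counter: +1 for each high sample, -1 for each low sample."""
    count = 0
    for s in samples:
        count += 1 if s else -1
    return count


def offset_range_ps(step_ps, counter_bits):
    """Measurable cycle-offset range |t_PLL - t_clock| * (2**(N/2) - 1), in ps."""
    return step_ps * (2 ** (counter_bits // 2) - 1)


def duty_cycle_from_offset(offset, samples_per_period):
    """Offset = n_high - n_low with n_high + n_low = samples_per_period."""
    return 0.5 + offset / (2 * samples_per_period)


offset = up_down_offset([1] * 170 + [0] * 150)
print(offset, duty_cycle_from_offset(offset, 320))  # 20 0.53125
print(offset_range_ps(10, 16))                      # 2550
```

`offset_range_ps(10, 16)` returns 2550, reproducing the ±2.55 ns range quoted above.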

Since the RS-232 UART interface between the PC and the FPGA board transfers at most 8 data bits per frame, two 8-bit UART transmissions (16 bit periods) are needed to serially transfer each 16-bit cycle offset value. The UART interface was implemented at 115200 bps. Hence, 20,000 measurements (320,000 data bits) take only about 2.8 seconds to transfer, not counting UART start and stop bits.
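The transfer time can be checked with a short calculation. The optional framing term is an assumption about the UART configuration: a typical 8N1 frame adds a start bit and a stop bit to every byte, which lengthens the transfer.

```python
def uart_transfer_time_s(n_measurements, bits_per_measurement=16,
                         baud=115200, framing_bits_per_byte=0):
    """Time to stream the results over the UART.

    framing_bits_per_byte = 0 counts data bits only; an 8N1 frame adds
    one start and one stop bit (2 extra bits per byte).
    """
    n_bytes = n_measurements * bits_per_measurement // 8
    return n_bytes * (8 + framing_bits_per_byte) / baud


print(f"{uart_transfer_time_s(20000):.2f} s (data bits only)")
print(f"{uart_transfer_time_s(20000, framing_bits_per_byte=2):.2f} s (8N1 framing)")
```

For 20,000 measurements, the data bits alone take about 2.78 s at 115200 bps; with 8N1 framing, about 3.47 s.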

The BIST design is fully synchronous except for the clock-domain-crossing logic, and no setup or hold time warnings or violations were reported for the design. The logic and memory usage of the BIST implementation is shown below:

| Blocks                          | Logic Cells | Memory Bits | PLLs |
|---------------------------------|-------------|-------------|------|
| Each slice (using 1 PLL)        | 63          | 32768       | 1    |
| UART controller                 | 54          | 0           | 0    |
| Overall design (including UART) | 1021        | 393216      | 12   |

Table 4.2: Logic and Memory Usage

The maximum operating frequency of the controller clock domain in the BIST logic is 155.45 MHz. Since the alias signal entering the controller clock domain is in the range of a few megahertz, the controller clock domain is not the performance bottleneck. Moreover, as the measurement resolution increases, the alias signal moves closer to DC (aliasing theorem). The critical path runs through the RAM block used; a faster RAM block could increase the overall operating speed.

The expected performance bottleneck is the cascaded sampling flip-flops in the sampling clock domain. The flip-flops in the current FPGA are rated for a maximum operating frequency of 400 MHz. For SerDes operating at 3.125 Gbps, a 300 MHz multiple-phase clock is used to achieve parallelism for high-speed operation; this clock speed is still within the maximum operating frequency of the sampling clock domain. Rigid placement and routing within an FPGA may slightly reduce the operating frequency of the proposed method; however, the implementation is still feasible for SerDes operating in the multi-Gbps range.

## 4.3.4 Duty Cycle Counter Controller

The duty cycle controller is a Moore finite state machine that resets and enables the up/down counters during measurements. It also enables the RAM to read in the value of the up/down counter at the end of every alias clock period. Consecutive alias clock periods are measured until the RAM is full. The RAM instantiated in this design holds 20,000 cycle offset measurements per test.



Figure 4.8: State Machine implementation of duty cycle controller

The state machine of the duty cycle controller is shown in Figure 4.8. It leaves the *INIT* state when a transition followed by a high pulse is detected. The *UpCount* state enables the counter to start and continue counting up. When a transition followed by a low pulse is detected, the machine moves from *UpCount* to *DownCount*. The *DownCount* state enables the counter to count down until the next transition and high pulse are detected. The *Store* state then stops the counter and enables the RAM to store the counter value. If the RAM is full, the next state is *RamRdy*, in which the UART interface is notified that the RAM is ready to be read out. If the RAM is not full, the next state is *SetCount*, which sets the counter to start counting from 2, since one clock cycle was used to store the result of the previous alias period in the RAM. The loop continues until the resetb signal is asserted or the RAM is full.
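A behavioural model of this controller, written as a plain loop over alias samples, can help when checking the counting logic against captured data. It is a sketch, not the thesis's RTL: the state names mirror Figure 4.8, but the *Store* and *SetCount* states are folded into the rising-edge transition, so the counter restarts at 1 for that high sample (the hardware instead reloads 2 to account for the cycle it spends storing). The function name `duty_cycle_controller` and its arguments are hypothetical.

```python
def duty_cycle_controller(bits, ram_depth):
    """Behavioural sketch of the duty cycle counter controller.

    bits: alias samples, one per controller clock cycle (1 = high, 0 = low).
    Returns stored cycle offsets in counts (high samples minus low samples per
    alias period). Not cycle-accurate: Store/SetCount are folded into the edge.
    """
    ram = []
    state = "INIT"
    count = 0
    prev = bits[0] if bits else 0
    for b in bits[1:]:
        rising = prev == 0 and b == 1   # edge detector
        falling = prev == 1 and b == 0
        prev = b
        if state == "INIT":
            if rising:                  # transition followed by a high pulse
                count = 1
                state = "UP_COUNT"
        elif state == "UP_COUNT":
            if falling:                 # high phase over: start counting down
                count -= 1
                state = "DOWN_COUNT"
            else:
                count += 1
        elif state == "DOWN_COUNT":
            if rising:                  # one full alias period measured
                ram.append(count)       # Store
                if len(ram) == ram_depth:
                    state = "RAM_RDY"   # UART may now read the RAM
                else:
                    count = 1           # SetCount (simplified), keep this sample
                    state = "UP_COUNT"
            else:
                count -= 1
        else:                           # RAM_RDY: idle until reset
            break
    return ram


# Hypothetical alias stream: 2400 high samples, 2600 low samples per alias period.
bits = [0] * 5 + ([1] * 2400 + [0] * 2600) * 4 + [1]
print(duty_cycle_controller(bits, ram_depth=10))  # [-200, -200, -200, -200]
```

With $|t_{PLL} - t_{clock}| = 10$ ps, a stored count of -200 would correspond to a cycle offset of -2 ns in this model.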

## 4.3.5 PERL Processing

Practical Extraction and Reporting Language (PERL) is a scripting language originally developed for text manipulation and data reduction; it is now used in a range of applications including system administration, web development, network programming and GUI development. It was chosen for processing the raw measurement results from the FPGA because it has powerful built-in support for text processing and a huge collection of third-party modules.

Instead of writing a lengthy C program for parsing and further data analysis, PERL achieves the same goal with less effort. The PERL *unpack* function is used to parse the raw data and convert it to user-friendly 16-bit integer values. Furthermore, a cross-platform PERL module (Spreadsheet::WriteExcel) is used to create a histogram plot of the raw data in an MS Excel spreadsheet. Simply by executing the PERL script on the data received from the FPGA, we can clearly see the distribution of the duty cycle of the clock signal.
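For readers not using PERL, the decoding step can be sketched in Python, where `struct.iter_unpack` plays the role of PERL's *unpack*. The byte order, the signed two's-complement encoding and the scaling by 10 ps per count are assumptions for illustration, `parse_offsets` is a hypothetical name, and the spreadsheet step is left out.

```python
import struct


def parse_offsets(raw, delta_ps=10.0):
    """Decode raw UART bytes into cycle offsets in picoseconds.

    Assumed wire format (not specified in the text): each 16-bit counter value
    arrives as two bytes, most significant byte first, in two's complement.
    """
    if len(raw) % 2:
        raise ValueError("odd byte count: a 16-bit value was cut off")
    # ">h" = big-endian signed 16-bit integer
    return [count * delta_ps for (count,) in struct.iter_unpack(">h", raw)]


print(parse_offsets(bytes([0x00, 0x01, 0xFF, 0x38])))  # [10.0, -2000.0]
```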

# 4.4 Results

A duty cycle jitter measurement BIST was implemented on the Altera Stratix FPGA. A PERL script is executed to generate an Excel histogram of the measurements from the FPGA. As the number of sampling PLLs used increases, so does the measurement accuracy: the histogram shape and the statistical mean and variance show the improvement over using a single sampling PLL. While the shape of a histogram instantly shows the distribution of the waveform parameters, the bin size used affects that shape. A bin size that is too large produces statistical results that deviate from the actual values; on the other hand, a bin size that is too small produces histograms with many peaks. A brief overview of histogram theory of operation is presented in 4.4.1. In 4.4.2, figures demonstrate how varying the bin size affects the shape of the histogram; the same measurements were made using the Serial Data Analyzer to produce histograms with different bin sizes.

In 4.4.3, the results of single-PLL sampling versus multiple-PLL sampling are presented as histograms, along with the statistical parameters mean, standard deviation and range. To show that the measurements obtained are repeatable, measurements of an input signal with duty cycle varied from 49% to 51% (in 0.1% steps) were taken and compared with the measurements made with the Serial Data Analyzer in 4.4.4. Finally, in 4.4.5, a clock signal with randomly varying duty cycle was used as the input signal. Again, the same measurements were made with the Serial Data Analyzer, and both results are presented.

The BIST implementation was synthesized and tested in the lab using a 20 MHz clock generator to simulate a clock, or data with an alternating bit sequence. For measurements of a signal with varying duty cycle, another Altera Stratix FPGA was used as the source, since the clock generator used does not allow the duty cycle of the generated signal to be varied randomly. Duty cycle measurements were compared with those of the LeCroy Serial Data Analyzer (SDA6000).


## 4.4.1 Histogram Theory of Operation

Statistical variation in waveform parameter measurements can be evaluated from the average, the range and the standard deviation. However, an effective tool for viewing how a parameter's values are distributed over many measurements is the histogram. In a histogram, the parameter's values are divided into sub-ranges called bins. The number of parameter values (events) that fall within each bin is accumulated and plotted to give the histogram [7].

Once the distribution of the parameter is known from the histogram, additional statistical calculations can be performed to characterize a histogram or to distinguish it from another. Since such calculations assume that all events in a bin are represented by a single value, they are affected by the bin size. The smaller the bin size, the smaller the potential deviation between the actual event values and the values assumed in the histogram calculations. However, a smaller bin size results in more bins, requiring a greater number of waveform parameter measurements to populate the bins sufficiently for a characteristic histogram distribution to be identified. Also, with a smaller bin size it is more difficult to determine the peaks of the histogram.
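The bin-size effect can be illustrated with a short sketch that, as described above, represents every event in a bin by a single value. Using the bin centre as that value is an assumption (an instrument may choose differently), the function names are hypothetical, and the sample offsets are made up purely for illustration.

```python
import math
from collections import Counter


def histogram(values, bin_size):
    """Map bin index -> event count; bin i covers [i*bin_size, (i+1)*bin_size)."""
    return Counter(math.floor(v / bin_size) for v in values)


def binned_mean(values, bin_size):
    """Mean as computed from the histogram: each event takes its bin-centre value."""
    hist = histogram(values, bin_size)
    total = sum(hist.values())
    return sum((i + 0.5) * bin_size * n for i, n in hist.items()) / total


offsets_ps = [12, 14, 17, 33]          # illustrative cycle offsets, true mean 19 ps
for bin_size in (200, 10, 1):
    print(bin_size, binned_mean(offsets_ps, bin_size))
# 200 -> 100.0, 10 -> 20.0, 1 -> 19.5: smaller bins track the true mean more closely
```

The same code also shows the cost of small bins: with a 1 ps bin size every event lands in its own bin, so no peak stands out unless many more measurements are collected.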

## 4.4.2 Bin Size

The duty cycle offset measurement is carried out using $|t_{PLL} - t_{clock}| = 10$ ps. Histograms produced with different bin sizes are shown in Figures 4.9, 4.10 and 4.11. All measurements were taken using the maximum number of PLLs (4 PLLs with a total of 12 clock outputs). The same measurements taken with the Serial Data Analyzer SDA6000 using varying numbers of bins are shown in Figures 4.12, 4.13 and 4.14. In Figure 4.14, 2000 bins are used; the histogram has many peaks, which makes the mean difficult to determine. Conversely, as the number of bins is reduced, the values represented by the bins deviate more from the actual results. The histogram plots in the following sections use a bin size of 10 ps.



Figure 4.9: Histogram of Cycle Offset Measurement with Bin Size of 200ps



Figure 4.10: Histogram of Cycle Offset Measurement with bin size of 20ps



Figure 4.11: Histogram of Cycle Offset Measurement with bin size of 10ps



Figure 4.12: Histogram of Duty Cycle Measurement from SDA6000 using 20 bins


Figure 4.13: Histogram of Duty Cycle Measurement from SDA6000 using 100 bins



Figure 4.14: Histogram of Duty Cycle Measurement from SDA6000 using 2000 bins

## 4.4.3 Single vs. Multiple PLL Sampling

Figures 4.15, 4.16, 4.17 and 4.18 show the cycle offset measurements using 3, 6, 9 and 12 sampling clocks, respectively. The histogram concentrates around the mean as the number of sampling clocks used in the measurement increases: by the Law of Large Numbers, as the number of sampling clocks increases, the average of the measurements from the sampling clocks converges to the mean. This gives a histogram with a smaller standard deviation. The standard deviation and range of the histograms in Figures 4.15 to 4.18 are listed in Table 4.3, together with the same measurements obtained using the LeCroy SDA. The mean measured by the BIST differs from that of the LeCroy by about 30 ps. Section 4.4.4 shows that the measurements are repeatable and that this difference is consistent; to compensate for this limitation, the difference must be subtracted from the measurement results.
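The Law of Large Numbers argument can be illustrated with an idealised simulation. It assumes that each reported value is the average of n independent readings with zero-mean Gaussian error; the thesis does not spell out how the slices' results are combined, and `sigma_ps` is an arbitrary illustrative value, so this is a sketch of the statistical principle, not of the BIST hardware. The function name `spread_of_averages` is hypothetical.

```python
import math
import random
import statistics


def spread_of_averages(n_clocks, trials=20_000, sigma_ps=100.0, seed=1):
    """Standard deviation of the average of n_clocks independent readings.

    Each reading is modelled as the true offset (0 ps) plus zero-mean Gaussian
    noise with standard deviation sigma_ps (an idealised assumption).
    """
    rng = random.Random(seed)
    averages = [sum(rng.gauss(0.0, sigma_ps) for _ in range(n_clocks)) / n_clocks
                for _ in range(trials)]
    return statistics.pstdev(averages)


for n in (3, 6, 9, 12):
    # simulated spread next to the ideal sigma / sqrt(n)
    print(n, round(spread_of_averages(n), 1), round(100.0 / math.sqrt(n), 1))
```

In this model the spread falls as $\sigma/\sqrt{n}$, halving from 3 to 12 clocks. The BIST standard deviations in Table 4.3 fall much more slowly (3.19 to 2.68), so the readings from the different sampling clocks appear not to behave like fully independent samples, and the model should be read only as an idealisation.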



Figure 4.15: Histogram of Cycle Offset Measurement using 1 PLL with 3 outputs



Figure 4.16: Histogram of Cycle Offset Measurement using 2 PLLs with 6 outputs



Figure 4.17: Histogram of Cycle Offset Measurement using 3 PLLs with 9 outputs



Figure 4.18: Histogram of Cycle Offset Measurement using 4 PLLs with 12 outputs

| Measurement Method           | Histogram Mean (ps) | Histogram Standard Deviation | Histogram Range (ps) |
|------------------------------|---------------------|------------------------------|----------------------|
| Using LeCroy SDA6000         | 10.0 (= 50.02%)     | 355m%                        | 1125 (= 2.25%)       |
| Using 12 sampling clock-outs | 41.9                | 2.68                         | 1400                 |
| Using 9 sampling clock-outs  | 44.6                | 2.89                         | 1690                 |
| Using 6 sampling clock-outs  | 47.4                | 3.04                         | 1470                 |
| Using 3 sampling clock-outs  | 52.6                | 3.19                         | 1650                 |

Table 4.3: Histogram parameters using different measurement methods

Since the BIST actually measures the cycle offset, i.e. the difference between the lengths of the high and low pulses, the mean duty cycle of 50.02% measured by the SDA (Table 4.3) corresponds to a cycle offset of 10 ps for a signal with a period of 50 ns. Likewise, the SDA range of 2.25% corresponds to 1125 ps for a signal with a period of 50 ns.

## 4.4.4 Repeatability

The means in Table 4.3 show a difference between the mean from the sampling PLLs and that of the LeCroy SDA. In this section, the discrepancy is shown to be consistent and repeatable across a range of measurements.

To show the repeatability of the measurements, the duty cycle of the input clock signal is varied from 49% to 51% in 0.1% (or 50 ps) increments. The histogram mean is measured at each duty cycle increment, and its difference from the LeCroy SDA measurement is plotted in Figure 4.19. Figure 4.19 shows that the measured histogram mean for each duty cycle input differs from the SDA value by an average of 30.4 ps.
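The compensation mentioned in 4.4.3 amounts to estimating the average discrepancy and subtracting it. A minimal sketch with hypothetical function names follows; applying the 30.4 ps average from this section to the 12-clock mean in Table 4.3 is only an illustration, since that average was measured over the 49% to 51% sweep rather than for that specific run.

```python
def mean_discrepancy(bist_means_ps, reference_means_ps):
    """Average of (BIST mean - reference mean) over paired measurements."""
    if len(bist_means_ps) != len(reference_means_ps) or not bist_means_ps:
        raise ValueError("need equally long, non-empty lists of paired means")
    diffs = [b - r for b, r in zip(bist_means_ps, reference_means_ps)]
    return sum(diffs) / len(diffs)


def calibrate(bist_mean_ps, discrepancy_ps=30.4):
    """Remove the systematic BIST-vs-SDA offset (30.4 ps average in Figure 4.19)."""
    return bist_mean_ps - discrepancy_ps


# 12-clock BIST mean from Table 4.3, corrected; the SDA reported 10.0 ps.
print(round(calibrate(41.9), 1))  # 11.5
```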



Figure 4.19: Histogram Mean offset for range of duty cycle inputs

## 4.4.5 Varying Duty Cycle

The previous measurements all used an input signal with a fixed duty cycle. Here, a test clock signal generated by another FPGA, with its duty cycle switching randomly between 50% and 43%, was used. When the duty cycle is 50%, both phases of the clock cycle are 28 ns; at 43%, the high phase is 24 ns and the low phase is 32 ns. Using $|t_{PLL} - t_{clock}| = 10$ ps, the histogram of the varying duty cycle measurement is shown in Figure 4.20. It can be compared with the histogram from the LeCroy SDA in Figure 4.21, which uses 2000 bins.
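The two clock shapes can be summarised numerically, which indicates where the two clusters of the histogram are expected to fall. This sketch uses only the phase lengths given above; `phase_summary` is a hypothetical helper, and it takes the cycle offset to be the high-phase length minus the low-phase length, as defined for the BIST measurements.

```python
def phase_summary(high_ns, low_ns):
    """Return (duty cycle in %, cycle offset = high - low in ns) for one clock shape."""
    period_ns = high_ns + low_ns
    return 100.0 * high_ns / period_ns, high_ns - low_ns


for high, low in ((28, 28), (24, 32)):
    duty, offset = phase_summary(high, low)
    print(f"high {high} ns, low {low} ns: duty {duty:.2f}%, offset {offset} ns")
```

The 24/32 ns shape is a 42.86% duty cycle (rounded to 43% in the text), so in cycle-offset terms the two clusters are expected about 8 ns apart.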

Since the input signal switches randomly between two duty cycles, the measured duty cycle range is expected to be larger than that of an input with a single duty cycle. Using the LeCroy SDA's default of 100 bins, as shown in Figure 4.22, loses some information, because the histogram of each individual duty cycle cannot be clearly represented. Figures 4.20 and 4.21 clearly show the advantage of using more bins, or equivalently a smaller bin size, when taking measurements over a wide range.



Figure 4.20: Histogram of cycle offset measurement of a signal with randomly varying duty cycles



Figure 4.21: Histogram of duty cycle measurement of a signal with randomly varying duty cycles using LeCroy SDA6000 with 2000 bins



Figure 4.22: Histogram of duty cycle measurement of a signal with randomly varying duty cycles using LeCroy SDA6000 with 100 bins

# **Chapter 5**

# Conclusion

To meet the increasing bandwidth requirements of new information technologies and networks, we proposed a novel multi-level phase detector for high-speed Serializers/Deserializers (SerDes) and designed a Built-In Self-Test (BIST) component capable of improving existing high-speed SerDes testing. This thesis therefore explored both the design and the testing aspects of high-speed SerDes.

A novel all-digital multi-level phase detector was designed and simulated in 0.35 µm technology. The all-digital implementation allows easy portability between technologies, has lower power consumption, and is less vulnerable to temperature and process variation. The multi-level phase detector also comprises high-speed flash ADCs that sample the high-speed multi-level analog input signals and convert them into digital values for further processing. Since multi-level signal inputs are used, the transmission frequency is half the data rate. Multi-level signal transmission benefits the signal-to-noise ratio in high-speed serial links, because the lower transmission frequency circumvents several high-speed signal integrity issues.

For SerDes component testing, a BIST component was designed and implemented in an FPGA to measure duty cycle jitter. The BIST uses multiple phase-locked loop (PLL) sampling to provide more accurate measurement results. The LeCroy Serial Data Analyzer (SDA6000) was used as the reference for the test measurements. During testing, the results obtained from multiple sampling of the input test signal showed a smaller standard deviation and a histogram shape closer to the reference than those of the single sampling method. The BIST, based on undersampling and the Law of Large Numbers, also showed repeatable and accurate results for a test signal with multiple duty cycles. The FPGA BIST implementation could be further developed in ASICs to achieve higher operating speed.


### **References**

- [1] Singhal and R. Jain, "Terabit switching: A survey of techniques and current products," *Comp. Commun.*, vol. 25, no. 6, pp. 547–556, 2002.
- [2] Rick Merritt, "Designers chart progress in the gigabit era", *EETimes*, 4 November 2005
- [3] Zerbe, J.L.; Werner, C.W.; Stojanovic, V.; Chen, F.; Wei, J.; Tsang, G.; Kim, D.; Stonecypher, W.F.; Ho, A.; Thrush, T.P.; Kollipara, R.T.; Horowitz, M.A.; Donnelly, K.S."Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell", *Solid-State Circuits, IEEE Journal of*, Volume 38, Issue 12, Dec 2003 Page(s):2121 – 2130
- [4] Tzong-Lin Wu; Chien-Chung Wang; Yen-Hui Lin; Ting-Kuang Wang; Chang, G. "A novel power plane with super-wideband elimination of ground bounce noise on high speed circuits", *Microwave and Wireless Components Letters, IEEE*, Volume 15, Issue 3, March 2005 Page(s):174 – 176
- [5] Jingook Kim; Heeseok Lee; Joungho Kim "Effects on signal integrity and radiated emission by split reference plane on high-speed multilayer printed circuit boards", *Advanced Packaging, IEEE Transactions on*, Volume 28, Issue 4, Nov. 2005 Page(s):724 – 735
- [6] K. Oshiro and G. Uehara, "A 10 Gbps 83 mW GaAs HBT equalizer/detector for coaxial cable channels," *IEEE Custom Integrated Circuits Conf.*, 1998, pp. 15.3.1– 15.3.4
- [7] J. Sonntag et al., "An adaptive PAM-4 5Gb/s backplane transceiver in 0.25 μm CMOS," *IEEE Custom Int. Circuits Conf.*, May 2002, pp. 363–366.
- [8] R. Farjad-Rad et al., "A 0.4-µm CMOS 10-Gb/s 4-PAM pre-emphasis serial link transmitter," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 580–585, May 1999.

- [9] Musa, F.A.; Carusone, A.C. "Clock recovery in high-speed multilevel serial links" *Circuits and Systems, 2003. ISCAS '03.* Proceedings of the 2003 International Symposium on Volume 5, 25-28 May 2003 Page(s):V-449 - V-452 vol.5
- [10] Farjad-Rad, R.; Chih-Kong Ken Yang; Horowitz, M.; Lee, T. "A 0.3-μm CMOS 8-Gb/s 4-PAM serial link transceiver" VLSI Circuits, 1999. Digest of Technical Papers. 1999 Symposium on 17-19 June 1999 Page(s):41 – 44
- [11] Kahn Li Lim; Zilic, Z. "A novel phase detector for PAM-4 clock recovery in high speed serial links", SOC Conference, 2004. Proceedings. IEEE International 12-15 Sept. 2004 Page(s):151 – 152
- [12] Johnson, J.C. "Options for high-volume test of multi-Gb/s ports" Test Conference, 2004. Proceedings. International 2004 Page(s):1435
- [13] Cai, Y.; Werner, S.A.; Zhang, G.J.; Olsen, M.J.; Brink, R.D. "Jitter testing for multi-Gigabit backplane SerDes - techniques to decompose and combine various types of jitter" *Test Conference*, 2002. Proceedings. International 7-10 Oct. 2002 Page(s):700 - 709
- [14] Cai, Y.; Warwick, T.P.; Rane, S.G.; Masserrat, E. "Digital serial communication device testing and its implications on automatic test equipment architecture" *Test Conference, 2000. Proceedings. International* 3-5 Oct. 2000 Page(s):600 – 609
- [15] The International Technology Roadmap for Semiconductors (ITRS) 2003 Edition Test and Test Equipment
- [16] Watanabe, D.; Suda, M.; Okayasu, T. "34.1 Gbps low jitter, low BER high-speed parallel CMOS interface for interconnections in high-speed memory test system" *Test Conference, 2004. Proceedings. International* 2004 Page(s):1255 – 1262
- [17] Sunter, S.; Roy, A. "BIST for phase-locked loops in digital applications" *Test* Conference, 1999. Proceedings. International 28-30 Sept. 1999 Page(s):532 – 540
- [18] Chan, A.H.; Roberts, G.W. "A jitter characterization system using a component-invariant Vernier delay line" Very Large Scale Integration (VLSI) Systems, IEEE Transactions on Volume 12, Issue 1, Jan. 2004 Page(s):79 - 95
- [19] Casper, B.; Martin, A.; Jaussi, J.E.; Kennedy, J.; Mooney, R. "An 8-Gb/s simultaneous bidirectional link with on-die waveform capture" *Solid-State Circuits, IEEE Journal of*, Volume 38, Issue 12, Dec 2003 Page(s):2111 – 2120

- [20] Sunter, S.; Roy, A.; Cote, J.-F. "An automated, complete, structural test solution for SERDES" *Test Conference, 2004. Proceedings. International* 26-28 Oct. 2004 Page(s):95 – 104
- [21] Dalal, W.; Rosenthal, D. "Measuring jitter of high speed data channels using undersampling techniques" *Test Conference*, 1998. Proceedings. International 18-23 Oct. 1998 Page(s):814 – 818
- [22] Hetherington, G.; Simpson, R.; "Circular bist testing the digital logic within a high speed serdes" *Test Conference*, 2003. Proceedings. ITC 2003. International Volume 1, Sept. 30-Oct. 2, 2003 Page(s):1221 122
- [23] Sungjoon Kim; Kyeongho Lee; Deog-Kyoon Jeong; Lee, D.D.; Nowatzyk, A.G. "An 800 Mbps multi-channel CMOS serial link with 3× oversampling" *Custom Integrated Circuits Conference*, 1995, Proceedings of the IEEE 1995 1-4 May 1995 Page(s):451 – 455
- [24] Chih-Kong Ken Yang; Ramin Farjad-Rad; Horowitz, M.A. "A 0.5-μm CMOS 4.0-Gbit/s serial link transceiver with data recovery using oversampling", *Solid-State Circuits, IEEE Journal of*, Volume 33, Issue 5, May 1998 Page(s):713 – 722
- [25] J. D. H. Alexander, "Clock recovery from random binary data," *Electronics Letters*, vol. 11, pp. 541–542, Oct. 1975.
- [26] R. Best, Phase-Locked Loops, 3rd ed, McGraw Hill, 1997
- [27] Lee, T.H.; Bulzacchelli, J.F. "A 155-MHz clock recovery delay- and phase-locked loop" Solid-State Circuits, IEEE Journal of Volume 27, Issue 12, Dec. 1992 Page(s):1736-1746
- [28] Nakamura, K.; Fukaishi, M.; Abiko, H.; Matsumoto, A.; Yotsuyanagi, M. "A 6 Gbps CMOS phase detecting DEMUX module using half-frequency clock", VLSI Circuits, 1998. Digest of Technical Papers. 1998 Symposium on 11-13 June 1998 Page(s):196 – 197
- [29] Rezayee, A.; Martin, K. "A 9-16Gb/s clock and data recovery circuit with three-state phase detector and dual-path loop architecture" *Solid-State Circuits Conference*, 2003. ESSCIRC '03. Proceedings of the 29th European 16-18 Sept. 2003 Page(s):683 – 686

- [30] Behzad Razavi "Phase-Locking in High-Performance Systems: From Devices to Architectures" *Wiley-IEEE Press*, February 2003; pp. 34-45
- [31] Hu, T.H.; Gray, P.R. "A monolithic 480 Mb/s parallel AGC/decision/clock-recovery circuit in 1.2 μm CMOS" Solid-State Circuits Conference, 1993. Digest of Technical Papers. 40th ISSCC., 1993 IEEE International 24-26 Feb. 1993 Page(s):98 - 99, 269
- [32] John Patrin, Mike Li "Comparison and Correlation of Signal Integrity Measurement Techniques." DesignCon 2002.
- [33] Thomas H. Lee "The Design of CMOS Radio-Frequency Integrated Circuits", Cambridge University Press, 1998.
- [34] C.D. Motchenbacher and F.C. Fitchen, "Low Noise Electronic Design", Wiley, New York, 1973, p.172
- [35] Nelson Ou, Touraj Farahmand, Andy Kuo, Sassan Tabatabaei, Andre Ivanov. "Jitter Models for the Design and Test of Gbps-Speed Serial Interconnects", *IEEE Design* and Test of Computers, vol. 21, no.4, pp. 302-313, July/August, 2004
- [36] H.W. Johnson and M. Graham, "High-speed Signal Propagation : Advanced Black Magic", Prentice Hall, 2003.
- [37] Robertson, Iain et al. "Testing High-Speed, Large Scale Implementation of SerDes I/Os on Chips Used in Throughput Computing Systems" *Test Conference, 2005 Proceedings. International*, Paper 38.1
- [38] Cole, C.B.; Warwick, T.P. "High speed digital transceivers: A challenge for manufacturing" *Test Conference*, 1999. Proceedings. International 28-30 Sept. 1999 Page(s):211 – 215
- [39] Tripp, M.; Mak, T.M.; Meixner, A. "Elimination of traditional functional testing of interface timings at Intel" *Test Conference*, 2004. Proceedings. International 2004 Page(s):1448-1454
- [40] C. Stroud, "Automated BIST for sequential logic synthesis", IEEE Design & Test of Computers, pp. 22-32, December 1988.
- [41] N. Touba, "Obtaining High Fault Coverage with Circular BIST Via State Skipping", VLSI Test Symposium, Proceedings, pp. 410–415, 1997.

- [42] Hafed, M.; Abaskharoun, N.; Roberts, G.W. "A stand-alone integrated test core for time and frequency domain measurements" *Test Conference*, 2001. Proceedings International 30 Oct.-1 Nov. 2001 Page(s):1190 – 1199
- [43] Sunter, S; Roy, A. "Structural Tests for Jitter Tolerance in SerDes Receivers", Test Conference, 2005 Proceedings. International, Paper 9.1
- [44] Dongwoo Hong; Dryden, C.; Saksena, G.; Panis, M. "An efficient random jitter measurement technique using fast comparator sampling", *VLSI Test Symposium*, 2005 Proceedings 23rd IEEE 1-5 May 2005 Page(s):123 – 130
- [45] Zerbe, J.L.; Werner, C.W.; Stojanovic, V.; Chen, F.; Wei, J.; Tsang, G.; Kim, D.; Stonecypher, W.F.; Ho, A.; Thrush, T.P.; Kollipara, R.T.; Horowitz, M.A.; Donnelly, K.S. "Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell" *Solid-State Circuits, IEEE* Journal of Volume 38, Issue 12, Dec 2003 Page(s):2121 - 213
- [46] T. Schott, G. Patel "Overcoming Design Challenges to Reach One-Terabit Data/Rate for a Star Configuration Backplane Using FR-4", *DesignCon*, 2002
- [47] Bong-Joon Lee; Moon Sang Hwang; Sang-Hyun Lee; Deog-Kyoon Jeong; "A 2.5-10-Gb/s CMOS transceiver with alternating edge sampling phase detection for loop characteristic stabilization" Solid- State Circuits, IEEE Journal of, Volume: 38, Issue: 11, Nov. 2003
- [48] David A. Johns, Ken Martin "Analog Integrated Circuit Design" John Wiley & Sons, Inc., 1997.
- [49] Yang, C.-K.K.; Stojanovic, V.; Modjtahedi, S.; Horowitz, M.A.; Ellersick, W.F. "A serial-link transceiver based on 8-GSamples/s A/D and D/A converters in 0.25-μm CMOS" Solid-State Circuits, IEEE Journal of Volume 36, Issue 11, Nov. 2001 Page(s):1684 – 1692
- [50] Uyttenhove, K.; Steyaert, M.S.J. "A 1.8-V 6-bit 1.3-GHz flash ADC in 0.25-µm CMOS" Solid-State Circuits, IEEE Journal of, Volume: 38, Issue: 7, July 2003
- [51] Mark Burns, Gordon W. Roberts "An Introduction to Mixed-Signal IC Test and Measurement", Oxford University Press 2001 pp.159

- [52] Leslie Green, "The aliasing theorems: practical undersampling for expert engineers", www.ednmag.com, June 21, 2001, pp. 97-105
- [53] C.M. Grinstead, J.L. Snell. "Introduction to Probability", American Mathematical Society, 1997, pp305-31
- [54] http://altera.com/products/devices/stratix/features/stx-pll.html
- [55] R. Bhatti, M. Denneau, J. Draper, "Duty Cycle Measurement and Correction Using a Random Sampling Technique", Proceedings of the 48th IEEE International Midwest Symposium on Circuits and Systems, August 2005
- [56] Wavemaster Series operator's Manual