Circuits for On-Chip Sub-Nanosecond Signal Capture and Characterization

by

Nazmy Abaskharoun, B. Eng. 1998

Department of Electrical Engineering
McGill University, Montréal

March 2001

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Engineering

© Nazmy Abaskharoun, 2001
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

ISBN 0-612-70216-2

Abstract

On-chip signal extraction and characterization structures are slowly becoming a necessary component of any complex integrated circuit design. The increased integration of the modern System On a Chip has necessitated the development of these alternate test strategies to address the issue of device node access and signal integrity. The process of extracting a signal in an analog form across a chip boundary can often compromise its true nature, as these new systems stretch performance limits.

The aim of this thesis is to introduce two circuits for on-chip sub-nanosecond signal capture. The emphasis is placed on providing gigahertz rate effective sampling resolutions to provide a progressive characterization solution for the ever increasing operating speed of integrated circuits.

The first circuit presented is a hardware implementation of an undersampling algorithm that extends the operation of a pre-existing mixed-signal test-core to the capture of periodic signals with a bandwidth much greater than the sample rate of the system. This hardware unit comprises a specialized timing module based on a Delay Locked Loop with tap selection circuitry. The effective sampling resolution of the system is limited by the intrinsic gate delay of the technology in which the timing module is implemented.

The second circuit presented is a specialized jitter measurement device. This device is based on a Vernier Delay Line Time-to-Digital converter, and can provide resolutions well below a gate delay. Special emphasis was given to jitter measurement, since it is an issue that is often difficult to address adequately in the testing of many complex circuits. Both the aforementioned circuits were implemented in a 0.35 μm CMOS process, and results demonstrating their successful operation are presented.
Résumé

Integrated structures for signal extraction and characterization are gradually becoming an essential component in the design of complex integrated circuits. The growing level of integration of systems on a chip (SoC) has made it necessary to develop these new test strategies, which provide access to internal signals while preserving their integrity. Indeed, capturing a signal in analog form through a peripheral contact can alter the nature of that signal, thereby compromising an accurate measurement of the performance of such systems.

The aim of this thesis is to present two circuits for the in situ capture of signals with a resolution better than one nanosecond. Particular attention is given to delivering an effective sampling frequency on the order of a gigahertz, in order to provide a characterization solution that can keep pace with the continual increase in the operating speed of integrated circuits.

The first circuit presented is a hardware implementation of an undersampling algorithm that extends the operation of an existing analog/digital test core to the capture of signals with a bandwidth much greater than the sampling rate of the system. This circuit comprises a specialized timing module based on a delay-locked loop (DLL) with tap selection circuitry. The effective sampling resolution is limited by the intrinsic delay of an elementary gate, which is characteristic of the technology in which the module is implemented.

The second circuit presented is a device dedicated to the measurement of phase instability (jitter). This device is based on a Vernier delay line converter and can provide resolutions well below the delay of an elementary gate. Particular consideration was given to jitter measurement, a problem that is often difficult to address in many complex circuits. Both circuits were successfully implemented in a 0.35 μm CMOS technology, and conclusive results demonstrating their operation are presented.
Acknowledgments

The work in this thesis would not have been possible without the support and encouragement of many individuals. To begin with, I would like to express my gratitude to my supervisor, Professor Gordon W. Roberts, for instilling in me an interest in the field, for his instruction, and for his technical and professional guidance.

I would also like to acknowledge the technical contribution of Mohamed Hafed to this body of work through our numerous discussions, his suggestions, and his development of the initial test-core prototype which motivated the work in this thesis. I also extend my thanks to Mourad Oulmane for providing me with the French translation of the Abstract.

I am also grateful to my family for their unconditional and unyielding support and encouragement throughout the years.

My brief tenure at the MACS lab would not have been as memorable had it not been for the friendship and camaraderie of my fellow students: Ahmed, Arshan, Boris, Christian, Ian, Lige, Mohamed, Mona, Mourad, Naveen, Ramez, Sebastien, Yang, and Ye. I extend my gratitude to them for the many memories, and wish them the best for the future. To my friends Veronika, Tal, Miguel, Angela, Elizabeth, and Matthew: for your support and friendship I am forever grateful, for without it I would not be where I am today.

This work was supported by the Natural Sciences and Engineering Research Council of Canada, the Canadian Microelectronics Corporation, and Micronet, a Canadian network of centres of excellence dealing with microelectronic devices, circuits, and systems.
# Table of Contents

## Chapter 1 - Introduction
1.1 - Motivation  
1.2 - Thesis Outline

## Chapter 2 - Background
2.1 - On-Chip Waveform Capture  
2.1.1 - An Integrated Mixed-Signal Test Core  
2.1.2 - Signal Generation  
2.1.3 - Multipass Signal Capture  
2.1.4 - Test-Core Limitations  
2.2 - Jitter Measurement  
2.2.1 - Challenges in Jitter Measurement  
2.2.2 - Jitter Measurement Techniques  
2.2.3 - Proposed On-Chip Jitter Measurement Solution  
2.3 - Summary

## Chapter 3 - Signal Capture
3.1 - On-Chip Waveform Capture  
3.1.1 - Undersampling Algorithms  
3.2 - On-Chip Digitizer  
3.2.1 - System Level Architecture  
3.2.2 - DC Reference Generator  
3.2.3 - Comparator  
3.2.4 - T/H and Buffer  
3.2.5 - Modifications to the Waveform Digitizer  
3.3 - Timing Module Design  
3.3.1 - DLL Architecture  
3.3.2 - Self-Biased DLLs  
3.3.3 - The Delay Cell  
3.3.4 - The Phase Detector  
3.3.5 - The Charge Pump & Loop Filter  
3.3.6 - Phase Selection Circuitry  
3.3.7 - Additional Circuitry

## Chapter 4 - Jitter Measurement
4.1 - Jitter Measurement  
4.1.1 - Time-to-Digital Converters (TDCs)  
4.1.2 - Vernier Delay Line (VDL) Samplers  
4.1.3 - Tuning Methods  
4.1.4 - A Jitter Measurement Device Based on a VDL Sampler  
4.2 - Implementation  
4.2.1 - The Delay Cell  
4.2.2 - Sizing and Layout Considerations  
4.2.3 - Edge Counter Implementation  
4.3 - Experimental Results  
4.3.1 - Test Setup  
4.3.2 - Delay Cell Characteristics  
4.3.3 - Transfer Characteristic and Linearity Measurements  
4.3.4 - Creating Signals with Jitter  
4.3.5 - Jitter Measurements  
4.4 - Performance Limitations  
4.5 - Summary

## Chapter 5 - Conclusions

## References
## List of Figures

### Chapter 1 - Introduction

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.1</td>
<td>Typical SOC Device</td>
<td>1</td>
</tr>
<tr>
<td>1.2</td>
<td>Basic Components of a Mixed-Signal ATE</td>
<td>3</td>
</tr>
</tbody>
</table>

### Chapter 2 - Background

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>2.1</td>
<td>Generic Configuration of an On-Chip Sampler</td>
<td>7</td>
</tr>
<tr>
<td>2.2</td>
<td>System Level Architecture of the Mixed-Signal Test Core</td>
<td>7</td>
</tr>
<tr>
<td>2.3</td>
<td>Spectral Properties of DS bitstreams</td>
<td>9</td>
</tr>
<tr>
<td>2.4</td>
<td>Test-Core AWG component partitioning</td>
<td>9</td>
</tr>
<tr>
<td>2.5</td>
<td>Graphical Representation of Multi-Pass Signal Capture</td>
<td>11</td>
</tr>
<tr>
<td>2.6</td>
<td>System Level Schematic of Waveform Digitizer</td>
<td>11</td>
</tr>
</tbody>
</table>

### Chapter 3 - Signal Capture

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>3.1</td>
<td>Graphical representation of undersampling algorithm</td>
<td>20</td>
</tr>
<tr>
<td>3.2</td>
<td>System-Level Architecture of On-Chip Digitizer</td>
<td>26</td>
</tr>
<tr>
<td>3.3</td>
<td>DC Reference Generator</td>
<td>27</td>
</tr>
<tr>
<td>3.4</td>
<td>Comparator used in waveform Digitizer</td>
<td>27</td>
</tr>
<tr>
<td>3.5</td>
<td>Track and Hold and Buffer</td>
<td>29</td>
</tr>
<tr>
<td>3.6</td>
<td>Modified Sampling Stage</td>
<td>30</td>
</tr>
<tr>
<td>3.7</td>
<td>DLL Architecture</td>
<td>33</td>
</tr>
<tr>
<td>3.8</td>
<td>Modified Timing Module Architecture with no Output Phase Latency</td>
<td>35</td>
</tr>
<tr>
<td>3.9</td>
<td>Self-Biased DLL Architecture</td>
<td>37</td>
</tr>
<tr>
<td>3.10</td>
<td>Self-Biased Delay Cell</td>
<td>38</td>
</tr>
<tr>
<td>3.11</td>
<td>PMOS Symmetric Load Characteristics</td>
<td>39</td>
</tr>
<tr>
<td>3.12</td>
<td>Simplified schematic of DLL Bias Cell</td>
<td>41</td>
</tr>
<tr>
<td>3.13</td>
<td>DLL Bias Cell</td>
<td>42</td>
</tr>
<tr>
<td>3.14</td>
<td>Simulated DLL Bias Cell Characteristics</td>
<td>43</td>
</tr>
<tr>
<td>3.15</td>
<td>Phase Detector</td>
<td>44</td>
</tr>
<tr>
<td>3.16</td>
<td>Simulated Phase Detector Operation for Un-Locked Mode</td>
<td>45</td>
</tr>
<tr>
<td>3.17</td>
<td>Simulated Phase Detector Operation for Locked Mode</td>
<td>45</td>
</tr>
<tr>
<td>3.18</td>
<td>Operation of Charge Pump for the UP and DN signals</td>
<td>47</td>
</tr>
<tr>
<td>3.19</td>
<td>DLL Charge Pump</td>
<td>48</td>
</tr>
<tr>
<td>3.20</td>
<td>Simulated Behaviour of Charge Pump</td>
<td>48</td>
</tr>
<tr>
<td>3.21</td>
<td>Block level diagram of phase selection block</td>
<td>52</td>
</tr>
<tr>
<td>3.22</td>
<td>2-to-1 Multiplexer used in phase selection circuitry</td>
<td>53</td>
</tr>
<tr>
<td>3.23</td>
<td>Supplementary Circuits</td>
<td>53</td>
</tr>
<tr>
<td>3.24</td>
<td>Teradyne A567 Tester</td>
<td>55</td>
</tr>
<tr>
<td>3.25</td>
<td>PCB Mounted onto Test Head</td>
<td>56</td>
</tr>
<tr>
<td>3.26</td>
<td>PCB Used to test Digitizer and DLL</td>
<td>56</td>
</tr>
<tr>
<td>3.27</td>
<td>Micrograph of On-Chip Digitizer</td>
<td>57</td>
</tr>
<tr>
<td>3.28</td>
<td>Digitizer Gain Characteristics</td>
<td>58</td>
</tr>
<tr>
<td>3.29</td>
<td>Captured Sine Wave</td>
<td>59</td>
</tr>
<tr>
<td>3.30</td>
<td>Micrograph of DLL</td>
<td>59</td>
</tr>
<tr>
<td>3.31</td>
<td>VCDL Delay versus Tap for various Control Voltages</td>
<td>60</td>
</tr>
<tr>
<td>3.32</td>
<td>DLL LSB as a function of control voltage</td>
<td>62</td>
</tr>
<tr>
<td>3.33</td>
<td>DLL's Control Voltage versus Operating Frequency</td>
<td>63</td>
</tr>
<tr>
<td>3.34</td>
<td>Forced and Locked LSBs</td>
<td>63</td>
</tr>
<tr>
<td>3.35</td>
<td>DLL control voltage lock waveform for various operating frequencies</td>
<td>64</td>
</tr>
<tr>
<td>3.36</td>
<td>Timing Module with Calibration Circuitry</td>
<td>65</td>
</tr>
<tr>
<td>3.37</td>
<td>DLL DNL and INL before and after calibration for LSB = 625 ps</td>
<td>66</td>
</tr>
<tr>
<td>3.38</td>
<td>Clock Waveform Capture</td>
<td>67</td>
</tr>
<tr>
<td>3.39</td>
<td>Data Stream Eye-Pattern</td>
<td>67</td>
</tr>
<tr>
<td>3.40</td>
<td>DLL Jitter Characteristics</td>
<td>69</td>
</tr>
</tbody>
</table>

### Chapter 4 - Jitter Measurement

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>4.1</td>
<td>A Vernier Delay Line Sampler</td>
<td>73</td>
</tr>
<tr>
<td>4.2</td>
<td>Operation of a VDL Sampler</td>
<td>75</td>
</tr>
<tr>
<td>4.3</td>
<td>The Effect of Control Voltage Noise on Buffer Delay</td>
<td>78</td>
</tr>
<tr>
<td>4.4</td>
<td>Tuning Mechanism for a VDL sampler</td>
<td>79</td>
</tr>
<tr>
<td>4.5</td>
<td>Jitter Quantification with the VDL Sampler</td>
<td>82</td>
</tr>
<tr>
<td>4.6</td>
<td>A Jitter Measurement Device based on a VDL Sampler</td>
<td>83</td>
</tr>
<tr>
<td>4.7</td>
<td>Optional Supporting Circuitry for Jitter Measurement Device</td>
<td>85</td>
</tr>
<tr>
<td>4.8</td>
<td>On-Chip incorporation of the jitter measurement device</td>
<td>85</td>
</tr>
<tr>
<td>4.9</td>
<td>Delay cell used in TDC</td>
<td>89</td>
</tr>
<tr>
<td>4.10</td>
<td>Layout of Delay Cell used in TDC</td>
<td>91</td>
</tr>
<tr>
<td>4.11</td>
<td>Jitter Measurement Device Floorplan</td>
<td>93</td>
</tr>
<tr>
<td>4.12</td>
<td>"Scannable" Ripple Counter used in Jitter Measurement Device</td>
<td>93</td>
</tr>
<tr>
<td>4.13</td>
<td>Jitter Measurement Device Chip Micrograph</td>
<td>96</td>
</tr>
<tr>
<td>4.14</td>
<td>PCB mounted onto Test Head</td>
<td>98</td>
</tr>
<tr>
<td>4.15</td>
<td>Printed Circuit board used to test Jitter Measurement Device IC</td>
<td>98</td>
</tr>
<tr>
<td>4.16</td>
<td>TDC Delay Cell Characteristics</td>
<td>99</td>
</tr>
<tr>
<td>4.17</td>
<td>Transfer Characteristic of the TDC for various resolutions</td>
<td>101</td>
</tr>
<tr>
<td>4.18</td>
<td>TDC INL for LSB = 115 ps</td>
<td>102</td>
</tr>
<tr>
<td>4.19</td>
<td>Generating Signals with Jitter through the use of a Single Delay Cell</td>
<td>103</td>
</tr>
<tr>
<td>4.20</td>
<td>Set-up used to evaluate functionality of jitter measurement device</td>
<td>104</td>
</tr>
<tr>
<td>4.21</td>
<td>Gaussian Jitter Distribution measured with an Oscilloscope</td>
<td>107</td>
</tr>
<tr>
<td>4.22</td>
<td>Sinusoidal Jitter Distribution measured with an Oscilloscope</td>
<td>108</td>
</tr>
<tr>
<td>4.23</td>
<td>Gaussian and Sinusoidal Jitter Measurement Results</td>
<td>109</td>
</tr>
<tr>
<td>4.24</td>
<td>Delay Line Jitter Measurements</td>
<td>111</td>
</tr>
</tbody>
</table>
Chapter 1 - Introduction

1.1 - Motivation

The increased demand for high performance, multi-functional integrated circuits has driven the System On a Chip (SOC) phenomenon. The convergence of building blocks such as ADCs, DACs, DSPs, Microprocessors, filters, and RF components onto a single die has made production testing a costly and time consuming issue. Conventional test techniques applied to stand-alone mixed-signal devices can no longer be applied to these SOC devices, given the embedded nature of many of their components. Signal extraction and component stimulation pose problems due to the fact that sub-system I/O interfaces are no longer connected to package pins, but interface directly with other sub-systems on the die (Figure 1.1).

Ad-hoc methods such as test-point insertion and loop-back testing [1] have been developed to test complex mixed-signal devices such as hard drive read channel ICs. These methods, though, require the use of high quality I/O devices that allow for the stimulation of system components and the extraction of signals. Costly ATE systems are also required to generate stimulus waveforms and capture output signals. The resolution and speed of the data generation and conversion operations required usually stretch the limits of conventional technologies, adding further complexity and cost to the test process.

Self-contained on-chip automatic test structures, often referred to as Built-In Self-Test (BIST) devices, have been developed in response to the increased level of integration found in present-day ICs. Implementing test functionality on the die, adjacent to the Circuit Under Test (CUT), provides a means by which embedded devices may be accessed. The proximity of the BIST structures to the CUT also alleviates some of the adverse effects of connecting intermediate nodes to I/O pads through long interconnects. BIST structures are still intrusive, but given their proximity to the CUT and the fact that test signals do not have to cross the chip boundary, signal integrity may be maintained. A balance must nonetheless be achieved between providing a certain degree of testability and maintaining circuit performance.

Economic incentives also exist for developing on-chip hardware based BIST structures, since ATE systems may be simplified and test times may be reduced. This may be accomplished by shifting the emphasis from designing better and more complex ATE systems to designing chips that are geared towards testability and hardware based self-test. In the realm of digital test, BIST structures are used extensively in commercial production testing because they provide high fault coverage and fast test times. Given the fact that testing is the most costly stage in the production of an IC and doubles as a quality assurance measure, a strong incentive exists to develop test strategies that are accurate and efficient. The development of BIST structures can help achieve these goals by maintaining better signal integrity and providing fast hardware based test strategies. The added cost of investing silicon area in BIST structures is usually offset by the gains in quality assurance.

The field of digital BIST is well developed, due to the relatively simple nature of digital signals and the manner in which digital circuits are characterized. The majority of digital circuits provide deterministic outputs that can be easily judged against a set of pre-defined functional specifications. On-chip structures that facilitate test-vector injection and data processing, and even provide for "GO/NO GO" testing, are commercially viable solutions that are in existence today. Even simplified parametric tests for digital circuits, such as $I_{DDQ}$ testing [2], are used in production testing. Determining whether mixed-signal circuits, or high-speed digital circuits that are subject to the same mechanisms as analog circuits, meet certain operational specifications is altogether a different matter.

Characterizing a mixed-signal block in terms of functionality alone provides only a vague perspective of the performance of a device, due to the ambiguous nature of analog signals. Analog circuit specifications are usually defined in terms of performance metrics such as Signal-to-Noise ratio (SNR), Total Harmonic Distortion (THD), bandwidth, gain, and jitter. Traditionally, these performance metrics have been extracted through the ATE mechanism presented in Figure 1.2. The core of a typical ATE system comprises an Arbitrary Waveform Generator (AWG) consisting of a D/A, and a waveform digitizer consisting of an A/D. Both are interfaced to a DSP / Memory block that serves the purpose of data processing and storage. A central clock source synchronizes all the components, ensuring that coherency may be attained. The activities of the ATE's components are coordinated through the use of a controller, which usually takes the form of a host computer.

Figure 1.2 Basic Components of a Mixed-Signal ATE

The D/A is used to stimulate the CUT, while the A/D is used to extract the desired waveforms. Captured waveforms are then processed by the DSP to obtain the required performance metrics. If a BIST system is to be developed for a mixed-signal system, then it must contain the basic elements and operations described above, so that the desired performance metrics can be extracted.

A first generation of on-chip mixed-signal testers, consisting of an AWG and digitizer, has been developed as a response to the need for a high performance affordable on-chip solution [3]. The on-chip test core consists mostly of scalable digital gates, making it a solution that is portable over several processes and platforms. In addition, the functionality of the waveform digitizer can be extended to the capture of high frequency signals through the use of external sampling control. The aim of this thesis is to investigate on-chip techniques that will extend the operation of the mixed-signal test-core to the capture of high bandwidth signals, whereby effective sampling resolutions in the sub-nanosecond range may be achieved.

1.2 - Thesis Outline

Chapter 2 of this thesis provides a brief introduction to the test-core and each of its components. The algorithms and circuits pertaining to the AWG and digitizer are presented, and their shortcomings are identified. A brief discussion of jitter measurement, a performance metric of prime importance that is tackled in this thesis, is also presented.

Chapter 3 presents an undersampling algorithm that extends the operation of the test-core to the capture of periodic signals with bandwidths greater than the clock speed of the system. The hardware implementation of the undersampling algorithm takes the form of a timing module comprising a Delay-Locked-Loop (DLL) with tap selection circuitry. This unit is capable of providing sampling resolutions in the sub-nanosecond range. A detailed discussion of the design of the timing module and the issues pertaining to its implementation is presented. Improvements to the waveform digitizer are also discussed. The chapter concludes with experimental results obtained from a CMOS implementation of the improved digitizer and timing module, and their combined operation.

Even though the techniques and circuits presented in Chapter 3 provide sub-nanosecond sampling, the methods utilized are limited to sampling resolutions that, in theory, cannot be finer than the intrinsic gate delay of the technology in which the timing module is implemented. These sampling resolutions may be useful for the capture of high-speed AC waveforms, but do not provide the sampling rates required for the characterization of jitter in high-speed digital signals, which is a parameter of prime importance in many SOC devices. As such, a jitter measurement device that is designed specifically for this purpose, and that provides a factor of 10 improvement in resolution over the techniques presented in Chapter 3, is presented in Chapter 4. This device is based on a Vernier Delay Line (VDL) Time-to-Digital Converter (TDC). Experimental results from a CMOS implementation of the jitter measurement device are presented at the end of the chapter. Chapter 5 summarizes the findings of this thesis, and presents some concluding remarks.
Chapter 2 - Background

2.1 - On-Chip Waveform Capture

2.1.1 - An Integrated Mixed-Signal Test Core

As a response to the issues discussed in Chapter 1 relating to the test of the modern mixed-signal integrated circuit, a new generation of on-chip waveform extraction methods has been developed [4-7]. The majority of these techniques employ simple algorithms and circuits that provide a means of extracting intermediate node waveforms. A generic system level schematic showing the configuration of most of these samplers is presented in Figure 2.1. These devices are usually placed near the node of interest to alleviate CUT loading, but still provide an analog output for post-processing. As such, capturing the final required waveform requires the use of a high quality I/O interface for signal transmission, and an oscilloscope or A/D converter for final signal capture. Subject to these constraints, the integrity of the final captured waveform can still be compromised.

The concept of digitizing signals on-chip has been previously proposed [8]. Performing an analog-to-digital conversion operation directly on-chip implies that signal integrity can be easily maintained across the chip boundary, the constraints on test-point I/O interfaces may be relaxed, and a digital data capture device is all that is required to extract the necessary information. The disadvantage of previous implementations of this concept is that the digitizers were not autonomous and still required the use of high precision external AC and DC waveform generators to perform the A/D operation and stimulate the CUT.

Recent advances in on-chip mixed-signal testing have provided for an autonomous solution that has a completely digital interface, and allows for waveform extraction and CUT stimulation [3]. Figure 2.2 displays a functional diagram of the components of this on-chip test-core.

The arbitrary waveform generator (AWG) consists of a shift register that cycles a ΔΣ bitstream representing an encoded signal. This bitstream is filtered to generate arbitrary waveforms to stimulate the circuit under test (CUT). In the same manner, the DC reference generator consists of a shift register that cycles either a PWM or PDM bitstream that represents an encoded DC value, and is also reconstructed with a filter. The digitizer comprises a comparator with a track and hold (T/H) circuit at each input terminal, and a digitally programmable DC reference generator. These three units combined provide the basic components of a commercial tester. Given the fact that the majority of the circuitry is digital, this core can be integrated on chip and used to test embedded circuits, using a completely digital I/O interface, with a minimal area penalty. A brief description of the signal generation and capture methods utilized in this test core is presented in the following two sub-sections.

2.1.2 - Signal Generation

Delta-Sigma based signal generators are based on the principle of reconstructing an analog waveform from a digital stream that is representative of the desired signal [9]. This digital stream is not treated in the traditional manner, whereby it is usually interpreted as an encoded digital word, but is interpreted as being an analog waveform consisting of the signal in question, and the superfluous noise-shaped quantization error.

Figure 2.3(a) represents the power spectral density (PSD) of a ΔΣ bitstream representing a single tone, as generated by a low-pass modulator. The time domain signal is represented in the inset. The analog waveform may be reconstructed by filtering the digital bitstream with a filter possessing the frequency response depicted in Figure 2.3(b). In the case presented, the tone of interest lies in a low frequency region, and the quantization noise at higher frequencies. As such, a lowpass filter would attenuate the high frequency noise components, allowing the tone of interest to be the dominant component in the signal’s composition. Figure 2.3(c) presents the spectrum of the final reconstructed waveform. The associated time domain signal is presented in the inset.

One of the components of the test-core’s AWG is a shift register that cycles a bitstream representing an encoded version of the desired signal. The bitstream is generated by a software model, and then loaded onto the on-chip shift register. Generating the bitstream in advance through a software based approach allows for the use of high order modulators that can produce high resolution signals. The software-based generation approach also allows for greater programmability, and for savings in area and cost, as opposed to having a ΔΣ modulator on-chip. The partitioning of the test-core AWG components is presented in Figure 2.4.

The reconstruction of the analog waveform from the bitstream is accomplished by an on-chip filter. This filter may be continuous-time or discrete-time, and may be passive or active. The choice of filter type is driven mainly by the required performance of the AWG, since the filter largely determines the quality of the final analog waveform. This technique has been successfully used for off-chip signal generation [10].
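
The generation and reconstruction steps described in this section can be illustrated with a small behavioural model. The sketch below is not the modulator or filter used in the test core: it assumes a first-order low-pass ΔΣ modulator and a single-pole filter as a stand-in for the reconstruction filter, and the bitstream length, tone frequency, amplitude and filter coefficient are hypothetical values chosen only for illustration.

```python
import math

def delta_sigma_bitstream(samples):
    """First-order low-pass delta-sigma modulator: samples in [-1, 1] -> list of +/-1 bits."""
    integrator = 0.0
    previous_bit = 0.0
    bits = []
    for x in samples:
        integrator += x - previous_bit                      # accumulate input minus fed-back output
        previous_bit = 1.0 if integrator >= 0.0 else -1.0   # 1-bit quantizer
        bits.append(previous_bit)
    return bits

def single_pole_lowpass(bits, alpha):
    """Single-pole IIR low-pass, a discrete stand-in for an RC reconstruction filter."""
    y = 0.0
    out = []
    for b in bits:
        y += alpha * (b - y)
        out.append(y)
    return out

N_BITS = 4096      # hypothetical shift-register depth
CYCLES = 4         # whole number of tone cycles, so the pattern repeats seamlessly
AMPLITUDE = 0.5    # tone amplitude relative to the +/-1 bit levels

tone = [AMPLITUDE * math.sin(2 * math.pi * CYCLES * n / N_BITS) for n in range(N_BITS)]
bits = delta_sigma_bitstream(tone)

# Cycle the bitstream twice, as a circulating shift register would, and keep
# the second pass so that the filter start-up transient has died away.
reconstructed = single_pole_lowpass(bits + bits, alpha=0.02)[N_BITS:]
```

A higher-order modulator and a sharper filter would push more quantization noise out of band, which is the motivation for generating the bitstream in software.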

2.1.3 - Multipass Signal Capture

The test-core also includes an on-chip waveform digitizer whose purpose is to extract the output of the CUT in a representative digital form. Its operation is restricted to the capture of periodic waveforms, because it employs a multi-pass capture algorithm [11]. A graphical representation of this algorithm is presented in Figure 2.5.

The digitizer comprises a single comparator that receives the desired signal at its positive terminal, and a DC reference voltage at its negative terminal. One pass of the input waveform is sampled using the comparator, with the DC reference set at a fixed value. Several sample points are acquired per Unit Test Period (UTP), since the signal is sampled at a rate higher than its Nyquist rate. The comparator produces a digital sequence that indicates, for each sample point, whether the waveform is higher or lower than the reference DC value. On the next pass of the input waveform, the DC level is incremented, and the waveform is resampled at the same sample instants as the previous pass. The DC reference is subsequently incremented for every pass of the UTP until complete coverage of the voltage range of the input waveform is achieved. Through post processing, the digital output sequences of the comparator provide a thermometer code representation of the input waveform.
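
The multi-pass procedure above can be sketched behaviourally as follows. This is a minimal illustration under stated assumptions, not the test-core implementation: the comparator is ideal, the reference levels are placed at the centre of each quantization step, and the function names, level count and test waveform are hypothetical.

```python
import math

def multipass_capture(waveform, samples_per_utp, num_levels, v_min, v_max):
    """One comparator pass per DC reference level over a periodic waveform.

    `waveform(t)` takes a normalized time in [0, 1) within one Unit Test Period.
    Returns, per sample instant, the number of reference levels the input exceeded.
    """
    step = (v_max - v_min) / num_levels
    codes = [0] * samples_per_utp
    for level in range(num_levels):              # one run of the UTP per reference level
        v_ref = v_min + (level + 0.5) * step     # DC reference held fixed during the pass
        for i in range(samples_per_utp):         # same sample instants on every pass
            if waveform(i / samples_per_utp) > v_ref:
                codes[i] += 1                    # summing the 1s decodes the thermometer code
    return codes

def codes_to_volts(codes, num_levels, v_min, v_max):
    """Post-processing: map each thermometer count back to a voltage estimate."""
    step = (v_max - v_min) / num_levels
    return [v_min + c * step for c in codes]

def wave(t):
    """Hypothetical periodic test signal spanning 0.1 V to 0.9 V."""
    return 0.5 + 0.4 * math.sin(2 * math.pi * t)

codes = multipass_capture(wave, samples_per_utp=16, num_levels=64, v_min=0.0, v_max=1.0)
estimate = codes_to_volts(codes, num_levels=64, v_min=0.0, v_max=1.0)
```

In this model each estimate lies within half a reference step of the true sample value, and the capture takes as many passes of the UTP as there are reference levels.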

A top-level schematic of the waveform digitizer is presented in Figure 2.6. The programmable DC generator is similar in construct to the AWG. It consists of a shift register that cycles either a PDM or PWM (Pulse Width Modulated) waveform that represents an encoded DC value. The generator’s programmability is achieved through the fact that the DC level bitstreams are also developed using software techniques, and then loaded onto the shift register when required. Reconstruction is accomplished using an RC filter. The T/H circuitry at the comparator’s input nodes serves two purposes: to isolate the input signal from the loading associated with the comparator, and to ensure that both the DC reference and input signal undergo the same non-idealities before comparison. A more detailed description of the waveform digitizer is provided in Chapter 3. Note that the clocks for both the AWG and digitizer are derived from the same source, implying that the system is inherently coherent.

2.1.4 - Test-Core Limitations

The system described above has several fundamental limitations, based on the methods and algorithms it employs. Given that the AWG employs a ΔΣ based signal generation technique, it would be difficult to create high frequency or broadband signals due to the oversampled nature of the bitstreams. For high frequency signals, the clock speed of the AWG may have to extend into the range of several GHz to create a high resolution signal in the MHz range. Achieving these high clock rates may be infeasible, given the maximum attainable operating speed of digital circuits in most standard processes in existence to date.

Another issue that makes the ΔΣ bitstream based signal generation technique difficult to employ for high frequency applications is the need for high frequency filtering techniques. High-frequency integrated reconstruction filters that provide a reasonable resolution are difficult to fabricate in a reliable, predictable fashion. As such, the quality of the final reconstructed analog waveform is difficult to guarantee, given its dependency on the reconstruction filter. The bandwidth and attenuation that would be required of these filters would also be difficult to achieve in an integrated form.

The front end bandwidth of the digitizer is high enough to allow for the capture of high frequency signals. Conventional sampling algorithms, such as Nyquist rate based methods, do not take advantage of this available bandwidth. The system is versatile enough though, such that it can be used with an undersampling algorithm to capture signals that extend well beyond the sampling frequency.

Chapter 3 of this thesis provides the details of this undersampling algorithm and the additional circuitry required to implement it.

2.2 - Jitter Measurement

2.2.1 - Challenges in Jitter Measurement

The test-core described above, in its present state, does not provide the characteristics or the sampling resolution required to perform jitter tests on high speed signals. Even with the use of the undersampling circuitry to be described in Chapter 3, the test core is still incapable of characterizing jitter in gigabit rate signals. Given that jitter is a parameter of prime importance in the characterization of many modern ICs, a natural extension of the test-core would be to provide it with high resolution jitter measurement capabilities. In this thesis, an alternative on-chip jitter measurement device that addresses some of the challenges pertaining to testing jitter will be introduced. The following sections will present common jitter measurement techniques presently in use and compare their effectiveness. The proposed on-chip jitter measurement solution will then be briefly introduced.

Jitter is defined as the variation in a clock or data edge, relative to its expected position in time. This expected position in time can be with respect to a jitter-free synchronization signal (accumulative jitter measurement), or with respect to another signal that may also express jitter (relative jitter measurement), or with respect to the signal of interest’s previous position (period jitter measurement). Quantifying this variation on a signal edge has become increasingly important due to the ever increasing operating speed of integrated circuits. Communication ICs, hard drive read channel ICs, and CPUs are just some of the numerous products that operate in the GHz range at the present date. Due to this increased operating speed, clock periods are well below a nanosecond, implying that much tighter timing budgets are enforced. Peak-to-peak jitter specifications of less than a few tens of picoseconds are required to ensure the proper operation of most of these devices.
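
The three reference conventions in this definition can be made concrete with a short sketch. The function names and the edge timestamps are hypothetical; times are in picoseconds for a nominal 1 GHz clock, and the ideal reference is assumed to place edge n at n times the nominal period.

```python
import statistics

def accumulative_jitter(edge_times, period):
    """Deviation of each edge from an ideal, jitter-free reference edge."""
    return [t - n * period for n, t in enumerate(edge_times)]

def relative_jitter(edge_times, other_edge_times):
    """Deviation of each edge from the matching edge of another (possibly jittery) signal."""
    return [a - b for a, b in zip(edge_times, other_edge_times)]

def period_jitter(edge_times, period):
    """Deviation of each measured period from nominal, i.e. relative to the previous edge."""
    return [(b - a) - period for a, b in zip(edge_times, edge_times[1:])]

def peak_to_peak(values):
    return max(values) - min(values)

def rms(values):
    """RMS jitter taken as the standard deviation about the mean edge position."""
    return statistics.pstdev(values)

PERIOD_PS = 1000                          # nominal period of a 1 GHz clock
edges_ps = [2, 997, 2001, 3004, 3998]     # hypothetical measured edge positions

acc = accumulative_jitter(edges_ps, PERIOD_PS)   # [2, -3, 1, 4, -2]
per = period_jitter(edges_ps, PERIOD_PS)         # [-5, 4, 3, -6]
```

For these five edges the accumulative peak-to-peak jitter is 7 ps, while the period jitter spans 10 ps, which illustrates that the metric depends on the chosen reference.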

Jitter measurement tests are some of the most time-consuming and expensive tests that high-speed data communication devices must undergo. The time and cost constraints are interdependent, since greater test time translates into a greater test cost. These tests are time consuming and difficult because of the random nature of jitter, and the stringent specifications required in devices with tight timing budgets. The sources of jitter in a circuit are numerous, ranging from power supply noise and random thermal noise generated by the circuit components themselves, to pattern dependent jitter [12] that is a function of the input sequence and the bandwidth of the device in question. As such, long data sequences must be used in order to capture the effects of the random noise and pattern dependency. Models for estimating jitter have been developed [13], but fall short of predicting the performance of a device in all its post-production states, limiting their usefulness in production testing. In order to guarantee that a device meets a certain set of specifications, at least in terms of jitter, the device must be tested at-speed and for an adequate duration of time.

Besides the aforementioned test time constraints that make jitter testing an economic hindrance, designing jitter measurement devices poses other challenges. The jitter measurement equipment must have a wide bandwidth in order to deal with the incoming data rate. Also, given that the duration of a bit period is in the sub-nanosecond range, the resolution of the jitter measurement device must sometimes extend into the sub-picosecond range. These constraints make for complicated equipment that renders jitter testing a costly stage of IC production.

Recent work [14] exploring the requirements of future test equipment for the characterization of high speed communication devices, suggests that the major issues that face jitter testing include:

i) **Jitter measurement noise floor**: The resolution of most modern jitter measurement systems, and the error associated with them, is usually inadequate for the accurate measurement of jitter in high speed signals. Techniques and devices that induce a negligible measurement error and have a high resolution, that is, a low “jitter noise floor”, must be developed.

ii) **Jitter injection capability**: In order to extract certain parameters such as jitter transfer functions and perform Bit Error Rate (BER) tests, testers must possess the capability of producing signals with a predictable, controllable, deterministic jitter. Few testers at the present time possess this capability.

iii) **Low jitter clock source**: Most jitter tests ideally require a jitter free reference signal. This signal is often used as an input to a CUT to determine the jitter it induces, or is used as a trigger signal for other devices such as oscilloscopes. Most ATE systems do not have clock sources with jitter low enough to adequately characterize devices that operate at gigabit rates.

iv) **Test time / throughput**: As previously mentioned, jitter tests are usually the most time consuming tests, rendering them also the most costly. Test strategies that extract jitter information in a minimal amount of time must be developed to render jitter tests more suitable for production testing.

v) **Asynchronous testing**: Testing asynchronous devices requires the use of clocks that have different domains. Most testers have only one clock domain.

2.2.2 - Jitter Measurement Techniques

Table 2.1 presents a brief summary of jitter measurement techniques presently in use and summarized in [14 - 16].

<table>
<thead>
<tr>
<th>Measurement Method</th>
<th>Accuracy</th>
<th>Bandwidth</th>
<th>Throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>Spectrum Analyzer</td>
<td>Moderate</td>
<td>High</td>
<td>Low</td>
</tr>
<tr>
<td>Counter</td>
<td>High</td>
<td>High</td>
<td>Moderate</td>
</tr>
<tr>
<td>Time Interval Analyzer</td>
<td>Low</td>
<td>Low</td>
<td>High</td>
</tr>
<tr>
<td>Oscilloscope</td>
<td>High</td>
<td>High</td>
<td>Low</td>
</tr>
<tr>
<td>Undersampling</td>
<td>High</td>
<td>High</td>
<td>Moderate</td>
</tr>
</tbody>
</table>

The Spectrum Analyzer and Oscilloscope based measurement techniques provide for a high bandwidth but low throughput. The slow speed of the test is due mainly to the capture methods of both these devices. The spectrum analyzer examines a region of the signal’s spectrum, which requires some degree of averaging to attain, to arrive at a phase noise metric. This noise metric is referenced to the actual signal power. The disadvantage of this approach is that if the signal of interest is distorted, the phase noise and distortion tones become indistinguishable. The Spectrum Analyzer method provides a low throughput given that time is required to arrive at the spectrum of the signal of interest. The accuracy of this technique is limited due to various factors such as frequency resolution, and the previously discussed issue relating to distinguishing noise and signal components.

The Oscilloscope based jitter measurement technique also provides for a low throughput but a much higher accuracy than the Spectrum Analyzer method. The input waveform is captured in realtime or through the use of an undersampling algorithm, and a jitter histogram may be obtained by overlaying several passes of an edge with an appropriate trigger signal. To capture enough points to form a jitter histogram using these oscilloscopes, several seconds, or in some cases minutes of test time are required. Since a time domain representation of the waveform may be captured at a high sampling rate, relatively accurate results may be obtained.

The undersampling algorithms used in many modern digital sampling oscilloscopes may also be used to measure jitter with a commercial tester [16,17]. The sampling can be performed with a tester's digital pin or a comparator, in combination with a specialized clocking scheme, such as that presented in Chapter 3, that provides for a high effective sampling rate. The speed of this genre of test is still moderate, due to the fact that several passes of the input waveform are required before the jitter can be fully characterized.

Other techniques for measuring jitter involve directly trying to measure the variation of an edge over time with respect to a voltage threshold crossing, or with respect to the edge's previous position. Time Interval Analyzers (TIAs) measure the time between successive crossings of an edge through a reference voltage level. No edges are skipped, and as such, the jitter can be completely characterized in only one run of the UTP. This implies that the test time is limited only by the test input signal. The disadvantage of this technique is that the required circuitry does not have an adequate resolution or operating speed to characterize signals in the GHz range.

Counter based measurements, or start / stop time measurements directly measure the time between successive occurrences of an edge. The first occurrence of an edge usually initiates a counter, and the second edge halts it. Accurate results can be obtained with this method, especially if time interpolation is used to arrive at resolutions greater than that of the counter's clock period. The disadvantage of this technique is that there is usually a large deadtime between successive measurements, compromising the throughput of the system.

2.2.3 - Proposed On-Chip Jitter Measurement Solution

Given the aforementioned discussion pertaining to the advantages of performing signal capture on chip, and the fact that the test-core is an on-chip solution, the proposed jitter measurement device is, by extension, also an on-chip solution. On-chip jitter measurement is a relatively new concept, with little work existing in the field to date. A BIST approach for PLL jitter measurement has been previously described in [18]. This approach utilizes methods similar to the undersampling technique described above, and as such does not have an optimum test time. The resolution of this device is also limited to the intrinsic gate delay of the technology it is implemented in. Experimental results in [18], obtained using a modified proprietary technique, present resolutions that are 20% of an intrinsic gate delay. In this thesis, an on-chip jitter measurement device that also provides resolutions that are a fraction of a gate delay, and that provides an optimum test time, will be presented.

This device is based on a Vernier Delay Line Time-to-Digital Convertor, and represents an amalgamation of some of the approaches used by the oscilloscope, TIA, and counter based jitter measurement techniques. The device is triggered by an input reference signal, and samples an input digital bit with a very fine sampling resolution. This sampling operation is also inherently a time-to-digital conversion operation whereby the time interval between the trigger signal and the bit in question is measured and converted to a digital word. On-chip counters measure the variation in the position of the bit in question and produce an on-chip jitter Cumulative Distribution Function (CDF). From the jitter CDF, peak-to-peak and RMS jitter metrics may be extracted.
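
The idea of accumulating a jitter CDF in per-stage counters can be sketched with a generic behavioural model of a Vernier delay line. The circuit itself is described in Chapter 4, so everything below is an assumption made for illustration: the earlier edge is taken to travel down the slower chain, each stage's counter increments when the later edge has caught up by that stage, and the delays, stage count and measured intervals are hypothetical values in picoseconds.

```python
import math

def vernier_counts(intervals, num_stages, tau_slow, tau_fast):
    """Accumulate per-stage counters over many measured time intervals.

    At stage s the leading edge has lost s * (tau_slow - tau_fast) of its lead,
    so counter s counts the intervals no longer than s resolution steps.
    """
    resolution = tau_slow - tau_fast
    counts = [0] * num_stages
    for dt in intervals:
        for stage in range(1, num_stages + 1):
            if dt <= stage * resolution:
                counts[stage - 1] += 1
    return counts

def jitter_from_cdf(counts, total, resolution):
    """Extract peak-to-peak and RMS jitter from the counter-based CDF.

    Assumes every interval fell inside the delay-line range (last CDF value is 1).
    """
    cdf = [c / total for c in counts]
    pdf = [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]     # differentiate the CDF
    times = [(i + 1) * resolution for i in range(len(cdf))]
    mean = sum(p * t for p, t in zip(pdf, times))
    rms = math.sqrt(sum(p * (t - mean) ** 2 for p, t in zip(pdf, times)))
    occupied = [t for p, t in zip(pdf, times) if p > 0]
    return max(occupied) - min(occupied), rms

intervals_ps = [50, 60, 60, 70]     # hypothetical trigger-to-edge intervals
counts = vernier_counts(intervals_ps, num_stages=10, tau_slow=110, tau_fast=100)
peak_to_peak_ps, rms_ps = jitter_from_cdf(counts, len(intervals_ps), resolution=10)
```

With these inputs the counters read `[0, 0, 0, 0, 1, 3, 4, 4, 4, 4]`, giving a peak-to-peak jitter of 20 ps. The resolution is set by the difference of the two stage delays rather than by either delay alone, which is how a sub-gate-delay resolution becomes possible.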

Given that the input data stream is sampled continuously and every edge is captured, the device has a test time equivalent to that of a single UTP (with the exception of the time required to extract information from the on-chip counters and perform post-processing). The device also has a high bandwidth, and has an automatically tunable resolution that extends well into the sub-nanosecond range. Details pertaining to the design and operation of this jitter measurement device are presented in Chapter 4.

2.3 - Summary

The preceding discussion has presented some of the on-chip signal extraction and jitter measurement techniques in existence to date. The majority of these techniques are limited either in terms of their maximum effective sampling rate, their test time, or their ability to maintain signal integrity. The purpose of this thesis is twofold: i) to present an extension to the aforementioned mixed-signal test core that would provide it with the capability to perform sub-nanosecond resolution sampling, and ii) to present a sub-gate delay accuracy jitter measurement device with an optimal test time. The body of work presented in this thesis is summarized in [49-51].

Chapter 3 - Signal Capture

3.1 - On-Chip Waveform Capture

3.1.1 - Undersampling Algorithms

The integrated mixed-signal test core described in Chapter 2 relies on over-sampling techniques in order to digitize signals. This dependency implies that the bandwidth of the input signal is constrained by the amount of over-sampling required, and the maximum clocking speed of the slowest sub-system. The system as a whole, though, is versatile enough that under-sampling techniques may be employed to attain high equivalent sampling rates while clocking it at speeds on par with, or even much lower than, the incoming data rate.

Two under-sampling techniques that prevail in the literature were considered. The first method [17,19] involves sampling an input signal of period $T$ with a clock of period $T + \Delta t$, where $\Delta t \ll T$. Sampling the input signal with such a clock ensures that one point per input signal period is sampled, and that the sample point moves $\Delta t$ seconds relative to the previous sampling instance on the next run of the input signal. The timing resolution of the sampling operation is determined by the quantity $\Delta t$, and provides an equivalent sampling frequency of $1/\Delta t$. This method requires a high accuracy frequency synthesizer, capable of producing a clock whose period is fractionally longer than one Unit Test Period (UTP).

The second technique [16] involves using a sampling clock running at an arbitrary speed that can be delayed by an incremental phase shift $\Delta t_{\text{shift}}$. This clock is used with a zero phase delay to sample the first UTP, and its delay is then incremented by $\Delta t_{\text{shift}}$ on each subsequent run of the UTP until the sampling clock has been delayed by the equivalent of $T_{\text{samp}} - \Delta t_{\text{shift}}$, where $T_{\text{samp}} = 1 / \text{sample clock frequency}$. Sweeping the phase shift across one full sampling period ensures that complete coverage of the input waveform is obtained, with a timing resolution of $\Delta t_{\text{shift}}$ seconds, or an equivalent sampling rate of

$$F_{\text{SAMP\_EQUIV}} = \frac{1}{\Delta t_{\text{shift}}} \quad (3.1)$$

This method requires the use of a timing block such as a delay-locked-loop (DLL) that can produce accurate phase delays. Figure 3.1 provides a graphical representation of this undersampling algorithm.
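
The incremental-delay method can be sketched as follows. This is a simplified model, assuming ideal delays, a UTP that is a whole multiple of the sampling period, and a sampling period that is a whole multiple of the delay step; the function name and the numerical values (in picoseconds) are hypothetical.

```python
import math

def undersample_with_delays(waveform, utp, t_samp, dt_shift):
    """Capture a periodic waveform using a sampling clock delayed in steps of dt_shift.

    Each clock phase is applied for one run of the UTP; the passes are then
    interleaved into time order. Returns a list of (time, value) pairs.
    """
    num_shifts = round(t_samp / dt_shift)      # delays 0, dt_shift, ..., t_samp - dt_shift
    points_per_pass = round(utp / t_samp)      # sample-clock edges within one UTP
    captured = []
    for k in range(num_shifts):                # one run of the UTP per clock phase
        for m in range(points_per_pass):
            t = m * t_samp + k * dt_shift
            captured.append((t, waveform(t)))
    captured.sort(key=lambda point: point[0])  # interleave the passes
    return captured

UTP_PS = 40_000        # hypothetical 40 ns unit test period
T_SAMP_PS = 10_000     # hypothetical 100 MHz sampling clock
DT_SHIFT_PS = 625      # incremental clock delay

points = undersample_with_delays(
    lambda t: math.sin(2 * math.pi * t / UTP_PS), UTP_PS, T_SAMP_PS, DT_SHIFT_PS
)
equivalent_rate_hz = 1.0 / (DT_SHIFT_PS * 1e-12)   # equation (3.1)
```

Here 16 clock phases of 4 points each yield 64 evenly spaced points, an equivalent sampling rate of 1.6 GHz from a 100 MHz clock.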

![Graphical representation of undersampling algorithm.](image-url)

The undersampling algorithm involving the use of incremental delays was deemed to be the preferred method due to the ease of integration of the required circuitry. Given a particular application, the equivalent sampling rate required may be in the range of several GHz. This implies that $\Delta t$ and $\Delta t_{\text{shift}}$ will be in the sub-nanosecond range. With present-day CMOS technologies, producing a propagation delay in the sub-nanosecond range for a delay element used in a DLL is a trivial task [20]. Achieving the same effective sampling rate with two periods slightly offset from each other is more difficult, requiring a fractional-PLL capable of providing a frequency resolution in the hertz or sub-hertz range. The inherent stability of most DLL architectures also implies that they are less complex and easier to design than fractional PLLs. DLLs also provide better phase jitter performance, and as such have been traditionally used in ATE systems to implement timing verniers [21]. Note that in both the aforementioned undersampling methods, multiple passes of the input waveform are required to perform a complete capture, restricting the use of the test-core to the capture of periodic signals. This restriction does not further limit the test-core relative to its previous modes of operation, since it was already restricted to the capture of periodic signals.

The proposed algorithm also suffers from the conventional timing issues that affect the performance of any data conversion device. Jitter can have a profound effect on the quality of the final captured waveforms [22,23], particularly for high-frequency signals. Thus a low jitter DLL must be used in order to ensure an adequate resolution for the conversion operation. Given that the phase delayed clocks will also be generated on-chip, the uniformity of the phase offsets cannot be guaranteed due to die and process variations. The effects of these non-uniformities are akin to those of jitter, but the source is different. Sampling errors due to consistent, measurable phase offsets can be compensated for, though, and a method of calibrating out these non-uniformities will be discussed in a later section dealing with the design of the DLL.
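
To give a sense of scale for the sensitivity to jitter at high frequencies, the sketch below uses a commonly quoted approximation for an otherwise ideal sampler digitizing a full-scale sine wave, where the signal-to-noise ratio is bounded by $-20\log_{10}(2\pi f_{in}\sigma_j)$. This formula is not taken from the thesis, and the input frequencies and RMS jitter value are hypothetical.

```python
import math

def jitter_limited_snr_db(f_in_hz, sigma_jitter_s):
    """Upper bound on SNR (dB) set by RMS sampling-clock jitter for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_jitter_s)

SIGMA_J = 10e-12                                     # hypothetical 10 ps RMS clock jitter
snr_100mhz = jitter_limited_snr_db(100e6, SIGMA_J)   # roughly 44 dB
snr_1ghz = jitter_limited_snr_db(1e9, SIGMA_J)       # roughly 24 dB
```

Under this approximation, every tenfold increase in input frequency costs 20 dB of attainable SNR for the same clock jitter.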

Some of the advantageous repercussions of employing the undersampling algorithm include the fact that the frequency of the input signal can now extend well beyond the clock speed of the system, and the limiting factor becomes the front-end bandwidth of the sample and hold stage, as opposed to the maximum clocking speed of some of the subsystems. The design specifications of the test-core digitizer can also be relaxed, since a low speed clock may be used with the system while still maintaining a high effective sampling rate by having a small value for $\Delta t_{\text{shift}}$.

Variations on this undersampling technique have been previously used to sample high speed waveforms on-chip [4 - 7]. The proposed system has an added advantage over these methods, due to the fact that it contains a simple digitization method that produces a digital output that represents the sampled waveform. As such, the integrity of the signal in question is maintained, and a digital ATE setup may be used to retrieve information from the test-core. The majority of the existing analog waveform extraction techniques provide for analog outputs that still need to be converted into a digital form for post-processing through the use of costly ATE equipment. These techniques also require the use of high quality I/O interfaces to ensure the integrity of the captured waveform.

Given the ability to capture high-frequency and broadband signals, the use of the test-core may be extended to applications such as Asymmetric Digital Subscriber Lines (ADSL), Serializers/Deserializers (SerDes), and serial wireline communication characterization, diagnostics, and testing - all applications that require high-bandwidth test equipment. The breadth of use of this system may be attributed mainly to the fact that it functions as an A/D that is restricted to the capture of periodic signals. Once a signal is captured and stored in a digital form on ATE memory, any quantitative or qualitative metric may be extracted from it through post-processing, allowing the test-core to be application independent. The emphasis on the design of test structures then shifts to the development of higher quality on-chip digitizers, and faster CPUs to perform the post processing. These two measures will help ensure the quality of the obtained results, and the speed at which the tests are performed.

Compared to conventional methods of device testing, the proposed on-chip solution provides advantages in maintaining the quality of information extracted from the CUT. Several issues, though, such as throughput and calibration, compromise its effectiveness as a production tester and render it more suitable as a diagnostic tool, due to the interrelated issues of test time and cost. In terms of throughput, given that multiple passes of the DC reference voltage and several passes with different phase delayed clocks are required to perform a complete capture, the test time is increased dramatically from that of an external high-speed A/D. For example, if a \( k \)-bit capture of a waveform with \( N \) evenly spaced points on a signal of length \( \text{UTP} \) is desired, then the test time required to perform the capture with the test core would be

\[
\sigma_1 = \begin{cases}
\dfrac{2^k N}{F_{ST}} & \text{UTP} \leq \dfrac{1}{F_{ST}} \\[2ex]
\dfrac{2^k N}{\text{floor}(\text{UTP} \cdot F_{ST})} \, \text{UTP} & \text{UTP} > \dfrac{1}{F_{ST}}
\end{cases}
\quad (3.2)
\]

where \( \sigma_1 \) is the test time in seconds, and \( F_{ST} \) is the clock frequency of the test-core. This assumes that the test core is capable of achieving the \( k \)-bit resolution. Note that in the first scenario presented, the length of a UTP is less than or equal to the period of the test core’s sampling clock, and as such one point per period is extracted, and the sampling clock must be delayed \( N \) times to capture the required number of points. In the second possible case, the period of the test core sampling clock is shorter than one UTP, and as such several points per delayed clock may be captured, and fewer delayed clocks (or passes) will be required to perform a complete capture. Note that in both these cases, though, the length of time required to perform a capture is greater than a UTP.

On the other hand, if an external \( k \)-bit A/D on an ATE is used to capture the same signal, then the test time would be

\[
\sigma_2 = \frac{N}{F_S} = \frac{N}{N} \, \text{UTP} = \text{UTP} \quad (3.3)
\]

since it is assumed that the sample rate of the A/D, \( F_S \), allows it to capture the desired \( N \) points in one run of the input signal. Note that equations (3.2) and (3.3) represent the absolute minimal attainable test times, and do not take into account device set up times and post-processing.

The direct A/D method has a major advantage in terms of test time over the algorithm utilized by the test-core. The major drawbacks of the conventional data capture technique become apparent, though, in the case where broadband signals must be captured. Take for example the case where a 25 MHz digital data pattern must be captured, with 64 points per data bit, to perform a mask compliance test. Using a conventional A/D, the sampling rate must be set to 1.6 GHz. Few data converters in existence today operate at these clock rates, and have a front-end bandwidth that is suitable for the capture of broadband signals. Those that do exist are usually manufactured in costly processes [24]. Using the test core, though, the 25 MHz data pattern may be captured with a 25 MHz clock that is delayed 64 times by an incremental amount of 625 ps, resulting in an equivalent sampling rate of 1.6 GHz. The latency is greater than that of the conventional A/D, but less complex circuitry is required to provide similar functionality.

Calibration is also an issue that poses a challenge for the incorporation of the test-core as an on-chip production tester. If all chips are to contain their own test-core, then each must first undergo a test / calibration routine to verify its functionality. This added time translates into another cost that other techniques do not usually incur. For instance, a production tester need only be calibrated once to test numerous chips, and does not have to be recalibrated for each individual chip.

Apart from the aforementioned issues pertaining to latency and calibration, the test-core is simple enough that it can be implemented in various degrees of integration. It can be entirely implemented on a Device Under Test (DUT) card, and used to test analog components with a digital tester. The components required would be a comparator, two shift registers and two filters (for the ΔΣ based AWG and DC generator), and a clock delay generation chip. The board would only have to be calibrated once, and it would be relatively inexpensive when compared to a mixed signal tester. The only drawback would be that since the test-core would no longer be on-chip, the issue of signal integrity arises again. Nonetheless, it can still prove to be a viable mixed-signal test solution. Various techniques that are employed by the test-core are already in use with digital testers to test analog parts [10].

Other signal extraction methods that do not involve the use of ATE equipment are
usually oscilloscope based, and suffer from an increased output latency and difficulty in
maintaining signal integrity.

The following section will provide a brief description of the on-chip digitizer and
some modifications to its circuitry aimed at improving its dynamic range and front-end
bandwidth. A description of the DLL based timing module implemented to generate the
phase delayed clocks required for implementing the undersampling algorithm then
follows. Experimental results demonstrating the operation of the entire system are also
presented.

3.2 - On-Chip Digitizer

3.2.1 - System Level Architecture

The following section will present a brief description of the circuits used in the
implementation of the on-chip digitizer presented in [3]. The modifications required in
order to extend the operation of the digitizer to the capture of high bandwidth signals
using the undersampling algorithm will also be presented.

Figure 3.2(a) displays a system level schematic of the on-chip digitizer. The
digitizer consists of three components: i) a digitally controlled PDM/PWM based DC
reference generator [25] (not shown), ii) a comparator, iii) two track and holds (T/H)
followed by buffers, each connected to one of the comparator’s input terminals. The
combination of these three components allows for the implementation of the multi-pass
capture algorithm described in Chapter 2 and [11]. The DC voltage that is swept from
GND to VDD is produced by the programmable DC reference generator. The purpose of
the comparator is to arbitrate whether the sampled analog waveform is greater than or less
than the reference DC value. The sampling of the input waveform is performed by the T/H
and buffer connected to the positive terminal of the comparator. The purpose of the buffer
is to isolate the sampled analog waveform from the input nodes of the comparator. Both the input analog waveform and the DC reference are sampled by identical T/H and buffer circuits, to ensure that both voltages undergo the same undesired effects that may change the sampled voltage. Figure 3.2(b) demonstrates this principle.

Most sampling networks employ techniques that attempt to compensate for effects such as charge injection and voltage droop. These effects cause the sampled analog waveform to deviate from its original value, and as such erroneous results may be obtained when the comparison operation is performed. Instead of compensating for these effects, the test-core attempts to minimize their influence by ensuring that the DC reference signal undergoes the same non-ideal effects as the analog input signal. This is accomplished by sampling the DC reference signal with an identical sampling network. The effectiveness of this technique depends on how well the two sampling networks can be matched. A brief description of the circuits used to implement each of the digitizer’s blocks is presented in the following sections.
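
The benefit of passing both comparator inputs through matched sampling networks can be seen in a toy model. The error model below (a fixed charge-injection pedestal and a fixed hold-mode droop, both signal independent) and all voltage values are hypothetical and deliberately simplistic; voltages are in millivolts.

```python
def track_and_hold(v_mv, pedestal_mv, droop_mv):
    """Hypothetical T/H error model: the held value is offset by a pedestal and droop."""
    return v_mv + pedestal_mv - droop_mv

def comparator(v_plus_mv, v_minus_mv):
    """Ideal comparator: 1 if the positive input is above the negative input."""
    return 1 if v_plus_mv > v_minus_mv else 0

V_IN_MV, V_REF_MV = 1000, 1005      # input is truly below the reference: ideal decision is 0
ERRORS_MV = (10, 2)                 # hypothetical pedestal and droop of the sampling network

# Only the input is sampled: its 8 mV net error flips the decision.
unmatched = comparator(track_and_hold(V_IN_MV, *ERRORS_MV), V_REF_MV)

# Both paths use identical networks: the common error cancels in the comparison.
matched = comparator(track_and_hold(V_IN_MV, *ERRORS_MV),
                     track_and_hold(V_REF_MV, *ERRORS_MV))
```

Here `unmatched` evaluates to 1 (an error) while `matched` gives the ideal result of 0. Any mismatch between the two networks reappears as a residual offset, which is why the matching quality sets the effectiveness of the technique.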

3.2.2 - DC Reference Generator

![Figure 3.2 System-Level Architecture of On-Chip Digitizer](image)

The DC reference generator used in the waveform digitizer is based on the same principles as the on-chip AWG. As indicated in Figure 3.3, a PDM or Pulse Width Modulated (PWM) bitstream representing the desired DC voltage is loaded onto an on-chip shift register. The bitstream is then cycled around the shift register and filtered with a passive RC filter. The filtering action reconstructs the DC level encoded into the bitstream. The choice of PDM or PWM encoding is based on the settling time requirements of the DC source [25].
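As a rough illustration of how a bitstream encodes a DC level, the sketch below builds a PWM and a PDM pattern with the same number of ones and computes the level an ideal filter would recover. The function names, the first-order accumulator used for the PDM pattern, and the 16-bit register length are assumptions made for illustration; the actual encoder is described in [25].

```python
def pwm_bitstream(ones, length):
    """PWM encoding: all the ones are grouped into a single pulse."""
    return [1] * ones + [0] * (length - ones)


def pdm_bitstream(ones, length):
    """PDM encoding: ones spread as evenly as possible (first-order accumulator)."""
    bits, acc = [], 0
    for _ in range(length):
        acc += ones
        if acc >= length:
            acc -= length
            bits.append(1)
        else:
            bits.append(0)
    return bits


def dc_level(bits, vdd=3.3):
    """Level recovered by an ideal filter: supply scaled by the density of ones."""
    return vdd * sum(bits) / len(bits)


pwm = pwm_bitstream(4, 16)
pdm = pdm_bitstream(4, 16)
levels = (dc_level(pwm), dc_level(pdm))
```

Both patterns encode the same average value; they differ only in how the ones are distributed, which is what influences ripple and settling behaviour after the RC filter.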

3.2.3 - Comparator

The comparator used in the digitizer, presented in Figure 3.4, is based on a two-stage design with a latch [26]. The first stage is a pre-amplifier that subtracts the two input voltages and provides a slight amplification of their difference. The second stage provides the majority of the required gain through a clocked gain stage.

Figure 3.4 Comparator used in waveform Digitizer
3.2.4 - T/H and Buffer

The track and hold circuit and the associated buffer are presented in Figure 3.5. The T/H consists of a sampling switch \( S_1 \), composed of a CMOS transmission gate, followed by the parallel sampling capacitor \( C_1 \). The purpose of the buffer is to isolate the sampled analog waveform from the loading associated with the input nodes of the comparator, which may affect the bandwidth of the T/H stage. Capacitive loading may also cause the sampled analog value to couple through to some internal nodes of the comparator, causing errors in the decision process. The buffer is modelled on a source-follower amplifier configuration. Transistors M2 and M3, in combination with resistor \( R_{bias} \), provide a biasing current, and also serve as an active load for the buffering transistor M1.

The bandwidth of this sampling stage is given by

\[
f_{3dB} = \frac{1}{2\pi R_{on}(C_1 + C_p)} \quad (3.4)
\]

where \( R_{on} \) represents the on-resistance of switch \( S_1 \), \( C_p \) represents the total parasitic capacitance looking into the gate of M1, and \( f_{3dB} \) represents the 3 dB bandwidth of the circuit. In the first implementation of the test-core prototype [3], \( C_1 \) was set to 1 pF, which is significantly larger than the parasitic capacitances associated with M1, rendering their bandwidth-limiting effects negligible. The gate-source parasitic of M1 does, however, affect the accuracy of the overall digitization process, because transients on the sampling capacitor may couple through this parasitic to the output of the buffer, which is connected to the input terminal of the comparator.
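The bandwidth expression for the sampling stage can be evaluated directly. In the sketch below only the 1 pF value of \( C_1 \) comes from the text; the 500 Ω on-resistance and the 50 fF parasitic are hypothetical values chosen to show the trend, and `t_h_bandwidth` is an illustrative name.

```python
import math


def t_h_bandwidth(r_on, c_1, c_p=0.0):
    """3 dB bandwidth (Hz) of an RC sampling stage: 1 / (2*pi*R_on*(C_1 + C_p))."""
    return 1.0 / (2.0 * math.pi * r_on * (c_1 + c_p))


# Assumed on-resistance of 500 ohms with the 1 pF sampling capacitor.
f_no_parasitic = t_h_bandwidth(500.0, 1e-12)
# Adding an assumed 50 fF parasitic lowers the bandwidth only slightly.
f_with_parasitic = t_h_bandwidth(500.0, 1e-12, 50e-15)
```

Halving \( R_{on} \) or the total capacitance doubles the bandwidth, which is why the switch sizing and the choice of \( C_1 \) dominate the front-end speed.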

Given the simplicity of the sampler, few design variables have a significant role in determining its performance. The sizing of the transmission gate \( S_1 \) directly affects \( R_{on} \), and thus the bandwidth of the T/H stage. The size of \( C_1 \) also directly affects the front-end bandwidth as indicated by (3.4), as well as the resolution of the sampler: the thermal noise contributions of \( S_1 \) that are sampled onto \( C_1 \) (commonly referred to as \( kT/C \) noise) are inversely proportional to the size of the sampling capacitor. The larger the value of \( C_1 \), the lower the front-end bandwidth of the sampler, but the smaller the noise contribution from the sampling switch. The converse is true for smaller capacitances, illustrating the trade-off that exists between front-end bandwidth and resolution in the chosen topology. Given the size of the sampling capacitor in the initial prototype, bandwidth was sacrificed to provide adequate noise performance.

Another issue that limits the performance of the entire conversion operation is the maximum input voltage swing of the buffer / comparator combination. The buffer performs an inherent level shift that prevents input voltages approaching the positive rail from being accurately represented. This misrepresentation can be directly interpreted as a decrease in dynamic range. The limitation arises because the level shift decreases the available headroom: the load transistor \(M2\) will turn off for voltages that approach mid-rail. Rail-to-rail sampling is an essential feature for applications such as the testing of data-communication products, which utilize high-speed digital signals whose voltage levels span the entire supply range. The modified version of the waveform digitizer, presented in the following section, employs an alternate sampling mechanism that attempts to combat the issue of decreased dynamic range.

![Buffer Diagram](image-url)

**Figure 3.5** Track and Hold and Buffer
3.2.5 - Modifications to the Waveform Digitizer

Figure 3.6 presents the modified waveform digitizer architecture. The changes to the sampling network include modifying the architecture from a T/H to a Sample and Hold (S/H). The purpose of the additional sampling capacitor is to hold the sample point provided by the input T/H sampler, and to divide the input voltage before it is transferred to the buffer by a factor equivalent to

\[
\frac{C_1}{C_1 + C_2} \quad (3.5)
\]
due to charge sharing on the \( \Phi_1 \) clock phase. This voltage division allows the input voltage to be scaled into the range for which the buffer operates. As such, the integrity of the sampled waveform's shape may be maintained over the entire supply voltage range. Note that the sampling networks on both nodes of the comparator in the digitizer are identical (Figure 3.2), and as such both the input analog signal and the DC reference voltage are scaled by identical factors.
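The division factor follows from charge conservation when the two capacitors are connected. The sketch below assumes that \( C_2 \) holds no charge before the transfer (an assumption implied by the stated factor but not spelled out in the text); `charge_share` is a hypothetical helper name.

```python
def charge_share(v_in, c_1, c_2, v_c2_initial=0.0):
    """Voltage on C1 and C2 after they are connected, from charge conservation."""
    total_charge = c_1 * v_in + c_2 * v_c2_initial
    return total_charge / (c_1 + c_2)


# Equal capacitors halve the sampled voltage: a 3.3 V sample becomes about 1.65 V.
v_scaled = charge_share(3.3, 300e-15, 300e-15)
```

If \( C_2 \) were not discharged between samples, the second term would carry a residue of the previous sample into the new one, so the simple ratio holds only under the stated assumption.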

The upwards level shift provided by the buffer can be altered by sizing transistors M2 and M3, and selecting an appropriate value for \( R_{bias} \). Although the level shift decreases the dynamic range of the system as previously discussed, it is still required, since the comparator presented in Figure 3.4 also has a limited input voltage range. This input range is restricted on the lower end of the supply range due to the biasing requirements of the current mirror in the pre-amplification stage. Thus the purpose of the modified sampler / buffer becomes two-fold: to divide the input voltage by a factor that scales it into the range of the buffer, and to level shift the scaled voltage into the operating range of the comparator.

Another advantage of the sample and hold architecture is the further isolation of the input signal from the output of the buffer, due to the additional processing stages. This pipelined architecture has the effect of reducing the capacitive coupling between the input to the sampler and the output of the buffer. There is, however, a latency of half a clock cycle between when the sampling action is initiated and when the sample is expressed at the output of the buffer. This delay can easily be accounted for in post-processing.

The bandwidth of the sampler was also deemed to be an issue. Designing $S_1$ to be large to decrease its on-resistance will limit the resolution of the system due to secondary effects such as charge injection and distortion. The capacitive loading of a large switch will also limit the front-end bandwidth of the sampler. Making $C_1$ small would help increase the front-end bandwidth, but the contribution of sampled thermal noise from $S_1$ would compromise the resolution of the system. Making $C_1$ much smaller than $C_2$ would also decrease the voltage scaling factor provided by the S/H, producing signal levels that may approach the noise floor. Given that the purpose of the proposed undersampling algorithm was to demonstrate that the existing test-core prototype could capture high-frequency signals by the addition of a specialized timing module, design trade-offs were chosen in favour of bandwidth as opposed to resolution, in order to help demonstrate the functionality of the proposed extension.

The switches used in the implementation of the digitizer were chosen to be of moderate size in order to provide the necessary drive, while simultaneously having a negligible effect on the bandwidth of the signal path from the CUT to the sampler. The input capacitance of the sampler was estimated to be approximately 50 fF in the sampling phase. The two sampling capacitors were chosen to be on the order of 300 fF, resulting in a simulated input bandwidth of 3.2 GHz. Previous literature reports the use of other parallel samplers that successfully utilize similar [5] or smaller [19] values of capacitance. The \( kT/C \) noise contribution of the 300 fF sampling capacitors, given the 3.3 V supply of the technology in which the prototype is implemented, does not have a marked effect on the target 8-10 bit resolution of the system. Note that by having \( C_1 \) and \( C_2 \) set to equal values, the sampled voltage is scaled by a factor of a half before the level shift / buffering / comparison operation.
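The claim about \( kT/C \) noise can be checked with a back-of-the-envelope calculation. The sketch below assumes a temperature of 300 K, a single sampling capacitor, and a full-scale range of half the 3.3 V supply (reflecting the scaling by one half); these are simplifying assumptions, and the function names are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_rms(c, temperature=300.0):
    """RMS thermal noise voltage (V) sampled onto a capacitor c (F)."""
    return math.sqrt(BOLTZMANN * temperature / c)


def lsb(full_scale, bits):
    """Size of one least significant bit (V) for a given full-scale range."""
    return full_scale / (2 ** bits)


noise = ktc_noise_rms(300e-15)     # roughly 0.12 mV rms
lsb_10_bit = lsb(3.3 / 2.0, 10)    # roughly 1.6 mV
```

Under these assumptions the sampled noise is more than an order of magnitude below one LSB at 10 bits, which is consistent with the statement that it has no marked effect on the target resolution.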

Other limits on the performance of the proposed sampler include its susceptibility to charge injection errors when the sampled analog value and the DC reference are similar in value. The proposed architecture employs no charge injection cancellation scheme, rendering the system more vulnerable to charge injection errors in certain situations. Assuming that the sampling / buffering networks on both nodes of the comparator are matched, charge injection errors have little effect on the comparator’s decision process when the DC reference and the sampled analog value are dissimilar. This is because the charge injection errors will be of identical polarity and proportional to their associated voltages, leaving the decision relatively unaffected. When the DC reference and the sampled analog value are similar in magnitude, their associated charge injection errors will also be similar in magnitude and identical in polarity, rendering the comparison process more susceptible to noise. Unequal impedances on the input nodes of the sampler (that of the DC source and that of the CUT) will also cause mismatches in the magnitude of the charge injection error introduced to either node.

Note that for the principle demonstrated in Figure 3.2(b) to hold, both sampling networks must be matched to ensure that the errors introduced by each one are identical. Mismatches in the switches or sampling capacitors would further introduce errors, due to the fact that the DC reference and input analog waveform will not experience the same non-linear effects.

Even though the simplicity of the proposed sampler renders it ineffective for high-resolution frequency domain testing applications (e.g. SNR, THD measurements), it still
suffices for applications such as mask testing and curve tracing, where a medium resolution and high sampling speeds are required.

3.3 - Timing Module Design

3.3.1 - DLL Architecture

As previously mentioned, a DLL was deemed to be a suitable candidate for the timing module, given its ease of implementation, tunability, and jitter rejection characteristics. Each of these qualities is desired in the implementation of such a unit to ensure the programmability of the sampling resolution, the accuracy of the obtained results, and to facilitate integration with the test core. A system level schematic of the DLL based timing module is presented in Figure 3.7.

The timing module consists of five main components: i) A voltage controlled delay line (VCDL), ii) a phase detector, iii) a charge pump, iv) a loop filter, and v) a phase selection block. The VCDL consists of a series of identical digital buffers with a variable delay that is controlled by a control voltage.

![Figure 3.7 DLL Architecture](image)
The function of this delay line is to delay an input clock by an amount, $T_{\text{DELAY}}$, which is equivalent to

$$T_{\text{DELAY}} = \tau_{\text{buff}}(V_{\text{ctrl}}) \cdot N \quad (3.6)$$

where $\tau_{\text{buff}}(V_{\text{ctrl}})$ represents a single buffer delay as a function of its control voltage, and $N$ represents the number of buffers in the VCDL. The input to the delay line and its output are routed to the phase detector, which arbitrates the arrival times of the $C_{\text{in}}$ and $C_{\text{out}}$ clocks and produces two signals, UP and DN, that indicate the phase relationship between these two clocks. The UP and DN signals control a charge pump which either pushes current onto, or pulls it off, an integrating capacitor that serves as the loop filter. The purpose of this filter is to produce a DC voltage that is proportional to the phase error between the $C_{\text{in}}$ and $C_{\text{out}}$ clocks. This DC voltage, denoted as $V_{\text{ctrl}}$, adjusts the delay of the VCDL, and the phase error between $C_{\text{in}}$ and $C_{\text{out}}$ is re-evaluated. The feedback loop continuously adjusts the value of $V_{\text{ctrl}}$ until the phase error between the two clocks has been minimized. The phase error will be at its minimum value when the delay across the delay line is equal to one period of the input clock, $T_{\text{in}}$, implying that the $C_{\text{in}}$ and $C_{\text{out}}$ clocks appear to be in phase. Once this condition has been met, the delay per buffer will be

$$\tau_{\text{buff}}(V_{\text{ctrl}}) = \frac{T_{\text{DELAY}}}{N} = \frac{T_{\text{in}}}{N} \quad (3.7)$$

Note that the phase detector may also be designed to stabilize the control loop when the phase shift between the $C_{\text{in}}$ and $C_{\text{out}}$ clocks is $180^\circ$, in which case the delay per buffer will be half of what (3.7) predicts. The two buffers depicted in Figure 3.7 that produce the $C_{\text{in}}$ and $C_{\text{out}}$ clocks add an additional delay that does not affect the accuracy of the phase error measurement, since it is assumed that both of these buffers are identical and delay both clocks equally.
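The locking behaviour described above can be sketched with a simple discrete-time behavioural model. This is not the thesis circuit: the linear delay-versus-voltage model, its coefficients, and the lumped `loop_gain` (which stands in for the charge pump and loop filter) are all hypothetical values chosen so that the loop converges.

```python
def buffer_delay(v_ctrl, tau_0=50e-12, k_delay=100e-12):
    """Hypothetical linear model: delay per buffer (s) rises with V_ctrl (V)."""
    return tau_0 + k_delay * v_ctrl


def lock_dll(t_in, n_buffers, lock_fraction=1.0, v_ctrl=1.0,
             loop_gain=2e8, cycles=2000):
    """Integrate the phase error once per cycle; return (V_ctrl, delay per buffer)."""
    target = lock_fraction * t_in                   # 1.0: full period, 0.5: 180 degrees
    for _ in range(cycles):
        t_delay = n_buffers * buffer_delay(v_ctrl)  # total delay of the line, as in (3.6)
        v_ctrl += loop_gain * (target - t_delay)    # charge pump and loop filter, lumped
    return v_ctrl, buffer_delay(v_ctrl)


# A 2 ns input period and 10 buffers settle to 200 ps per buffer, as in (3.7).
v_full, tau_full = lock_dll(2e-9, 10)
# Locking at 180 degrees gives half of that delay per buffer.
v_half, tau_half = lock_dll(2e-9, 10, lock_fraction=0.5)
```

The loop only needs the delay to change monotonically with the control voltage; the linear model is used here purely to keep the sketch short.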

The final component of the timing module is the Phase Selection block. The purpose of this block is to tap the outputs of each buffer in the VCDL, and route the appropriate phase-delayed clock signal to the digitizer. This unit can consist of a series of multiplexers [27], or a series of phase interpolators that can further subdivide the phases from the VCDL and produce higher effective sampling rates. Note that for the architecture presented in Figure 3.7, the first tap output will be delayed by $\Delta t_{\text{shift}}$, implying that the sampling clock must eventually be delayed by $T$, as opposed to $T - \Delta t_{\text{shift}}$. In addition, the phase selection block adds a latency to the final output clock that is routed to the digitizer, implying that the captured sample points will be offset from the edge of the main clock. For the purposes of sampling a periodic input waveform, this latency is not problematic: as long as the sampling clock is eventually delayed by $T$, the entire waveform will be captured. For applications where this latency is problematic, it may be eliminated by employing the architecture suggested in Figure 3.8.

In this modified architecture, an additional VCDL (denoted as $\text{VCDL}_{\text{CLK}}$) is placed in the path of the input clock. The purpose of this additional delay is to offset the latency caused by the phase selection block, through an additional tuning mechanism. The output of the phase selection block for the first buffer in the timing module VCDL is routed to a phase detector/charge pump unit, which compares its phase to that of the input clock. If the two clocks are not aligned in phase, the additional control loop will tune the delay across $\text{VCDL}_{\text{CLK}}$ to $T_{in} - T_{\text{LATENCY}}$, where $T_{\text{LATENCY}}$ represents the phase offset caused by the timing module for the first tap output. In this manner the output of the first tap in the phase selection block will be aligned with the input clock.
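The effect of the additional delay line can be checked with simple arithmetic. The sketch below assumes that $T_{\text{LATENCY}}$ is smaller than one clock period; the 350 ps latency is a hypothetical value, and the function names are illustrative.

```python
def vcdl_clk_delay(t_in, t_latency):
    """Delay (s) the extra loop settles to across the clock-path VCDL."""
    return t_in - t_latency


def first_tap_arrival(t_in, t_latency, compensated=True):
    """Time (s) from an input clock edge to the resulting first-tap edge."""
    pre_delay = vcdl_clk_delay(t_in, t_latency) if compensated else 0.0
    return pre_delay + t_latency


# With compensation the first tap edge lands one full period later,
# which coincides with the next edge of the input clock.
aligned = first_tap_arrival(2e-9, 0.35e-9)
# Without compensation it is offset from the clock edge by the latency.
offset = first_tap_arrival(2e-9, 0.35e-9, compensated=False)
```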

The forthcoming sections will provide a detailed circuit level description of the components required to implement the timing module.

![Figure 3.8 Modified Timing Module Architecture with no Output Phase Latency](image-url)
3.3.2 - Self-Biased DLLs

As previously mentioned, designing a DLL that produces a low-jitter clock is essential, such that the accuracy of the overall digitization process is not compromised. A self-biased architecture [29] was deemed to be a suitable implementation, since it provides for high supply / substrate noise immunity, two factors that greatly affect the jitter performance of the DLL.

In a self-biased design, all the bias voltages that its constituent circuits require to operate are derived from each other, eliminating the need for specialized precision current and voltage references. In the case of a self-biased DLL, all the necessary bias voltages are derived from the loop control voltage, \( V_{ctrl} \). The advantage of employing such a biasing technique is that the performance of the DLL becomes less dependent on supply voltage. DLLs are particularly sensitive to supply voltage fluctuations, since these affect the delay of the buffers in the VCDL. The self-biased DLL presented in [30] also has an added advantage, whereby the mechanism that produces the bias voltages can also compensate for high frequency noise generated by the operation of the DLL itself. This added functionality allows the designer to set the loop bandwidth of the DLL as low as possible to reject phase noise from the input clock, yet at the same time compensate for the noise of the DLL.

In most other DLL topologies, there exists a trade-off between rejecting phase noise generated from the input clock, and minimizing the effects of substrate and supply noise generated by the operation of the DLL. To reject input clock phase noise, the bandwidth of the DLL must be low. A low bandwidth ensures that the reaction time of the loop will be slow, so that high frequency phase variations on the input clock are not expressed on the output clock. On the other hand, any phase noise generated by the operation of the DLL will be related to the input clock frequency, and as such will have high frequency components. In order for the DLL to track these high frequency noise components and ensure that the input clock is still well represented through the output clock, the loop bandwidth of the DLL must be high. An independent mechanism that compensates for this high frequency noise gives designers the added flexibility of designing the loop bandwidth to be as low as possible. An architectural view of the self-biased DLL is presented in Figure 3.9.

The major difference between this architecture and conventional DLLs is that there is an extra block present between the output of the loop filter and the VCDL. This additional block is the “self-biasing” cell that derives two voltages, \( V_p \) and \( V_n \), from the loop filter control voltage \( V_{ctrl} \). The \( V_p \) control voltage controls the delay of a single buffer in the VCDL, whereas the \( V_n \) control voltage provides dynamic biasing for the current in a single delay cell, that compensates for high frequency noise. A more detailed explanation of the components of the self-biased DLL is presented in the following sections.

3.3.3 - The Delay Cell

The fully-differential delay cell used in the VCDL is presented in Figure 3.10. The delay is controlled by altering the resistance of the two diode-connected symmetric loads [31], which are composed of transistors M4 & M5 and M6 & M7. Varying \( V_p \) changes the operating point of the symmetric loads, and thus their equivalent resistance. The lower the value of \( V_p \), the more drive the PMOS transistors in the symmetric loads will have, and thus the lower the resistance, leading to a shorter buffer delay. Conversely, the higher the value of \( V_p \), the less drive the PMOS transistors will have, leading to a longer buffer delay.

![Figure 3.9 Self-Biased DLL Architecture](image)

A characteristic of symmetric loads that gives them sufficient power supply noise rejection properties is that the resistance of the load, to a first order approximation, can be assumed to be linear. It is desirable to have linear resistive loads in delay cells, because they allow for a first order differential cancellation of noise. The graph presented in Figure 3.11 depicts the current through the symmetric load, $I_{load}$, as a function of the voltage drop across it, $V_{load}$. Note the symmetry in the curve about $V_{bias}/2$. The resistance of the symmetric load under these bias conditions may be approximated by joining the two extremities of the curve with a straight line. This line represents a first order model of the load’s resistance, which is approximately equivalent to the inverse of the transconductance of one of the PMOS transistors [29]. Thus, the symmetric load can be assumed to provide some degree of power supply noise cancellation through a first order model approximation of the resistance of the load. Note also that the lower swing limit of the load, which is equivalent to the lower swing limit of the buffer, is set by the PMOS control voltage, effectively isolating the signals from the lower supply. This provides for an extra degree of noise immunity, since the signals propagating through the buffer will be relatively unaffected by noise on the lower supply (such as ground bounce). The mechanism that forces the lower swing to the value of $V_p$ is incorporated into the bias cell, and will be discussed in a forthcoming section.

![Figure 3.10 Self-Biased Delay Cell](image)
Transistors M2 and M3 in the delay cell serve as the positive and negative signal input transistors, respectively. Transistor M1 provides the nominal current through the delay cell, which is set by the $V_n$ bias voltage. The current source in the voltage controlled buffer should have a high output resistance, such that any voltage fluctuations due to switching signals or noise at the output of this current source do not change the value of the current produced. If this current varies, then the delay of the cell will also vary, and will be expressed on the output clock as jitter. Cascoding techniques are usually used in the implementation of current sources for delay cells, to ensure that they have a high output resistance so that the value of the bias current, and thus the delay of the cell, is relatively unaffected by noise. In this case, however, the delay cell current source is a single transistor that, on its own, does not provide an output impedance equivalent to that of a cascode configuration. Through the use of the biasing cell, the value of $V_n$ is dynamically altered to compensate for noise and provide an output impedance equivalent to that of a cascode configuration [29].

A simplified schematic of the biasing cell that provides the aforementioned noise rejection properties and derives $V_p$ and $V_n$ from $V_{ctrl}$ is presented in Figure 3.12. The bias cell consists of a half-buffer replica, and an op-amp that implements a feedback mechanism. The feedback mechanism forces the output of the half-buffer replica, $V_p$, to be equal to the loop control voltage, $V_{ctrl}$, by programming the value of $V_n$. Note that this half-buffer replica also represents a single stage inverting amplifier, with the input transistor corresponding to M1, and the output to $V_p$. Since the gate of M1 is tied to $VDD$, which is the highest possible input voltage the buffer will experience, then $V_p$ will be programmed to its lowest possible value. Given that the feedback mechanism will force $V_p$
to $V_{ctrl}$, the setup of the half-buffer replica will ensure that the lower swing limit of the cell is set to $V_{ctrl}$. This effectively detaches the buffer signals from the lower supply, ensuring that the linear load approximation previously discussed holds.

The tracking capabilities of the feedback loop also ensure that as the values of $V_p$ and $V_n$ fluctuate due to noise on the supply or substrate noise, they will be forced back to their respective values that are representative of $V_{ctrl}$. This dynamic noise compensation mechanism ensures that the loop control voltage is accurately represented in each of the delay cells. The feedback loop also effectively increases the output resistance of the current source transistor, M2, since the value of $V_n$ will be constantly adjusted to ensure that it produces a constant current. Since the bias cell must be able to track and compensate for supply and substrate variations, which will be correlated to the operating frequency of the DLL, the bandwidth of the bias cell must be set to the operating frequency. Since a mechanism that compensates for high frequency noise in the DLL exists, the loop bandwidth can be set as low as necessary to minimize the effects of input clock phase noise. The high frequency noise related to the operation of the DLL will be compensated for by the bias cell. As such, the half-buffer replica must be closely matched to the delay cells, to ensure the proper programming of $V_p$ and $V_n$ and adequate noise tracking. Note that the op-amp itself is self-biased, and its bias voltage is derived from the $V_n$ control voltage.

A more detailed and complete schematic of the DLL bias cell is presented in Figure 3.13. Note the addition of a buffer and start-up circuitry. The $V_p$ output of the programmed half-buffer replica is not directly routed to the delay cells, to ensure that the loop control voltage is unaffected by switching signals that may couple from the delay cells to the $V_p$ node, and then to $V_{ctrl}$. Instead, a buffered version of $V_p$, denoted as $V_{p\_buf}$, is produced by an additional half-buffer replica that is programmed with the determined value of $V_n$, and is routed to the delay cells.
Since the bias of the op-amp current source is determined by its output, a start-up circuit is necessary to ensure that the bias cell does not latch into a state where the output is 0 V. A start-up circuit consisting of a diode connected transistor that generates a voltage slightly greater than the threshold voltage for an NMOS through a biasing resistor, is attached to the bias point of the op-amp, or $V_n$, through an NMOS switch controlled by a reset signal, denoted as RST. During start-up, the RST signal is held high for a short period of time, forcing the bias point of the op-amp into a region where its devices are active. The RST signal is then released, allowing the bias circuit to settle to its natural active mode of operation.

Note that the dimensions of the op-amp transistors are similar to those in the buffers, to ensure that they operate with similar current and voltage levels. Since the bias conditions of the op-amp vary with $V_n$, the bias point of the op-amp will also vary accordingly to track the bias conditions of the delay buffers. The bandwidth of the op-amp is also varied due to the changing operating point. Its bandwidth is increased when the loop locks onto high frequencies, and decreased for lower frequencies. Thus the bias cell possesses an inherent bandwidth adjustment mechanism that varies directly with the operating frequency of the DLL to ensure adequate noise tracking.

As with all biasing circuits that employ feedback mechanisms, stability is an issue. The dominant pole in the DLL bias cell is present at the output of the op-amp, or at $V_n$. As such, compensation may be performed by the capacitive loading (denoted as $C_C$) of the delay cells attached to the bias cell. However, the capacitive loading of the delay cells may decrease the bandwidth of the bias cell to a point where the noise compensation mechanism cannot track high frequency fluctuations. An adequate number of delay cells must therefore be attached to the bias cell to ensure its stability, but attaching too many delay cells may affect the performance of the DLL.

The preceding discussion highlights the major advantages that this architecture provides over conventional DLLs in terms of noise suppression: i) The symmetric PMOS loads used in the delay cells approximate a linear load, and thus provide for a certain degree of power supply noise rejection [32]. ii) The lower swing limit of the delay buffers is set by the loop control voltage and is referenced to $VDD$, thus isolating the signals from the lower supply. iii) The self-biasing cell attempts to compensate for high frequency supply and substrate noise that is correlated with the operating frequency of the DLL. As such, the loop bandwidth can be set as low as necessary to reject phase noise on
the input clock, leaving the task of high frequency noise rejection to the bias cell. Given that supply and substrate noise are the major causes of output jitter in the DLL, the incorporation of the aforementioned characteristics provides for several mechanisms whereby the effects of noise on the performance of the DLL may be minimized.

Figure 3.14 presents the simulated voltage transfer characteristics of the bias cell used in the implementation of the DLL, for the $V_n$ and $V_{p_{buff}}$ voltages. The $V_{p_{buff}}$ voltage tracks $V_{ctrl}$ for input voltages in the range of 0.8 V to 3.0 V. Note how $V_n$ decreases with increasing $V_{ctrl}$, as per the effects of the negative feedback loop.

![Figure 3.14 Simulated DLL Bias Cell Characteristics](image)

3.3.4 - The Phase Detector

The 180° phase detector [30] used in the implementation of the DLL is presented in Figure 3.15. The phase detector consists of an SR-latch, augmented with two pulse generators. The pulse generators produce negative pulses on the rising edge of their inputs. This polarity reversal accounts for the fact that the SR-latch accepts complementary inputs.

The SR-latch implementation of the phase detector locks onto a 180° phase difference: when the $C_{0}$ and $C_{\pi}$ clocks are 180° out of phase, the DN and UP signals are asserted for equal periods of time, implying that no net charge will be transferred to or from the loop filter and the control voltage will remain constant (on average). Augmenting the phase detector with pulse generators that trigger a fixed width pulse only on the rising edge of their inputs ensures that duty-cycle imperfections in the input clocks do not affect the lock process. Ordinarily, the $C_{0}$ and $C_{\pi}$ clocks would be routed directly to the latch's S and R inputs. Since the latch reacts to falling edges, if the duty cycles of the two clocks differ, the loop will not lock onto the desired 180° phase shift, because their falling edges will not occur at the correct instants of time. The two pulse generators, however, produce pulses of identical length despite any imperfections in the duty cycle of the input clocks.

The operation of the phase detector, based on simulation results, when the two input clocks are not 180° out of phase is depicted in Figure 3.16. In this situation, the $C_{\pi}$ clock lags the $C_{0}$ clock by 72°. On the rising edge of the $C_{0}$ clock, a falling pulse is propagated to the S input of the latch, setting the DN signal high and the UP signal low. On the rising edge of the $C_{\pi}$ clock, a falling pulse is propagated to the R input of the latch, setting the DN signal low and the UP signal high. Since the rising edge of the $C_{\pi}$ clock occurs shortly after that of the $C_{0}$ clock, the UP signal will be held high for a longer period of time than the DN signal. This implies that the net charge on the loop filter capacitor will increase with time, so that $V_{ctrl}$ and $V_{p}$ will increase, causing the delay of the cells to be greater. In turn, the $C_{\pi}$ clock will be delayed by a longer amount, allowing the loop to gradually approach the 180° lock condition. In the case where the $C_{\pi}$ clock lags the $C_{0}$ clock by 180° (Figure 3.17), the UP and DN signals are asserted for equal periods of time, and no net charge is transferred to or from the loop filter.
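The balance between the UP and DN signals can be expressed as a simple duty-cycle calculation. The sketch below is an idealized model that ignores the finite width of the pulses and the latch delays; the 10 µA charge pump current and 2 ns period are hypothetical values, and a positive result is taken to mean charge added to the loop filter.

```python
def pd_duty(phase_lag_deg):
    """Fractions of a period for which DN and UP are high, for a lag of 0 to 360 degrees."""
    dn = (phase_lag_deg % 360.0) / 360.0   # DN high from the C_0 edge to the C_pi edge
    up = 1.0 - dn                          # UP high for the remainder of the period
    return dn, up


def net_charge_per_cycle(phase_lag_deg, i_cp, t_in):
    """Net charge (C) delivered to the loop filter in one period; positive raises V_ctrl."""
    dn, up = pd_duty(phase_lag_deg)
    return i_cp * t_in * (up - dn)


# At a 72 degree lag UP dominates, so the control voltage (and the delay) rises.
q_unlocked = net_charge_per_cycle(72.0, 10e-6, 2e-9)
# At 180 degrees the two signals balance and no net charge is transferred.
q_locked = net_charge_per_cycle(180.0, 10e-6, 2e-9)
```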

![Figure 3.16 Simulated Phase Detector Operation for Un-Locked Mode](image)

![Figure 3.17 Simulated Phase Detector Operation for Locked Mode](image)
3.3.5 - The Charge Pump & Loop Filter

The charge pump and loop filter utilized in the implementation of the DLL are presented in Figure 3.19. The charge pump consists of a current steering structure based on half-buffer replicas that either push or pull a current onto or off of the integrating loop filter capacitor, \( C_{CP} \). The charge pump current, \( I_{CP} \), is determined by the transistors controlled by the \( V_{n} \) bias voltage derived from the DLL's bias cell. These transistors are usually \( \alpha \) times larger than the \( V_{n} \) controlled transistors in the delay cells. The loop bandwidth, \( f_{BW} \), to operating frequency, \( f_{in} \), ratio for this DLL architecture is constant, and is given by [29]

\[
\frac{f_{BW}}{f_{in}} = \frac{\alpha}{4\pi} \frac{C_{load}}{C_{CP}} \tag{3.8}
\]

where \( C_{load} \) represents the total load capacitance of all the delay cells. Equation (3.8) illustrates another advantage of the self-biased architecture. The ratio of loop bandwidth to operating frequency is constant for all values of \( f_{in} \), and is predominantly determined by \( \alpha \) and the ratio \( C_{load}/C_{CP} \). Note that \( \alpha \) is dependent on the ratio of transistors that can be easily matched, and that \( C_{load} \) represents device parasitics that can also be matched to \( C_{CP} \). As such, the ratio of loop bandwidth to operating frequency becomes a function of device matching, implying that it can be reliably predicted over various operating conditions and process corners.

Equation (3.8) also highlights the relationship between the loop bandwidth and the values of \( I_{CP} \) and \( C_{CP} \). The greater \( \alpha \) is, the greater \( I_{CP} \) will be, implying that \( C_{CP} \) is charged or discharged by the charge pump at a faster rate, which indicates a higher loop bandwidth. Conversely, the greater \( C_{CP} \) is, the more slowly it charges or discharges, implying a lower loop bandwidth.
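As a quick numerical illustration of Equation (3.8), the sketch below evaluates the bandwidth ratio for a few parameter choices. The component values are placeholders chosen only for illustration; they are not the values used in the fabricated DLL.

```python
import math


def bandwidth_ratio(alpha, c_load, c_cp):
    """Loop bandwidth to operating frequency ratio, per Equation (3.8)."""
    return alpha / (4.0 * math.pi) * c_load / c_cp


# Hypothetical values: alpha = 1, C_load = 1 pF, C_CP = 10 pF.
base = bandwidth_ratio(1.0, 1e-12, 10e-12)
doubled_alpha = bandwidth_ratio(2.0, 1e-12, 10e-12)
doubled_c_cp = bandwidth_ratio(1.0, 1e-12, 20e-12)

print(f"base ratio     : {base:.5f}")
print(f"alpha doubled  : {doubled_alpha:.5f}")
print(f"C_CP doubled   : {doubled_c_cp:.5f}")
```

Doubling \( \alpha \) doubles the ratio and doubling \( C_{CP} \) halves it, independent of \( f_{in} \).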

The UP and DN halves of the charge pump are both based on half-buffer replicas to ensure that they both source / sink an equal amount of current. This measure ensures that the charge pump has no systematic offset. Figure 3.18(a) displays a simplified schematic of the charge pump when the UP signal is asserted. The charge pump current, \( I_{CP} \), is steered to the left half of the circuit, whereby it is mirrored to the node connected to \( C_{CP} \). Since the DN signal is low, the right hand side of the charge pump connected to the loop filter is effectively disconnected from the output node, and the entirety of \( I_{CP} \) is steered into the capacitor, causing \( V_{ctrl} \) to increase. Conversely, in Figure 3.18(b), when the DN signal is asserted, the entirety of \( I_{CP} \) is steered off the capacitor, causing \( V_{ctrl} \) to decrease.

Note that the circuitry associated with the UP and DN signals presented in Figure 3.19 is required to ensure that biasing conditions similar to those in the bias cell are maintained in the half-buffer replicas sourcing/sinking current, to ensure that $I_{CP}$ is properly scaled by $\alpha$, and that the UP and DN currents are identical.

The change in the loop control voltage is given by

$$\Delta V_{ctrl} = \pm \frac{I_{CP}}{C_{CP}} \theta_e \tag{3.9}$$

where $\theta_e$ represents the amount of time the charge pump is activated. When the UP signal is asserted, $\Delta V_{ctrl}$ is positive and $\theta_e$ represents the difference in time between the rising edge of the $C_{\pi}$ clock and the next rising edge of the $C_0$ clock. When the DN signal is asserted, $\Delta V_{ctrl}$ is negative and $\theta_e$ represents the difference in time between the rising edge of the $C_0$ clock, and the following rising edge of the $C_{\pi}$ clock.
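Equation (3.9) can be applied to one full input period to estimate the net change in $V_{ctrl}$. The sketch below does this for an ideal charge pump with equal UP and DN currents; the values of $I_{CP}$ and $C_{CP}$ are hypothetical placeholders, not the design values.

```python
def delta_v_ctrl(i_cp, c_cp, theta_e, up):
    """Change in V_ctrl per Equation (3.9); positive for UP, negative for DN."""
    sign = 1.0 if up else -1.0
    return sign * i_cp / c_cp * theta_e


def net_change_per_period(i_cp, c_cp, phase_lag_deg, f_in):
    """Net V_ctrl change over one period for a given C_pi lag (ideal pump)."""
    period = 1.0 / f_in
    t_dn = period * phase_lag_deg / 360.0  # C0 rising edge to C_pi rising edge
    t_up = period - t_dn                   # C_pi rising edge to next C0 edge
    return (delta_v_ctrl(i_cp, c_cp, t_up, up=True)
            + delta_v_ctrl(i_cp, c_cp, t_dn, up=False))


# Hypothetical values: I_CP = 10 uA, C_CP = 10 pF, 25 MHz input clock.
for lag in (72.0, 180.0):
    dv = net_change_per_period(10e-6, 10e-12, lag, 25e6)
    print(f"lag = {lag:5.1f} deg: net change = {dv * 1e3:+.3f} mV per period")
```

With these placeholder values the 72° case gives a positive net step each period, and the 180° case gives zero, matching the lock condition.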

**Figure 3.18 Operation of Charge Pump for the UP and DN signals**
Figure 3.19 DLL Charge Pump

Figure 3.20 represents a transistor level simulation of the charge pump, where the $C_\pi$ clock lags the $C_0$ clock by 72° (note that the value of $C_{CP}$ was decreased for illustrative purposes). Since the UP signal is always asserted for a longer duration than the DN signal, the net change in $V_{ctrl}$ is positive. This result agrees with the reasoning outlined in the discussion of the phase detector. Note that the slopes of the charge / discharge phases are identical, indicating that the UP and DN currents are identical. In the implementation of the DLL, the loop filter is also discharged on start-up to alleviate false-lock (i.e. locking onto multiples of the input period) conditions.

Figure 3.20 Simulated Behaviour of Charge Pump
3.3.6 - Phase Selection Circuitry

The phase selection block depicted in Figure 3.7 consists of a multiplexing block that selects the appropriate buffer output and routes it to the digitizer. As previously mentioned, a phase interpolator may also be used to select the appropriate buffer output, and further subdivide phases between adjacent buffers. The phase interpolation process though, possesses an inherent non-linearity that renders it difficult to produce systematic phase offsets. As such, it was deemed that implementing the phase selection block with multiplexers would be more suited for the required application, since undesired phase offsets would be caused solely by device mismatches, as opposed to being an inherent characteristic of the device as with the phase interpolator. A disadvantage of using multiplexers though, is that they occupy a larger area and consume more power.

A block level diagram for the phase selection block of a 32-tap delay line is presented in Figure 3.21. Each buffer output from the delay line is routed to an identical buffer, whose purpose is to ensure that the VCDL experiences the same loading conditions regardless of which tap is selected. Note that this extra level of buffering is also composed of self-biased delay cells. The use of dummy structures is also required to ensure that every buffer in the delay line experiences the same loading. For instance, a dummy buffer must be added after the last buffer in the VCDL to ensure that it is loaded by the inputs of two buffers, as with all the other buffers in the VCDL.

Two levels of 4-to-1 multiplexing are required in order to decompose the outputs of the 32 taps to two differential outputs. This pair of buffer outputs is then routed to a 2-to-1 multiplexer that provides the final phase selection within the 180° phase shift provided by the VCDL. The single differential output of this multiplexer is then routed to another 2-to-1 multiplexer, whereby for one input the signal’s polarity is maintained, and for the other input, the positive and negative terminal interconnects are swapped to provide a signal inversion. This final multiplexer provides a 180° inversion to the selected tap to provide the remainder of the 360° of phase delay required for the entire sampling operation. Using a 2-to-1 differential multiplexer as opposed to a buffer to perform the inversion operation ensures that the tap outputs undergo identical delays in the inverted mode, as with the non-inverted mode. Using an inverter to generate the 180° to 360° phase delayed clocks would induce a static phase offset associated with the inverter’s delay, which would cause errors in the sampling operation.

The transistor level schematic of the 2-to-1 multiplexer [31] is presented in Figure 3.22. The core of the multiplexer is two delay cell input stages and their associated current sources, sharing a single pair of symmetric loads. The sections of the circuit denoted as S0 and S1 serve as switches that activate/deactivate their respective differential pairs. Control signals a0 and a1 are the multiplexer’s select lines, and determine which differential pair is connected to the output nodes. Only one differential pair may be activated at any given moment in time. The input signal associated with the active differential pair will be the one expressed at the output nodes. This multiplexer effectively acts as a delay cell that has multiple inputs, of which only one at a time can be routed to the single pair of symmetric loads.

Using a multiplexer based on the self-biased delay cell ensures that the high-frequency noise tracking capabilities of the core VCDL are emulated in the phase selection circuitry. Note though that the output clock produced by the phase selection block bears no influence on the operation of the control loop, implying that the phase selection circuitry essentially operates in an open-loop configuration. In addition, the final output clock will not be in phase with the input clock, and will express an inherent latency as previously discussed. The dual-loop topology presented in Figure 3.8 can eliminate this phase offset, which is also a function of the operating frequency.

The two control voltages in the multiplexer also ensure that the rise and fall times of the signals from the VCDL are maintained, as is also the case with the buffering between the VCDL and the phase selection block. A 4-to-1 multiplexer may also be constructed from the base configuration presented in Figure 3.22 by adding an additional pair of differential input stages and their associated current sources to the output nodes. Note though that the 4-to-1 multiplexer possesses a lower bandwidth (due to greater capacitive loading at the output nodes caused by the input stages). To ensure uniformity in signal propagation, it is desirable to construct the phase selection block with multiplexers that bear similar characteristics as the VCDL delay cells. As such, it is preferable to utilize the 2-to-1 multiplexers in the phase selection block to ensure an adequate bandwidth in the output channel. On the other hand, utilizing the 4-to-1 multiplexers, while limiting the maximum operating frequency, provides a savings in area (especially for large delay lines), due to the decreased number of multiplexing levels. Given that the application in question is testing, power consumption becomes a secondary issue and area becomes more relevant. Also, since the purpose of the timing module is to allow the use of the existing test core circuitry with a low-speed clock to achieve high effective sampling rates, a high bandwidth is not required in the DLL. As such, 4-to-1 multiplexers were utilized in the design of the phase selection block to minimize area consumption.

In the implementation of the timing module, the tap selection is accomplished through the use of a scan-chain where the output of each flip-flop is connected to the select lines of each multiplexer. The scan-chain is loaded with the appropriate bits that activate the desired signal path through the various levels of multiplexing. The use of a scan-chain also provides an additional savings of space, pin count, and power, as opposed to decoding logic. It also possesses the additional characteristic of being fully testable.
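The mapping from a desired tap to a scan-chain word can be sketched as follows. This is only an illustration of the multiplexing hierarchy described in this section (two levels of 4-to-1, one 2-to-1, and one inverting 2-to-1). The grouping of taps, the sharing of select lines within a level, the one-hot encoding, and the bit ordering are all assumptions, since the thesis does not specify them.

```python
def one_hot(index, width):
    """One-hot select word: exactly one select line is active."""
    return [1 if i == index else 0 for i in range(width)]


def tap_to_selects(tap):
    """Map an effective tap (0..63) to select words for each mux level."""
    if not 0 <= tap < 64:
        raise ValueError("tap must lie in 0..63")
    invert, base = divmod(tap, 32)   # taps 32..63 use the inverted path
    half, rest = divmod(base, 16)    # final 2-to-1 picks a 16-tap half
    l2, l1 = divmod(rest, 4)         # second and first 4-to-1 levels
    return {
        "level1": one_hot(l1, 4),
        "level2": one_hot(l2, 4),
        "final": one_hot(half, 2),
        "invert": one_hot(invert, 2),
    }


def route(selects):
    """Follow the selects through the assumed mux tree; return (tap, inverted)."""
    taps = list(range(32))
    s1 = selects["level1"].index(1)
    stage1 = [taps[4 * i + s1] for i in range(8)]    # eight 4-to-1 muxes
    s2 = selects["level2"].index(1)
    stage2 = [stage1[4 * j + s2] for j in range(2)]  # two 4-to-1 muxes
    chosen = stage2[selects["final"].index(1)]
    return chosen, selects["invert"].index(1) == 1


def scan_chain_bits(selects):
    """Concatenate the select words in an assumed shift order."""
    return (selects["level1"] + selects["level2"]
            + selects["final"] + selects["invert"])


print(scan_chain_bits(tap_to_selects(37)), route(tap_to_selects(37)))
```

The `route` function is only a software check that the select words pick the intended tap; in the circuit the same role is played by the differential multiplexers themselves.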

### 3.3.7 - Additional Circuitry

In addition to the circuits described in the previous sections, supplementary circuits are required to perform single-ended to differential, and differential to single-ended conversion. Figure 3.23(a) displays the input stage of the DLL, which consists of a digital single-ended to differential converter. The input clock is propagated through two signal paths: the first simply buffers the input clock, while the other provides a signal inversion. Both the buffer and the inverter in the main signal path driving the $CLK$ and $\overline{CLK}$ signals are sized such that their delays are approximately equal. The presence of two additional inverters connected back-to-front between the $CLK$ and $\overline{CLK}$ lines ensures that if one signal switches prematurely, the other will be forced into the required complementary state. This additional feedback loop attempts to minimize duty cycle imperfections, and reduce the skew between the $CLK$ and $\overline{CLK}$ signals. Most DLLs are sensitive to duty cycle imperfections and skew in the input clock signals.
Figure 3.21 Block level diagram of phase selection block
At the output of the phase selection block, the differential to single-ended converter presented in [29] is utilized. Its circuit schematic is presented in Figure 3.23(b). This particular implementation of a differential to single-ended converter is designed to provide an output clock with a 50% duty cycle. The two opposite phase input NMOS amplifiers are modelled on the self-biased delay cells. They are biased with the same $V_n$ control voltage as the rest of the buffers in the VCDL to ensure that they receive the correct common mode bias point. The PMOS amplifier performs the final differential to single-ended conversion. An inverting clock driver is added to the output of the PMOS amplifier to provide additional drive.
3.3.8 - Implementation Issues

In the physical implementation of the DLL, the layout of each individual delay cell and the layout of the delay line as a whole have an effect on the linearity of the phase delay from tap to tap. Mismatches in transistor sizes and parasitic / load capacitances lead to deviations in the delay per tap, which directly influence the linearity of obtained measurements. The transistors in the delay cell must be large in size such that the effects of local and global variations in the die can be minimized through the use of appropriate layout techniques. In addition, the delay line must utilize dummy structures, and must be laid out in a straight line to avoid mismatches due to interconnects of unequal length and lithographic gradients. A more detailed examination of the layout techniques required to combat the effects of die non-uniformities is presented in Chapter 4, Section 4.2.2.

3.4 - Experimental Results

The modified digitizer and the timing module were implemented in a 0.35 \( \mu \text{m} \) CMOS process on a single die. The testing was performed on a Teradyne A567 mixed-signal tester platform. The following sections will provide a brief overview of the test setup, experimental results from the digitizer and timing module when operated independently, and results when they are combined to form the entire on-chip capture system.

3.4.1 - Test Setup

A Teradyne A567 mixed-signal tester platform (Figure 3.24) was used to test the fabricated IC. A C-based language is used to program and control the tester's components. The tester incorporates a variety of DC and AWG sources, as well as a digital sub-system. The maximum output frequency of the AWG is 500 kHz, limiting the characterization of the test-core with arbitrary waveforms to signals with a low bandwidth. The digital subsystem though produces pulses at a maximum speed of 25 MHz, allowing for the characterization of the test-core with moderate frequency digital signals serving as inputs. The tester also contains a Timing Measurement System (TMS) that has the capability of measuring propagation delays with a maximum resolution of 78 ps. This subsystem allows for the characterization of the VCDL.

The test head supplies I/O signals to the IC through a universal Device Interface Board (DIB). A PCB was designed to be interfaced and mounted onto the DIB, as shown in Figure 3.25, to facilitate testing. A close-up of the two-layer PCB is presented in Figure 3.26. Power supply regulation and decoupling, as well as careful ground plane placement and signal routing, are required for such an application to minimize the effects of high frequency noise coupling from the DLL to the digitizer. A 15 V supply is used to power two regulators that provide the IC's 3.3 V supply. One regulator is devoted to the DLL, and the other to the digitizer. This supply partitioning prevents noise from the switching activity of the DLL from coupling onto the supply lines of the digitizer. A series of bias resistors is also required to provide the bias voltages for the digitizer.

![Figure 3.24 Teradyne A567 Tester](image)

Figure 3.25 PCB Mounted onto Test Head

Figure 3.26 PCB Used to test Digitizer and DLL
The lower side of the PCB is designated as being the ground plane, and comprises the entire lower half of the board, excluding the area under the IC. This measure prevents noise generated on the ground plane by switching digital signals from coupling into the IC and affecting its performance. The large ground plane also ensures a low resistance path to ground. Both analog and digital ground points were provided for. A series of coaxial connectors are also present, to provide access to the IC's probe points for debugging and characterization purposes. The primary signals used to control the DLL and digitizer are routed from the DIB board to the IC through a series of block connectors located on the lower half of the board.

3.4.2 - On-Chip Digitizer

Figure 3.27 presents a micrograph of the on-chip digitizer. Due to space limitations, the variable DC reference generator was not implemented on-chip, but rather the tester's programmable DC source was used to generate the required voltages. The sampling networks and comparator occupy a total area of 0.045 mm². The switches and capacitors are matched using a common centroid configuration, whereas the comparator is isolated from the sampling networks using a series of guard rings. This measure is meant to minimize the effect of substrate noise caused by the comparator's clock signal.

The digitizer's DC output versus input characteristics are presented in Figure 3.28(a). These results were obtained by digitizing a periodic ramp input signal. A closer examination of the gain of the digitizer, presented in Figure 3.28(b), indicates that a gain of approximately 1 V/V can be guaranteed for input voltages greater than 10% of the
A single-tone test was performed on the digitizer to determine its viability as a tool for performing DSP based frequency domain testing. One run of a 47.3 kHz tone was captured with a 25 MHz sampling clock. The quantization level of the capture operation was set to 1/256 of the 3.3 V supply, implying that a maximum signal-to-noise ratio of 8 bits would be attainable. The signal possessed a DC offset of 1.6 V, and an amplitude of 0.6 V. Figure 3.29(a) presents the captured time domain waveform. Note how its swing ranges from 2.15 V to 1.01 V, which accurately represents the expected waveform.

Figure 3.29(b) presents the PSD of the captured sine wave. For the given test frequency, a Spurious Free Dynamic Range (SFDR) of 59.5 dB is attained. This implies that an amplitude resolution of approximately 9.7 bits is available. Tests with high frequency tones did not produce SFDR measurements that provided for adequate performance. As such, it was determined that the use of the digitizer implemented in this prototype would be more suited to tests that require simple low-accuracy curve tracing functionality. This includes applications such as mask testing or functionality verification, where the time domain representation of data is required. The digitizer however does provide an adequate resolution for low-frequency AC testing.
3.4.3 - Timing Module

A 32-stage DLL based on the components previously described was fabricated. The chip micrograph is presented in Figure 3.30. Given that the phase detector utilized locks onto 180° of phase shift and the phase selection circuitry provides for an optional inversion, this architecture provides for 64 effective taps. The timing module in its entirety, occupies a total area of 0.8 mm².

Figure 3.30 Micrograph of DLL
Using the TMS, the delay versus tap characteristics of the 32-tap VCDL were characterized for various control voltages. A control loop override mechanism was included in the design of the DLL to facilitate the characterization of the VCDL using external control voltages for $V_{ctrl}$. The results are presented in Figure 3.31(a). These measurements are normalized to the lowest attainable delay, implying the smallest attainable delay from the combination of first tap and lowest control voltage, with the addition of the propagation delay of the pads, is subtracted from all the measurements. Since the operation of the undersampling algorithm relies on the placement of sampling edges relative to their previous position, the absolute value of the delay per tap is of little relevance. Most of the desired information can be extracted by examining the change in delay from tap to tap.

Note how the slope of the line representing the delay per tap (which is equivalent to the sampling resolution) decreases as the control voltage is decreased. This decrease in slope represents a higher sampling resolution, and is consistent with the design of the delay cell which provides higher drive with decreasing $V_{ctrl}$. The control voltage sweep was performed in a linear manner, yet the slopes of adjacent lines do not increase in a linear fashion, due to the non-linear characteristics of the delay cell and the VCDL. Note how for the lower control voltages there is little change in the slope of the line. In addition, the non-linearities of the VCDL are evident in the non-uniformity of each line associated with a particular control voltage. Figure 3.31(b) provides a close-up of some of the delay versus tap measurements, to further emphasize the presence of these non-uniformities.

![Figure 3.31 VCDL Delay versus Tap for various Control Voltages](image-url)
A line-of-best-fit algorithm was used on each set of measurements associated with a particular control voltage to arrive at the slope of the line, which can also be interpreted as the delay per buffer, or the sampling resolution of the system. A more common nomenclature for this quantity is the Least Significant Bit (LSB) designation, often associated with data converters. The timing module can be interpreted as being a D/A that converts the digital input word representing the desired tap selection, to a desired phase delay, which is an analog quantity. The resolution of this conversion operation is defined by the average increase in delay per input code word, which is equivalent to an LSB. Note that in the case of the implemented timing module the LSB is a variable quantity, determined by

\[ \text{LSB} = \Delta t_{\text{SHIFT}} = \frac{1}{64 \cdot F_{\text{CLK}}} \tag{3.10} \]

where \( F_{\text{CLK}} \) is the frequency of the DLL's input clock. Conversely, the equivalent sampling resolution can be defined by

\[ F_{\text{SAMP,EQUIV}} = 64 \cdot F_{\text{CLK}} \tag{3.11} \]

The multiplication of the input clock by a factor of 64 is derived from the fact that the DLL provides a phase delay equal to \( 1/64 \) of the input clock period per tap, resulting in an equivalent sampling rate that is 64 times higher than the input clock speed using the undersampling algorithm.
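Equations (3.10) and (3.11) can be evaluated directly for the clock frequencies used in the experiments of this chapter. The short sketch below does so; the function names are illustrative only.

```python
EFFECTIVE_TAPS = 64  # 32 VCDL taps plus the optional 180-degree inversion


def lsb_seconds(f_clk_hz):
    """Delay per tap (LSB), per Equation (3.10)."""
    return 1.0 / (EFFECTIVE_TAPS * f_clk_hz)


def equivalent_sampling_rate_hz(f_clk_hz):
    """Equivalent sampling rate, per Equation (3.11)."""
    return EFFECTIVE_TAPS * f_clk_hz


for f_clk in (10e6, 25e6):
    print(f"F_CLK = {f_clk / 1e6:.0f} MHz: "
          f"LSB = {lsb_seconds(f_clk) * 1e12:.1f} ps, "
          f"F_SAMP,EQUIV = {equivalent_sampling_rate_hz(f_clk) / 1e9:.2f} GHz")
```

For a 25 MHz input clock this gives an LSB of 625 ps and an equivalent sampling rate of 1.6 GHz; for 10 MHz it gives 1.5625 ns and 640 MHz.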

Figure 3.32(a) represents the LSB of the timing module as a function of forced control voltage (i.e. the lock mechanism has been by-passed and the control voltage is derived from an external DC source). The VCDL is operable for control voltages ranging from 0 V to 2.5 V, resulting in a delay per buffer ranging from 180 ps to 24 ns. This range of delays implies that effective sampling resolutions between 41.7 MHz and 5.6 GHz may be achieved without the use of the lock mechanism. Note the non-linear nature of the delay cell, as indicated by the dramatic increase in delay between 2 V and 2.5 V. Figure 3.32(b) provides a close-up of the LSB of the timing module versus the control voltage for the range where the gain of the delay cell is of moderate magnitude. Note how the delay of the cell does not decrease for voltages lower than 0.6 V. The region highlighted in this graph is the optimal region for operating the timing module, since the highest equivalent sampling resolutions may be attained, while maintaining sufficient immunity to noise on the control voltage, due to the low gain of the delay cell. The effects of delay cell gain on immunity to noise on the control voltage are discussed in greater depth in Chapter 4, Section 4.1.2.

From the results obtained above, the required control voltage for a particular operating frequency may be extrapolated. Given that the control voltage for a particular LSB has been determined, using (3.10), an estimate of the associated operating frequency may be inferred. The results of this analysis are presented in Figure 3.33(a). The control voltage varies in a linear manner with operating frequency, as predicted by [29] providing a gain of approximately -18.8 mV/MHz, or -53.16 MHz/V. The negative gain is due to the fact that the delay cell's speed increases with decreasing $V_{ctrl}$.

Using the DLL's lock mechanism, $V_{ctrl}$ was measured for various operating frequencies, under the same conditions as the previous measurements. The results are presented in Figure 3.33(b), superimposed onto the inferred control voltages derived above. The tester’s digital subsystem limited the characterization of the lock mechanism to a maximum operating speed of 25 MHz. Note how the control voltage versus frequency dataset produced by the lock mechanism closely follows that of the forced mode. This likeness implies that the lock mechanism appropriately tunes the VCDL to the desired resolution. A plot comparing the locked and forced LSBs is presented in Figure 3.34. Once again, note the similarities between the two datasets, indicating the functionality of the lock mechanism.

Note that the measurements of $V_{ctrl}$ provided above represent average values, since the DLL's control voltage fluctuates in a triangular waveform-like fashion in its steady state. This is due to the fact that the UP and DN signals from the phase detector are asserted for equal periods of time when the loop is in lock, causing an equal amount of voltage to be charged / discharged from the integrating capacitor for each clock tick. Figure 3.35 presents a graph displaying the transient behaviour of the DLL's control voltage as captured by an HP 54602B Oscilloscope, for various operating frequencies. Note the overdamped response of the system, as expressed by the gradual transient in the initial
As the operating frequency increases, the magnitude of the voltage fluctuations in the final transient decreases. This is due to the fact that the charge pump pushes / pulls current onto / off the integrating capacitor for shorter periods of time at higher frequencies, implying that the control voltage will vary less when the DLL is in lock. Note how the control voltage decreases in a uniform manner for every 5 MHz drop in the operating frequency. This result is consistent with the linear relation between $V_{ctrl}$ and the DLL’s operating frequency, as presented in Figure 3.33(a).

### 3.4.4 - VCDL Calibration Methods

As previously mentioned, given the dependency of the undersampling algorithm on the generation of precise, consistent phase delays in order to guarantee the quality of the captured waveform, matching plays a vital role in the physical implementation of the phase delay generation block. Any device mismatches will cause the delay produced per cell at a particular bias point to vary from tap to tap. These mismatches can occur in the delay cells themselves or in the tap selection circuitry. Calibration methods may be utilized to circumvent these mismatch issues and obtain high linearity measurements. Figure 3.36 represents a system level schematic of the suggested calibration scheme. The voltage controlled delay line is characterized by external measurement equipment, such as the tester's TMS, for a fixed control voltage. An ideal, software-based linear transfer characteristic for the delay line is then developed based on a "best-fit" algorithm, and INL and DNL information are extracted. Using the ATE, a software based optimization algorithm, and an on-chip ΔΣ based DC voltage generator (this can take the form of a digital ΔΣ followed by an RC filter, or the approach described in [25]), the control voltage is modified per tap to produce a transfer characteristic that best approximates the ideal case. The modifications required to the control voltage are then stored in a calibration RAM that can take the form of ATE memory or on-chip memory. When the phase delay generation circuitry is in operation, the control voltage used for each tap is generated based on the value stored in the calibration RAM, ensuring that the most linear measurement possible can be obtained.
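The per-tap search at the heart of such a calibration scheme can be sketched in software. The example below is a simplified illustration and not the optimization algorithm used in this work: the delay model `tap_delay`, its linear voltage dependence, the voltage limits, and the use of bisection are all assumptions, and the ideal characteristic is taken as a straight line through the origin rather than a best-fit line.

```python
def tap_delay(tap, v_ctrl, cell_delays, v_nom=1.0, gain_per_volt=0.5):
    """Hypothetical model: delay from the line input to `tap` at v_ctrl.

    The delay grows with v_ctrl, in keeping with a delay cell that
    speeds up as the control voltage decreases.
    """
    scale = 1.0 + gain_per_volt * (v_ctrl - v_nom)
    return scale * sum(cell_delays[: tap + 1])


def calibrate(cell_delays, lsb, v_lo=0.5, v_hi=1.5, iterations=40):
    """Find, per tap, the control voltage that places the tap on the ideal line.

    Returns a list playing the role of the calibration RAM.
    """
    ram = []
    for tap in range(len(cell_delays)):
        target = (tap + 1) * lsb
        lo, hi = v_lo, v_hi
        for _ in range(iterations):  # bisection on a monotonic delay model
            mid = 0.5 * (lo + hi)
            if tap_delay(tap, mid, cell_delays) < target:
                lo = mid
            else:
                hi = mid
        ram.append(0.5 * (lo + hi))
    return ram


# Hypothetical mismatched cells around a 625 ps nominal delay.
cells = [625e-12 * (1.0 + e) for e in (0.05, -0.03, 0.02, -0.04)]
ram = calibrate(cells, 625e-12)
for tap, v in enumerate(ram):
    err = tap_delay(tap, v, cells) - (tap + 1) * 625e-12
    print(f"tap {tap}: V_ctrl = {v:.4f} V, residual = {err:.2e} s")
```

In hardware, each evaluation of `tap_delay` would be replaced by a measurement from the external timing equipment, and the resulting voltages would be written to the calibration RAM.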

The DLL's DNL and INL as a function of output tap for an input clock of 25 MHz, or an LSB of 625 ps, before and after calibration, are presented in Figure 3.37(a) and Figure 3.37(b), respectively. The calibration voltages were stored in the ATE's memory and produced by its programmable DC source. The maximum DNL and INL after calibration are 21 ps and 47 ps, respectively, for an LSB of 625 ps. These figures represent an order of magnitude of improvement over the measurements obtained from the VCDL before calibration, which indicate a DNL and INL of 265 ps and 223 ps respectively. The added accuracy implies that less error will be introduced into the characterization of a device by the sampling operation itself, leading to more representative measurements.
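For reference, DNL and INL figures of this kind can be extracted from a set of delay-versus-tap measurements along the following lines. The definitions used here (INL as the deviation from a least-squares line, DNL as the deviation of each step from the fitted slope, both expressed in seconds) are common data-converter conventions and are an assumption; the exact definitions used for Figure 3.37 are not stated in the text.

```python
def best_fit_line(delays):
    """Least-squares line through (tap index, delay); returns (slope, intercept)."""
    n = len(delays)
    mean_x = (n - 1) / 2.0
    mean_y = sum(delays) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(delays))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dnl_inl(delays):
    """Return (lsb, dnl, inl) in seconds for measured tap delays."""
    lsb, intercept = best_fit_line(delays)
    inl = [y - (lsb * k + intercept) for k, y in enumerate(delays)]
    dnl = [(delays[k] - delays[k - 1]) - lsb for k in range(1, len(delays))]
    return lsb, dnl, inl


# Hypothetical measurements: a 625 ps/tap line with one displaced tap.
measured = [k * 625e-12 for k in range(32)]
measured[10] += 50e-12
lsb, dnl, inl = dnl_inl(measured)
print(f"LSB = {lsb * 1e12:.1f} ps, "
      f"max |DNL| = {max(abs(d) for d in dnl) * 1e12:.1f} ps, "
      f"max |INL| = {max(abs(i) for i in inl) * 1e12:.1f} ps")
```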

3.4.5 - Overall System Operation

Using an on-chip digitizer wired to a digital input pad, a 25 MHz clock signal was captured using the undersampling algorithm, but with the delayed clock being sourced from an external generator. The value of $\Delta t_{\text{SHIFT}}$ was set to 625 ps, resulting in $F_{\text{SAMP,EQUIV}} = 1.6$ GHz. Figure 3.38(a) presents the resulting waveform. The digitizer was then clocked with the DLL running at 25 MHz (limited by the test setup), resulting in $F_{\text{SAMP,EQUIV}} = 64 \times 25\ \text{MHz} = 1.6$ GHz, and the same clock signal was captured. The resulting waveform is displayed in Figure 3.38(b). Note the similarities between the two waveform captures. The digitizer also manages to capture rise times on the order of a nanosecond, demonstrating an improvement in the front-end bandwidth compared to previous prototypes [3]. The phase delay between the signals is due to the latency in the tap selection circuitry.
Figure 3.39 presents the eye-pattern from a 10 MHz 128 bit data stream, obtained with the combined operation of the DLL and the undersampling algorithm. The digitizer and DLL were clocked at 10 MHz, resulting in an effective sampling resolution of $\Delta t_{\text{shift}} = 1.5625$ ns. The signal was generated with a large amount of jitter, using the tester’s variable timing edges. The jitter is represented in the eye-pattern by the inconsistencies in the occurrence of the rising and falling edges, and accurately portrays that created by the tester. The plots presented in this section verify the functionality of the undersampling algorithm, and its physical implementation with the DLL based timing module.
3.5 - Performance Limitations

The factors that dominate the performance of the overall system are linearity and jitter. The implemented sampler provides adequate linearity for low-frequency sinusoidal signals but provides a poor resolution for higher-frequency test signals. This can be mainly attributed to the simplicity of the circuitry, which lacks mechanisms to compensate for the sources of error discussed in Section 3.2.4 and Section 3.2.5; these can have a profound effect on the quality of the digitized signal. Compensating for the non-uniformity of the gain as a function of input voltage, as previously discussed, provides one manner in which the linearity of the system may be improved. Architectural modifications to the sampler, though, are required to provide adequate high-speed performance.

The linearity of the DLL's delay line will also affect the quality of the final digitized signal. The preceding discussion has demonstrated that, through various calibration techniques, the linearity of the delay line may be improved by an order of magnitude. Characterizing the improvement in resolution that this calibration routine provides was made difficult by the fact that the digitizer, even when clocked with an accurate external source, does not provide adequate resolution for high-frequency signals. Nonetheless, the measured linearity of the DLL does present a marked improvement that should improve the performance of the overall system.

Jitter also affects the linearity of the measured results in the same manner as the non-linearity of the delay line does. Even though its effects are time varying and statistical, a deterministic metric that predicts the overall performance of systems with jitter may be obtained [23]. This metric is easily obtained for simple signals such as sinusoids, but is difficult to obtain for signals with a more complex composition. The jitter in a delay line is also proportional to its length and delay [13]. In addition, the lock mechanism of the DLL plays a role in determining its final performance.

Figure 3.40(a) and Figure 3.40(b) present the rms and peak-to-peak jitter of the DLL for various operating frequencies and tap outputs, as measured by a Tektronix TDS 8000 Digital Sampling Oscilloscope. Note how in the case of both the rms and peak-to-peak jitter, the trend of decreasing jitter with increasing operating frequency is visible. This relation of inverse proportionality is due to the following factors: i) As the operating frequency of the DLL is increased, the delay per buffer is decreased and as such the total delay of the delay line decreases. As previously mentioned, the jitter in a delay line is proportional to its delay. ii) In the architecture utilized, the gain of the delay cell decreases with increasing operating frequency. As such, there is a greater immunity to control voltage fluctuations, which translates directly into decreased jitter on the output clock.

Note also that the data for the 32nd tap exhibits a greater amount of both rms and peak-to-peak jitter. This is due to the fact that buffers at the end of the delay line express the net effects of time jitter present in preceding buffers, and the phase selection circuitry which operates in an open loop configuration. Note though that the variation is in the range of 2 ps at most. The maximum rms and peak-to-peak jitter obtained for the operating range presented are 27 ps and 264 ps, respectively. This high jitter measurement may be partially attributed to jitter on the input clock (on the order of 300 ps peak-to-peak), and noise induced in the test interface. Using the following relation presented in [23]:

$$D_{MAX} < \log_2 \left( \frac{2\sqrt{2}}{\pi t_{j-RMS} F_{s-MAX}} + 1 \right)$$  (3.12)

where $F_{s-MAX}$ represents the test-core’s clock speed, and $t_{j-RMS}$ the DLL's associated rms jitter at 25 MHz, the maximum resolution attainable (without taking into account the linearity of the delay line), $D_{MAX}$, is 10 bits.
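The bound in (3.12) can be evaluated numerically. The sketch below assumes $t_{j-RMS} = 27$ ps, which is the maximum rms jitter reported above and is taken here as the 25 MHz value (an assumption, since the exact figure at 25 MHz is not restated), together with $F_{s-MAX} = 25$ MHz. The function name is hypothetical.

```python
import math


def max_resolution_bits(t_jitter_rms_s, f_s_max_hz):
    """Right-hand side of relation (3.12): jitter-limited resolution in bits."""
    ratio = 2.0 * math.sqrt(2.0) / (math.pi * t_jitter_rms_s * f_s_max_hz)
    return math.log2(ratio + 1.0)


# Assumed inputs: 27 ps rms jitter, 25 MHz test-core clock.
bound = max_resolution_bits(27e-12, 25e6)
whole_bits = math.floor(bound)
print(f"D_MAX < {bound:.2f} bits, i.e. {whole_bits} whole bits")
```

With these inputs the bound evaluates to slightly more than 10 bits, which is consistent with the 10-bit figure stated above.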
From the above results, the conclusion can be made that jitter in the timing module can in general be reduced by constructing it with a delay line of moderate size [13]. A delay line that is too large will exhibit a pronounced degree of jitter in its later cells. This conclusion emphasizes the design trade-off between DLL size and clocking speed. For a large delay line, high effective sampling resolutions may be attained with a low speed input clock, but the time jitter performance of the DLL is compromised. Minimizing jitter is required in order to attain an adequate resolution, as indicated by (3.12).

3.6 - Summary

In this chapter, a methodology for extending the effective sampling frequency of the on-chip mixed-signal test-core presented in [3] was introduced. The technique used is an undersampling algorithm that allows for the capture of arbitrary waveforms with sampling clocks of speeds well below the required Nyquist frequency of the desired signal. The hardware implementation of this algorithm was performed with a specialized on-chip timing module that consists of a DLL with tap selection circuitry. Due to the nature of the DLL-based timing module, the maximum attainable effective sampling resolution is bounded by the intrinsic gate delay of the technology the system is implemented in. In a 0.35 µm CMOS process, effective sampling intervals well below the nanosecond range were achieved. The operation of the timing module was successfully demonstrated, and indicates the viability of the proposed system as an on-chip signal extraction tool. While suffering from drawbacks such as calibration and long test times, and despite the simplicity of the circuits utilized, the proposed system provides for effective sampling resolutions that are difficult to achieve with conventional processes and data conversion techniques.
Chapter 4 - Jitter Measurement

4.1 - Jitter Measurement

4.1.1 - Time-to-Digital Converters (TDCs)

Time-to-digital converters are analogous to analog-to-digital converters, where the analog quantity to be converted into a digital word is a time interval. Typically, these devices measure the time between two edges, often denoted as the START and STOP edges. The START edge ordinarily initiates the time interval measurement operation, while the STOP edge terminates it. The most common method used to measure time intervals using integrated circuits involves the use of delay lines [33-35]. The START signal is applied to a delay line composed of a series of buffers of equal delay. The state of the delay line (i.e. the extent of progression of the START edge through the delay line) is latched upon the assertion of the STOP signal. Given that a priori knowledge of the buffer delays is available, the time interval between the two edges can be deduced from the final state of the delay line.

The delay lines are of finite extent, and as such are usually limited to measuring relatively short time intervals. Some TDCs employ the use of range extension counters that provide for an initial coarse time measurement [33,36]. The delay line is used in conjunction with these counters to provide for a finer time resolution. Other methods of
measuring time intervals involve the use of pulse-shrinking techniques [37] or time interpolation [38].

The use of such devices extends to applications such as laser ranging and high-energy physics experiments. In this thesis, the use of a TDC for the purposes of jitter measurement will be explored. This TDC is based on a VDL sampler [39] that, with some simple supporting circuitry, can perform accurate, high resolution jitter measurements. The VDL sampler based TDC has numerous advantages over its aforementioned counterparts, including a deep sub-gate delay sampling resolution, the capability for continuous operation, and a structure that facilitates range extension and data extraction. It is based on a dual-delay line architecture. The operation and design of this unit, and its application to jitter measurement, are presented in the forthcoming sections.

4.1.2 - Vernier Delay Line (VDL) Samplers

The algorithms and circuits presented in Chapter 3 are fundamentally limited, in terms of the maximum effective sampling rate they can provide, by the intrinsic gate delay of the technology they are implemented in. Given that the effective sampling resolution is determined by the minimum attainable delay per voltage controlled delay cell, the accuracy of the system cannot exceed the minimum delay of a buffer in the delay line. In more practical terms, since the buffer will be sized in a manner that takes into consideration matching and drive constraints, the minimum attainable delay will be greater than the intrinsic gate delay of the technology, leading to a lower maximum effective sampling rate.

Intrinsic gate delays scale with technology, as indicated in Table 4.1. This implies that as more advanced technologies are used to implement the test-core timing module, higher effective sampling rates may be achieved. Given the operating speed of present day communication devices, these sampling rates are adequate for the extraction of many key parameters for arbitrary signals. However, these sampling rates fall short of providing a high enough resolution in order to fully characterize the nature of jitter in high speed digital signals, where the peak-to-peak jitter may be in the range of tens of picoseconds.
The need then arises for the development of sampling structures with sub-gate delay resolutions.

<table>
<thead>
<tr>
<th>Technology</th>
<th>Intrinsic Gate Delay (ps)</th>
<th>Theoretical Maximum Effective Sampling Rate (GHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.35 µm</td>
<td>70</td>
<td>14.3</td>
</tr>
<tr>
<td>0.25 µm</td>
<td>50</td>
<td>20</td>
</tr>
<tr>
<td>0.18 µm</td>
<td>25</td>
<td>40</td>
</tr>
</tbody>
</table>

Vernier Delay Line (VDL) samplers are structures that provide for sub-gate delay sampling resolutions for digital signals. They have been previously used to perform time-interval measurements [40] and data recovery [39]. The Vernier measurement principle is based on having two calibrated references, each slightly offset from one another. These references are used in conjunction to obtain measurements with an accuracy that is a fraction of the smallest sub-division of either reference. Figure 4.1 demonstrates the application of this principle in a form that pertains to a sampling operation.

The $N$-stage VDL sampler is composed of three fundamental elements: i) a DATA delay line consisting of $N$ buffers with delay $\tau_1$, ii) a CLK delay line consisting of $N$ buffers with delay $\tau_2$, iii) a series of D-type latches with the output of each DATA delay line buffer tied to the D input of the corresponding latch, and the output of each CLK delay line buffer tied to the clock input of the corresponding latch. If the condition

$$\tau_2 > \tau_1$$  (4.1)

is satisfied, then this structure samples the DATA digital input signal with a resolution of

$$\tau_{res} = \tau_2 - \tau_1$$  (4.2)

using the CLK signal as a trigger, and the values of each sample point are stored at the outputs of the latches.

A structure of this kind is limited by the fact that it cannot continuously sample an incoming signal, since the sample points at the output of each latch will be overwritten at the rising edge of every CLK signal (assuming rising-edge triggered latches). Also, the capture range of the sampler is limited to

$$\tau_{RANGE} = N\tau_{res} = N(\tau_2 - \tau_1)$$  (4.3)

due to the finite number of sampling cells.

Another issue concerning data extraction from the sampler is that the sample points are all latched at different moments in time, leading to synchronization issues that must be considered when interfacing this unit with other blocks. Certain read-out and range extension structures exist though, that allow for the continuous operation and synchronization of the outcoming data [41] to facilitate the operation and interfacing of the sampler. The use of such mechanisms though, is not required in the implementation of the jitter measurement device to be discussed.

Figure 4.2 provides a graphical representation of the operation of the VDL sampler. The waveforms depicted represent the outputs of the DATA and CLK buffers for the first three sampling cells, when the system is excited with identical pulses on both the DATA and CLK lines.
In the first sampling cell, the latch is excited by the DATA pulse delayed by $\tau_1$ seconds, and the CLK pulse delayed by $\tau_2$ seconds. The phase difference between the two waveforms is given by $\tau_{res}$, implying that the sample point captured at output $Q_1$ lies $\tau_{res}$ seconds away from the origin of the DATA signal. At the second sampling cell the CLK and DATA waveforms have propagated through two buffers, implying that the DATA and CLK waveforms, as seen by the latch, are now $2\tau_{res}$ seconds apart. The sample captured with the latch will now represent the logical level of the DATA signal $2\tau_{res}$ seconds away from its origin. Further propagation of the DATA and CLK waveforms through their respective delay lines, ensures that the third sampling cell captures a sample point on the DATA signal $3\tau_{res}$ seconds away from its origin, and that for the $i$-th sampling cell the sample captured represents the logical level of the DATA signal $i\tau_{res}$ seconds away from its origin. In this manner, this structure samples the incoming DATA signal with a resolution of $\tau_{res}$ seconds, or an effective sampling rate of

$$F_{SAMP\_EQUIV} = \frac{1}{\tau_{res}} \text{ Hz}$$  (4.4)

for a duration of $N\tau_{res}$ seconds.
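The sampling behaviour described above can be illustrated with a small behavioural model. The Python sketch below is an idealized illustration only: it assumes perfectly matched buffers, ideal latches with zero set-up time, and a CLK trigger edge applied at $t = 0$ together with the DATA signal. The function `vdl_sample` and the example delays (100 ps and 110 ps) are hypothetical and do not describe the fabricated circuit.

```python
def vdl_sample(data, n_cells, tau1, tau2):
    """Ideal N-stage VDL sampler.

    data    : function t -> 0 or 1, the DATA logic level at time t
              (t = 0 is the common origin of the DATA and CLK signals)
    tau1    : DATA buffer delay, tau2 : CLK buffer delay (same time unit)
    Returns the latch outputs Q_1 .. Q_N.
    """
    if tau2 <= tau1:
        raise ValueError("condition (4.1) requires tau2 > tau1")
    outputs = []
    for i in range(1, n_cells + 1):
        # Cell i latches at t = i*tau2, when its D input carries DATA delayed
        # by i*tau1, so it sees the DATA level i*(tau2 - tau1) after the origin.
        outputs.append(data(i * tau2 - i * tau1))
    return outputs


# Example in picoseconds: tau_res = 10 ps, DATA rising edge 35 ps after the origin.
rising_edge_at_35ps = lambda t: 1 if t >= 35 else 0
q = vdl_sample(rising_edge_at_35ps, n_cells=8, tau1=100, tau2=110)
print(q)
```

In this example the first three cells sample at 10, 20 and 30 ps and latch a 0, while the remaining cells latch a 1, so the edge is located between the third and fourth sample points.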

Practically, the CLK and DATA signals cannot have the same origins in time, since the set-up times of the latches and the delays associated with the I/O circuitry must be taken into consideration when deciding when to trigger the sampling operation. The trigger point is also dependent on the window of time which the sampler is expected to examine. For example, given the situation depicted in Figure 4.2, if sample points on the rising edge of the DATA signal are required, then the CLK trigger signal must be advanced in time with respect to it. These adjustments can be introduced on or off-chip by skewing the CLK and DATA signals through the use of ATE equipment or voltage controlled buffers. The complementary version of the CLK signal, which is also readily generated on or off-chip, may be used to capture the falling edge of the DATA signal if the capture range of the device is not large enough to sample the entire pulse.

Note that it is not necessary to construct a VDL sampler consisting of many cells in a jitter measurement application. Enough cells are required such that an edge and the uncertainty associated with it may be captured. The value of $\tau_{res}$ will also play a role in determining the number of cells required. If the bound on the edge uncertainty is known (i.e., the peak-to-peak jitter), then the minimum number of sampling cells required to fully characterize the jitter can be described by

$$TDC_{LENGTH\_MIN} = \frac{J_{pk-pk}}{\tau_{res}}$$  (4.5)

where $J_{pk-pk}$ denotes the expected bound on the peak-to-peak jitter of a signal. In a practical implementation though, the addition of more cells in the jitter measurement device is necessary to ensure that the characteristics of signals that deviate beyond the expected bounds may be captured.
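As a worked illustration of this sizing rule, the sketch below rounds the ratio up to a whole number of cells and optionally adds guard cells. Both the rounding and the `margin_cells` parameter are assumptions introduced here to reflect the remark about practical implementations; they are not part of the expression above. Times are given in picoseconds.

```python
import math


def min_vdl_cells(j_pk_pk_ps, tau_res_ps, margin_cells=0):
    """Smallest whole number of sampling cells that spans the expected
    peak-to-peak jitter, plus an optional (assumed) number of guard cells."""
    if tau_res_ps <= 0:
        raise ValueError("tau_res must be positive")
    return math.ceil(j_pk_pk_ps / tau_res_ps) + margin_cells


# Hypothetical example: 60 ps peak-to-peak jitter, 8 ps resolution.
bare = min_vdl_cells(60, 8)
padded = min_vdl_cells(60, 8, margin_cells=4)
print(bare, padded)
```

For these example values the ratio is 7.5, so eight cells are the minimum and twelve cells result once four guard cells are added.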

The values of $\tau_1$ and $\tau_2$ can be set using several methods. The DATA and CLK buffers may be sized appropriately to provide the required delays. This method is subject to process and temperature variations, implying that the value of $\tau_{res}$ cannot be accurately predicted or tuned. The inability to tune the device also implies that it can only sample signals with a fixed resolution. Using voltage controlled buffers for the delay lines provides an additional degree of freedom whereby the resolution of the device is variable. The tunability of the device can extend from the use of either a DATA or CLK voltage controlled delay line, or both delay lines being voltage controlled.

The tuning method proposed for the VDL architecture to be used in the implementation of the jitter measurement device consists of two voltage-controlled delay
lines, as presented in Figure 4.1. The resolution of the device is set by the $V_{DATA}$ and $V_{CLK}$ control voltages. This architecture allows for maximum tunability, since the maximum resolution attainable will only be limited by the non-linearities of the device, as opposed to sizing constraints. As the resolution of the device is increased, non-uniformities in the values of $\tau_1$ and $\tau_2$ due to device mismatches may lead to signal loss in the delay lines.

This mechanism can also provide better noise immunity with respect to the delay line control voltages. Given that both the DATA and CLK delay line buffers can be independently tuned, there are multiple values for $V_{CLK}$ and $V_{DATA}$ that will provide the same sampling resolution. The control voltages can thus be set at values where the buffers are least sensitive to variations in the control voltages, while maintaining a high sampling resolution. Figure 4.3 illustrates this principle.

The control voltages $V_{DATA(1)}$ and $V_{CLK(1)}$ presented in Figure 4.3(a) represent an operating point that provides a resolution of $\tau_{res}$ seconds. The control voltages $V_{DATA(2)}$ and $V_{CLK(2)}$ set up an identical resolution of $\tau_{res}$. The resolution set by $V_{DATA(2)}$ and $V_{CLK(2)}$ will be less susceptible to noise though, due to the relationship between the control voltages and buffer delay around the associated operating point. Take for example the two control voltages $V_1$ and $V_2$, and their associated identical noise element $\Delta v$, presented in Figure 4.3(b). Voltage $V_1$ lies on a region of the Delay versus Control Voltage curve where the slope of the line, or the delay per volt, is large. The noise variable $\Delta v$ that is superimposed onto $V_1$ causes a change in the buffer delay denoted by $\Delta \tau_1$. Voltage $V_2$ on the other hand, lies on a region of the curve where the delay per volt is considerably smaller, and thus the total change in delay caused by $\Delta v$, denoted by $\Delta \tau_2$, is smaller. Thus, by choosing control voltages on a region of the curve presented in Figure 4.3(b) where the slope is diminished, the effects of noise on the control voltages can be alleviated, since variations in the buffer delays are minimized. This is especially true in the case where the resolution of the sampler is close to its fundamental limit, and the $V_{DATA}$ and $V_{CLK}$ control voltages may be similar in magnitude.
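The effect of the operating point on noise sensitivity can be sketched numerically. The delay curve used below, $\tau(V) = \tau_0 + K/(V - V_T)$, is purely hypothetical and was chosen only because its slope diminishes at higher control voltages; it is not the measured characteristic of Figure 4.3(b), and all parameter values are assumptions. For simplicity the noise $\Delta v$ is applied to $V_{CLK}$ only.

```python
# Hypothetical delay-versus-control-voltage curve (not measured data).
V_T = 0.6      # V, assumed asymptote of the curve
K = 60.0       # ps*V, assumed curve constant
TAU_0 = 50.0   # ps, assumed minimum buffer delay


def delay_ps(v):
    """Buffer delay in ps for control voltage v (valid for v > V_T)."""
    return TAU_0 + K / (v - V_T)


def voltage_for_delay(tau_ps):
    """Inverse of delay_ps: control voltage that yields delay tau_ps."""
    return V_T + K / (tau_ps - TAU_0)


def resolution_shift(v_data, tau_res_ps, dv):
    """Change in tau_res (ps) when noise dv is superimposed on V_CLK."""
    tau1 = delay_ps(v_data)
    v_clk = voltage_for_delay(tau1 + tau_res_ps)   # gives tau2 = tau1 + tau_res
    return (delay_ps(v_clk + dv) - tau1) - tau_res_ps


# Same 10 ps resolution at two operating points, same 10 mV noise.
steep = resolution_shift(v_data=1.0, tau_res_ps=10.0, dv=0.01)
flat = resolution_shift(v_data=2.5, tau_res_ps=10.0, dv=0.01)
print(f"steep region: {steep:+.2f} ps, flat region: {flat:+.2f} ps")
```

With this assumed curve, the same noise disturbs the resolution by several picoseconds at the low-voltage operating point but by only a fraction of a picosecond at the high-voltage one, which mirrors the argument made for Figure 4.3.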
4.1.3 - Tuning Methods

As with all integrated circuits, process, temperature, and other ambient conditions can alter the operation of the circuit. Specifically, in the case of the VDL sampler, the characteristics of the delay cells are altered by these factors. If the VDL sampler is to be manually tuned, then it must be fully characterized as a function of temperature for every integrated circuit. This can be a time consuming process that adds to the cost of production testing, and may not provide complete coverage of all the conditions that affect the resolution of the sampler. An automatic tuning mechanism that compensates for process and temperature variations is presented in Figure 4.4.

The delays of the two individual delay lines are set using DLL-like tuning mechanisms. Each delay line is equipped with a phase detector, charge pump, and loop filter (capacitor), that set the delay of the entire delay line to be equivalent to one period of an input tuning clock. This implies that the phase detector forces the loop into a lock state when the phase difference between the clock at the input of the delay line, and the clock at the output of the delay line is 360°. The delay per buffer is then given by

$$\tau = \frac{T}{N}$$  (4.6)
where $T$ denotes the period of the input tuning clock, and $N$ represents the number of buffers in the delay line. Note that the use of some dummy buffers is required in order to ensure that the delay cells attached to the phase detectors have the same delay as the remaining cells.

The DATA delay line is tuned with a clock of period $T_1$, while the CLK delay line is tuned with a clock of period $T_2$. Given the condition presented in (4.1) and the relation presented in (4.6), the period $T_2$ must always be greater than $T_1$ for the VDL sampler to function correctly.

Once the delay lines have been tuned, the tuning mechanism may be disabled using switches that disconnect the signals routed to the phase detector. The device is then operated in its normal mode of operation, and given that the capacitor used in the loop filter is typically large, the control voltage can usually be held for the duration of the entire sampling operation. This approach limits the amount of time the VDL sampler may be operated for though, due to the discharging of the capacitor. Other implementations involve a master-slave arrangement [40] whereby an exact copy of the VDL sampler is used to generate the control voltages, which are then routed to another VDL sampler that actually performs the sampling operation. The performance of this approach is also subject to several constraints, such as the matching between the master and slave samplers, and the noise generated by the tuning mechanism if it remains in operation during sampling. The sampler used to generate the control voltages may be deactivated during the sampling operation, but then once again, the discharging of the loop filter capacitance limits the amount of time the sampler may be operated for. The additional area required to implement the tuning sampler poses an additional overhead.

![Figure 4.4 Tuning Mechanism for a VDL sampler.](image)

One disadvantage of this DLL-based tuning mechanism is that clocks with accurate periods are required in order to tune the delay lines. The technique is nonetheless widely used to tune delay lines, due to its simplicity, accuracy, and the fact that it requires little "user intervention". The accuracy required of the clocks depends on the length of the delay line and the resolution required of the VDL sampler. The longer the delay lines, the less error will be introduced in the tuning of the delay line by inaccuracies in the clock periods. If an error in a period is denoted by \( t_{\text{error}} \), then the delay per buffer will be tuned to

\[ \tau = \frac{T + t_{\text{error}}}{N} = \frac{T}{N} + \frac{t_{\text{error}}}{N} \] (4.7)

seconds, where the error is denoted by the second term in the expansion of the equation. This relation indicates that the more stages in the delay line, the less impact the error in the tuning clock period will have on the resolution of the sampler. This error is also dependent on the magnitude of \( t_{\text{error}} \) with respect to \( T \). Ordinarily, \( t_{\text{error}} \) is significantly smaller than \( T \), implying that a large delay line will ensure that the contribution of the error per buffer is negligible. If on the other hand \( t_{\text{error}} \) is a significant fraction of \( T \), the length of the delay line will have little impact on the error introduced into each buffer delay. Note also that since two clocks are utilized in the tuning of the sampler, they may compound the error in \( \tau_{\text{res}} \) since they both may deviate from their desired values. If both clocks originate from the same source (which is a likely scenario assuming an ATE setup is used to tune the device), then the error in the period of each clock may be predictable, and can be compensated for.
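The dilution of the tuning clock error by the delay line length can be checked with a short calculation. The sketch below applies the expansion above to hypothetical values (a 10 ns tuning period with a 20 ps period error) and also evaluates the error that two imperfect tuning clocks introduce into $\tau_{res} = \tau_2 - \tau_1$. The function names and numbers are illustrative assumptions, and times are in picoseconds.

```python
def tuned_delay(t_period_ps, t_error_ps, n_buffers):
    """Return (delay per buffer, error per buffer) when an n_buffers delay
    line locks to a tuning clock of period t_period_ps + t_error_ps."""
    tau = (t_period_ps + t_error_ps) / n_buffers
    return tau, t_error_ps / n_buffers


def resolution_error(t_error_data_ps, t_error_clk_ps, n_buffers):
    """Error in tau_res = tau2 - tau1 caused by both tuning clock errors."""
    return (t_error_clk_ps - t_error_data_ps) / n_buffers


# Same 20 ps period error, two hypothetical delay line lengths.
for n in (16, 64):
    tau, err = tuned_delay(10000, 20, n)
    print(f"N = {n}: tau = {tau:.4f} ps, error per buffer = {err:.4f} ps")

# Worst case for tau_res: the two clocks err in opposite directions.
print(resolution_error(-20, 20, 64))
```

Quadrupling the number of buffers reduces the per-buffer error by the same factor, while errors of opposite sign on the two tuning clocks add in $\tau_{res}$ rather than cancel.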
The non-linearities that exist in the sampler due to device mismatches cannot be compensated for though, due to the fact that the lock mechanism tunes the total delay across the entire delay line and is insensitive to the actual delay of each individual buffer. Even though the use of a large delay line alleviates some detrimental effects due to tuning clock imperfections, it may have a negative impact on the linearity of the device. This is due to the fact that a large delay line may be spread across a large area of the die, and global die imperfections may have a significant effect on the matching of the delay buffers. Time jitter errors are also more prevalent in long delay lines.

Other advantages of having longer delay lines include relaxing the requirements on the clock source generating the tuning signals. The difference between the periods of the two tuning clocks can be expressed as

\[ \Delta T = T_2 - T_1 = N(\tau_2 - \tau_1) = N \cdot \tau_{res} = \tau_{RANGE} \] (4.8)

and highlights its dependency on the length of the delay lines. If a high sampling resolution is required, then \( \Delta T \) will be small. This implies that if a device such as a PLL-based frequency synthesizer is used to generate the tuning signals, it requires accurate fractional frequency division ratios to produce the very small difference between the periods of the two tuning clocks. If \( N \) is large though, the value of \( \Delta T \) will be large, and thus a clock source with a coarser fractional frequency division resolution suffices.
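To make the relation in (4.8) concrete, the following sketch computes the tuning clock frequencies needed for a given resolution and delay line length. The 10 ns DATA tuning period, the 10 ps resolution, and the function name are hypothetical values chosen for illustration. Periods are in picoseconds and frequencies in MHz.

```python
def tuning_clocks(t1_ps, tau_res_ps, n_buffers):
    """Return (delta_T in ps, F1 in MHz, F2 in MHz) for tuning clocks that
    set a resolution tau_res_ps across n_buffers stages, per relation (4.8)."""
    delta_t = n_buffers * tau_res_ps
    t2_ps = t1_ps + delta_t
    # A period in ps converts to a frequency in MHz as 1e6 / period.
    return delta_t, 1e6 / t1_ps, 1e6 / t2_ps


for n in (8, 64):
    delta_t, f1, f2 = tuning_clocks(10000, 10, n)
    print(f"N = {n}: delta_T = {delta_t} ps, F1 = {f1:.3f} MHz, "
          f"F2 = {f2:.3f} MHz, F1 - F2 = {f1 - f2:.3f} MHz")
```

For the same resolution, the longer delay line separates the two tuning frequencies by several MHz instead of a fraction of a MHz, which is the relaxation of the synthesizer requirements described above.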

4.1.4 - A Jitter Measurement Device Based on a VDL Sampler

The VDL sampler presented in the preceding section serves the generic purpose of sampling the logical levels of digital signals with a fine sampling resolution. Given that it has this capability, with the addition of some simple digital circuitry, its operation can be extended to perform high resolution jitter measurements. The fact that the extension is simple in nature is due in part to the simplicity of the algorithm used to extract jitter information from the sampled signals.

The VDL sampler examines portions of a waveform at discrete moments of time, and can provide information as to whether an edge occurred at a particular instant. In the
case where the sampler is operated continuously using a low-jitter clock as the sampling CLK signal, to capture a jittery input DATA signal, the output of the latches will fluctuate, indicating in which discretized sample point an edge occurred. Figure 4.5 provides a graphical representation of this process.

As the location of the jittery DATA edge varies with time, the outputs of the VDL cells that sample the region of time where the edge uncertainty exists will fluctuate accordingly between the logical "1" and logical "0" levels. By counting the occurrence of logical highs for each sampling cell, an edge density graph or jitter CDF may be obtained (given that the number of transitions to be sampled is known in advance). This count represents an edge density because, once divided by the number of edge occurrences, it gives the probability that the edge has occurred before the associated sample point. This interpretation of the edge count matches the definition of a jitter CDF. Peak-to-peak and RMS jitter information may be extracted directly from the jitter CDF [17], or it may be differentiated to provide a jitter PDF [42] from which the same parameters may be extracted.
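The post-processing of the per-cell counts can be sketched as follows. This is a minimal illustration of the differentiation approach, not the specific procedures of [17] or [42]: it assumes a rising edge, counts that increase monotonically along the delay line, and a capture window that fully brackets the edge. The function name and the example counts are hypothetical, and the estimates are quantized to $\tau_{res}$.

```python
import math


def jitter_stats(counts, n_edges, tau_res):
    """Turn per-cell counts of logical highs into (cdf, pdf, pk_pk, rms).

    counts[i] is the number of highs seen by cell i+1 over n_edges edges.
    """
    if counts[0] != 0 or counts[-1] != n_edges:
        raise ValueError("capture window does not bracket the edge")
    cdf = [c / n_edges for c in counts]
    # First difference: edges that fell between adjacent sample points.
    hits = [counts[i + 1] - counts[i] for i in range(len(counts) - 1)]
    pdf = [h / n_edges for h in hits]
    occupied = [i for i, h in enumerate(hits) if h > 0]
    pk_pk = (occupied[-1] - occupied[0] + 1) * tau_res
    # Bin i lies between cells i+1 and i+2, centred at (i + 1.5) * tau_res.
    centres = [(i + 1.5) * tau_res for i in range(len(hits))]
    mean = sum(p * t for p, t in zip(pdf, centres))
    rms = math.sqrt(sum(p * (t - mean) ** 2 for p, t in zip(pdf, centres)))
    return cdf, pdf, pk_pk, rms


# Hypothetical counts from an 8-cell sampler, 1000 edges, tau_res = 10 ps.
example_counts = [0, 0, 50, 300, 700, 950, 1000, 1000]
cdf, pdf, pk_pk, rms = jitter_stats(example_counts, 1000, 10)
print(f"peak-to-peak <= {pk_pk} ps, rms ~ {rms:.2f} ps")
```

Because each edge is only located to within one sampling interval, the peak-to-peak value is an upper estimate and the rms value includes the quantization of the sampler.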

![Figure 4.5 Jitter Quantification with the VDL Sampler](image-url)
Based on the preceding discussion, the only extension required to the VDL sampler in order for it to be able to perform jitter measurements, is the addition of a counter at the output of each sampling cell. The function of these counters would be to track the number of logical “1”s a cell detects throughout the course of an entire data stream sampling operation. Figure 4.6 portrays the required modifications to the base structure presented in Figure 4.1. The counters required to perform such a function would require two inputs: one input would be the output of the latch of the VDL sampling cell and would indicate to the counter whether or not to increment its count, and the other input would be a timing signal that indicates to the counter when the data at the output of the sampling cell is valid and may be examined. A counter that could readily meet these requirements would be an asynchronous counter with a gated clock. The clock signal for each counter would be the clock signal used to trigger the latch in the associated sampling cell. This measure ensures that the counter only triggers when the sampled data at the associated cell is ready, and eliminates the need for additional synchronization or read-out circuitry. The signal that gates the counter’s clock would be the output of the sampling cell latch. If a logical “1” is present, the clock will propagate to the counter and it will increment its count. The presence of a logical “0” on the other hand will indicate the absence of an edge for a particular cell, and thus the clock signal for the associated counter will not propagate to it and the count will not be incremented.
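The counting scheme can be modelled behaviourally. The sketch below is an idealized illustration under several assumptions: ideal latches, a jitter-free CLK, a rising DATA edge whose position is Gaussian distributed, and sample points at $i\tau_{res}$. The function name, the seed, and all numerical values are hypothetical. Each counter increments only when its cell's latch output is high at that cell's clock edge, which mimics the gated clock described above.

```python
import random


def measure_jitter_counts(n_cells, tau_res, edge_mean, edge_sigma,
                          n_edges, seed=0):
    """Simulate the per-cell counters of the jitter measurement device."""
    rng = random.Random(seed)          # fixed seed for repeatable results
    counters = [0] * n_cells
    for _ in range(n_edges):
        edge_time = rng.gauss(edge_mean, edge_sigma)   # jittery DATA edge
        for i in range(n_cells):
            sample_time = (i + 1) * tau_res
            q = 1 if sample_time >= edge_time else 0   # latch output
            if q:
                # Q gates the cell's clock: the counter sees a clock edge
                # only when a logical "1" was sampled.
                counters[i] += 1
    return counters


# Hypothetical set-up: 16 cells, 10 ps resolution, edge at 80 ps with
# 10 ps rms jitter, 2000 edges.
counts = measure_jitter_counts(16, 10.0, 80.0, 10.0, 2000)
print(counts)
```

Dividing each count by the number of edges gives the jitter CDF at the corresponding sample point, and no read-out is needed until the whole data stream has been processed.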

![Figure 4.6 A Jitter Measurement Device based on a VDL Sampler](image-url)
This jitter measurement device can perform the three types of jitter measurements previously discussed in Chapter 2. The device is universal enough, such that the measurement mode is determined mainly by the signals that are routed to the input ports. In the case of accumulative jitter measurement, the low-jitter reference clock signal is routed to the sampler's CLK port, whereas the signal in question is routed to the DATA port. For a relative jitter measurement, the signal acting as a reference is also routed to the CLK port, whereas the other is routed to the DATA port. Finally, in the case of period jitter measurement, the signal in question must act as both the trigger and data signal. In this case, a buffer with a variable delay must be placed in the path of the data signal to induce a phase lag relative to the sampling edge, given that they are both identical. Neglecting to implement this measure would cause the sampler to consistently skip over the rising edge of the input pulses, as previously discussed.

In order for the jitter measurement device to be implemented as an autonomous unit, the inclusion of some additional supporting circuitry is required as shown in Figure 4.7. If the DLL-based tuning mechanism is not implemented, and a totally digital interface is desired, then the use of DACs may be required in order to provide the two tuning voltages that set the resolution of the sampler. In addition, if the device is required to perform the three aforementioned jitter measurement types on-chip (i.e. the jitter measurement device is used as an on-chip test structure, as opposed to an independent stand-alone device), then a signal router (or multiplexer) must also be implemented to select the appropriate trigger and data signals. Finally, the addition of some variable delay buffers at the input ports of the device may be required to skew the input signals, for reasons previously discussed.
Figure 4.7 Optional Supporting Circuitry for Jitter Measurement Device

Figure 4.8 displays how the jitter measurement device may be incorporated as an on-chip test structure, and the partitioning of the resources required to operate it. The CUT must be stimulated with an external source that provides a data signal, and a clock synchronization signal. The clock signal may not be necessary in the case of an asynchronous device. The data generation unit may also be on-chip, such as the case with on-chip PRBS signal generators [43]. The output of the CUT and the synchronization signal from the external source are routed to the jitter measurement device, which depending on the jitter measurement mode required, chooses whether or not to make use of the synchronization signal through the signal router. An all-digital ATE controller may be used to coordinate and synchronize the operation of both the on-chip and off-chip devices.

Figure 4.8 On-Chip incorporation of the jitter measurement device
The incorporation of a VDL sampler based jitter measurement device as an on-chip structure provides many advantages. The output of the CUT does not have to be routed off-chip, and thus signal integrity and quality are maintained, ensuring representative jitter measurements. The loading on the CUT is also significantly less than that which an I/O pad would present, since the output signal is only routed to the input of a solitary buffer in the DATA delay line. This architecture also represents a simplified form of VDL sampler, and as such no specialized range extension or read-out circuitry is required.

The jitter test time is also greatly reduced, since the only factors limiting the speed of the test are the length and speed of the input data stream. If an input test pattern of frequency $F_{\text{TEST}}$ is composed of $k$ bits, then the total test time $\sigma$ is given by

$$\sigma = \frac{k}{F_{\text{TEST}}}$$

excluding the time required to extract the jitter CDF data and process it.

Another advantage of the proposed device is that it possesses the ability to process high speed signals without skipping any edges. Most other start / stop time interval measurement devices are limited in terms of their maximum operating speed, which requires that they skip edges. This is mainly due to the fact that the majority of them extract measurement information on every cycle of the trigger signal, thus limiting their speed to the maximum operating frequency of the data collection / read-out circuitry. Other devices such as TIAs also have a large deadtime. The architecture proposed requires data extraction only at the end of the jitter test, and thus can be operated at much higher speeds. Three factors limit the maximum operating speed of this architecture: i) the propagation delay of the buffers in the DATA and CLK delay lines, ii) the propagation delay through the latches in the VDL cells, iii) the maximum operating speed of the counters. If the frequency of the incoming DATA or CLK stream exceeds $1/\tau_1$ or $1/\tau_2$ respectively, then the signals propagating down the delay lines will be corrupted. Likewise, if the input bit period is less than $t_{p\_latch}$, the propagation delay of the latches, then the sampled data will be corrupted. The same is also true for the counters, where exceeding their maximum operating speed, $F_{\text{COUNTER\_MAX}}$, will also lead to data corruption. A theoretical bound on the maximum operating speed of the jitter measurement device may be expressed as

\[
F_{\text{MAX}} = \min\left(\frac{1}{\tau_1}, \frac{1}{\tau_2}, \frac{1}{t_{p\_latch}}, F_{\text{COUNTER\_MAX}}\right) \quad (4.10)
\]

where the \( \min(\cdot) \) function extracts the smallest of its input parameters. The major limiting factor of these four constraints is usually the maximum operating speed of the counters. Given an appropriate architecture though, the counters may be designed such that they are limited mainly by a propagation delay of a flip-flop, leading to a moderately high maximum operating speed. Note that the propagation delay of the counter and latches scales with technology, and will play less of a role in determining the maximum operating speed in advanced technologies. The size of the counters will also scale with technology, implying that the data capture circuitry will be more area efficient. The VDL sampler though, is subject to matching constraints that render the delay line buffers large, and therefore may not scale at the same rate as the supporting circuitry.

As a final note, it should be mentioned that the VDL sampler architecture presented may also be used as a start / stop time interval analyzer, but would be subject to the same speed constraints as its conventional counterparts, due to the limitations of the data extraction and read-out circuitry.

4.2 - Implementation

4.2.1 - The Delay Cell

The physical implementation of the jitter measurement device depends mostly on the careful design of the VDL sampler. The choice of an appropriate delay cell and utilizing appropriate layout techniques are imperative to ensuring adequate performance for the device. Factors that affect the performance include susceptibility to noise and linearity. Both factors are affected mainly by the choice of delay cell and the use of appropriate layout techniques.
The delay cell utilized in the implementation of the jitter measurement device is presented in Figure 4.9. Supply noise, which is the main cause of jitter in CMOS delay lines [44], can often be reduced through the use of differential structures. Even though the delay cell utilized is a single ended structure, past implementations of TDCs with this delay cell [40,45] have proven to provide for low-jitter delay lines that are relatively unaffected by power supply fluctuations. Previous surveys [32] also indicate that well designed single-ended delay cells can have similar or better phase noise performance than their differential counterparts. The use of single ended structures also simplifies the design process by eliminating the need for supporting circuitry such as single-ended to differential conversion circuitry, and facilitates the use of proper layout techniques.

The delay cell consists of two voltage-controlled inverters, represented by transistors M1-M4 and M5-M8 respectively. Transistors M1 and M2 represent a standard implementation of a CMOS inverter. Transistor M3 represents the voltage-controlled element of the delay cell, and modifies the current through the inverter through the use of a control voltage, $V_{ctrl}$. Increasing $V_{ctrl}$ turns on M3, providing a greater current through M1 and M2. The greater the current through this pair of transistors, the smaller the delay of the cell, due to its increased drive capability. Decreasing $V_{ctrl}$ on the other hand decreases the current through the inverter, leading to a lower drive capability that increases the delay of the cell. The purpose of transistor M4 is to provide a constant path to ground for the M1/M2 inverting pair, to ensure that input signals can still propagate through the cell when M3 is in cutoff. The sizing of M4 will play a role in determining the slowest response of the cell. As such, it must be sized appropriately to ensure that it does not provide too large a current that may dominate the delay of the cell and marginalize the effect of transistor M3.

Transistors M5-M8 mirror transistors M1-M4, and serve the purpose of restoring the polarity of the input signal, and ensuring that both the rising and falling edges of the input signal are equally delayed. Having unequal delays for the rising and falling edges of the input signal may lead to the loss of the signal as it propagates down the delay line.
The sizing of the delay cell affects two factors: i) the range of delay the cell can provide, and ii) the linearity of the time-to-digital conversion operation. In an architecture where the absolute value of the delay determines the sampling resolution of the device, extensive simulations must be performed to ensure that the cell is capable of providing the desired resolution over all process corners. This in turn affects the sizing of the device, which must take into account the uncertainty caused by these process variations. In the proposed dual-tunable delay line architecture though, process corners play less of a role in determining the resolution of the TDC, since it relies on the difference between two delay cells. This implies that a fine sampling resolution is almost always guaranteed. This assurance is provided by the fact that if the delay cell is too slow or too fast, the delays of both the DATA and CLK delay buffers can be set close enough such that a fine sampling resolution is always attainable. However, a coarse sampling resolution is not always guaranteed, since the delay cells may be faster than the desired specifications and the required difference between the two delay cells may not be attainable. Note also that the maximum operating speed of the device is limited by the propagation delay of the buffers, so process corners must be taken into account when designing for the maximum input frequency of the DATA and CLK signals.
4.2.2 - Sizing and Layout Considerations

Delay cell transistor sizing also affects the linearity of the device due to the matching constraints of the physical process the device is implemented in. Variations in the delay of individual buffers affect the sampling resolution per sampling cell, leading to non-linearities in the sampling operation. Factors that influence the delay of the cell are its load capacitance, the dimensions of its transistors, and their threshold voltages. These parameters are all prone to variation due to inconsistencies in the physical construct of the die. As such, the physical implementation of the delay cells must take these factors into account, through the use of appropriate layout techniques.

Local die variations can affect matching between transistors in the delay cell, whereas global die variations affect the matching between different delay cells. In order to combat global die variations, the dimensions of the transistors must be made large such that the percentage variation of transistor widths and lengths is minimized with respect to location on the die. Constructing the delay cell with large transistors also helps combat local die variations, since large dimensions facilitate the use of layout techniques that provide reasonable matching between individual transistors. The sizes suggested in [40] and used in the implementation of the delay cell, provided for a reasonable range of delays and large enough dimensions to provide adequate matching. These dimensions are presented in Table 4.2.

<table>
<thead>
<tr>
<th>Transistors</th>
<th>Width (μm)</th>
<th>Length (μm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>M1 &amp; M5</td>
<td>28</td>
<td>1.2</td>
</tr>
<tr>
<td>M2 &amp; M6</td>
<td>28</td>
<td>1.2</td>
</tr>
<tr>
<td>M3 &amp; M7</td>
<td>28</td>
<td>1.2</td>
</tr>
<tr>
<td>M4 &amp; M8</td>
<td>12</td>
<td>3.0</td>
</tr>
</tbody>
</table>

In the delay cell presented in Figure 4.9(a), each transistor in the first inverter must be matched with its counterpart in the second inverter. This ensures uniformity in the delay of the rising and falling edges of an input signal. As such, the transistor pairs M1/M5, M2/M6, M3/M7, and M4/M8 must be laid out in a common centroid fashion to alleviate dimension, capacitance, and threshold variations [46]. Figure 4.10 illustrates the layout used for the delay cell.

Note the use of dummy transistors at the peripheries of the delay cell. This measure ensures that all the transistors experience the same lithographic gradients. Take for example the common centroid configuration of the M1/M5 pair. To the immediate right of this pair, is the M2/M6 transistor configuration that causes the right half of M1/M5 to experience a particular lithographic gradient. In order to ensure that both halves of M1 and M5 are identical, a dummy half of the M2/M6 transistor pair must be placed to the left of the M1/M5 pair such that the left half experiences the same lithographic gradients as the right half. The same argument can be made for the M4/M8 pair, and the dummy transistors to their immediate left. The only unmatched gradients in this particular layout involve the boundary between the M3/M7 pair and the M4/M8 pair. This is due to the fact that the M2/M6 pair are not of the same dimensions as the M4/M8 pair. The effect of this mismatch was minimized though, by ensuring that the height and width of the physical implementation of the M4/M8 pair was identical to those of the other transistors.

The use of dummy devices should not only be limited to the internal construct of the delay cell, but is necessary in the implementation of the entire delay line. Dummy delay cells should be placed adjacent to the vernier sampling cells at the extremities of the delay lines. This ensures that all delay cells experience the same lithographic gradients.

![Figure 4.10 Layout of Delay Cell used in TDC](image)
Other concerns involving the implementation of the delay line that affect linearity involve its length.

The longer the delay line, the larger the region of the die it is spread out across, implying that die imperfections will have more of an effect on the linearity of the device. Previous work [47] indicates that a practical limit for CMOS delay lines is in the region of 6 to 7 bits of resolution, or delay lines with a maximum of 64 to 128 delay elements. The delay line should also be laid out in a straight line [40,47], to avoid the static errors due to unmatched interconnect lengths resulting from folding the delay line. Another matter of concern in long delay lines is the ohmic drop in supply voltage that is caused by excessively long power supply lines. This supply voltage drop affects the buffer delays, since the transistors’ operating points are modified. Partitioning the power supply distribution, without adding breaks in the delay lines is possible, as presented in the floor-plan in Figure 4.11. Segmenting the power supply distribution in this fashion, allows for the use of multiple VDD pins to alleviate the ohmic drop, and facilitates current distribution. If the entire delay line were to share one long power line, it would have to be significantly wide to allow for the amount of current drawn from all the delay cells. Note also that appropriate on-chip supply decoupling and filtering is also required to enhance the performance of the device, and allow it to be less prone to effects such as jitter.

The latches that comprise the remaining components of a VDL sampler cell are also vulnerable to process variations and do affect the sampling operation. The main factor that contributes to the non-linearity in the sampling operation is the threshold at which the latch arbitrates whether it has sampled a logical “0” or “1”. Typically this variation in a standard digital CMOS process is not large, and is in the region of tens of millivolts [48]. This variation causes sampling errors that are dependent on the rise time of the input waveform. For example, if a 3.3 V waveform with a risetime of 300 ps is sampled with a latch that has a logical threshold that deviates from its nominal value by 20 mV, then a sampling error of approximately 2 ps is introduced. The magnitude of this error is small in comparison to that which may be introduced by the non-linearities in the delay lines. Thus, given that the latches do not contribute significantly to the total error, and to simplify implementation of the jitter measurement device by minimizing the number of hand-crafted components, standard cell realizations of the latches may be used.
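
The sampling error quoted in the example above can be reproduced with a short calculation. The sketch below assumes the edge is a linear ramp that traverses the full voltage swing in the stated rise time, which is a simplification introduced here; the function name is hypothetical.

```python
def threshold_sampling_error_ps(v_swing, rise_time_ps, threshold_offset_v):
    """Timing error caused by a latch threshold offset on a linear edge ramp."""
    slew_v_per_ps = v_swing / rise_time_ps  # assumed constant slew rate
    return threshold_offset_v / slew_v_per_ps


if __name__ == "__main__":
    # Values taken from the example in the text: 3.3 V swing, 300 ps rise
    # time, 20 mV threshold deviation.
    error = threshold_sampling_error_ps(3.3, 300.0, 0.020)
    print(f"sampling error = {error:.2f} ps")
```

Under this linear-ramp assumption the error is about 1.8 ps, in agreement with the figure of roughly 2 ps given above.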

**4.2.3 - Edge Counter Implementation**

As previously discussed, the edge counters must selectively increment their count based on whether an edge occurred in the associated VDL sampler cell. An asynchronous counter with a gated clock or "count" signal was deemed to be a viable solution. Figure 4.12 represents a ripple counter implementation of the required asynchronous counter.

The structure was modified from the basic implementation of a ripple counter to allow for data extraction and testability. Multiplexers were placed in the path of the counter's data and clock signals to allow for a TEST/READ mode, and a RUN/COUNT mode. The Data_Sel line can be toggled to allow each flip-flop in the counter to either latch the complement of its previous state as an input, or latch the output of the previous flip-flop as an input. In the former case the counter is in its COUNT/RUN mode, whereas in the latter case the counter is in a scan-chain like configuration that allows for testing and data extraction. In the test mode, the Clk_Sel line is toggled such that an external clock (Scan_Clk) may be used to load data into the flip-flops through the use of the Scan_In pin. The output stream is simultaneously read from the Scan_Out pin and compared with the input data. The counter data may also be extracted in a similar fashion after a jitter test is complete, by clocking out the contents of the flip-flops. Table 4.3 summarizes the counter's valid modes of operation.

### Table 4.3 - Edge Counter's Valid Modes of Operation

<table>
<thead>
<tr>
<th>Data_Sel</th>
<th>Clk_Sel</th>
<th>Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>Test/Extract</td>
<td>The counter can be tested in a &quot;scan-chain&quot; like manner through the use of the Scan_In, Scan_Clk, and Scan_Out pins. Once a test is complete, the counter data may be extracted in the same manner.</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>Run/Count</td>
<td>The counter is configured to count when a pulse is asserted at the output of the AND gate.</td>
</tr>
</tbody>
</table>

Note that the Scan_Out pin of one counter may be connected to the Scan_In pin of another counter such that only three pins are required to fully test and extract information from these two counters. This principle can be applied to a multitude of counters, such that only three pins are required for testing and data extraction regardless of the number of counters.

The counter is composed of \( N+1 \) flip-flops. The first \( N \) flip-flops count the number of edge hits, whereas the last flip-flop is an overload detection bit that indicates if the number of hits exceeds the maximum count. The counter is designed to detect a maximum of \( 2^N-1 \) hits. The \( (N+1) \)-th bit acts as the highest bit of an \( (N+1) \)-bit counter, and is toggled high if the hit count exceeds \( 2^N-1 \). This bit is also tied to the SET signal (not shown) of all the other flip-flops in the counter, and thus if the count exceeds \( 2^N - 1 \), the counter is prevented from rolling over to zero and counting further. A RESET line (not shown) is also available, such that the counter may be reset at the beginning of a jitter test.

The mechanism that arbitrates whether or not to increment the count consists of an AND gate that has the sampled logical level from the VDL latch and the clock associated with that latch as inputs. These input ports are denoted as the DATA and CLK ports respectively. If a logical "1" is sampled by the latch, the clock signal propagates to the counter and the count is incremented. Otherwise, if a logical "0" is sampled by the latch, the clock is prevented from propagating to the counter and the count is not incremented. An additional delay buffer (not shown) may be required for the CLK signal, to ensure that the counter does not trigger before the DATA signal is ready. Depending on the length and size of the counter, additional buffering may be required for the Scan_Clk signal.
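
The counting and overflow behaviour described in this section can be summarized with a small behavioural model. This is a sketch in Python rather than a description of the fabricated circuit: the class and method names are hypothetical, the ripple timing of the flip-flops is ignored, and the bit ordering returned by `scan_bits` is an assumption made for illustration.

```python
class EdgeCounter:
    """Behavioural model of an N-bit gated ripple counter plus overflow bit."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.max_count = (1 << n_bits) - 1
        self.reset()

    def reset(self):
        """Model the RESET line asserted at the start of a jitter test."""
        self.count = 0
        self.overflow = False

    def clk_edge(self, data):
        """One CLK edge; the AND gate passes it only when the latch holds a 1."""
        if not data or self.overflow:
            return
        if self.count == self.max_count:
            # Hit number 2^N: the overflow bit goes high and, through SET,
            # holds the N count bits at all ones.
            self.overflow = True
        else:
            self.count += 1

    def scan_bits(self):
        """N count bits, LSB first, followed by the overflow bit (assumed order)."""
        bits = [(self.count >> b) & 1 for b in range(self.n_bits)]
        return bits + [int(self.overflow)]


if __name__ == "__main__":
    counter = EdgeCounter(n_bits=3)
    for sampled in [1, 0, 1, 1]:
        counter.clk_edge(sampled)
    print(counter.count, counter.scan_bits())
```

In this model a sampled "0" leaves the count unchanged, and once the overflow bit is set any further hits are ignored until the counter is reset.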

### 4.3 - Experimental Results

The VDL sampler based jitter measurement device was implemented in a 3.3 V, 0.35 μm CMOS process, based on the blocks and guidelines described in Section 4.2. The device was implemented with 101 VDL sampling cells (space limitations prevented the length of the delay line from being a power of 2) and 16-bit counters with one overflow bit. The tuning circuitry was not implemented, due to space limitations and for simplicity. Figure 4.13 represents a micrograph of the jitter measurement device chip. The VDL sampler occupies an area of 2.4 mm$^2$, while the counters occupy an area of 3.1 mm$^2$. The counters occupy a large area due to the fact they were implemented with standard cells.

#### 4.3.1 - Test Setup

A Teradyne A567 mixed-signal tester platform (Figure 3.24) was used to test the fabricated IC. A C-based language is used to program and control the tester's components. The tester incorporates a variety of DC and AC sources, as well as a digital sub-system with a maximum edge displacement resolution of 78 ps. This limitation implies that the minimum resolution the TDC can be characterized for with some degree of accuracy, and without the use of external equipment, is 78 ps. A Time Measurement Subsystem (TMS) with an 8 ps accuracy is also available. This subsystem allows for the measurement of propagation delays, and can be used for the characterization of the delay cell and the delay lines.

Figure 4.13  Jitter Measurement Device Chip Micrograph
The test head supplies I/O signals to the IC through a universal Device Interface Board (DIB). A PCB was designed to be interfaced and mounted onto the DIB, as shown in Figure 4.14, to facilitate testing. A close-up of the two-layer PCB is presented in Figure 4.15. Power supply regulation and decoupling, as well as careful ground plane placement and signal routing are required for such an application, to minimize the effects of noise and crosstalk. A 15 V supply is used to power up a series of regulators that provide the IC's 3.3V supply. Two regulators individually power up the two delay lines in the TDC, whereas the last one supplies power to the counters. This supply partitioning prevents noise from the switching activity of the counters from coupling onto the supply lines of the delay lines.

The lower side of the PCB is designated as being the ground plane, and comprises the entire lower half of the board, excluding the area under the IC. This measure prevents noise generated on the ground plane by switching digital signals from coupling into the IC and affecting its performance. The large ground plane also ensures a low resistance path to ground. Both analog and digital ground points were provided for. The DATA and CLK traces are isolated from each other on the board to minimize crosstalk effects that may influence jitter measurements. Their lengths on both the PCB and on the IC are equalized to minimize static delay offsets. A series of coaxial connectors are also present, to provide access to the IC's probe points for debugging and characterization purposes. The primary signals used to control the jitter measurement device are routed from the DIB to the IC through a series of block connectors located on the lower half of the board.

Figure 4.14 PCB mounted onto Test Head

Figure 4.15 Printed Circuit board used to test Jitter Measurement Device IC
4.3.2 - Delay Cell Characteristics

A single delay cell connected directly to I/O pads was included on the prototype IC to facilitate characterization. The cell's delay versus control voltage characteristic is the main parameter of concern. Using the A567 tester a rising edge is applied to the delay cell and using the TMS, the delay between this edge and the resulting output edge for a fixed control voltage is measured. This process is repeated for several control voltages resulting in the graph presented in Figure 4.16. This graph represents the delay of the cell normalized with respect to its nominal delay (i.e. when the control voltage is at its highest). The results are presented in this manner to demonstrate the range of the delays the cell is capable of producing.

From the graph, it is evident that the cell is capable of providing a delay range of 560 ps. This implies that the coarsest sampling resolution the TDC will be able to provide is $\tau_{\text{res}} = 560$ ps, since the maximum attainable difference between $\tau_2$ and $\tau_1$ is 560 ps. This may be accomplished by setting $V_{\text{DATA}}$ to 3.3 V, and $V_{\text{CLK}}$ to 0 V. Note how the delay of the cell only begins to decrease when the control voltage is greater than 0.6 V. This is because the threshold voltage for an NMOS transistor in the chosen technology is approximately 0.6 V, implying that the voltage controlled transistor in the delay cell does not draw significant current until the control voltage surpasses this level. For control voltages between 0.6 V and 1.5 V, the delay cell exhibits a relatively linear transfer characteristic. Control voltages higher than 1.5 V are ineffective in dramatically decreasing the buffer delay, due to the inverse relationship that exists between the delay and control voltage. It is this region or the region below 0.6 V that might be suitable for providing high resolutions while providing a high immunity to control voltage noise. This immunity is due to the fact that the gain of the cell in terms of Delay/Voltage is significantly smaller than for other operating points.

![Figure 4.16 TDC Delay Cell Characteristics](image)

4.3.3 - Transfer Characteristic and Linearity Measurements

Using the A567 tester’s digital subsystem, the linearity of the TDC was evaluated by capturing its transfer characteristics. This was accomplished by exciting the CLK terminal of the TDC with a single rising edge with a constant phase, and exciting the DATA terminal with a single rising edge of variable phase. The position of the DATA rising edge is swept, and at each step of the sweep the contents of the edge counters are read out. Since the TDC is excited with a single rising edge, each counter should register at most one hit. The number of counters that register this single hit though, should change as the position of the DATA edge is varied. This process was repeated for a number of sampling resolutions, and the results are presented in Figure 4.17. As previously mentioned, the tester’s edge displacement resolution is limited to 78 ps at most, so the minimum LSB that can be accurately characterized using its digital subsystem is 78 ps.
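
The expected shape of such a transfer characteristic can be illustrated with an idealized model of the vernier delay line. The sketch below rests on several assumptions that are introduced here for illustration: cell $n$ compares the DATA edge delayed by $n\tau_1$ with the CLK edge delayed by $n\tau_2$, the per-cell resolution is $\tau_2 - \tau_1$, a latch captures a "1" only when the DATA edge arrives before the CLK edge, and all mismatch, noise, and latch setup effects are ignored. The function name and the sign convention for the input interval are hypothetical.

```python
def vdl_output_code(data_lag_ps, tau_res_ps, n_cells):
    """Number of cells that register a hit for one DATA/CLK edge pair.

    data_lag_ps: amount by which the DATA edge lags the CLK edge at the input.
    tau_res_ps:  per-cell resolution, assumed equal to tau_2 - tau_1.
    """
    hits = 0
    for n in range(1, n_cells + 1):
        # After n cells the DATA edge has gained n * tau_res on the CLK edge.
        if n * tau_res_ps > data_lag_ps:
            hits += 1
    return hits


if __name__ == "__main__":
    # Sweep the DATA edge across a 101-cell line set to a 115 ps LSB.
    for lag in (0.0, 250.0, 1150.0, 11615.0):
        print(lag, vdl_output_code(lag, 115.0, 101))
```

In this idealized model the output code changes by one for every LSB of input displacement, so any departure from a uniform staircase in a measured characteristic reflects static timing offsets or noise.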

The graphs presented display the output code (the number of counters registering an edge) of the TDC versus the time interval between the CLK and DATA edges. Each transfer characteristic curve represents the average of 10 runs. This measure averages out the effects of noise and jitter on the input signals. Note how as the resolution of the device is increased, the non-linearities in the transfer characteristics become more evident. This is due to the fact that static timing offsets constitute a larger proportion of the LSB as its value is decreased. Note that the LSB was calculated using a ‘line-of-best-fit’ algorithm.
Given that the TDC is composed of 101 vernier sampling cells, the jitter measurement device can be characterized as having a resolution of approximately 6.65 bits at most. The LSB of the device is determined by setting the two control voltages $V_{CLK}$ and $V_{DATA}$, and it in turn determines the capture range of the device, as indicated in (4.3). Panels (a) - (c) demonstrate how the capture range decreases with a finer LSB.

From the graphs presented in Figure 4.17, DNL and INL information may be extracted. Figure 4.18(a) and Figure 4.18(b) represent the DNL and INL, respectively, of the TDC when its resolution is set to 115 ps. The accuracy of the TDC for this resolution, in terms of INL, is at most $\pm 1$ LSB. This degree of accuracy is consistent with similar implementations of TDCs [40,47].
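
One common way of extracting DNL and INL from a measured transfer characteristic is sketched below. It assumes that the input interval at which each code transition occurs has already been identified, fits a least-squares line through those transition points to obtain the LSB, and expresses both error measures in units of that LSB. This is a generic procedure given for illustration and is not necessarily the exact algorithm used to produce Figure 4.18; the function name and the sample data are hypothetical.

```python
def linearity(transitions_ps):
    """Return (lsb_ps, dnl, inl) from a list of code-transition times.

    transitions_ps[c] is the input interval at which the output code changes
    for the c-th time. At least two transitions are required.
    """
    n = len(transitions_ps)
    codes = list(range(n))
    mean_c = sum(codes) / n
    mean_t = sum(transitions_ps) / n
    # Least-squares line t = lsb * code + offset through the transitions.
    num = sum((c - mean_c) * (t - mean_t) for c, t in zip(codes, transitions_ps))
    den = sum((c - mean_c) ** 2 for c in codes)
    lsb = num / den
    offset = mean_t - lsb * mean_c
    # DNL: deviation of each step width from one LSB, in LSB units.
    dnl = [(transitions_ps[i + 1] - transitions_ps[i]) / lsb - 1.0
           for i in range(n - 1)]
    # INL: deviation of each transition from the fitted line, in LSB units.
    inl = [(t - (lsb * c + offset)) / lsb for c, t in zip(codes, transitions_ps)]
    return lsb, dnl, inl


if __name__ == "__main__":
    # Hypothetical transition times with small static offsets.
    lsb, dnl, inl = linearity([0.0, 120.0, 225.0, 350.0, 460.0])
    print(f"LSB = {lsb:.1f} ps")
    print("DNL:", [round(d, 2) for d in dnl])
    print("INL:", [round(i, 2) for i in inl])
```

Because the LSB is taken from the fitted line, the INL values returned by this sketch are referenced to the line of best fit rather than to an end-point line.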
4.3.4 - Creating Signals with Jitter

A method for generating signals with a repeatable and predictable jitter was devised to facilitate testing of the jitter measurement device. Figure 4.16 demonstrates that the characteristics of a single delay cell approximate a linear function for a range of control voltages. As such, the cell may be used as a phase modulator for digital signals, by applying the modulating signal at the cell’s \( V_{\text{ctrl}} \) input terminal. The cell’s relatively linear delay versus control voltage characteristic ensures that the amplitude distribution of the input control voltage is mapped directly to the phase distribution of the output signal. As such, the delay cell may be modelled as a voltage-controlled phase modulator, with a linear delay gain that may be expressed in units of seconds/volt. Figure 4.19 demonstrates this principle.

The method of generating signals with predictable jitter with this delay cell is as follows: A relatively jitter-free input signal is applied to the single delay cell. A control voltage with an amplitude distribution that approximates the desired final jitter distribution is simultaneously applied to the cell. Subsequently, portions of the input digital stream, including its rising and falling edges, are delayed in a manner that tracks the control voltage input waveform. As such, the input stream experiences a variable delay, whose distribution approximates that of the cell’s control voltage. This variable delay emulates the effects of a deterministic jitter, since a priori knowledge of the control voltage and the delay cell’s characteristics is available.

As indicated in Figure 4.19, using a control voltage with a Gaussian amplitude distribution results in a Gaussian jitter distribution for the output signal. Similarly, using a control voltage with a sinusoidal amplitude distribution results in an output digital stream with a sinusoidal jitter distribution. Experimental results verifying the effectiveness of this technique are presented in the following measurements section. Note that the relationship between the frequency of the input digital signal and the rate of change of the delay cell’s control voltage will determine the nature of the jitter in the output signal. A control voltage that varies slowly with respect to the input signal will produce jitter that may be interpreted as being edge jitter. This is due to the fact that the delay of the cell will change slowly, and thus several periods of the input signal may pass without the phase being modified, but the phase of a single edge may be modified. If the control voltage changes at a rate that is fast with respect to the input signal, then the resulting jitter may be interpreted as being period jitter. This is due to the fact that the delay of the cell may change rapidly within one period of the input bitstream, resulting in dramatic changes in the signal’s duty cycle that can be interpreted as being period jitter.

![Diagram](image-url)

**Figure 4.19 Generating Signals with Jitter through the use of a Single Delay Cell.**

Based on the measured delay cell characteristics, a phase shift of approximately 520 ps/volt is attainable in the linear region of the delay versus control voltage curve. The maximum peak-to-peak jitter that can be produced by the cell (by having $V_{\text{ctrl}}$'s maximum and minimum values being $V_{DD}$ and 0 V, respectively) is 560 ps. The test set-up used to determine the effectiveness of the jitter measurement device is presented in Figure 4.20. The phase modulating control voltage is produced from an off-chip AWG. The data signals and the sampling / trigger clock are also sourced from an external generator. The output of the delay cell is connected to the DATA input of the jitter measurement device, and the external reference clock from the data generator is used as the sampling trigger in the TDC. Note that the clock signal does not necessarily have to be the same speed as the data signal. A slower trigger signal may be used, but this inherently implies that edges will be skipped and the test time will increase.
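
The mapping from control voltage distribution to jitter distribution can be illustrated with a simple numerical model of the delay cell as a linear phase modulator. The sketch below uses the 520 ps/volt gain and the 0.6 V to 1.5 V linear region quoted in this chapter; everything else is an assumption made for illustration, including the bias point, the standard deviation of the control voltage, the use of one control voltage sample per edge, and all function names.

```python
import random
import statistics

DELAY_GAIN_PS_PER_V = 520.0   # gain quoted in the text for the linear region
LINEAR_REGION_V = (0.6, 1.5)  # approximately linear range quoted in the text


def delay_shift_ps(v_ctrl, v_bias, gain_ps_per_v=DELAY_GAIN_PS_PER_V):
    """Delay change relative to the bias point; delay falls as v_ctrl rises."""
    return -gain_ps_per_v * (v_ctrl - v_bias)


def gaussian_jitter_ps(n_edges, v_bias, v_sigma, seed=0):
    """Per-edge delay shifts for a Gaussian control voltage (hypothetical source)."""
    rng = random.Random(seed)
    lo, hi = LINEAR_REGION_V
    shifts = []
    for _ in range(n_edges):
        # Clip the control voltage so the cell stays in its linear region.
        v = min(max(rng.gauss(v_bias, v_sigma), lo), hi)
        shifts.append(delay_shift_ps(v, v_bias))
    return shifts


if __name__ == "__main__":
    # Hypothetical bias of 1.05 V with a 50 mV standard deviation.
    shifts = gaussian_jitter_ps(20000, v_bias=1.05, v_sigma=0.05)
    print(f"rms jitter ~ {statistics.pstdev(shifts):.1f} ps")
    print(f"peak-to-peak ~ {max(shifts) - min(shifts):.1f} ps")
```

With a linear gain, the rms jitter is simply the gain multiplied by the standard deviation of the control voltage, so the assumed 50 mV corresponds to roughly 26 ps rms.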

4.3.5 - Jitter Measurements

Before a jitter measurement may be performed using the set-up prescribed in Figure 4.20, a priori knowledge of the number of falling and rising edges in the input test sequence is required in order to produce the jitter PDF. This count represents the maximum number of hits each counter should expect, and acts as the normalization factor that produces a jitter CDF and PDF that are referenced to unity.

![Figure 4.20 Set-up used to evaluate functionality of jitter measurement device](image)

For an input test sequence with $K$ rising edges, measured with an $M$ stage jitter measurement device, the probability of an edge occurring at the instant of time associated with a particular VDL stage, or the $i$-th bin in the jitter CDF, is given by

\[ CDF(i) = \frac{C(i)}{K} \quad M \geq i \geq 1 \quad (4.11) \]

where \( C(i) \) represents the count from the \( i \)-th counter. The jitter PDF, can then be defined as:

\[ PDF(i) = CDF(i) - CDF(i-1) \quad M \geq i \geq 2 \quad (4.12) \]

Given the above result, the RMS jitter may be defined as:

\[ RMS_{JITTER} = \sigma = \tau_{res} \cdot \sqrt{\sum_{i=2}^{M} i^2 \cdot PDF(i) - \left( \sum_{i=2}^{M} i \cdot PDF(i) \right)^2} \quad (4.13) \]

which represents the standard deviation of the jitter distribution for an LSB of \( \tau_{res} \). Substituting (4.11) and (4.12) into (4.13), results in:

\[ RMS_{JITTER} = \tau_{res} \cdot \sqrt{\sum_{i=2}^{M} i^2 \cdot \frac{C(i) - C(i-1)}{K} - \left( \sum_{i=2}^{M} i \cdot \frac{C(i) - C(i-1)}{K} \right)^2} \quad (4.14) \]

which directly provides the RMS jitter from the outputs of the edge counters. The peak-to-peak jitter is defined as:

\[ Pk\text{-}Pk_{JITTER} = \tau_{res} \cdot \left( \sum_{i=2}^{M} f[PDF(i)] \right) \quad (4.15) \]

where \( f[x] \) indicates whether an edge will ever occur in a particular location in time by providing a value of 1 through the following relation:

\[ f[x] = \begin{cases} 0 & 0 \geq x \\ 1 & 1 \geq x > 0 \end{cases} \quad (4.16) \]

Since \( x \) is always a value of the jitter PDF, it is bounded by zero and unity. The above relations may also be applied to the falling edges of the input waveform to arrive at their values of RMS and peak-to-peak jitter.
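As a rough illustration of how the RMS and peak-to-peak figures might be computed off-chip from the counter outputs, the following Python sketch takes the RMS jitter to be the standard deviation of the jitter PDF scaled by \( \tau_{res} \), and the peak-to-peak jitter to be \( \tau_{res} \) times the number of bins with a non-zero PDF. The function name and the counter values are hypothetical.

```python
import math


def jitter_stats(counts, num_edges, tau_res_ps):
    """Return (rms_ps, pk_pk_ps) from counter outputs C(1)..C(M).

    counts[0] is C(1); num_edges is K; tau_res_ps is the LSB in ps.
    """
    # PDF(i) for i = 2..M, from adjacent counter differences.
    pdf = [(counts[i] - counts[i - 1]) / num_edges
           for i in range(1, len(counts))]
    bins = range(2, len(counts) + 1)  # bin indices i = 2..M
    mean = sum(i * p for i, p in zip(bins, pdf))
    mean_sq = sum(i * i * p for i, p in zip(bins, pdf))
    # Standard deviation in bins, scaled by the LSB; clamp rounding error.
    rms_ps = tau_res_ps * math.sqrt(max(mean_sq - mean * mean, 0.0))
    # f[x] = 1 only for bins in which an edge was ever observed.
    pk_pk_ps = tau_res_ps * sum(1 for p in pdf if p > 0)
    return rms_ps, pk_pk_ps


# Hypothetical counter outputs: M = 6 stages, K = 8 edges, 18 ps LSB.
rms_ps, pk_pk_ps = jitter_stats([0, 0, 2, 6, 8, 8], 8, 18)
print(rms_ps, pk_pk_ps)
```

In this example three bins register edges, so the peak-to-peak jitter is three LSBs, while the RMS value reflects how the eight edges are distributed among those bins.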
The measurement equipment utilized limits the direct characterisation of the TDC to resolutions no finer than 78 ps. Using an indirect measurement scheme that relies on signals with a known deterministic jitter to extrapolate the performance of the device, the TDC was demonstrated to provide sampling resolutions finer than the 78 ps measurable limit. This technique involved generating several signals with various jitter distributions using the methodology described in Section 4.3.4, and measuring their characteristics with a Tektronix 11801C Digital Sampling Oscilloscope. The jitter measurement device was then set to a resolution below what was previously measurable by the tester, and used to sample the signals with known jitter characteristics. The peak-to-peak and rms jitter of the signals were then extracted in terms of the number of VDL cells that registered edge fluctuations, and compared to the measurements obtained with the Tektronix oscilloscope. An estimate of the resolution per cell may then be obtained through this comparison. Using the aforementioned technique, the TDC was demonstrated to provide a resolution of 18 ps. This LSB represents a factor of 10 improvement in resolution over the 180 ps/tap fundamental limit of the DLL, which was implemented in the same technology.
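The comparison step of this indirect scheme can be sketched as follows, under the assumption that the per-cell resolution is estimated by dividing the oscilloscope's peak-to-peak reading by the number of VDL cells that registered edges, and then averaging over the test signals. The function name is hypothetical, and the cell counts (18 and 37) are illustrative values chosen to be consistent with the figures reported in this chapter; they are not quoted measurements.

```python
def estimate_resolution_ps(measurements):
    """Estimate the TDC resolution per cell in picoseconds.

    measurements: list of (reference_pk_pk_ps, cells_with_edges) pairs,
    one pair per test signal with known jitter.
    """
    estimates = [ref / cells for ref, cells in measurements if cells > 0]
    if not estimates:
        raise ValueError("no usable measurements")
    return sum(estimates) / len(estimates)


# Oscilloscope peak-to-peak readings paired with assumed cell counts.
resolution_ps = estimate_resolution_ps([(318, 18), (644, 37)])
print(round(resolution_ps, 2))
```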

The functionality of the jitter measurement device was verified using a jittery 25 MHz clock signal. Two versions of the clock signal were created: one possessing a Gaussian jitter distribution, the other a Sinusoidal jitter distribution. The jitter histogram of the Gaussian distribution is presented in Figure 4.21, as measured by the 11801C Digital Sampling Oscilloscope. The signal possesses a peak-to-peak and rms jitter of 318 ps and 30.41 ps, respectively. The Sinusoidal jitter distribution, presented in Figure 4.22, possesses a peak-to-peak and rms jitter of 644 ps and 230.4 ps, respectively.

With the LSB of the jitter measurement device set to 18 ps, all 16384 cycles of the clock signal were captured. The resulting on-chip jitter CDF and PDF for the signal with a Gaussian jitter distribution are presented in Figure 4.23(a) and (b), respectively. Note the similarities between the histogram obtained from the oscilloscope and the PDF obtained by the jitter measurement device. Using (4.14) and (4.15), the measured rms and peak-to-peak jitter were calculated to be 27.61 ps and 324 ps, respectively. These results correlate with those obtained with the Tektronix oscilloscope.

Similarly, the jitter CDF and PDF for the signal possessing a Sinusoidal jitter distribution are presented in Figure 4.23(c) and (d), respectively. The measured rms and peak-to-peak jitter were determined to be 220.2 ps and 666 ps, respectively, which correlate with those obtained with the oscilloscope.
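The degree of correlation can be quantified with a short calculation of the relative difference between the on-chip and oscilloscope values quoted above. The sketch below is only an arithmetic aid; the helper name and the dictionary layout are hypothetical.

```python
def percent_difference(on_chip, reference):
    """Relative difference of the on-chip value from the reference, in %."""
    return 100.0 * (on_chip - reference) / reference


# (on-chip value, oscilloscope value) in ps, as reported in the text.
results = {
    "gaussian_rms": (27.61, 30.41),
    "gaussian_pk_pk": (324.0, 318.0),
    "sinusoidal_rms": (220.2, 230.4),
    "sinusoidal_pk_pk": (666.0, 644.0),
}
differences = {name: percent_difference(chip, scope)
               for name, (chip, scope) in results.items()}
for name, diff in differences.items():
    print(f"{name}: {diff:+.1f}%")
```

With these values, the on-chip rms readings fall below the oscilloscope readings by less than 10%, and the peak-to-peak readings exceed them by less than 4%.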

Figure 4.21 Gaussian Jitter Distribution measured with an Oscilloscope

Figure 4.22 Sinusoidal Jitter Distribution measured with an Oscilloscope

4.4 - Performance Limitations

Two main factors influence the performance of the jitter measurement device: the linearity of the delay lines, and the jitter in the delay lines. The non-linearity of the TDC will produce measurement errors because the sampling instants are not equally spaced in time. Given that these non-idealities may be characterized, as shown in Figure 4.18, they can be taken into consideration when processing the outputs of the counters. This measure will not improve the resolution of the device, but will help arrive at more accurate bounds for the measured jitter. Characterizing the non-linearities of each chip that contains the jitter measurement device will add to test times. As such, this measure would be more suited for situations where the device is used as a stand-alone jitter measurement tool.
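One possible way of taking a characterized non-linearity into account is sketched below: instead of assuming uniformly spaced bins of width \( \tau_{res} \), each bin is assigned its measured sampling instant. This is an assumed post-processing approach rather than a procedure described in this thesis, and the sampling instants and counter values are hypothetical.

```python
import math


def jitter_stats_calibrated(counts, num_edges, sample_times_ps):
    """Return (rms_ps, pk_pk_ps) using measured sampling instants.

    counts[0] is C(1); sample_times_ps[0] is the measured sampling
    instant of stage 1, and so on for all M stages.
    """
    pdf = [(counts[i] - counts[i - 1]) / num_edges
           for i in range(1, len(counts))]
    times = sample_times_ps[1:]  # instants of bins i = 2..M
    mean = sum(t * p for t, p in zip(times, pdf))
    variance = sum((t - mean) ** 2 * p for t, p in zip(times, pdf))
    # Width of bin i is the measured spacing between stages i-1 and i.
    widths = [sample_times_ps[i] - sample_times_ps[i - 1]
              for i in range(1, len(sample_times_ps))]
    pk_pk_ps = sum(w for w, p in zip(widths, pdf) if p > 0)
    return math.sqrt(variance), pk_pk_ps


# Hypothetical non-uniform sampling instants around an 18 ps nominal LSB.
cal_rms_ps, cal_pk_pk_ps = jitter_stats_calibrated(
    [0, 0, 2, 6, 8, 8], 8, [0, 18, 37, 54, 73, 90])
print(cal_rms_ps, cal_pk_pk_ps)
```

For the same counter outputs, a uniform 18 ps spacing would give a peak-to-peak value of 54 ps, whereas the assumed non-uniform spacing gives 55 ps, which illustrates how calibration adjusts the bounds without changing the resolution.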

Jitter on the input sampling clock and in the delay lines is also a major cause of error. The input sampling clock serves as the trigger signal for the sampling operation. As such, any fluctuations in the edge of this clock will cause the sampling window to shift in time with respect to the data of interest. This effect causes the measured time between the data and reference edges to be misrepresented. As such, a relatively jitter-free (with respect to the data signal) sampling clock is required. The effects of jitter on the input clock may, however, be suppressed through averaging, by taking a larger set of measurements.

Jitter in the delay lines has a similar effect on measured results as the TDC non-linearities, with the exception that these effects are time-varying and statistical as opposed to deterministic. Longer delay lines are also more prone to the effects of jitter, due to its cumulative nature. The first cell in the delay line will usually exhibit the least amount of jitter. The last cell will usually express the accumulated effects of the jitter caused by all its preceding cells. The jitter in a delay line is also proportional to its delay [13], implying that for lower resolutions, the jitter will be greater.

The experiment described in [40] and used to determine the jitter in a TDC based on a VDL sampler was repeated for the jitter measurement device. Two separate single rising edges were applied to the CLK and DATA terminals 100 consecutive times, with the displacement between them held constant. The variation in the counter hits is indicative of the jitter in the delay lines. This experiment was performed for a sampling resolution of 18 ps. Figure 4.24(a) presents the count histogram obtained for a small displacement between the DATA and CLK edges, implying that the sampled edge transition occurs at an early stage of the delay line. In this case the edge appears with the greatest probability at the 16th VDL sampling cell. The rms jitter at this stage of the delay line is measured to be 5.3 ps.
Figure 4.24(b) presents the count histogram obtained for a larger displacement between the DATA and CLK edges, implying that the sampled edge transition occurs at a later stage of the delay line. The edge appears with the greatest probability at the 76th sampling cell, with an associated rms jitter of 9 ps. These results reaffirm that a greater degree of delay line jitter is exhibited for a greater number of delay cells.

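If the jitter contributed by each cell is assumed to be independent, the rms jitter at the output of the \( n \)-th sampling cell, \( \sigma_n \), may be expressed as:

\[ \sigma_n = \sqrt{\sum_{i=1}^{n} \sigma_{\text{jitter}}^2} = \sqrt{n} \cdot \sigma_{\text{jitter}} \]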

where \( \sigma_{\text{jitter}} \) represents the rms jitter per sampling cell. From the above results, the value of \( \sigma_{\text{jitter}} \) for a sampling resolution of 18 ps is 0.82 ps.
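The accumulation of jitter along the delay line can be illustrated with a small Monte Carlo sketch. It assumes that each cell contributes an independent, zero-mean Gaussian timing error, in which case the variances add and the accumulated rms jitter grows with the square root of the number of cells. The per-cell value of 1 ps, the trial count and the function name are hypothetical.

```python
import random
import statistics


def simulate_accumulated_jitter(n_cells, sigma_cell_ps, trials=4000, seed=1):
    """Estimate the rms jitter after n_cells independent delay cells."""
    rng = random.Random(seed)  # fixed seed for a repeatable sketch
    totals = [
        sum(rng.gauss(0.0, sigma_cell_ps) for _ in range(n_cells))
        for _ in range(trials)
    ]
    return statistics.pstdev(totals)


# With 1 ps per cell, the simulated values should sit close to
# sqrt(16) = 4 ps and sqrt(64) = 8 ps, respectively.
sigma_16 = simulate_accumulated_jitter(16, 1.0)
sigma_64 = simulate_accumulated_jitter(64, 1.0)
print(sigma_16, sigma_64)
```

Quadrupling the number of cells roughly doubles the accumulated jitter in this model, in line with the trend that later stages of the delay line exhibit more jitter than earlier ones.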

4.5 - Summary

In this chapter, a jitter measurement device based on a VDL time-to-digital converter was introduced. The device is capable of sampling digital signals with sub-gate delay resolutions. A series of on-chip counters examines the sampled digital data and produces an edge density graph, from which rms and peak-to-peak jitter information may be extracted. The device has a minimal test time that is equal to the duration of the input data stream, and also provides for negligible loading on the CUT. Sampling resolutions on the order of 18 ps (25% of a gate delay) were demonstrated in a 0.35 \( \mu \text{m} \) process.

Chapter 5 - Conclusions

In the preceding chapters, two circuits for on-chip sub-nanosecond signal capture were presented. The first is the DLL-based timing module that allows for a hardware implementation of an undersampling algorithm. This undersampling algorithm was successfully used in conjunction with an on-chip mixed-signal test-core to capture waveforms with a bandwidth that extends well beyond the sampling speed of the system. The sampling resolution of the timing module is limited by the intrinsic gate delay of the technology in which it is implemented. In a 0.35 μm CMOS process implementation, an automatically tuned effective sampling rate of 1.6 GHz was achieved. Effective sampling rates in the range of 5.6 GHz may also be attained without the use of the automatic tuning mechanism. Given these achievable effective sampling rates, the use of the on-chip test-core may be extended to applications that require high bandwidth capture.
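The relation between the timing step and the effective sampling rate can be made concrete with a short calculation, under the usual assumption that the effective sampling rate of an undersampling scheme is the reciprocal of the time step between successive sampling instants. Relating the 5.6 GHz figure to the 180 ps/tap limit quoted in Chapter 4 is an inference made here for illustration, and the function names are hypothetical.

```python
def effective_sampling_rate_ghz(delay_step_ps):
    """Effective sampling rate in GHz for a given sampling step in ps."""
    return 1000.0 / delay_step_ps


def sampling_step_ps(rate_ghz):
    """Sampling step in ps that corresponds to a given rate in GHz."""
    return 1000.0 / rate_ghz


rate_at_180ps = effective_sampling_rate_ghz(180.0)  # about 5.56 GHz
step_at_1p6ghz = sampling_step_ps(1.6)              # about 625 ps
print(round(rate_at_180ps, 2), round(step_at_1p6ghz, 1))
```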

The linearity of the combined timing module and test-core operation is limited by the linearity of the VCDL, and the performance of the sampler. A calibration method for the VCDL was introduced that can reduce its INL and DNL by an order of magnitude. The resolution of the entire waveform capture operation is also limited by the jitter of the DLL. This source of error can be minimized through careful DLL design. The design trade-offs pertaining to the implementation of the timing module were also presented.

The sampler was deemed to be the main limiting factor in terms of resolution and linearity. An amplitude resolution of approximately 9 bits is achievable for low frequency signals, but rapidly deteriorates for high frequency signals. This is attributed mainly to the simplistic design of the sampler implemented in this particular prototype, which lacks mechanisms to compensate for dynamic error sources that are more pronounced in the capture of high speed signals.

The second circuit presented is a specialized jitter measurement device based on a VDL time-to-digital converter. This device samples an input digital stream with a sub-gate delay resolution, and constructs an on-chip jitter CDF. In a 0.35 μm CMOS implementation, a sampling resolution of 18 ps, approximately one-quarter of a gate delay, was achieved. The device can perform jitter tests at a minimal test time, equal to the duration of the input test sequence.

The factors affecting the performance of this device are similar to those that affect the aforementioned timing module, given that they are similar in construct. Linearity and jitter are the major performance limiting factors, and their effects can be alleviated through careful design and layout. An accurate external clock reference is also required for the successful operation of this device. As with the aforementioned test-core system though, calibration issues render the jitter measurement device impractical for production testing. It can however serve as a viable stand-alone tool for the measurement of jitter.

Both of the aforementioned systems share a commonality that results from striving to perform complex operations with simple circuits. Instead of attempting to design systems that perform true real-time sampling of high speed signals, the approach of delaying sampling signals by incremental intervals to arrive at high equivalent sampling rates was applied. The restriction to sampling periodic signals exclusively (as in the case of the test-core), or digital signals exclusively (as with the TDC), becomes a drawback, but one that can often be overlooked in test applications. It is through this methodology that on-chip sub-nanosecond sampling structures may be successfully implemented, as has been demonstrated in this thesis.
References

