# A Time-Based Approach for Multi-GHz Embedded Mixed-Signal Characterization and Measurement

Mouna Safi-Harab,

B.A.Sc. in Electrical Engineering, 2000

M. Eng., 2003

Department of Electrical Engineering, McGill University, Montréal



December 2006

A Thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

© Mouna Safi-Harab, 2006



Library and Archives Canada

Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-32322-9

### NOTICE:

The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.




### Abstract

The increasingly sophisticated systems now implemented on a single chip are placing stringent requirements on the test industry. New test strategies, equipment, and methodologies need to be developed to sustain the constant increase in demand for consumer and communication electronics. Built-in self-test (BIST) and design-for-test (DFT) strategies have been proven to offer more feasible and economical testing solutions.

Previous work has addressed on-chip testing, characterization, and measurement of signals and components. This thesis advances those techniques on several levels. In terms of performance, an increase of more than an order of magnitude in speed is achieved: a 70-GHz (effective sampling) on-chip oscilloscope is reported, compared to 4-GHz and 10-GHz ones in previous state-of-the-art implementations. Power dissipation is another area where the proposed work offers a superior solution compared to previous alternatives: none of the proposed circuits dissipates more than a few milliwatts. Finally, and possibly most importantly, all the proposed test circuits rely on a different form of signal processing, the time-based approach. It is believed that this approach paves the way for many new techniques and circuit design methods that can be investigated more deeply. As an integral part of the time-based processing approach for GHz signal capture, this thesis verifies the advantages of using time amplification. The use of such amplification in the time domain is demonstrated with experimental results from three integrated circuits performing different tasks in GHz-range, high-speed, in-situ signal measurement and characterization. Advantages of such time-based techniques, when combined with a front-end time amplifier, include noise immunity, the use of synthesizable digital cells, and circuit building blocks that track technology scaling in terms of area and speed.

### Résumé

Nowadays, the increasingly sophisticated systems integrated on a single chip place stringent requirements on the test industry. New strategies, equipment, and methodologies must be developed to sustain the growing demand for electronic components for the communications industry and for consumer applications. Built-in self-test (BIST) and design-for-test (DFT) strategies have proven to offer more viable and economical test solutions.

Previous work has investigated techniques for on-chip testing, characterization, and measurement of signals. This thesis advances these techniques on several levels. In terms of performance, an increase of more than an order of magnitude in speed is achievable. Indeed, an on-chip oscilloscope operating at 70 GHz (effective sampling rate) is reported, compared to 4 GHz and 10 GHz for current state-of-the-art implementations. Power consumption is another area where the proposed work performs better than previous alternatives. All the proposed circuits require only a few milliwatts of power while capturing signals with moderate resolution at very high speed (a few GHz). Finally, and most importantly, all the proposed test circuits are based on a different form of signal processing: processing in the time domain. There is reason to believe that this approach is a precursor to several new techniques and circuit design methods that can be investigated more deeply. This thesis verifies the advantages of using time-domain amplification as an integral part of the proposed signal processing approach. The use of such amplification is demonstrated by experimental results from three specific integrated circuits performing different tasks in high-speed (GHz) in-situ signal measurement and characterization. The advantages of this time-based signal processing approach, when used in conjunction with a time amplifier, include greater noise immunity, the use of synthesizable digital cells, and circuits able to track technology scaling in terms of speed and area.

# Acknowledgments

First and foremost, I would like to thank my supervisor, Professor Gordon W. Roberts, also my Master's thesis supervisor, for his continuous support and encouragement throughout this thesis and all my years of graduate study. Your enthusiasm and passion for circuits, test, and microelectronics in general are uplifting and contagious! Your technical guidance, financial support, and generosity in all respects, including the many sponsored trips to IEEE conferences, were truly incomparable. For all that you taught me in the past six years, I will always be indebted.

Thank you to my thesis committee, Professors Ramesh Abhari and Roni Khazaka for providing me with the appropriate feedback whenever needed.

I would like to thank Micronet, le Fonds Québécois de Recherche sur la Nature et les Technologies (FQRNT), the Walter C. Sumner Memorial Fellowship, and other internal McGill scholarships for financial support, and the Canadian Microelectronics Corporation (CMC) for providing the simulation tools and fabricating the prototype ICs.

Thank you to all the MACS lab people, present and past, who made my stay at McGill an enjoyable experience. In particular, I would like to thank Mourad Oulmane. The time amplifier, a core block on which the ideas presented in this thesis rely, was developed by Mourad as part of his Ph.D. thesis. Thank you also for the interesting and stimulating technical (and non-technical!) discussions we have had. Other members of the MACS lab with whom I shared many years and interesting discussions should be mentioned: Sadok Aouini, who also provided the French translation of the abstract, Dong An, Philippe Salib, and Chris Taillefer. A special thank you also goes to Rola Abdul-Baki and Sandra Skaff for being true friends, and for the fun breaks we took together from the research world!

Other people did not contribute to the technical aspect of this work, but provided me with the emotional stability and support that were essential to conduct the work, and deserve to be mentioned.

My parents-in-law Mahmoud and Nada deserve a special thank you for being supportive throughout the course of the work of this dissertation.

A heartfelt thank you to my parents, Riad and Fadia, my brothers, Sobhi, Ahmad, and Ziad, and especially my sister and best friend, Samar. Despite the long distances that separate us, you were a constant source of support, patience, and encouragement. I am so proud and thankful to have a family as loving and as sacrificing as you are. My brother- and sisters-in-law Khodr, Salma, and Alyssar are acknowledged for their love and encouragement.

Last but not least, to my husband Mazen Farran. Thank you for being by my side, and in particular during the long evenings and weekends while testing the chips. This is just one aspect of your contribution to this work. A bigger aspect is through your love, patience, constant support, and endless encouragement. To you I dedicate this work. Finally, to my son Dani who was the main motivation to get this dissertation completed just in time for his arrival to the world!

# **Claim of Originality**

In this thesis, the following claims of originality are reported:

✤ The broad contribution of this thesis lies in its demonstration of the feasibility of time-based signal processing for GHz-range embedded signal measurements. This processing method is demonstrated in three different applications, using three different custom-built integrated circuits in a deep submicron complementary metal-oxide semiconductor (CMOS) technology.

✤ A new method for the single-shot high-speed measurement of specific timing characteristics of digital signals is proposed and relies on the concept of voltage-crossing detection for time conversion, followed by time amplification and time digitization. This concept was applied and verified in two designs:

- A digital signal rise time measurement circuit capable of measuring signal edges as fast as 1 GV/s (equivalently, a 1-V change in 1 ns) is proposed in Chapter 4.
- A narrow pulse measurement system is also presented in Chapter 4, capable of measuring pulse widths as narrow as 78 ps.

✤ In the case of analog signals, the usual undersampling with sample-and-hold operations is adopted. However, the proposed circuit differs from previously reported approaches in the digitization process: the digitization adopted in this work relies on time-domain parallelism. The parallel or "flash" architecture is implemented in the time domain and is preceded by a time amplification stage, which relaxes the resolution requirements and makes the design of the following stages less demanding. For validation purposes, the design of an undersampled 70-GHz on-chip oscilloscope with time-based digitization is presented in Chapter 3.

✤ Section 3.5 proposes an elegant calibration technique intended for the oscilloscope circuit. Calibration is becoming of paramount importance in deep submicron technologies, where ideal or near-ideal circuits are far from existing, and where calibration schemes are used in conjunction with simple circuit components to compensate for their non-ideal behaviour. Adopting simplicity and reduced area overhead for the proposed calibration scheme was a priority in order to ensure that the benefits of the in-situ measurement circuit are not outweighed by the overhead costs of the associated calibration. In particular, only low-end digital testers are required, with minimal area overhead needed to implement a digital selection block and an analog multiplexer.

✤ Experimental results constitute a large part of the work presented, whereby integrated circuits (ICs) are designed, fabricated, and tested. For testing purposes, custom-made printed circuit boards (PCBs) are designed and a commercial mixed-signal tester is programmed and used. Three ICs are implemented in a 0.18 micron CMOS technology, with world-class performance in terms of power, speed, and originality reported in Chapter 5.
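To make the notion of "effective sampling" in the undersampled-oscilloscope claim above more concrete, the following is a minimal sketch of generic equivalent-time undersampling of a periodic signal. It is not the circuit described in this thesis: the function name, the 1-GHz test tone, and the ideal, noiseless sampler are all assumptions made purely for illustration. The only point is that sampling once per signal period plus a small increment Δt yields the same samples as sampling at an effective rate of 1/Δt, which is roughly 14.3 ps per sample for a 70-GHz effective rate.

```python
import math


def equivalent_time_samples(f_signal_hz, effective_rate_hz, n_samples):
    """Ideal equivalent-time undersampling of a periodic sine wave.

    Hypothetical helper for illustration only: one sample is taken every
    (signal period + dt), so consecutive samples advance by dt along the
    repeating waveform, giving an effective sampling rate of 1 / dt.
    """
    t_signal = 1.0 / f_signal_hz        # period of the repetitive input
    dt = 1.0 / effective_rate_hz        # equivalent-time step between samples
    t_sample = t_signal + dt            # actual (slow) sampling period
    samples = [
        math.sin(2.0 * math.pi * f_signal_hz * k * t_sample)
        for k in range(n_samples)
    ]
    return dt, t_sample, samples


# Assumed example values: a 1-GHz tone captured at a 70-GHz effective rate.
dt, t_sample, samples = equivalent_time_samples(1e9, 70e9, 70)
print(f"equivalent-time step: {dt * 1e12:.2f} ps")
print(f"actual sampling clock: {1.0 / t_sample / 1e6:.1f} MHz")
```

In this idealized setting the real sampling clock runs slightly below the signal frequency, yet the 70 captured points trace out one full period of the tone in steps of Δt.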

The work described above was presented at numerous IEEE conferences and resulted in a few international awards. The work on narrow pulse characterization achieved second place at the 2006 DAC/ISSCC student design contest sponsored by the Solid-State Circuits Society (SSCS), and resulted in an invitation to attend both ISSCC and DAC with poster/oral presentations, in San Francisco, in February and July 2006, respectively. The DAC presentation resulted in an invitation to submit to the IEE Journal on Digital Computers and Test. The work on pulse measurement also won the best graduate paper award at the 2006 IMTC, in Sorrento, Italy, in April 2006, with a prize of excellence and a competitive travel award. This award was presented by the Instrumentation and Measurement (I&M) Society. Finally, the work on the on-chip oscilloscope was highly ranked and well received at the 2006 CICC, also sponsored by the SSCS, which resulted in an invitation to the special issue of the Journal of Solid-State Circuits (JSSC).

In addition, the extensive literature survey of the design-for-test techniques reported to date, presented in Chapter 2, constitutes an invited chapter contribution to an IEE book entitled: "Test and Diagnosis of Analogue and Mixed-Signal Integrated Circuits: The System on Chip Approach", Y. Sun (Editor), The Institute of Electrical Engineering, UK.

Finally, a full U.S. patent on high-speed time-based signal capture has been filed.

## **Table of Contents**

- Abstract
- Résumé
- Acknowledgments
- Claim of Originality
- Table of Contents
- List of Figures
- List of Tables
- Glossary of Terms
- Chapter 1 - Introduction
  - 1.1 - Motivation
  - 1.2 - Importance of Test Integration
  - 1.3 - Scope of This Thesis
    - 1.3.1 - … Techniques
    - 1.3.2 - … Measurement
    - 1.3.3 - High-Speed Digital Measurement
  - 1.4 - Thesis Outline
- Chapter 2 - Recent Trends in Test Integration: DFT and BIST Techniques
  - 2.1 - Introduction
  - 2.2 - Signal Generation
    - 2.2.1 - Direct Digital Frequency Synthesis
    - 2.2.2 - Oscillator Based
    - 2.2.3 - Memory Based
    - 2.2.4 - Extension to Multi-Tones
    - 2.2.5 - Area Overhead
  - 2.3 - Signal Capture
  - 2.4 - Timing Measurements and Jitter Analyzers
    - 2.4.1 - Single Counter
    - 2.4.2 - Analog-Based Interpolation Techniques: Time-to-Voltage Converter
    - 2.4.3 - Digital Phase-Interpolation Techniques: Delay Line
    - 2.4.4 - Vernier Delay Line
    - 2.4.5 - Component Invariant VDL for Jitter Measurement
    - 2.4.6 - Analog-Based Jitter Measurement Device
    - 2.4.7 - Time Amplification
    - 2.4.8 - PLL and DLL – Injection Methods for PLL Tests
  - 2.5 - Calibration Techniques for TMU and TDC
  - 2.6 - Complete On-Chip Test Core: Proposed Architecture in [13] and Its Versatile Applications
    - 2.6.1 - Attractive and Flexible Architecture
    - 2.6.2 - Oscilloscope/Curve Tracing
    - 2.6.3 - Coherent Sampling
    - 2.6.4 - Time Domain Reflectometry/Transmission
    - 2.6.5 - Crosstalk
    - 2.6.6 - Supply/Substrate Noise
    - 2.6.7 - Radio Frequency Testing - Amplifier Resonance
    - 2.6.8 - Limitations of the Proposed Architecture in [13]
  - 2.7 - Recent Trends
  - 2.8 - Conclusions
- Chapter 3 - Time-Based Digitization for Analog Signals
  - 3.1 - Introduction
  - 3.2 - System-Level Description
    - 3.2.1 - System Overview
    - 3.2.2 - Proposed Calibration
  - 3.3 - Circuit Details
    - 3.3.1 - Voltage-Controlled-Delay Cell 1: Clocking Scheme
    - 3.3.2 - Voltage-Controlled-Delay Cell 2: Signal Capture
    - 3.3.3 - Time Amplifier
    - 3.3.4 - TDC and Additional Digital Logic
    - 3.3.5 - The CUT
  - 3.4 - Design Choices: Resolution, Speed, and Area Trade-Offs
    - 3.4.1 - Effective Sampling Rate
    - 3.4.2 - Effective Voltage Resolution
  - 3.5 - System Calibration
  - 3.6 - Test Time
    - 3.6.1 - Signal Capture Time
    - 3.6.2 - Calibration Time
  - 3.7 - Conclusions
- Chapter 4 - Real-Time Single-Shot Digital Measurements
  - 4.1 - Introduction
  - 4.2 - A Closer Look at Pulse Measurement Systems
    - 4.2.1 - Classes of Pulse Measurement Circuits
    - 4.2.2 - Time-Walk Effects in Pulse Measurement Systems
  - 4.3 - Proposed Systems Description
  - 4.4 - Circuit Details
    - 4.4.1 - Front-End Voltage-Crossing Detector
    - 4.4.2 - Front-End Voltage-Crossing Detector Buffers
    - 4.4.3 - Front-End Pulse-Detector Circuit
    - 4.4.4 - Front-End Pulse-Detector Operation
    - 4.4.5 - Front-End Voltage-Crossing Detector Buffers
    - 4.4.6 - Time Amplifier/TDC/Serial Shift-Out
  - 4.5 - Systems Verification
    - 4.5.1 - Edge Measurement System
    - 4.5.2 - Pulse Measurement System
  - 4.6 - Observation and Comments
    - 4.6.1 - Noise Immunity
    - 4.6.2 - Effect of Non-Symmetry in Pulse Rise/Fall Times
  - 4.7 - Conclusions
- Chapter 5 - Experimental Results
  - 5.1 - On-Chip Oscilloscope IC Implementation and Experimental Results
    - 5.1.1 - IC and Test Setup
    - 5.1.2 - System Calibration and Experimental Results
    - 5.1.3 - Measurement Mode
    - 5.1.4 - Comments
  - 5.2 - Rise Time Measurement IC Experimental Results
    - 5.2.1 - IC and Test Setup
    - 5.2.2 - Measured Results
    - 5.2.3 - Comments
  - 5.3 - Pulse Receiver IC Implementation and Experimental Results
    - 5.3.1 - IC and Test Setup
    - 5.3.2 - Measured Data and Discussion of Results
    - 5.3.3 - Additional Comments
  - 5.4 - Experimental Results Summary
- Chapter 6 - Concluding Remarks
  - 6.1 - Thesis Findings Summary
  - 6.2 - Recommendation for Future Work
- References

# **List of Figures**

- Figure 1.1 System-level approach used throughout this research for arbitrary waveforms
- Figure 2.1 Functional behavioural description: (a) digital (b) analog
- Figure 2.2 Block diagram of the MADBIST scheme [7]
- Figure 2.3 Block diagram of the complete tester on-chip [13]
- Figure 2.4 Conventional analog signal generation
- Figure 2.5 Digitally-driven analog signal generation based on DDFS
- Figure 2.6 Digital resonator
- Figure 2.7 Improved digital resonator with the multiplier replaced with a 1-bit multiplexer
- Figure 2.8 Looping back of a selected set of a $\Delta\Sigma$ modulator output bitstream
- Figure 2.9 Conceptual illustration of a multi-bit signal generation
- Figure 2.10 Area overhead and partitioning of the bitstream signal generation method with (a) analog stimulus, (b) analog stimulus using DSP-based techniques and explicit filtering operation, and (c) digital test stimulus, while relying on the DUT built-in (implicit) filtering operation
- Figure 2.11 Evolution of the analog signal capture capabilities, (a) without and (b) with analog buffering
- Figure 2.12 Signal capture with (a) focus on the comparator and (b) showing the digitization process
- Figure 2.13 Illustration of the undersampling algorithm
- Figure 2.14 Illustration of the undersampling and multi-pass methods
- Figure 2.15 A simple counter method for the measurement of a time interval
- Figure 2.16 Interpolation-based time-to-digital converter [28]
- Figure 2.17 Interpolation-based time-to-digital converter insensitive to the absolute value of the capacitor, also known as the time or pulse stretcher [29]
- Figure 2.18 A delay line used to generate multi-phase clock edges. Also can be used to measure clock jitter with time resolution set by the minimum gate delay offered by the technology
- Figure 2.19 A Vernier delay line achieving sub-gate time resolution
- Figure 2.20 A component-invariant, single-stage Vernier delay line jitter measurement device [37]
- Figure 2.21 An analog-based macro for jitter measurement [38]
- Figure 2.22 An example of the received eye diagram of high-speed link, with acceptable openings shown, both in the voltage and time scales, defining the mask. Violations of those limits are detected by the eye-opening monitor suggested in [43]
- Figure 2.23 A MUTEX time amplifier [44]
- Figure 2.24 A single-stage time amplifier [3]
- Figure 2.25 PLL system view
- Figure 2.26 Analogy between stimulating an ADC and a PLL with a $\Delta\Sigma$ bitstream, for testing purposes
- Figure 2.27 Architecture for an almost all-digital on-chip oscilloscope
- Figure 2.28 Emphasis on the digital-in digital-out interface of the BIST proposed
- Figure 2.29 Possibility of a reduced on-chip core, and therefore, reduced area, while maintaining a digital-in digital-out interface
- Figure 2.30 Supply noise measurement block diagram [51]
- Figure 2.31 Focused board-level RF testing
- Figure 3.1 Signal capture using undersampling
- Figure 3.2 On-chip multi-pass under-sampled algorithm high-level implementation
- Figure 3.3 Proposed system
- Figure 3.8 TDC schematics
- Figure 3.9 CUT: Transmission-line structures for on-chip (far-end) crosstalk measurement. Digital circuitry (not shown) controls which and how many aggressor lines are switched on; either (1) and (2) alone, (3) and (4) alone, or all four aggressor lines switching simultaneously
- Figure 3.10 Step 1 of the calibration scheme
- Figure 4.2 Time-to-voltage converter schematics followed by a dual-slope ADC [72]
- Figure 4.3 Synchronous versus Asynchronous signal sampling [79][80]
- Figure 4.7 Simulation results showing a) the effect of adding the capacitor C on the current "spike" available to discharge the output node, and b) its effect on the output discharge rate
- Figure 4.8 Simplified schematics of the pulse-to-edge converter
- Figure 4.10 (a) Transfer characteristics of the front-end voltage-crossing detector, (b) its absolute error, and corresponding (c) differential linearity error
- Figure 4.11 System input-output relationship, including the time amplifier
- Figure 5.5 Corresponding results from step 2 in the calibration. Results shown here with the reference voltage of the reference DVCD<sub>clk</sub> cell, V<sub>ref,clk</sub>, set to 1.1 V, and V<sub>var,clk,cal</sub> varying between 1.1 V and 0.9 V in steps of 5 mV. Displayed is the …
- Figure 5.8 Experimental results (-) for different switching activity, and 10-mV increments for DVCD<sub>clk</sub>. Three cases are considered: the two close, far, and all four lines are switching. In all cases, the simulation results (-.) are superimposed. Also shown is the setup used to capture the larger voltages of the crosstalk waveform
- Figure 5.10 Edge measurement IC photograph

# **List of Tables**

- Theoretical performance summary of the on-chip oscilloscope
- Summary of the measured oscilloscope specifications

# **Glossary of Terms**

| Acronym | Definition |
|---------|------------|
| AC | Alternating Current |
| ADC | Analog-to-Digital Converter |
| ATE | Automatic Test Equipment |
| BER | Bit Error Rate |
| BIST | Built-In-Self-Test |
| CDF | Cumulative Distribution Function |
| CFD | Constant Fraction Discriminator |
| CMOS | Complementary Metal-Oxide Semiconductor |
| CODEC | Coder Decoder |
| CUT | Circuit Under Test |
| DAC | Digital-to-Analog Converter |
| DC | Direct Current |
| DDFS | Direct Digital Frequency Synthesis |
| DFF | D-type Flip-Flop |
| DFT | Design-For-Testability (or Design-For-Test) |
| DIB | Device Interface Board |
| DLL | Delay-Lock Loop |
| DSP | Digital Signal Processing |
| DUT | Device Under Test |
| DVCD | Differential Voltage-Controlled Delay |
| EOM | Eye Opening Monitor |
| FFT | Fast Fourier Transform |
| FS | Full Scale |
| FWHM | Full Width Half Magnitude |
| IC | Integrated Circuit |
| IEEE | Institute of Electrical and Electronics Engineers |
| IIP | Input Inter-modulation Product |
| IP | Intellectual Property |
| LSB | Least-Significant Bit |
| MADBIST | Mixed Analog-Digital Built-In-Self-Test |
| MUTEX | Mutual Exclusive |
| PCB | Printed Circuit Board |
| PDM | Pulse-Density Modulated |
| PFD | Phase Frequency Detector |
| PGA | Programmable Gain Amplifier |
| PLL | Phase-Lock Loop |
| PVT | Process-Voltage-Temperature |
| RF | Radio Frequency |
| RMS | Root Mean Square |
| ROM | Read-Only Memory |
| S/H | Sample and Hold |
| SAR | Successive Approximation Register |
| SFDR | Spurious-Free Dynamic Range |
| SNR | Signal-to-Noise Ratio |
| SoC | System-on-Chip |
| SOTDC | Sampling-Offset Time-to-Digital Converter |
| SPC | Statistical Process Control |
| STC | Semiconductor Test Consortium |
| TDC | Time-to-Digital Converter |
| TDT | Time Domain Transmission |
| TDR | Time Domain Reflectometry |
| T.L. | Transmission Line |
| TMU | Time Measurement Unit |
| TVC | Time-to-Voltage Converter |
| VCD | Voltage-Controlled Delay |
| VCO | Voltage-Controlled Oscillator |
| $\Delta\Sigma$ | Delta Sigma |

# Chapter 1 - Introduction

### 1.1 - Motivation

The continuous decrease in the cost of manufacturing a transistor, mainly due to the exponential decrease in the minimum feature length of CMOS technology, has enabled higher levels of integration and the creation of extremely sophisticated and complex designs and systems-on-chip (SoCs). This increase in packing density has been coupled with a cost-of-test per transistor that has remained fairly constant over the past two decades. In fact, the Semiconductor Industry Association (SIA) predicts that by the year 2014, testing a transistor with a projected minimum length of 35 nm might cost more than manufacturing it [1].

Several factors have kept the cost-of-test per transistor fairly flat over the past years. While transistor dimensions have been shrinking, the same cannot be said about the number of I/O pins needed. In fact, the increased packing density and operational speeds have been inevitably linked to an increased pin count. Firstly, maintaining a constant pin-count-to-bandwidth ratio can be achieved through parallelism, where a pin count penalty is inevitable. Secondly, the increased power consumption implies an increased number of dedicated supply and ground pins for reliability purposes. Thirdly, the increased complexity and the multiple functionalities implemented in today's SoCs entail the need for an increased number of probing pins for debugging and testing purposes. All the above-mentioned reasons, among others, have resulted in an increased test cost. Testing high-speed analog and mixed-signal designs in particular is becoming a more difficult task, and observing critical nodes in a system is becoming increasingly challenging if the test cost is to be maintained at an acceptably low level.

### 1.2 - Importance of Test Integration

As the technology keeps scaling, especially past the 90 nm node, the number of metal layers and the packing densities are increasing, with signal bandwidths extending beyond the GHz range and rise times shortening accordingly. Viewing tools such as wafer or on-chip probing are no longer feasible, since the large parasitic capacitive loading of a contacting probe would dramatically disturb the normal operation of the circuit. On the other hand, the Automatic Test Equipment (ATE) interface has become a major bottleneck in delivering signals with high fidelity, due to the significant distances the signals have to travel at such operational speeds. In addition, the ATE cost is exploding to keep up with the ability to test complex integrated SoCs. In fact, a \$20 million ATE system, capable of testing such complicated systems, was forecast by the SIA roadmap. Embedded test techniques, benefiting from electrical proximity, area overhead scaling, and bandwidth improvements leading to at-speed testing, therefore constitute the key to an economically viable test platform. When such test solutions are placed on the chip, they become known as structural test or BIST. It is important to note that test solutions can also be placed at the board level, or as part of the requirements of the ATE. Each solution entails verification of signal fidelity, assigns responsibility to different people (the designer, the test engineer, or the ATE manufacturer), and requires different calibration techniques and different test instruments, all of which directly impact the test cost, and therefore the overall part cost to the consumer.

The next chapter highlights the latest developments in embedded mixed-signal testing, specifically for the purpose of design validation and characterization. It is important to point out that considerable effort is also being devoted to placing more components on the board, and to combating the exploding cost of large ATE systems through low-cost ones, specifically to address the volume or production testing cost of semiconductor devices. That discussion is beyond the scope of this thesis, although some of the recent trends in the testing industry will be briefly highlighted.

### 1.3 - Scope of This Thesis

In this thesis, the ability to perform high-speed mixed-signal characterization and measurement is investigated. In particular, a new class of waveform processing techniques is introduced, whereby the majority of the signal processing, with the exception of a front-end signal-conditioning stage, is performed in the time domain. This will prove beneficial in capturing high-speed signals with a small power budget, little circuit complexity, and the possibility of multi-GHz operational speed. While the motivation behind this work is to perform embedded signal capture for test applications, thereby attempting to offset the ever-increasing cost of test in mixed-signal systems, the proposed digitization technique is general enough to be used in a multitude of other applications ranging from sensory to biomedical to communication systems.

#### 1.3.1 - Time-Based Processing Techniques

Analog signal processing techniques generally consist of manipulating physical quantities such as voltage, current, and charge. Correspondingly, voltage-, current-, and charge-based classes of circuits exist. Voltage-mode circuits are those with voltage inputs and outputs, and voltage operational amplifiers are the most common example in this category. Log-domain filters, for example, fall in the current-mode class of circuits, while charge-coupled devices (also known as CCDs) are the most common devices relying on charge-mode processing. In this thesis, we will focus on a new processing technique which relies mainly on time-domain events to perform the signal processing.

Another goal of this dissertation is to perform the time-based signal capture on multi-GHz signals, which is believed to be more of a challenge than low-speed signal capture. To achieve high-speed signal capture while still relying on relatively simple circuit components, the concept of digital time amplification will be used extensively, relaxing the requirements on the subsequent time measurement and digitization stages. With this time amplification, the information can be processed with simple synthesizable digital circuits that 1) are easy to design, 2) take little design time when using the technology's built-in standard libraries, and 3) scale favourably in terms of area and speed with technology scaling. Of paramount importance is the ease of calibration of those blocks, which makes parallelism practical without concern for matching. In fact, the preceding amplification stage places even fewer requirements on the following stages, which makes those of less concern as well. Voltage amplification preceding an analog-to-digital converter (ADC) has long been used and is in fact common practice in the design industry. The advantage of such an amplification stage is that it extends the overall dynamic range of the following ADC stage by adjusting the gain of the preceding amplifier, otherwise known as a variable gain amplifier. In fact, this very same concept is used inside ATE to achieve an overall system dynamic range exceeding 100 dB with an ADC whose resolution corresponds to much less than 100 dB.
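As a rough illustration of this dynamic-range extension, the short sketch below applies the standard 6.02N + 1.76 dB rule for the ideal dynamic range of an N-bit ADC and adds, to first order, the adjustable gain range of the preceding amplifier. The 12-bit resolution and the 30 dB gain range are assumed values chosen only for illustration; they are not figures from this thesis.

```python
def ideal_adc_dynamic_range_db(bits: int) -> float:
    """Ideal quantization-limited SNR of an N-bit ADC for a full-scale sine."""
    return 6.02 * bits + 1.76


def system_dynamic_range_db(bits: int, gain_range_db: float) -> float:
    """First-order estimate: ADC dynamic range plus the amplifier gain range."""
    return ideal_adc_dynamic_range_db(bits) + gain_range_db


# Hypothetical numbers, chosen only for illustration.
adc_dr = ideal_adc_dynamic_range_db(12)
total_dr = system_dynamic_range_db(12, 30.0)
print(f"ADC alone: {adc_dr:.1f} dB, with variable gain: {total_dr:.1f} dB")
```

Under these assumptions, a converter limited to roughly 74 dB on its own covers more than 100 dB once the gain is made adjustable, which mirrors the role the time amplifier plays ahead of the time digitizer.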

Some work, albeit very little, has been carried out in the area where the time-voltage analogy, or duality, is used for signal processing. In [2], phase-lock loop (PLL) and delay-lock loop (DLL) testing was formulated in terms of techniques that control the alignment of digital edges, and therefore phases, as opposed to the control of a voltage or amplitude quantity in the case of voltage-based testing. In addition, digital time amplification, which was recently introduced in [3], will be used as an essential building block of the systems proposed in this dissertation. More on the previous work in the time-based circuits category will be presented in Section 2.4.

The general block diagram of the approach adopted throughout this dissertation for time-based high-speed signal measurement is shown in Figure 1.1. As briefly mentioned earlier, the information to be captured, being generally a voltage signal, undergoes a front-end signal-conditioning stage that converts the appropriate voltage into time. Once the information is suitably conditioned, time-based processing is performed. Two cases will be considered. The first is the general case where an arbitrary analog voltage waveform is to be captured, while the second corresponds to a digital signal from which only prespecified time parameters are to be extracted.





#### 1.3.2 - High-Speed Analog Measurement

In this first case of an arbitrary analog waveform, the front-end signal conditioning block consists first of a sample-and-hold circuit used to undersample the voltage information. A voltage-to-time converter then follows to convert the voltage information into time edges. Following a time amplification stage, the digitization is then performed in the time domain using a simple circuit arrangement of easily synthesizable digital cells.

#### 1.3.3 - High-Speed Digital Measurement

In the second case, the signals are assumed to be digital and too fast for a front-end sample-and-hold to capture the information in a timely manner and with high fidelity. In this particular case, new circuits and systems need to be devised. For that, some assumptions are made whereby only specific timing information is needed. The system design is therefore simplified, eliminating the need for a sample-and-hold and performing the measurement at-speed and in real time. Two examples will be considered in this case: the rise time measurement in digital signals, and the picosecond narrow pulse measurement. In both of these cases, the systems consist of a front-end block performing the at-speed detection of the time information to be captured, followed by the time amplifier and time digitizer blocks.

Experimental results will further verify the correct functionality of the proposed systems. In fact, very high-speed operation is achieved experimentally, and from a first design spin, which further demonstrates the simplicity of the proposed designs despite their state-of-the-art performance.

### 1.4 - Thesis Outline

Chapter 1 provided a brief motivation for the work conducted in this dissertation. Namely, the time-based approach was presented in the context of its advantages for multi-GHz in-situ signal capture and characterization. Time-based techniques developed so far in the literature, albeit not numerous, were briefly mentioned. The scope of this thesis was then presented, with a focus on the proposed systems, how they differ from previous approaches, and how they make use of time amplification and time-based processing.

Chapter 2 provides a detailed literature survey of the common mixed-signal techniques that have been developed so far for DFT and BIST. The evolution of these techniques is presented comprehensively. The abundance of these techniques, and the continuous effort and research in this area, show the importance of and need for continuing to develop more clever integrated test solutions.

Chapters 3 and 4 present three examples of circuits that rely on the high-speed time-based processing approach to achieve different tasks. In Chapter 3, the focus is placed on digitizing an arbitrary analog signal with an effective 70-GHz sampling rate. An on-chip oscilloscope circuit is therefore developed, and will be proven capable of capturing, in-situ, high-speed signals. For validation and illustration purposes, the high-speed signals are generated on-chip in the current work using a transmission line aggressor-victim model. In Chapter 4, further simplifying assumptions are made about the shape of the signal: the signal is assumed to be digital, and only the at-speed extraction of some of its timing characteristics is desired. A rise-time measurement system and a narrow-pulse measurement system are therefore proposed, capable of achieving high GHz-range digital signal measurement on-chip.

Experimental results from integrated prototype circuits are presented in Chapter 5, demonstrating the capabilities the proposed systems can achieve. Comparisons between simulation and experimental results are shown, with attempts to explain any discrepancy whenever appropriate.

A summary of the findings of this thesis, together with concluding remarks, is finally presented in Chapter 6. Methods to improve the current work and recommendations for future work are also briefly discussed.

# Chapter 2 - Recent Trends in Test Integration: DFT and BIST Techniques

### 2.1 - Introduction

The standard methodologies for testing digital circuits are simple, consist largely of scan chains and automatic test pattern generators, and are usually used to test for catastrophic and processing/manufacturing errors. In fact, digital testing, including digital BIST, has become quite mature and is now cost effective [4][5]. The same cannot be said about analog tests, which are performed for a totally different reason: meeting the design specifications under process variations, mismatches, and device loading effects. While digital circuits are either good or bad, analog circuits are tested for their functionality within acceptable upper and lower performance limits, as shown in Figure 2.1.



Figure 2.1 Functional behavioural description: (a) digital (b) analog.

Analog circuits have a nominal behaviour and an uncertainty range. The acceptable uncertainty range and the error or deviation from the nominal behaviour are heavily dependent on the application. In today's high-resolution systems, it could well be within 0.1% or lower. This makes the requirements extremely demanding on the precision of the test equipment and methods used to perform those tests. Added to this problem is the increased test cost when testing is performed after the integration of the component to be tested into a bigger system. As a rule of thumb, it costs ten times more to locate and repair a problem at the next stage when compared to the previous one [6]. Testing at early design stages is therefore economically beneficial. This paradigm, in which trade-offs between functionality, performance, and feasibility/ease of test are considered early in the design stages, has come to be known as DFT.

Ultimately, one would want to reduce, if not eliminate, the test challenges as semiconductor devices exhibit better performance and higher levels of integration. The most basic test setup for analog circuits consists of exciting the device under test (DUT), sometimes referred to as the circuit under test (CUT), with a known analog signal such as a direct current (DC), sine, ramp, or arbitrary waveform, and then extracting the output information for further analysis. Commonly, the input stimulus is periodic to allow the test results to be mathematically averaged over long observation time intervals, reducing the effect of noise [7]. Generally, the stimulus is generated using a signal generator and the output instrument is a root-mean-square (RMS) meter which measures the amount of RMS power over a narrow but variable frequency band. A preferred test setup is the digital signal processing (DSP) based measurement for both signal generation and capture.
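The benefit of a periodic stimulus can be illustrated with a minimal sketch, assuming additive, uncorrelated Gaussian noise on each captured period (an assumption made here for illustration only). Averaging P periods point by point reduces the residual noise RMS by roughly the square root of P. The function names and the numerical values are hypothetical.

```python
import math
import random


def noisy_period(n_samples, amplitude, noise_rms, rng):
    """One captured period of a sine response with additive Gaussian noise."""
    return [amplitude * math.sin(2 * math.pi * i / n_samples)
            + rng.gauss(0.0, noise_rms) for i in range(n_samples)]


def average_periods(n_periods, n_samples, amplitude, noise_rms, rng):
    """Average n_periods captures point by point."""
    acc = [0.0] * n_samples
    for _ in range(n_periods):
        capture = noisy_period(n_samples, amplitude, noise_rms, rng)
        for i, v in enumerate(capture):
            acc[i] += v
    return [a / n_periods for a in acc]


def residual_rms(samples, amplitude):
    """RMS of the difference between the capture and the noiseless sine."""
    n = len(samples)
    err = [s - amplitude * math.sin(2 * math.pi * i / n)
           for i, s in enumerate(samples)]
    return math.sqrt(sum(e * e for e in err) / n)


rng = random.Random(0)  # fixed seed so the run is repeatable
single = residual_rms(average_periods(1, 256, 1.0, 0.1, rng), 1.0)
averaged = residual_rms(average_periods(100, 256, 1.0, 0.1, rng), 1.0)
print(f"noise RMS, 1 period: {single:.4f}; 100 periods: {averaged:.4f}")
```

With 100 averaged periods the residual noise should drop by about a factor of ten, at the price of a proportionally longer observation time.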

Most, if not all, modern test instruments rely on powerful DSP techniques for ease of automation [8] and increased accuracy and repeatability. Most mixed-signal circuits rely on the presence of components such as a digital-to-analog converter (DAC) and an ADC. In some cases, it is those components themselves that constitute the DUT. Testing converters can be achieved by gaining access to internal nodes through analog switches (usually CMOS transmission gates). The major drawbacks of such a method are the increased I/O pin count and the degradation due to the non-idealities in the switches, especially at high speed, even though some techniques have been proposed to correct for some of these degradation effects [9]. Nonetheless, researchers sought to define a mixed-signal test bus standard compatible with the existing Institute of Electrical and Electronics Engineers (IEEE) 1149.1 boundary scan standard [10] to facilitate the testing of mixed-signal components. One of the earliest BIST schemes was devised as a go/no-go test for an ADC [11]. The technique relies on the generation of an analog ramp signal, and a digital finite state machine compares the measured voltage to the expected one. A decision is then made about whether the ADC passes the test or not. While not a major drawback to the functionality of the devised BIST, the test technique proposed in [11] relies on an untested analog ramp generator, which limits the overall "popularity" of the method. An alternative approach would therefore be to devise signal generation schemes that can be controlled, tuned, and transferred to and from the chip easily in a digital format. Several techniques have been proposed for on-chip signal generation and will be the subject of Section 2.2.
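
The go/no-go idea can be sketched behaviourally as follows. This is only an illustration under stated assumptions, not the circuit of [11]: the ramp is taken to be ideal (which is precisely the untested element noted above), the comparison is made on output codes, and the ADC models `ideal_adc` and `stuck_bit_adc` are hypothetical.

```python
def ideal_adc(v, bits, vref):
    """Ideal ADC model: truncating quantizer with output clipping."""
    code = int(v / (vref / 2 ** bits))
    return max(0, min(code, 2 ** bits - 1))


def stuck_bit_adc(v, bits, vref, stuck_bit=3):
    """Hypothetical faulty ADC: one output bit is stuck at 0."""
    return ideal_adc(v, bits, vref) & ~(1 << stuck_bit)


def ramp_go_no_go(adc, bits=8, vref=1.0, tol_codes=0):
    """Step an ideal ramp through every code centre and check each output."""
    lsb = vref / 2 ** bits
    for expected in range(2 ** bits):
        v_ramp = (expected + 0.5) * lsb  # ramp value at the centre of the code
        if abs(adc(v_ramp, bits, vref) - expected) > tol_codes:
            return False  # no-go
    return True  # go


print(ramp_go_no_go(ideal_adc))      # True
print(ramp_go_no_go(stuck_bit_adc))  # False
```
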
Here, it suffices to mention that with the use of delta-sigma ($\Delta\Sigma$) based schemes, it is possible to overcome the drawback of the analog ramp, as was proposed in [7] in another BIST scheme referred to as MADBIST, for mixed analog-digital BIST. The method relies on the presence of a DAC and an ADC on a single IC, as is the case in a coder/decoder (CODEC), for example. Figure 2.2 illustrates such a scheme.



Figure 2.2 Block diagram of the MADBIST scheme [7].

In the MADBIST scheme, the ADC is first tested alone using a digital $\Delta\Sigma$-based oscillator excitation. Once the ADC passes the test, the DAC is then tested using either the DSP engine or the signal generator. The analog response of the DAC is then looped back and digitized using the ADC. Once the ADC and then the DAC have passed their respective tests, they can be used to characterize other circuit behaviours. In fact, this technique was used to successfully test circuits with band-pass responses, as in wireless communications. In [12], MADBIST was extended to a superheterodyne transceiver architecture by employing a band-pass $\Delta\Sigma$ oscillator for stimulus, which was then mixed down using a local oscillator and digitized using the ADC. Once tested, the DAC and transmit path are then characterized using the loop-back configuration explained above. To further extend the capabilities of on-chip testing, a complete on-chip mixed-signal tester was then proposed in [13]; it is capable of a multitude of on-chip testing functions while relying on transferring the information to and from the IC core in a purely digital format. The architecture is generic and is shown in Figure 2.3.


Figure 2.3 Block diagram of the complete tester on-chip [13].

The functional diagram is identical to that of a generic DSP-based test system. Its unified clock guarantees coherence between the generation and measurement subsystems, which is important from a repeatability and reproducibility point of view, especially in a production testing environment. This architecture is the simplest among all those presented above and is versatile enough to perform many testing functions, as will be shown in Section 2.6. Of particular interest in the architecture proposed in [13], besides its simplicity and its digital interfacing, is its potential to achieve a more economical test platform in an SoC environment.

SoC developers are moving towards the integration of third-party intellectual properties (IPs), embedding the various IP cores in an architecture to provide functionality and performance. The SoC developers also have the responsibility of testing each IP individually. While this integration trend is attractive, the resultant test time and cost have inevitably increased as well. Parallel testing can be used to combat this difficulty, thereby avoiding sequential testing, in which a significant portion of the DUT, DUT interface, and ATE resources remains idle for a significant amount of time. However, incorporating more of the specialized analog instruments (arbitrary waveform generators and digitizers) within the same test system is one of the cost drivers for mixed-signal ATEs, which bounds the degree of parallelism that can be achieved. In fact, modest parallelism is already in use today by the industry to test devices on different wafers, using external probe cards [14]. However, reliable operation of high-pin-count probes is difficult, which places an upper bound on parallel testing. This bound does not seem able to keep up with the integration level, and therefore with the increased IC pin count, I/O bandwidth, and complexity and variety of the integrated IPs that the semiconductor industry has been facing.

Concurrent testing, which relies on devising an optimum strategy for DUT and/or ATE resource utilization to maintain a high tester throughput, can help offset some of the test time cost that is due to idle test resources. The shared-resource architecture available in today's tester equipment cannot support an on-the-fly reconfiguration of the pins, periods, timing, levels, patterns, and sequencing of the ATE. On the other hand, embedded or BIST techniques can improve the degree of concurrency significantly. Embedded techniques, such as the one proposed in [13], benefit from an increased level of integration, since technology scaling allows multiple embedded test cores to be integrated in critical locations. Moreover, the manufacturing cost, bandwidth limitation, and area overhead all scale favourably with the technology evolution. This allows parallelism and at-speed tests to be exploited to a level that could potentially track the trend in technology and manufacturing evolution. Before presenting the architecture and its measurement capability in more detail, some of the most important building blocks that led to the implementation of such an architecture are described first.

### 2.2 - Signal Generation

Conventional analog signal generation relies on tuned or relaxation oscillator circuits, as shown in Figure 2.4. Such generation is not suitable as an on-chip solution, whether for DFT or BIST. Firstly, these circuits are sensitive to process variations, since their amplitude and frequency depend on absolute component values. Secondly, they are inflexible, difficult to control, and do not allow multi-tone signal generation. Finally, their quality depends in large part on the quality factor, Q, of the reactive components, unless piezoelectric crystals are used.



Figure 2.4 Conventional analog signal generation.

#### 2.2.1 - Direct Digital Frequency Synthesis

An early signal generation method that is more robust and flexible is known as the direct digital frequency synthesis (DDFS) method [15] whereby a digital bitstream is first numerically created and then converted to analog form using a DAC followed by a filtering operation. One such form is shown in Figure 2.5.



Figure 2.5 Digitally-driven analog signal generation based on DDFS.

The read-only memory (ROM) can store words with up to D-bit accuracy and can hold up to $2^W$ words. The phase accumulator enables the user to scan the ROM (digitally) with different increments, thereby changing the resultant sine wave frequency, $f_{out}$, according to

$$f_{out} = M \cdot \frac{f_s}{2^W}, \qquad (2.1)$$

where W is the number of bits at the output of the phase accumulator, M is the number of complete sine wave cycles, and  $f_s$  is the sampling frequency. The amplitude precision that is a function of D, the ROM word width, is then given according to

$$\Delta A_{\text{DDFS}} = \frac{A_{\text{max}}}{2^{D+1}}. \qquad (2.2)$$
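A minimal behavioural sketch of the ROM and phase accumulator arrangement is given below, together with Equations (2.1) and (2.2) evaluated for one set of parameters. The values of W, D, M, and the sampling frequency are hypothetical, and the DAC and reconstruction filter that follow the ROM are not modelled.

```python
import math


def build_sine_rom(w_bits, d_bits):
    """ROM holding 2**w_bits sine samples, each quantized to d_bits (signed)."""
    full_scale = 2 ** (d_bits - 1) - 1
    n_words = 2 ** w_bits
    return [round(full_scale * math.sin(2 * math.pi * i / n_words))
            for i in range(n_words)]


def ddfs_samples(rom, w_bits, m, n_samples):
    """Advance a W-bit phase accumulator by M per clock and read the ROM."""
    phase, out = 0, []
    for _ in range(n_samples):
        out.append(rom[phase])
        phase = (phase + m) % (2 ** w_bits)  # accumulator wraps modulo 2**W
    return out


def ddfs_frequency(m, f_s, w_bits):
    """Equation (2.1): f_out = M * f_s / 2**W."""
    return m * f_s / 2 ** w_bits


def ddfs_amplitude_precision(a_max, d_bits):
    """Equation (2.2): Delta A = A_max / 2**(D + 1)."""
    return a_max / 2 ** (d_bits + 1)


# Hypothetical parameters, chosen only for illustration.
W, D, M, F_S = 8, 10, 5, 1.0e6
rom = build_sine_rom(W, D)
samples = ddfs_samples(rom, W, M, 2 * 2 ** W)
f_out = ddfs_frequency(M, F_S, W)
delta_a = ddfs_amplitude_precision(1.0, D)
print(f_out, delta_a)
```

Changing only the increment M retunes the output frequency in steps of $f_s / 2^W$ without touching the ROM contents, which is the flexibility the method offers over analog oscillators.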

The above method requires the use of a DAC which needs to be tested and characterized if it is to be used in a BIST. The number of bits required from the DAC is dictated by the resolution required for the analog stimulus, which is often multi-bit. This in turn entails a large silicon area, sophisticated design, and increased test time, all of which are not desirable.

#### 2.2.2 - Oscillator Based

An alternative approach to generating digital sine waves is through the use of a digital resonator circuit [16] that simulates the inductor-capacitor (LC) resonator and is shown in Figure 2.6. The two integrators in a loop with the multiplier cause the system to oscillate. The frequency, amplitude, and phase of the sine wave can be arbitrary. The tuning is achieved by setting the initial condition of the registers and varying the coefficient k. The digital output is then converted to an analog form using a DAC, typically a $\Delta\Sigma$ DAC, where a signal of near-infinite precision can be encoded into a single-bit digital output using pulse-density-modulated digital patterns.



Figure 2.6 Digital resonator.
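The behaviour of a two-integrator loop can be sketched in a few lines. The update below is one common discrete-time form (the second integrator uses the already updated output of the first), assumed here for illustration; it is not necessarily the exact structure of [16]. For this form, the coefficient k sets the oscillation angle per sample through $k = 2\sin(\theta/2)$, and the initial register values set the amplitude and phase.

```python
import math


def resonator(k, n_samples, x1=0.0, x2=1.0):
    """Two discrete integrators in a loop; k sets the frequency."""
    out = []
    for _ in range(n_samples):
        out.append(x1)
        x1 = x1 + k * x2  # first integrator
        x2 = x2 - k * x1  # second integrator, fed by the updated x1
    return out


def tone_amplitude(samples, m):
    """Amplitude of the tone in DFT bin m (single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * m * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * m * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


# Hypothetical tuning: M = 4 complete cycles in N = 64 samples.
N, M = 64, 4
theta = 2 * math.pi * M / N
k = 2 * math.sin(theta / 2)
wave = resonator(k, N)
print(tone_amplitude(wave, M), tone_amplitude(wave, M + 1))
```

With these initial conditions the output is a pure sine in bin M whose amplitude is $1/\cos(\theta/2)$, and neighbouring bins are empty to within rounding error.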

The major drawback of the digital resonator method is the need for a multi-bit multiplier, which consumes a lot of power and silicon area and can limit the frequency range of operation. An implementation that gets around this problem replaces the multi-bit multiplier with a single-bit multiplier/multiplexer [17]. This architecture is shown in Figure 2.7. Note that the DAC can be implemented as a high-order $\Delta\Sigma$ modulator, giving a much higher signal resolution while maintaining a 1-bit multiplexer. As mentioned earlier, this advantage is the result of the inherent property of $\Delta\Sigma$ modulation, where a 1-bit digital signal is a pulse-density-modulated version of an analog signal with near-infinite precision. This architecture was further used in a BIST application in [18]. Modifications to the basic architecture were then added, transforming the oscillator into a multi-tone generator [19]. Arbitrary precision signals were then demonstrated in [20]. The extension to high-frequency signals was then performed in [21] with the use of band-pass oscillators. While good candidates for on-chip signal generation, these oscillator-based schemes suffer from some drawbacks, such as the need for a cascade of adders, which slows down the speed of operation. In some cases, an increased level of design difficulty might arise and limit the range of application. The solution lies in memory-based generation.



Figure 2.7 Improved digital resonator with the multiplier replaced with a 1-bit multiplexer.

#### 2.2.3 - Memory Based

The idea of generating a digital bitstream was then extended to BIST by reproducing it and periodically repeating it, as shown in Figure 2.8. As few as 100 bits could be enough for good accuracy, which significantly reduces the hardware required [22]. The idea is to record a portion of the bitstream and reproduce it periodically by looping it back. The original bitstream is usually created according to a preselected noise transfer function (given a required resolution), which is then mapped into a software-implemented modulator. The parameters representing the input frequency, the number of complete cycles of the input, and the total number of samples, referred to as f<sub>in</sub>, M, and N respectively, are then chosen according to the coherency requirement [23]. N is chosen given a certain maximum memory length. The bitstream is then generated according to a set of criteria such as signal-to-noise ratio (SNR), dynamic range, amplitude precision, etc. The practicality of choosing the appropriate bitstream using the minimum hardware needed while maintaining a required resolution in terms of amplitude, phase, and spurious-free dynamic range (SFDR) was analyzed in detail in [24]. Small changes in the bitstream could lead to changes of as much as 10-40 dB in the quality or resolution of the signal. As a result, an optimization can be run to achieve the best resolution for a given number of bits and a given hardware availability.



Figure 2.8 Looping back of a selected set of a  $\Delta\Sigma$  modulator output bitstream.
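The looping idea can be sketched with a software modulator. The example below assumes a simple first-order $\Delta\Sigma$ modulator and hypothetical values of N, M, and the amplitude; the schemes cited above select the noise transfer function and optimize the stored bits, which this sketch does not attempt. The stored N-bit record is replayed periodically, and the fundamental is recovered with a single-bin DFT.

```python
import math


def delta_sigma_bitstream(n_bits, m_cycles, amplitude):
    """First-order software delta-sigma modulator driven by a coherent sine."""
    integ, prev, bits = 0.0, 0, []
    for n in range(n_bits):
        x = amplitude * math.sin(2 * math.pi * m_cycles * n / n_bits)
        integ += x - prev               # integrate input minus fed-back output
        prev = 1 if integ >= 0 else -1  # 1-bit quantizer
        bits.append(prev)
    return bits


def looped(bits, n_out):
    """Model the memory: replay the stored record periodically."""
    return [bits[i % len(bits)] for i in range(n_out)]


def tone_amplitude(samples, cycles):
    """Amplitude of the tone completing `cycles` periods in the record."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * cycles * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * cycles * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


# Hypothetical choice: M = 3 cycles in an N = 1024 bit memory (M and N coprime).
N, M, A = 1024, 3, 0.5
stored = delta_sigma_bitstream(N, M, A)
replayed = looped(stored, 4 * N)
recovered = tone_amplitude(replayed, 4 * M)
print(f"encoded amplitude {A}, recovered {recovered:.3f}")
```

Because M cycles fit exactly in the N stored bits, the replayed stream remains periodic at the intended frequency, and the 1-bit pattern carries the sine amplitude to within the shaped quantization error.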

#### 2.2.4 - Extension to Multi-Tones

Multi-tone signal generation becomes particularly important for characterizing blocks such as filters. Multi-tone signals can reduce test time by stimulating the DUT only once with a multitude of tones, and then relying on DSP techniques such as the FFT algorithm to extract the magnitude and phase responses at each individual frequency. Another important application of multi-tone signals is in the testing of inter-modulation distortion. This is particularly important in RF testing, where a measure such as the third-order input intercept point (IIP3) requires the use of a minimum of two tones. The repeatability and accuracy of the results are usually at their best if coherency, also known as the M/N sampling principle, is maintained, as it is under this condition that maximum frequency resolution per bin is obtained. Multi-tone signal generation is conceptually illustrated in Figure 2.9, where a multi-bit adder and a multi-bit DAC are needed for analog signal reconstruction, thereby increasing the hardware complexity. However, the $\Delta\Sigma$ bitstream signal generation method presented in the previous sub-section is readily extendible to the multi-tone case by simply storing a new sequence of bits in the ROM, with the new bits now corresponding to a software-generated multi-tone rather than a single-tone signal. No additional hardware (such as multi-bit adders and DACs for analog signal reconstruction) is needed. This is further testimony to the advantages of this signal generation method for BIST.



Figure 2.9 Conceptual illustration of a multi-bit signal generation.
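The test-time saving of a coherent multi-tone stimulus can be illustrated as follows. The DUT here is a hypothetical two-tap moving-average filter evaluated in periodic steady state, and the tone set is assumed for illustration; a single capture yields the gain at every tone through single-bin DFTs, which is what an FFT would provide for all bins at once.

```python
import math


def multitone(n_samples, tones):
    """Sum of coherent sines; tones is a list of (cycles M, amplitude)."""
    return [sum(a * math.sin(2 * math.pi * m * i / n_samples) for m, a in tones)
            for i in range(n_samples)]


def moving_average_dut(x):
    """Hypothetical DUT: 2-tap moving average in periodic steady state."""
    return [0.5 * (x[i] + x[i - 1]) for i in range(len(x))]  # x[-1] wraps around


def tone_amplitude(samples, m):
    """Amplitude of the tone in DFT bin m (single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * m * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * m * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


N = 256
TONES = [(5, 1.0), (17, 0.5), (41, 0.25)]  # each tone obeys the M/N principle
stimulus = multitone(N, TONES)
response = moving_average_dut(stimulus)
gains = {m: tone_amplitude(response, m) / a for m, a in TONES}
for m, g in gains.items():
    print(f"bin {m}: measured gain {g:.6f}")
```

Since every tone completes an integer number of cycles in the record, the tones fall in separate bins without leakage, and each measured gain matches the analytical response $\cos(\pi M / N)$ of this particular filter.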

#### 2.2.5 - Area Overhead

An important criterion in any BIST solution is the area overhead it entails. While it is argued that the area occupied by the test structure benefits from technology scaling, especially in the case of digital implementations, it is always desired to minimize the silicon space, and therefore the cost, occupied by the test circuit. The memory-based signal generation scheme presented in Section 2.2.3, which was seen to improve the test stimulus generation capabilities from a repeatability point of view when compared to its analog-based counterpart, can be improved even further. Commonly, the device under test has a front-end low-pass filter; an example would be an ADC with a preceding anti-aliasing filter. In this case, the analog filter that follows the memory-based bitstream can be removed altogether, relying instead on the built-in filtering operation of the CUT [18]. This concept is illustrated graphically in Figure 2.10. Later, it will be seen how this same area-savings concept can be applied to the testing of phase-locked loops.



Figure 2.10 Area overhead and partitioning of the bitstream signal generation method with (a) analog stimulus, (b) analog stimulus using DSP-based techniques and explicit filtering operation, and (c) digital test stimulus, while relying on the DUT built-in (implicit) filtering operation.

# 2.3 - Signal Capture

Testing in general consists of first sending a known stimulus and then capturing the resultant waveform of the CUT for further analysis. As discussed previously, the interface to and from the CUT is preferably in digital form to ease the transfer of information. The previous sections discussed the reliable generation of on-chip test stimuli. Signal generation constitutes just one aspect of testing analog and mixed-signal circuits. This section discusses the other aspect of testing: analog signal capture. The capture of on-chip analog waveforms has undergone an evolution. First, a simple analog bus was used to transport this information directly off-chip through analog pads [25]. Later, an analog buffer was included on-chip to efficiently drive the pads and interconnect paths external to the chip. This evolution is illustrated graphically in Figure 2.11.



Figure 2.11 Evolution of the analog signal capture capabilities, (a) without and (b) with analog buffering.

In both cases above, the information is exported off-chip in analog form and is then digitized using external equipment. Perhaps a better way to export analog information is by digitizing it first. This led to the modification shown in Figure 2.12(a), whereby the analog buffer is replaced with a 1-bit digitizer, or a simple comparator. Here, too, the digitization is completed externally. Figure 2.12(b) shows one possible implementation using a successive approximation register (SAR), with external reference voltages feeding the comparator, commonly generated using an external DAC.




Figure 2.12 Signal capture with (a) focus on the comparator and (b) showing the digitization process.
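The SAR-based digitization around a single comparator can be sketched as a binary search over DAC codes. This is a minimal behavioural model only: the ideal DAC transfer function, the bit width, and the function name `sar_convert` are assumptions made for illustration.

```python
def sar_convert(v_in, n_bits, v_ref):
    """Binary-search a DAC code using one comparator decision per bit."""
    code = 0
    for bit in reversed(range(n_bits)):          # MSB first
        trial = code | (1 << bit)                # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)    # ideal external DAC (assumption)
        if v_in >= v_dac:                        # 1-bit comparator decision
            code = trial                         # keep the bit
    return code

# An 8-bit conversion of 0.6 V against a 1 V reference
code = sar_convert(0.6, 8, 1.0)
```

Each bit costs one comparison, so an N-bit result needs N comparator decisions for every held sample.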

Another important evolution of the front-end sampling process is the use of undersampling. This becomes essential when the analog waveform to be captured is high speed. In general, capturing an analog signal consists of first sampling and holding the analog information, and then converting the resulting analog signal into a digital representation using an ADC. There exist many classes of ADCs, each suitable for a given application. Whatever the class of choice might be, the front-end sampling in ADCs has to obey the Nyquist criterion; that is, information having a bandwidth BW has to be sampled at a high enough sampling rate, $f_s$, where $f_s \ge 2 \cdot BW$. As the input information occupies a higher bandwidth, the sampling rate has to increase correspondingly, making the design of the ADC harder, as well as more area- and power-consuming. Instead, testing applications make use of an important property of the signal to be captured: its periodicity. Any signal that needs to be captured can be made periodic by repeatedly triggering the event that causes such an output signal to exist, using an externally generated, accurate, and arbitrarily slow clock with multiple phases. Each time the external clock is triggered, a new phase (or clock edge) is used. This periodicity of the signal to be captured, together with the incremental delay in the external trigger, gives rise to an interesting capture method known as undersampling, illustrated in Figure 2.13.



Figure 2.13 Illustration of the undersampling algorithm.

For that, a slowly running clock (slower than the minimum required by the Nyquist criterion) is used to capture a waveform, such that the clock period is slightly offset with respect to the input signal period. That is, if the clock period is $T+\Delta T$ (with $\Delta T \ll T$) and the input signal period is T, then the signal, using a multi-pass approach, can be captured with an effective resolution of $\Delta T$. This method has been demonstrated to be an efficient way of capturing high-frequency and broadband signals, since the input information bandwidth can be brought down in frequency, making the transport of this information off-chip easier and less challenging, as was first demonstrated in the implementation of the integrated on-chip sampler in [26]. In order to also include the digitization on-chip, a multi-pass approach was first introduced in [27], whereby the undersampling approach is still maintained in the front-end sample-and-hold stage, and then further demonstrated and improved in [13] with the inclusion of the comparator and reference level generator on-chip. The top-level diagram of the circuit that performs such a function, with the corresponding timing and voltage diagram, is shown in Figure 2.14; it operates as described next.



Figure 2.14 Illustration of the undersampling and multi-pass methods.

The programmable reference is first used to generate one DC level. The sampled-and-held voltage of the CUT is then compared to this reference level and quantized using a 1-bit ADC (or simply a comparator). The next run through, the DC reference voltage is maintained constant and the clock edge for the sampling operation is moved by  $\Delta T$ . The new sampled-and-held information of the CUT is then compared to the same reference voltage. This sequence is then repeated until a complete cycle of the input to be captured is covered. Once this is done, the programmable reference voltage is then incremented to the next step, one least-significant-bit (LSB) away from the previous reference level and the whole previous cycle of incrementing  $\Delta T$  is then repeated. The above is then repeated until all DC reference voltages are covered. This is referred to as the multi-pass approach. This implies that a time resolution of  $\Delta T$  and a voltage resolution of LSB can be achieved in the time and voltage domains, respectively. Undersampling, together with a multi-pass approach, combined with an embedded and reliable DC signal generation scheme now constitute a complete on-chip oscilloscope tool.
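The combined undersampling and multi-pass procedure can be summarized in a short behavioural sketch. The outer loop steps the reference voltage by one LSB, and the inner loop advances the sampling edge by $\Delta T$ per signal period. The test signal, the number of time points and levels, and the function name are assumptions for illustration and do not describe the circuit in [13].

```python
import math

def multipass_capture(signal, period, n_points, n_levels, v_min, v_max):
    """Reconstruct one period of a periodic signal with a 1-bit comparator."""
    delta_t = period / n_points                  # time resolution
    lsb = (v_max - v_min) / n_levels             # voltage resolution
    codes = [0] * n_points
    for level in range(1, n_levels):             # one pass per reference level
        v_ref = v_min + level * lsb
        for k in range(n_points):                # sampling edge moves by delta_t
            t_sample = k * (period + delta_t)    # clock period is T + delta_t
            if signal(t_sample) > v_ref:         # comparator decision
                codes[k] += 1                    # count the levels lying below the sample
    return [v_min + c * lsb for c in codes], delta_t, lsb

T = 1e-9                                         # 1 GHz test signal (assumed)

def sine(t):
    return math.sin(2 * math.pi * t / T)

captured, delta_t, lsb = multipass_capture(sine, T, 32, 64, -1.0, 1.0)
```

Every reconstructed point lies within one LSB of the true waveform value at $k \cdot \Delta T$, even though no sample is taken faster than once per signal period.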

# 2.4 - Timing Measurements and Jitter Analyzers

An important feature in BIST techniques is the ability to measure fine time intervals for a multitude of purposes. Most high-speed mixed-signal capture systems today rely on undersampling. Undersampling allows the capture of high-frequency narrowband signals by shifting the signal components into a much lower frequency range (or bandwidth), which is then easily digitized with low-speed components. This was detailed in Section 2.3. The achieved results are in large part a function of the resolution and accuracy of the time intervals. For that reason, circuits capable of generating fine delays, and the corresponding circuits that allow for on-chip characterization of such components, are summarized in this section.

## 2.4.1 - Single Counter

The simplest form of time measurement between two edges is through the use of a single counter triggered with a fast-running clock, as shown in Figure 2.15. An N-bit register at the output acts as an N-bit counter. The number (or count) of clock edges that elapse between two data events (in Figure 2.15, the events are the rising edges of the Start and Stop signals) is accumulated by this counter. The output count corresponds to a digital representation of the time interval, $\Delta T$.





The resolution attained by this method is largely dependent on the clock frequency relative to the time difference (or data) to be measured. The higher the clock frequency, the better the counter accuracy and the overall count resolution. As technology shrinks, the time differences to be measured are decreasing. Intervals on the order of, or even smaller than, the fastest clock periods that can be generated sometimes need to be measured. At the same time, generating clocks much faster than the data to be measured is becoming much more difficult and, in some cases, infeasible. As a result, better approaches are needed and are highlighted next.
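The quantization limit of the single-counter approach can be seen in a small behavioural sketch. It assumes rising clock edges at integer multiples of the clock period and counts those falling between Start and Stop; the function name and numerical values are illustrative only.

```python
import math

def counter_tdc(t_start, t_stop, f_clk):
    """Count rising clock edges in [t_start, t_stop); edges occur at k / f_clk."""
    period = 1.0 / f_clk
    first_edge = math.ceil(t_start / period)     # index of first edge at or after Start
    last_edge = math.ceil(t_stop / period)       # index of first edge at or after Stop
    count = last_edge - first_edge
    return count, count * period                 # count and estimated interval

# A 10.1 ns interval measured with a 1 GHz clock
count_long, estimate_long = counter_tdc(0.3e-9, 10.4e-9, 1e9)

# A 200 ps interval is shorter than the clock period and can be missed entirely
count_short, estimate_short = counter_tdc(0.3e-9, 0.5e-9, 1e9)
```

The first measurement is accurate to within one clock period, while the second returns a count of zero, which is the limitation that motivates the interpolation techniques that follow.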

## 2.4.2 - Analog-Based Interpolation Techniques: Time-to-Voltage Converter

One of the most basic building blocks in time measurement is the time-to-voltage converter, also known as the interpolation-based TDC. The idea behind such a circuit is to convert the time difference between the edges to be measured into complementary pulses using appropriate digital logic. The pulse width is then integrated on a capacitor, C, as shown in Figure 2.16 [28].



Figure 2.16 Interpolation-based time-to-digital converter [28].

The final DC value of the ramp (more accurately, the step size on the capacitor) is directly related to the pulse width. Using a high-resolution ADC, this DC value can then be digitized, thereby digitizing the time difference, or data, to be measured. The disadvantage of this method is that it relies on the absolute value of the capacitor, C. It also requires the design of a good ADC. While the ADC is required to digitize DC levels only, making its design slightly easier than that of high-frequency ADCs, it can nonetheless be power hungry, and its design can be a tedious and time-consuming task. A better approach that is insensitive to the absolute value of the capacitor relies on the concept of charging and then discharging the same capacitor with currents that are scaled versions of each other. The system [29] is shown in Figure 2.17 and offers two advantages: 1) it does not rely on the actual capacitor value, since the capacitor is now only used as a means of storing charge and then discharging it at a slower rate, and 2) it performs pulse stretching, which makes the original pulse to be measured much wider and therefore much easier to quantize.



Figure 2.17 Interpolation-based time-to-digital converter insensitive to the absolute value of the capacitor, also known as the time or pulse stretcher [29].

In this case, a single threshold comparator (1-bit ADC) can be used to detect the threshold crossing times. A relatively low-resolution time measurement unit can then be used to digitize the time difference. The time measurement unit (TMU) can be a simple counter, as explained earlier in Section 2.4.1, or one of the other potential TMUs that will be discussed next. The techniques in [28] and [29] can become power hungry if a narrow pulse is to be measured. Trade-offs exist in the choice of the biasing current, $I_B$, and the bit resolution of the ADC (for a given integration capacitor, C); the larger $I_B$, the lower the ADC resolution required. However, as the pulse width decreases, and in order to maintain the same resolution requirement on the ADC while using the same capacitor, C, the biasing current, and therefore the power dissipation, has to increase. In fact, for very small pulse widths, the differential pair might even fail to respond fast enough to the changes in the pulse. For that reason, digital phase-interpolation techniques offer an alternative to the analog-based interpolation schemes. More on the analog pulse measurement technique and the trade-offs involved in its design will be discussed when a new pulse measurement system is proposed in Chapter 4.
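The capacitor independence of the pulse stretcher described above follows from simple charge balance, as the sketch below suggests: charging with a current I for the pulse width and discharging with I/K returns the capacitor to its starting voltage after K times the pulse width. The ideal current sources, the ratio K, and the component values are assumptions for illustration and are not taken from [29].

```python
def stretched_time(pulse_width, i_charge, stretch_ratio, capacitance):
    """Discharge time of a capacitor charged by i_charge for pulse_width seconds."""
    v_step = i_charge * pulse_width / capacitance      # voltage step stored on C
    i_discharge = i_charge / stretch_ratio             # scaled-down discharge current
    return v_step * capacitance / i_discharge          # time to remove the same charge

# A 50 ps pulse stretched by an assumed ratio of 100, for two different capacitors
t_small_c = stretched_time(50e-12, 100e-6, 100.0, 1e-12)
t_large_c = stretched_time(50e-12, 100e-6, 100.0, 2e-12)
```

Both calls return about 5 ns: the capacitor value changes the voltage swing but cancels out of the stretched interval, which depends only on the current ratio.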

## 2.4.3 - Digital Phase-Interpolation Techniques: Delay Line

Through the use of a chain of delay elements that delay the clock and/or data as it propagates down the chain, generation [30] and measurement of fine delays can be achieved. With the use of edge-triggered D-type flip-flops (DFFs), delayed clock or data edges can be obtained. These circuits, unlike the analog techniques that rely on an ADC, are known as sampling-phase time measurement units, and fall more in the category of digital time measurement techniques. The operation of such a TDC is analogous to that of a flash ADC, where the analog quantity to be converted into a digital word is a time interval. They operate by comparing a signal edge to various reference edges, all displaced in time. Typically, those devices measure the time difference between two edges, often denoted as the Start and Stop edges. The Start signal usually initiates the measurement while the Stop edge terminates it. Given that the delay through each stage is known a priori (which requires a calibration step), the final state of the delay line can be read through a set of DFFs and is directly related to the time interval to be measured. Such delay lines usually have a limited time dynamic range. Some TDCs employ time range extension techniques, which rely on counters for a coarse time measurement and on the delay lines for fine interval digitization. This is analogous to the coarse/fine quantizers in ADCs. Other techniques include pulse stretching [31], pulse shrinking, and time interpolation. The use of the above devices extends to applications such as laser ranging and high-energy physics experiments. With the addition of a counter at the output, this simple circuit can be used to measure the accumulated jitter of a data signal (DATA) with respect to a master clock (CLK), as shown in Figure 2.18. The above circuit can be used for time resolutions down to a gate delay of the technology in which it is implemented. To overcome this limitation, a Vernier delay line (VDL) can be used.



Figure 2.18 A delay line used to generate multi-phase clock edges. Also can be used to measure clock jitter with time resolution set by the minimum gate delay offered by the technology.
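The flash-like operation of a delay-line TDC can be sketched behaviourally as follows. The model assumes that the Start edge propagates through identical stages of delay `tau` and that every tap is sampled by a flip-flop clocked by the Stop edge; the stage delay, stage count, and function name are hypothetical.

```python
def delay_line_tdc(t_start, t_stop, tau, n_stages):
    """Thermometer code of taps that the Start edge reached before Stop arrived."""
    taps = [1 if t_start + (i + 1) * tau <= t_stop else 0
            for i in range(n_stages)]
    count = sum(taps)
    return taps, count * tau                     # code and estimated interval

# A 235 ps interval measured with eight 50 ps stages (assumed values)
taps, estimate = delay_line_tdc(0.0, 235e-12, 50e-12, 8)
```

The resulting thermometer code has four ones, giving an estimate of 200 ps. The quantization step equals one stage delay, which is why the resolution of this structure is bounded by the gate delay of the technology.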

## 2.4.4 - Vernier Delay Line

In a VDL, both the data to be digitized or analyzed and the clock signal are delayed, using two slightly offset delays, as shown in Figure 2.19.



Figure 2.19 A Vernier delay line achieving sub-gate time resolution.

Using this arrangement, a time resolution as small as $\tau_{res} = (\tau_2 - \tau_1)$ can be achieved, provided that $\tau_2 > \tau_1$ (sometimes also referred to as $\tau_s$ and $\tau_f$, for slow and fast respectively). In this case, with a total of N delay stages, the time range that can be captured is given by $\tau_{range} = N \cdot (\tau_2 - \tau_1)$. Usually those delays can be implemented using identical gates which are purposely slightly mismatched. A timing resolution of a few picoseconds can be achieved with this method, equivalent to a deep sub-gate-delay sampling resolution. VDL samplers have previously been used to perform time-interval measurements [32] and data recovery [33]. When Vernier samplers are used, data is latched at different moments in time, leading to synchronization issues that must be considered when interfacing the block with other units. Read-out structures exist, though, that allow for continuous operation and synchronization of the outgoing data [34]. For the purpose of jitter measurement, this synchronization block is not needed. The circuit was indeed used for jitter measurement and implemented [35] in a standard 0.35 µm CMOS technology, achieving a jitter measurement resolution of $\tau_{res} = 18$ ps. The RMS jitter was measured to be 27 ps, and the peak-to-peak jitter was 324 ps. For jitter measurements, the same circuit can be configured with the addition of the appropriate counters, as shown in Figure 2.19. Note that, in general, those delay stages are voltage controlled to allow for tuning, and more often they are placed in a negative feedback arrangement, known as a DLL, that is a lot more robust to noise and jitter due to the feedback nature of the implementation. It is worth mentioning that DLLs now rely almost exclusively on a linear voltage-controlled delay (VCD) cell introduced by Maneatis [36]. The linear aspect of the cell stems from the use of a diode-connected load in parallel with the "traditional load", which extends the linearity range of the delay cell. The biasing of those cells is also made more robust to supply noise and variations through the use of a uniform biasing circuit that generates both the N- and P-side biasing. The same biasing is also used for all blocks, so that variations affecting one will affect the others in a uniform manner.
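Returning to the Vernier relations given earlier in this sub-section, the following behavioural sketch shows how the two offset delays produce a resolution of $\tau_2 - \tau_1$ and a range of $N \cdot (\tau_2 - \tau_1)$. It assumes that the leading (data) edge travels through the slow chain and the lagging (clock) edge through the fast chain, so that the clock gains on the data by $\tau_{res}$ per stage; this assignment, the delay values, and the function name are illustrative assumptions.

```python
def vernier_tdc(dt, tau_slow, tau_fast, n_stages):
    """Return (estimated interval, full range, per-stage bits); times in ps."""
    tau_res = tau_slow - tau_fast
    bits = []
    for i in range(1, n_stages + 1):
        t_data = i * tau_slow                    # data edge launched at t = 0
        t_clk = dt + i * tau_fast                # clock edge launched dt later
        bits.append(1 if t_data < t_clk else 0)  # 1 while the data still leads
    return sum(bits) * tau_res, n_stages * tau_res, bits

# 110 ps and 100 ps stages (hypothetical) give a 10 ps resolution
estimate_ps, range_ps, bits = vernier_tdc(37, 110, 100, 16)
```

With these values a 37 ps interval is reported as 30 ps and the 16 stages cover a 160 ps range, which illustrates why large dynamic ranges require many well-matched stages.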

## 2.4.5 - Component Invariant VDL for Jitter Measurement

The disadvantages of the previously proposed VDL, namely the increased number of stages for large time dynamic ranges, the matching requirements between the many stages, and the area and power dissipation overheads can be overcome with the use of a single-stage component-invariant VDL [37]. The proposed system is shown in Figure 2.20.



Figure 2.20 A component-invariant, single-stage Vernier delay line jitter measurement device [37].

The single stage consists of two triggered delay elements, one triggered by the data and the other by the reference clock. The counter acts as a phase detector. This method was implemented in a 0.18 $\mu$m CMOS technology using standard cells only, which facilitates the design task even further. The area occupied by a single stage of the VDL is 0.12 mm<sup>2</sup>, which is at least an order of magnitude less in area overhead when compared to other methods. The measured resolution [37] of this circuit was 19 ps. The test time was approximately 150 ns/sample, for a clock running at 6.66 MHz. Note that the inverters in the feedback loop were implemented as voltage-controlled delay cells to allow for calibration and tuning.

## 2.4.6 - Analog-Based Jitter Measurement Device

The delay-line structures that were presented in the previous sections can be referred to as digital-type jitter measurement devices. Recently, an analog-type macro, shown in Figure 2.21, has been introduced and acts as an on-chip jitter spectrum analyzer [38].



Figure 2.21 An analog-based macro for jitter measurement [38].

The basic idea is to convert the time difference between the edges of a reference clock and the jittery clock or signal to be measured into analog form using a phase-frequency detector (PFD) followed by a charge pump. The voltage stored on the capacitor, and representing the time information of interest is then digitized using an ADC. The speed of the proposed macro in [38] is limited by that of the ADC, as well as the ability of the PFD to resolve small time differences. A calibration step is usually required in such a design to remove any effects of process-voltage-temperature (PVT) variations. A jitter range of  $\pm 200$  ps in a 0.18 µm CMOS technology was demonstrated with a sensitivity of 3.2 mV/ps. A similar idea was presented in [39] but at the board level in order to provide a production-type testing of the accumulated-over-many periods timing jitter, and was applied to the board-level testing of data transceivers. No external jitter-free clock is needed as a reference, which makes the implementation more attractive. The clock to be measured is delayed and both the jittery signal and its delayed version are then used to control a PFD-charge pump-ADC combination where jitter is then digitized using an ADC. The comparators in the ADC were implemented using a chain of inverters that were sized according to different switching thresholds, acting therefore as some sort of multi-bit digitization. The system was also used as a BIST technique to measure the jitter [40] and experimentally verified in [41]. The measured jitter, accumulated over eight periods, on a 1 GHz clock was successfully tested and evaluated at 30 - 50 ps. The performance was then slightly improved in a more recent design with detailed experimental results presented in [42].
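As a back-of-the-envelope illustration of the figures quoted for [38], the sketch below converts a capacitor voltage into a time difference using the reported 3.2 mV/ps sensitivity, and estimates the time step resolved by an ADC spanning the $\pm 200$ ps range. A perfectly linear time-to-voltage characteristic and the 8-bit ADC resolution are assumptions made here; neither is stated in the text.

```python
SENSITIVITY_V_PER_PS = 3.2e-3    # reported sensitivity of the macro in [38]
JITTER_RANGE_PS = 200.0          # reported range is +/- 200 ps

def voltage_to_jitter_ps(v_cap, sensitivity=SENSITIVITY_V_PER_PS):
    """Time difference represented by a capacitor voltage (linear model, assumed)."""
    return v_cap / sensitivity

def adc_time_lsb_ps(n_bits, jitter_range_ps=JITTER_RANGE_PS):
    """Time step per ADC code if the ADC spans the full +/- range (assumed)."""
    return 2 * jitter_range_ps / (1 << n_bits)

full_scale_ps = voltage_to_jitter_ps(0.64)   # 640 mV corresponds to 200 ps
lsb_ps = adc_time_lsb_ps(8)                  # hypothetical 8-bit ADC
```

Under these assumptions the full range maps to a swing of about $\pm 640$ mV, and an 8-bit ADC would resolve roughly 1.6 ps per code.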

Later, an embedded eye-opening monitor (EOM) was successfully implemented in [43]. The purpose of such a monitor is to continuously compare the horizontal and vertical openings of the eye diagram of an incoming high-speed digital link, as illustrated conceptually in Figure 2.22.





The horizontal measure gives information about the amount of timing jitter present in the system, while the vertical one is related to the amplitude jitter of the system. Given some pre-specified acceptable voltage and phase (time) limits, set by the application under consideration and fed to the embedded EOM, a pass or fail is then generated by the system. The accumulated count of fails, which is also related to the bit-error rate (BER) of the system, can be fed to an equalizer that is usually used to adaptively compensate, in a feedback mechanism, for the digital signal degradation. The circuit was experimentally verified to successfully test for eye-opening degradation of digital signals running from 1 Gb/s up to 12.5 Gb/s, from a single 1.2 V supply.
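The pass/fail decision against pre-specified phase and voltage limits can be pictured as a mask test, as in the sketch below. It treats the limits as a rectangular forbidden region centred on the eye and counts the samples that fall inside it; this rectangular mask, the sample format, and the function name are assumptions for illustration and do not describe the circuit of [43].

```python
def eye_mask_fails(samples, phase_limit, voltage_limit):
    """Count (phase, voltage) samples that fall inside the forbidden rectangle."""
    return sum(1 for phase, voltage in samples
               if abs(phase) < phase_limit and abs(voltage) < voltage_limit)

# Phase in unit intervals from the eye centre, voltage from the threshold (assumed)
samples = [(0.0, 0.3), (0.1, 0.05), (-0.3, 0.0), (0.05, -0.02)]
fails = eye_mask_fails(samples, phase_limit=0.2, voltage_limit=0.1)
passed = fails == 0
```

Two of the four hypothetical samples violate the mask, so the link fails the test. Accumulating such counts over many bits gives the quantity that the text relates to the BER.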

The recent rush of papers on embedded solutions for signal integrity measures, highlighted in this section for two such measures, jitter and eye-diagram measurement, is testimony to the pressing need for such ideas and techniques, without which some state-of-the-art electronics would be untestable. While the current section highlighted some of the most common techniques used for jitter measurements, the next section will highlight some new ideas that have the potential of being applied to jitter measurement applications, among many others.

## 2.4.7 - Time Amplification

Perhaps of most relevance to the research conducted in this dissertation is this section on time amplification. All the system-level ideas presented in later chapters rely heavily on the time amplification concept. This section therefore deserves special attention.

Analogous to voltage amplification in ADCs where a front-end programmable gain amplifier (PGA) can be used to extend the voltage dynamic range of the measurements, time amplification has recently emerged as a way to amplify the time difference between two events. The principle of time amplification involves comparing the phase of two inputs and then producing two outputs that differ in time by a multiple of the input phase difference. Two techniques have been proposed for the purpose of time amplification. The first [44] is based on the mutual exclusive (MUTEX) circuit shown in Figure 2.23.



Figure 2.23 A MUTEX time amplifier [44].

The cross-coupled NAND gates form a bistable circuit, while the output transistors switch only when the difference in voltage between nodes $V_1$ and $V_2$, say $\Delta V$, reaches a certain value. The OR gate at the output is used to detect this switching action. Time amplification occurs when the input time difference is small enough to cause the bistable to exhibit metastability. The voltage difference $\Delta V$ is then given by

$$\Delta V = \theta \cdot \Delta t \cdot e^{t/\tau}, \qquad (2.1)$$

where $\theta$ is the conversion factor from time to initial voltage at nodes $V_1$ and $V_2$, $\Delta t$ is the time difference between the rising edges of the signal and reference, and $\tau$ is the device time constant. By measuring the time t from the moment that the inputs switch to the moment that the OR gate switches, $\Delta t$ can be found. This circuit is compact in area, but its use is limited to an input time range of only a few picoseconds. The gain is also in the single digits. Cascading might get around the latter problem.
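Equation (2.1) can be inverted to recover $\Delta t$ from the measured resolution time t. Setting $\Delta V$ equal to the voltage at which the output transistors switch, written here as $V_{sw}$, gives $t = \tau \ln\left(V_{sw} / (\theta \cdot \Delta t)\right)$. The sketch below evaluates this relation; the symbol $V_{sw}$, the function names, and all numerical values are hypothetical and chosen only to exercise the formula.

```python
import math

def resolution_time(dt_in, theta, tau, v_switch):
    """Time for the bistable to reach v_switch, from Equation (2.1)."""
    return tau * math.log(v_switch / (theta * dt_in))

def input_interval(t_resolve, theta, tau, v_switch):
    """Inverse relation: recover the input time difference from t."""
    return (v_switch / theta) * math.exp(-t_resolve / tau)

THETA = 1e9        # V/s, i.e. 1 mV per ps (assumed)
TAU = 20e-12       # regeneration time constant in seconds (assumed)
V_SW = 0.5         # switching voltage in volts (assumed)

t_2ps = resolution_time(2e-12, THETA, TAU, V_SW)
t_1ps = resolution_time(1e-12, THETA, TAU, V_SW)
dt_back = input_interval(t_2ps, THETA, TAU, V_SW)
```

Halving the input interval lengthens the resolution time by $\tau \ln 2$, so the output depends logarithmically on the input. This is consistent with the small usable input range noted above.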

A second method proposed for time amplification [3] is shown in Figure 2.24.



Figure 2.24 A single-stage time amplifier [3].

The circuit consists of two cross-coupled differential pairs with passive RC loads attached. Upon arrival of the rising edges of In1 and In2, the amplifier bias current is steered around the differential pairs and into the passive loads. This causes the voltages at the drains of transistors M1 and M2 to be equal at a certain time, and those of M3 and M4 to coincide a short time later. This effectively produces a time interval proportional to the input time difference, which can then be detected by a voltage comparator. The second time amplification method, proposed in [3], while more area- and power-consuming, works for very large input ranges, thereby extending the input time dynamic range. Its gain can also be at least an order of magnitude higher, using a single stage only. The circuit has been built in a 0.18 $\mu$m CMOS technology, and was experimentally verified to achieve a gain of 200 s/s for an input range of 5 ps to 300 ps, giving an output time difference of 1 ns to 60 ns. Time amplification, when thought of as analogous to the use of programmable gain amplifiers in ADCs, is the perfect block to precede a TDC. The reason is that, with a front-end time amplification stage, a low-resolution TDC can be used to obtain an overall high-resolution time measurement unit. This block constitutes an important building block for the systems proposed in this research. Some of its additional features will therefore be discussed in Chapter 3 and Chapter 4.
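The benefit of placing a time amplifier in front of a coarse TDC can be sketched numerically. The gain of 200 s/s is the value reported above for [3]; the ideal linear gain, the 1 ns TDC step, and the function name are assumptions for illustration.

```python
def amplified_tdc(dt_in, gain, tdc_lsb):
    """Amplify dt_in, quantize it with a coarse TDC, and refer the result to the input."""
    dt_out = gain * dt_in                        # ideal linear time amplifier (assumed)
    code = int(dt_out // tdc_lsb)                # coarse TDC output code
    return code, code * tdc_lsb / gain           # code and input-referred estimate

GAIN = 200.0          # s/s, reported for the amplifier in [3]
TDC_LSB = 1e-9        # 1 ns coarse TDC step (hypothetical)

code, estimate = amplified_tdc(37e-12, GAIN, TDC_LSB)
effective_lsb = TDC_LSB / GAIN                   # input-referred resolution
```

Under these assumptions a TDC that only resolves 1 ns behaves like a 5 ps instrument at the input, and a 37 ps interval is reported as 35 ps.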

## 2.4.8 - PLL and DLL – Injection Methods for PLL Tests

Of indirect relevance to the research conducted in this thesis is this section. While the focus of the current work is on embedded analog signal measurements, and not necessarily PLL testing, the method briefly discussed in this section is one of the earliest reported techniques that realized the duality or connection between voltage and time processing, and deserves special mention.

PLLs and DLLs are essential blocks in communication devices. They are used for clock de-skewing in clock distribution networks, clock synchronization on-chip, clock multiplications, etc. Those blocks are mainly tested for their ability to lock or track the reference clock fast (hence the tracking or locking time characteristic), and characterised in terms of phase or jitter noise which is of paramount importance in today's SoCs. An embedded technique for the measurement of the jitter transfer function of a PLL was suggested in [2]. The technique relies on one of three methods where the PLL is excited by a controlled amount of phase jitter from which the loop dynamics can be measured. These techniques, shown in Figure 2.25, include: phase modulating the input reference,  $\Phi_i$ , sinusoidal injection (using a  $\Delta\Sigma$  bitstream, or a Pulse-Density-Modulated, PDM, representation of the input signal) at the input of the low-pass filter, or varying the divide-by-N counter between N and N+1.



Figure 2.25 PLL system view.

All three techniques have been verified experimentally and tested on commercial PLLs, allowing one to easily implement such testing techniques for the on-chip characterization of jitter in PLLs. Given the PLL testing method presented in [2], and in particular the phase modulation technique, it is beneficial to draw some analogies with the voltage measurement or stimulus schemes presented earlier in Section 2.2.5. A pulse-density-modulated signal is injected into the PLL. Due to the inherent low-pass filter present in PLLs, the stimulation of such systems, similar to the testing of ADCs, can be achieved in a purely digital manner without the need for an additional low-pass filter. Silicon area savings and reduced circuit complexity can be achieved, and are an added bonus of the proposed PLL BIST. Stimulating the PLL is therefore, here too, done using only a digital interface [45]. Another analogy can be drawn with respect to voltage-domain testing. While in analog stimulus generation it is the amplitude that is modulated, in the case of a PLL it is the phases, or clock edges, that are controlled, as shown in Figure 2.26.



Figure 2.26 Analogy between stimulating an ADC and a PLL with a  $\Delta\Sigma$  bitstream, for testing purposes.
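The idea that a 1-bit pulse-density-modulated stream carries an analog waveform that a low-pass filter can recover may be illustrated with a first-order $\Delta\Sigma$ modulator model. The modulator order, the moving-average filter standing in for the low-pass action of the DUT or PLL loop, and all numerical values are assumptions made for this sketch only.

```python
import math

def delta_sigma_bitstream(samples):
    """First-order modulator model: inputs in [-1, 1], outputs are +/-1 bits."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback               # accumulate the quantization error
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

def moving_average(values, window):
    """Crude low-pass filter standing in for the filtering inside the DUT."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

N = 4096
tone = [0.5 * math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
bits = delta_sigma_bitstream(tone)
recovered = moving_average(bits, 64)
reference = moving_average(tone, 64)
worst_error = max(abs(r - x) for r, x in zip(recovered, reference))
```

After filtering, the two-level bitstream tracks the filtered sine closely, which is why no separate reconstruction filter is needed when the circuit being stimulated is itself low-pass.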

# 2.5 - Calibration Techniques for TMU and TDC

Calibration is an important procedure that measurement instruments, whether built-in or external to the IC to be tested, should undergo before use. Calibration is usually carried out by exciting the instrument to be calibrated with a series of known input signals and then correlating the output to the corresponding input each time. In the special case of time measurement circuits, the inputs normally consist of a series of edges with known time intervals. However, as the desired calibration resolution becomes smaller than a few picoseconds, such a task becomes more difficult: on-chip, mismatches and jitter put a lower bound on the intervals that timing generators can reliably produce, while off-chip, edge and pulse generators can produce such intervals accurately but at additional cost. Calibration methods and their associated trade-offs are therefore important. Here, we will restrict the discussion to the calibration of time measurement instruments, and in particular to the flip-flop calibration of what is known as the sampling-offset TDC, or SOTDC for short [46]. A SOTDC is a type of flash converter that relies solely on flip-flop transistor mismatch, instead of separate delay buffers, to obtain fine temporal resolution. While a rather specific type of TDC, it is probably one of the more challenging types to calibrate due to the very fine temporal resolutions that it can achieve, which make measuring and calibrating such small time differences difficult. In fact, it was shown in [47] that mismatches due to process variation can produce temporal offsets from 30 ps down to 2 ps, depending on the implementation technology and architecture chosen for the flip-flop. Those flip-flops therefore need to be calibrated before they can be used as time measurement units. In [47], an indirect calibration technique was proposed that involves the use of two uncorrelated signals (in practice, two square waves running at slightly offset frequencies) to find the relative offsets of the flip-flops used in the SOTDC. Finding the absolute value of the offset, which statistically is the mean of a distribution of offsets, requires a direct calibration technique. This technique was introduced in [48]. It involves sending two edges to the flip-flop to be calibrated, with the time difference, $\Delta T$, tightly controlled, and repeating the measurement many times to get a (normal or Gaussian) distribution. $\Delta T$ is then changed and the same experiment is repeated. A counter or accumulator is then used to find the cumulative distribution function (CDF) of the distributions. The point on the CDF that corresponds to a probability of exactly 50% is the mean of the distribution, which is the estimated absolute offset of the flip-flop. While this technique was experimentally verified, an improved calibration scheme was then developed in [48] to get around the problem of having to generate $\Delta T$ tightly (which is more often done off-chip for resolution purposes, at the expense of increased cost, as discussed earlier). The basic idea involves intentionally "blowing up" the noise distribution by externally injecting temporal noise into the flip-flop with a standard deviation an order of magnitude (or even more) larger than the offset standard deviation that needs to be measured. The standard deviation proper to the flip-flop alone will be somewhat lost in the new distribution, but the mean value becomes much easier to measure, as the need for generating fine values of $\Delta T$ is eliminated. With this new method, temporal offsets on the order of $\sim 10$ ps were successfully measured in a prototype implemented in a 0.18 µm CMOS technology.
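The direct calibration procedure, in which $\Delta T$ is swept and the 50% point of the resulting CDF is taken as the offset, can be mimicked with a small Monte Carlo sketch. The flip-flop is modelled here as a comparator in time with a fixed offset and Gaussian timing noise; this model, the offset and noise values, the sweep step, and the function names are assumptions and are not taken from [47] or [48].

```python
import random

def measure_cdf(offset_ps, sigma_ps, dt_values_ps, n_trials, rng):
    """Fraction of trials in which the flip-flop outputs 1 at each applied dt."""
    cdf = []
    for dt in dt_values_ps:
        ones = sum(1 for _ in range(n_trials)
                   if dt + rng.gauss(0.0, sigma_ps) > offset_ps)
        cdf.append(ones / n_trials)
    return cdf

def estimate_offset(dt_values_ps, cdf):
    """Linearly interpolate the dt at which the measured CDF crosses 50 %."""
    for i in range(len(cdf) - 1):
        p0, p1 = cdf[i], cdf[i + 1]
        if p0 < 0.5 <= p1:
            d0, d1 = dt_values_ps[i], dt_values_ps[i + 1]
            return d0 + (0.5 - p0) * (d1 - d0) / (p1 - p0)
    raise ValueError("CDF does not cross 50 % within the swept range")

rng = random.Random(1)                           # fixed seed for repeatability
sweep = [float(d) for d in range(-20, 41, 2)]    # applied dt values in ps
cdf = measure_cdf(8.0, 5.0, sweep, 2000, rng)    # 8 ps offset, 5 ps rms noise (assumed)
estimate = estimate_offset(sweep, cdf)
```

The estimate lands close to the programmed 8 ps offset. In the noise-injection variant described above, the noise standard deviation would be made much larger, so that a far coarser sweep of $\Delta T$ would still bracket the 50% point.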

# 2.6 - Complete On-Chip Test Core: Proposed Architecture in [13] and Its Versatile Applications

Some of the BIST techniques that were highlighted in previous sections have been incorporated into a single system that was used to perform a full set of tests, thereby emulating the function of a mixed-signal tester on chip. The advantages of the system proposed in [13] include: full digital input/output access, a coherent system for signal generation and capture, fully programmable DC and alternating current (AC) systems, and a single comparator or 1-bit ADC which, with an on-chip DLL, can perform a multi-pass digitization. The proposed system [13] was shown earlier in Figure 2.3. This section is dedicated to showing some of its versatile applications that were indeed built, tested, and characterized.

## 2.6.1 - Attractive and Flexible Architecture

Perhaps of most significance, from a BIST point of view, are two important aspects that the architecture in [13] offers. First, its almost all-digital implementation makes it very attractive from a future scaling perspective, since the occupied area overhead is expected to decrease with newer CMOS technologies. As shown in Figure 2.27, with the exception of a crude low-pass filter for DC generation, an analog filter for AC generation, two sample-and-hold (S/H) circuits, and a comparator, the remainder of the system is an all-digital implementation.



Figure 2.27 Architecture for an almost all-digital on-chip oscilloscope.

Notice that, theoretically, only one S/H is needed, at the output of the CUT, where the information is analog and might be varying. However, for practical reasons, and more specifically to combat the capacitor charge leakage that occurs when the charge is held for a long time, an identical S/H is placed on the other terminal of the comparator [27], namely where the DC voltage is fed. This provides symmetry and therefore identical charge loss at both comparison terminals. Another very important aspect of the proposed architecture is its digital-in digital-out scan capability, shown in Figure 2.28. This is particularly important because a digital interface, at both the input and output terminals, is far more immune to noise and signal degradation caused by the interconnect paths. Last but not least, its flexibility from an area overhead perspective adds to its BIST value. As highlighted in Figure 2.29, the proposed test core can be greatly reduced if area is of paramount importance.



Figure 2.28 Emphasis on the digital-in digital-out interface of the BIST proposed.



Figure 2.29 Possibility of a reduced on-chip core, and therefore, reduced area, while maintaining a digital-in digital-out interface.

The AC and DC memory scan-chains can be moved off-chip and implemented using external equipment. Similarly, the memory that holds the digital logic and performs the back-end DSP functions can also be external. Both can be achieved while still maintaining a digital-in digital-out interface. In this case, the abbreviated mixed-signal test core consists of simple digital buffers (to restore the rise and fall times of the digital bitstream), the crude low-order low-pass filter, and the single comparator performing the digitization in a multipass approach.

## 2.6.2 - Oscilloscope/Curve Tracing

The system was first checked for its ability to perform signal generation, as well as signal capture. Fully digital memory-based DC and AC signal generation systems are incorporated. The memory is programmed with a bitstream produced by a software-optimized routine, and the memories are then loaded with the bitstream using a global clock. With appropriate on-chip low-pass filtering, and using the DLL that controls the sample-and-hold clock (all of which are generated from the same global clock) and a 1-bit comparator, a multi-pass algorithm allows the capture of the generated signals. The digitized version of the output is then exported for further analysis. Experimental results from DC curve tracing showed a linearity of 10 bits in a 0.35 $\mu$m CMOS technology, for an effective capture rate of 4 GHz, corresponding to a time resolution of 250 ps. Single-tone and multi-tone generation have also been demonstrated in the same technology, as well as in 0.25 $\mu$m and 0.18 $\mu$m CMOS technologies. Spectral purity as good as 65 dB at 500 kHz and 40 dB at 0.5 GHz has been achieved. The capture method was also tested, demonstrating a resolution of ~12 bits.

## 2.6.3 - Coherent Sampling

Coherency is an essential feature in production testing, where the repeatability and reproducibility of the test results are in large part a function of the signal generation and output capture triggering times. A single master clock driving the different parts of the complete system ensures coherency and edge synchronization. With shorter distances, as is the case on-chip, delays between different sub-systems are less critical. For relatively large chips, or for high-speed and/or high-performance designs, localized PLLs and DLLs might be necessary. The proposed system has an inherent coherency, which makes it even more attractive for production testing.

## 2.6.4 - Time Domain Reflectometry/Transmission

With clock rates projected to exceed 10 GHz by 2010 [1], a signal will have as little as one 100 ps clock period to cross from one end of the chip to the other. On the other hand, it takes about 67 ps for an electromagnetic wave to travel 1 cm in silicon dioxide, a delay comparable to the clock period. Signal integrity analysis such as time domain reflectometry (TDR), time domain transmission (TDT), crosstalk, etc. is therefore essential. Due to their broadband nature, capturing such high-frequency signals off-chip is very costly. Embedded tools are therefore essential for such characterisation tasks. Board-level time domain reflectometry has also been experimentally proven using the system proposed in [13]. The digitizer core, which introduces only a few femtofarads of capacitive loading, can be, and in fact was, used as a tool for testing TDR and TDT on a board. For that purpose, only the digitization part of the system was used, and a 6-bit resolution at an effective sampling rate of 10 GHz was demonstrated [49]. External clocks with a time offset of 100 ps were used in this particular experiment.
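The comparison between the propagation delay and the clock period can be checked with a few lines of arithmetic. The sketch below assumes a relative permittivity of 3.9 for silicon dioxide (a commonly quoted value that is not stated in the text) and ignores any conductor or substrate effects; under that assumption the delay comes out in the mid-60 ps per cm range, in line with the figure of about 67 ps quoted above.

```python
import math

C_CM_PER_S = 2.998e10   # speed of light in vacuum, in cm/s
EPS_R_SIO2 = 3.9        # assumed relative permittivity of silicon dioxide

def delay_ps_per_cm(eps_r):
    """Propagation delay of an electromagnetic wave in a dielectric, in ps/cm."""
    velocity_cm_per_s = C_CM_PER_S / math.sqrt(eps_r)
    return 1e12 / velocity_cm_per_s

def clock_period_ps(freq_ghz):
    """Clock period in ps for a clock frequency given in GHz."""
    return 1e3 / freq_ghz

delay = delay_ps_per_cm(EPS_R_SIO2)
period = clock_period_ps(10.0)
print(f"delay: {delay:.1f} ps/cm, clock period: {period:.0f} ps, "
      f"distance covered in one period: {period / delay:.2f} cm")
```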

## 2.6.5 - Crosstalk

Another application relevant to digital communication in deep-submicron technologies is the measurement of crosstalk, which is becoming more pronounced as technologies scale down, speeds go up, and interconnect traces become longer and noisier. The increased packing density inevitably places lines in close proximity to each other, so that quiet lines located near aggressor lines become what are known as victim lines. This crosstalk effect was captured using the versatile system proposed above [49]. An earlier version was also implemented in [50], where the embedded circuit was used to measure digital crosstalk on a victim line due to the switching of aggressor lines. In this latter implementation, only the sample-and-hold was placed on chip, together with a VCD line that was externally controlled with a varying DC voltage to generate the delayed clock system. Buffers were then used to export the analog sampled-and-held voltage, and the signal was reconstructed externally. The circuit relies on external equipment for the most part (which is not always undesirable; in fact, it is often preferred in a testing environment because it offers more control and tuning). Nonetheless, the system was among the earliest to measure interconnect crosstalk in an embedded fashion and therefore deserves attention and credit.

## 2.6.6 - Supply/Substrate Noise

An important metric in signal integrity measurement is broadband (random) noise characterization. While switching noise or, more generally, deterministic noise can be characterized using undersampling, capturing random noise needs to be approached differently, as was recently presented in [51] for the measurement of the random supply noise of a system. The authors in [51] capture the dynamics of the noise as a function of time indirectly, by measuring the autocorrelation function, R, of the noise signal, x(t). The autocorrelation is given by the expected value of the product of two samples of the random process taken $\tau$ apart,

$$R(\tau) = E[x(t + \tau/2) \cdot x(t - \tau/2)].$$
(2.1)

The Fourier transform of the autocorrelation gives the power spectral density of the supply noise. Measuring the autocorrelation function is important in this particular case as it seems to be the only way to capture or quantify broadband noise without the aliasing problems associated with the undersampling method. The implementation is interesting as well and is shown in Figure 2.30.


Figure 2.30 Supply noise measurement block diagram [51].

Only two samplers are used, with an external pulse generator to generate a variable $\tau$, together with a digitization process that relies on a voltage-controlled oscillator (VCO) to achieve the high-resolution conversion. The sampled-and-held value of the supply noise is used to control the frequency of oscillation of the VCO. This frequency is then measured using a high-frequency counter and exported off-chip in digital form. Calibration is necessary in this implementation in order to capture the voltage-frequency-digital bitstream relationship. The system in [51] was implemented in a 0.13 $\mu$m CMOS technology and experimentally verified to capture both the deterministic nature of the noise (largely captured using undersampling) and the stationary noise in a 4 Gb/s serial link system. The stationary noise was captured using the autocorrelation function and was in large part due to, and correlated with, the clock in the system. The power spectral density (PSD) revealed the highest noise contribution at 200 MHz, in agreement with the system clock. Other noise contributions in the PSD occurred at other frequencies that were directly related to some switching activity in the system. The system proposed in [51] was therefore capable of capturing both the deterministic (also referred to as periodic) and the stationary properties of the supply noise in a Gb/s serial link system.
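The relationship between the measured autocorrelation and the power spectral density can be sketched numerically. The example below is an illustration only and does not model the hardware of [51]: it assumes a stationary, ergodic noise record so that the expectation in (2.1) can be replaced by a time average, it uses a hypothetical 250 ps lag step, and it synthesizes a 200 MHz tone buried in white noise as a stand-in for clock-correlated supply noise.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Biased time-average estimate of R(k) for lags k = 0..max_lag (in samples)."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[i] * xc[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def psd_from_autocorrelation(r, n_freqs):
    """Fourier transform of the even sequence R(k), evaluated on a frequency grid.

    Returns values at normalized frequencies f = m / (2 * n_freqs) cycles per
    lag step, for m = 0..n_freqs - 1.
    """
    psd = []
    for m in range(n_freqs):
        f = m / (2.0 * n_freqs)
        s = r[0] + 2.0 * sum(r[k] * math.cos(2.0 * math.pi * f * k)
                             for k in range(1, len(r)))
        psd.append(s)
    return psd

TAU_STEP_S = 250e-12    # hypothetical lag step (spacing of the two samplers)
TONE_HZ = 200e6         # synthetic clock-related noise component
rng = random.Random(0)
noise = [math.sin(2.0 * math.pi * TONE_HZ * n * TAU_STEP_S) + rng.gauss(0.0, 0.5)
         for n in range(2000)]

r = autocorrelation(noise, max_lag=200)
psd = psd_from_autocorrelation(r, n_freqs=100)
peak_bin = max(range(len(psd)), key=psd.__getitem__)
peak_hz = peak_bin / (2.0 * len(psd)) / TAU_STEP_S
print(f"largest PSD contribution near {peak_hz / 1e6:.0f} MHz")
```

The frequency resolution of the estimate is set by the largest lag measured, and the highest resolvable frequency by the lag step, which is why a finely controllable $\tau$ matters in such a scheme.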

Also recently, an on-chip system to characterize substrate integrity beyond 1 GHz was implemented in a 0.13 $\mu$m CMOS technology [52] and successfully tested. The relevance of this paper lies, on the one hand, in its circuit implementation for measuring substrate integrity, which confirms the need for embedded approaches. On the other hand, the paper's conclusion confirms that in an SoC, integrity issues have to be studied and cannot be ignored, especially beyond 1 GHz of operational speed.

## 2.6.7 - Radio Frequency Testing - Amplifier Resonance

One other test was also performed using the system proposed in [13]: the capture of the frequency response of a radio frequency (RF) low-noise amplifier (LNA), particularly around its resonance frequency. The CUT was implemented on-chip and its frequency response was tested using the proposed multi-tone signal generation and multi-pass single-comparator capture system. A 1.2 GHz centre resonance frequency was successfully measured with 29 dB of spurious-free dynamic range [49].

More focused RF BIST testing has been proposed in [53]. An example diagram for testing RF gain and noise figure is shown in Figure 2.31.



Figure 2.31 Focused board-level RF testing.

A noise diode generates a broadband RF noise signal, and an output diode, preceded by an LNA for amplification purposes, acts as an RF detector. Narrowband filters are used to filter out the broadband noise. The power levels are swept by varying the DC bias of the diode. Calibration is also made possible to characterize and verify the correct functionality of the board-level test path. Additional block diagrams for other RF testing functions can be found in more detail in [53]. They all fall in the category of RF-to-DC or RF-to-analog testing, whereby the high-frequency signals are converted to low-frequency or DC signals, which are then captured with more ease and higher accuracy.

## 2.6.8 - Limitations of the Proposed Architecture in [13]

Given all the above applications, which were experimentally verified, the system proposed in [13] is indeed versatile, and almost all-digital with the exception of one comparator, two low-pass filters, and two sample-and-hold systems. The circuit was proven to perform tests that are otherwise not achievable or, at the very least, very expensive to perform. Despite its versatility, the system proposed in [13] has some limitations, which are highlighted next. Comparator offset is one such limitation; the comparator needs to be fully characterized for its offset, as well as dynamically tested. These two tasks are not easily done, are at best time-consuming, and require some additional consideration. The other limitation, albeit less severe, lies in the uncertainty associated with the rise/fall time mismatch of the digital bitstream used for the on-chip memory-based DC generation. This, however, can be taken care of at the design level by accounting for the worst-case process variations. Another limitation is the increased test time that each test requires due to the multi-pass approach. The dead time needed for the DC signal generation sub-system to settle to within an acceptable resolution, each time the DC generation block updates its output level, is another source of increased test time. This was a trade-off between design complexity and test time that the authors had to consider. Finally, increasing the effective sampling rate of the system beyond 10 GHz will require more sophisticated on-chip circuits and instruments to perform the picosecond clock phase calibration and measurement.

The work proposed in this dissertation will attempt to address these limitations by approaching the system design from a different perspective, while still driven by the goal of achieving the embedded characterization of very high-speed events at low cost.

## 2.7 - Recent Trends

If the cost of a component has to be brought down to track Moore's law, its testing cost has to go down as well. While many of the recent tools are mainly intended for characterization and device functional testing, and many of the most recent methods have been highlighted in this chapter, more needs to be done about production testing. One important criterion in production testing is the ability to calibrate all devices using simple calibration techniques, with as little test time overhead as possible, in order to be a production-worthy solution. It is therefore important to highlight some of the latest test concerns and techniques that have emerged in the last couple of years, mainly to reduce overall test time and cost. Adaptive test control and collection, and test floor statistical process control (SPC), are now emerging topics that are believed to decrease the overall test time by gathering statistical parameters about the die, wafer and lot, and feeding those back to a test control section through some interactive interface. As more parts are tested, it is believed that the variations in the parts are better understood, allowing the Test Control to enable or disable tests and to re-order tests, for example allowing the tests that are catching the defects to be run first [54]. This has the potential effect of centring the distribution of the devices' performance more tightly around its mean; in other words, of producing test results with less variance or standard deviation. Once this is achieved, the remaining devices in the production line can be scanned and binned more quickly. However, this solution does not address the issue of mean shifting, which could happen if there is a sudden change in the environmental setup. Also, the time it takes to gather a statistically valid set of data that works more or less globally is not yet defined. This is an important criterion since having a set that works for only a small percentage of the devices to be tested is not an economically feasible solution. In other words, the time overhead introduced by the proposed method should not have a detrimental effect on the overall test time. Otherwise the proposed method is not justified.
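The re-ordering idea can be made concrete with a small sketch. The ranking rule used below (observed fail rate per unit of test time, with simple smoothing) is an assumption chosen for illustration and is not taken from [54]; the test names, times and counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StatRecord:
    """Running statistics gathered on the test floor for one test (hypothetical)."""
    name: str
    time_ms: float      # time needed to run the test once
    runs: int = 0       # number of devices on which the test was run
    fails: int = 0      # number of devices that failed this test

    @property
    def fail_rate(self):
        # Smoothed estimate, so that a test with few runs is not ranked at zero.
        return (self.fails + 1) / (self.runs + 2)

def reorder(records):
    """Place first the tests that catch the most defects per unit of test time."""
    return sorted(records, key=lambda rec: rec.fail_rate / rec.time_ms, reverse=True)

floor_data = [
    StatRecord("dc_offset", time_ms=5.0, runs=1000, fails=2),
    StatRecord("jitter", time_ms=20.0, runs=1000, fails=120),
    StatRecord("gain", time_ms=8.0, runs=1000, fails=30),
]
ordered_names = [rec.name for rec in reorder(floor_data)]
print(ordered_names)
```

A rule of this kind only pays off once enough devices have been tested for the statistics to be meaningful, which is the data-gathering time concern raised above.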

A design for manufacturability technique based on a manufacturable-by-construction design was also recently proposed in [55]. The idea proposed is specifically intended for the nanometer era and puts forward the concept of incorporating accurate physical and layout models of a particular process as part of the computer-aided-design tool used to simulate the system. Such models are then continuously and dynamically updated based on the yield losses. The concept was experimentally verified on five different SoCs implemented in a 0.13 $\mu$m CMOS process, including a baseband cell phone, a micro-controller, and a graphics chip. Experimental results show a yield improvement varying between 4% and 12%, depending on the nature of the system implemented on the chip. The yield improvement was measured with respect to previous revisions of the same ICs implemented using traditional methods.

Recent efforts have also led the ATE manufacturing industry to consider what is known as an open architecture with modular instruments, in order to standardize test platforms and increase their lifetime. This resulted in the Semiconductor Test Consortium (STC), formed between Intel and the Japanese ATE manufacturer Advantest Corp. [56].

Finally, the testing of multiple Gb/s serial links and buses has been the focus of recent panel discussions [57]. Some of the questions that have been addressed include the appropriateness of DFT/BIST for such tests, whether such measures are, or will be, the bottleneck for analog tests, rather than the RF front-end in mobile/wireless computing, and finally, whether it is necessary to even consider testing for jitter, noise and bit-error-rate from a cost and economics perspective in a production environment.

## 2.8 - Conclusions

In summary, it is clear that test solutions and design-for-test techniques are important, but where the test solutions are implemented and how they are partitioned, especially in an SoC era, have an effect on the overall test cost. Devising the optimum test strategy that is affordable, achieves a high yield and minimizes the time-to-market is a difficult task. Test solutions and platforms can be partitioned anywhere on the chip, on the board, or as part of the requirements of the ATE. Each solution entails responsibility for different people (designer, test engineer, or ATE manufacturer), different calibration techniques, and different test instruments, all of which directly impact the test cost, and therefore the overall part cost to the consumer. This chapter focused mainly on the latest developments in DFT and BIST techniques and the embedded test structures of analog and mixed-signal communication systems, for the purpose of design validation and characterization. Emerging ideas and the latest efforts to decrease the cost of test include adaptive testing, where environmental factors are accounted for and fed back to the testing algorithm. This could potentially result in more economical long-term production testing, but is yet to be verified and justified. At the ATE level, ideas such as concurrent test and open architecture are also being considered. Despite the differences in views and the abundance of suggested solutions for test, more efficient test techniques continue to be a subject of research. A great number of mixed-signal test solutions will have to continue to emerge to respond to the constant pressure to ship better, faster and more economical (cheaper) devices to electronics consumers.

## Chapter 3 - Time-Based Digitization for Analog Signals

It was argued in the earlier chapters of this thesis that on-chip circuit structures offer a superior solution for non-destructive signal probing; they alleviate the problem of external capacitive and inductive loading, and significantly simplify the external test setup and reduce test equipment cost. The time-based processing technique introduced earlier in this thesis is first applied in this chapter, in which an arbitrary waveform digitization technique is presented and demonstrated through the development of a 70-GHz effective sampling rate oscilloscope. Undersampling, combined with single-path time-domain amplification and processing, is used to perform the embedded measurement in a time-efficient manner. The proposed system relies on simple circuit components while performing high-speed measurements. One of its additional advantages is its ease of calibration with minimal silicon area overhead, a critical requirement for a DFT and BIST technique. The circuit was implemented in a 0.18 µm standard digital CMOS process using a single 1.8 V supply. On-chip interconnect crosstalk generation with variable strength is included for characterization, and was successfully measured using the prototype chip. Full experimental evaluation is left until Chapter 5 of this thesis. In this chapter, we focus on the proposed system and its circuit details. Circuit trade-offs and design choices are also discussed, and simple equations describing the behaviour of the proposed system are derived to allow the designer to quickly consider the many trade-offs involved. A brief look at the test time savings of the proposed system is also provided.

## 3.1 - Introduction

Previous work on embedded test demonstrated the feasibility of implementing a complete diagnosis tool in deep submicron CMOS technology as highlighted in Chapter 2. Capabilities such as arbitrary waveform generation, periodic analog waveform synchronous capture, curve tracing, oscilloscope and spectrum analyzer tasks at a rate of 20 MHz with an effective resolution of 8 bits [13] have been demonstrated. The work in [13] was also extended to capture high-frequency narrow-band periodic signals using sub-sampling at an effective sampling rate of 4 GHz and 10 GHz [58] in 0.35  $\mu$ m and 0.18  $\mu$ m CMOS technologies respectively. Broadband signals such as TDR measurements were also demonstrated using sub-sampling and a delayed-clock system, achieving a timing resolution of 200 ps.

Another 8-channel, 100-GHz effective sampling rate on-chip oscilloscope for tracing purposes was reported in [59]. Circuits with special focus placed on the characterization of signal integrity and on-chip crosstalk have also been successfully demonstrated in [50][60]. Other on-chip signal capture variations also exist, such as the successive approximation register ADC design reported in [61].

All the prototypes previously reported rely on the use of undersampling, a DLL, and the assumption of a periodic input signal, as shown in Figure 3.1.



Figure 3.1 Signal capture using undersampling.

The main advantage of the architecture presented in [13] over other implementations is in the way the data is captured. In [13], the system integrates a single comparator as a 1-bit ADC, with a clocking phase selection scheme based on the different taps of a DLL, performing a multiple-pass capture approach. An on-chip DC reference generator is also implemented. The function of this memory-based DC generator is to generate the different DC levels against which the sampled-and-held value of the CUT node is compared, one level at a time. The system therefore acts effectively as a flash ADC while using a single comparator and varying the DC reference level each time. In other systems, such as [50], the time increments in the clocking scheme are instead generated using a voltage-controlled-delay cell, with an externally supplied reference voltage performing the time (or equivalently, phase) shift of the S/H clock. In addition, as in the system proposed in [59], the sampled-and-held value of the analog node to be captured is then exported off-chip using a voltage follower or a buffer. The method in [13] therefore has the advantage of an almost all-memory design with input/output data in DC and/or digital format, making it very attractive from a BIST point of view. The multi-pass method presented in [13], however, suffers from an increased test time, mainly due to sweeping the reference voltage over all possible levels, which also requires that the voltage settle to the required resolution every time it is changed to the next level. This settling time is a function of the resolution required, but could take up to three dead clock cycles before the resolution is reached. Another disadvantage of the previously proposed systems is the maximum effective sampling rate achieved. Using on-chip components (a DLL), a maximum effective sampling rate of 4 GHz was achieved [13], with an extension to 10 GHz [49] using an off-chip interpolation scheme. In [59], 100-GHz effective sampling is obtained, with the analog information exported off-chip using proper buffering. Advanced and multiple DLL systems are needed, and the system is power and area hungry, which is often undesirable in a design-for-test solution.
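To give a feel for the test time penalty of the multi-pass approach, the sketch below counts capture cycles under simplifying assumptions that are not taken from [13]: one sample is acquired per slow clock cycle, every reference level is compared at every clock phase, and three dead cycles are spent each time the reference is stepped. A capture that resolves all levels at once is assumed to need only one cycle per phase; calibration and read-out time are ignored. The resolution and number of phases are hypothetical.

```python
def multipass_cycles(bits, n_phases, dead_cycles=3):
    """Cycles for a single-comparator capture sweeping all reference levels.

    Assumes 2**bits - 1 reference levels, n_phases samples per level, and
    dead_cycles of settling each time the reference is stepped.
    """
    levels = 2 ** bits - 1
    return levels * (n_phases + dead_cycles)

def parallel_cycles(n_phases):
    """Cycles when every sample is fully digitized in a single pass."""
    return n_phases

bits, n_phases = 8, 40          # hypothetical resolution and number of clock phases
mp = multipass_cycles(bits, n_phases)
par = parallel_cycles(n_phases)
print(f"multi-pass: {mp} cycles, single-pass: {par} cycles, ratio: {mp / par:.1f}")
```

Under these assumptions the number of cycles grows exponentially with the resolution in bits, which is the motivation for seeking a form of parallelism in the capture.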

Other circuit architectures have been proposed for the on-chip characterization of high-speed signals, and inductance extraction in particular. A 1-ps resolution on-chip oscilloscope has been recently reported in [62]. The circuit, however, requires dedicated on-chip circuitry (a reference ring oscillator with frequency division) to perform the fine time resolution measurement. This adds to the design complexity and introduces additional silicon area penalty.

Based on the above discussions and limitations, the main purpose of the current work presented in this chapter is to perform high-speed effective sampling rate generation, together with digitization, both on chip. The challenge with the former is to be able to generate fine time steps, but also more importantly, to be able to calibrate those intervals with as little additional hardware as possible. To achieve both tasks, time-based signal processing is adopted whereby the recent development in time amplification [3] makes the task of time digitization and clock undersampling calibration less demanding.

Another concern this system tackles is the signal capture time. Parallelism allows a more efficient capture from a time point of view. A flash ADC is therefore a logical extension to the multi-pass approach proposed in [13]. However, matching concerns, and testing for mismatches, especially dynamic mismatch, make this solution less attractive. One way to get around the problem is to resort to a different kind of parallelism, one based on time. Recently, a VCD cell used as a comparator was reported and experimentally verified in [63]. Similarly, relying on a voltage-to-pulse-delay-time converter to perform analog-to-digital conversion was recently reported in [64]. The advantage of such a "comparator" is that it is easily calibrated using a digital tester. The input voltage is converted into a time difference, which can be easily captured or measured using the time measurement units of even low-end digital testers. By relying on such a system, and performing the parallelism in the time domain using digital blocks preceded by a time amplifier, the signal capture time is reduced compared to the method proposed in [13], without the risk imposed by the matching requirements of a multiple-comparator design. Calibration time is however increased, which reduces the overall test time savings.

The advantages of the proposed system, and how it can be elegantly combined with time amplification and applied to the task at hand to reduce test time, will be detailed in Section 3.2, where the system-level architecture is proposed. Circuit details are then shown in Section 3.3, followed by a theoretical derivation of the design choices and trade-offs in Section 3.4. The calibration scheme is then presented in Section 3.5, and the test time savings of the proposed system are discussed in Section 3.6. Integrated circuit (IC) implementation details and experimental results are presented in Chapter 5. Concluding remarks are given in the last section of the chapter, outlining the advantages and limitations of the proposed approach.

## 3.2 - System-Level Description

#### 3.2.1 - System Overview

As discussed in Chapter 2, undersampling effectively allows one to capture a high-speed signal at a low capture rate with a time resolution of $\Delta T$, set by the delay generator circuit. This is achieved by iteratively selecting a different clock phase, $\Delta T$ seconds apart from the previously selected one. The multiple-pass approach, in turn, relies on reusing the same comparator with a varying DC reference voltage over multiple iterations. Together, these two loops, the first for phase selection and the second for DC voltage level sweeping, form what is referred to as the undersampled multi-pass approach. The idea behind it is illustrated in Figure 3.2.



Figure 3.2 On-chip multi-pass under-sampled algorithm high-level implementation.

The programmable reference is first used to generate one DC level. The sampled-and-held voltage of the CUT is then compared to this reference level and quantized using a 1-bit ADC (or simply a comparator). On the next pass, the DC reference voltage is kept constant and the clock edge for the sampling operation is moved by $\Delta T$. In the case of [13], this $\Delta T$ is generated using a DLL and is limited to a resolution of about 250 ps. In another implementation, this $\Delta T$ was generated externally with a resolution of 100 ps. The new sampled-and-held information of the CUT is then compared to the same reference voltage. This sequence is repeated until a complete cycle of the input to be captured is covered. Once this is done, the programmable reference voltage is incremented to the next step, one LSB away from the previous reference level, and the whole previous cycle is repeated. The above steps are then repeated until all DC reference voltages are covered. This implies that a resolution of $\Delta T$ in the time domain and of one LSB in the voltage domain can be achieved.
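The two nested loops of the undersampled multi-pass algorithm can be summarized in a short behavioural sketch. This is a software illustration under ideal assumptions (no comparator offset, no settling error, a perfectly periodic input), not a model of the circuit in [13]; the test waveform, voltage range, resolution and number of phases are hypothetical. Because the input is periodic, sampling at phase $k \Delta T$ in successive clock periods is modelled simply as evaluating the waveform at $k \Delta T$.

```python
import math

def multipass_capture(signal, period, n_phases, bits, v_min, v_max):
    """Digitize one period of a periodic signal using a single comparator."""
    levels = 2 ** bits
    lsb = (v_max - v_min) / levels
    delta_t = period / n_phases
    codes = [0] * n_phases
    # Outer loop: step the programmable DC reference one LSB at a time.
    for level in range(1, levels):
        v_ref = v_min + level * lsb
        # Inner loop: move the sampling edge by delta_t on every pass.
        for phase in range(n_phases):
            sampled = signal(phase * delta_t)   # sample-and-hold of the CUT node
            if sampled > v_ref:                 # 1-bit ADC (comparator) decision
                codes[phase] += 1               # count the reference levels exceeded
    return codes, lsb, delta_t

PERIOD_S = 1e-9     # hypothetical 1 GHz periodic input

def sine(t):
    """Hypothetical CUT output: 0.5 V amplitude sine centred at 0.9 V."""
    return 0.9 + 0.5 * math.sin(2.0 * math.pi * t / PERIOD_S)

codes, lsb, delta_t = multipass_capture(sine, PERIOD_S, n_phases=16, bits=6,
                                        v_min=0.0, v_max=1.8)
reconstructed = [c * lsb for c in codes]   # v_min is 0 V in this example
print(codes)
```

Each reconstructed sample lies within one LSB of the held voltage, while the time axis is resolved in steps of $\Delta T$, as stated above.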

The above scheme trades increased test time for reduced hardware complexity. As mentioned in the previous section, relying on parallelism reduces test time. In this work, test time is reduced with a time-based parallelism, whereby the proposed system architecture relies on a single VCD cell to convert the voltage information into time information, followed by a flash TDC to perform the multi-bit time digitization. The proposed system is shown in Figure 3.3.



Figure 3.3 Proposed system.

It consists of two differential VCDs (or DVCDs for short): one used for clock interpolation and the other for voltage-to-time conversion of the information to be captured. The next block is a time amplifier. Time amplification stretches the input time information into larger time intervals, which relaxes the requirements on the time digitization and processing of subsequent stages [3]. The time amplifier can be removed if large time steps, and therefore low effective sampling rates, are used in the sampling process. Finally, a low-resolution TDC is followed by a synchronous parallel-to-serial converter, which converts the thermometer-coded digital output to a single-bit serial stream, minimizing the number of external pins needed. Effectively, the system processes analog data and exports it off-chip in a digital format, after an intermediate time-based transformation followed by time processing, thereby eliminating the need for an ADC or an analog voltage follower/buffer as was required in [59][60].
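The signal path just described (voltage to delay, time amplification, flash time digitization, serialization) can be illustrated with an idealized behavioural model. The sketch below is not the circuit of this chapter: the linear voltage-to-delay characteristic, the gains, the TDC step and the 3-bit resolution are all hypothetical values chosen only to show how the blocks compose.

```python
def dvcd_delay_ps(v_in, gain_ps_per_v=20.0):
    """Voltage-to-time conversion, idealized as linear (an assumption)."""
    return gain_ps_per_v * v_in

def time_amplifier(dt_ps, gain=16.0):
    """Stretch a small time difference by a constant gain (idealized)."""
    return gain * dt_ps

def flash_tdc(dt_ps, bits=3, lsb_ps=40.0):
    """Compare the interval against 2**bits - 1 thresholds; thermometer code out."""
    return [1 if dt_ps > (i + 1) * lsb_ps else 0 for i in range(2 ** bits - 1)]

def serialize(thermometer):
    """Parallel-to-serial conversion: shift the bits out one at a time."""
    for bit in thermometer:
        yield bit

def decode(serial_bits):
    """Off-chip decoding: the code is the number of ones received."""
    return sum(serial_bits)

v_sample = 0.55     # hypothetical sampled voltage, in volts
thermometer = flash_tdc(time_amplifier(dvcd_delay_ps(v_sample)))
code = decode(serialize(thermometer))
print(thermometer, code)
```

In this model the time amplifier lets a coarse 40 ps TDC step resolve 2.5 ps at the DVCD output, which is the sense in which amplification relaxes the requirements on the subsequent stages.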

One potential drawback of the proposed method is that the back-end digital processing unit has to run at a faster rate (proportional to $2^{b}$, where b is the bit resolution of the TDC). This was not seen as a bottleneck since the design consists of basic digital blocks (flip-flops, multiplexers, and buffers), which can easily run at rates exceeding that of $T_{slow}$. In other words, the maximum resolution of the proposed system is set by the ratio of the maximum rate at which the digital back-end (comprised of standard digital cells) can run to the rate of the slowly running analog blocks (set by $T_{slow}$). The above is true if no pin count penalty is allowed. If the number of additional pins used in the system is allowed to increase, the resolution of the proposed system can be increased beyond the speed ratio of the digital-to-analog blocks.
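The dependence of the achievable resolution on the back-end speed and on the pin count can be sketched with simple arithmetic. The model below is an assumption made for illustration: $T_{slow}$ is taken to be the period of the slow analog sampling clock, the full thermometer code of $2^{b}-1$ bits is assumed to be shifted out within one such period, and the bits are assumed to be split evenly across the available output pins. The clock values are hypothetical.

```python
import math

def serial_clock_hz(bits, t_slow_s, pins=1):
    """Back-end clock needed to shift out 2**bits - 1 bits per T_slow period."""
    bits_per_pin = math.ceil((2 ** bits - 1) / pins)
    return bits_per_pin / t_slow_s

def max_tdc_bits(f_digital_max_hz, t_slow_s, pins=1):
    """Largest resolution whose thermometer code still fits in one T_slow period."""
    # Small epsilon guards against floating-point rounding of the product.
    slots = math.floor(f_digital_max_hz * t_slow_s + 1e-9) * pins
    bits = 0
    while 2 ** (bits + 1) - 1 <= slots:
        bits += 1
    return bits

T_SLOW_S = 50e-9        # hypothetical slow clock period (20 MHz)
F_DIGITAL_HZ = 1e9      # hypothetical maximum back-end clock rate

rate_3b = serial_clock_hz(3, T_SLOW_S)
b_one_pin = max_tdc_bits(F_DIGITAL_HZ, T_SLOW_S, pins=1)
b_two_pins = max_tdc_bits(F_DIGITAL_HZ, T_SLOW_S, pins=2)
print(f"3-bit TDC needs {rate_3b / 1e6:.0f} MHz; "
      f"max bits: {b_one_pin} (1 pin), {b_two_pins} (2 pins)")
```

Under these assumptions, each doubling of the pin count buys roughly one additional bit of resolution for the same back-end clock rate.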

The circuit implementation for each block in the system is described in detail in the next section. But before that, let us highlight some of the advantages of our calibration scheme.

#### 3.2.2 - Proposed Calibration

Calibration is of utmost importance in an on-chip testing environment. The main advantage of the proposed calibration scheme is that it relies on the existing building blocks, thereby minimizing the additional silicon overhead and cost (with the exception of an analog multiplexer and a digital control block). The calibration is performed recursively, implying that the new information to be derived relies only on the information that was just derived in the previous step; the proposed calibration scheme is therefore self-contained and only relies on the quality of the reference signals. Finally, the calibration procedure requires only low-end test equipment: DC generators, digital signal generators with accurate timing information, and a digital capture analyzer, all of which can be easily obtained from any low-end digital tester. Calibration time is a potential drawback to the overall test time savings of the proposed system. However, once the calibration has been performed, the calibration parameters can be stored and used for subsequent on-chip testing. Actual calibration details will be further discussed in Section 3.5.

## 3.3 - Circuit Details

#### 3.3.1 - Voltage-Controlled-Delay Cell 1: Clocking Scheme

A basic current-starved topology [65] for a VCD cell that performs voltage-to-time conversion is shown in Figure 3.4(a).



Figure 3.4 (a) Variable VCD circuit, (b) an example of a typical VCD transfer characteristics showing the time delay as function of the variable voltage, V<sub>var</sub> and (c) a symbolic representation of the VCD circuit.

This VCD cell generates a variable edge, $T_{var}$, that is offset with respect to the trigger edge, $T_{trig}$, by an amount proportional to a variable voltage, $V_{var}$, according to

$$\Delta T_{\rm VCD, \, var} = G_{\rm VCD} \cdot V_{\rm var}, \qquad (3.1)$$

where $G_{VCD}$ is the gain of the VCD and $\Delta T_{VCD, var} = T_{var} - T_{trig}$. The typical shape of the transfer characteristic of the VCD cell is shown in Figure 3.4(b). From this figure, we see that the VCD cell has a relatively large linear region, from 0.6 V to 1.1 V out of a 1.8 V range. A symbol that captures the variable VCD cell behaviour described above is shown in Figure 3.4(c), and will be used to replace the circuit schematic of Figure 3.4(a) in subsequent illustrations.

If another identical cell is created but driven by a constant reference voltage  $V_{ref}$ , then a reference time edge,  $T_{ref}$ , results according to

$$\Delta T_{\rm VCD, \, ref} = G_{\rm VCD} \cdot V_{\rm ref}, \qquad (3.2)$$

where  $\Delta T_{VCD, ref} = T_{ref} - T_{trig}$ . Here, it is assumed that both the reference and variable VCD cells are matched and have therefore equal gains. If these two VCD cells are driven simultaneously by the same trigger edge,  $T_{trig}$ , as illustrated in Figure 3.5(a), then the output can be taken differentially, forming a differential voltage-controlled-delay cell, or DVCD. We will refer to this cell as the DVCD<sub>clk</sub> when used in the clock interpolation scheme. The symbolic representation of the DVCD<sub>clk</sub> cell is shown in Figure 3.5(b).



Figure 3.5 (a) Symbolic representation of the variable and reference VCD cells, (b) DVCD cell, and (c) an example showing typical inputs and outputs waveforms of the clock interpolation DVCD, DVCD<sub>clk</sub>.


For this particular case, the input-output relationship that describes the differential output time of DVCD with respect to the differential input voltage is obtained by subtracting (3.2) from (3.1), resulting in

$$\Delta T_{\text{DVCD, clk}} = G_{\text{DVCD, clk}} \cdot \Delta V_{\text{DVCD, clk}}, \qquad (3.3)$$

where  $G_{DVCD, clk} = G_{VCD}$ ,  $\Delta V_{DVCD, clk} = V_{var} - V_{ref}$ , and  $\Delta T_{DVCD, clk} = \Delta T_{VCD, var} - \Delta T_{VCD, ref}$ . Typical DVCD<sub>clk</sub> input-output waveforms are illustrated in Figure 3.5(c) for a decreasing ramped input signal.

From an implementation point-of-view, the control signal  $V_{var}$  can be generated either using a ramp generator or a memory-based DC  $\Delta\Sigma$  encoded bitstream. The logic needed to generate the DC levels (or slow ramp) can either be placed on-chip, or programmed using an off-chip FPGA. Referring back to Figure 3.4(b), simulations show that for a  $V_{var}$ varying between 1.1 V and 0.6 V, and for a  $V_{ref}$  of 0.6 V, the variable *falling*<sup>1</sup> edge is delayed by ~ 0 ps - 1000 ps with respect to the reference edge. This corresponds to the highest gain region and results in  $G_{DVCD, clk} = 2 \text{ ps/mV}$ . The gain  $G_{DVCD,clk}$  decreases as the voltage is increased or decreased beyond the {0.6 V - 1.1 V} linearity range. A cascade of two identical DVCD<sub>clk</sub> cells was adopted in this design resulting in twice the time delay, and therefore the gain  $G_{DVCD, clk} = 4 \text{ ps/mV}$ .
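As a rough numerical check of the figures quoted above, the following sketch models one DVCD<sub>clk</sub> stage as an ideal linear element inside its 0.6 V to 1.1 V region, following (3.3). The function name `dvcd_delay_ps`, the hard range check, and the perfectly linear model are assumptions made for illustration only; the simulated characteristic of Figure 3.4(b) is only approximately linear.

```python
def dvcd_delay_ps(v_var, v_ref=0.6, gain_ps_per_mv=2.0, v_lo=0.6, v_hi=1.1):
    """Ideal differential delay (ps) of one DVCD stage, eq. (3.3).

    Assumes a perfectly linear cell; voltages are in volts.
    """
    if not (v_lo <= v_var <= v_hi):
        raise ValueError("V_var outside the assumed linear region")
    return gain_ps_per_mv * (v_var - v_ref) * 1e3  # convert V to mV


# Single stage: 0.5 V of input swing maps to roughly 1000 ps.
single = dvcd_delay_ps(1.1)
# Cascade of two identical stages: the delays add, so the gain doubles.
cascade = 2 * dvcd_delay_ps(1.1)
print(single, cascade)
```

Under these assumptions the single-stage delay spans about 1000 ps and the cascaded delay about 2000 ps over the same 0.5 V swing, consistent with the 2 ps/mV and 4 ps/mV gains stated in the text.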

#### 3.3.2 - Voltage-Controlled-Delay Cell 2: Signal Capture

Similar to the architecture adopted for the clocking scheme, the differential-voltage-controlled-delay cell used for signal capture, and referred to as $DVCD_{sig}$, was implemented using a cascade of two current-starved VCDs. Similar sizing as $DVCD_{clk}$ was also adopted, making the task of designing, laying out and potentially testing this cell easier. In fact, using identical cells eliminates the need to test each cell separately. In this case, the resolution obtained is at best equal to the matching resolution that the technology can offer. Matching on the order of 1% can easily be achieved, resulting in slightly less than 7-bit accuracy; a resolution commonly used in on-chip signal integrity tests. The $DVCD_{sig}$ cell, together with typical waveforms at the inputs and outputs of $DVCD_{sig}$ are shown in Figure 3.6.

<sup>1.</sup> Since the output edges of DVCD<sub>clk</sub> are used to control the switches of an NMOS sample-and-hold circuit, the falling edges time separation matters.



Figure 3.6 DVCD<sub>sig</sub> cell with typical input and output waveforms (figure label: Start and Stop signals as inputs to the flash TDC).




Time delays from the *rising*<sup>1</sup> edges of the outputs of  $DVCD_{sig}$  matter in this case, and the resulting gain of this cell was simulated and found to be  $G_{DVCD, sig} = 2 \text{ ps/mV}$ .

An important thing to note here is that the DVCDs used for clock interpolation and signal capture are only linear over a fraction of the full supply range as shown earlier in Figure 3.4(b). Extending the usable range of operation is possible, and it is the calibration scheme that renders this extension feasible. Also worth mentioning is the fact that the two DVCDs might have different voltage-to-time conversion or gain factors, even if designed to have identical rising and falling edge delays. It is the operational voltage range generating the time delays that will dictate the actual gain of the cell. With an appropriate calibration scheme, as will be described later, the transfer characteristics can be piece-wise linear approximated, with appropriate gain factors in the different voltage regimes extracted from the calibration procedure.

#### 3.3.3 - Time Amplifier

Similar to the programmable gain amplifier that usually precedes an ADC to relax its dynamic range requirements, a voltage-time duality implies that a time amplifier preceding a TDC could also relax its dynamic range requirements, as was recently shown in [3]. The circuit presented in [3] can provide single-stage gains of at least two orders of magnitude for a relatively large input range of a few picoseconds to a few hundreds of picoseconds. This is very promising for our purposes since time intervals of a few tens of picoseconds need to be measured. With an amplification of 100 s/s, the resultant nanosecond-apart edges can be easily measured with a relatively low-resolution TDC. The differential pair circuit schematic for the time amplifier is shown in Figure 3.7(a).

<sup>1.</sup> For DVCD<sub>sig</sub>, it is the rising edges that are relevant. Ideally, one would size the VCD cells to generate rising and falling edges with similar delays, and as a result, similar gains. This was not adopted in this design, and as a result, G<sub>DVCD,clk</sub> and G<sub>DVCD,sig</sub> are not identical.



Figure 3.7 Time amplifier differential pair circuit schematics [3]. Also shown are the time amplifier comparators used to generate the amplified time outputs, labelled as Start and Stop.

If we denote the times of occurrences of the first and second input edges by  $T_1$  and  $T_2$  with respect to a reference edge, respectively, and similarly those of the outputs by  $T_{out,1}$  and  $T_{out,2}$ , then an equation that describes the time amplifier behaviour is

$$\Delta T_{out, amplifier} = G_{amplifier} \cdot \Delta T_{in, amplifier}, \qquad (3.4)$$

where  $\Delta T_{in, amplifier} = T_2 - T_1$ ,  $\Delta T_{out, amplifier} = T_{out, 2} - T_{out, 1}$ , and  $G_{amplifier}$  is the gain of the time amplifier, expressed in s/s.

The time amplification stems from the slow discharging/charging of the output nodes with a time constant controlled by properly choosing the resistive and capacitive output loads, and by the transconductance of the differential pair and its sizing. This slow mode of operation for the time amplifier usually follows a fast mode, whereby upon arrival of the input edges, the charge incurred due to the high rate of change in the input edges gets redistributed through the differential pair's parasitic capacitance paths, setting the initial conditions on the four output nodes Out1, Out2, Out1b, and Out2b. A detailed description of the operation can be found in [3]. The four outputs of the differential pair of the time amplifier are then fed to two comparators, arbiters, or phase inversion detectors, to obtain the time amplified outputs, $T_{out,1}$ and $T_{out,2}$, commonly referred to as the Start and Stop signals for the following TDC stage. The dead-time, an important metric in time measurement systems, defined as the time it takes the input information to propagate and be digitized using the TDC, is around 20 ns in this particular system.

The time amplifier is an attractive block to use since it is believed to offer added noise immunity to the signals propagating through the on-chip scope. The proposed system relies on edges that carry the information to be measured while having very fast rise times and/or slew rates. Generally speaking, signals with higher slopes result in less time error than those with slower slopes when exposed to random phase noise. More details will be presented in Section 4.6. It could be argued that the time amplifier does in fact rely on some form of analog stretching, and is therefore prone to supply noise. However, techniques can be used to disconnect the supply as it is mainly used for the sole purpose of setting the initial conditions on the four output nodes of the time amplifier cross-coupled devices.

A moderate gain of 10 s/s was implemented for the current version of the circuit design while bearing in mind that the gains can be increased by at least an order of magnitude. It is also important to note that the actual gain of the time amplifier is subject to change with temperature, supply, input rise time [3], and other process variations. In fact, the gain is also a function of the actual time difference to be amplified. A calibration step is usually necessary to get the actual behaviour and transfer characteristics of this block. Other time amplification circuits have been proposed in the literature [44] as was described earlier in Section 2.4.7, but provide too low a gain (single digit quantities only) over a very limited input time range (below 5 ps).

#### 3.3.4 - TDC and Additional Digital Logic

The low-resolution TDC was implemented with a flash architecture, consisting of a chain of buffer delays and DFFs, all using standard digital cells. Figure 3.8 illustrates the basic architecture of the TDC.



Figure 3.8 TDC schematics.

The individual delay of the loaded buffer, $\tau$, was chosen to be approximately 70 ps. Space limitations kept the number of stages to 64, implying a time range of ~ 4.48 ns (equivalent to a maximum "un-amplified" input time difference of ~ 448 ps for the implemented amplifier gain of 10 s/s) and resulting in a 6-bit TDC. This can be seen in the following equation

$$FS_{TDC} = N \cdot LSB_{TDC}, \qquad (3.5)$$

where  $FS_{TDC}$  represents the full scale time range of the TDC, N is the number of delay stages, and  $LSB_{TDC}$  represents the smallest time the TDC can resolve,  $\tau$  in this case.
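To make (3.5) concrete, the sketch below is a purely behavioural model of an ideal flash TDC with 64 stages of 70 ps each. The function name `flash_tdc`, the bit polarity (ones followed by zeros), and the saturation at full scale are assumptions for illustration; the model ignores delay mismatch and flip-flop metastability.

```python
def flash_tdc(delta_t_ps, n_stages=64, lsb_ps=70.0):
    """Ideal flash TDC: return (count, thermometer bits) for a Start-to-Stop interval."""
    if delta_t_ps < 0:
        raise ValueError("Stop must not precede Start in this simple model")
    # Number of buffer delays the Start edge has crossed when Stop arrives,
    # saturating at the end of the delay chain.
    count = min(int(delta_t_ps // lsb_ps), n_stages)
    bits = [1] * count + [0] * (n_stages - count)
    return count, bits


# Eq. (3.5): FS_TDC = N * LSB_TDC
full_scale_ps = 64 * 70.0

# A 100 ps input interval stretched by an assumed ideal gain of 10 s/s.
count, bits = flash_tdc(10.0 * 100.0)
print(full_scale_ps, count)
```

With these values the full scale evaluates to 4480 ps, and the 64 output bits are what the parallel-to-serial converter would shift out.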

The 64 thermometer-coded outputs of the TDC are then converted, using an integrated parallel-to-serial converter, also made of standard cells, to a single digital bit easily transferable off-chip.

#### 3.3.5 - The CUT

To verify the correct functionality of the oscilloscope, interconnect crosstalk is generated on-chip in an aggressor-victim configuration to constitute the CUT. A five-line transmission line (T.L.) structure was adopted with some degree of control over the switching activity. The T.L. structure with the appropriate physical dimensions used is shown in Figure 3.9.



Figure 3.9 CUT: Transmission-line structures for on-chip (far-end) crosstalk measurement. Digital circuitry (not shown) controls which and how many aggressor lines are switched on; either (1) and (2) alone, (3) and (4) alone, or all four aggressor lines switching simultaneously.

The victim near-end line is set to a DC level that is externally controlled in order to give greater flexibility when testing the structure. This offers the ability to DC level shift the far-end crosstalk to be measured to within the dynamic range of the subsequent blocks (such as the linear range of the VCD cells, and the time range of the amplifier/TDC combination). Appropriate digital control blocks are added to control the level of switching activity, and therefore, the amount of crosstalk noise on the victim's far-end node. For the current experiment, four cases were considered, and are listed below in order of increasing far-end switching noise:

- Case 1: No aggressor switching
- Case 2: Aggressors 3 and 4 switching simultaneously
- Case 3: Aggressors 1 and 2 switching simultaneously
- Case 4: Aggressors (1 and 2) and (3 and 4) all switching.

Other combinations of aggressors and victims could be chosen in much the same way, but would lead to similar results.

## 3.4 - Design Choices: Resolution, Speed, and Area Trade-Offs

In this section, the maximum voltage resolution and the minimum time interpolation the system can achieve are derived. This gives the designer a guideline as to the design choices and trade-offs for a given application. Since in the proposed system, the time interpolation and voltage signal capture are designed independently, two sets of equations are derived.

#### 3.4.1 - Effective Sampling Rate

The effective sampling rate, denoted as  $f_s$ , is defined as the inverse of the time difference between the edges of the signals controlling the two sampling switches, driven by DVCD<sub>clk</sub>. If we denote this difference as  $\Delta T_{DVCD,clk}$ , then

$$f_s = \frac{1}{\Delta T_{\text{DVCD, clk}}}. \qquad (3.6)$$

The lower bound, say $f_{s,min}$, is set by the number of stages of the TDC, or equivalently, the full scale time range of the TDC, denoted as FS<sub>TDC</sub>. The upper bound, $f_{s,max}$, is set by the minimum time resolution that can be measured on-chip (by the combination of the time amplifier and the LSB of the TDC). Collectively, we can write an expression that bounds $f_s$ as follows,

$$\frac{G_{amplifier}}{N \cdot LSB_{TDC}} \le f_s \le \frac{G_{amplifier}}{LSB_{TDC}}, \qquad (3.7)$$

which implies that

$$f_{s, max} = \frac{G_{amplifier}}{LSB_{TDC}} \qquad (3.8)$$

and

$$f_{s,\min} = \frac{G_{amplifier}}{N \cdot LSB_{TDC}}. \qquad (3.9)$$

The output time of  $DVCD_{clk}$  is generated through a voltage-to-time conversion operation. Hence, the voltage generation, as well as the voltage-to-time conversion gain will have a direct impact on the achievable speeds. Now, if we redefine the input of the  $DVCD_{clk}$  as  $\Delta V_{gen}$ , in other words, if  $\Delta V_{gen} = V_{var} - V_{ref}$  then from (3.3) we can write

$$\Delta T_{\text{DVCD, clk}} = G_{\text{DVCD, clk}} \cdot \Delta V_{\text{gen}}, \qquad (3.10)$$

where the subscript "gen" is used to refer to the external DC reference generator. Subsequently, we can combine (3.6), (3.8), and (3.10) to write the minimum DC voltage generator step size, denoted as  $\Delta V_{gen, min}$  as

$$\Delta V_{\text{gen, min}} = \frac{\text{LSB}_{\text{TDC}}}{G_{\text{DVCD, clk}} \cdot G_{\text{amplifier}}}. \qquad (3.11)$$

Likewise, we can write

$$\Delta V_{\text{gen, max}} = \frac{N \cdot \text{LSB}_{\text{TDC}}}{G_{\text{DVCD, clk}} \cdot G_{\text{amplifier}}}. \qquad (3.12)$$

In terms of sampling rates, and using (3.8) and (3.9), (3.11) and (3.12) can be re-written as

$$\Delta V_{\text{gen, min}} = \frac{1}{G_{\text{DVCD, clk}} \cdot f_{\text{s, max}}} \qquad (3.13)$$

and

$$\Delta V_{\text{gen, max}} = \frac{1}{G_{\text{DVCD, clk}} \cdot f_{\text{s, min}}}. \qquad (3.14)$$

Given a bound on the sampling rates, we design DVCD<sub>clk</sub> according to (3.13) and (3.14). Here, it is worth noting that while it is desired to maximize the effective sampling rate of the system, this comes at the expense of a smaller required step size for the voltage generator,  $\Delta V_{gen}$ . A smaller step size implies a more stringent requirement for the DC generator from a noise perspective, whereby the noise RMS should be kept lower than the voltage LSB of the voltage generation system. Also, increasing the time amplifier gain has an impact on increasing the maximum sampling rate of the system. This, however, also occurs at the expense of added area needed to accommodate the larger time range required. A similar conclusion is drawn if a large { $f_{s,min}$ ,  $f_{s,max}$ } range is desired, whereby the increase in area overhead is due to an increase in the number of stages in the TDC, needed to accommodate a larger time measurement range.
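The relationships (3.8), (3.9), (3.13), and (3.14) can be evaluated numerically with a short sketch such as the one below. The function names `sampling_bounds` and `generator_steps` are hypothetical, and the parameter values are the ones quoted in this chapter (a time amplifier gain of 10 s/s, 64 TDC stages of 70 ps, and a DVCD<sub>clk</sub> gain of 4 ps/mV); all blocks are assumed ideal.

```python
def sampling_bounds(g_amp, n_stages, lsb_tdc_s):
    """Effective sampling-rate bounds in Hz, eqs. (3.8) and (3.9)."""
    f_s_max = g_amp / lsb_tdc_s
    f_s_min = g_amp / (n_stages * lsb_tdc_s)
    return f_s_min, f_s_max


def generator_steps(g_dvcd_clk_s_per_v, f_s_min, f_s_max):
    """DC generator step-size bounds in volts, eqs. (3.13) and (3.14)."""
    dv_min = 1.0 / (g_dvcd_clk_s_per_v * f_s_max)
    dv_max = 1.0 / (g_dvcd_clk_s_per_v * f_s_min)
    return dv_min, dv_max


f_min, f_max = sampling_bounds(g_amp=10.0, n_stages=64, lsb_tdc_s=70e-12)
# 4 ps/mV expressed in s/V is 4e-9.
dv_min, dv_max = generator_steps(4e-9, f_min, f_max)
print(f"f_s range: {f_min / 1e9:.2f} GHz to {f_max / 1e9:.2f} GHz")
print(f"dV_gen range: {dv_min * 1e3:.2f} mV to {dv_max * 1e3:.1f} mV")
```

Changing `g_amp` or `n_stages` in this sketch shows the trade-off discussed above: a higher amplifier gain raises both rate bounds and shrinks the required generator step, while more TDC stages only lowers the minimum rate.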

#### 3.4.2 - Effective Voltage Resolution

Using similar arguments as the preceding sub-section, and applying the derivations of (3.11) and (3.12) to the signal capture DVCD, or  $DVCD_{sig}$ , minimum and maximum  $\Delta V_{DVCD,sig}$  are derived as follows

$$\Delta V_{\text{DVCDsig, min}} = \frac{\text{LSB}_{\text{TDC}}}{G_{\text{DVCD, sig}} \cdot G_{\text{amplifier}}}, \qquad (3.15)$$

and

$$\Delta V_{\text{DVCDsig, max}} = \frac{N \cdot \text{LSB}_{\text{TDC}}}{G_{\text{DVCD, sig}} \cdot G_{\text{amplifier}}}. \qquad (3.16)$$

Here too, it can be seen that in order to increase the maximum attainable resolution of this system, a large time amplifier gain and/or $DVCD_{sig}$ voltage-to-time conversion gain is needed. This comes at the expense of added hardware and silicon area to increase the TDC time range. The voltage range that this system can capture is limited, but as was adopted in this design, an appropriate DC level shift can be introduced to always bring the voltages to be measured to within the voltage dynamic range of this system. This system, like all systems in general, is designed according to speed, resolution, and area trade-offs. This section highlighted some of the basic equations and design trade-offs in the selection of the various parameters of the circuit. The theoretically achievable specifications of this design are summarized in Table 3.1.

| Building Block Parameters | G<sub>DVCD,clk</sub> | G<sub>DVCD,sig</sub> | G<sub>amplifier</sub> | LSB<sub>TDC</sub> | Time Range before Amplification (FS<sub>TDC</sub>/G<sub>amplifier</sub>) |
|---|---|---|---|---|---|
| | 4 ps/mV (@ V<sub>ref</sub> = 0.6 V) | 2 ps/mV (@ V<sub>ref</sub> = 0.6 V)<sup>a</sup> | 10 s/s | 70 ps | 448 ps |
| System Parameters | f<sub>s,min</sub> | f<sub>s,max</sub> | Voltage Resolution (ΔV<sub>DVCDsig,min</sub>) | Voltage Range (ΔV<sub>DVCDsig,max</sub> - ΔV<sub>DVCDsig,min</sub>) | DC Reference Step Size (ΔV<sub>gen,min</sub>) |
| | 2.2 GHz | 142.8 GHz | 3.5 mV | 220.5 mV<sup>b</sup> | 1.75 mV |

Table 3.1 - Theoretical performance summary of the on-chip oscilloscope.

- a. The two DVCD cells, $DVCD_{clk}$ and $DVCD_{sig}$, were designed to be identical. Their operation, however, differs: falling edges matter for $DVCD_{clk}$, while $DVCD_{sig}$ relies on rising edges. Since the rising and falling edges were not designed to be symmetrical, the gains listed in Table 3.1 are adjusted based on the relevant edge direction.
- b. With proper DC level shifting (by simply adjusting  $V_{ref, sig}$ ), a full-scale voltage range can be achieved.
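As a check on the voltage entries of Table 3.1, substituting the listed building-block values into (3.15) and (3.16) gives the following; this is plain arithmetic on the tabulated numbers and assumes ideal, perfectly linear blocks.

$$\Delta V_{\text{DVCDsig, min}} = \frac{70\ \text{ps}}{2\ \text{ps/mV} \cdot 10} = 3.5\ \text{mV}, \qquad \Delta V_{\text{DVCDsig, max}} = \frac{64 \cdot 70\ \text{ps}}{2\ \text{ps/mV} \cdot 10} = 224\ \text{mV},$$

$$\Delta V_{\text{DVCDsig, max}} - \Delta V_{\text{DVCDsig, min}} = 224\ \text{mV} - 3.5\ \text{mV} = 220.5\ \text{mV}.$$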

## 3.5 - System Calibration

Highlights of the calibration scheme were briefly discussed in Section 3.2.2. Here, the detailed calibration procedure is presented. Four steps are involved in the proposed calibration, and they go as follows:

• The first step is used to deduce the relationship between the input time of the time amplifier and the serially-shifted digital bit at the output of the chip. This essentially allows the ratio of $G_{amplifier}/LSB_{TDC}$ to be quantified. This step is illustrated in Figure 3.10. It involves switching in two rising edges with time difference externally controlled (a task easily accomplished by digital testers or low-end benchtop equipment). The time difference is then amplified, digitized, and serially shifted out. The thermometer-coded output is then stored for this particular input time difference. The procedure above is then repeated for many time differences and the digital output stored in some sort of look-up-table (LUT) that constitutes the calibration parameters of the digital back-end circuit.



Figure 3.10 Step 1 of the calibration scheme.

• The second step, shown in Figure 3.11, involves measuring the time interpolation steps obtained from the  $DVCD_{clk}$ , or the delay cell used for clock interpolation. This involves switching in known DC voltages from the DC reference generator, denoted as  $V_{ref,clk}$  and  $V_{var,clk,cal}$ , and observing the time difference  $\Delta T_{clk,cal}$  that results. In other words, this step is used to quantify  $G_{DVCD,clk}$  according to

$$G_{\text{DVCD, clk}} = \frac{\Delta T_{\text{clk, cal}}}{\Delta V_{\text{clk, cal}}}, \qquad (3.17)$$

where $\Delta V_{clk, cal} = V_{var, clk, cal} - V_{ref, clk}$, and $\Delta T_{clk, cal}$ is indirectly measured using the sequence of blocks that was already calibrated in Step 1. This timing information is important because it constitutes one of the two pieces of information needed (besides voltage) to reconstruct the waveform to be captured.



Figure 3.11 Step 2 of the calibration scheme.

• The third calibration step involves the calibration of the S/H -  $DVCD_{sig}$  combination, where  $DVCD_{sig}$  represents the delay cell used for signal capture. An analog multiplexer is introduced for this step as illustrated in Figure 3.12. The purpose of this multiplexer is to allow switching between two analog waveforms; the first being the DC reference levels for calibration purposes, or  $V_{var,sig,cal}$ , and the second is the CUT analog waveform to be captured when in measurement mode, or  $V_{CUT}$ . We will refer to the output of the multiplexer as  $V_{var,sig}$ . Another reference voltage,  $V_{ref,sig}$ , is applied to the system through a sample-and-hold added for symmetry, and constitutes the reference voltage of the DVCD<sub>sig</sub>. The digital signal that would result is then measured. In other words, G<sub>DVCD,sig</sub> is deduced from this step according to

$$G_{\text{DVCD, sig}} = \frac{\Delta T_{\text{sig, cal}}}{\Delta V_{\text{sig, cal}}}, \qquad (3.18)$$

where $\Delta V_{sig, cal} = V_{var, sig, cal} - V_{ref, sig}$. The above is achieved while relying on the information from Step 1 for the digital back-end transfer characteristics, to capture and digitize $\Delta T_{sig, cal}$. This step is then repeated for different DC levels for $V_{var, sig, cal}$ so that the transfer characteristic of the capture system over the desired voltage range of operation is recorded. This step therefore gathers the second piece of information needed to reconstruct any unknown waveform: its voltage.



Figure 3.12 Step 3 of the calibration scheme.

• The final step involves turning off all calibration mode control signals and setting the control bit of the analog multiplexer to its normal measurement mode with the desired CUT node connected to the sample-and-hold. As a result, V<sub>CUT</sub> is digitized and captured.
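The first three calibration steps above can be sketched in software as follows. This is a minimal illustration, not the procedure used on the tester: `build_lut`, `code_to_time_ps`, `dvcd_gain_ps_per_mv`, and the stand-in `ideal_back_end` are hypothetical names, the back-end is modelled as an ideal 10 s/s amplifier followed by a 64-stage, 70 ps TDC, and the recorded codes are assumed to increase monotonically with the applied time difference.

```python
from bisect import bisect_right


def build_lut(measure_code, time_steps_ps):
    """Step 1: record the output code for each externally applied time difference."""
    return [(t, measure_code(t)) for t in time_steps_ps]


def code_to_time_ps(lut, code):
    """Invert the Step 1 table: smallest calibrated time whose code is >= `code`."""
    codes = [c for _, c in lut]
    idx = bisect_right(codes, code - 1)  # first entry with recorded code >= code
    if idx == len(lut):
        raise ValueError("code beyond calibrated range")
    return lut[idx][0]


def dvcd_gain_ps_per_mv(lut, code, delta_v_mv):
    """Steps 2 and 3: gain estimate per eqs. (3.17) and (3.18) for one DC setting."""
    return code_to_time_ps(lut, code) / delta_v_mv


def ideal_back_end(t_ps, g_amp=10, lsb_ps=70, n_stages=64):
    """Stand-in for the real amplifier + TDC chain (assumed ideal)."""
    return min(int(t_ps * g_amp // lsb_ps), n_stages)


lut = build_lut(ideal_back_end, range(0, 449))  # 1 ps steps up to 448 ps
code = ideal_back_end(2.0 * 50)                 # 50 mV applied to a 2 ps/mV cell
gain = dvcd_gain_ps_per_mv(lut, code, 50)
print(code, gain)
```

Because the table is inverted at a code boundary, the recovered gain is quantized; repeating the measurement over several DC levels, as the procedure prescribes, would average out part of this quantization error.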

In summary, and from the calibration procedure outlined above, the system having  $V_{ref,clk}$ ,  $V_{var,clk}$ , and  $V_{ref,sig}$  as inputs, and  $D_o$  as output (where  $D_o$  represents the digital word at the output of the TDC) can now be used to measure any unknown waveform. This is achieved while relying on the calibration equations that will be briefly summarized next. Two pieces of information are extracted from the calibration, with the first being the time step size for the clock interpolation. This is deduced according to

$$\Delta T_{clk, cal} = G_{DVCD, clk} \cdot \Delta V_{clk, cal}. \qquad (3.19)$$

The second piece of information needed to reconstruct any desired waveform in test mode is the voltage-to-digital conversion process. This is a two-step process. The first consists of converting the voltage to-be-captured into time, according to

$$\Delta T_{sig, cal} = G_{DVCD, sig} \cdot \Delta V_{sig, cal}, \qquad (3.20)$$

and the second digitizes the resulting time,  $\Delta T_{sig,cal}$  according to

$$D_{o} = floor\left(\frac{\Delta T_{sig, cal} \cdot G_{amplifier}}{LSB_{TDC}}\right), \qquad (3.21)$$

where floor  $\left(\frac{T}{LSB_{TDC}}\right)$  represents the quantization of T by the TDC. Combining (3.20) and (3.21), we get the digital word at the output of the TDC, defined D<sub>o</sub>, as a function of  $\Delta V_{sig,cal}$ , and some of the system's calibrated parameters according to

$$D_{o} = floor\left(\frac{G_{DVCD, sig} \cdot \Delta V_{sig, cal} \cdot G_{amplifier}}{LSB_{TDC}}\right). \qquad (3.22)$$

One can easily deduce the input voltage in measurement mode by replacing  $\Delta V_{sig,cal}$  in (3.22) with  $(V_{CUT} - V_{ref, sig})$ , to obtain

$$V_{CUT} = V_{ref, sig} + \left(\frac{LSB_{TDC}}{G_{DVCD, sig} \cdot G_{amplifier}}\right) \cdot [N - (b_0 + b_1 + ... + b_{N-1}) + 1], \qquad (3.23)$$

where $b_0$, $b_1$, ..., $b_{N-1}$ represent the thermometer-code representation of the TDC digital output word, $D_o$.

Equations (3.19) and (3.22) constitute the calibration equations. From those equations, and given inputs  $V_{ref,clk}$ ,  $V_{var,clk}$ , and  $V_{ref,sig}$  to the system, and a TDC output  $D_o$ , any unknown waveform at the input of the analog multiplexer,  $V_{CUT}$ , can now be reconstructed.
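A minimal sketch of the forward model (3.22), together with the interval of input voltages that maps to a given output word, is shown below. The function names `tdc_code` and `voltage_bin` are hypothetical, the default gains are the nominal values of Table 3.1 rather than calibrated ones, and the sketch deliberately does not reproduce the thermometer-code counting convention of (3.23).

```python
import math


def tdc_code(v_cut, v_ref_sig, g_dvcd_sig_ps_per_mv=2.0, g_amp=10.0, lsb_ps=70.0):
    """Eq. (3.22): TDC output word for a given input voltage (volts)."""
    delta_v_mv = (v_cut - v_ref_sig) * 1e3
    return math.floor(g_dvcd_sig_ps_per_mv * delta_v_mv * g_amp / lsb_ps)


def voltage_bin(code, v_ref_sig, g_dvcd_sig_ps_per_mv=2.0, g_amp=10.0, lsb_ps=70.0):
    """Interval [lo, hi) of V_CUT (volts) that maps to `code` under eq. (3.22)."""
    # One voltage LSB, eq. (3.15), converted from mV to V.
    step_v = lsb_ps / (g_dvcd_sig_ps_per_mv * g_amp) * 1e-3
    return v_ref_sig + code * step_v, v_ref_sig + (code + 1) * step_v


code = tdc_code(0.65, 0.6)       # 50 mV above the reference
lo, hi = voltage_bin(code, 0.6)  # voltages consistent with that word
print(code, lo, hi)
```

In practice the nominal gains would be replaced by the values extracted in Steps 1 to 3, possibly one set per voltage region if a piece-wise linear characteristic is used.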

## 3.6 - Test Time

In this section, the test time savings of the proposed system are evaluated and compared to other digitization approaches, and in particular, the multi-pass approach [13][27]. We will proceed by dividing the total test time into two components: the signal capture time and the calibration time. The former is due to the measurement or data collection time in normal mode of operation, while the latter is due to the time it takes to calibrate the system.

#### 3.6.1 - Signal Capture Time

The signal capture time corresponds to the time it takes to collect a full cycle of the waveform to be digitized. In the case of the multi-pass digitization method, if we denote by b the bit-resolution of the system,  $T_{slow}$  the slow capture rate, and n the number of phases generated by the DLL, then the capture time, denoted as  $T_{multi-pass, capture}$  is given by

$$T_{\text{multi-pass, capture}} = [(n \cdot T_{\text{slow}}) \cdot 2^{b}] + T_{\text{DC gen settling-time}}, \qquad (3.24)$$

where $(n \cdot T_{slow})$ refers to the time it takes the system running at $T_{slow}$ to cover all the DLL phases, for a single comparison level. This first step is then repeated 2<sup>b</sup> times for all digitization levels, hence the second multiplication term in (3.24). $T_{DC \text{ gen settling-time}}$ is the time it takes the DC reference generator to settle to a final DC value, every time the DC comparison level is updated. This settling time is a function of the resolution of the system and of the architecture chosen for the DC reference generator, in particular the order of its low-pass filter. The larger the required resolution, the longer the settling time, and vice versa. Typically, for a $\Delta\Sigma$ memory-based DC reference generator with a moderate resolution, two to three clock periods, T<sub>slow</sub>, are needed. This settling time is incurred every time the reference generator DC voltage is updated, 2<sup>b</sup> times in our case, resulting, for three clock periods per update, in a total DC generator settling time of

$$T_{DC \text{ gen settling-time}} = 3T_{slow} \cdot 2^{b}. \qquad (3.25)$$

The total signal capture time of the multi-pass approach is then obtained by substituting (3.25) into (3.24) to obtain

$$T_{\text{multi-pass, capture}} = [(n \cdot T_{\text{slow}}) \cdot 2^{b}] + [3T_{\text{slow}} \cdot 2^{b}]. \qquad (3.26)$$

The proposed system, on the other hand, while still needing to perform the clock phase updating for the undersampling of the signal to be captured, does not require any DC voltage level sweeping. If we denote by  $T_{proposed, capture}$  the time it takes our proposed system to perform the signal capture, then

$$T_{\text{proposed, capture}} = n \cdot T_{\text{slow}}. \qquad (3.27)$$

Comparing (3.26) and (3.27), we find that the proposed system provides more than $2^{b}$ times savings in the capture test time<sup>1</sup>. This saving in signal capture time comes as no surprise since the proposed system benefits from parallelism in its capture, performed in the time domain with the flash TDC circuit.
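The comparison between (3.26) and (3.27) can be illustrated with a short calculation. The function names and the values of `n`, `b`, and `t_slow` below are hypothetical and chosen only to show the scaling; they are not measurements from this work.

```python
def multi_pass_capture(n_phases, bits, t_slow, settle_periods=3):
    """Eq. (3.26): capture time of the multi-pass approach."""
    levels = 2 ** bits
    return n_phases * t_slow * levels + settle_periods * t_slow * levels


def proposed_capture(n_phases, t_slow):
    """Eq. (3.27): capture time of the proposed system."""
    return n_phases * t_slow


# Hypothetical example: 100 clock phases, 6-bit resolution, 1 us slow period.
n, b, t_slow = 100, 6, 1e-6
ratio = multi_pass_capture(n, b, t_slow) / proposed_capture(n, t_slow)
print(f"capture-time ratio: {ratio:.2f} (2^b = {2 ** b})")
```

The ratio equals $2^{b}(1 + 3/n)$, so it always exceeds $2^{b}$, and the excess due to the DC generator settling shrinks as the number of phases n grows.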

#### 3.6.2 - Calibration Time

The calibration time is a little more difficult to quantify and compare to the multi-pass approach method since little derivation exists on the latter. Nonetheless, it is believed that the proposed system will require more time-elaborate steps in its calibration procedure, resulting in a slight reduction in the overall test time savings over the multi-pass approach. However, it is important to note that the detrimental effect of the calibration time overhead can be decreased if the proposed system core is used to diagnose multiple nodes on the same die. The calibration can then be performed once, and calibration parameters stored in a look-up-table, and used throughout the multiple diagnosis or measurement tests. This will further reduce the relevance and significance of the calibration time to the overall test time.

<sup>1.</sup> Here it is assumed that the serial shift out of the proposed system is running much faster than $T_{slow}$, and more accurately, running at least 2<sup>b</sup> times faster. This is a valid assumption given that the standard digital cell components used in the serial shifting operation are expected to be able to run at a much faster rate than the analog front-end circuits of the system.

## 3.7 - Conclusions

A new approach for on-chip signal capture has been proposed. The system relies on voltage-controlled delays to perform clock interpolation for the sub-sampled capture scheme. The interpolated clocks are then used to control the sample-and-hold clocking scheme. The system also relies on identical voltage-controlled delays to convert the sampled-and-held value of the node to be diagnosed into time information (with respect to a reference edge). This edge difference is then amplified using a time amplifier and easily digitized using a low-resolution TDC. In a way, this "investigates" a time-domain parallelism, which has the advantage of great measurement time savings in contrast to a previously proposed multi-pass approach. Unlike parallelism based on a flash architecture, this system is fully calibratable using very simple components: DC or ramp signal generation (which could easily be integrated on-chip) and time measurement units. This calibration scheme is attractive because it can be performed with low-end digital testers. The calibration presented in this chapter relies on the same components as those used in the signal capture, thereby minimizing the silicon area penalty incurred. However, the overall test time efficiency is decreased by the additional calibration steps required. If multiple nodes are to be diagnosed in a system, calibration is performed once and the data stored in a look-up-table. This is an efficient way to reduce the overall calibration time, resulting in a system that is test-time efficient from both the capture and the calibration points of view.

In summary, the proposed system has the following advantages:

- Minimal analog design involved, decreased design complexity, and reduced static power dissipation.
- Decreased measurement time while relying on a time-domain parallelism architecture; a DVCD converts the DC information into the time domain, which is then captured in a flash manner using a TDC made of standard digital cells, and preceded by a time amplifier. Added calibration time, however, decreases the overall test time efficiency of the proposed system. If multiple nodes are diagnosed in the same system, the calibration data can be stored and re-used, increasing the overall test time efficiency.
- New clock interpolation scheme, which relies on an identical DVCD that was used to replace the comparator, further reducing the design complexity. The smallest time intervals can be easily quantified using the built-in time amplifier. The designer can also incorporate the non-uniform timing information, in the case of a non-linear DVCD, and correct for it.
- Fully calibratable system with an added bonus that the calibration is easy to perform using low-end testers and testing equipment.

The proposed system can find applications as a design-for-test or built-in-self-test approach in the non-destructive characterization of critical nodes in a system-on-chip. It has attractive properties such as: low parasitic capacitance loading, potential increase in test time savings, ease of design, and ease of calibration using low-end components.

## Chapter 4 - Real-Time Single-Shot Digital Measurements

Extending the work developed in the previous chapter, special cases of digital signals are considered. An embedded low-power technique for the single-shot measurement and extraction of timing characteristics of GHz digital signals is proposed. The method relies on irregular sampling, also known as level-crossing sampling [66], and sometimes referred to as an asynchronous ADC [67]. Two circuits will be demonstrated; the first being a rise time measurement core, while the second represents an embedded technique for the characterization of narrow pulses. Both systems rely on the time-based processing techniques developed before, which in the context of this chapter, can be seen as a general tool to increase the low-end time dynamic range measurements of digital events in CMOS. The added advantage here is the real-time processing of the information, as opposed to the undersampled nature of the previous case presented in Chapter 3. The nature of the information to be captured here, such as signal rise time and digital pulse width, makes the use of a voltage sample-and-hold unnecessary, simplifying the task of embedded capture even further, and allowing real-time operation of the proposed circuit. In particular, the circuits rely on a new fast voltage-crossing detector to convert the input information and condition it into same polarity edges, separated by the timing information to be measured. Those edges are then in turn stretched further using time amplification, making them easily detectable with low-resolution time-to-digital converters. Dynamic current

generation techniques are used in the front-end detector to greatly reduce the power consumption. The proposed circuits are compact and introduce only a few tens of femtofarads of capacitive loading. The circuits were implemented in a standard 0.18 $\mu$m CMOS process. Rise times of 1 ns and pulses as narrow as 78 ps are shown in Chapter 5 to be successfully captured in a single-shot measurement approach, with total power dissipation not exceeding a few milliwatts in each of the two cases. More details on the experimental results will be presented in Chapter 5 of this dissertation. The advantages of the irregular sampling scheme, and how it can be combined with time amplification and applied to the desired measurement, will be the focus of the remainder of this chapter.

## 4.1 - Introduction

Characterizing systems with very high slew rates and signals with fast rise times is becoming increasingly challenging. This task cannot be accomplished using external equipment because of its loading effects, and the increase in test cost prohibits the use of dedicated equipment and/or probes. Embedded techniques for determining such characteristics are therefore becoming more attractive and are possibly the only economically feasible option. Undersampling is a general technique that can measure rise/fall time, among other transient phenomena, as was demonstrated in [35] for signals with 1 ns rise time. Other circuits have been specifically developed to measure signal rise times [68]-[70]. Almost all of the previously proposed methods use undersampling. Undersampling is advantageous at low to medium speeds, and with high-speed narrowband signals, mainly because of its averaging effects, which improve the repeatability of the results. However, if the implementation uses a front-end RC sampling network, as is the case in [35], then at speeds in excess of 1 GHz such undersampling techniques can suffer from high distortion levels. This is mainly due to 1) the variation in the switch on-resistance, R, and 2) aperture errors in the sampling clock. Delay-line and sampling-clock jitter can also affect the accuracy of the measured results, which is a concern in the circuits implemented in [35], [68]-[70].

Narrow digital pulses are another form of digital signal that is highly desirable to measure at low cost. The problem becomes more complicated, however, when the pulses are as narrow as a few tens of picoseconds, as for example in pulsed range radar systems, where it is often required to measure very short pulses (~ 100 ps) at a slow repetition rate. In ultra-wideband systems, signals are often preferably sampled using very short pulses that trigger a diode sampling bridge. While more common in technologies such as GaAs, SiGe, and some other technologies [71], pulse compression techniques have recently emerged in pure CMOS technologies [72]. External equipment capable of characterizing such extremely fast pulses is often non-existent or, at best, very expensive, creating the need for a fast, cheap, and easily integrated solution.

## 4.2 - A Closer Look at Pulse Measurement Systems

#### 4.2.1 - Classes of Pulse Measurement Circuits

On-chip pulse capture<sup>1</sup> has been demonstrated before using either digital techniques [73][74], or analog-based interpolation schemes [28][75]. Analog techniques are attractive because they are less sensitive to digital logic switching and jitter noise, and can provide better single-shot accuracy. The analog-based measurement techniques mainly rely on a front-end time-to-voltage converter (TVC) to convert the pulse into a DC voltage, which is stored on a capacitor and then digitized using an ADC. In some cases, the ADC is a high-resolution digitizer [28], as shown in Figure 4.1, and in others, the TVC is followed by a dual-slope ADC [75], as illustrated in Figure 4.2. A brief discussion was given in Chapter 2; here, the trade-offs are analyzed in more detail.

<sup>1.</sup> In both of the proposed methods which are going to be discussed next, the assumption is that the inputs are two edges of the same polarity (either both rising or falling) which is the case when jitter or clock skew measurements are to be performed. The current work investigates the measurement of small *pulses* rather than *edges*. The comparison nonetheless was deemed appropriate; in the previously reported methods, the input edges are directly converted into a pulse using appropriate digital logic. From that step on, the remaining processing is performed on a pulse rather than edges.



Figure 4.1 Time-to-voltage converter schematics followed by a high-resolution ADC [28].



Figure 4.2 Time-to-voltage converter schematics followed by a dual-slope ADC [75].

In the latter case, a single-bit comparator with a pre-specified reference voltage, $V_{ref}$, is used to generate time edges. Those time edges are then digitized using a TDC based on a chain of digital delays. The TVC and ADC could become power hungry for small input pulse widths. The voltage step, $\Delta V_{cap}$, incurred across the capacitor, C, in the TVC is directly proportional to the pulse width, W, and the biasing current, I<sub>B</sub>, according to

$$\Delta V_{cap} = \frac{1}{C} \cdot I_B \cdot W. \qquad (4.1)$$

The output  $\Delta V_{cap}$  is then digitized with an ADC. A larger  $\Delta V_{cap}$  is usually desired since it relaxes the resolution requirements of the ADC, whereby the voltage incurred across the capacitor has to exceed the ADC least-significant-bit (denoted as LSB<sub>ADC</sub>), according to

$$\Delta V_{cap} \ge LSB_{ADC}, \qquad (4.2)$$

where

$$LSB_{ADC} = \frac{FS}{2^{b} - 1}, \qquad (4.3)$$

with FS representing the full voltage scale and b the resolution of the ADC. From (4.2) and (4.3), it can be seen that increasing $\Delta V_{cap}$ relaxes the required bit resolution of the ADC, but at the expense of an increase in the current, and therefore in power dissipation, as can be deduced from (4.1). Equation (4.1) also shows that the required current, and therefore the power dissipation, grows as the pulse width to be measured, W, becomes smaller, for a given ADC resolution.
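To make the trade-off expressed by (4.1)-(4.3) concrete, the sketch below computes the smallest biasing current for which the capacitor voltage step just reaches one ADC LSB, that is, I<sub>B</sub> = C · LSB<sub>ADC</sub> / W. The full scale, capacitance, and resolution used here are assumed values chosen for illustration; they do not describe any of the circuits in [28] or [75].

```python
def adc_lsb(full_scale, bits):
    """Equation (4.3): ADC least-significant bit, in volts."""
    return full_scale / (2**bits - 1)


def min_bias_current(cap, pulse_width, full_scale, bits):
    """Smallest I_B for which the step in (4.1) satisfies (4.2), in amperes."""
    return cap * adc_lsb(full_scale, bits) / pulse_width


# Illustrative values (assumptions, not taken from the cited designs).
FS = 1.8      # full voltage scale, in volts
C = 1e-12     # TVC capacitor, in farads
b = 8         # ADC resolution, in bits

for w in (1e-9, 100e-12, 10e-12):
    i_b = min_bias_current(C, w, FS, b)
    print(f"W = {w * 1e12:6.0f} ps -> minimum I_B = {i_b * 1e6:8.2f} uA")
```

The required current scales inversely with W, so a tenfold narrower pulse calls for a tenfold larger biasing current at a fixed resolution. This is only a lower bound for resolving a single LSB; digitizing W with any useful resolution requires a correspondingly larger voltage step.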

In the dual-slope approach, the ADC requirements are much less demanding since only a single-bit comparator is needed, and the digitization is then performed partially in the time domain. Processing the time pulse information in this way offers particular advantages since it allows for pulse stretching [29][76], making the new slope more easily detectable with a time digitizer, and making the overall system insensitive to the absolute value of C. The power dissipation concerns in designing the front-end TVC still exist in this case. Linearity-gain-power trade-offs in the choice of $V_{ref}$ are an added design constraint. Other difficulties in the pulse stretching technique stem from the time-walk effect, as discussed in the next section.

#### 4.2.2 - Time-Walk Effects in Pulse Measurement Systems

The delay introduced by the comparator to detect a threshold-crossing time is related to many parameters such as the comparator's input overdrive, underdrive, slew rate, etc. The time walk error refers in particular to the dependency of the time crossing detection on the input overdrive/underdrive. Output time distortion results [77]-[79], particularly when the overdrive is small. This was experimentally verified in [77] for a few commercially available discrete comparators. To understand this, and starting with the effect of slew rate, a simple model estimates the delay,  $t_d$ , to consist of a fixed component,  $t_{d,fixed}$ , and a varying component according to [80]:

$$t_{d} = t_{d, \text{ fixed}} + \frac{B}{\sqrt{\frac{dV_{in}}{dt}}}, \qquad (4.4)$$

where B is a constant that depends on the comparator structure. In the case of pulse stretching, the rising and falling ramps which constitute the input to the comparator are purposely designed to have different slopes (as a direct consequence of the pulse stretching). This directly influences the accuracy of the threshold-crossing time, as can be seen from (4.4). So here too, a trade-off between a large stretching ratio (and therefore a large gain) and a small time-walk error has to be made in order to use the pulse stretching technique efficiently. Note, however, that the rising and falling slopes, albeit different, are constant. Their effect could therefore be calibrated for. However, the comparator delay model highlighted in (4.4) is valid for large overdrives. As the pulse width decreases, so does the overdrive, and the model outlined in (4.4) ceases to capture the time-walk effect accurately, which results in a first source of distortion in the measured results. Another source of distortion caused by the overdrive voltage of the comparator is the dependency of the voltage step at the output of the differential pair, $\Delta V$ (or comparator input height and therefore overdrive), on the pulse width to be measured. This argument is somewhat analogous to the trade-offs and limitations of a clocked voltage sample-and-hold. A limited bandwidth due to the front-end switch on-resistance, R<sub>on</sub>, and hold capacitor, C, causes a gain error (that can be calibrated and corrected for). The variation in the switch on-resistance as a function of the input is a more severe problem that is responsible for output distortion. Techniques such as the constant fraction timing discriminator (CFD) exist to correct for comparator time-walk effects [81]. These techniques, in which the threshold varies as a function of the input height, are mainly used in the instrumentation for particle physics experiments, but come at the expense of additional power dissipation and circuit complexity. Low power dissipation techniques also exist [82], at the expense of circuit complexity and silicon space.
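The slope dependence in (4.4) can be illustrated with a small numerical sketch. It evaluates the comparator delay for the fast and the stretched ramps and reports the difference, which is the systematic offset that a large stretching ratio introduces. The constants t<sub>d,fixed</sub> and B, the slew rate, and the stretching ratio are hypothetical values picked for illustration; the model itself is only valid for large overdrives, as noted above.

```python
import math


def comparator_delay(slew_rate, t_fixed, b_const):
    """Equation (4.4): fixed delay plus a term inversely proportional to
    the square root of the input slew rate (in V/s)."""
    return t_fixed + b_const / math.sqrt(slew_rate)


# Hypothetical model constants (assumptions for illustration only).
T_FIXED = 100e-12       # fixed delay component, in seconds
B_CONST = 1e-6          # structure-dependent constant B
rising_slew = 1e9       # slope of the fast ramp, in V/s
stretch_ratio = 10      # the stretched ramp is this many times slower

falling_slew = rising_slew / stretch_ratio
walk = (comparator_delay(falling_slew, T_FIXED, B_CONST)
        - comparator_delay(rising_slew, T_FIXED, B_CONST))
print(f"delay difference between the two ramps = {walk * 1e12:.1f} ps")
```

Under this model the difference grows as $\sqrt{\text{stretch ratio}} - 1$, and it stays constant for fixed slopes, which is why it can be calibrated out as long as the overdrive remains large.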

In order to circumvent the above problems in both rise time and pulse width measurement receivers, a new system is proposed. It is aimed at applications where the rise times and slew rates to be measured are in excess of a 1 V change in 100 ps, or where the pulses to be measured are as narrow as 20-30 ps or less, for which a more efficient technique is essential. In the present work, we approach the problem differently by relying on a new broadband signal capture scheme based on irregular sampling, also known as level-crossing sampling [66], combined with a key block, the time amplifier [3]. The advantages of such an irregular sampling scheme combined with time amplification will be detailed in Section 4.3, where the system-level architectures are presented. Circuit details are then shown in Section 4.4, together with the transistor-level implementation of two high-speed low-power front-end detectors for the edge and pulse measurement systems. Simulation results and system verification are presented in Section 4.5, while integrated circuit (IC) implementation and experimental results are given in Chapter 5. Additional comments and observations about the proposed systems are highlighted in Section 4.6, followed by concluding remarks in the last section of the chapter.

Here, it is worth mentioning that measuring rise times and digital pulses might be used in different applications. Nonetheless, it was deemed appropriate to discuss both systems in the same chapter since, with the exception of a slight modification to the front-end circuits, the proposed systems share common blocks and rely on a similar irregular-sampling, time-based processing technique.


## 4.3 - Proposed Systems Description

The irregular sampling scheme [66] adopted here is shown in Figure 4.3. The system is based on fixed voltage levels that trigger the time conversion or count, rather than on the more standard sampling, which samples at equal time steps specified by the sampling clock frequency. This asynchronous sampling scheme was proposed in [67] for ADC applications. However, the proposed system differs from the level-crossing ADC proposed in [67] in order to accommodate the application sought: capturing very fast events at minimal power levels. With the level-crossing sampling technique, the proposed system for high-speed digital signal capture is shown in Figure 4.4 for edge measurement, and in Figure 4.5 for pulse width characterization.



Figure 4.3 Synchronous versus Asynchronous signal sampling [66][67].



Figure 4.4 System-level description of the proposed asynchronous approach, showing the waveforms at the output of each block.



Figure 4.5 System-level description of the proposed pulse measurement system, showing the waveforms at the output of each block.

For edge measurement, the system shown in Figure 4.4 consists of a fast voltage-crossing level detector with references generated on-chip using a resistor string, a digital-to-analog converter, or a $\Delta\Sigma$ memory-based bitstream. The voltage levels could also be generated off-chip for more flexibility and less silicon area. The {low, high} voltage reference levels can be varied depending on whether, for example, digital rise time or analog slew rates are to be measured. We will refer to these levels as $V_{low}$ and $V_{high}$, respectively. For rise time measurements, these voltage levels are commonly set to {10%, 90%}, {20%, 80%}, or {30%, 70%} of the input step size, denoted as $V_{step}$. The main purpose of the front-end high-speed voltage-crossing detector is to transform the information to be captured into time edges. This is an important step, allowing the designer to rely mainly on digital blocks together with the recent developments in digital time amplification [3].

In the case of the pulse detection and measurement system shown in Figure 4.5, a slight modification exists but the general idea of time event detection, followed by time processing, is identical to that of the rise time measurement system. As shown in Figure 4.5, the system first converts the differential input pulse into falling edges separated by a phase difference equal to the pulse width to be measured. This in fact, like the edge measurement system, is the most challenging part since this front-end circuit will need to respond to pulse widths that approach the limit that the technology can offer. The edges are then inverted and progressively buffered to drive the time amplification stage.

The second block in both systems is therefore a time amplifier, whose main purpose is to stretch the input time information into larger time intervals. The use of this time amplifier is essential when very fast events are to be captured, since it relaxes the requirements on the following TDC stage. Finally, a low-resolution TDC digitizes the stretched interval, and a synchronous parallel-to-serial converter converts the thermometer-coded digital output, denoted as $D_o$, to a single-bit serial output, $D_{o,serial}$, thereby minimizing the number of external pins needed.
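The back-end signal chain described above can be summarized with a purely behavioral sketch: an input interval is stretched by the time amplifier, quantized into a thermometer code by a flash-type TDC, and shifted out serially. The time amplifier gain, TDC step, number of stages, and input interval are hypothetical values, and the amplifier is assumed to be perfectly linear; none of this reflects the transistor-level implementation.

```python
def flash_tdc(delta_t, lsb, n_stages):
    """Thermometer code: stage k outputs 1 when delta_t reaches (k + 1) * lsb."""
    return [1 if delta_t >= (k + 1) * lsb else 0 for k in range(n_stages)]


def parallel_to_serial(code):
    """Shift the thermometer code out one bit per clock cycle."""
    for bit in code:
        yield bit


# Hypothetical parameters (assumptions for illustration only).
TA_GAIN = 10          # time amplifier gain, in s/s, assumed linear
TDC_LSB = 100e-12     # TDC delay step, in seconds
N_STAGES = 8          # number of TDC stages
delta_t_in = 35e-12   # interval produced by the front-end detector

d_o = flash_tdc(TA_GAIN * delta_t_in, TDC_LSB, N_STAGES)
d_o_serial = list(parallel_to_serial(d_o))

# Input-referred estimate recovered from the serial stream.
estimate = sum(d_o_serial) * TDC_LSB / TA_GAIN
print(d_o, f"estimate = {estimate * 1e12:.0f} ps")
```

With these numbers the input-referred resolution is TDC_LSB / TA_GAIN, which illustrates how the time amplifier relaxes the resolution required of the TDC itself.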

## 4.4 - Circuit Details

#### 4.4.1 - Front-End Voltage-Crossing Detector

The front-end voltage-crossing detector, needed for signals with high slew rates, is shown in Figure 4.6.



Figure 4.6 Two fast voltage-crossing level detectors.

The main purpose of this block is to transform the input edge,  $V_{in}$ , into two digital edges, denoted as  $T_{low}$  and  $T_{high}$ , with time separation to be measured. For an incident input edge,  $V_{in}$ , of step size equal to  $V_{step}$ , we define  $T_{high}$  as the time instant where  $V_{in}$  crosses the high reference voltage, normally 80% of the voltage step size of  $V_{in}$ , according to

$$T_{high} = T_{80\% \cdot V_{step}}. \qquad (4.5)$$

 $T_{low}$  is the time instant where  $V_{in}$  crosses the low reference voltage, normally 20% of the voltage step size of  $V_{in}$ ,

$$T_{low} = T_{20\% \cdot V_{step}}. \qquad (4.6)$$

The outputs of the front-end detector are then two voltage waveforms,  $V_{out,low}$ , and  $V_{out,high}$ , occurring at times  $T_{out,low}$ , and  $T_{out,high}$ , with time separation given by

$$\Delta T_{\text{rise}} = G_{\text{edge detector}} \cdot (T_{\text{high}} - T_{\text{low}}), \qquad (4.7)$$

where $\Delta T_{rise}$ is defined as $\Delta T_{rise} = T_{out, high} - T_{out, low}$, and $G_{edge \ detector}$ is the gain of the crossing detector block, ideally equal to 1 s/s. The phases $T_{high}$ and $T_{low}$ in (4.5) and (4.6), respectively, are measured with respect to a reference edge or time instant, normally the trigger signal of the input.
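As a behavioral illustration of (4.5)-(4.7), the sketch below locates the instants at which an idealized input edge crosses the 20% and 80% levels and forms their difference. The linear ramp, its duration, and the simulation time step are assumptions made for the example, and the detector is modeled as ideal with a gain of 1 s/s; the helper name `crossing_time` is introduced here for illustration only.

```python
def crossing_time(times, volts, level):
    """Return the first upward crossing of `level`, using linear
    interpolation between neighboring waveform points."""
    for i in range(1, len(volts)):
        if volts[i - 1] < level <= volts[i]:
            frac = (level - volts[i - 1]) / (volts[i] - volts[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("level is never crossed")


# Idealized input edge (assumed): linear ramp from 0 to V_STEP in T_EDGE.
V_STEP = 1.0        # input step size, in volts
T_EDGE = 100e-12    # 0-to-100% ramp duration, in seconds
times = [k * 1e-12 for k in range(201)]
volts = [min(V_STEP, V_STEP * t / T_EDGE) for t in times]

t_low = crossing_time(times, volts, 0.2 * V_STEP)    # (4.6)
t_high = crossing_time(times, volts, 0.8 * V_STEP)   # (4.5)

G_EDGE_DETECTOR = 1.0                                # ideal gain, in s/s
delta_t_rise = G_EDGE_DETECTOR * (t_high - t_low)    # (4.7)
print(f"20%-80% rise time = {delta_t_rise * 1e12:.1f} ps")
```

Changing the two level arguments to 10% and 90%, or 30% and 70%, gives the other commonly used rise time definitions mentioned earlier.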

From a circuit details point-of-view, the detector block works as described next. Referring back to Figure 4.6, the assumption made here is that the reference voltages are all set above the threshold voltage of transistors M1 and M4. Initially, when the input voltage is low, no current flows through either the M1-M3 or the M4-M6 branch, and both outputs, V<sub>out,low</sub> and V<sub>out,high</sub>, are reset to the supply level. As the input increases and exceeds the threshold voltage of M1 first and then the reference voltage, V<sub>low</sub>, transistor M3 starts conducting, causing the output voltage V<sub>out,low</sub> to start discharging. The rate at which this output discharges is directly related to the biasing current, I<sub>B</sub>. With an input slew rate on the order of 10 GV/s, or digital rise times on the order of 100 ps, a current as high as a few mA is needed to discharge the output node to a low enough level (at least below 0.8 V) before the input has completed its rising transition. Supplying such a current statically would result in excessive static power dissipation. Instead, a capacitor C is added in parallel with the biasing source, resulting in a dynamic current spike, when V<sub>in</sub> exceeds V<sub>low</sub>, given by a fraction of the product $C \cdot (dV_{in}/dt)$. This fraction could well be within the mA range for a capacitor in the pF range and an input slew rate on the order of 10 GV/s. Simulation results graphically illustrating the advantages of adding the capacitor in parallel with the current source are shown in Figure 4.7(a). The corresponding effect of adding this capacitor on the discharge rate of the output node is shown in Figure 4.7(b). It is worth noting here that increasing C beyond 0.5 pF results in an increased area overhead with little improvement in the output discharge capabilities, as can be seen in Figure 4.7(b). Hence, an optimized choice for the capacitance C can be made. The dynamic current technique results in an extremely low static power dissipation to discharge the output node, since the biasing current $I_B$ can now be on the order of a few $\mu A$ only, and is used for the sole purpose of setting the DC initial conditions of the circuit. In fact, transistor M2 and the variable reference voltage at its gate act as a DC level shifter to the M1-M3 branch, and more specifically, to the source of transistor M1. The same reasoning applies to the M4-M6 branch and transistor M5. The front-end block is compact, with transistor dimensions (widths) varying between 1 $\mu$m and 15 $\mu$m.
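The order of magnitude of the dynamic current can be verified with a first-order estimate based on $C \cdot (dV_{in}/dt)$. In the sketch below, the slew rate is the 10 GV/s figure quoted in the text, while the fraction of the current that reaches the output node is a hypothetical value, since it depends on transistor sizing that is not modeled here.

```python
def dynamic_current(cap, slew_rate, fraction=1.0):
    """First-order estimate of the current spike: fraction * C * dVin/dt."""
    return fraction * cap * slew_rate


SLEW = 10e9       # input slew rate, in V/s (10 GV/s, as quoted in the text)
FRACTION = 0.3    # hypothetical share of the spike reaching the output node

for c_pf in (0.1, 0.25, 0.5, 1.0):
    i_spike = dynamic_current(c_pf * 1e-12, SLEW, FRACTION)
    print(f"C = {c_pf:4.2f} pF -> available current = {i_spike * 1e3:5.2f} mA")
```

This linear estimate does not capture the diminishing improvement in the output discharge rate beyond 0.5 pF seen in Figure 4.7(b), which comes from circuit effects outside such a simple model.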



Figure 4.7 Simulation results showing a) the effect of adding the capacitor C, on the current "spike" available to discharge the output node, and b) its effect on the output discharge rate.


#### 4.4.2 - Front-End Voltage-Crossing Detector Buffers

The time difference between the two falling edges (output waveforms) of the front-end voltage-crossing detector is the only information that is needed for subsequent processing, and in particular, for time amplification purposes. The output waveforms, however, are not full-scale outputs, because the different reference voltages driving transistors M2 and M5 set different final DC conditions on the outputs. This results in a constant time offset at the output of the measurement system, which could therefore be calibrated out. Otherwise, different digital buffers with different drive capabilities could be placed between the voltage-level crossing detector and the time amplifier. A straightforward way to design those buffers consists of adding a properly sized PMOS transistor to improve the discharging capabilities of the inverter when the final DC level for high reference voltages (V<sub>high</sub>) is higher than that for lower reference voltages (V<sub>low</sub>). Alternatively, the switching threshold of the inverter can be adjusted by appropriately sizing the N- and P-transistor ratio. As a further option, regular inverters can be used, with the measured system output then requiring an additional calibration step.

Another thing to note here is that progressively sized inverters are then added in order to drive the input stage of the time amplifiers. Also, the outputs of the buffers are designed to be inverted with respect to the falling outputs of the voltage detectors in order to make use of an NMOS input stage for the time amplifier. The NMOS input stage gives better gain characteristics (for a given current) when compared to its PMOS counterpart.

#### 4.4.3 - Front-End Pulse-Detector Circuit

The front-end detector is probably the most challenging part of the design as it requires very fast response time, given the short intervals to be measured, while maintaining a reasonably low power budget. The proposed circuit that can perform the above task is shown in Figure 4.8.



Figure 4.8 Simplified schematics of the pulse-to-edge converter.

The main purpose of this block is to transform the differential input pulses,  $V_{in}$  and  $\overline{V_{in}}$ , into two digital edges with time occurrences denoted as  $T_1$  and  $T_2$ , such that the time separation between  $T_1$  and  $T_2$  is equal to the pulse width, W, of  $V_{in}$ . In other words, the output of the front-end detector can be expressed as

$$\Delta T_{W} = G_{\text{pulse detector}} \cdot (T_{2} - T_{1}), \qquad (4.8)$$

where  $\Delta T_W$  is the time separation between  $V_{out,1}$  and  $V_{out,2}$ , and  $G_{pulse \ detector}$  is the gain of the crossing detector block, ideally equal to 1 s/s. For the remainder of the description, the pulse-to-edge converter will be divided into two halves or cells, with cell 1 referring to the branch containing transistors M1 and M3, and detecting the occurrence of  $T_1$ , while cell 2 consists of the M2-M4 branch detecting  $T_2$ .

From a circuit description point-of-view, and like the edge measurement case, dynamic current generation techniques are used, as will be detailed next. However, an additional reset operation is needed here to accommodate and process an input pulse rather than an edge. The assumption made here is that the width to be measured is with respect to the threshold voltage of transistors M1 and M2. Initially, when the input is low, no current flows through the M1-M3 branch, the output V<sub>out,1</sub> is reset to the supply, and the capacitor, C, is set to ground by its parallel reset switch shown in Figure 4.8. As the input increases and exceeds the threshold voltage of M1, and the PMOS reset switch opens, transistor M3 starts conducting, causing the output voltage to start discharging. With an input slew rate on the order of 1 V/100 ps, a dynamic current spike as high as 1 mA can result if a 1 pF capacitor C is added in parallel with the biasing source, and is given by $C \cdot (dV_{in}/dt)$. A fraction of this current will be available to quickly discharge node V<sub>out,1</sub> through the parasitic capacitances of M3 (mainly its gate-source capacitance). This fraction can be obtained from a capacitive divider and depends on the sizing of M1, M3 and the PMOS reset switch. The addition of the reset transistors is necessary because this block is expected to respond to rising edges only. The reset switches allow the circuit to ignore the falling end of the input pulse and detect $T_1$ only (indicated on Figure 4.8). With a complementary input pulse, $\overline{V_{in}}$, the M2-M4 branch detects the time transition, T<sub>2</sub>, following a similar reasoning to the one described above. With outputs that have very fast fall times and carry the pulse width in time-difference form, the signals to be measured are in an ideal format to exploit the capabilities of the time amplifier to its fullest [3].

#### 4.4.4 - Front-End Pulse-Detector Operation

To understand the behavior of the front-end pulse detector circuit shown in Figure 4.8 in response to an input differential pulse, let us consider the five different regimes of operation, as depicted graphically in Figure 4.9.



Figure 4.9 Illustration of the five different phases or regimes of the differential input pulse occurrence.

The five regimes are referred to as phases A-E, and defined depending on the input state as follows:

Initially, in the first phase referred to as phase A, when the input  $V_{in}$  is low, no current flows through the M1-M3 branch, the output  $V_{out,1}$  is initially reset to the supply, while the capacitor C is reset to ground due to its parallel reset switch. On the other hand, cell 2 has its output  $V_{out,2}$  pre-charged to the supply level. The reset switch across the capacitor is open, and the voltage across the capacitor is approximately given by the supply level minus the threshold voltage of transistor M2,  $V_{th,n}$ .

In phase B, and observing cell 1, as the input increases and exceeds the threshold voltage of M1, and the PMOS reset switch opens, transistor M3 starts conducting, causing $V_{out,1}$ to start discharging. A dynamic current given by a fraction of $C \cdot (dV_{in}/dt)$ will be available to quickly discharge node $V_{out,1}$ through the parasitic capacitances of M3 (mainly its gate-source capacitance). At this point, $V_{out,1}$ is discharged to almost 0 V, transistor M1 is off, and will remain so even in phases C and D. Cell 2, on the other hand, operates differently in phase B. The capacitor is forced to discharge quickly through its relatively large reset switch, until the voltage across the capacitor is close to 0 V. With this condition, cell 2 is brought to the same initial condition as cell 1 before phase B began. This makes cell 2 ready to detect the rising edge of $\overline{V_{in}}$, T<sub>2</sub>, in an identical manner to cell 1 at the occurrence of edge T<sub>1</sub>.

In phase C, little or no changes are incurred.

In phase D, cell 1 does not perform any specific function, while cell 2 behaves in an identical manner to cell 1 in phase B. So  $V_{out,2}$  discharges very quickly due to the high input slew rate of  $\overline{V_{in}}$ .

Finally, in phase E, the two halves of the front-end circuit (cells 1 and 2) are not fully symmetrical. At the end of the pulse occurrence, the two output nodes are at almost 0 V, but they do not remain at this level. In fact, both cells charge up at different (slow) rates and have different final conditions. The slow rate for cell 1 is dictated by the saturation current of transistor M3 and the total value of the parasitic capacitances at node  $V_{out,1}$ . The output voltage stops charging up when M3 turns off (around  $V_{dd} - V_{th,p}$ ). For node  $V_{out,2}$ , this rate is dictated by the rate of charge of transistor M2 and capacitor C. Here, however, the output voltage charges up linearly until M4 turns off, but then the reset switch pulls  $V_{out,2}$  further to the supply level. This step is not important for the operation of the front-end block, but is of utmost importance for the subsequent blocks, and in particular the time amplifier, where the charge-up rate has to be slow enough to allow the time amplifier to amplify its input phases (or equivalently, the output of the front-end) before they change logical state. Trade-offs exist in the choice of C: the rate of charge-up of  $V_{out,2}$  should be minimized to allow enough time for the subsequent blocks to respond, but so should the dead time of the circuit, an important parameter in pulse measurement circuits. It is this dead time that determines how frequently the pulses can occur while the proposed circuit still resolves the correct measurement quantity.

#### 4.4.5 - Front-End Voltage-Crossing Detector Buffers

As in the edge measurement system, progressively sized inverters are added in order to drive the loading introduced by the input stage of the time amplifiers. Also, the outputs of the buffers (rising outputs) are designed to be inverted with respect to the falling outputs of the voltage detectors in order to make use of an NMOS input stage for the time amplifier. As mentioned previously, the NMOS input stage gives better gain characteristics (for a given current) when compared to its PMOS counterpart.

From that point on, and as shown earlier in Figure 4.4 and Figure 4.5, both systems have outputs that are conditioned similarly; two edges of the same polarity carrying the time information to be measured. They can therefore share the same common blocks for backend time-domain processing. The remaining discussion therefore applies to both systems, regardless of whether they will be used for edge measurement or pulse detection.

#### 4.4.6 - Time Amplifier/TDC/Serial Shift-Out

As with the on-chip oscilloscope circuitry discussed in detail in Chapter 3, once two fast edges carry the time information, the remaining blocks are identical to those discussed in the previous chapter. They consist of the time amplifier, TDC, and serial-shift-out block sequence. A detailed description of those circuit blocks can be found in Section 3.3.3 and Section 3.3.4. Here we will just highlight the final result that relates the system's digital output word to the time amplifier inputs. The 64 thermometer-coded output can be obtained from the output of the time amplifier according to

$$D_{o} = floor\left(\frac{\Delta T_{out, amplifier}}{LSB_{TDC}}\right), \qquad (4.9)$$

where floor  $\left(\frac{T}{LSB_{TDC}}\right)$  represents the quantization of T by the TDC. Combining (3.4) and (4.9), we get

$$D_{o} = floor\left(\frac{\Delta T_{in, amplifier} \cdot G_{amplifier}}{LSB_{TDC}}\right). \qquad (4.10)$$

The TDC output is then serially shifted out using an integrated parallel-to-serial converter, to a single digital bit easily transferable off-chip. One can easily deduce the single-bit serial output of this system according to

$$D_{o, \text{ serial}} = N - D_{o}, \qquad (4.11)$$

which can then be re-written using (4.10) as

$$D_{o, \text{ serial}} = N - floor\left(\frac{\Delta T_{in, amplifier} \cdot G_{amplifier}}{LSB_{TDC}}\right). \qquad (4.12)$$

Equation (4.12) relates the serial output bit of the system to the time amplifier inputs, which in turn represent the timing information of the "conditioned" input signal. This equation shows the simplicity of the system behavior and the ease of deducing its input timing characteristics from a measured output.
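As a minimal sketch of how (4.10) to (4.12) can be evaluated and inverted, the following Python functions map an amplifier input interval to the output codes and back to a range of intervals. The function names are hypothetical, the defaults of 10 s/s and 70 ps are the nominal values quoted elsewhere in this thesis, and taking N = 64 to match the thermometer-coded output is an assumption; saturation of the TDC is not modelled.

```python
import math

def tdc_code(dt_in_ps, gain=10.0, lsb_ps=70.0):
    """Equation (4.10): TDC code for a time amplifier input interval (ps)."""
    return math.floor(dt_in_ps * gain / lsb_ps)

def serial_code(dt_in_ps, n_stages=64, gain=10.0, lsb_ps=70.0):
    """Equation (4.12): serial output, N minus the TDC code.
    n_stages = 64 is an assumption made for this illustration."""
    return n_stages - tdc_code(dt_in_ps, gain, lsb_ps)

def interval_bounds_ps(d_serial, n_stages=64, gain=10.0, lsb_ps=70.0):
    """Invert (4.12): the half-open range [low, high) of input intervals
    (ps) that produce the given serial output."""
    d_o = n_stages - d_serial
    step_ps = lsb_ps / gain  # input-referred quantization step
    return d_o * step_ps, (d_o + 1) * step_ps

# A 100 ps input interval gives floor(100 * 10 / 70) = 14, so the serial
# output is 64 - 14 = 50, which maps back to the range 98 ps to 105 ps.
print(tdc_code(100.0), serial_code(100.0), interval_bounds_ps(50))
```

The inversion returns a range rather than a single value because the floor operation discards everything below one input-referred step.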

## 4.5 - Systems Verification

Experimental results will be shown in the next chapter. However, a simulation section was deemed necessary to show the performance of some of the individual blocks in the system. These blocks, and in particular the front-end stages, are difficult to test experimentally in isolation, due to the additional loading effects introduced when exporting the information off-chip. A properly designed buffer can be introduced, but was avoided here.

#### 4.5.1 - Edge Measurement System

The edge measurement voltage-crossing detector, followed by the buffers, time amplifier and phase inversion detectors, was simulated using SpectreS in Cadence, in a 0.18  $\mu$ m CMOS process. An ideal step input, with 45% rise times varying between 15.5 ps and 280 ps, was applied to the system. The voltage reference levels were set to 0.5 V and 1.3 V, respectively. These voltages constitute the measurement range of the proposed system, corresponding therefore to a {27.7%, 73.3%} rise time measurement system. The linearity of the front-end voltage-crossing or edge-to-time converter (before amplification) is shown in Figure 4.10(a). The corresponding absolute error of the system is shown in Figure 4.10(b). While the absolute error is large, it is also linear, implying a good differential linearity of the proposed system. Consequently, the percentage differential linearity error of the front-end block is shown in Figure 4.10(c). The maximum differential linearity error incurred is ~ 7% and occurs for an input (45%) rise time of 92 ps. Below this limit, the linearity errors of the system can be as high as 31% (not shown), and they improve gradually to better than 1% as the input rise times to be measured increase.



Figure 4.10 (a) Transfer characteristics of the front-end voltage-crossing detector, (b) its absolute error, and corresponding (c) differential linearity error.

Ideally, if this system is to be used in an absolute rise time measurement environment, the slope of the input-output line,  $G_{edge\ detector}$ , should be unity or close to unity. In this case, the gain is approximately 0.75 s/s, and a calibration step is essential if absolute measurements are to be performed in order to quantify the gain error and de-embed it in subsequent measurements. The amplified time difference at the output of the time amplifier, as a function of the 45% input rise time, is shown in Figure 4.11. The input-output slope of the time amplifier is linear and is approximately equal to 7.5 s/s. This is in fact the product of the attenuation of the front-end block (0.75 s/s) followed by the time amplifier nominal gain of 10 s/s.
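The gain cascade and the de-embedding step described here can be sketched in a few lines of Python. This is an illustration only: the constant and function names are hypothetical, and the model assumes a purely linear response with no offset, which the simulated absolute error suggests is a simplification.

```python
FRONT_END_GAIN = 0.75   # s/s, simulated front-end edge-to-time gain
AMPLIFIER_GAIN = 10.0   # s/s, nominal time amplifier gain

def overall_gain(front_end=FRONT_END_GAIN, amplifier=AMPLIFIER_GAIN):
    """Slope of the amplified output versus the input rise time."""
    return front_end * amplifier

def de_embed_rise_time(dt_amplified_ps,
                       front_end=FRONT_END_GAIN,
                       amplifier=AMPLIFIER_GAIN):
    """Recover the input rise time (ps) from the amplified time difference,
    assuming the calibrated gains apply and there is no offset."""
    return dt_amplified_ps / overall_gain(front_end, amplifier)

# An amplified interval of 750 ps corresponds to a 100 ps input rise time
# under these assumptions.
print(overall_gain(), de_embed_rise_time(750.0))
```

In a real measurement the two gains would come from the calibration step rather than from the nominal values used as defaults here.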



Figure 4.11 System input-output relationship, including the time amplifier.

#### 4.5.2 - Pulse Measurement System

Next, the linearity of the front-end pulse-to-edge converter was simulated. SpectreS in Cadence was used to simulate the front-end block, which was implemented in a 0.18  $\mu$ m standard CMOS process. An input pulse with full-width-half-magnitude (FWHM) varying between 35 ps and 3 ns was applied to the system. The rise and fall times were set to 10 ps, which, albeit very small for the technology under consideration, was deemed necessary to accommodate the low-end 35 ps pulse. The input-output relationship of the front-end block, showing the absolute percentage error, is depicted in Figure 4.12. An absolute error of no more than 1.4% can be achieved over a wide input time range of 35 ps to 3 ns. Only time ranges up to 560 ps are shown since the error gets smaller as the input pulse width increases. It is important to note here that the pulse rise and fall times directly affect the absolute error of the front-end system. This error, however, is not a source of concern since it translates into a simple offset to the pulse width to be measured. Here it is assumed that the pulse to be measured is always buffered by the same digital buffer, which corresponds to constant rise/fall times and, therefore, a constant offset or absolute error in the measurement.



Figure 4.12 Simulation results for the front-end pulse detector absolute error as a function of the input FWHM.

## 4.6 - Observation and Comments

#### 4.6.1 - Noise Immunity

Of particular interest is the noise immunity that the circuit is believed to offer over other alternatives. The proposed system relies on edges that carry the information to be measured while having very fast rise times and/or slew rates. This, unlike the analog pulse stretching technique, gives it an added noise immunity advantage. Generally speaking, signals with higher slopes result in less time error than those with slower slopes, when exposed to random voltage noise, such as supply noise. To see this, consider the model [23] shown in Figure 4.13, where time detection for a voltage crossing point is desired. Two cases are considered; the first corresponding to a slowly varying input with a slope of  $s_s$ , and the second corresponding to a fast varying input with a slope of  $s_f$ . If the detection of  $V_{ideal}$ , occurring at time  $T_{ideal}$  is desired, and this ideal voltage is altered by an instantaneous voltage noise, then using simple derivation, and the assumption that the input waveforms are straight lines, the voltage errors in each of the two systems,  $\Delta V_{noise}$ , can be found according to

$$\Delta V_{\text{noise}} = s_{\text{s}} \cdot \delta T_{\text{s}} \tag{4.13}$$

for the slow system, and

$$\Delta V_{\text{noise}} = s_{\text{f}} \cdot \delta T_{\text{f}} \tag{4.14}$$

for the fast system, implying that the time error introduced by the fast system,  $\delta T_f$ , is a factor of  $s_f/s_s$  smaller than  $\delta T_s$  of the slow system,

$$\delta T_{f} = \frac{s_{s}}{s_{f}} \cdot \delta T_{s}. \tag{4.15}$$
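A short numerical sketch makes the scaling in (4.13) to (4.15) concrete. The noise amplitude and the two slopes below are hypothetical values chosen only for illustration, and the function name is not taken from the thesis; the straight-line edge assumption is the one stated above.

```python
def time_error_ps(noise_mv, slope_mv_per_ps):
    """Equations (4.13) and (4.14) solved for the time error:
    delta_T = delta_V_noise / slope, for a straight-line edge."""
    return noise_mv / slope_mv_per_ps

noise_mv = 10.0      # hypothetical instantaneous voltage noise
slow_slope = 1.0     # hypothetical slow edge, mV/ps
fast_slope = 10.0    # hypothetical fast edge, mV/ps

dt_slow = time_error_ps(noise_mv, slow_slope)
dt_fast = time_error_ps(noise_mv, fast_slope)

# Equation (4.15): the fast-edge error equals (s_s / s_f) times the
# slow-edge error for the same voltage noise.
print(dt_slow, dt_fast, (slow_slope / fast_slope) * dt_slow)
```

With these numbers the same 10 mV disturbance moves the detected crossing by 10 ps on the slow edge and by 1 ps on the fast edge.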

A similar phenomenon was observed when pulse stretching with a high pulse stretching ratio was used in [76], where the slowly rising edge was a lot more prone to "spurious oscillations" than the relatively fast varying edge, as would typically occur in a pulse stretching technique. While the presented technique can be seen as some sort of stretching due to time amplification, there is a subtle difference. The information is first converted to fast edges of identical and fast slew rates, and then amplified in time to get outputs, also with high slew rates, which is what gives the proposed system its noise immunity.



Figure 4.13 Effect of voltage noise on the time detection in the cases of signals with high and low slew rates.

It is also worth noting that the time amplifier does rely on some form of stretching, so it might be prone to supply noise. However, as was briefly mentioned in Section 3.3.3, techniques can be used to disconnect the supply, since it is used mainly to set the initial conditions on the four output nodes of the time amplifier differential pairs.

#### 4.6.2 - Effect of Non-Symmetry in Pulse Rise/Fall Times

Here, it is also important to note the role of the pulse rise and fall times on the final results of the pulse measurement system. The proposed system accurately measures the time difference between  $T_1$  and  $T_2$ ,  $\Delta T_{measured}$ , with respect to the threshold voltages, as shown in Figure 4.14(a). If the pulse rise and fall times are identical, then the full-width-half-magnitude (FWHM) of the pulse,  $\Delta T_{FWHM}$ , is identical to  $\Delta T_{measured}$ , even though the individual edge locations are shifted by a certain amount of time (referred to as  $\delta_{rise}$  and  $\delta_{fall}$  in Figure 4.14(a)). If, on the other hand, they are not identical, such that the  $T_1$  edge is shifted by  $\delta_{rise}$  and the  $T_2$  edge by a different  $\delta_{fall}$ , then an error arises in the measurement, as shown in Figure 4.14(b). This error can be calibrated out provided that the rise and fall times, albeit different, remain constant from pulse to pulse. This was indeed the case in our measurement setup since the digital pulses were properly buffered on-chip before being measured.



Figure 4.14 Effect of the input pulses rise and fall times on the accuracy of the measurement. Shown here for (a) symmetrical and (b) non-symmetrical rise and fall times.
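The error described here can be written out explicitly. As a sketch, assume that  $\delta_{rise}$  and  $\delta_{fall}$  are both defined as delays of the detected threshold crossings relative to the half-magnitude crossings, with the same sign convention (this convention is an assumption made for the illustration). Then

$$\Delta T_{measured} = (T_2 + \delta_{fall}) - (T_1 + \delta_{rise}) = \Delta T_{FWHM} + (\delta_{fall} - \delta_{rise}).$$

When  $\delta_{rise} = \delta_{fall}$  the second term vanishes and  $\Delta T_{measured} = \Delta T_{FWHM}$ . Otherwise the difference  $\delta_{fall} - \delta_{rise}$  appears as an offset which, as long as the edge rates do not change from pulse to pulse, is constant and can be subtracted after calibration.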

Another situation that could alter the measured results with respect to the actual pulse FWHM manifests itself when the two input pulses are skewed in time with respect to each other as illustrated in Figure 4.15. In this particular case,  $V_{out,2}$  will start to discharge, prematurely, for a short period of time. If this discharge rate is fast enough and/or if the non-overlap period is large enough, the output node can discharge to a low enough level due to a current spike, triggering therefore the subsequent stages for an erroneous measurement. The worst-case time offset needs to be considered, and the front-end designed under these worst case conditions.





Figure 4.15 Illustration of the effect of the time offset between the two differential inputs.

## 4.7 - Conclusions

In this chapter two integrated circuits were presented. The first is intended for the embedded characterization of digital signal rise/fall times. The same circuit can also be used to measure analog slew rates since the measurement principle is identical to that of a rise/fall time. The main advantage of such a circuit is that it eliminates the need for a front-end sampler, which could become problematic when the input bandwidth lies in the GHz range. Another advantage of the proposed circuit is that, for the most part, it relies on time-domain processing. The advantages of time processing are two-fold: on one hand it allows designers to take advantage of the recent developments in time amplification, and on the other hand, designers can rely on standard digital cells to design low-resolution TDCs as opposed to resorting to more complex analog designs such as ADCs or TVCs. Simulation results show the feasibility and the linearity of the proposed system. Accurate relative measurements can be achieved, while absolute measurements require an additional calibration step due to a gain error in the front-end voltage-crossing detector.

This same technique of voltage-to-time conversion using a front-end circuit, followed by time processing with a core block, the time amplifier, was adopted in another circuit implementation intended for very narrow embedded pulse measurement, which was also presented in detail in this chapter. The proposed system relies on converting a differential pulse into (rising) edges with a time separation given by the pulse width to be measured. Those edges are then stretched, in time, to levels that are easily detectable by a low-resolution TDC, implemented in this work using synthesizable standard digital cells. Simulations show that good linearity can be achieved for pulse widths between 35 ps and 3 ns. The upper limit is set by the decreased gain efficiency of the time amplifier. However, at such nanosecond-scale intervals, the time amplifier might not even be needed any longer, given that most TDC architectures in deep sub-micron technologies can easily detect such times. Nonetheless, a calibration step can be used to extend the usable range of the time amplifier, and therefore of the proposed system, beyond the few-nanosecond range. The proposed system can find applications in the non-destructive characterization of pulse compression, ultra wideband sampling schemes, pulsed radar systems, and the measurement of clock or periodic pulse duty cycle, among other applications. Alternatively, the system can be used as a simple extension to existing pulse measurement techniques, where the current system deals with the low end of the time dynamic range of the measurement, while other techniques deal with the higher end of the time spectrum.

## **Chapter 5 - Experimental Results**

Experimental results from all three integrated circuits discussed in Chapter 3 and Chapter 4 are presented in this chapter. A 70-GHz effective sampling rate oscilloscope, capable of capturing in-situ interconnect crosstalk, is presented, together with its calibration scheme and its measured performance. Also in this chapter, measured results show the feasibility of measuring sub-nanosecond rise times and pulse widths of a few tens of picoseconds. All these are achieved at only a fraction of the power dissipation of similar techniques.

# 5.1 - On-Chip Oscilloscope IC Implementation and Experimental Results

#### 5.1.1 - IC and Test Setup

The tester core system for embedded analog waveform observability was implemented in a single-poly, six-metal, single 1.8-V supply, 0.18  $\mu$ m digital CMOS technology. An IC photograph is shown in Figure 5.1.



Figure 5.1 IC oscilloscope core photograph.

The total chip area is 1.9 mm x 1.5 mm, while the core circuit occupies an area of ~ 0.45 mm<sup>2</sup>. The main static power dissipation of the circuit stems from the time amplifier, arbiters, and the DVCDs, and is evaluated at ~ 3.5 mW. The IC was mounted on a custom-made two-layer PCB. The PCB was configured to interface with a production-oriented Teradyne A567 mixed-signal tester through separate analog and digital header connections. The use of the mixed-signal tester facilitates the task of input generation, digital back-end trigger for the parallel-to-serial converter, and serial shift out, all in a coherent arrangement where one global clock is used to trigger the system. Power supply decoupling, as well as careful ground plane placement and signal routing, were adopted to minimize the effects of noise and crosstalk. The PCB top layer was dedicated to the routing of the digital signals while analog signals were routed on the bottom layer. Digital and analog ground planes were used to shield the traces on the top and bottom layers, respectively, and therefore minimize the crosstalk. A picture of the PCB is shown in Figure 5.2.



Figure 5.2 Test fixture for the on-chip oscilloscope IC.

#### 5.1.2 - System Calibration and Experimental Results

Based on the detailed calibration scheme outlined in Section 3.5, the measured data is presented in this section. First, the time amplifier-TDC-digital back-end path is calibrated. Experimental results for this step are shown in Figure 5.3. This data reveals a time amplifier gain of ~ 10 s/s, and a TDC time-LSB of 70 ps. This calibration step, as implemented, gives a combined time amplifier gain-LSB<sub>TDC</sub> ratio. The individual block specifications are not necessary for the behavior of the system as it is the collective gain-LSB<sub>TDC</sub> step size that matters. In another IC implementation of the same cell, the TDC step size was measured to be 70 ps, as shown in Figure 5.4, so this value is used here.
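The first calibration step yields a combined gain-to-LSB ratio rather than the two values separately, which can be illustrated with a small fit. The sketch below uses a synthetic sweep generated from an ideal path with a 10 s/s gain and a 70 ps LSB; the sweep values, variable names and the least-squares fit are assumptions made for illustration and are not the procedure or data of Section 3.5.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic sweep (hypothetical): known input intervals in ps and the codes
# an ideal amplifier-TDC path with G = 10 s/s, LSB_TDC = 70 ps would return.
sweep_ps = [float(t) for t in range(0, 400, 20)]
codes = [math.floor(t * 10.0 / 70.0) for t in sweep_ps]

codes_per_ps = fit_slope(sweep_ps, codes)   # estimate of G / LSB_TDC
input_step_ps = 1.0 / codes_per_ps          # input-referred step, LSB_TDC / G
gain_if_lsb_known = codes_per_ps * 70.0     # G once LSB_TDC is known separately

print(round(input_step_ps, 2), round(gain_if_lsb_known, 2))
```

Only the slope is observable from such a sweep, which is why a separately measured LSB is needed before the amplifier gain can be quoted on its own.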



Figure 5.3 Corresponding results from step 1 in the calibration, with (a) digital serial output bit, and (b) amplified time output.



Figure 5.4 Experimental results obtained from the TDC cell alone (in another IC implementation to be presented in a later section, Section 5.2.1). The single-shot measured digitized output is shown. Superimposed is the expected serial output code, assuming a 70-ps single-stage delay.

The results from the second calibration step, used to characterize  $DVCD_{clk}$ , are shown in Figure 5.5. Experimental results for this step of the calibration reveal that a time interpolation step as small as 14 ps can be measured for an incremental voltage step controlling  $DVCD_{clk}$  equal to about 5 mV. The time difference in this step is indirectly measured using the sequence of blocks that were already calibrated in the first step. The results of this step provide the timing information needed for reconstructing the waveform to be diagnosed.


Figure 5.5 Corresponding results from step 2 in the calibration. Results shown here with the reference voltage of the reference DVCD<sub>clk</sub> cell, V<sub>ref,clk</sub>, set to 1.1 V, and V<sub>var,clk,cal</sub> varying between 1.1 V and 0.9 V in steps of 5 mV. Displayed is the differential delay of the falling edges. Rising edges were measured, and falling edges deduced.

Figure 5.6 shows the results from the final calibration step. Here the S/H -  $DVCD_{sig}$  has been characterized. A fairly linear  $DVCD_{sig}$  was obtained, with a voltage-to-time conversion gain of 3.7 ps/mV in the voltage range it was tested.



Figure 5.6 Corresponding results from step 3 in the calibration. Results shown here with the reference voltage of the reference  $DVCD_{sig}$  cell,  $V_{ref,sig}$ , set to 0.6 V, and  $V_{var,sig,cal}$  varying between ~ 0.6 V and 0.75 V.

Now that the system has been calibrated, the control bit of the analog multiplexer is set to turn the system into its normal measurement mode. It is important to reiterate the advantages of this calibration scheme: it relies on the already existing building blocks, therefore minimizing the additional silicon overhead and cost (with the exception of an analog multiplexer and a digital control block). The system can also be calibrated using low-end test equipment.

#### 5.1.3 - Measurement Mode

In the measurement mode, interconnect crosstalk is generated on-chip and captured using the on-chip oscilloscope. For convenience, the on-chip CUT or test vehicle is recapitulated here. The five T.L. structure adopted is shown in Figure 5.7.



Figure 5.7 CUT: Transmission-line structures for on-chip (far-end) crosstalk measurement. Digital circuitry (not shown) controls which and how many aggressor lines are switched on; either (1) and (2) alone, (3) and (4) alone, or all four aggressor lines switching simultaneously.

This CUT had to be incorporated on-chip to be able to generate, and therefore prove the ability to measure, high-speed transient phenomena. The victim near-end line is set to a DC level that is externally controlled. Greater flexibility is therefore achieved when testing the structure, whereby a DC level shifting capability allows for bringing the far-end crosstalk to be measured to within the dynamic range of the subsequent blocks (such as  $DVCD_{sig}$ ). Appropriate digital control blocks are added to control the level of switching activity, and therefore, the amount of crosstalk noise on the victim's far-end node. For the current experiment, three cases were considered, and are listed below in order of increasing far-end switching noise contribution: (I) aggressors 3 and 4 switching, (II) aggressors 1 and 2 switching, and (III) aggressors (1 and 2) and (3 and 4) all switching simultaneously. The measurement results are shown in Figure 5.8. The data is displayed for the three different levels of switching activity on the victim line. Also superimposed in Figure 5.8 are the simulated results for the same five T.L. structure for comparison.



Figure 5.8 Experimental results (-) for different switching activity, and 10-mV increments for DVCD<sub>clk</sub>. Three cases are considered: the two close, far, and all four lines are switching. In all cases, the simulation results (-.) are superimposed. Also shown is the setup used to capture the larger voltages of the crosstalk waveform.


For the simulation, a distributed RC was used to model each of the 2-mm lines, as illustrated in Figure 5.9, with unit-size R and C values estimated from the technology parameters. Apart from bandwidth limitations in the front-end sample-and-hold, good matching is achieved between the simulated and experimental results. Although these results are dependent on the modelling adopted for the T.L., the shape and increasing amplitude with increased switching activity confirm the correct functionality of the proposed system. Here it is worth mentioning that a T.L. model that includes the inductive effects is believed to yield better matching between the theoretical and experimental data. Given the high-speed nature of the signal to be captured, simulations seemed to be the only available comparison tool for the validation of the experimental data. Alternatively, once the functionality of the proposed test core is confirmed, the experimental results it provides can be used for more accurate transmission line modelling for simulation purposes. Of special note is the system's ability to digitize waveforms with 14 ps and 4 mV resolutions in the horizontal (time) and vertical (voltage) domains, respectively. This performance was obtained with a 5-mV incremental voltage for the clock interpolation (DVCD<sub>clk</sub>). It is also important to note that the clock interpolation scheme, and therefore the effective sampling rate, is not uniform with uniform incremental voltages (as shown in Figure 5.5). This, however, can be easily calibrated out using the calibration scheme outlined earlier.
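One way to picture how the non-uniform clock interpolation can be calibrated out is to place each captured sample on a time axis taken from the step-2 calibration data instead of assuming equal time steps. The sketch below is illustrative only: the calibration table, the sample values and the helper name `interp` are hypothetical and are not measured data from this work.

```python
from bisect import bisect_left

def interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over ascending xs,
    clamped to the end points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical step-2 calibration table: control-voltage offset (mV) versus
# measured clock delay (ps). The delay steps are deliberately non-uniform.
cal_offset_mv = [0.0, 5.0, 10.0, 15.0, 20.0]
cal_delay_ps = [0.0, 14.0, 27.0, 42.0, 60.0]

# Hypothetical captured samples: (control-voltage offset in mV, voltage in mV).
samples = [(0.0, 600.0), (5.0, 604.0), (10.0, 612.0), (15.0, 608.0), (20.0, 600.0)]

# Reconstructed waveform as (time in ps, voltage in mV) pairs on the
# calibrated, non-uniform time axis.
waveform = [(interp(mv, cal_offset_mv, cal_delay_ps), v) for mv, v in samples]
print(waveform)
```

Interpolating between calibration points also allows samples taken at control voltages that were not part of the calibration sweep to be placed in time, within the accuracy of the piecewise-linear assumption.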



Figure 5.9 Corresponding distributed T.L. model used in the simulations.

| Building Block Parameters | G<sub>DVCD,clk</sub> | G<sub>DVCD,sig</sub> | G<sub>amplifier</sub> | LSB<sub>TDC</sub> | Time Range before Amplification (FS<sub>TDC</sub>/G<sub>amplifier</sub>) |
|---|---|---|---|---|---|
| | 2.4 ps/mV (@ V<sub>ref</sub> = 1.1 V) | 3.7 ps/mV (@ V<sub>ref</sub> = 0.6 V) | 10 s/s | 70 ps<sup>a</sup> | N/A |
| System Parameters | f<sub>s,min</sub> | f<sub>s,max</sub> | Voltage Resolution (ΔV<sub>DVCDsig</sub>) | Voltage Range | DC Reference Step Size (ΔV<sub>gen,min</sub>) |
| | N/A | 71 GHz | 4 mV | N/A | 5 mV |
| IC Details | Technology | | Active Area | Power Dissipation | |
| | 0.18 Micron CMOS | | 0.45 mm<sup>2</sup> | 3.5 mW | |

The measured IC specifications are summarized in Table 5.1.

Table 5.1 - Summary of the measured oscilloscope specifications.

a. In the calibration scheme, the combined time amplifier gain-LSB<sub>TDC</sub> ratio is measured. The individual values listed in Table 5.1 correspond to the experimental data obtained for LSB<sub>TDC</sub> in another IC implementation. From the combined ratio and the LSB<sub>TDC</sub>, the time amplifier gain is deduced.

#### 5.1.4 - Comments

Table 5.1, when compared to Table 3.1, gives an insight as to what was experimentally achieved versus what could have been obtained. In general, good matching is observed. The maximum effective sampling rate was reduced to half what was theoretically possible. While 7 ps is the minimum theoretical time difference the proposed system could digitize, only 14 ps was possible in the experimental measurement phase. This could be attributed to the noise present in the system, which, even though believed to be minimized, cannot be eliminated altogether, especially at such fine time intervals.
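The halving follows directly if the effective sampling rate is taken as the reciprocal of the smallest resolvable time step. This is the usual relation for equivalent-time sampling and is assumed here rather than quoted from Table 3.1; the symbols  $f_{s,eff}$  and  $\Delta t_{min}$  are introduced for this illustration only:

$$f_{s,eff} = \frac{1}{\Delta t_{min}}, \qquad \frac{1}{14\ \text{ps}} \approx 71.4\ \text{GHz}, \qquad \frac{1}{7\ \text{ps}} \approx 142.9\ \text{GHz}.$$

The measured 14 ps step therefore corresponds to the 71 GHz entry in Table 5.1, while a 7 ps step would correspond to roughly twice that rate.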

Also the actual voltage-controlled-delay cell gain was reasonably matched to what is expected in simulations, but not identical. Process variations could have contributed to this slight gain factor difference.

The DC reference step size was chosen to equal 5 mV with no further reduction, despite the fact that 1.75 mV is theoretically possible. There was no additional advantage in using smaller voltage step sizes in this particular test setup chosen for this chip since the aggressor-victim resultant waveform was captured with enough fidelity with a 5-mV incremental step on the clock interpolation  $DVCD_{clk}$ . Decreasing the step size would be more beneficial had additional test setups with more broadband nature been incorporated on-chip.

The time amplifier gain, the TDC step size, and the voltage resolution of the system were almost identical to what is theoretically expected.

Regardless of the matching between the theoretical and experimental data, the fact that the individual block parameters in the system can be easily extracted with a simple calibration scheme is a very useful advantage the proposed system benefits from.

# 5.2 - Rise Time Measurement IC Experimental Results

To gain experimental evidence of the proposed digital techniques highlighted in Chapter 4, both integrated circuits intended for measuring rise times and pulse widths were implemented in a one-poly, six-metal, single 1.8 V supply, 0.18  $\mu$ m digital CMOS technology, with experimental results presented next.

#### 5.2.1 - IC and Test Setup

The basic cell or system, comprising the fast voltage-crossing detector, time amplifier, TDC, and parallel-to-serial converter, was incorporated on a single chip. The IC, whose photograph is shown in Figure 5.10, occupies an area of ~  $0.5 \text{ mm}^2$ , and includes on-chip wide-swing biasing circuitry, together with the digital circuitry for parallel-to-serial conversion. The total simulated static or average current dissipation of the whole chip was measured to be 5.5 mA including an on-chip resistor string for voltage generation.



Figure 5.10 Edge measurement IC photograph.

For the test setup, here too, a Teradyne A567 mixed-signal tester was used. A two-layer PCB was first designed to be interfaced and mounted onto the tester's device interface board (DIB). The same PCB was also used to test the pulse measurement system. A picture of the board is shown in Figure 5.11.



Figure 5.11 Test fixture for the edge and pulse measurement ICs.


#### 5.2.2 - Measured Results

To verify the design, two steps can be performed: 1) constant rise time is maintained and variable reference voltages applied to the system, or 2) constant reference levels with variable rise times applied to the system. Step 1) has been verified in a preliminary experimental setup.

A constant unknown rise time, generated from a Teradyne A567 mixed-signal tester, was applied to the system, with the low reference level fixed at 0.6 V, and the high reference level varied between 0.9 V and 1.8 V. The input rise time of the test equipment was characterized in [35], using an integrated under-sampling circuit, to be 1 ns. With this rise time, and given the reference levels, the simulated curve is shown in Figure 5.12, with experimental results superimposed. Here, the results correspond to the single-shot measured data. The obtained results are shown before and after a calibration step. The calibration was deemed necessary to compensate for any constant time offset that might arise due to non-symmetrical traces or latencies throughout the input/output datapath. This initial verification step confirms the correct functionality of the proposed system. An attenuation factor of 0.75 s/s was indeed used to obtain the calibrated plot in Figure 5.12, in clear agreement with the simulated data presented earlier. In the case of the experimental results, however, this attenuation also includes the PCB traces, and chip packaging and bonding wire inductive effects. Worth noting in Figure 5.12 is the deviation from a linear behavior as the high reference voltage increases. This is expected since at such high voltage levels, the exponential rising tail of the input waveform takes effect, reducing the amount of dynamic current and therefore the discharge capability of the output of the front-end detector.

A variable rise time step generator is needed to perform step 2) in the experimental procedure outlined above, and would ideally be incorporated on-chip for higher-speed measurement capabilities. The on-chip addition of the input generation was not performed in this proof-of-concept circuit implementation. This is not deemed problematic since the pulse measurement system, which relies on identical techniques for processing the information in the time domain, has also been fabricated and full experimental results obtained, as will be shown in the next section. Those results, together with the preliminary results from the edge measurement chip, confirm the proposed design approach.

Figure 5.12 Single-shot experimental results for the edge measurement system with varying reference level (while keeping the other reference level fixed at 0.625 V) and constant rise time.

#### 5.2.3 - Comments

While the previous section confirms the correct functionality of the edge measurement chip, this chip turned out to be the most prone to noise and measurement errors. Full characterization was not possible for several reasons. First, the long interconnect lines adopted in the layout for the digital back-end circuitry prevented testing of the four cells that were implemented. While only one cell is shown in the IC photograph of Figure 5.10, the chip itself contained four cells, each driven simultaneously with the same input but with different low and high voltage levels for the front-end crossing detector. Additional characteristics and features were expected to be extracted from the multi-cell design, but the long interconnects and their relatively poor layout distribution prevented the outputs of the multiple cells from being triggered simultaneously, and hence the obtained data was not conclusive. Second, noise made the output data difficult to read in some cases. The biasing voltages for the front-end crossing detector were supplied from off-chip due to problems with the on-chip biasing, which certainly had an impact on the noise performance of this chip. Most importantly, the exclusion of an on-chip fast edge generator ruled out testing this system's performance while capturing digital edges as fast as 100 ps in rise time.

Nonetheless, a proof-of-concept is established with the previously highlighted preliminary IC implementation and experimental setup. When the pulse measurement experimental results are presented in the next section, and given that the philosophy adopted for asynchronous or edge-crossing detection followed by time-based processing techniques is identical in both the pulse and edge measurement chips, the conclusions and claims made in this section will be reinforced further.

### 5.3 - Pulse Receiver IC Implementation and Experimental Results

#### 5.3.1 - IC and Test Setup

An IC photograph of the pulse measurement system is shown in Figure 5.13. The core circuit occupies an area of  $\sim 0.9 \text{ mm}^2$ . The IC was mounted on a custom-made two-layer PCB, shown earlier in Figure 5.11. The PCB was configured to interface with a Teradyne A567 mixed-signal tester through separate analog and digital header connections. Here too, the use of the mixed-signal tester facilitates the task of input generation, digital back end trigger, and serial shift out, all in a coherent arrangement where one global clock is used to trigger the system. Time differences that are tightly controlled can also be generated using the mixed-signal tester.



Figure 5.13 IC microphotograph for the pulse measurement system.

The experimental setup is illustrated graphically in Figure 5.14. For testing purposes, in an attempt to incorporate design for testability into the system, and given that it is easier to generate closely controlled edges using the mixed-signal tester, appropriate digital logic, consisting of a simple exclusive-OR (XOR) gate and digital buffers, was incorporated on-chip. This digital logic was used to convert the externally applied edges into differential pulses, which were then measured using the proposed on-chip system. Care was taken in designing the differential input traces, which were drawn (on board and on chip) to be as matched as possible.
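The edge-to-pulse conversion can be sketched behaviorally as follows. This is an idealized model only: it assumes perfect steps and a delay-free XOR gate, works on an integer picosecond grid, and the function name is hypothetical rather than part of the fabricated design.

```python
def xor_pulse_width_ps(t_edge_a_ps, t_edge_b_ps, t_stop_ps):
    """Return how long (in ps) the XOR of two ideal rising steps stays high.

    Time is sampled on a 1 ps integer grid from 0 up to t_stop_ps, so both
    edge times are expected to be integers smaller than t_stop_ps.
    """
    high_samples = 0
    for t in range(t_stop_ps):
        a = t >= t_edge_a_ps       # first edge has risen
        b = t >= t_edge_b_ps       # second edge has risen
        if a != b:                 # XOR is high only between the two edges
            high_samples += 1
    return high_samples

if __name__ == "__main__":
    # Two edges separated by 78 ps produce a 78 ps wide pulse in this ideal model.
    print(xor_pulse_width_ps(100, 178, 1000))
```

In this ideal model the pulse width equals the edge separation regardless of which edge arrives first; in the actual circuit the finite rise and fall times of the gate set the lower limit discussed next.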



Figure 5.14 Experimental setup example, shown here for the pulse measurement integrated circuit.

Simulations show that a minimum time difference  $\Delta T_w$  of ~ 80 ps can be converted into a differential pulse with appropriate width W. This lower limit of 80 ps is therefore a limitation on the test setup and not necessarily on the proposed circuit itself. Circuits that can generate such narrow pulses, and in particular, circuits enabling the sharpening of the pulse rise/fall time edges are necessary to test the system below the 80 ps range. This was not incorporated in the current design and could constitute future work. However, the simulation results shown earlier in Figure 4.12 confirm that if an almost ideal pulse of width as small as 35 ps is injected directly to the inputs of the proposed circuit, it can still be detected with our proposed system.

#### 5.3.2 - Measured Data and Discussion of Results

Experimental verification reveals correct functionality and good linearity. Input time differences varying from 78 ps to 546 ps, in steps of 78 ps, were applied to the system. The A567 mixed-signal tester master clock can be set to a maximum frequency of 200 MHz.

The minimum delay that can be generated with this setting is the master clock period divided by 64, that is, 1/(64 × 200 MHz), which is approximately 78 ps. This lower limit is not problematic since it is in agreement with the lower limit that the system can detect, given the input design-for-testability strategy adopted for on-chip differential pulse generation. The digital output, serially shifted out using the on-chip parallel-to-serial converter, was captured. Of particular interest is the bit location where the serial output switches from logic 1 to logic 0. This location was recorded experimentally for inputs varying between 78 ps and 546 ps, in steps of 78 ps, as explained earlier.
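As a quick check of the numbers above, the delay step can be recomputed by reading it as the 200 MHz master clock period divided into 64 equal steps. This reading, and the helper names, are assumptions made for illustration.

```python
def min_delay_step_s(master_clock_hz, divisions):
    """Smallest programmable delay: one clock period split into `divisions` steps."""
    return 1.0 / (master_clock_hz * divisions)

def input_sweep_ps(step_ps, n_points):
    """Input time differences applied to the system, in picoseconds."""
    return [step_ps * k for k in range(1, n_points + 1)]

if __name__ == "__main__":
    step = min_delay_step_s(200e6, 64)
    print(f"delay step = {step * 1e12:.3f} ps")  # 5 ns / 64
    print(input_sweep_ps(78, 7))                 # sweep from 78 ps to 546 ps
```

The step evaluates to 78.125 ps, which the text rounds to 78 ps, and seven such steps span the 78 ps to 546 ps sweep.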

A supply of 1.6 V was used for all of the testing. This supply value allowed a degree of control and flexibility over the back-end timing measurement circuitry, including the time amplifier, the TDC, and the serial shift. Experimental results reveal a measured time amplifier gain of 8 s/s, a TDC buffer delay of 90 ps<sup>1</sup>, and a time offset in the output equal to ~ 990 ps. The time offset is mainly due to latency effects, in particular the non-symmetrical loading of the Start and Stop signals at the inputs of the TDC: the Stop signal drives the capacitive load of 64 DFFs, whereas the Start signal is loaded by a single delay stage. Appropriate on-chip buffering, or the addition of a dummy capacitive load to the Start signal, could have eliminated or minimized this offset. Mismatches in the layout path also contribute to the offset, particularly those ahead of the time amplifier, which are amplified along with the signal. Nonetheless, this offset is constant for all inputs and can be calibrated out (as was done in this work). The measured digital output code as a function of the input pulse width is shown in Figure 5.15.

<sup>1.</sup> The combined time amplifier gain-LSB<sub>TDC</sub> ratio is experimentally measured. The individual values listed above correspond to the expected simulated values under the supply and biasing voltage conditions adopted in this experimental setup.
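The quoted parameters can be combined into a first-order model of the transfer characteristic. The sketch below assumes a constant gain of 8 s/s, an offset of 990 ps referred to the TDC input, a 90 ps LSB, and a 64-stage thermometer-coded TDC whose code is the number of whole LSBs elapsed. These are simplifications (the gain is in fact input dependent, and only the combined gain-to-LSB ratio was measured), and the function names are hypothetical.

```python
def tdc_code(delta_t_ps, gain=8.0, offset_ps=990.0, lsb_ps=90.0, stages=64):
    """Idealized output code for an input time difference, clipped to the TDC length."""
    amplified_ps = gain * delta_t_ps + offset_ps
    return min(int(amplified_ps // lsb_ps), stages)

def estimate_input_ps(code, gain=8.0, offset_ps=990.0, lsb_ps=90.0):
    """Offset-calibrated, input-referred estimate (lower edge of the code bin)."""
    return (code * lsb_ps - offset_ps) / gain

if __name__ == "__main__":
    for dt in range(78, 547, 78):
        code = tdc_code(dt)
        print(f"{dt:3d} ps -> code {code:2d} -> estimate {estimate_input_ps(code):6.2f} ps")
```

With these values the 78 ps and 546 ps inputs map to codes 17 and 59, so the whole sweep fits within the 64 stages, and one LSB corresponds to 90/8 = 11.25 ps at the input.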



Figure 5.15 Measured transfer characteristic of the proposed system, before and after calibration, using only one measurement.

Another curve that accounts for the variation in the time amplifier gain as a function of its inputs is superimposed, and will be referred to as the ideal curve. Here, it is important to reiterate that the time amplifier characteristics, and in particular its gain, are input dependent: the gain degrades as the time difference to be amplified increases. This is illustrated in Figure 5.16, a typical simulation curve showing how the gain decreases as the input time separation increases. While the curve does not represent a constant gain, as one would expect from a voltage gain block, for example, it was still considered ideal since this effect is known a priori (and reveals itself in simulations). This is not problematic since, in all cases, a calibration step is usually needed for the time amplifier block. With such a calibration step performed, the actual gain of the amplifier for every input time difference is stored. The advantage, on the other hand, lies in making small pulses easy to digitize with low-resolution TDCs, thanks to the time amplification step. This makes the task of capturing and measuring fast pulses easier without resorting to the power overhead or circuit complexity of TVCs and/or ADCs.
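One way such a stored calibration could be used is as a lookup table that is inverted by linear interpolation. The sketch below is an illustration only: `toy_amplifier` is a made-up compressive characteristic, not the curve of Figure 5.16, and the table layout and function names are assumptions.

```python
from bisect import bisect_right

def toy_amplifier(delta_t_ps):
    """Hypothetical amplifier whose gain falls from 8 s/s as the input grows."""
    return 8.0 * delta_t_ps / (1.0 + delta_t_ps / 2000.0)

def build_table(cal_inputs_ps, amplifier):
    """Store (input, amplified output) pairs obtained during calibration."""
    return [(t, amplifier(t)) for t in cal_inputs_ps]

def invert(table, amplified_ps):
    """Recover the input time difference from an amplified value.

    Assumes the stored outputs increase monotonically with the input.
    """
    outputs = [out for _, out in table]
    i = bisect_right(outputs, amplified_ps)
    i = min(max(i, 1), len(table) - 1)       # clamp to a valid segment
    (t0, o0), (t1, o1) = table[i - 1], table[i]
    return t0 + (amplified_ps - o0) * (t1 - t0) / (o1 - o0)

if __name__ == "__main__":
    table = build_table(range(78, 547, 78), toy_amplifier)
    for t, out in table:
        print(f"{t:3d} ps -> gain {out / t:.2f} s/s")
    print(f"recovered input: {invert(table, toy_amplifier(300.0)):.2f} ps")
```

Because the per-point gain is stored, the decreasing gain does not bias the recovered time difference at the calibration points; between them the accuracy depends on how densely the table is populated.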

Figure 5.16 Typical time amplifier gain transfer characteristics, as a function of its inputs time differences, showing a typical (simulated) gain decrease with increase in input time separation.

The results shown in Figure 5.15 were obtained from a single-shot measurement. To isolate the effect of any potential jitter in the input generation, or of voltage supply noise, each data point on the input-output transfer curve was collected 33 times and the average (mean) recorded. The averaged results are shown in Figure 5.17. Here, too, calibration is performed to remove the effect of the constant time offset. A line showing the expected (simulated) transfer characteristic is overlaid, showing the excellent matching between the simulation and experimental results. The worst-case standard deviation ($\sigma$) among all the data points collected was measured to be 1.35 times the delay through a single delay stage in the TDC, which is 90 ps in the current design. With the time amplifier gain of 8 s/s, this is equivalent to an input-referred error of ~ 15 ps. This error is expected to decrease drastically with a much higher time amplifier gain. The total simulated average power dissipation of the whole system (including the time amplifier and its associated biasing circuitry, the two phase inversion detectors, and the front-end block, which consumes mainly dynamic power) was found to be 6.84 mW.
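The averaging and the input-referred error can be reproduced with a few lines. The sketch assumes the 90 ps stage delay and 8 s/s gain quoted above; the function names are hypothetical, and any list of codes passed in stands for repeated captures of one data point.

```python
import statistics

def input_referred_sigma_ps(sigma_lsb, lsb_ps=90.0, gain=8.0):
    """Convert a spread expressed in TDC stages into picoseconds at the input."""
    return sigma_lsb * lsb_ps / gain

def summarize(codes, lsb_ps=90.0, gain=8.0):
    """Mean code, sample standard deviation in LSBs, and input-referred sigma in ps."""
    mean_code = statistics.mean(codes)
    sigma_lsb = statistics.stdev(codes)
    return mean_code, sigma_lsb, input_referred_sigma_ps(sigma_lsb, lsb_ps, gain)

if __name__ == "__main__":
    # Worst-case spread reported in the text: 1.35 TDC stages.
    print(f"{input_referred_sigma_ps(1.35):.2f} ps")
```

For the reported worst case, 1.35 × 90 ps / 8 gives approximately 15.2 ps, in line with the ~ 15 ps figure, and the expression makes explicit why a higher time amplifier gain reduces the input-referred error.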



Figure 5.17 Measured transfer characteristic of the proposed system, before and after calibration, using the average of 33 measurements (error bars are also superimposed).

#### 5.3.3 - Additional Comments

Comparing Figure 5.15 and Figure 5.17, we find that the data from the average of 33 measurements gives superior matching between the theoretical and experimental results. Nonetheless, the results obtained from a single measurement, without any averaging, do in fact confirm the noise robustness of the proposed system. While the results improve with averaging, as one would normally expect, the single-shot data is still good enough to provide insight into the system and even to extract all the required parameters from it, such as the time amplifier gain, the front-end gain if different from unity, etc. The averaged results, however, are more useful if a higher level of measurement confidence is desired for the system parameters, or if the actual noise contribution of the system and the external test setup needs to be quantified. As a result, both test setups were deemed necessary and were used in these experiments.

The pulse measurement IC was designed to be self-contained, with very few external signals needed. This is a trade-off made when the IC was submitted for fabrication: extra external inputs and signals give greater flexibility in testing and debugging, but could prevent testing at the speed limit the IC was designed for. On the other hand, a self-contained chip with few external inputs, performing the differential pulse generation on-chip in this case, allowed at-speed testing, but with little insight available when debugging. Fortunately, the IC performed almost identically to what was expected, with the exception of the need to decrease the supply voltage from 1.8 V to 1.6 V. This was deemed necessary to have some flexibility over the time amplifier's gain and its DC biasing settings. The resulting absolute numbers achieved per circuit block were slightly off from what was expected, but as a whole, the system performed exceptionally well, with a large immunity to noise.

### 5.4 - Experimental Results Summary

The experimental results presented in this chapter confirm the ability to perform time-mode high-speed signal measurements. Some circuits performed better than others. For example, the 70-GHz on-chip scope confirms the ability to digitize signals in time with a 4 mV voltage step (resolution), and to measure on-chip clock interpolation time differences as small as 14 ps. Those results are close to the design targets. The question arises as to whether the proposed technique can perform as well with a higher required resolution and effective sampling speed. Section 3.4 highlighted some of the trade-offs involved in the choice of the different circuit parameters to improve the performance, but only experiments could confirm or justify the choice.

The pulse measurement system confirmed the ability to digitize an 80 ps pulse. The close matching between the averaged and the single-shot measurements implies the noise robustness of the circuit. The drawback of this system, as it was designed, was the necessity of testing at a 1.6 V supply, while the typical supply should have been 1.8 V in this technology. While it is impossible to isolate one reason for this need to decrease the supply, it is believed that the time amplifier, and in particular its biasing setup, which is derived from the supply, required this decrease. The gain was also directly dependent on the supply voltage, making its variation necessary to explore different time amplifier gains and setups. Nonetheless, a single-shot measurement of a pulse with a width as low as a few tens of picoseconds, while dissipating only a few milliwatts of power, constitutes a great advancement over state-of-the-art techniques.

The edge measurement system, on the other hand, gave reasonable results, but only in a preliminary test setup. More elaborate tests were unfortunately not possible in the current configuration of the IC. The long interconnect wires were poorly distributed on the chip, which could have contributed to the additional noise in the circuit. The on-chip biasing had to be disconnected and external biasing voltages provided to the chip, which possibly constitutes another noise contribution affecting the final results. Nonetheless, data was captured for a single edge of 1 ns using the designed IC, and the experimental results were sufficient to confirm the correct functionality of the implemented system.

Overall, the results from the three ICs presented in this chapter confirm the ability to perform multi-GHz signal capture with simple circuits. The designs were all carried out as an individual effort and performed as reported in this chapter from a first fabrication spin, both of which reinforce the simplicity of the proposed approach.

## **Chapter 6 - Concluding Remarks**

### 6.1 - Thesis Findings Summary

In the preceding chapters, a time-based approach for capturing GHz signals was presented. The proposed approach was applied to several on-chip signal capture tasks, ranging from sub-nanosecond edge measurement, to picosecond pulse-width characterization, to a multi-GHz on-chip oscilloscope.

All three systems were developed, designed, built and tested. Experimental results reveal the correct functionality of all the proposed systems, with outstanding performance in some cases. Edges as small as 1 ns were captured in a single-shot arrangement. It is believed that the rise time measurement system could easily measure rise times of a few tens to hundreds of picoseconds, had the IC included on-chip edge generation circuitry. Also, more careful routing for the different cells and appropriate digital buffering should have been used for the multi-cell approach to be useful. In the pulse measurement system, pulses as narrow as 78 ps were successfully measured. The high-end time dynamic range was limited to 546 ps by the TDC size, but could easily be increased with more TDC stages, or by investigating other time measurement unit architectures. Finally, a 70-GHz on-chip oscilloscope was successfully tested, allowing the measurement of interconnect crosstalk with variable strength (amplitude) and variable duration. Here too, the time dynamic range is limited by the memory size of the TDC. The proposed system is fully calibratable, making the non-idealities and non-linearities of the building blocks less of a concern.

### 6.2 - Recommendation for Future Work

From the work presented in this thesis, the time-based approach for signal processing seems like an attractive alternative, to say the least, to perform signal capture. It is believed this type of processing can open up the door to handle more aggressive design and test challenges in the future.

The work presented here constitutes the first milestone towards a full time-based on-chip test system. While many different functionalities have been demonstrated, others remain untackled. A few directions could therefore be taken for further analysis:

✤ The use of time amplification to ease the requirements on the subsequent time measurement blocks deserves fuller investigation. For that, a few directions could be taken:

- The dead time of the time amplifier could be reduced to allow for real-time operation. In the current design, this dead time was around 20 ns, placing an upper limit on the incoming data rate that can be handled in a single-shot approach.
- Time amplification parallelism could also be investigated, whereby many time amplifier cells are incorporated, thereby allowing a higher throughput.
- Adaptive, controllable, or tunable time amplification gain is another attractive feature that could be investigated, for more degrees of freedom in the design and a wider range of applications that could make use of such time-based concepts.
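The rate limit implied by the dead time, and the potential benefit of parallel cells, can be estimated with a one-line model. It assumes that events are distributed evenly among identical cells (ideal interleaving), which is an idealization rather than a demonstrated result; the function name is hypothetical.

```python
def max_event_rate_hz(dead_time_s, parallel_cells=1):
    """Upper bound on the single-shot event rate for cells that are busy for dead_time_s."""
    return parallel_cells / dead_time_s

if __name__ == "__main__":
    print(f"{max_event_rate_hz(20e-9) / 1e6:.0f} MHz with one cell")
    print(f"{max_event_rate_hz(20e-9, parallel_cells=4) / 1e6:.0f} MHz with four cells")
```

With the 20 ns dead time quoted above, a single cell is bounded at about 50 million events per second, and the bound scales linearly with the number of cells under the ideal-interleaving assumption.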

✤ The on-chip oscilloscope, in its current implementation, relies on off-chip DC and/or ramp generation. Including these on-chip is a straightforward extension of the current work. Nonetheless, it will make the on-chip oscilloscope a more complete system with all components built in. It will also allow the effect of on-chip generation on the overall performance of the scope to be investigated. Delta-sigma bitstream generation with low-pass filtering can be adopted for the DC generation. Alternatively, ramp generation techniques using an Altera FPGA can also be investigated, as was briefly discussed in [83].

✤ Another, more long-term direction is to capture mixed-signal waveforms without relying on a front-end sample-and-hold at all, as was demonstrated for the digital measurement systems. This would have many advantages, such as the elimination of the bandwidth and distortion constraints of the sample-and-hold, a concern that arose in the current implementation of the on-chip scope.

✤ The use of time-domain processing techniques for a multitude of other applications, in particular for jitter estimation and testing is also worth investigating, as was briefly discussed in [84].

Obviously, a lot can be done to take this work further. The work could either be extended in a straightforward manner with the inclusion of some external blocks on-chip, or in tackling a different range of applications. Either way, this dissertation verifies that applying time-based processing to a multitude of otherwise difficult tasks is possible. Experimental results are also very promising with little design effort, and while relying on simple circuit components and simple calibration techniques to correct for circuit nonlinear and far-from-ideal behaviour.

# References

- [1] The 1997 National Technology Roadmap for Semiconductors, Semiconductor Industry Association, San Jose, California, 1997.
- [2] B. Veillette and G. W. Roberts, "On-Chip Measurement of the Jitter Transfer Function of Charge Pump Phase-Locked Loops," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 3, pp. 483-491, 1998.
- [3] M. Oulmane and G. W. Roberts, "A CMOS Time-Amplifier for Femto-Second Resolution Timing Measurement," *IEEE International Symposium on Circuits and Systems*, pp. 509-512, 2004.
- [4] F. F. Tsui, LSI/VLSI Testability Design, McGraw Hill, 1986.
- [5] P. H. Bardell, W. H. McAnney, and J. Savir, Built-In Test for VLSI: Pseudorandom Techniques, John Wiley & Sons Inc., New York, 1987.
- [6] B. Davis, The Economics of Automatic Testing, McGraw Hill (UK), 1982.
- [7] M. F. Toner and G. W. Roberts, "A BIST Scheme for an SNR, Gain Tracking, and Frequency Response Test of a Sigma-Delta ADC," *IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing*, vol. 42, no. 1, pp. 1-15, 1995.
- [8] A. Grochowski, D. Bhattacharya, T. R. Viswanathan, K. Laker, "Integrated Circuit Testing for Quality Assurance in Manufacturing: History, Current Status, and Future Trends," *IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing*, vol. 44, no. 8, pp. 610-633, 1997.
- [9] S. Sunder, "A Low Cost 100 MHz Analog Test Bus," *IEEE Very-Large-Scale-Integration Test Symposium*, pp. 60-63, 1995.

- [10] A. Osseiran, "Getting to a Test Standard for Mixed-Signal Boards," *IEEE Midwest Symposium on Circuits and Systems*, pp. 1157-1161, 1995.
- [11] M. R. DeWitt, G. F. Gross Jr., and R. Ramanchandran, "Built-In Self-Test for Analog to Digital Converters," U.S. Patent No. 5 132 685, 1992.
- [12] B. R. Veillette and G. W. Roberts, "A Built-In Self-Test Strategy for Wireless Communication Systems," *IEEE International Test Conference*, pp. 930-939, 1995.
- [13] M. M. Hafed, N. Abaskharoun, and G. W. Roberts, "A 4 GHz Effective Sample Rate Integrated test Core for Analog and Mixed-Signal Circuits," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 4, pp. 499-514, 2002.
- [14] K. F. Zimmermann, "SiPROBE A New Technology for Wafer Probing," *IEEE International Test Conference*, pp. 106-112, 1995.
- [15] J. Tierney, C. M. Rader, and B. Gold, "A Digital Frequency Synthesizer," *IEEE Transactions on Audio and Electroacoustic*, vol. 19, pp. 48-57, 1971.
- [16] L. Bruton, "Low Sensitivity Digital Ladder Filters," *IEEE Transactions on Circuits and Systems*, vol. 22, no. 3, pp. 168-176, 1975.
- [17] A. K. Lu, G. W. Roberts, and D. A. Johns, "High-Quality Analog Oscillator Using Oversampling D/A Conversion Techniques," *IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing*, vol. 41, no. 7, pp. 437-444, 1994.
- [18] M. F. Toner and G. W. Roberts, "Towards Built-In-Self-Test for SNR Testing of a Mixed-Signal IC," *IEEE International Symposium on Circuits and Systems*, pp. 1599-1602, 1993.
- [19] A. K. Lu and G. W. Roberts, "An Analog Multi-Tone Signal Generation for Built-In-Self- Test Applications," *IEEE International Test Conference*, pp. 650-659, 1994.
- [20] X. Haurie and G. W. Roberts, "Arbitrary Precision Signal Generation for Bandlimited Mixed-Signal Testing," *IEEE International Test Conference*, pp.78-86, 1995.
- [21] B. Veillette and G. W. Roberts, "High-Frequency Signal Generation Using Delta-Sigma Modulation Techniques," *IEEE International Symposium on Circuits and Systems*, pp. 637- 640, 1995.

- [22] E. M. Hawrysh and G. W. Roberts, "An Integration of Memory-Based Analog Signal Generation into Current DFT Architectures," *IEEE International Test Conference*, pp. 528- 537, 1996.
- [23] M. Burns and G. W. Roberts, An Introduction to Mixed-Signal IC Test and Measurement, Oxford University Press, 2001.
- [24] B. Dufort and G. W. Roberts, "On-Chip Signal Generation for Mixed-Signal Built-In Self Test," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 3, pp. 318-330, 1999.
- [25] K. P. Parker, J. E. McDermid, and S. Oresjo, "Structure and Metrology for an Analog testability Bus," *IEEE International Test Conference*, pp. 309-322, 1993.
- [26] P. Larsson, S. Svensson, "Measuring High-Bandwidth Signals in CMOS Circuits," *Electronics Letters*, vol. 29, no. 20, pp. 1761-1762, 1993.
- [27] A. Hajjar and G. W. Roberts, "A High Speed and Area Efficient On-Chip Analog Waveform Extractor," *IEEE International Test Conference*, pp. 688-697, 1998.
- [28] A. E. Stevens, R. Van Berg, J. Van Der Spiegel, and H. H. Williams, "A Time-to-Voltage Converter and Analog Memory for Colliding Beam Detectors," *IEEE Journal of Solid-State Circuits*, vol. 24, no. 6, pp. 1748-1752, 1989.
- [29] R. L. Sumner, "Apparatus and Method for Measuring Time Intervals with Very High Resolution," U.S. Patent 6 137 749, 2000.
- [30] T. E. Rahkonen and J. T. Kostamovaara, "The Use of Stabilized CMOS Delay Lines for the Digitization of Short Time Intervals," *IEEE Journal of Solid-State Circuits*, vol. 28, no. 8, pp. 887-894, 1994.
- [31] P. Chen and S. Liu, "A Cyclic CMOS Time-to-Digital Converter with Deep Sub- Nanosecond Resolution," *IEEE Custom Integrated Circuits Conference*, pp. 605-608, 1999.
- [32] P. Dudek, S. Szczepanski, J. Hatfield, "A CMOS High Resolution Time-to-Digital Converter Utilizing a Vernier Delay Line," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 2, pp. 240-247, 2000.
- [33] J. Kang, W. Liu, R. K. Cavin III, "A CMOS High-Speed Data recovery Circuit Using the Matched Delay Sampling Technique," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 10, pp. 1588-1596, 1997.

- [34] P. Andreani, F. Bigongiari, R. Roncella, R. Saletti, P. Terreni, A. Bigongiari, and M. Lippi, "Multihit Multichannel Time-to-Digital Conversion with +/- 1% Differential Nonlinearity and Near Optimal Time Resolution," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 4, pp. 650-656, 1998.
- [35] N. Abaskharoun and G. W. Roberts, "Circuits for On-Chip Sub-Nanosecond Signal Capture and Characterization," *IEEE Custom Integrated Circuits Conference*, pp. 251-254, 2001.
- [36] J. G. Maneatis, "Low-Jitter Process Independent DLL and PLL Based on Self-Biased Techniques," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 11, pp. 1723-1732, 1996.
- [37] A. Chan and G. W. Roberts, "A Jitter Characterization System Using a Component-Invariant Vernier Delay Line," *IEEE Transactions on Very Large Scale Integration*, vol. 12, no. 1, pp. 79-95, 2004.
- [38] M. Takamiya, H. Inohara, and M. Mizuno, "On-Chip Jitter-Spectrum-Analyzer for High- Speed Digital Designs," *IEEE International Solid-State Circuits Conference*, pp. 350-532, 2004.
- [39] T. Yamaguchi, M. Ishida, M. Soma, K. Ichiyama, K. Christian, K. Oshawa, and M. Sugai, "A Real Time Jitter Measurement Board for High-Performance Computer and Communication Systems," *IEEE International Test Conference*, pp. 77-84, 2004.
- [40] H. Lin, K. Taylor, A. Chong, E. Chan, M. Soma, H. Haggag, J. Huard, and J. Braat, "CMOS Built-In Test Architecture for High-Speed Jitter Measurement Technique," *IEEE International Test Conference*, pp. 67-76, 2003.
- [41] K. Taylor, B. Nelson, A. Chong, H. Nguyen, H. Lin, M. Soma, H. Haggag, J. Huard, and J. Braatz, "Experimental Results for High-Speed Jitter Measurement Technique," *IEEE International Test Conference*, pp. 85-94, 2004.
- [42] M. Ishida, K. Ichiyama, T. Yamaguchi, M. Soma, M. Suda, T. Okayasu, D. Watanabe, and K. Yamamoto, "Programmable On-Chip Picosecond Jitter-Measurement Circuit without a Reference-Clock Input," *IEEE International Solid-State Circuits Conference*, pp. 512-514, 2005.
- [43] B. Analui, A. Rylyakov, S. Rylov, and A. Hajimiri, "A 10 Gb/s Eye-Opening Monitor in 0.13 μm CMOS," *IEEE International Solid-State Circuits Conference*, pp. 332-334, 2005.

- [44] A. M. Abas, A. Bystrov, D. J. Kinniment, O. V. Maevsky, G. Russell, and A. V. Yakovlev, "Time Difference Amplifier," *Electronics Letters*, vol. 38, no. 23, pp. 1437-1438, 2002.
- [45] B. Veillette and G. W. Roberts, "Stimulus Generation for Built-in-Self-Test of Charge- Pump Phase-Locked-Loops," *IEEE International Test Conference*, pp. 397-400, 1997.
- [46] V. Gutnik, "Analysis and Characterization of Random Skew and Jitter in a Novel Clock Network," Ph.D. Dissertation, *Massachusetts Institute of Technology*, USA, 2000.
- [47] V. Gutnik and A. Chandrakasan, "On-Chip Time Measurement," *IEEE Symposium on VLSI Circuits*, pp. 52-53, 2000.
- [48] P. Levine and G. W. Roberts, "A High-Resolution Flash Time-to-Digital Converter and Calibration Scheme," *IEEE International Test Conference*, pp. 1148-1157, 2004.
- [49] M. Hafed and G. W. Roberts, "A 5-Channel, Variable Resolution, 10-GHz Sampling Rate Coherent Tester/Oscilloscope IC and Associated Test Vehicles," *IEEE Custom Integrated Circuits Conference*, pp. 621-624, 2003.
- [50] S. Delmas-Bendhia, F. Caignet, E. Sicard, and M. Roca, "On-Chip Sampling in CMOS Integrated Circuits," *IEEE Transactions on Electromagnetic Compatibility*, vol. 41, no. 4, pp. 403-406, 1999.
- [51] E. Alon, V. Stojanovic, and M. Horowitz, "Circuits and Techniques for High-Resolution Measurement of On-Chip Power Supply Noise," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 4, pp. 820-828, 2005.
- [52] M. Nagata, M. Fukazama, N. Hamanishi, M. Shiochi, T. Lida, J. Watanabe, M. Murasaka, and A. Iwata, "Substrate Integrity Beyond 1 GHz," *IEEE International Solid-State Circuits Conference*, pp. 266-268, 2005.
- [53] J. Ferrario, R. Wolf, S. Moss, and M. Slamani, "A Low-Cost Test Solution for Wireless Phone RFICs," *IEEE Communications Magazine*, vol. 41, no. 9, pp. 82-88, 2003.
- [54] M. Rehani, D. Abercrombie, R. Madge, J. Teisher, and J. Saw, "ATE Data Collection – A Comprehensive Requirements Proposal to Maximize ROI of Test," *IEEE International Test Conference*, pp. 181-189, 2004.

- [55] A. Strojwas and J. Kibarian, "Design for Manufacturability in the Nanometer Era: System Implementation and Silicon Results," *IEEE International Solid-State Circuits Conference*, pp. 268-269, 2005.
- [56] M. LaPedus, "Intel's 'Casual Learning Algorithm' to Reduce IC Test Costs," *EE Times*, May 6, 2004.
- [57] M. Li, "Is 'Design to Production' the Ultimate Answer for Jitter, Noise, and BER Challenges for Multi Gb/s ICs?," *IEEE International Test Conference*, p. 1433, 2004.
- [58] M. M. Hafed and G. W. Roberts, "Techniques for High-Frequency Integrated Test and Measurement," *IEEE Transactions on Instrumentation and Measurement*, vol. 52, no. 6, pp. 1780-1786, 2003.
- [59] M. Takamiya, M. Mizuno, and K. Nakamura, "An On-Chip 100GHz-Sampling Rate 8-channel Sampling Oscilloscope with Embedded Sampling Clock Generator," *Proc. IEEE International Solid-State Circuits Conference*, vol. 1, pp. 182-458, 2002.
- [60] E. Sicard, S. Delmas, F. Caignet, R. De Smedt, T. Steinecke, and J. G. Ferrante, "A Cooperative Research for Experimental Characterization of Signal Integrity in Deep Submicron Integrated Circuits," *Proc. IEEE International Symposium on Electromagnetic Compatibility*, vol. 1, pp. 361-364, August 1999.
- [61] Y. Zheng and K. L. Shepard, "On-Chip Oscilloscope for Noninvasive Time-Domain Measurement of Waveforms in Digital Integrated Circuits," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 11, no. 3, pp. 336-344, June 2003.
- [62] K. Inagaki, D. D. Antono, M. Takamiya, S. Kumashiro, and T. Sakurai, "A 1-ps Resolution On-Chip Sampling Oscilloscope with 64:1 Tunable Sampling Range Based on Ramp Waveform Division Scheme," *Proc. IEEE Very-Large-Scale-Integration Symposium*, pp. 61-62, 2006.
- [63] C. Taillefer and G. W. Roberts, "Process-Insensitive Modulated-Clock Voltage Comparator," *Proc. IEEE International Symposium on Circuits and Systems*, pp. 3910-3913, 2006.
- [64] H. Pekau, A. Yousif, and J. W. Haslett, "A CMOS Integrated Linear Voltage-to-Pulse-Delay-Time Converter for Time Based Analog-to-Digital Converters," *Proc. IEEE International Symposium on Circuits and Systems*, pp. 2373-2376, 2006.

- [65] C. Ljuslin, J. Christiansen, A. Marchioro, and O. Klingsheim, "An Integrated 16-Channel CMOS Time to Digital Converter," *IEEE Transactions on Nuclear Science*, vol. 41, pp. 104-108, Aug. 1994.
- [66] N. Sayiner, H. V. Sorensen, and T. R. Viswanathan, "A Level-Crossing Sampling Scheme for A/D Conversion," *IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing*, vol. 43, no. 4, pp. 335-339, 1996.
- [67] E. Allier, G. Sicard, L. Fesquet, and M. Renaudin, "A New Class of Asynchronous A/D Converters Based on Time Quantization," *Proc. IEEE International Symposium on Asynchronous Circuits and Systems*, pp. 196-205, 2003.
- [68] J.-L. Huang and K.-T. Chang, "An On-Chip Short-Time Interval Measurement Technique for Testing High-Speed Communication Links," *Proc. IEEE Very-Large-Scale Integration Test Symposium*, pp. 380-385, April/May 2001.
- [69] S. Sunter, A. Roy, and J.-F. Côté, "An Automated, Complete, Structural Test Solution for SERDES," *Proc. IEEE International Test Conference*, pp. 95-104, October 2004.
- [70] S. L. Lin and S. Mourad, "On-Chip Rise Time Measurement," *IEEE Transactions on Instrumentation and Measurement*, vol. 53, no. 6, pp. 1510-1516, 2004.
- [71] M. J. W. Rodwell et al., "Active and Nonlinear Wave Propagation Devices in Ultrafast Electronics and Optoelectronics," *Proceedings of the IEEE*, vol. 82, pp. 1037-1059, July 1994.
- [72] E. Afshari and A. Hajimiri, "Non-Linear Transmission Lines for Pulse Shaping in Silicon," *Proc. IEEE Custom Integrated Circuits Conference*, pp. 91-94, 2003.
- [73] L. Callewaert, W. Eychmans, W. Sansen, E. Gerds, V. Budihartono, F. M. Newcomer, R. P. Van Berg, J. Van Der Spiegel, S. Tedja, and H. H. Williams, "Front End and Signal Processing Electronics for Detectors at High Luminosity Colliders," *IEEE Transactions on Nuclear Science*, vol. 36, pp. 446-457, 1989.
- [74] Y. Arai, T. Matsumara, and K. Endo, "A CMOS Four-Channel x 1K Time Memory LSI with 1-ns/b Resolution," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 3, pp. 359-364, 1992.

- [75] E. Raisanen-Ruotsalainen, T. Rahkonen, and J. Kostamovaara, "A Time Digitizer with Interpolation Based on Time-to-Voltage Conversion," *Proc. IEEE International Symposium on Circuits and Systems*, pp. 197-200, 1997.
- [76] K. Park and J. Park, "Time-to-Digital Converter of Very High Pulse Stretching Ratio for Digital Storage Oscilloscopes," *Review of Scientific Instruments*, vol. 70, no. 2, pp. 1568-1574, 1999.
- [77] D. M. Binkley and M. E. Casey, "Performance of Fast Monolithic ECL Voltage Comparators in Constant-Fraction Discriminators and Other Timing Circuits," *IEEE Transactions on Nuclear Science*, vol. 35, no. 1, pp. 226-230, February 1988.
- [78] H. Lim and J. Park, "Comparison of Time Correction Using Charge Amounts, Peak Values, Slew Rates, and Signal Widths in Leading-Edge Discriminators," *Review of Scientific Instruments*, vol. 74, no. 6, pp. 3115-3119, 2003.
- [79] T. Ruotsalainen, P. Palojarvi, and J. Kostamovaara, "A Wide Dynamic Range Receiver Channel for a Pulsed Time-of-Flight Laser Radar," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 8, pp. 1228-1238, 2001.
- [80] A. Kilpela, J. Ylitalo, K. Maatta, and J. Kostamovaara, "Timing Discriminator for Pulsed Time-of-Flight Laser Rangefinding Measurements," *Review of Scientific Instruments*, vol. 69, no. 5, pp. 1978-1984, 1998.
- [81] T. Paulus, "Timing Electronics and Fast Timing Methods with Scintillation Detectors," *IEEE Transactions on Nuclear Science*, vol. 32, pp. 1242-1249, June 1985.
- [82] M. J. Loinaz and B. A. Wooley, "A CMOS Multichannel IC for Pulse Timing Measurements," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 12, pp. 1339-1349, 1995.
- [83] P. Salib, "Test Core for On-Chip Metal-Oxide-Semiconductor Capacitance Measurement," Master's thesis, *McGill University*, Canada, August 2005.
- [84] B. Veillette, "On-Chip Characterization of Charge-Pump Phase-Locked Loops," Ph.D. thesis, *McGill University*, Canada, March 1998.