

## A 0.2-to-2GHz Time-Interleaved Multi-Stage Switched-Capacitor Delay Element Achieving 448.6ns Delay with 175.9x Programmable Range and 330ns/mm<sup>2</sup> Area Efficiency

Full-duplex (FD) wireless has attracted significant interest because the transmitter (TX) and receiver (RX) operate at the same frequency and time, increasing spectral efficiency [1-3]. Deployment requires large (>100dB) cancellation of the TX signal that arrives in-band at the RX input, both through limited TX-RX isolation and through radar-like reflections from nearby objects. Recent efforts have used FIR time-domain-equalizer-based self-interference cancellation (SIC) techniques, which combine RF delay elements with gain weighting [1-3]. While >100ns of delay is desired to cover a full FD delay spread (a 50ft reflection), RF delay has been limited to <8ns [1-6] (a 4ft reflection), so TX reflections from close objects can saturate the RX before additional cancellation is applied in the baseband. For example, the radar equation shows that a 27dBm TX at 900MHz reflects >1dBm from a single car back to the RX at 4ft (8ns), >-27dBm at 20ft (40ns), and >-42dBm at 50ft (100ns). Acoustic delay lines achieve >100ns RF delay, but with narrow bandwidth and no delay programmability, which limits them to large radar and communication systems. This paper introduces a time-interleaved multi-stage switched-capacitor (TIMS-SC) delay element which, at a sample rate of $F_s=3.3\text{GHz}$, achieves (i) up to 448.6ns delay (58x compared to [1] and 264x compared to [5]) across a 0.2-to-2GHz bandwidth, (ii) 330ns/mm<sup>2</sup> area efficiency (9x compared to [1] and 42x compared to [6]), and (iii) a 175.9x programmable delay range.
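The delay values quoted for each reflection distance follow from the round-trip path length. The sketch below is a minimal illustration, assuming free-space propagation and that a "d ft reflection" means an object d feet from co-located TX/RX antennas; the helper names are hypothetical. It also shows how many samples a sampled-data delay must hold for the 448.6ns maximum at $F_s=3.3\text{GHz}$.

```python
# Round-trip delay of a TX reflection off an object d feet away
# (assumption: free-space propagation, co-located TX/RX antennas).
C_M_PER_S = 299_792_458.0   # speed of light in vacuum, m/s
M_PER_FT = 0.3048           # international foot, m

def reflection_delay_ns(distance_ft: float) -> float:
    """Round-trip TX -> object -> RX delay in nanoseconds."""
    return 2.0 * distance_ft * M_PER_FT / C_M_PER_S * 1e9

def samples_needed(delay_ns: float, fs_hz: float) -> int:
    """Number of samples a sampled-data delay must hold to realize delay_ns."""
    return round(delay_ns * 1e-9 * fs_hz)

for d_ft in (4, 20, 50):
    t_ns = reflection_delay_ns(d_ft)
    print(f"{d_ft:>2} ft: {t_ns:6.1f} ns round trip")
print(samples_needed(448.6, 3.3e9), "samples for 448.6 ns at Fs = 3.3 GHz")
```

The 4ft, 20ft, and 50ft cases land near 8ns, 40ns, and 100ns, matching the figures in the text.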

Prior delay elements use delay-line (e.g., [4]), gm-C filter [6], and switched-capacitor (SC) [3] techniques (Fig. 1). Gm-C filters achieved a >20x increase in area efficiency over delay lines [6] but are limited to <2ns delay [5]. SC delays achieved another >5x increase in area efficiency but only <8ns RF delay [1], limited by the complexity of RF clock generation, the capacitive loading of an increasing number of SC cells, and sample leakage through large sampling switches. The proposed technique time-interleaves multiple stages of SC circuits to achieve >100ns delay by (i) reducing sample leakage through settling-time expansion in stage 2, (ii) reducing RF loading, and (iii) simplifying RF clock generation.

Fig. 2 shows the functional and timing diagram of the proposed TIMS-SC approach, drawn single-ended for clarity; the implementation is differential (Fig. 3). Achieving almost 450ns of RF delay at $F_s=3.3\text{GHz}$ requires storing >1480 RF samples with low leakage, which makes a direct SC delay implementation impractical. The delay element is therefore constructed in three stages: (Stage 1) an 8-phase SC network sampling at the full sample rate $F_s$; (Stage 2) a 186-capacitor storage element operating at $F_s/8$ after each stage 1 sampler, enabling long-term storage of RF samples; and (Stage 3) an 8-phase recombining stage operating at $F_s$. In this implementation, a buffer is inserted between stages to prevent gain loss from charge sharing; a passive implementation is possible, trading lower power consumption for gain loss. The RF input is sampled sequentially onto 8 capacitors using 8-phase non-overlapping clocks $P_0$-$P_7$. Each stage 1 sample is transferred to 1 of 186 storage capacitors in the associated stage 2 sub-block, for $8 \times 186$ capacitors in total in stage 2. While the stage 1 settling time is $1/F_s$, stage 2 expands the settling time by letting the transfer from stage 1 to stage 2 continue during the stage 1 hold time (clock $PI_{xy}$, where $x$ is the stage 1 path and $y$ is the stage 2 capacitor). With this expanded sampling time, the sampler bandwidth required in stage 2 is greatly reduced, allowing much smaller stage 2 sampling switches (8x smaller in this work) and a large reduction in OFF-state sample leakage. The leakage reduction enables a proportional increase in the maximum achievable hold time, which is key to achieving >100ns of RF delay. To reduce timing-skew sensitivity, the stage 2 input clock $PI_{xy}$ transitions before the stage 1 sample clock $P_x$, so that the stage 2 input is static during clock transitions (e.g., $PI_{10}$ before $P_1$). After the programmed delay, a stage 2 output clock $PO_{xy}$ initiates the transfer of the sample to the input of the associated stage 3 buffer, again with time expansion. The stage 3 buffers output the delayed RF signal using the same 8-phase clock timing as stage 1 ($P_x$).
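The settling-time expansion and leakage argument for stage 2 can be made explicit with a first-order model. This is a sketch under assumptions the text does not state: single-pole RC settling to a fixed number of time constants $n_\tau$, switch ON resistance $R_{on}\propto 1/W$ and OFF-state leakage $I_{leak}\propto W$ for switch width $W$, a fixed tolerable droop $\Delta V$ on the storage capacitor $C$, and a transfer window that grows from about $1/F_s$ (direct, "dir") toward about $8/F_s$ (expanded, "exp") with 8-way interleaving:

$$
\frac{1}{F_s} = n_\tau R_{on,\text{dir}}\,C, \qquad \frac{8}{F_s} \approx n_\tau R_{on,\text{exp}}\,C \;\Rightarrow\; R_{on,\text{exp}} \approx 8R_{on,\text{dir}}, \quad W_{\text{exp}} \approx \frac{W_{\text{dir}}}{8}
$$

$$
\Delta V = \frac{I_{leak}\,t_{hold}}{C} \;\Rightarrow\; t_{hold,\max} = \frac{C\,\Delta V}{I_{leak}} \propto \frac{1}{W} \;\Rightarrow\; t_{hold,\max,\text{exp}} \approx 8\,t_{hold,\max,\text{dir}}
$$

Under these assumptions, the reported 8x switch reduction is consistent with a proportional (about 8x) increase in the maximum hold time.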

Timing skew is again mitigated by transitioning the stage 2 output clock $PO_{xy}$ after the stage 3 output clock $P_x$. The input and output clocks in stage 2 are generated by two separate but synchronous divide-by-186 circuits (Fig. 3). The RF delay is programmed by delaying the enable timing (Fig. 3) of the stage 2 output clock $PO_{x0}$ relative to the associated input clock $PI_{x0}$. The RF delay can be programmed from $8/F_s$ to $1480/F_s$ in $8/F_s$ steps and scales with the sample frequency. This high configurability and broadband delay provide flexibility in the operating frequency and bandwidth across the $F_s/2$ alias intervals.
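The three-stage addressing can be captured in an idealized behavioral model. This is a sketch under stated assumptions (ideal switches, no leakage or settling error, unity-gain buffers), not the chip's implementation; the function names and the integer delay code `k` (1 to 185, giving a delay of $8k/F_s$) are hypothetical. It shows that with 8 interleaved paths and 186 capacitors per path, reading the capacitor written `k` path-cycles earlier on the same phase yields exactly an $8k$-sample delay.

```python
# Idealized behavioral model of TIMS-SC storage addressing
# (assumptions: ideal switches, no leakage/settling error, unity-gain buffers).
N_PATHS = 8     # stage 1 / stage 3 interleaving factor
N_CAPS = 186    # stage 2 storage capacitors per path

def tims_sc_delay(x, k):
    """Delay sample stream x by 8*k samples (k is a hypothetical delay code).

    Sample n is taken by stage 1 path p = n % 8 and written to stage 2
    capacitor (n // 8) % 186 of that path. Stage 3 path p reads, on the same
    phase, the capacitor written k path-cycles earlier, i.e. sample n - 8*k.
    """
    if not 1 <= k <= N_CAPS - 1:
        raise ValueError("delay code k must be in 1..185")
    caps = [[0.0] * N_CAPS for _ in range(N_PATHS)]
    y = []
    for n, sample in enumerate(x):
        p, cycle = n % N_PATHS, n // N_PATHS
        y.append(caps[p][(cycle - k) % N_CAPS])  # stage 2 -> stage 3 readout
        caps[p][cycle % N_CAPS] = sample         # stage 1 -> stage 2 write
    return y

def nominal_delay_s(k, fs_hz):
    """Programmed delay 8*k/Fs, excluding any fixed buffer or routing delay."""
    return N_PATHS * k / fs_hz
```

For example, `tims_sc_delay(x, 185)` returns `x` delayed by 1480 samples, and `nominal_delay_s(185, 3.3e9)` is about 448.5ns. Limiting `k` to 185 keeps each capacitor from being overwritten before it is read.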

Fig. 3 shows the block and circuit diagram of the proposed delay element. An inductorless LNA provides gain and single-ended-to-differential conversion, similar to [7] but with NMOS bias-sharing modifications that reduce the number of AC coupling capacitors for area savings ($0.0016\text{mm}^2$ area). An input buffer with a push-pull output stage isolates the LNA output from the SC circuits. Each SC circuit employs a differential $250\text{fF}$ capacitor, sized for small area and low sampling noise. The stage 1 buffer employs an NMOS common-source amplifier with a diode-connected load for unity gain, and the 8 paths are placed close together in layout for matching, limiting signal distortion induced by gain mismatch. The stage 2 buffer employs a dynamic inverter clocked at both VSS and VDD by $\text{PO}_{xy}$; in each of the 8 delay paths, 1 of the 186 inverters is enabled at a time, and all 186 share a self-biased inverter load for common-mode stability. Stage 3 buffers are placed close together in layout for matching and employ dynamic common-source amplifiers with a shared resistive load. An output buffer provides balun and matching functions, employing a common-source amplifier and a push-pull output stage. Clocking is provided through a divide-by-2 ($F_{\text{clk}}=2F_s$), and the 8-phase sample clocks are generated by two synchronous divide-by-8 circuits for low timing skew at stages 1 and 3. The 8-phase clocks are pulse-extended to 50% duty cycle to drive the divide-by-186 circuits placed inside each stage 2 delay area, for compatibility with standard logic implementation.
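The clocking chain and capacitor sizing imply a few derived quantities. This is a minimal sketch, assuming a 300K temperature for the kT/C estimate and treating a single 250fF capacitor (the text does not say whether the differential 250fF is per side); the helper names are hypothetical. It computes the input clock, the per-phase clock rate, and how often each stage 2 capacitor is revisited by its divide-by-186 selector.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)
N_PATHS = 8         # 8-phase interleaving (from the text)
N_CAPS = 186        # divide-by-186 stage 2 capacitor select (from the text)

def clock_plan(fs_hz):
    """Clock rates implied by the clocking description for sample rate fs_hz."""
    f_phase = fs_hz / N_PATHS          # rate of each 8-phase clock P_x
    return {
        "f_clk": 2.0 * fs_hz,          # divide-by-2 input: F_clk = 2*Fs
        "f_phase": f_phase,
        # each stage 2 capacitor is selected once per 186 cycles of its P_x
        "f_cap": f_phase / N_CAPS,
    }

def ktc_noise_vrms(c_farads, temp_k=300.0):
    """rms kT/C sampling noise voltage on a single capacitor."""
    return math.sqrt(K_B * temp_k / c_farads)

plan = clock_plan(3.3e9)
print(plan, round(1e9 / plan["f_cap"], 1), "ns per capacitor revisit")
print(round(ktc_noise_vrms(250e-15) * 1e6, 1), "uV rms on one 250 fF capacitor")
```

At $F_s=3.3\text{GHz}$ each capacitor is revisited every $1488/F_s \approx 450.9$ns, consistent with the longest programmed delay being $1480/F_s$ (185 steps), one step short of a full revisit.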

The delay element was implemented in a 45nm SOI CMOS process with a $4\text{mm}^2$ chip area and a $1.36\text{mm}^2$ active area (Fig. 7). The RFIC was packaged in a $5\times 5\text{mm}^2$ QFN for testing. The sample frequency was set to $F_s=3.3\text{GHz}$ ($F_{\text{clk}}=6.6\text{GHz}$) for full characterization, although the device was found to operate both beyond $F_s=4.4\text{GHz}$ ($F_{\text{clk}}=8.8\text{GHz}$) and below $F_s=3.3\text{GHz}$, providing system flexibility in clock frequency, delay range, and frequency coverage. Delay performance was verified across all delay settings at $F_{\text{RFIN}}=1\text{GHz}$, and at the minimum and maximum delay settings across RF input frequency (Fig. 4). The maximum measured delay was 448.6ns and the minimum 2.55ns, and the delay slope matched the expected $2.42\text{ns/step}$ ($8/F_s$), giving a 175.9x delay range. Delay DNL/INL was $<\pm 4\text{ps}$ across all delay codes. The delay response was relatively flat across 0.2-to-2GHz at both the minimum and maximum delay settings, with $<0.12\%$ delay variation at maximum delay.
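The text does not state how DNL/INL were computed; a common choice is an endpoint fit, sketched below under that assumption. The sweep is synthetic, not measured data: 185 codes starting at the reported 2.55ns minimum, ideal $8/F_s$ steps, and a +3ps error injected at one code. The helper name is hypothetical.

```python
def delay_dnl_inl(delays_s):
    """Endpoint-fit DNL and INL (in seconds) of a delay-vs-code sweep.

    delays_s[i] is the delay measured at the i-th consecutive delay code.
    The ideal step is taken as the endpoint average (last - first) / (N - 1).
    """
    n = len(delays_s)
    if n < 2:
        raise ValueError("need at least two delay codes")
    step = (delays_s[-1] - delays_s[0]) / (n - 1)
    dnl = [delays_s[i + 1] - delays_s[i] - step for i in range(n - 1)]
    inl = [delays_s[i] - (delays_s[0] + i * step) for i in range(n)]
    return step, dnl, inl

# Synthetic sweep (NOT measured data): 185 codes, 2.55 ns at the first code,
# ideal 8/Fs steps at Fs = 3.3 GHz, and a +3 ps error injected at index 50.
FS = 3.3e9
sweep = [2.55e-9 + i * 8 / FS for i in range(185)]
sweep[50] += 3e-12
step, dnl, inl = delay_dnl_inl(sweep)
print(f"step = {step * 1e9:.3f} ns, range = {sweep[-1] / sweep[0]:.1f}x")
print(f"max |DNL| = {max(map(abs, dnl)) * 1e12:.2f} ps, "
      f"max |INL| = {max(map(abs, inl)) * 1e12:.2f} ps")
```

With a 2.55ns first code and 184 ideal steps, the last code lands near 448.6ns, so the range ratio reproduces the reported 175.9x.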

Fig. 5 shows the gain, NF, S11, and power breakdown. At the maximum delay setting, the RFIC achieved 24dB gain, 1.1GHz 3dB bandwidth, and 7.1dB NF, while $<0.1\text{dB}$ gain/NF change was observed at the minimum delay setting, showing successful mitigation of sample leakage. Gain flatness within any 100MHz bandwidth was $<\pm 0.5\text{dB}$ across 0.2-to-2GHz. Bandwidth and flatness can be further improved by increasing $F_s$ to reduce the sinc roll-off of the zero-order-hold operation. IP1dB was -27/-25dBm at 1/2GHz, dominated by the LNA and output buffer. S11/S22 of the RF input/output were <-10dB from 0.2-to-3GHz, and S11 of the clock input, including an open-stub board match, was <-10dB from 3.5-to-9GHz. The RFIC consumed 80mW from a 1V supply.
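The sinc roll-off mentioned above can be estimated from the ideal zero-order-hold response. This is a minimal sketch, assuming each output path presents its held sample for exactly one period $1/F_s$; it models only the hold term, not the LNA or buffer responses, so it does not reproduce the measured 3dB bandwidth. The helper name is hypothetical.

```python
import math

def zoh_rolloff_db(f_hz, fs_hz):
    """Magnitude of an ideal zero-order hold of duration 1/fs, in dB.

    |H(f)| = |sin(pi f / fs) / (pi f / fs)|, normalized to 0 dB at DC.
    """
    x = math.pi * f_hz / fs_hz
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))

for fs in (3.3e9, 4.4e9):
    losses = [round(zoh_rolloff_db(f, fs), 2) for f in (0.2e9, 1.0e9, 2.0e9)]
    print(f"Fs = {fs / 1e9:.1f} GHz: ZOH loss at 0.2/1/2 GHz = {losses} dB")
```

Under this model the hold term alone costs about 6dB at 2GHz for $F_s=3.3\text{GHz}$, versus about 3.2dB at $F_s=4.4\text{GHz}$, which illustrates why a higher $F_s$ improves bandwidth and flatness.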

Fig. 6 compares this work with prior state-of-the-art RF delay elements. This work is the first to achieve >8ns programmable delay at GHz frequencies, with 58x higher maximum delay, 9x higher area efficiency, and 4.5x wider delay range than the best prior work in the comparison, while maintaining comparable gain, NF, linearity, and power consumption.
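The improvement factors quoted in the text can be reproduced from the comparison table. This is a minimal sketch with values copied from Fig. 6; the dictionary layout and helper names are hypothetical. It also checks that each "delay per unit area" entry equals maximum delay divided by active area.

```python
# Metrics copied from the comparison table in Fig. 6.
THIS_WORK = {"max_delay_ns": 448.6, "area_mm2": 1.36, "ns_per_mm2": 330.0, "range_x": 175.9}
PRIOR = {
    "[1]": {"max_delay_ns": 7.75, "area_mm2": 0.21, "ns_per_mm2": 37.0, "range_x": 31.0},
    "[6]": {"max_delay_ns": 0.55, "area_mm2": 0.07, "ns_per_mm2": 7.9, "range_x": 39.3},
    "[5]": {"max_delay_ns": 1.7, "area_mm2": 0.29, "ns_per_mm2": 5.9, "range_x": 6.8},
    "[4]": {"max_delay_ns": 0.8, "area_mm2": 2.25, "ns_per_mm2": 0.4, "range_x": 32.0},
}

def ratio_vs(ref, metric):
    """Improvement factor of this work over one prior work for a metric."""
    return THIS_WORK[metric] / PRIOR[ref][metric]

def ratio_vs_best(metric):
    """(reference, factor) against the best prior value of a metric."""
    best = max(PRIOR, key=lambda ref: PRIOR[ref][metric])
    return best, ratio_vs(best, metric)

# Sanity check: delay per unit area should equal max delay / active area.
for name, row in [("This work", THIS_WORK), *PRIOR.items()]:
    print(name, round(row["max_delay_ns"] / row["area_mm2"], 1), "vs", row["ns_per_mm2"])

for metric in ("max_delay_ns", "ns_per_mm2", "range_x"):
    print(metric, ratio_vs_best(metric))
```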

### References:

- [1] A. Nagulu et al., "Full-Duplex Receiver with Wideband Multi-Domain FIR Cancellation Based on Stacked-Capacitor N-Path Switched-Capacitor Delay Lines Achieving >54dB SIC Across 80MHz BW and >15dBm TX Power-Handling," ISSCC, pp. 100-102, Feb. 2021.
- [2] K. Chu et al., "A Broadband and Deep-TX Self-Interference Cancellation Technique for Full-Duplex Cancellation Over 42MHz Bandwidth," ISSCC, pp. 314-315, Feb. 2018.
- [3] A. Nagulu et al., "A Full-Duplex Receiver Leveraging Multiphase Switched-Capacitor-Delay Based Multi-Domain FIR Filter Cancelers," IEEE RFIC, pp. 43-46, Aug. 2020.
- [4] M. Li et al., "An 800-ps Origami True-Time-Delay-Based CMOS Receiver Front End for 6.5-9-GHz Phased Arrays," IEEE SSC-L, vol. 3, pp. 382-385, Sep. 2020.
- [5] I. Mondal et al., "A 2-GHz Bandwidth, 0.25-1.7ns True-Time-Delay Element Using a Variable-Order All-Pass Filter Architecture in 0.13$\mu$m CMOS," IEEE JSSC, vol. 52, no. 8, pp. 2180-2193, Aug. 2017.
- [6] S. Garakoui et al., "A 1-to-2.5GHz Phased-Array IC Based on gm-RC All-Pass Time-Delay Cells," ISSCC, pp. 80-82, Feb. 2012.
- [7] P. Mak et al., "A 0.46-mm<sup>2</sup> 4-dB NF Unified Receiver Front-End for Full-Band Mobile TV in 65-nm CMOS," ISSCC, pp. 172-173, Feb. 2011.



**Figure 1: Delay element techniques including gm-C all-pass filter, switched-capacitor, and the proposed time-interleaved multi-stage switched-capacitor approach.**



**Figure 2: Functional and timing diagram of proposed approach showing switched-capacitor operation and leakage reduction.**



**Figure 3: Block and circuit diagram of the 0.2-to-2GHz time-interleaved multi-stage switched-capacitor delay element.**



**Figure 4: Measured performance ($F_s=3.3\text{GHz}$): delay across delay code at 1GHz, delay DNL/INL at 1GHz, and maximum/minimum delay across frequency.**



**Figure 5: Measured performance of the delay element ( $F_s=3.3\text{GHz}$ ): gain, noise figure, matching performance, and power breakdown.**



|                              | This Work                  | ISSCC 2021 [1]             | ISSCC 2012 [6]        | JSSC 2017 [5]         | SSCL 2020 [4]              |
|------------------------------|----------------------------|----------------------------|-----------------------|-----------------------|----------------------------|
| <b>Design</b>                | Delay Element              | SIC Receiver               | 4 Channel Beamformer  | Delay Element         | Delay Element + Attenuator |
| <b>Architecture</b>          | TI-MS Switched-Cap         | Switched-Cap               | Gm-C                  | Gm-C                  | Delay Line                 |
| <b>Delay Frequency Range</b> | 0.2GHz-2GHz                | 0.1GHz-1GHz                | 1GHz-2.5GHz           | 0.1GHz-2GHz           | 6.5GHz-9GHz                |
| <b>3dB Bandwidth</b>         | 0.2GHz-1.1GHz <sup>a</sup> | 0.1GHz-0.5GHz <sup>b</sup> | 1GHz-2.5GHz           | 0.1GHz-2GHz           | 6.5GHz-9GHz                |
| <b>Max Delay</b>             | 448.6ns <sup>a</sup>       | 7.75ns <sup>b</sup>        | 0.55ns                | 1.7ns                 | 0.8ns                      |
| <b>Delay per Unit Area</b>   | 330ns/mm<sup>2</sup> <sup>a</sup> | 37ns/mm<sup>2</sup> <sup>b</sup> | 7.9ns/mm<sup>2</sup> | 5.9ns/mm<sup>2</sup> | 0.4ns/mm<sup>2</sup> |
| <b>Delay Range</b>           | 175.9x                     | 31x <sup>b</sup>           | 39.3x <sup>c</sup>    | 6.8x                  | 32x <sup>c</sup>           |
| <b>Gain</b>                  | 24dB                       | -19dB <sup>b</sup>         | 12dB                  | 0.6dB                 | 18dB                       |
| <b>Noise Figure</b>          | 7.1dB                      | -                          | 8dB                   | 23dB                  | 3.6dB                      |
| <b>IP1dB</b>                 | -27dBm                     | -                          | -21dBm                | -13dBm                | -17dBm                     |
| <b>Power</b>                 | 80mW <sup>a</sup>          | 7.4mW <sup>b</sup>         | 90mW <sup>d</sup>     | 364mW                 | 107mW                      |
| <b>Technology</b>            | 45nm SOI CMOS              | 65nm CMOS                  | 140nm CMOS            | 130nm CMOS            | 65nm CMOS                  |
| <b>Delay Active Area</b>     | 1.36mm<sup>2</sup>         | 0.21mm<sup>2</sup> <sup>b</sup> | 0.07mm<sup>2</sup> | 0.29mm<sup>2</sup> | 2.25mm<sup>2</sup> |

<sup>a</sup>$F_s=3.3\text{GHz}$. <sup>b</sup>Max RF delay element. <sup>c</sup>Based on delay step. <sup>d</sup>Single channel.

**Figure 6: Maximum delay achieved vs area efficiency scatter plot of prior RF delay elements and comparison table.**



**Figure 7: Die micrograph.**



**Figure S1: Device test board and measurement setup showing the LabVIEW programming interface for delay settings, 3 bias currents provided by source meters, DC power supplies, and the 8340B signal generator used for clock generation in all tests. Cable and board-trace loss and delay to the packaged device were calibrated out using an on-board through line equal in length to the signal-path board trace. The N5230C vector network analyzer was used to measure S11/S22 and is one of two methods used to verify delay and gain. The N8975A noise figure analyzer was used to measure device noise figure.**



**Figure S2:** A second delay measurement method used the DSOS804A oscilloscope to capture input and output waveforms, which were then processed in Matlab using cross-correlation to determine delay. The RF input was applied using the pulse function of the E8267D signal generator at the test operating frequency. To calibrate the cable and power-divider delay in the test setup, the cables at the chip test ports were connected to the through calibration ports at the bottom of the test board.
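The correlation-based delay extraction can be sketched in Python as an analogue of the Matlab processing. This assumes equal-length, uniformly sampled input/output records with the output lagging the input, and integer-sample resolution only; the function names, the 20GS/s rate, and the synthetic pulsed carrier are hypothetical. The setup delay measured through the calibration ports is then subtracted.

```python
import math

def xcorr_delay(x, y, fs_hz):
    """Estimate the delay of y relative to x from the cross-correlation peak.

    Only non-negative lags are searched (output lags input); the result has
    one-sample resolution. Returns (lag_samples, delay_seconds).
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    best_lag, best_val = 0, float("-inf")
    for lag in range(n):
        # Correlation of x with y advanced by `lag` samples.
        val = sum(x[i] * y[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_lag / fs_hz

def calibrated_delay(dut_delay_s, through_delay_s):
    """DUT delay with the cable/power-divider (through) delay removed."""
    return dut_delay_s - through_delay_s

# Synthetic pulsed carrier (not measured data), delayed by 37 samples.
N = 400
x = [math.sin(2 * math.pi * 0.05 * n) if 50 <= n < 150 else 0.0 for n in range(N)]
y = [0.0] * 37 + [0.5 * v for v in x[:N - 37]]
print(xcorr_delay(x, y, 20e9))
```

Using a pulsed input keeps the correlation peak unique; with a continuous sinusoid, lags separated by one carrier period would correlate almost equally well.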



**Figure S3:** This test setup was used for the second gain measurement method and for linearity measurements, using the N9010A spectrum analyzer and the signal generators listed in Figures S1-S2.