# Hardware Optimization Methodology of Multi-Step Look-Ahead Sigma-Delta Modulators

Nikos Temenos, Charis Basetas and Paul P. Sotiriadis Department of Electrical and Computer Engineering National Technical University of Athens, Greece E-mail: ntemenos@gmail.com, chbasetas@gmail.com, pps@ieee.org

Abstract-This work presents hardware optimization techniques for the implementation of Multi-Step Look-Ahead Sigma-Delta Modulators (MSLA SDMs). The MSLA SDM is a Sigma-Delta-based modulator that achieves a significant improvement in noise shaping characteristics compared to the classical SDM, owing to the higher-order noise transfer functions (NTFs) it can use while maintaining stability. This advantage comes at the cost of increased Digital Signal Processing (DSP) complexity due to the multiple filters it uses. First, we investigate the DSP complexity with respect to two Finite Impulse Response (FIR) filter structures, the direct and the transposed direct forms, as well as with respect to the architecture of the multi-input MSLA SDM quantizer. It is shown that the transposed direct form offers an increased output clock rate at the cost of a minor increase in hardware complexity compared to the direct form. The total complexity, however, is reduced by the complexity reduction technique introduced for the quantizer, which cuts the number of its slices in half. FPGA synthesis results for both the filter and the quantizer optimization methodologies are presented and discussed. Moreover, a comparison with the conventional SDM in terms of Signal-to-Noise-and-Distortion Ratio (SNDR) is presented to illustrate the increased capabilities of the MSLA SDM and to establish it as a viable choice in applications where single-bit output representation is required.

*Keywords*—Sigma-delta, noise shaping, single-bit quantization, look-ahead, hardware implementation, DSP

## I. INTRODUCTION

Current state-of-the-art electronics tend towards digital implementations that can be applied in a variety of frequently used systems, including digital-to-analog converters (DACs), all-digital frequency synthesizers and wireless telecommunication circuits (transmitters and transceivers) [1]–[3]. Therefore, economical, low-area and power-efficient systems are among the most essential considerations in digital design.

Evidently, in order to implement and benefit from an all-digital system, analog or mixed-signal blocks must be avoided. Some of the most important advantages of digital circuits include small chip area, immunity to process, voltage and temperature (PVT) variations, scalability, reconfigurability and faster re-design cycles compared to their analog or mixed-signal equivalents. In addition, regarding frequency synthesizers, digital implementations offer fast frequency hopping, high frequency accuracy and fine resolution.

Building on the above, single-bit output signal representation is crucial for all-digital frequency synthesizers, because it avoids multi-bit DACs [4]. Single-bit DACs are inherently linear and subject only to gain and offset errors, which can be easily corrected. However, traditional single-bit synthesizer architectures such as the pulse direct digital synthesizer (PDDS) and the Flying Adder (FA) suffer from frequency spurs, high deterministic jitter and a high noise floor when dithering is added [5].

A successful approach is to apply a single-bit Sigma-Delta Modulator (SDM) after a Direct Digital Synthesizer (DDS) to shape the quantization noise out of the desired frequency band [4]. Nevertheless, the major concern about SDMs is that their noise shaping capabilities, meaning Signal-to-Noise-and-Distortion Ratio (SNDR), output bandwidth and Spurious-Free Dynamic Range (SFDR), are limited due to stability restrictions [4]. Alternative methods for increasing the noise shaping capabilities, such as general look-ahead SDMs [6], introduce increased hardware complexity, which does not allow real-time applications to be implemented. Therefore, we propose the Multi-Step Look-Ahead SDM [7], which offers reduced hardware complexity compared to general look-ahead SDMs and increased SNDR over conventional ones. In this work, we focus on the hardware implementation of the MSLA SDM system architecture, and on the optimizations and design trade-offs that meet the criteria for optimum overall performance. A comparison with the traditional SDMs is also shown to justify the superior performance of the MSLA SDMs.

In the following section, the proposed system architecture of the MSLA SDM is briefly explained. In Section III, the hardware architecture, the optimizations considered and the FPGA synthesis results are described, and the MSLA SDM is compared with the conventional SDM. Finally, Section IV concludes the paper.

## II. MULTI-STEP LOOK-AHEAD SDM OPERATION

The MSLA SDM improves the noise shaping characteristics as well as the stability range by taking into account both current and future quantization errors, which is accomplished by minimizing a properly formulated cost function [7]. The "look-ahead" principle does not predict or calculate any future samples; instead, the input sequence is delayed by the number of look-ahead steps used. As the number of look-ahead steps increases, the noise shaping capabilities also increase, meaning higher SNDR, SFDR and bandwidth, because higher-order noise transfer functions (NTFs), i.e. NTFs with higher out-of-band gain, can be used.

In Fig. 1 the MSLA SDM system architecture is presented. The input sequence is denoted as x, while y is the single-bit output. The system is composed of r + 1 two-input IIR filters  $(L_j^0, L_j^1)$ , which produce r + 1 outputs  $u_j$ ,  $k - r \le j \le k$ , where k is the number of look-ahead steps used and r + 1 is the number of partial quantization error costs. The number of look-ahead steps is directly associated with the order of each filter  $(L_j^0, L_j^1)$ , namely j + k. The loop filter equations are:

$$L_j^0(z) = \sum_{i=0}^{j+\ell-1} c_{j,i} z^{j-i} + G(z) \sum_{i=0}^{m-1} d_{j,i} z^{-i} \tag{1}$$

$$L_j^1(z) = -\sum_{i=j+1}^{j+\ell-1} c_{j,i} z^{j-i} - G(z) \sum_{i=0}^{m-1} d_{j,i} z^{-i} \tag{2}$$

with  $k - r \le j \le k$ , and  $\ell$  and  $m$  the filter numerator and denominator orders, respectively. G(z) is a transfer function derived from the NTF and is given by:

$$G(z) = \frac{1 - NTF(z)}{NTF(z)} = \frac{\sum_{i=1}^{\ell} b_i z^{-i}}{1 + \sum_{i=1}^{m} a_i z^{-i}}, \tag{3}$$

while  $c_{j,i}$  and  $d_{j,i}$  are constant coefficients derived from G(z). The filter outputs  $u_j$  are then fed to an (r + 1)-input quantizer, which is described by a function  $f : \mathbb{R}^{r+1} \to \{\pm 1\}$  with argument  $\mathbf{u}_n = [u_{k-r,n}, u_{k-r+1,n}, \ldots, u_{k,n}]$ , where n is the discrete time index. For the mathematical analysis and the derivation of G(z) and the constant coefficients, the reader is referred to [7].
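As a small numerical illustration of (3), the following Python sketch derives the coefficients of G(z) from an NTF written as a ratio N(z)/D(z) of two polynomials in $z^{-1}$ with unit leading coefficients. The function name `loop_gain_from_ntf` and the second-order example NTF are assumptions made for illustration only; this is not the 8th-order NTF used in this work, and the coefficients $c_{j,i}$ and $d_{j,i}$ of [7] are not computed here.

```python
def loop_gain_from_ntf(ntf_num, ntf_den):
    """Return (b, a), the numerator and denominator of G(z) = (1 - NTF(z)) / NTF(z).

    ntf_num, ntf_den: coefficients of NTF(z) = N(z) / D(z) in ascending
    powers of z^-1, both with leading coefficient 1.
    Because 1 - N/D = (D - N)/D, it follows that G(z) = (D(z) - N(z)) / N(z).
    """
    if ntf_num[0] != 1 or ntf_den[0] != 1:
        raise ValueError("NTF polynomials must have a unit leading coefficient")
    n = max(len(ntf_num), len(ntf_den))
    num = list(ntf_num) + [0] * (n - len(ntf_num))   # N(z), zero-padded
    den = list(ntf_den) + [0] * (n - len(ntf_den))   # D(z), zero-padded
    b = [d - c for d, c in zip(den, num)]            # b[0] == 0: no delay-free term
    a = num                                          # a[0] == 1, as in (3)
    return b, a


# Illustrative (assumed) NTF: (1 - z^-1)^2, i.e. N(z) = 1 - 2 z^-1 + z^-2, D(z) = 1.
b, a = loop_gain_from_ntf([1, -2, 1], [1])
```

For this example the numerator of G(z) starts at the $z^{-1}$ term, which matches the summation limits in (3): the loop gain contains no delay-free path.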



Fig. 1. The MSLA SDM system diagram.

## III. THE MSLA SDM HARDWARE SYSTEM ARCHITECTURE

In this section, the parameters used for the hardware implementation are described first. These parameters are kept fixed in order to demonstrate the impact of the design methods, regarding the filters and the quantizer, on the MSLA SDM. More specifically, we used k = 3 look-ahead steps and r = 3, i.e. r + 1 = 4 partial quantization error costs, an 8th-order NTF with central frequency  $f_0 = 0.32 \cdot f_{clk}$  and an Oversampling Ratio (OSR) of 128. The NTF is designed using the Delta Sigma Toolbox [8] with the maximum possible out-of-band gain,  $||NTF||_{\infty} = 1.73$ , that allows stable operation while maintaining performance for any sinusoidal input signal with amplitude up to 0.4. In the next subsections, the design of the filters, the implementation of the quantizer and the MSLA SDM hardware architecture are described. All the designs were implemented and synthesized on a Xilinx Kintex-7 KC705 Evaluation Kit.

#### A. Filter Structure Architecture and Optimizations

As mentioned above, using k = 3 look-ahead steps results in 4 loop filters for  $0 \le j \le 3$ . The loop filters are implemented using 32-bit fixed-point arithmetic, with the RTL code generated by Simulink. The complete filter design is shown in Fig. 2. Apart from the loop filter corresponding to j = 0, each loop filter consists of 3 finite impulse response (FIR) filters with orders j,  $\ell - 2$  and  $m - 1$ . Moreover, an additional infinite impulse response (IIR) filter, e, is used to calculate the error feedback. The bitwidth used for the representation of the signals in the IIR filter must be high enough to avoid quantization errors, because these errors accumulate on every sample and may therefore lead to instability [9].

Regarding the FIR filters, 2 design structures were implemented via Simulink: the direct form and the transposed direct form. The 2 structures are shown in Fig. 3. Although at the system level there is no significant difference between these 2 structures [9], in practice a difference arises when they are translated into hardware. In the transposed form, the delay units are placed between the adders and therefore the multipliers are fed directly from the input signal [10]. In contrast, in the direct form, an extra shift register is required in order to achieve the same throughput [10]. In hardware terms, the transposed structure offers a higher output clock rate at the cost of more slices, whereas the direct form uses slightly fewer slices at a reduced clock rate. This is a DSP design trade-off and must be resolved according to the system specifications. The FPGA synthesis results for the MSLA SDM with the two FIR filter structures, shown in Table I, support these observations.
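The system-level equivalence of the two structures can be checked with a short behavioral model. The Python sketch below is only an illustration under stated assumptions: the function names and the integer coefficients are hypothetical, and the model says nothing about the clock rate or slice count, which depend on the synthesized hardware.

```python
def fir_direct(h, x):
    """Direct form: a delay line of past inputs feeds all multipliers."""
    delay_line = [0] * len(h)
    out = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]      # shift the input register
        out.append(sum(c * d for c, d in zip(h, delay_line)))
    return out


def fir_transposed(h, x):
    """Transposed direct form: delays sit between the adders, so every
    multiplier is fed directly by the current input sample."""
    state = [0] * (len(h) - 1)                       # registers between adders
    out = []
    for sample in x:
        products = [c * sample for c in h]
        out.append(products[0] + (state[0] if state else 0))
        # Each register stores its tap product plus the next register's old value.
        state = [
            products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0)
            for i in range(len(state))
        ]
    return out


h = [1, 2, 3]             # hypothetical coefficients
x = [1, 0, 0, 0, 5, -2]   # hypothetical input sequence
same = fir_direct(h, x) == fir_transposed(h, x)
```

Both functions compute the same convolution; the difference lies only in where the registers are placed, which is what changes the critical path once the structure is mapped to hardware.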

#### B. Quantizer Optimizations

In the previous section, it was mentioned that the multi-input, single-bit-output quantizer is described by a function  $f(\mathbf{u})$ . In detail,  $f(\mathbf{u})$  is a static function [7], which means that the single-bit output for each combination of the filter outputs  $u_j$  does not need to be calculated in real time, which would occupy additional DSP blocks and slices on the FPGA. Instead, the output can be pre-calculated for every possible input and stored in a Look-Up Table (LUT).
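The idea of replacing a static function by a pre-computed table can be sketched in a few lines of Python. Everything named here is an assumption for illustration: `placeholder_f` is a hypothetical stand-in and not the quantizer function of [7], and the demo uses two 3-bit inputs only to keep the table small.

```python
from itertools import product


def pack_address(codes, bits):
    """Concatenate signed two's-complement input codes into one LUT address."""
    mask = (1 << bits) - 1
    address = 0
    for code in codes:
        address = (address << bits) | (code & mask)
    return address


def build_lut(f, num_inputs, bits):
    """Pre-compute f for every combination of quantized inputs."""
    codes = range(-(1 << (bits - 1)), 1 << (bits - 1))
    lut = [0] * (1 << (num_inputs * bits))
    for combo in product(codes, repeat=num_inputs):
        lut[pack_address(combo, bits)] = f(combo)
    return lut


def placeholder_f(codes):
    """Hypothetical static single-bit function: sign of the sum of the inputs."""
    return 1 if sum(codes) >= 0 else -1


lut = build_lut(placeholder_f, num_inputs=2, bits=3)
y = lut[pack_address((-4, 3), 3)]   # run-time work is a single table read
```

With four inputs of 6 bits each (an assumed split, consistent with the 24-bit address discussed later in this subsection), `pack_address` would produce a 24-bit address, which is why reducing the address width matters for the LUT size.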

TABLE I MSLA FIR FILTER STRUCTURE HARDWARE RESOURCES

|                                | MSLA SDM with Direct form FIR filters | MSLA SDM with Transposed form FIR filters |
|--------------------------------|---------------------------------------|-------------------------------------------|
| Max. output rate [Msamples/s]  | 13.15                                 | 15.15                                     |
| Slice LUTs [Used / Util.]      | 24,637 / 12.09%                       | 25,252 / 12.39%                           |
| Slice Registers [Used / Util.] | 1,832 / 0.45%                         | 2,841 / 0.70%                             |
| F7 Muxes [Used / Util.]        | 2,346 / 2.30%                         | 2,340 / 2.30%                             |
| F8 Muxes [Used / Util.]        | 254 / 0.50%                           | 657 / 1.29%                               |
| DSP Blocks [Used / Util.]      | 278 / 33.10%                          | 268 / 31.90%                              |

Fig. 2. MSLA Loop Filter architecture

Fig. 3. FIR Filter structures: a) Direct Form b) Direct Form Transposed

Each quantizer input  $u_j$  is represented by a number of bits chosen to achieve optimum performance while maintaining stability. Extensive simulations have shown that 3-4 fractional bits are enough for the representation of each of the quantizer inputs [2], [11]. In addition, 2-3 more bits are required for the sign and the integer part of each input [2], [11]. Depending on the number of bits used for the representation of each of the inputs  $u_j$ , the LUT complexity can be reduced and optimized. The first optimization exploits the fact that the filter output  $u_0$  directly affects the output; for certain values of  $u_0$ , the output remains the same [7]. Therefore, these values are excluded from the LUTs, since they do not need to be stored.
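A minimal sketch of this first optimization is given below, assuming a two-input quantizer and a hypothetical placeholder function; neither is taken from [7]. The sketch scans every  $u_0$  code and separates those for which the output is the same for all values of the other input, which therefore need no table entries, from those that must still be stored.

```python
def split_u0_codes(f, bits):
    """Separate u0 codes that fix the output from those that need LUT entries.

    f takes a tuple (u0, u1) of signed integer codes and returns +1 or -1.
    """
    codes = range(-(1 << (bits - 1)), 1 << (bits - 1))
    fixed, stored = {}, []
    for u0 in codes:
        outputs = {f((u0, u1)) for u1 in codes}
        if len(outputs) == 1:
            fixed[u0] = outputs.pop()   # output known from u0 alone
        else:
            stored.append(u0)           # output depends on u1: keep in the LUT
    return fixed, stored


def placeholder_f(codes):
    """Hypothetical quantizer function in which u0 carries twice the weight of u1."""
    u0, u1 = codes
    return 1 if 2 * u0 + u1 >= 0 else -1


fixed, stored = split_u0_codes(placeholder_f, bits=3)
```

In this toy case only 3 of the 8  $u_0$  codes remain in the table; the corresponding reduction reported for the actual design is given in the following paragraphs.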

Fig. 4. MSLA LUT architecture

The second optimization originates from the odd symmetry of the function  $f(\mathbf{u})$  [7]. This means that, for a given set of values of the filter outputs  $u_j$ , i.e.  $\mathbf{u}_{j,n} = [u_{k-r,n}, \operatorname{sgn}(u_{k-r+1,n}), \operatorname{sgn}(u_{k-r+2,n}), \operatorname{sgn}(u_{k-r+3,n})]$ , the opposite-sign values  $-\mathbf{u}_{j,n} = [-u_{k-r,n}, -\operatorname{sgn}(u_{k-r+1,n}), -\operatorname{sgn}(u_{k-r+2,n}), -\operatorname{sgn}(u_{k-r+3,n})]$  produce the output  $-\operatorname{sgn}(y)$ . Thus, the LUT size can be halved, which reduces the overall number of slices required for the implementation.
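The halving can be illustrated with the following Python sketch. It rests on assumptions that are not taken from the paper: a two-input placeholder function, and a mid-rise code grid in which code c represents the value c + 1/2, so that negating a value is a bitwise inversion of its two's-complement code and no input maps to exactly zero.

```python
from itertools import product

BITS = 3  # hypothetical width of each input code
CODES = range(-(1 << (BITS - 1)), 1 << (BITS - 1))


def value(code):
    """Mid-rise grid, in half-steps: code c stands for 2c + 1, which is never zero."""
    return 2 * code + 1


def placeholder_f(codes):
    """Hypothetical odd function: sign of a weighted sum that cannot be zero."""
    s = 2 * value(codes[0]) + value(codes[1])
    return 1 if s > 0 else -1


full = {c: placeholder_f(c) for c in product(CODES, repeat=2)}

# Store only the entries whose first input is non-negative (sign bit cleared).
half = {c: y for c, y in full.items() if c[0] >= 0}


def lookup(codes):
    """Read the half-size table; mirror the input and negate the output if needed."""
    if codes[0] >= 0:
        return half[codes]
    mirrored = tuple(~c for c in codes)   # ~c == -c - 1, i.e. value(~c) == -value(c)
    return -half[mirrored]


matches = all(lookup(c) == full[c] for c in full)
```

The sign test on the first input and the final negation play the role of the selection logic placed around the reduced table; the storage itself holds half as many entries as the full table.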

From a technical perspective, the 4 filter outputs combined result in a 24-bit input address, which is reduced to 17 bits after applying the first optimization. In our original design, all these values were stored in 38 sub-LUTs in total, due to synthesis constraints that limit the number of combinations that can be stored in each sub-LUT. The LUT partitioning scheme is shown in Fig. 4. However, by exploiting the odd symmetry of the quantizer, the number of sub-LUTs is reduced to 19; depending on the  $u_0$  value range, an AND gate and a multiplexer select the  $-\operatorname{sgn}(y)$  output accordingly. The proposed optimized architecture is depicted in Fig. 5. Table II shows the FPGA synthesis results, with reduced utilization for the 19 sub-LUT design compared to the original setup with 38 sub-LUTs.

#### C. Performance Comparison of the MSLA and the Conventional SDM

As mentioned in the previous sections, the MSLA SDM improves the noise shaping characteristics compared to the conventional SDM. The increased performance originates from the increased hardware complexity, which means that the MSLA SDM utilizes on average more FPGA resources than the conventional SDM and thus needs more area to be implemented. In order to justify the increased overall performance of the MSLA SDM, we also synthesized a conventional one with the same parameters, meaning the same OSR, NTF and central frequency. Regarding their performance, the MSLA SDM exhibits almost 6 dB higher dynamic range, which translates into an SNDR of 131 dB compared to the 120 dB of the conventional one. In Fig. 6 the output spectra obtained from the FPGA implementation are depicted.

TABLE II

|                                | Original MSLA SDM with 38 sub-LUTs | Proposed MSLA SDM with 19 sub-LUTs |
|--------------------------------|------------------------------------|------------------------------------|
| Max. output rate [Msamples/s]  | 13.15                              | 13.67                              |
| Slice LUTs [Used / Util.]      | 24,637 / 12.09%                    | 19,221 / 9.43%                     |
| Slice Registers [Used / Util.] | 1,832 / 0.45%                      | 1,504 / 0.37%                      |
| F7 Muxes [Used / Util.]        | 2,346 / 2.30%                      | 1,674 / 1.64%                      |
| F8 Muxes [Used / Util.]        | 254 / 0.50%                        | 539 / 1.06%                        |
| DSP Blocks [Used / Util.]      | 278 / 33.10%                       | 278 / 33.10%                       |

Fig. 5. MSLA LUT optimized architecture

## IV. CONCLUSION

In this work, the principle of operation of the MSLA SDM was briefly discussed and its hardware implementation was presented. It was shown that digital signal processing techniques can increase the output clock rate at the cost of increased overall hardware utilization on the FPGA. In addition, quantizer optimization techniques further reduce the hardware required for the implementation of the MSLA SDM. Finally, the superior performance of the MSLA SDM compared to the conventional one shows that it can be a viable choice in applications where single-bit quantization with higher SNDR is required.

Fig. 6. SNDR values and output spectrum of the MSLA SDM vs a Conventional SDM

## REFERENCES

- [1] A. S. Kamath and B. Chattopadhyay, "A wide output range, mismatch tolerant sigma delta DAC for digital PLL in 90nm CMOS," in *IEEE Int. Symp. on Circ. and Systems (ISCAS)*, May 2012, pp. 69–72.
- [2] C. Basetas, P. P. Sotiriadis, and N. Temenos, "Wide-band frequency synthesis using hardware-efficient band-pass single-bit multi-step look-ahead sigma-delta modulators," in *IEEE Int. Freq. Control Symp. & Europ. Freq. and Time Forum*, Besançon, France, 2017.
- [3] C. Basetas, P. P. Sotiriadis, and N. Temenos, "Frequency synthesis using low-pass single-bit multi-step look-ahead sigma-delta modulators in quadrature upconversion scheme," in *IEEE Int. Freq. Control Symp. & Europ. Freq. and Time Forum*, Besançon, France, 2017.
- [4] R. Schreier and G. C. Temes, *Understanding Delta-Sigma Data Converters*, S. V. Kartalopoulos, Ed. John Wiley & Sons, Inc., 2005.
- [5] P. P. Sotiriadis and K. Galanopoulos, "Direct all-digital frequency synthesis techniques, spurs suppression, and deterministic jitter correction," *IEEE Trans. Circuits Syst. I*, vol. 59, no. 5, pp. 958–968, May 2012.
- [6] E. Janssen and A. van Roermund, *Look-Ahead Based Sigma-Delta Modulation*. Springer, 2011.
- [7] C. Basetas, T. Orfanos, and P. P. Sotiriadis, "A class of 1-bit multi-step look-ahead Σ-Δ modulators," *IEEE Trans. Circuits Syst. I*, vol. 64, no. 1, pp. 24–37, Jan. 2017.
- [8] R. Schreier. (2011, Dec.) Delta sigma toolbox. [Online]. Available: http://www.mathworks.com/matlabcentral/fileexchange/19-delta-sigma-toolbox
- [9] A. V. Oppenheim and R. W. Schafer, *Discrete-Time Signal Processing*, 2nd ed. Prentice-Hall, 1999.
- [10] U. Meyer-Baese, *Digital Signal Processing with Field Programmable Gate Arrays*, 4th ed. Springer, 2014.
- [11] N. Temenos, C. Basetas, and P. P. Sotiriadis, "Noise shaping advantages of band-pass multi-step look-ahead sigma-delta modulators over conventional ones in signal synthesis," in *IEEE 4th Panhellenic Conference on Electronics and Telecommunications*, Xanthi, Greece, 2017.