# An Area-Efficient, Analog Integrated Image Edge Detector based on the Robert's Cross Operator

Georgios Gennis, Argyro Kamperi, Vassilis Alimisis, Christos Dimas and Paul P. Sotiriadis

Department of Electrical and Computer Engineering

National Technical University of Athens, Greece

E-mail: giorgosyennis@gmail.com, argykaberi@gmail.com, alimisisv@gmail.com, chdim@central.ntua.gr, pps@ieee.org

Abstract—Edge detection is a useful tool in many Computer Vision applications. A prime example is locating the spatial edges of an image in order to separate and identify the objects it contains. Like most Computer Vision tools, edge detection is computationally demanding, but the whole procedure can be highly parallelized. Therefore, application-specific hardware implementations offer multiple benefits compared to traditional central processing unit approaches. To this end, this work presents an implementation of a hardware-friendly variation of the Robert's Cross operator which, along with the utilized building blocks, achieves high area and power efficiency. The edge detector was evaluated on 3 images from different applications, achieving a mean Structural Similarity Index Metric of 0.75 while requiring $956\mu m^2$ per pixel. The presented edge detector was designed and simulated in a TSMC 90nm CMOS process, using the Cadence IC Suite.

Index Terms—Analog integrated, edge detector, Gaussian function circuits, low-power architecture, Robert's Cross operator

## I. INTRODUCTION

Computer Vision (CV) aims to enable machines to perceive their surrounding environments in a manner similar to human perception [1], [2]. While hearing or other types of sensing are also included, the most developed part of CV is related to human vision. However, the real-time automatic extraction, analysis and processing of the huge amount of data involved in even the simplest CV tasks requires very high computational performance. So far, engines that partially meet this demand include Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) [3]. Nonetheless, in battery-dependent devices that require CV, such as autonomous vehicles or mobile robotic systems [4], ASICs (and especially analog ASICs) are superior due to their power and area efficiency [5]. To this end, a low-power and area-efficient, voltage-mode version of the analog integrated image edge detector proposed in our previous work [6] is presented here.

Unlike in [6], the new edge detector cell focuses on area efficiency, so a larger edge detector array can be integrated directly next to a photodiode array without increasing the chip area to an impractical size. Also, the voltage-mode circuit that replaces the previous current-mode one greatly improves the sensitivity of the detector while requiring only a fraction of the previous power consumption. Both architectures are compared in terms of performance and the similarity of their results to those of a Robert's Cross Operator (RCO) implemented in software [7]. The presented analog edge detector consumes 14nW per pixel, with a pixel size of $956\mu m^2$, achieving an average Structural Similarity Index Metric (SSIM) of 0.75 [8].

In the literature, apart from [6], only a few other works on analog integrated image edge detection exist. Among these works, the one presented in [9] achieves the best power management along with a very small area per pixel. Other designs that also focus on the system's efficiency implement either current-mode [10] or voltage-mode [11]–[13] convolution filters. On the other hand, [14] and [15] focus on implementing a more accurate algorithm, namely the Sobel operator for edge detection [16], at the cost of area and power consumption. Finally, a non-traditional approach is taken in [17], where a morphological edge detector that utilizes the erosion and dilation operators is presented.

The rest of this work is organized as follows. The background regarding the edge detector and our previous work is summarized in Section II. The proposed implementation is discussed in Section III. In Section IV, the simulation results are presented and compared to our previous work. Finally, concluding remarks are provided in Section V.

## II. BACKGROUND

The RCO detects regions with high spatial frequency in the diagonal direction, which is similar to human perception [7]. Assuming an image with an $N \times M$ resolution, $x_{i,j}$ denotes the light intensity of pixel $(i, j)$ (in a grayscale image), for each $i < N$, $j < M$. In this case, the RCO approximates the image's gradient (spatial frequency) $z_{i,j}$, as shown here:

$$z_{i,j} = \sqrt{\left(\sqrt{x_{i,j}} - \sqrt{x_{i+1,j+1}}\right)^2 + \left(\sqrt{x_{i+1,j}} - \sqrt{x_{i,j+1}}\right)^2}. \tag{1}$$
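As a point of reference, equation (1) can be sketched in a few lines of software. This is a minimal illustration, not the software implementation used for the comparisons in this paper: the function names, the list-of-rows image format and the `threshold` value are assumptions made for this example.

```python
import math

def roberts_cross(image):
    """Evaluate equation (1) on a grayscale image given as a list of rows.

    The result has (N-1) x (M-1) entries, because every output value
    needs the 2x2 neighbourhood to the lower right of pixel (i, j).
    """
    n, m = len(image), len(image[0])
    # Square root of every light intensity, as in equation (1).
    y = [[math.sqrt(v) for v in row] for row in image]
    z = []
    for i in range(n - 1):
        row = []
        for j in range(m - 1):
            d1 = y[i][j] - y[i + 1][j + 1]   # main-diagonal difference
            d2 = y[i + 1][j] - y[i][j + 1]   # anti-diagonal difference
            row.append(math.hypot(d1, d2))   # sqrt(d1^2 + d2^2)
        z.append(row)
    return z

def binarize(z, threshold):
    """Map the gradient to edge (1) / non-edge (0); threshold is hypothetical."""
    return [[1 if v > threshold else 0 for v in row] for row in z]

# A vertical step from intensity 0 to 16 is flagged only at the step.
step = [[0, 0, 16, 16] for _ in range(3)]
edges = binarize(roberts_cross(step), threshold=1.0)
```

For the step image above, only the middle column of the output is marked as an edge, since both diagonal differences vanish wherever the 2x2 neighbourhood is uniform.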

However, these calculations require complex circuits to be implemented in analog hardware. Hence, in [6] a hardware-friendly version of the RCO is presented, which exploits the benefits of Bump circuits (a type of Gaussian function generation circuit) [18] in implementing an analog integrated image edge detector. In this algorithm, the RCO's formula, equation (1), is transformed to:

$$\hat{z}_{i,j} = 2\pi \cdot \sigma^2 \cdot \mathcal{N}(y_{i,j} \mid y_{i+1,j+1}, \sigma^2) \cdot \mathcal{N}(y_{i+1,j} \mid y_{i,j+1}, \sigma^2), \tag{2}$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ is the univariate Gaussian function and is given by:

$$\mathcal{N}(x \mid \mu,\sigma^2) = \frac{1}{\sqrt{2\pi\cdot\sigma^2}} \ e^{-\frac{1}{2}\cdot\frac{(x-\mu)^2}{\sigma^2}}. \tag{3}$$

Here, $\mu$ and $\sigma^2$ denote the mean value and the variance of the Gaussian function, respectively. Finally, a simple threshold circuit can be used to derive a binary output (edge/non-edge).
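The behaviour of equations (2) and (3) can be checked numerically with the short sketch below. It is only a behavioural illustration: the function names, the example input values and the threshold polarity are assumptions. Because the prefactor $2\pi\sigma^2$ cancels the two Gaussian normalization terms, the product reduces to a single exponential of the summed squared diagonal differences, so the value is 1 for a uniform neighbourhood and decays towards 0 across an edge.

```python
import math

def gaussian(x, mu, sigma):
    """Univariate Gaussian of equation (3); sigma is the standard deviation."""
    norm = math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm

def z_hat(y00, y01, y10, y11, sigma):
    """Equation (2) for the 2x2 neighbourhood [[y00, y01], [y10, y11]]."""
    return (2.0 * math.pi * sigma ** 2
            * gaussian(y00, y11, sigma)    # pair (i, j) and (i+1, j+1)
            * gaussian(y10, y01, sigma))   # pair (i+1, j) and (i, j+1)

def z_hat_closed_form(y00, y01, y10, y11, sigma):
    """Same quantity after cancelling the normalization terms."""
    d2 = (y00 - y11) ** 2 + (y10 - y01) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def is_edge(value, threshold):
    """In this model a LOW z_hat means dissimilar diagonals, i.e. an edge."""
    return value < threshold

flat = z_hat(0.2, 0.2, 0.2, 0.2, sigma=0.05)   # uniform neighbourhood
step = z_hat(0.0, 0.2, 0.0, 0.2, sigma=0.05)   # vertical step
```

With these illustrative numbers, `flat` evaluates to 1 and `step` is many orders of magnitude smaller, so any threshold between the two separates edge from non-edge.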

In [6], the basic building block is a Bump circuit that implements equation (2). It is composed of two neuron cells and a symmetrical current correlator biased by a cascode current mirror [18]. Unlike a typical Bump circuit [18], where one of the differential pair's voltage inputs acts as a constant parameter, there, based on (2), both $I_{in1}$ and $I_{in2}$ are in fact inputs to the circuit. This circuit can produce a high-quality and controllable Gaussian curve. However, as investigated in this work, such increased performance is not necessary for the RCO to produce significant results.

## III. PROPOSED ANALOG EDGE DETECTOR

[Schematic of Fig. 1 omitted; its two blocks are labeled "Edge Detector" and "Threshold Circuit".]

Fig. 1: The analog implementation of the RCO is based on a 2-D cascaded Bump circuit. Due to the external bias $V_{bias}$, the drain voltage of $M_{p5}$ is in a semi-digital format (close to either the positive or the negative supply voltage). The quality of this digital format is further improved through the use of a simple digital inverter.

In this work, we replicate the architecture introduced in [6], utilizing a far more power- and area-efficient Bump circuit [19] and replacing the previously implemented threshold circuit with a much simpler design. It should be noted that all transistors presented in the following schematics operate in the sub-threshold domain, with power supply rails set to $V_{DD} = -V_{SS} = 0.3V$.

### A. Edge Detector Architecture

In this work, we utilize Delbruck's Bump circuit [19] depicted in Fig. 1. It is composed of a differential pair and a simple current correlator biased by a simple current mirror. The differential pair produces two drain currents $I_1$ and $I_2$, similarly to the two neurons in [6], [18]. Given these sigmoidal currents, the output current produced by the correlator resembles a Gaussian curve. The bias current $I_{bias}$ controls the height of the Gaussian curve. In this topology, the variance of the Gaussian curve can only be altered by changing the sizes of the transistors; however, control over the Gaussian function's variance is not necessary for the RCO.



Fig. 2: Layout of the simplified RCO cell.

In equation (2), the RCO's result is calculated as a product of two Gaussian functions. Bump circuits are selected because they can efficiently perform multiplication without additional components. Since two Bump circuits are involved here, if we bias the second Bump circuit with the first Bump circuit's output current, its output current equals the product of their individual Gaussian curves [20]. In this configuration, only the first Bump circuit is biased with a specified external bias current ($I_{bias}$). Unlike in [6], in order to reduce the circuit's footprint, the current mirrors are removed from the second Bump circuit and its design is replaced with a PMOS-based one, as shown in Fig. 1.
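The multiplication-by-cascading idea can be illustrated with a first-order behavioural model. The sketch below assumes the commonly quoted subthreshold expression for a simple bump circuit, $I_{out} = I_b / (1 + \frac{4}{S}\cosh^2(\kappa \Delta V / 2U_T))$, which is not taken from this paper. The values of `KAPPA`, `S` and `U_T`, as well as the function names, are illustrative assumptions and do not correspond to the transistor sizes of Table I.

```python
import math

U_T = 0.0258    # thermal voltage near 300 K, in volts (assumed)
KAPPA = 0.7     # subthreshold slope factor (illustrative)
S = 4.0         # transistor strength ratio of the correlator (illustrative)

def bump(i_bias, dv, s=S, kappa=KAPPA):
    """Bell-shaped output current of one bump cell for an input difference dv."""
    return i_bias / (1.0 + (4.0 / s) * math.cosh(kappa * dv / (2.0 * U_T)) ** 2)

def cascaded(i_bias, dv1, dv2):
    """Second cell biased with the first cell's output current."""
    return bump(bump(i_bias, dv1), dv2)

# The cascade equals the bias current times both normalized bell curves.
i_bias = 1e-9
out = cascaded(i_bias, 0.02, 0.05)
product = i_bias * bump(1.0, 0.02) * bump(1.0, 0.05)
```

Because each cell's output is proportional to its bias current in this model, feeding the first output into the second cell yields the product of the two bell-shaped responses without an explicit multiplier, which is the property the cascaded architecture relies on.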

The threshold circuit is based on the drain voltage of transistor $M_{p5}$. The external bias voltage $V_{bias}$ ensures that this drain voltage is close to either the positive or the negative supply voltage, due to the difference in the currents entering this node. Therefore, the voltage of this node is effectively in a digital format. However, the quality of this digital output is not ideal, and a simple digital inverter is used to improve it. It should be noted that $V_{bias}$ can be easily generated for all the threshold circuits using a single current mirror and the existing $I_{bias}$. Additionally, by changing the bulk voltage of transistor $M_{p5}$, one can control the threshold value of the circuit, hence locating more or fewer edges depending on the application. Finally, all transistors' dimensions are summarized in Table I.

### B. System-Level Architecture

In [6], several system-level architectures were proposed, offering different trade-offs between computation speed and area and power efficiency. However, the architectures that achieved high frame-per-second (FPS) values, which is a

TABLE I: MOS Transistors' Dimensions (Fig. 1).

| Transistors         | W/L $(\mu m/\mu m)$ | Transistors      | W/L $(\mu m/\mu m)$ |
|---------------------|---------------------|------------------|---------------------|
| $M_{n1}, M_{n2}$    | 0.4/0.4             | $M_{n5}, M_{n6}$ | 0.4/5.2             |
| $M_{n3}$            | 0.4/1.6             | $M_{n7}, M_{n8}$ | 1.6/5.2             |
| $M_{n4}$            | 1.2/1.6             | $M_{n9}$         | 1.2/0.4             |
| $M_{p1}$ - $M_{p8}$ | 0.4/1.6             | -                | -                   |



Fig. 3: Conceptual system-level architecture, where a multi-cell analog edge detector is shifted on the entire image.

main purpose of an ASIC-based edge detector, were impractical in terms of chip area. In this work, the RCO cell's area is reduced by more than half, as shown by the layout presented in Fig. 2, and the threshold circuit is greatly reduced. To minimize fabrication mismatch effects and for various manufacturing considerations, the layout shown in Fig. 2 is designed using the common-centroid technique, and therefore extra dummy transistors are used. In a system-level implementation, these dummy transistors could be reduced in number or replaced with active ones. Additionally, the gaps visible in this layout can be filled by integrating the photodiodes and/or multiple cells next to each other, achieving an even greater reduction in total chip area. Therefore, architectures like the one presented in Fig. 3 can include tens of RCO cells before reaching an impractical size.
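The trade-off behind the shifted multi-cell architecture can be made concrete with a rough estimate. The sketch below is a simplified model under stated assumptions: the tile shape and the function name are hypothetical, the one-pixel overlap that neighbouring tiles would need for the 2x2 operator is ignored, and the area counts only the RCO cells at the reported $956\mu m^2$ per pixel (no photodiodes, routing or readout).

```python
import math

CELL_AREA_UM2 = 956.0   # layout area per pixel reported for this work

def scan_plan(rows, cols, tile_rows, tile_cols):
    """Return (shift steps per frame, cell-array area in mm^2).

    A tile of tile_rows x tile_cols RCO cells is shifted over a
    rows x cols image; tile overlap is ignored in this estimate.
    """
    steps = math.ceil(rows / tile_rows) * math.ceil(cols / tile_cols)
    area_mm2 = tile_rows * tile_cols * CELL_AREA_UM2 * 1e-6
    return steps, area_mm2

# An 8 x 8 tile (64 cells) swept over the 356 x 533 "Road" image.
steps, area = scan_plan(356, 533, 8, 8)
```

For this example the tile needs 3015 shift positions per frame and the 64 cells occupy about 0.061 mm²; enlarging the tile reduces the number of steps at a proportional cost in area.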

## IV. SIMULATION RESULTS

In this Section, two analog integrated implementations (this work and our previous related one [6]) are compared with a software implementation of the RCO on various images. Both analog architectures were designed and simulated in a TSMC 90nm CMOS process, using the Cadence IC suite.

To account for different applications, the comparison is carried out over 3 images concerning human skin detection, road navigation and satellite image processing. The binary images produced by all three implementations are shown in Fig. 4. Similarly to our previous work, 3 figures of merit are used to highlight the benefits of the proposed design: the Structural Similarity Index Metric (SSIM), the layout area per pixel (LAP) and the power consumption per pixel (PCP). Table II summarizes these results, excluding the LAP and PCP metrics, which are independent of the selected image and are included in Table III. The quality of the images generated by the analog circuits can also be assessed visually by inspecting the areas that human perception would identify as "edges". It is evident that, despite the proposed simplifications to the RCO cell, both designs capture the anticipated areas of interest with similar accuracy.
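For readers unfamiliar with the metric, the sketch below evaluates the standard SSIM expression over a single window that spans the whole image. It is only an illustration: the exact SSIM variant, window size and constants used for Table II are not stated here, practical implementations usually average the index over local windows, and the function name and the constants 0.01 and 0.03 are the customary defaults assumed for this example.

```python
def global_ssim(a, b, dynamic_range=1.0):
    """Single-window SSIM between two equally sized images (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n                 # mean intensities
    vx = sum((v - mx) ** 2 for v in xs) / n           # variances
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    c1 = (0.01 * dynamic_range) ** 2                  # stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# Binary edge maps: identical maps score 1, inverted maps score below 0.
reference = [[0, 1], [1, 0]]
inverted = [[1, 0], [0, 1]]
same = global_ssim(reference, reference)
different = global_ssim(reference, inverted)
```

An index close to 1 therefore indicates that an analog edge map has nearly the same structure as the software reference.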

TABLE II: Performance Summary for Analog Edge Detectors

| Image     | Design    | Resolution       | SSIM | Total Power Consumption |
|-----------|-----------|------------------|------|-------------------------|
| Satellite | [6]       | $544 \times 593$ | 0.76 | 10.9mW                  |
| Satellite | This work | $544 \times 593$ | 0.61 | 4.5mW                   |
| Road      | [6]       | $356 \times 533$ | 0.90 | 6.4mW                   |
| Road      | This work | $356 \times 533$ | 0.82 | 2.7mW                   |
| Moles     | [6]       | $239 \times 450$ | 0.86 | 3.7mW                   |
| Moles     | This work | $239 \times 450$ | 0.81 | 1.5mW                   |
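The "This work" rows of Table II can be cross-checked with simple arithmetic, assuming that the total power is the per-pixel consumption of 14nW multiplied by the number of pixels (one RCO cell per pixel). The sketch below also recomputes the mean SSIM; the variable and function names are chosen for this example only.

```python
PCP_THIS_WORK_NW = 14.0   # power consumption per pixel, in nW

# Resolution and SSIM of the proposed design for each test image.
images = {
    "Satellite": {"resolution": (544, 593), "ssim": 0.61},
    "Road": {"resolution": (356, 533), "ssim": 0.82},
    "Moles": {"resolution": (239, 450), "ssim": 0.81},
}

def total_power_mw(rows, cols, pcp_nw):
    """Total power of a fully parallel array, converted from nW to mW."""
    return rows * cols * pcp_nw * 1e-6

for name, data in images.items():
    rows, cols = data["resolution"]
    print(f"{name}: {total_power_mw(rows, cols, PCP_THIS_WORK_NW):.1f} mW")

mean_ssim = sum(d["ssim"] for d in images.values()) / len(images)
print(f"mean SSIM: {mean_ssim:.2f}")
```

Under this assumption the script prints 4.5 mW, 2.7 mW and 1.5 mW for the three images and a mean SSIM of 0.75, matching the values reported for the proposed design.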

Finally, a Monte-Carlo analysis with N = 200 points is conducted to test the circuit's sensitivity to PVT variations. In particular, the quantity under test is the circuit's threshold boundary, which translates to the voltage difference between two diagonal pixels. The mean value of this difference under PVT variations is $\mu = 50$mV with a standard deviation of $\sigma = 5.5$mV. As expected, this work is a lot less sensitive to PVT variations than our previous one [6].

Since the aim of this work is to minimize the LAP and PCP metrics (while maintaining a high quality product), a comparison between this work and other analog edge detectors that exist in the literature is provided in Table III. Unfortunately, the quality of the produced image cannot be assessed fairly. The proposed work outperforms the rest in terms of PCP and in maximum possible FPS when a fully parallel architecture is employed. However, despite the reduction in chip area, there are still other works that require an even smaller footprint.

TABLE III: Performance Summary for Analog Edge Detectors

| Ref.      | Technology   | Supply Voltage    | PCP          | LAP            | FPS  |
|-----------|--------------|-------------------|--------------|----------------|------|
| This work | 90nm         | 0.6V              | 14nW         | $956 \mu m^2$  | 100K |
| [6]       | 90nm         | 0.6V              | 33nW         | $2392 \mu m^2$ | 100K |
| [17]      | $0.5 \mu m$  | 1.8V              | 0.368mW      | $8600 \mu m^2$ | N/A  |
| [14]      | 150nm        | 1.8V              | $790 \mu W$  | $140 \mu m^2$  | 75   |
| [15]      | $0.35 \mu m$ | 3.3V              | $26.8 \mu W$ | $1125 \mu m^2$ | 2000 |
| [10]      | $0.6 \mu m$  | 1.8V              | $3.6 \mu W$  | $100 \mu m^2$  | 50   |
| [11]      | $0.35 \mu m$ | 3.3V              | $5.8 \mu W$  | $1125 \mu m^2$ | N/A  |
| [12]      | 250nm        | N/A               | $1.2 \mu W$  | $633 \mu m^2$  | N/A  |
| [9]       | 180nm        | 1.8V              | $0.9 \mu W$  | $225 \mu m^2$  | 1300 |
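The relative improvements implied by Table III follow from simple ratios. The sketch below uses only the PCP and LAP values listed in the table; the dictionary layout and the helper name are assumptions made for this example.

```python
# PCP in nW and LAP in um^2, copied from Table III.
this_work = {"pcp_nw": 14.0, "lap_um2": 956.0}
previous = {"pcp_nw": 33.0, "lap_um2": 2392.0}    # reference [6]
ref_9 = {"pcp_nw": 900.0, "lap_um2": 225.0}       # reference [9], 0.9 uW

def reduction_percent(new, old):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (1.0 - new / old)

area_saving = reduction_percent(this_work["lap_um2"], previous["lap_um2"])
power_saving = reduction_percent(this_work["pcp_nw"], previous["pcp_nw"])
pcp_ratio_vs_9 = ref_9["pcp_nw"] / this_work["pcp_nw"]
lap_ratio_vs_9 = this_work["lap_um2"] / ref_9["lap_um2"]
```

Relative to [6], the cell area drops by about 60% and the per-pixel power by about 58%. Compared with [9], the proposed cell consumes roughly 64 times less power per pixel but occupies about 4.2 times more area.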

## V. CONCLUSION

An analog edge detector was presented in this work, achieving an area per pixel that allows for highly parallel architectures which can process even high definition images. In a fully parallel configuration the proposed edge detector can process images at rates as high as 100 K FPS, consuming only 14nW per pixel. This was mainly achieved due to the utilized voltage-mode and area-efficient Bump circuit. Because



Fig. 4: Images provided by: Left: the software-based edge detector. Center: the analog circuit-based edge detector presented in [6]. Right: the analog circuit-based edge detector proposed here.

of various test constraints, 3 medium-resolution images that involve different Computer Vision applications were used to evaluate the proposed architecture. Simulations conducted in a TSMC 90nm CMOS process confirm the quality of the produced "edge" images. In conclusion, the presented architecture is a prime candidate for a pre-processing block in many CV-related systems that require edge detection.

## REFERENCES

- [1] R. Klette, Concise computer vision. Springer, 2014, vol. 233.
- [2] R. Szeliski, Computer vision: algorithms and applications. Springer Nature, 2022.
- [3] X. Feng, Y. Jiang, X. Yang, M. Du, and X. Li, "Computer vision algorithms and hardware implementations: A survey," *Integration*, vol. 69, pp. 309–320, 2019.
- [4] R. Zecca, D. L. Marks, and D. R. Smith, "Symphotic design of an edge detector for autonomous navigation," *IEEE Access*, vol. 7, pp. 144836– 144844, 2019.
- [5] D. Moolchandani, A. Kumar, and S. R. Sarangi, "Accelerating cnn inference on asics: A survey," *Journal of Systems Architecture*, vol. 113, p. 101887, 2021.
- [6] G. Gennis, V. Alimisis, C. Dimas, and P. P. Sotiriadis, "A general purpose, low power, analog integrated image edge detector, based on a current-mode gaussian function circuit," *Analog Integrated Circuits and Signal Processing*, pp. 1–12, 2022.
- [7] L. Roberts, "Machine perception of 3-d solids, optical and electro-optical information processing," 1965.
- [8] A. Rehman and Z. Wang, "Reduced-reference image quality assessment by structural similarity estimation," *IEEE transactions on image processing*, vol. 21, no. 8, pp. 3378–3389, 2012.
- [9] M. Nam and K. Cho, "Implementation of real-time image edge detector based on a bump circuit and active pixels in a cmos image sensor," *Integration*, vol. 60, pp. 56–62, 2018.

- [10] R. Njuguna and V. Gruev, "Low power programmable current mode computational imaging sensor," *IEEE Sensors Journal*, vol. 12, no. 4, pp. 727–736, 2011.
- [11] N. Massari, M. Gottardi, L. Gonzo, D. Stoppa, and A. Simoni, "A cmos image sensor with programmable pixel-level analog processing," *IEEE Transactions on Neural Networks*, vol. 16, no. 6, pp. 1673–1684, 2005.
- [12] J.-H. Kim, J.-S. Kong, S.-H. Suh, M. Lee, J.-K. Shin, H. B. Park, and C. A. Choi, "A low power analog cmos vision chip for edge detection using electronic switches," *ETRI journal*, vol. 27, no. 5, pp. 539–544, 2005.
- [13] L. Dron, "The multiscale veto model: A two-stage analog network for edge detection and image reconstruction," *International Journal of Computer Vision*, vol. 11, no. 1, pp. 45–61, 1993.
- [14] C. Soell, L. Shi, J. Roeber, M. Reichenbach, R. Weigel, and A. Hagelauer, "Low-power analog smart camera sensor for edge detection," in *2016 IEEE international conference on image processing (ICIP)*. IEEE, 2016, pp. 4408–4412.
- [15] J. Dubois, D. Ginhac, M. Paindavoine, and B. Heyrman, "A 10 000 fps cmos sensor with massively parallel image processing," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 3, pp. 706–717, 2008.
- [16] O. R. Vincent, O. Folorunso *et al.*, "A descriptive algorithm for sobel image edge detection," in *Proceedings of informing science & IT education conference (InSITE)*, vol. 40, 2009, pp. 97–107.
- [17] L. A. S. Gaspariano and A. D. Sánchez, "Analog cmos morphological edge detector for gray-scale images."
- [18] V. Alimisis, M. Gourdouparis, G. Gennis, C. Dimas, and P. P. Sotiriadis, "Analog gaussian function circuit: Architectures, operating principles and applications," *Electronics*, vol. 10, no. 20, p. 2530, 2021.
- [19] T. Delbrueck and C. Mead, "Bump circuits," in *Proceedings of International Joint Conference on Neural Networks*, vol. 1, 1993, pp. 475–479.
- [20] V. Alimisis, M. Gourdouparis, C. Dimas, and P. P. Sotiriadis, "A 0.6 v, 3.3 nw, adjustable gaussian circuit for tunable kernel functions," in *2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)*. IEEE, 2021, pp. 1–6.