

# An ultra low power analog integrated radial basis function classifier for smart IoT systems

Vassilis Alimisis<sup>1</sup> · Georgios Gennis<sup>1</sup> · Christos Dimas<sup>1</sup> · Marios Gourdouparis<sup>1</sup> · Paul P. Sotiriadis<sup>1</sup>

Received: 27 November 2021 / Revised: 16 February 2022 / Accepted: 13 April 2022 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

#### Abstract

A low-power, integrated, fully analog classification system, appropriate for battery-dependent operation, is proposed. It comprises an analog feature-extraction block and an ultra-low power (112–520 nW), area-efficient analog radial basis function classifier, and can be directly connected to an analog sensor, avoiding power-costly data conversion. The classifier consists of a proposed bump circuit and the Lazzaro Winner-take-all circuit. It is evaluated using a real-world dataset, achieving 87.6% classification accuracy, which is only 1.4% less than that of the theoretical software-based model. The classifier was designed and post-layout simulated in a TSMC 90 nm CMOS process in the Cadence IC Suite.

**Keywords** Analog integrated implementation  $\cdot$  Bump circuits  $\cdot$  Classification system  $\cdot$  Low-power design  $\cdot$  Radial basis functions

#### 1 Introduction

The Internet of Things (IoT) is a system of interrelated devices connected to the internet to transfer and receive data over a wireless network without human intervention [1–3]. A typical IoT system contains multiple sensors to sense different physical variables. In recent years, significant improvements in sensor manufacturing technology have led to sensor miniaturization and significant cost reduction, thus allowing a tighter integration with an IoT device [4]. Besides multiple sensors, an IoT system consists of an analog front-end (amplifiers, filters and converters), a digital back-end (digital processor, memory, digital transmission and others) and sometimes a central storage system (data center).

 Vassilis Alimisis alimisisv@gmail.com
 Georgios Gennis giorgosyennis@gmail.com
 Christos Dimas chdim@central.ntua.gr
 Marios Gourdouparis mariosgourd97@gmail.com
 Paul P. Sotiriadis pps@ieee.org
 Department of Electrical and Computer Engineering, National Technical University of Athens, 15780 Zografou, Athens, Greece

The continuing progress in technology leads to new crucial requirements for a smart sensor IoT system. These requirements are autonomy, area efficiency and smartness [5]. Autonomous systems rely on low power consumption to mitigate the effects of a limited power supply or of batteries that cannot be recharged during operation. This means that they should operate autonomously for long periods of time (long duration of the stored energy). Area efficiency is achieved through the continuing progress in Integrated Circuit (IC) technologies [4]. This allows more complex computations within the same chip area. As a result, the implemented systems become smarter and more efficient (increased sensor integration and complex computations are achieved).

To meet these requirements, IoT is combined with Machine Learning (ML) algorithms and new computation techniques [6, 7]. In particular, ML algorithms and tools offer an easy and efficient solution to process data from multiple sources (for example sensors). In this way, real-time monitoring of raw data can easily extract the useful information and prevent unnecessary data transfer. This improves the response times and saves the usually limited bandwidth of many different applications.



Fig. 1 Classification system comparison.  $\mathbf{a}$  All-digital inference.  $\mathbf{b}$  Analog feature extraction with a digital classifier.  $\mathbf{c}$  Proposed classification system, combining Analog feature extraction with an Analog classifier, eliminating most digital circuitry

By employing edge computing techniques and directly integrating ML algorithms on sensor systems, the aforementioned benefits are greatly increased, because data no longer need to be transferred to data centers [8–10]. This promising alternative increases processing speed by bringing the computation units closer to the data source.

However, existing IoT implementations rely on power-hungry digital accelerators, which are usually limited by remote power constraints [11]. Typical all-digital inference systems, shown in Fig. 1(a), consume from several  $\mu$W to a couple of mW [12, 13]. A new candidate capable of further reducing the power consumption is analog computing [14, 15]. By incorporating application-specific mathematical approaches and sub-threshold region techniques [16], analog integrated circuits can provide high-speed, high-accuracy processing with nano-Watt power consumption. In classification-based applications, there is a trend in which the digital feature extraction (FE) blocks are replaced with analog ones, as shown in Fig. 1(b) [15, 17–21]. In these works, however, the classifier is still implemented using digital engines.

To this end, in this work we propose an ultra-low power classification system that takes advantage of analog feature extraction along with an analog classifier, to minimize the system's power consumption [22]. Such pure analog architectures completely eliminate the need for a digital back-end and hence for Analog-to-Digital converters (ADCs). The whole classification system is illustrated in Fig. 1(c). This article extends the authors' previous work [23], which presented a 0.6 V, 3.3 nW, adjustable Gaussian circuit for tunable kernel functions. Specifically, it updates the related work, refines the Gaussian circuit and applies the proposed circuit to an ultra-low power (520 nW) analog integrated Radial Basis Function (RBF)-based classifier tested on a real-world application [24].

The remainder of this paper is organized as follows. In Sect. 2 the necessary background for RBFs is provided. The proposed architecture along with the utilized building blocks are presented in Sect. 3. The proper operation of the proposed classifier is tested on a real-world classification dataset and the results are provided in Sect. 4. A performance summary of existing analog and mixed-mode classifiers is discussed in Sect. 5. Section 6 concludes the article.

#### 2 Radial basis functions

RBFs are real and positive-valued functions that depend on the distance between a fixed point and an input vector [25, 26]. The closer the input is to the fixed point, the higher the value of the RBF. Examples of such functions are the Gaussian, the multiquadric, the polyharmonic spline and others [25]. RBFs are commonly used for mathematical approximations, for interpolations, as activation functions in Neural Networks (NN) or as kernels in ML algorithms [27]. A multivariate Gaussian RBF (GRBF) with a diagonal covariance matrix (mutually uncorrelated dimensions) is given by:

$$\begin{aligned}
\mathcal{N}(X \mid M, \Sigma) &= \frac{1}{\sqrt{(2\pi)^N \cdot \det(\Sigma)}} e^{-\frac{1}{2}(X-M)^T \Sigma^{-1}(X-M)} \\
&= \prod_{n=1}^N \frac{1}{\sqrt{2\pi \cdot \sigma_n^2}} e^{-\frac{1}{2}(x_n - \mu_n)^2 / \sigma_n^2},
\end{aligned}$$
(1)

where *X* is an *N*-dimensional (*N*-D) input vector, *M* and  $\Sigma$  are the mean vector and the diagonal covariance matrix, respectively, and  $x_n$ ,  $\mu_n$  and  $\sigma_n^2$  are the *n*-th entry of the input vector *X*, the *n*-th entry of the mean vector *M* and the *n*-th diagonal entry of the covariance matrix  $\Sigma$ , respectively.
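The factorization in (1) can be checked numerically. The following is a minimal sketch, assuming a diagonal covariance matrix so that $\det(\Sigma) = \prod_n \sigma_n^2$; the function names and the 3-D example values are hypothetical and are not taken from the paper.

```python
import math

def grbf_matrix_form(x, mu, sigma):
    """First line of Eq. (1), specialised to a diagonal covariance matrix."""
    n = len(x)
    det = math.prod(s ** 2 for s in sigma)  # determinant of a diagonal matrix
    quad = sum((xi - mi) ** 2 / s ** 2 for xi, mi, s in zip(x, mu, sigma))
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** n * det)

def grbf_product_form(x, mu, sigma):
    """Second line of Eq. (1): product of N univariate Gaussians."""
    return math.prod(
        math.exp(-0.5 * (xi - mi) ** 2 / s ** 2) / math.sqrt(2.0 * math.pi * s ** 2)
        for xi, mi, s in zip(x, mu, sigma)
    )

# Hypothetical 3-D example (values chosen only for illustration)
x = [0.10, -0.05, 0.20]
mu = [0.00, 0.00, 0.10]
sigma = [0.10, 0.20, 0.15]

value_matrix = grbf_matrix_form(x, mu, sigma)
value_product = grbf_product_form(x, mu, sigma)
print(value_matrix, value_product)  # the two forms agree up to rounding
```

The product form is the one that maps naturally onto hardware, since each factor depends on a single input dimension.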

For a GRBF network (GRBFN)-based classifier [28], in a classification problem with  $N_{cla}$  classes, the distance  $d_c(X)$  of the input X from the fixed template corresponding to a class c is calculated as:

$$d_c(X) = 1 - \mathcal{N}(X \mid M_c, \Sigma_c).$$
(2)

Here,  $M_c$  and  $\Sigma_c$  are the fixed template (mean value) and the covariance matrix of the GRBF belonging to class c, respectively. The output of the classifier is the class with the smallest distance from the input vector:

$$y = \underset{c \in \{1,...,N_{cla}\}}{\operatorname{argmin}} \{ d_c(X) \} = \underset{c \in \{1,...,N_{cla}\}}{\operatorname{argmin}} \{ 1 - \mathcal{N}(X \mid M_c, \Sigma_c) \}.$$
(3)

Equation (3) can be simplified to:

$$y = \underset{c \in \{1, \dots, N_{cla}\}}{\operatorname{argmax}} \{ \mathcal{N}(X \mid M_c, \Sigma_c) \}.$$
(4)
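The equivalence between (3) and (4) can be illustrated in software. This is a minimal sketch under the assumption of ideal Gaussian class templates; the helper names and the 2-feature, 3-class template values are hypothetical.

```python
import math

def grbf(x, mu, sigma):
    """Diagonal-covariance Gaussian RBF, product form of Eq. (1)."""
    return math.prod(
        math.exp(-0.5 * ((xi - mi) / s) ** 2) / math.sqrt(2.0 * math.pi * s ** 2)
        for xi, mi, s in zip(x, mu, sigma)
    )

def classify_argmin(x, templates):
    """Eq. (3): class with the smallest distance d_c(X) = 1 - N(X | M_c, Sigma_c)."""
    distances = [1.0 - grbf(x, mu, sigma) for mu, sigma in templates]
    return min(range(len(distances)), key=distances.__getitem__)

def classify_argmax(x, templates):
    """Eq. (4): class with the largest GRBF value."""
    scores = [grbf(x, mu, sigma) for mu, sigma in templates]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical templates: (mean vector, standard deviations) per class
templates = [
    ((0.0, 0.0), (0.1, 0.1)),
    ((0.3, 0.3), (0.1, 0.1)),
    ((-0.3, 0.2), (0.1, 0.1)),
]
x = (0.25, 0.28)

winner_min = classify_argmin(x, templates)
winner_max = classify_argmax(x, templates)
print(winner_min, winner_max)  # both select class index 1 for this input
```

Since subtracting from a constant reverses the ordering, both functions return the same class; the argmax form is the one a Winner-take-all stage can evaluate directly.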



**Fig. 2** Proposed RBF-based classifier. (left) Each RBF class cell corresponds to a specific class. The currents  $[I_{ci}]_{i=1}^{4}$  represent the GRBF values. Each class cell requires 7  $V_{in}$ ,  $V_r$  and  $V_c$  voltages, one for each feature. (right) A Lazzaro WTA composed of 4 neurons, one for each class. The output currents  $[I_i]_{i=1}^{4}$  indicate the winning class in the form of a one-hot current vector

#### 3 Proposed RBF classifier

In this Section, the proposed GRBF classifier and its basic building blocks, namely a circuit generating a Gaussian function and a circuit implementing the argmax function, are thoroughly analyzed. The classifier's architecture, shown in Fig. 2, targets a classification problem with 4 classes and 7 features. By changing the number of RBF class cells, the architecture can be adapted for more or fewer classes. The currents  $[I_{ci}]_{i=1}^4$ , generated from the RBF class cells, represent the value of the 7-D GRBF describing each class *i*. The Winner-take-all (WTA) circuit applies the argmax operator to these 4 currents, based on (4), and indicates the winning class (largest 7-D GRBF value) as a one-hot vector  $[I_1, I_2, I_3, I_4]$ . To address battery-dependent applications (low power consumption requirements), the power supply rails are set to  $V_{DD} = -V_{SS} = 0.3$ V and all transistors operate in the subthreshold region for the entire classifier.

#### 3.1 Gaussian function circuit

A typical Gaussian function circuit [23] (Bump circuit) is shown in Fig. 3 and produces a univariate GRBF, shown in Fig. 4. When the input voltage  $V_{in}$  is swept, the output of the differential pair  $(M_{n1} - M_{n4})$  consists of 2 complementary sigmoidal currents. The current correlator  $(M_{p1} - M_{p6})$  correlates these currents and outputs a bell-like (Bump) curve that resembles the Gaussian function. The parameter voltages  $V_r$  and  $V_c$  are used to alter the mean value and the variance of the GRBF, respectively. The bias current  $I_{bias}$  controls the height of the GRBF. This architecture can be easily expanded to generate multivariate GRBFs [23, 29].

**Fig. 3** A typical bulk-controlled Gaussian function (Bump) circuit with a non-standard differential pair and a symmetrical current correlator. The mean value, the variance and the height of the GRBF are controlled by the parameter voltages  $V_r$ ,  $V_c$  and the bias current  $I_{bias}$ , respectively

**Fig. 4** The output current of a typical Bump circuit, for  $I_{bias} = 5$  nA,  $V_c = -300$  mV and  $V_r = 0$  V

By implementing multivariate Bump circuits, a range of applications is available [30]. In particular, Bump circuits can be utilized in ML, neuromorphic, smart sensor and fuzzy or neuro-fuzzy systems. In ML, multiple models require Gaussian kernels, such as Support Vector Machines (SVMs), K-means-based models and Self Organized Maps (SOMs), or use them simply as the activation function in an RBF NN. In neuromorphic systems, Gaussian functions can be used as a similarity metric instead of Euclidean ones, since they are infinitely differentiable. In the literature, there are smart sensor systems which also use Gaussian functions instead of Euclidean ones, due to their high operation speed and ease of implementation. Fuzzy and neuro-fuzzy systems usually prefer bell-shaped functions as membership functions (with many applications in controllers and pattern recognition).

**Fig. 5** The proposed bulk-controlled Bump circuit with a modified differential pair, a cascode current mirror and a symmetrical current correlator. The mean value, the variance and the height of the GRBF are controlled by the parameter voltages  $V_r$ ,  $V_c$  and the bias current  $I_{bias}$ , respectively

In this work, the proposed Bump circuit, shown in Fig. 5, is a modification of our previous work [23], shown in Fig. 3. The PMOS diode-connected transistors  $(M_{p7}, M_{p8})$  are replaced by NMOS ones  $(M_{n5}, M_{n6})$  in order to shift the variance range of the Bump circuit to smaller values. Smaller variances are usually desired in applications with multiple GRBFs. The simple current mirror is changed to a cascode one to enhance mirroring even for small current values, which is essential for multivariate Bump circuits. Since the output current of a Bump is smaller than the input one, the bias current of the last Bump is significantly reduced in a multivariate Bump circuit. The aim of both improvements is to produce a higher quality Gaussian function and increase the classification accuracy of the proposed classifier. All transistors' dimensions are summarized in Table 1.

Table 1 MOS transistors' dimensions (Fig. 5)

| Differential block | W/L $(\mu m/\mu m)$ | Current correlator | W/L $(\mu m/\mu m)$ |
|--------------------|---------------------|--------------------|---------------------|
| $M_{n1} - M_{n4}$  | 1.2/0.4             | $M_{p1}, M_{p2}$   | 0.8/1.6             |
| $M_{n5}, M_{n6}$   | 4.8/0.4             | $M_{p3} - M_{p6}$  | 0.4/1.6             |
| $M_{n7} - M_{n9}$  | 0.4/1.6             | -                  | -                   |
| $M_{n10}$          | 1.6/1.6             | -                  | -                   |

Each Bump circuit outputs a univariate Gaussian curve, shown in Fig. 6. By connecting two or more Bump circuits in a cascaded format, the output of the last Bump is equal to the product of the individual Gaussian terms and therefore one can obtain a multivariate GRBF, based on (1). In this topology only the first Bump has a preset bias current  $I_{bias}$ , while the rest are biased with the output current of the previous Bump cell. Each Bump cell has a unique input voltage  $V_{in}$  and parameter voltages  $V_r$  and  $V_c$ . A 7-D Bump circuit is shown in Fig. 7 and a proof-of-concept 2-D illustration is shown in Fig. 8. A typical 2-D Bump circuit is composed of 2 Bump circuits connected in a cascaded form, as in Fig. 7. This topology constitutes an RBF class cell and can be easily scaled for higher or lower feature dimensionality.

**Fig. 6** The output current of the proposed Bump circuit, for  $I_{bias} = 5$  nA,  $V_c = -300$  mV and  $V_r = 0$  V (post-layout simulation)

**Fig. 7** The proposed 7-D Bump circuit composed of 7 Bump circuits in a cascaded format. Each Bump has its own input and parameter voltages  $V_{in}$ ,  $V_r$  and  $V_c$
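The cascaded biasing scheme can be illustrated with an idealized behavioral model. This is only a sketch: it assumes that each Bump cell scales its bias current by a perfect unit-peak Gaussian of its input voltage, and it takes the standard deviation `sigma` directly as a parameter, because the mapping from  $V_c$  to the variance is non-linear and is not given in closed form here. All function names and numeric values are hypothetical.

```python
import math

def bump_cell(i_bias, v_in, v_r, sigma):
    """Idealized Bump cell: the output current peaks at i_bias when v_in == v_r."""
    return i_bias * math.exp(-0.5 * ((v_in - v_r) / sigma) ** 2)

def rbf_class_cell(i_bias, v_in, v_r, sigma):
    """Cascade of Bump cells: each cell is biased by the previous cell's output."""
    current = i_bias  # only the first cell has a preset bias current
    for vin, vr, s in zip(v_in, v_r, sigma):
        current = bump_cell(current, vin, vr, s)
    return current

# Hypothetical 7-D example: 5 nA bias, all templates at 0 V, sigma = 100 mV
i_bias = 5e-9
v_r = [0.0] * 7
sigma = [0.1] * 7

i_peak = rbf_class_cell(i_bias, [0.0] * 7, v_r, sigma)          # input on the template
i_off = rbf_class_cell(i_bias, [0.1] + [0.0] * 6, v_r, sigma)   # one feature 1 sigma away
print(i_peak, i_off)
```

In this idealized model the output equals the bias current multiplied by the product of the per-feature Gaussian terms, which is the multivariate form of (1) up to the normalization constant. A real Bump cell attenuates its bias current, which is why the text notes that the last cell in a cascade operates at a much smaller current.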

The three main characteristics of a univariate GRBF (mean value, variance, height) are controlled via the circuit's parameters [30]. In particular, the voltage parameter  $V_r$  is equal to the mean value, and hence the maximum of the output curve is achieved when  $V_{in} = V_r$ . Similarly, the bias current  $I_{bias}$  is equal to the height of the GRBF; a decrease in the bias current results in the same decrease of the output current's peak. The relation between the voltage parameter  $V_c$  and the variance of the GRBF is a complex, non-linear, monotonically decreasing function. These three dependencies are shown in Figs. 9, 10 and 11, respectively. This behavior is similar for multivariate Bump circuits.

**Fig. 8** The output current of a 2-D Bump circuit with bias current  $I_{bias} = 5 \text{ nA}$  and the parameter voltages  $V_{r_{1,2}} = 0 \text{ V}$  and  $V_{c_{1,2}} = -300 \text{ mV}$  is a 2-D Gaussian Function

**Fig. 9** A parametric simulation over  $V_r$  (mean value adjustment) of the proposed Bump circuit's output current, for  $I_{bias} = 5$  nA and  $V_c = 300$  mV (post-layout simulation)

The sensitivity of the proposed circuit is evaluated using the Monte-Carlo analysis tool for N = 200 points. Figure 12 depicts the corresponding histograms for the three main characteristics of the Gaussian curve. The mean values and the standard deviations (std) of each histogram are summarized in Table 2. For comparison purposes the same procedure is repeated for the typical Bump circuit [23] and the results are shown in Fig. 13. Relative to the desired values,  $I_{height} = 2$  nA,  $V_{mean} = 0$  V and  $V_{std} = 108$  mV, the proposed circuit lies closer to the target height and mean value than the typical one, while both circuits stay within 1.5 mV of the target standard deviation.
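As a worked check of the comparison with the desired values, the offsets of the histogram means from their targets can be computed directly from the numbers reported in Table 2. The dictionary layout and key names below are hypothetical; only the numeric values come from the table.

```python
# Target values and (mean, std) pairs as reported in Table 2
targets = {"height_nA": 2.0, "mean_mV": 0.0, "std_mV": 108.0}
proposed = {"height_nA": (1.82, 0.15), "mean_mV": (-1.87, 1.03), "std_mV": (109.4, 1.8)}
typical = {"height_nA": (2.79, 0.38), "mean_mV": (8.31, 0.71), "std_mV": (107.2, 1.7)}

def offsets(circuit):
    """Absolute distance of each histogram mean from its target value."""
    return {key: abs(circuit[key][0] - targets[key]) for key in targets}

offset_proposed = offsets(proposed)
offset_typical = offsets(typical)

for key in targets:
    print(f"{key}: proposed {offset_proposed[key]:.2f}, typical {offset_typical[key]:.2f}")
```

With these numbers the proposed circuit is closer to the target for the height (0.18 nA versus 0.79 nA) and for the mean value (1.87 mV versus 8.31 mV), while for the standard deviation both circuits are within 1.5 mV of the 108 mV target.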



**Fig. 10** A parametric simulation over  $I_{bias}$  (height scaling) of the proposed Bump circuit's output current, for  $V_r = 0$  V and  $V_c = 300$  mV (post-layout simulation)



**Fig. 11** A parametric simulation over  $V_c$  (variance tuning) of the proposed Bump circuit's output current, for  $I_{bias} = 5$  nA and  $V_r = 0$  V (post-layout simulation)

#### 3.2 Winner-take-all circuit

The second building block is the Lazzaro WTA circuit [31]. In general, for a classification problem with  $N_{cla}$  classes, this analog building block has  $N_{cla}$  pairs of input-output ports  $[I_{in_i}, I_{out_i}]_{i=1}^{N_{cla}}$  and implements the argmax function [31]. Each pair  $(I_{in_i}, I_{out_i})$  constitutes a neuron, with a typical example shown in Fig. 14. In practice, given a set of  $N_{cla}$  input signals and assuming that there is a single maximum among them, located at index  $j \leq N_{cla}$, the output  $I_{out_j}$  has a non-zero value (winner), whereas the rest are zero. If there is no single maximum, the WTA circuit operates in the linear region, where more than one winner may occur.
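The ideal input-output behavior described above can be sketched as follows. This is an idealized behavioral model, not a circuit simulation: it assumes a single strict maximum and therefore does not model the linear region with several near-equal inputs. The function name and the parameter `i_winner` (the non-zero current delivered at the winning output) are hypothetical.

```python
def ideal_wta(i_in, i_winner):
    """Return a one-hot current vector marking the largest input current."""
    winner = max(range(len(i_in)), key=i_in.__getitem__)
    return [i_winner if k == winner else 0.0 for k in range(len(i_in))]

# Hypothetical input currents for a 4-neuron WTA (in amperes)
i_in = [1.0e-9, 3.0e-9, 2.0e-9, 0.5e-9]
i_out = ideal_wta(i_in, i_winner=1.0e-9)
print(i_out)  # only the second output carries current
```

Feeding the class-cell currents  $[I_{ci}]$  into such a block realizes the argmax of (4) without any digital comparison logic.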

For the aforementioned classification problem, a WTA circuit composed of  $N_{cla} = 4$  neurons, shown in Fig. 15, is required. To better illustrate the behavior of this circuit, a representative case for the input currents  $[I_{in_i}]_{i=1}^4$  is considered, shown in Fig. 15 (left). The output currents, shown in Fig. 15 (right), accurately indicate the largest input. The variable parameter  $I_x$  is used to tune the input currents. For each neuron, the transistors' dimensions are equal to  $W/L = 0.4\,\mu\text{m}/1.6\,\mu\text{m}$ .



Fig. 12 Monte-Carlo simulation of the proposed Bump circuit, for 200 points. Three histograms are provided, regarding (right) the height, (middle) the mean value and (left) the standard deviation of the Gaussian curve. The desired corresponding values are  $I_{height} = 2$  nA,  $V_{mean} = 0$  V and  $V_{std} = 108$  mV, respectively

Table 2 Monte-Carlo histogram metrics

| Characteristic     | Proposed mean value | Proposed std value | Typical mean value | Typical std value |
|--------------------|---------------------|--------------------|--------------------|-------------------|
| Height             | 1.82 nA             | 0.15 nA            | 2.79 nA            | 0.38 nA           |
| Mean value         | −1.87 mV            | 1.03 mV            | 8.31 mV            | 0.71 mV           |
| Standard deviation | 109.4 mV            | 1.8 mV             | 107.2 mV           | 1.7 mV            |



Fig. 13 Monte-Carlo simulation of the typical Bump circuit, for 200 points. Three histograms are provided, regarding (right) the height, (middle) the mean value and (left) the standard deviation of the Gaussian curve. The desired corresponding values are  $I_{height} = 2$  nA,  $V_{mean} = 0$  V and  $V_{std} = 108$  mV, respectively

Fig. 14 A standard Lazzaro neuron cell

#### 4 Application example and simulation results

In this Section, the proposed architecture is tested on a real-world dataset of protein cellular localization sites [24], to confirm its proper operation. In particular, the Ecoli data set from the University of California, Irvine (UCI) machine learning repository [32] is used. The data are separated into a training and a test set, consisting of 234 and 102 7-D feature vectors of E. coli proteins, respectively. The values of the employed data can be easily derived from an analog smart sensor system. This means that the proposed classifier can be used as a main building block in the classification system shown in Fig. 1(c). The 102 E. coli proteins of the test set are classified into 4 possible classes (localization sites) by the proposed analog integrated architecture, achieving 87.6% accuracy.

The parameters of the hardware-based classifier are copies of those of a software-based classifier trained in a software environment. The classifier is trained only once and the parameters are then stored in an analog on-chip memory, whereas the input signals originate from a preceding analog feature extraction stage as rectangular pulses. Training is performed in software to avoid the area and power footprint of hardware-based training circuitry. Hence, the analog classifier is used only for prediction and is capable of a maximum operation speed of 170 K classifications per second.

[Figure: (left) the input currents and (right) the output currents generated from a 4 neuron WTA circuit; histogram axis: number of iterations]

Fig. 17 Classification accuracy histogram of a software-based classifier over 20 iterations

For comparison purposes and to highlight the benefits of the proposed modifications, two RBF-based classifiers are presented. The first, called Proposed CLF I, uses the bump circuit implemented in our previous work [23] as its basic building block. The second, called Proposed CLF II, utilizes the proposed bump circuit instead. Both implementations are also compared with a software-based one in terms of classification accuracy. In particular, the classification results over 20 separate training iterations are depicted in Figs. 17, 18 and 19. A summary of the results for the three implementations is provided in Table 3. For this application, the Proposed CLF II outperforms the Proposed CLF I (17–20% increase in accuracy) and its performance is very close to that of the software-based implementation (1.4% decrease in accuracy). Additionally, the sensitivity of the Proposed CLF II circuit is evaluated using the Monte-Carlo analysis tool for N = 100 points for one of the 20 previous iterations. The corresponding histogram, with a mean value of  $\mu_M = 0.877$  and a standard deviation of  $\sigma_M = 0.011$ , is presented in Fig. 20. Both tests confirm the proper performance and sensitivity of the proposed architecture.

Fig. 18 Classification accuracy histogram of the Proposed CLF I over 20 iterations (post-layout simulations)

Fig. 19 Classification accuracy histogram of the Proposed CLF II over 20 iterations (post-layout simulations)

Table 3 Classification accuracy (over 20 iterations)

| Method          | Best  | Worst | Mean  | Std.  |
|-----------------|-------|-------|-------|-------|
| Software        | 0.931 | 0.824 | 0.890 | 0.030 |
| Proposed CLF I  | 0.755 | 0.637 | 0.691 | 0.032 |
| Proposed CLF II | 0.922 | 0.833 | 0.876 | 0.020 |

Fig. 20 Post-layout Monte-Carlo simulation histogram of the Proposed CLF II architecture (for one of the previous 20 iterations)
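The accuracy differences quoted in the text follow directly from Table 3. The short sketch below recomputes them in percentage points; the dictionary layout is hypothetical, while the values are those of the table.

```python
# Accuracy values as reported in Table 3 (over 20 iterations)
table3 = {
    "Software": {"best": 0.931, "worst": 0.824, "mean": 0.890},
    "Proposed CLF I": {"best": 0.755, "worst": 0.637, "mean": 0.691},
    "Proposed CLF II": {"best": 0.922, "worst": 0.833, "mean": 0.876},
}

def points(a, b, key):
    """Difference between two methods in percentage points."""
    return (table3[a][key] - table3[b][key]) * 100.0

gain_over_clf1 = {k: points("Proposed CLF II", "Proposed CLF I", k)
                  for k in ("best", "worst", "mean")}
gap_to_software = points("Software", "Proposed CLF II", "mean")

print({k: round(v, 1) for k, v in gain_over_clf1.items()})
print(round(gap_to_software, 1))
```

The best, worst and mean cases give gains of about 16.7, 19.6 and 18.5 percentage points, which is the 17–20% range mentioned above, and the mean accuracy is 1.4 percentage points below the software model.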

The architecture has been designed and simulated in a TSMC 90 nm CMOS process using the Cadence IC suite.

All simulation results are obtained from the layout (post-layout simulations) presented in Fig. 21. The layout is based on the common-centroid technique, and extra dummy transistors are used in order to mitigate mismatch and manufacturing variations [33].

#### 5 Performance summary and discussion

In this Section, a summary of recent analog and mixed-mode classifiers, along with this work (Proposed CLF II), is provided. All the classifiers summarized here incorporate Gaussian function circuits as their basic building block. Nonetheless, it is worth mentioning that a fair comparison between hardware-based ML implementations is not possible, since numerous aspects need to be considered jointly, such as the implementation technology, the application, the power and area specifications, the computation speed and so forth. A performance summary of recent RBF-based classifiers is provided in Tables 4 and 5. These classifiers are based on ML algorithms or models that are suitable (provide more advantages) for specific applications. These are RBF NNs [34–36], RBF Vector quantizers (VQs) [22, 37, 38], SVMs [39, 40], a Deep ML (DML) engine [41] and a SOM [42]. In general, RBFs are preferred over other non-linear functions, since they can easily model a vast variety of data following the Normal distribution [43]. RBF-based classifiers are only a small portion of the existing classifiers; others include random forests, DNNs, RNNs and LSTMs [26, 44]. Despite being less accurate than other state-of-the-art classifiers, their ease of implementation makes them more desirable in analog integrated ML applications.

The aim of this work is the implementation of an ultra-low power and area-efficient RBF classifier. Since the Bump circuit is used in a cascaded format multiple times, it dominates the area and power efficiency of the whole classifier. The aforementioned modifications of the proposed Bump circuit greatly increase the accuracy of the classifier (by more than 17%) while maintaining a low power consumption. In particular, the Proposed CLF I architecture consumes from 4.3 to 14.8 nW per Bump circuit, whereas the Proposed CLF II consumes from 4 to 17.5 nW per Bump circuit (depending on the system's input). Additionally, by designing a pure analog classifier, the total power consumption is reduced in comparison to power-hungry mixed-mode implementations, shown in Tables 4 and 5, or digital ones, for example [12, 13]. Moreover, a fully analog classification system, shown in Fig. 1(c), provides a digital output without the need for Analog-to-Digital converters.
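A rough power and energy budget can be derived from the figures quoted in this article. The sketch below is a back-of-the-envelope check, not a result from the paper: it assumes one Bump circuit per feature and per class (4 classes, 7 features) and, for the energy estimate, that the classifier runs at the quoted maximum rate of 170 K classifications per second. The variable names are hypothetical.

```python
N_CLASSES = 4                     # RBF class cells
N_FEATURES = 7                    # Bump circuits per class cell
P_BUMP_NW = (4.0, 17.5)           # reported power per Bump circuit (nW)
P_CLASSIFIER_NW = (112.0, 520.0)  # reported classifier power (nW)
MAX_RATE_HZ = 170e3               # reported maximum classification rate

n_bumps = N_CLASSES * N_FEATURES

# Power drawn by the Bump circuits alone, in nW
bump_total_nw = tuple(n_bumps * p for p in P_BUMP_NW)

# Energy per classification in pJ, assuming operation at the maximum rate
energy_pj = tuple(p * 1e-9 / MAX_RATE_HZ * 1e12 for p in P_CLASSIFIER_NW)

print(n_bumps, bump_total_nw)
print(tuple(round(e, 2) for e in energy_pj))
```

Under these assumptions the 28 Bump circuits account for 112 nW at the low end, matching the reported minimum, and for 490 nW at the high end, leaving about 30 nW of the reported 520 nW whose breakdown is not given in the text. The energy estimate of roughly 0.66–3.1 pJ is of the same order as the 0.75–3.3 pJ listed in Table 5; the operating rate behind the tabulated values is not stated.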

Fig. 21 Layout of the proposed classifier



**Table 4** Analog integrated ML classifiers summary

|           | Technology | Architecture | Classifier | No. of dimensions    |
|-----------|------------|--------------|------------|----------------------|
| This work | 90 nm      | Analog       | GRBFN      | 7                    |
| [34]      | 180 nm     | Mixed-mode   | RBF NN     | 1                    |
| [35]      | Discrete   | Analog       | RBF NN     | N/A                  |
| [36]      | 130 nm     | Mixed-mode   | RBF NN     | \*$1280 \times 720$  |
| [22]      | 0.5 µm     | Analog       | RBF VQ     | 2                    |
| [37]      | 0.6 µm     | Mixed-mode   | RBF VQ     | 16                   |
| [38]      | 2 µm       | Analog       | RBF VQ     | 16                   |
| [39]      | 180 nm     | Analog       | SVM        | 2                    |
| [40]      | 180 nm     | Analog       | SVM        | 64                   |
| [41]      | 130 nm     | Mixed-mode   | DML engine | 8                    |
| [42]      | 180 nm     | Analog       | SOM        | 3                    |

\*Pixels

**Table 5** Analog integrated ML classifiers summary

|           | Energy per classification | Power consumption per bump | Area                          |
|-----------|---------------------------|----------------------------|-------------------------------|
| This work | 0.75–3.3 pJ               | 4–17.5 nW                  | $0.050\,\mathrm{mm^2}$        |
| [34]      | N/A                       | \*13.5 nW                  | \*\*$0.013\,\mathrm{mm^2}$    |
| [35]      | N/A                       | \*4.1 µW                   | \*\*$10\,\mathrm{\mu m^2}$    |
| [36]      | N/A                       | 10.5 µW                    | $0.140\,\mathrm{mm^2}$        |
| [22]      | 25.2–89.6 nJ              | 90–160 µW                  | $2.250\,\mathrm{mm^2}$        |
| [37]      | 60–600 nJ                 | N/A                        | $20.250\,\mathrm{mm^2}$       |
| [38]      | N/A                       | N/A                        | $4.950\,\mathrm{mm^2}$        |
| [39]      | 252.9 pJ                  | N/A                        | $0.060\,\mathrm{mm^2}$        |
| [40]      | N/A                       | N/A                        | N/A                           |
| [41]      | 1.37 nJ                   | N/A                        | $0.360\,\mathrm{mm^2}$        |
| [42]      | 352–769 nJ                | N/A                        | $0.240\,\mathrm{mm^2}$        |

\*Minimum power consumption

\*\*Area per bump

#### 6 Conclusion

An analog integrated GRBF-based classifier was presented, to be used as an ultra-low power block in a fully analog classification system. It utilizes the proposed Bump circuit and a 4-neuron Lazzaro WTA circuit. The proposed classification system includes an area-efficient, 112–520 nW analog classifier which can replace a power-hungry digital engine. A real-world classification problem confirmed the proper operation of the proposed architecture, achieving 87.6% classification accuracy. All post-layout simulation results were extracted using the Cadence IC Suite in a TSMC 90 nm technology.

Availability of data and materials: Data used in the experiments has been generated through publicly available simulators. Related simulation files have been shared through the links given in the paper in order to fully reproduce the presented results.

#### Declarations

Ecoli Data Set: https://archive.ics.uci.edu/ml/datasets/ecoli

#### References

- Rahman, L. F., Ozcelebi, T., & Lukkien, J. (2018). Understanding iot systems: A life cycle approach. *Procedia Computer Science*, *130*, 1057–1062.
- 2. Madakam, S., Lake, V., Lake, V., Lake, V., et al. (2015). Internet of things (iot): A literature review. *Journal of Computer and Communications*, *3*(05), 164.
- Kashani, M. H., Madanipour, M., Nikravan, M., Asghari, P., & Mahdipour, E. (2021). A systematic review of iot in healthcare: Applications, techniques, and trends. *Journal of Network and Computer Applications*, 192, 103164.
- 4. Alioto, M. (2017). Enabling the internet of things: From integrated circuits to integrated systems. Springer.
- 5. Kyung, C.-M., Yasuura, H., Liu, Y., & Lin, Y.-L. (2017). Smart sensors and systems. Springer.
- Ha, N., Xu, K., Ren, G., Mitchell, A., & Ou, J. Z. (2020). Machine learning-enabled smart sensor systems. *Advanced Intelligent Systems*, 2(9), 2000063.
- Tahsien, S. M., Karimipour, H., & Spachos, P. (2020). Machine learning based solutions for security of internet of things (iot): A survey. *Journal of Network and Computer Applications*, 161, 102630.
- Salman, O., Elhajj, I., Kayssi, A., & Chehab, A. (2015). Edge computing enabling the internet of things. In 2015 IEEE 2nd World forum on internet of things (WF-IoT) (pp. 603–608). IEEE.
- Shi, W., & Dustdar, S. (2016). The promise of edge computing. Computer, 49(5), 78–81.
- Zhou, F., & Chai, Y. (2020). Near-sensor and in-sensor computing. *Nature Electronics*, 3(11), 664–671.
- Talib, M. A., Majzoub, S., Nasir, Q., & Jamal, D. (2021). A systematic literature review on hardware implementation of artificial intelligence algorithms. *The Journal of Supercomputing*, 77, 1897–1938.
- De Vita, A., Pau, D., Parrella, C., Di Benedetto, L., Rubino, A., & Licciardo, G.D. (2020). Low-power hwaccelerator for ai edgecomputing in human activity recognition systems. In 2020 2nd IEEE international conference on artificial intelligence circuits and systems (AICAS) (pp. 291–295). IEEE.
- 13. Lin, S.-K., Wang, L.-C., Lin, C.-Y., Chiueh, H., et al. (2018). An ultra-low power smart headband for real-time epileptic seizure detection. *IEEE Journal of Translational Engineering in Health and Medicine*, *6*, 1–10.
- Haensch, W., Gokmen, T., & Puri, R. (2018). The next generation of deep learning hardware: Analog computing. *Proceedings of the IEEE*, 107(1), 108–122.
- Zhang, Y., Mirchandani, N., Onabajo, M., & Shrivastava, A. (2020). Rssi amplifier design for a feature extraction technique to detect seizures with analog computing. In 2020 IEEE international symposium on circuits and systems (ISCAS) (pp. 1–5). IEEE.
- 16. Wang, A., Calhoun, B. H., & Chandrakasan, A. P. (2006). Subthreshold design for ultra low-power systems 95. Springer.
- 17. Yang, M., Liu, H., Shan, W., Zhang, J., Kiselev, I., Kim, S. J., Enz, C., & Seok, M. (2021). Nanowatt acoustic inference sensing exploiting nonlinear analog feature extraction. *IEEE Journal of Solid-State Circuits*, 56(10), 3123–3133.
- 18. Villamizar, D. A., Muratore, D. G., Wieser, J. B., & Murmann, B. (2021). An 800 nW switched-capacitor feature extraction filterbank for sound classification. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 68(4), 1578–1588.
- 19. Miller, R. (2007). Theory of the normal waking EEG: From single neurones to waveforms in the alpha, beta and gamma frequency ranges. *International Journal of Psychophysiology*, 64(1), 18–23.
- 20. Kim, S., Yan, L., Mitra, S., Osawa, M., Harada, Y., Tamiya, K., Van Hoof, C., & Yazicioglu, R. F. (2013). A 20 μW intra-cardiac signal-processing IC with 82 dB bio-impedance measurement dynamic range and analog feature extraction for ventricular fibrillation detection. In 2013 IEEE international solid-state circuits conference digest of technical papers (pp. 302–303). IEEE.
- 21. Yang, M., Yeh, C.-H., Zhou, Y., Cerqueira, J. P., Lazar, A. A., & Seok, M. (2018). A 1 μW voice activity detector using analog feature extraction and digital deep neural network. In 2018 IEEE international solid-state circuits conference-(ISSCC) (pp. 346–348). IEEE.
- 22. Peng, S.-Y., Hasler, P. E., & Anderson, D. V. (2007). An analog programmable multidimensional radial basis function based classifier. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 54(10), 2148–2158.
- 23. Alimisis, V., Gourdouparis, M., Dimas, C., & Sotiriadis, P. P. (2021). A 0.6 V, 3.3 nW, adjustable Gaussian circuit for tunable kernel functions. In 2021 34th SBC/SBMicro/IEEE/ACM symposium on integrated circuits and systems design (SBCCI) (pp. 1–6). IEEE.
- 24. https://archive.ics.uci.edu/ml/datasets/ecoli
- 25. Buhmann, M. D. (2003). Radial basis functions: Theory and implementations (Vol. 12). Cambridge University Press.
- 26. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- 27. Xie, T., Yu, H., & Wilamowski, B. (2011). Comparison between traditional neural networks and radial basis function networks. In 2011 IEEE international symposium on industrial electronics (pp. 1194–1199). IEEE.
- 28. He, Q., Shahabi, H., Shirzadi, A., Li, S., Chen, W., Wang, N., et al. (2019). Landslide spatial modelling using novel bivariate statistical based naïve Bayes, RBF classifier, and RBF network machine learning algorithms. *Science of the Total Environment*, 663, 1–15.
- 29. Gourdouparis, M., Alimisis, V., Dimas, C., & Sotiriadis, P. P. (2021). An ultra-low power, ±0.3 V supply, fully-tunable Gaussian function circuit architecture for radial-basis functions analog hardware implementation. *AEU-International Journal of Electronics and Communications*, 136, 153755.
- 30. Alimisis, V., Gourdouparis, M., Gennis, G., Dimas, C., & Sotiriadis, P. P. (2021). Analog Gaussian function circuit: Architectures, operating principles and applications. *Electronics*, *10*(20), 2530.
- 31. Lazzaro, J., Ryckebusch, S., Mahowald, M. A., & Mead, C. A. (1988). Winner-take-all networks of O(n) complexity.
- 32. Blake, C. (1998). UCI repository of machine learning databases. https://archive.ics.uci.edu
- 33. Sharma, A. K., Madhusudan, M., Burns, S. M., Mukherjee, P., Yaldiz, S., Harjani, R., & Sapatnekar, S. S. (2021). Common-centroid layouts for analog circuits: Advantages and limitations. In *Proceedings of the DATE*. IEEE.
- 34. Mohamed, A. R., Qi, L., Li, Y., & Wang, G. (2020). A generic nano-watt power fully tunable 1-D Gaussian kernel circuit for artificial neural network. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 67(9), 1529–1533.
- 35. Dorzhigulov, A., & James, A. P. (2019). Generalized bell-shaped membership function generation circuit for memristive neural networks. In 2019 IEEE international symposium on circuits and systems (ISCAS) (pp. 1–5). IEEE.
- 36. Lee, K., Park, J., & Yoo, H.-J. (2019). A low-power, mixed-mode neural network classifier for robust scene classification. *Journal of Semiconductor Technology and Science*, 19(1), 129–136.
- 37. Yamasaki, T., & Shibata, T. (2003). Analog soft-pattern-matching classifier using floating-gate MOS technology. *IEEE Transactions on Neural Networks*, 14(5), 1257–1265.
- 38. Cauwenberghs, G., & Pedroni, V. (1995). A charge-based CMOS parallel analog vector quantizer.
- 39. Kang, K., & Shibata, T. (2009). An on-chip-trainable Gaussian-kernel analog support vector machine. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 57(7), 1513–1524.
- 40. Zhang, R., & Shibata, T. (2012). Fully parallel self-learning analog support vector machine employing compact Gaussian generation circuits. *Japanese Journal of Applied Physics*, 51(4S), 04–10.
- 41. Lu, J., Young, S., Arel, I., & Holleman, J. (2014). A 1 TOPS/W analog deep machine-learning engine with floating-gate storage in 0.13 μm CMOS. *IEEE Journal of Solid-State Circuits*, 50(1), 270–281.
- 42. Li, F., Chang, C.-H., & Siek, L. (2009). A compact current mode neuron circuit with Gaussian taper learning capability. In 2009 IEEE international symposium on circuits and systems (pp. 2129–2132). IEEE.
- 43. Montgomery, D. C., Runger, G. C., & Hubele, N. F. (2009). Engineering statistics. Wiley.
- 44. Haykin, S. (2004). Kalman filtering and neural networks (Vol. 47). Wiley.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



**Vassilis Alimisis** received the B.Sc. degree in Physics (top 1%) and the M.Sc. degree in Electronics and Communications from the University of Patras, Greece, in 2017 and 2019, respectively. He is currently pursuing the Ph.D. degree at the National Technical University of Athens, Greece, under the supervision of Professor Paul P. Sotiriadis. He is a Teaching Assistant in undergraduate and graduate courses and supervises diploma theses. He has authored and co-authored several conference papers and journal articles. His main research interests include analog microelectronic circuits, low power electronics, analog computing and integrated circuit architectures with applications in artificial intelligence and machine learning. He has received the Best Paper Award in the IEEE Int. Conf. on Microelectronics 2020 and the Best Paper Award in the IEEE Symposium on Integrated Circuits and Systems Design (SBCCI) 2021. He is a regular reviewer for many journals and conferences and an IEEE Student member.



**Georgios Gennis** is a Senior Graduate Student in the Electrical and Computer Engineering Department at the National Technical University of Athens, Greece. He is currently working on his Diploma Thesis under the supervision of Prof. Paul P. Sotiriadis. He has authored and co-authored several conference papers and a journal article. His main research interests include analog microelectronic circuits, low power electronics, analog computing and integrated circuit architectures with applications in artificial intelligence and machine learning. He is an IEEE Student member.



**Christos Dimas** received the diploma degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece, in 2016, and is currently working toward the Ph.D. degree at the same department, under the supervision of Prof. Paul P. Sotiriadis. His Ph.D. thesis research subject is "Image Reconstruction Approaches and Circuit Modeling in Electrical Impedance Tomography". He is a Teaching Assistant in undergraduate and graduate courses and supervises diploma theses. His main research interests include electrical impedance tomography, bio-impedance measurement, modeling and instrumentation. He has authored and co-authored several conference papers and journal articles. He has received a Best Paper Award in the IEEE Int. Conf. on Microelectronics 2020 and a Best Paper Award in the IEEE Symposium on Integrated Circuits and Systems Design (SBCCI) 2021. He is a regular reviewer for many journals and conferences and an IEEE Student member.



**Marios Gourdouparis** received the diploma degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece, in 2021, and is currently working toward the Ph.D. degree at TU Delft. His main research interests include analog microelectronic circuits, low power electronics, analog computing and integrated circuit architectures with applications in artificial intelligence and machine learning. He has received the Best Paper Award in the IEEE Symposium on Integrated Circuits and Systems Design (SBCCI) 2021. He is an IEEE Student member.



**Paul P. Sotiriadis** (SM'18) is a Professor of Electrical and Computer Engineering at the National Technical University of Athens, Greece, the Director of the Electronics Laboratory of the University and a governing board member of the Hellenic Space Center, the national space center of Greece. He runs a team of 30 researchers. He received the Diploma degree in Electrical and Computer Engineering from the same University, the M.S. degree in Electrical Engineering from Stanford University, USA, and the Ph.D. degree in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology, USA, in 2002. In 2002, he joined the faculty of the Johns Hopkins University Electrical and Computer Engineering Department, and in 2012 he joined the faculty of the Electrical and Computer Engineering Department of the National Technical University of Athens. He has authored and co-authored more than 160 research publications, holds one patent, and has contributed several chapters to technical books. His research interests include the design, optimization, and mathematical modeling of analog, mixed-signal, RF and microwave integrated and discrete circuits, sensors and instrumentation architectures, biomedical instrumentation, interconnect networks and advanced frequency synthesis. He has led several projects in these fields funded by U.S. organizations and has collaborations with industry and national labs. He has received several awards, including the 2012 Guillemin-Cauer Award from the IEEE Circuits and Systems Society, a Best Paper Award in the IEEE International Symposium on Circuits and Systems 2007, a Best Paper Award in the IEEE Int. Frequency Control Symposium 2012, a Best Paper Award in the IEEE Int. Conf. on Modern Circuits and Systems Tech. 2019, a Best Paper Award in the IEEE Int. Conf. on Microelectronics 2020 and a Best Paper Award in the IEEE Symposium on Integrated Circuits and Systems Design (SBCCI) 2021. Dr. Sotiriadis is an Associate Editor of the IEEE Sensors Journal, has served as an Associate Editor of the IEEE Trans. on Circuits and Systems-I (2016–2020) and the IEEE Trans. on Circuits and Systems-II (2005–2010) and has been a member of technical committees of many conferences. He regularly reviews for many journals and conferences and serves on proposal review panels.