 Open Access
 Authors : Vigneshwar A, Dr. Sathish Kumar G A
 Paper ID : IJERTCONV4IS14039
 Volume & Issue : NCSPC – 2016 (Volume 4 – Issue 14)
 Published (First Online): 30-07-2018
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Approximate Multiplier for Low Power Applications
Vigneshwar A
Electronics and Communication Engineering Sri Venkateswara College of Engineering Chennai, India
Dr. Sathish Kumar G A
Electronics and Communication Engineering Sri Venkateswara College of Engineering Chennai, India
Abstract: Imprecise computing is well suited to error-resilient applications such as signal processing and multimedia. It provides meaningful results faster and with lower power consumption, which is particularly attractive for arithmetic circuits. In this paper, a new design is proposed that exploits the partitioning of partial products through recursive multiplication in compressor-based approximate multipliers. Three multiplier designs are proposed using approximate 4:2 compressors. Extensive simulation results show that the proposed designs achieve a significant accuracy improvement together with power and area reduction compared to previous approximate multiplier designs. The first two multiplier designs use the first approximate compressor. Proposed design (1), simulated and synthesized in Xilinx software, has lower accuracy but also lower power and area than the existing multipliers. Proposed design (2), also simulated and synthesized in Xilinx software, achieves higher accuracy at the cost of more area and power than design (1). Proposed design (3) uses the second approximate compressor; simulated and synthesized in Xilinx software, it uses less area and power than designs (1) and (2).
Keywords: Approximate computing, compressor, multiplier

INTRODUCTION
Many scientific and engineering problems are computed using accurate, precise and deterministic algorithms. However, in many applications involving signal/image processing and multimedia, exact and accurate computations are not always necessary, because these applications are error tolerant and produce results that are good enough for human perception [1]. In these error-resilient applications, a reduction in circuit complexity, and thus in area, power and delay, is very important for the operation of a circuit. Hence, approximate computing can be used in error-tolerant applications: it reduces accuracy but still provides meaningful results faster and/or with lower power consumption [2]. Addition and multiplication are often used in these applications. For addition, full adders have been analyzed in detail and a number of approximate designs have been proposed [1]. In [3], several new metrics are proposed and a comparison is made among some of the adder designs. The error distance (ED) is defined as the arithmetic distance between an erroneous output and the correct output for a given input. The mean error distance (MED) and normalized error distance (NED) are then proposed. Recently, approximate multipliers have also gained significance because of their importance in arithmetic operations [4-10]; several approximate 4:2 compressors have been proposed for the reduction of the partial products of a Dadda tree. In this paper, the approximate compressors of [10] are utilized to design 8×8 bit multipliers by a novel partition of the partial products. The newly designed approximate multipliers are more accurate than the ones proposed in [10] and require approximately the same power and delay; it is shown that the improvement in accuracy is significant, albeit at a slight increase in area.
This paper is organized as follows. Section II reviews approximate multipliers and the compressors used in the proposed designs. Section III presents the proposed multipliers. Section IV provides the simulation results for the multipliers and compares the proposed designs with existing multipliers. Section V gives the conclusion.
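The error metrics mentioned above can be illustrated with a short sketch. It computes the ED for every input pair and averages it to obtain the MED; normalising the MED by the largest exact product to obtain the NED is an assumption made here, and `truncating_mul` is a hypothetical stand-in rather than one of the multipliers proposed in this paper.

```python
from itertools import product


def error_metrics(approx_mul, n_bits):
    """Return (MED, NED) of approx_mul over all n_bits x n_bits unsigned inputs."""
    limit = 1 << n_bits
    max_product = (limit - 1) ** 2  # largest exact product (assumed normaliser)
    total_ed = 0
    for a, b in product(range(limit), repeat=2):
        # Error distance (ED): arithmetic distance to the exact product.
        total_ed += abs(approx_mul(a, b) - a * b)
    med = total_ed / (limit * limit)  # mean error distance
    ned = med / max_product           # normalized error distance
    return med, ned


def truncating_mul(a, b):
    """Hypothetical approximate multiplier: clears the two LSBs of the product."""
    return (a * b) & ~0b11


if __name__ == "__main__":
    print(error_metrics(truncating_mul, 4))
```

An exact multiplier passed to `error_metrics` yields zero for both metrics, which is a quick sanity check before evaluating an approximate design.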

PRELIMINARIES AND REVIEW

Approximate multipliers
An error tolerant multiplier (ETM) uses accuracy as a design parameter and divides the operands into two parts, a multiplication part and a non-multiplication part, depending on the required accuracy [4]. It performs the multiplication only in the first part, thus saving power and delay at the cost of accuracy. A novel 2×2 bit underdesigned multiplier (UDM) is proposed in [5] and used to build larger multipliers. A 6×6 bit broken array multiplier (BAM) that is faster than an accurate array multiplier is presented in [6]. A 4×4 imprecise counter-based multiplier (ICM) is proposed in [7]; it uses a 4:2 inaccurate counter to reduce the partial product stages of a Wallace tree multiplier. This leads to a power-efficient design, which can then be used to implement larger multipliers. Four different modes of an approximate Wallace tree multiplier (AWTM) are presented in [8]. This design uses a carry-in prediction method, resulting in hardware reduction and thus less power, area and delay compared to the accurate Wallace tree multiplier. AWTM also uses the simple recursive multiplication technique that is used in this paper and explained later in Section II. A fast and power-efficient multiplier is proposed in [9]; it is based on an approximate adder that can process data in parallel by cutting the carry propagation chain. Two new approximate 4:2 compressors and four approximate multipliers are proposed in [10]. Similar compressors are used in the partial product reduction stage of the multipliers proposed in this paper. Most approximate multipliers aim for a tradeoff among accuracy, power, delay and area.

Recursive multiplication
The technique used in this paper for designing 8 x 8 multipliers from 4 x 4 multipliers is known as recursive multiplication. Suppose there are two numbers A and B of 2a bits each. Each number can be split into two halves, i.e. the most significant a bits and the least significant a bits. Ah denotes the upper a bits of A and Al the lower a bits of A; similarly, Bh and Bl denote the upper and lower a bits of B, respectively. Then, instead of performing one 2a x 2a multiplication, four a x a multiplications are performed (AhBh, AhBl, AlBh, AlBl). The four results are weighted by their bit positions and added to obtain the final output, A x B = AhBh * 2^(2a) + (AhBl + AlBh) * 2^a + AlBl, as shown in Fig. 1.
Fig. 1. Format of the inputs for recursive multiplication
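The decomposition described above can be checked with a minimal sketch for a = 4. The function names are illustrative only, and the 4 x 4 block is modelled here as an exact multiplication.

```python
def mul4x4(a, b):
    """Exact 4 x 4 bit multiplication, used as the building block."""
    return a * b


def recursive_mul8x8(a, b, mul4=mul4x4):
    """Build an 8 x 8 product from four 4 x 4 products."""
    ah, al = a >> 4, a & 0xF  # upper and lower halves of A
    bh, bl = b >> 4, b & 0xF  # upper and lower halves of B
    high = mul4(ah, bh) << 8                     # weight 2^8
    middle = (mul4(ah, bl) + mul4(al, bh)) << 4  # weight 2^4
    low = mul4(al, bl)                           # weight 2^0
    return high + middle + low


if __name__ == "__main__":
    print(recursive_mul8x8(200, 123), 200 * 123)
```

With exact 4 x 4 blocks the result matches the direct product for every 8-bit operand pair; any approximation therefore enters only through the choice of the 4 x 4 block.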

Accurate compressor
Compressors are used to reduce the number of partial product stages. The basic structure of an accurate 4:2 compressor chain utilized in the partial product reduction is shown in Fig. 2. A 4:2 compressor produces a sum for the same order of the next stage, and a carry for one order higher in the next stage. Also, a carry out (Cout) is generated and becomes the carry in (Cin) of the next higher-order compressor. A 4:2 accurate compressor is implemented using two full adder circuits as shown in Fig. 3. There are many other designs for implementing the accurate compressor. [11] describes a design for a 4:2 accurate compressor using three XOR-XNOR gates, one XOR gate and two 2:1 multiplexers. With the four primary inputs denoted x1 to x4, the logic equations for the three outputs of the compressor are as follows:
$$\mathrm{Sum} = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus C_{in}$$
$$C_{out} = (x_1 \oplus x_2)\,x_3 + \overline{(x_1 \oplus x_2)}\,x_1$$
$$\mathrm{Carry} = (x_1 \oplus x_2 \oplus x_3 \oplus x_4)\,C_{in} + \overline{(x_1 \oplus x_2 \oplus x_3 \oplus x_4)}\,x_4$$
Fig. 2. Adjacent compressors within a chain in the partial product reduction stage
Fig. 3. An accurate compressor by using two full adders
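The two-full-adder structure of Fig. 3 can be modelled at the bit level as sketched below. The input names x1 to x4 and the assignment of inputs to the two adders are assumptions made for illustration; the check confirms that the three outputs always encode the number of ones among the five inputs.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Accurate 4:2 compressor built from two full adders.

    Returns (sum, carry, cout). carry and cout both have twice the
    weight of sum.
    """
    s1, cout = full_adder(x1, x2, x3)       # first adder produces Cout
    total, carry = full_adder(s1, x4, cin)  # second adder produces Sum, Carry
    return total, carry, cout


if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        assert sum(bits) == s + 2 * (carry + cout)
    print("all 32 input combinations are compressed exactly")
```

Because Cout depends only on x1, x2 and x3 in this arrangement, it does not wait for Cin, which is what keeps the carry from rippling along a chain of compressors.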

Approximate compressors utilized
The two designs of inaccurate compressors proposed in [10] are used in this paper when designing the multipliers. Both designs are based on modifying the truth table of the accurate compressor to reduce the hardware. In design 1, the carry signal is directly connected to the Cin signal, and the columns of the sum and Cout signals are modified to reduce the hardware and hence the delay. The logic functions for design 1 are given as:
In design 2, Cout is completely removed, hence there is no need for the fifth input (Cin) either. This design therefore simplifies the circuit further and gives better results in terms of accuracy. The logic function of design 2 is given as:
The circuit diagrams for the two inaccurate designs are shown in Fig. 4.
Fig. 4. Two approximate compressors
TABLE 1
TRUTH TABLE OF APPROXIMATE COMPRESSOR 1
TABLE 2
TRUTH TABLE OF APPROXIMATE COMPRESSOR 2
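One way to see why tying the carry output to the carry input is a plausible simplification is to count how often the two already agree in the accurate compressor. The sketch below assumes the two-full-adder structure of Fig. 3 and assumes that the signal the carry is connected to in design 1 is Cin; it does not reproduce the logic functions of either approximate design.

```python
from itertools import product


def exact_carry(x1, x2, x3, x4, cin):
    """Carry output of an accurate 4:2 compressor built from two full adders."""
    s1 = x1 ^ x2 ^ x3  # sum bit of the first full adder
    return (s1 & x4) | (x4 & cin) | (s1 & cin)


def count_carry_equals_cin():
    """Number of the 32 input states in which the exact carry equals Cin."""
    return sum(
        exact_carry(*bits) == bits[4]
        for bits in product((0, 1), repeat=5)
    )


if __name__ == "__main__":
    print(count_carry_equals_cin(), "of 32 states")
```

The remaining states are the ones whose truth-table rows an approximate design of this kind has to alter.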


PROPOSED DESIGNS
In this section, the proposed multiplier designs are presented. Since the technique of recursive multiplication is used, 4 x 4 multipliers are required for the implementation of the 8 x 8 product. Hence, the 4 x 4 multiplier designs are presented too. The method of recursive multiplication is shown using a partial product tree to illustrate its difference from a conventional design.

4X4 bit designs
Three 4 x 4 bit multipliers have been implemented and further used in the 8 x 8 bit multiplication. All three designs are implemented using the Dadda tree technique by making use of different 4:2 compressors in the reduction stage. Using the compressors, the 4 x 4 bit product requires one reduction stage, making the product calculation faster.
For the first design, Mul44_1, the design 1 compressor shown in Fig. 4 (a) is used in the partial product reduction stage. The Dadda tree implementation of Mul44_1 is shown in Fig. 5 (a); only two compressors are required in the partial product reduction stage.
Similarly, for the second design, Mul44_2, the design 2 compressor shown in Fig. 4 (b) is used in the reduction stage. Because design 2 does not pass a carry to the next compressor, the arrangement differs slightly from Mul44_1. The Dadda tree implementation of Mul44_2 is shown in Fig. 5 (b). Only one compressor is required in the reduction stage, which significantly simplifies the design.
For the accurate 4×4 multiplier, Mul44_acc, the Dadda tree implementation is the same as Mul44_1, because the design 1 compressor and the accurate compressor use the same types of circuits. Hence, only accurate compressors need to be used in place of the design 1 compressors. In this multiplier, two accurate compressors are required in the reduction stage.
Fig. 5. Use of compressors for partial product reduction
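The claim that a 4 x 4 product needs only one reduction stage follows from the column heights of its partial product array. The sketch below is an illustration only and does not reproduce the compressor placement of Fig. 5: it counts how many partial product bits fall into each column.

```python
def column_heights(n):
    """Number of partial product bits in each column of an n x n multiplier.

    Bit a[i] AND b[j] has weight 2^(i + j), so it lands in column i + j.
    """
    heights = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            heights[i + j] += 1
    return heights


if __name__ == "__main__":
    print(column_heights(4))
    print(max(column_heights(8)))
```

For n = 4 no column is taller than four bits, so a single layer of 4:2 compressors brings every column down to the two rows needed by the final adder. A flat 8 x 8 array has columns up to eight bits tall, which is one motivation for splitting it into 4 x 4 blocks.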

8X8 bit designs
The 4 x 4 multipliers are used in the implementation of the 8 x 8 multipliers. The partial product tree of the 8 x 8 multiplication is broken down into four products of 4 x 4 modules using the technique of recursive multiplication, as shown in Fig. 6. The advantage of breaking up the product is that the smaller multiplication blocks are performed in parallel and are thus faster. Their results then merely need to be added, according to Fig. 1, to obtain the final product.
The proposed multiplier Mul88_1 uses Mul44_1 for the computation of all four partial products. For high accuracy designs, Mul44_acc can be used for the three most significant products and an approximate compressor design can be used for the least significant product. Accordingly, the proposed multiplier Mul88_2 uses Mul44_1 for the least significant product and Mul44_acc for the other three products. The proposed multiplier Mul88_3 uses Mul44_2 for all four partial products. Its accuracy is higher than that of Mul88_1.
Fig. 6. 8 x 8 bit multiplication broken down into four parts of 4 by 4 bit multiplications (using recursive multiplication)
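The effect of where the approximate block is placed can be sketched with a pluggable model of the four 4 x 4 products. Here `approx4` is a hypothetical stand-in that simply drops the two least significant product bits; it is not the compressor-based Mul44_1 or Mul44_2, so the numbers it produces say nothing about the accuracy of the proposed designs, only about how errors are weighted.

```python
from itertools import product


def exact4(a, b):
    """Exact 4 x 4 block."""
    return a * b


def approx4(a, b):
    """Hypothetical approximate 4 x 4 block: clears the two LSBs."""
    return (a * b) & ~0b11


def mul8x8(a, b, hh, hl, lh, ll):
    """8 x 8 product from four 4 x 4 blocks, one per partial product."""
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    return (hh(ah, bh) << 8) + ((hl(ah, bl) + lh(al, bh)) << 4) + ll(al, bl)


def max_error(hh, hl, lh, ll):
    """Largest error distance over all 8-bit operand pairs."""
    return max(
        abs(mul8x8(a, b, hh, hl, lh, ll) - a * b)
        for a, b in product(range(256), repeat=2)
    )


if __name__ == "__main__":
    # Approximate block everywhere (same arrangement as Mul88_1).
    print(max_error(approx4, approx4, approx4, approx4))
    # Approximate block only in the least significant product (as in Mul88_2).
    print(max_error(exact4, exact4, exact4, approx4))
```

An error in the most significant product is scaled by 2^8 and an error in the middle products by 2^4, whereas an error in the least significant product is not scaled at all. This is why restricting the approximation to the least significant product keeps the overall error within that of a single 4 x 4 block.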


SIMULATION RESULTS
In this section, the designs of the proposed multipliers as explained in Section III are evaluated. The proposed multipliers are simulated and synthesized using Xilinx software.
TABLE 3
PERFORMANCE COMPARISON OF PROPOSED MULTIPLIERS USING APPROXIMATE COMPRESSOR 1

| DESIGN | POWER CONSUMPTION (mW) | NUMBER OF EQUIVALENT GATES | NUMBER OF LUTs |
|---|---|---|---|
| EXISTING MULTIPLIER 1 | 33.38 | 1919 | 169 |
| EXISTING MULTIPLIER 2 | 34.67 | 2737 | 237 |
| Mul88_1 | 26.40 | 1755 | 160 |
| Mul88_2 | 28.15 | 2094 | 184 |
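Table 3 shows that both proposed multipliers consume less power than the existing multipliers: 26.40 mW for Mul88_1 and 28.15 mW for Mul88_2, against 33.38 mW and 34.67 mW. Mul88_1 also has the smallest area (1755 equivalent gates, 160 LUTs). Mul88_2 is smaller than existing multiplier 2 but larger than existing multiplier 1 (2094 gates and 184 LUTs against 1919 gates and 169 LUTs).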
TABLE 4
PERFORMANCE PARAMETERS OF PROPOSED MULTIPLIER USING APPROXIMATE COMPRESSOR 2

| DESIGN | POWER CONSUMPTION (mW) | NUMBER OF EQUIVALENT GATES | NUMBER OF LUTs |
|---|---|---|---|
| Mul88_1 | 26.40 | 1755 | 160 |
| Mul88_3 | 19.94 | 1563 | 148 |
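Table 4 shows that the proposed multiplier using approximate compressor 2 (Mul88_3) consumes less area and power than the proposed multipliers using approximate compressor 1. Compared with Mul88_1, power drops from 26.40 mW to 19.94 mW (about 24%), the equivalent gate count from 1755 to 1563 (about 11%), and the LUT count from 160 to 148.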
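The relative savings implied by Tables 3 and 4 can be recomputed with a few lines of arithmetic. The figures below are copied from the two tables; the dictionary keys and helper names are illustrative only.

```python
# (power in mW, equivalent gates, LUTs), taken from Tables 3 and 4.
DESIGNS = {
    "existing multiplier 1": (33.38, 1919, 169),
    "existing multiplier 2": (34.67, 2737, 237),
    "Mul88_1": (26.40, 1755, 160),
    "Mul88_2": (28.15, 2094, 184),
    "Mul88_3": (19.94, 1563, 148),
}


def saving(baseline, proposed):
    """Percentage reduction of proposed relative to baseline."""
    return 100.0 * (baseline - proposed) / baseline


def compare(baseline_name, proposed_name):
    """Return (power, gates, LUTs) savings in percent, rounded to 0.1."""
    pairs = zip(DESIGNS[baseline_name], DESIGNS[proposed_name])
    return tuple(round(saving(b, p), 1) for b, p in pairs)


if __name__ == "__main__":
    for base, prop in [
        ("existing multiplier 1", "Mul88_1"),
        ("existing multiplier 1", "Mul88_2"),
        ("Mul88_1", "Mul88_3"),
    ]:
        print(base, "->", prop, compare(base, prop))
```

A negative value indicates that the second design is larger than the baseline for that metric, as is the case for the gate and LUT counts of Mul88_2 against existing multiplier 1.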

CONCLUSION
In this paper, three approximate multipliers were proposed. Proposed approximate multiplier design (1) uses the first approximate compressor in all four partial products. It requires less area and power than the existing multipliers, but its accuracy is lower than that of proposed design (2). Design (2) uses accurate compressors in the three most significant partial products and the first approximate compressor in the least significant product. It requires less power than the existing multipliers, and its accuracy is higher than that of the existing multipliers. Depending on the application requirements, one can select design (1) or design (2); design (2) is well suited to high accuracy applications. Proposed design (3) uses the second approximate compressor in all four partial products. Its accuracy is higher than that of design (1).
REFERENCES

[1] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Trans. Computers, vol. 63, no. 9, pp. 1760-1771, Sep. 2013.

[2] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, "IMPACT: IMPrecise adders for low-power approximate computing," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 409-414.

[3] S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. S. Akgul, and L. N. Chakrapani, "A probabilistic CMOS switch and its realization by exploiting noise," presented at the IFIP Int. Conf. Very Large Scale Integ., Perth, Australia, Oct. 2005.

[4] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 57, no. 4, pp. 850-862, Apr. 2010.

[5] M. J. Schulte and E. E. Swartzlander Jr., "Truncated multiplication with correction constant," in Proc. Workshop VLSI Signal Process. VI, 1993, pp. 388-396.

[6] E. J. King and E. E. Swartzlander Jr., "Data dependent truncated scheme for parallel multiplication," in Proc. 31st Asilomar Conf. Signals, Circuits Syst., 1998, pp. 1178-1182.

[7] P. Kulkarni, P. Gupta, and M. D. Ercegovac, "Trading accuracy for power in a multiplier architecture," J. Low Power Electron., vol. 7, no. 4, pp. 490-501, 2011.

[8] C. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Trans. Circuits Syst., vol. 51, no. 10, pp. 1985-1997, Oct. 2004.

[9] D. Radhakrishnan and A. P. Preethy, "Low-power CMOS pass logic 4-2 compressor for high-speed multiplication," in Proc. IEEE 43rd Midwest Symp. Circuits Syst., 2000, vol. 3, pp. 1296-1298.

[10] Z. Wang, G. A. Jullien, and W. C. Miller, "A new design technique for column compression multipliers," IEEE Trans. Comput., vol. 44, no. 8, pp. 962-970, Aug. 1995.

[11] J. Gu and C. H. Chang, "Ultra low-voltage, low-power 4-2 compressor for high speed multiplications," in Proc. 36th IEEE Int. Symp. Circuits Syst., Bangkok, Thailand, May 2003, pp. V-321-V-324.

[12] M. Margala and N. G. Durdle, "Low-power low-voltage 4-2 compressors for VLSI applications," in Proc. IEEE Alessandro Volta Memorial Workshop Low-Power Design, 1999, pp. 84-90.

[13] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed. London, U.K.: Oxford Univ. Press, 2010.