# AREA EFFICIENT LINEAR-PHASE FIR DIGITAL FILTER STRUCTURES

Suganya.S<sup>1</sup> | Latha.P<sup>2</sup> | Naveenkumar.R<sup>3</sup> | Abiramasundari.S<sup>4</sup>

<sup>1</sup>(Department of ECE, Anna University, VSBCETC, Coimbatore, India, suganya.ece07@gmail.com)
 <sup>2</sup>(Department of ECE, Anna University, VSBCETC, Coimbatore, India, veejaylatha@gmail.com)
 <sup>3</sup>(Department of ECE, Anna University, VSBCETC, Karur, India, naveentamil256@gmail.com)
 <sup>4</sup>(Department of ECE, Anna University, VSBCETC, Coimbatore, India, abiramasundariece@gmail.com)

**Abstract**— The propounded work brings a new parallel FIR filter structures beneficial to symmetric coefficients in terms of the hardware cost, under the situation that the taps in the filter structure is a multiple of 2 or 3. The propounded parallel FIR structures exploits the inherency of symmetric coefficients breaking down the number of multipliers in sub-filter section at the expense of additional adders in the processing blocks to half the number. Upon interchanging multipliers with adders is recommended because adders weigh less on comparing to multipliers in terms of silicon area; in addition the aloft from the additional adders in pre processing and post processing blocks remains unmoved and do not increase along with the length of the FIR filter, whereas the number of reduced multipliers enlarges along with the length of the FIR filter. For example, in a four-parallel 72-tap filter, the propounded structure rescues 216 multipliers at the outlay of 11 adders. On the whole, the proposed parallel FIR structures leads to significant hardware reduction for symmetric convolutions from the current FFA parallel FIR filter, especially when the length of the filter is large.

Keywords—Digital Signal Processing (DSP); Fast finite-Impulse Response (FIR) Algorithms (FFAs); Co-ordinate FIR; Symmetric-Convolution; Very Large Scale Integration (VLSI)

## 1. INTRODUCTION

Due to the tremendous growth in multimedia application, the insistence for high-performance and low-power digital signal processing (DSP) is getting sophisticated. Finiteimpulse response (FIR) digital filters are one of the most widely used elemental devices performed in DSP systems, fluctuating from wireless communications to video and image processing. Some applications need the FIR filter to function at high frequencies such as video processing whereas other applications request high turnout with a lowpower circuit equally as multiple-input multiple-output (MIMO) systems used in cellular wireless communication. Furthermore when cramped progression-band characteristics are needed, the higher order in the FIR filter is unavoidable. For example, a 576-tap digital filter used in video ghost canceller for broadcast television reduces the effect of multipath signal imitation. On the other contrary, parallel and pipelining are the two techniques used in DSP applications which can both be exploited to reduce the power utilization. Pipelining shortens the critical path by interposing the pipelined latches along the data path, at the price of increasing the number of latches and the system dormancy whereas parallel processing increases the inspecting rate by replicating hardware so that multiple inputs can be processed in parallel and outputs are generated at the expense of increased area. Both the techniques reduce the power consumption by decreasing the supply voltage where the sampling speed remains constant. In this paper, parallel processing in the digital FIR filter structures are discussed. Due to its continuous increase in the hardware decapitation cost conducted by the increase in the block size L the parallel processing its advantage practical technique gives up in

implementation. There are papers proposing a number of ways to reduce the complexity of the parallel FIR filter in the past [1]–[5]. In [1]–[4], poly-phase decomposition is mainly employed, where the tiny parallel FIR filter structures are acquired first and then the bulkier structures are constructed by cascading or iterating small-sized parallel FIR filtering blocks. Fast FIR algorithms (FFAs) initiated in [1]-[3] shows that it can furnish a L-parallel filter using roughly (2L-1) sub-filter blocks each of about length N/L. FFA architectures strongly crack the restraint that the hardware utilization cost of a parallel FIR filter has a continuous development along with the block size L. It can reduce the required number of multipliers to (2N-N/L) from LXN. In [5], the fast linear convolutions are utilized to develop the tiny filter structures and then a long convolution is broken down into several short convolutions, i.e., greater block-sized filter structures are build through iterations of the tiny filtering structures. Nonetheless in both categories when it reaches to symmetric convolutions the similarity of coefficients is not taken into consideration for the design of structures which can lead to a momentous preserving in hardware cost. In this paper, a new parallel FIR filter structures based on FFA subsisting of auspicious poly-phase decompositions, which reduces the amount of multiplications in the sub-filter section by exploiting the ingrained nature of the symmetric coefficients contradicted to the current FFA fast parallel FIR filter structure are provided. This paper is organized as follows. Detailed introductions of FFA structures are given in Section II. In Section III, the prospective parallel FIR filter structures are presented. Section V gives the conclusion.



IJRE - International Journal of Research in Electronics

# 2. FAST FIR ALGORITHM (FFA)

Examine an N-tap FIR filter expressed in the general form as

$$Y(n) = \sum_{i=0}^{N-1} h(i) x(n-i), n=0, 1, 2, 3... \infty$$
(1)  
where {x (n)} is an boundless length input sequence and {h  
(i)} is the length-N FIR filter coefficient. Then the  
conventional L-parallel FIR filter can be imitative using  
poly-phase decomposition as [3]

$$\sum_{P=0}^{L-1} Yp(zl)z^{\wedge} - p = \sum_{P=0}^{L-1} Yp(zl)z^{\wedge} - q$$
 (2)  
for p, q, r = 0,1,2....L-1. From this FIR filtering equation, it shows that the conventional FIR filter will require L- FIR sub-filter blocks of length N/L for implementation.

A. 
$$2X2 \ FFA \ (L=2)$$

According to (2) a two-parallel FIR filter structure can be revealed as

 $Y_0+Z^{-1}Y_1=(H_0+Z^{-1}H_1)(X_0+z^{-1}X_1)$ 

$$=H_0X_0+z^{-1}(H_0X_1+H_1X_0)+z^{-2}H_1X_1$$
(3)  
Implying that

 $Y_0 = H_0 X_0 + z^{-2} H_1 X_1 Y_1 = H_0 X_1 + H_1 X_0$ (4)Equation (4) shows the conventional two-parallel filter structure which will lack 4 length-N/2 FIR sub filter blocks two pre processing adders, and totally 2N multipliers and



Fig.1. Two parallel FIR filter implementation using FFA



Fig.2.Three parallel FIR filter implementation using FFA However, (4) can be written  $Y_0 = H_0 X_0 + Z^{-2} H_1 X_1$ 

 $Y_1 = (H_0 + H_1) (X_0 + X_1) - H_0 X_0 - H_1 X_1$ 

The implementation of (5) lacks three FIR sub filter blocks of dimension N/2, one pre-processing and three post processing adders and 3N/2 multipliers and 3(N/2-1)+4 adders which scales down approximately 1/4 over the traditional two-parallel filter hardware cost from (4). The two-parallel (L==2) FIR filter utilization using FFA obtained from (5) is shown in Fig. 1.

## B. 3X3 FFA (L=3)

By the related access, a three-parallel FIR filter using FFA can be asserted as

**Research script | IJRE** Volume: 04 Issue: 01 January 2017

$$Y_{0}=H_{0}X_{0}-z^{-3}H_{2}X_{2}+z^{-3}X [(H_{1}+H_{2}) (X_{1}+X_{2})-H_{1}X_{1}]$$

$$Y_{1}=[(H_{0}+H_{1}) (X_{0}+X_{1})-H_{1}X_{1}]-(H_{0}X_{0}-z^{-3}H_{2}X_{2})]$$

$$Y_{2}=[(H_{0}+H_{1}+H_{2}) (X_{0}+X_{1}+X_{2})]$$

$$-[(H_{0}+H_{1}) (X_{0}+X_{1})-H_{1}X_{1}]$$

$$-[(H_{1}+H_{2}) (X_{1}+X_{2})-H_{1}X_{1}]$$
(6)
The herdware utilization of (6) holes on longth N(2, E)

The hardware utilization of (6) lacks six length-N/3 FIR sub-filter blocks, 3- pre processing and 7- post processing adders, and three N multipliers and 2N+4adders which reduced approximately  $1/3^{rd}$  over the conventional threeparallel filter implementation cost. The exertion structure obtained from (6) is shown in Fig. 2.

# 3. PROPOUNDED FFA STRUCTURES FOR SYMMETRIC CONVOLUTIONS

To exploit the similarity of coefficients the main idea behind the proposed structures are actually pretty perceptive, manipulating the poly-phase decomposition to gain as many sub-filter blocks as possible containing symmetric coefficients so that limited statistic of multiplications in the single sub-filter block can be reiterated for the multiplications of whole taps which is similar to the case that agreed set of symmetric coefficients require exclusively half the filter length of would multiplications in a single FIR filter. Accordingly for an Ntap L-parallel FIR filter the total amount of conserved multipliers would be the number of sub-filter blocks that encompass symmetric coefficients times partly the statistic of multiplications in a single sub-filter block (N/2L).

## A. 2X2 Proposed FFA structure (L = 2)

From (4), a two-parallel FIR filter structure is also written as



 $Y_{0} = \{ \frac{1}{2} [(H_{0} + H_{1})(X_{0} + X_{1}) + (H_{0} - H_{1})(X_{0} - X_{1})] - H_{1}X_{1} \} + z^{2}H_{1}X_{1},$  $Y_1 = 1/2[(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)]$ (7)If we consider a set of even symmetric coefficients, (7) one more sub-filter block containing symmetric coefficients than (5) the actual FFA parallel FIR filter is obtained. Fig. 3 speaks for the implementation of the proposed 2-parallel FIR filter based on (7).

Example 1: Consider a 24-tap FIR filter with a set of symmetric coefficients applied to the proposed 2-parallel FIR filter

 $\{h(0), h(1), h(2), h(3), h(4), h(5), h(6), h(6$ 

where

(5)

h(0)=h(23),h(1)=h(22),h(2)=h(21)...h(11)=h(12),applying it to the proposed two-parallel FIR filter structure, and the above two sub filter blocks will be as

 $H_0 \pm H_1 = \{h (0) \pm h (1), h (2) \pm h (3) h (4) \pm h(5), h(6) \pm h(6) \}$  $h(7)...h(8) \pm h(9),$ 

 $h(20) \pm h(21), h(22) \pm h(23)$ 

Where

 $h(0) \pm h(1) = \pm(h(22) \pm h(23))$  $h(2) \pm h(3) = \pm (h(20) \pm h(21))$ 

# © Researchscript.com



$$h(4) \pm h(5) = \pm(h(18) \pm h(19))$$

$$h(6) \pm h(7) = \pm(h(16) \pm h(17))$$

As observed from the example raised, two of three subfilter blocks from the recommended two-parallel FIR filter structure,  $H_0$ - $H_1$  and  $H_0$ = $H_1$  prevail with symmetric coefficients now as (8) which aids that the sub-filter block has been accomplished by Fig. 4, with only half the number of multipliers required. Each of the output of multipliers responds to two taps. Note that the reordered direct-form FIR filter is hired. Correlated to the existing FFA twoparallel FIR filter structure the newly said FFA structure leads to one more sub filter block containing symmetric coefficients. However, the amount of adders in preprocessing and post processing blocks are more in number. In this caisson, two additional adders are required for L=2.



Fig. 4. Sub-filter block implementation with symmetric coefficients



Fig.5. Suggested 3-parallel FIR filter implementation



Fig.6.Comparison of sub-filter blocks between actual FFA and the suggested FFA three parallel FIR structure.



Fig.7.Proposed four parallel FIR filter implementation

# B. 3x3 Proposed FFA (L = 3)

With the identical access from (6), a 3-parallel FIR filter can also be drafted as (9). Fig.5 shows application of the suggested three-parallel FIR filter. When the number of symmetric coefficients N is the multiple of 3, the recommended three-parallel FIR filter structure conferred in (9) empowers four sub-filter blocks with symmetric coefficients in total, whereas the current FFA parallel FIR filter structure has only two ones out of six sub-filter blocks. A comparison figure is shown in Fig. 6, where the obscurity blocks stands for the sub-filter blocks which contain

$$\begin{split} Y_0 &= 1/2 [(H_0 + H_1)(X_0 + X_1) + (H_0 - H_1)(X_0 - X_1)] - H_1 X_1 \\ &+ z^{-3} \{ (H_0 + H_1 + H_2)(X_0 + X_1 + X_2) - (H_0 + H_2)(X_0 + X_2) \\ &- 1/2 [(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)] - H_1 X_1 \} \\ Y_1 &= 1/2 [(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)] \\ &- z^{-3} \{ 1/2 [(H_0 + H_2)(X_0 + X_2) + (H_0 - H_2)(X_0 - X_2)] \end{split}$$

$$-1/2[(H_0+H_1)(X_0+X_1)+(H_0-H_1)(X_0-X_1)]+H_1X_1\}$$

### C. Suggested Descending FFA

The suggested descending process for the bulkier recommended parallel FIR filter is similar to that popularized in [1]. However, a small alteration is endorsed here for lower hardware consumption. As we can see that the proposed parallel FIR structure enables the rephrase of multipliers in sectors of the sub-filter blocks but it also imports more adder cost in pre processing and post processing blocks. When descending the proposed FFA parallel FIR structures for greater parallel block factor L, the increase of adders can grow into larger. Therefore other than applying the suggested FFA FIR filter structure to all the decomposed sub-filter blocks, the actual FFA structures which have more compact operations in pre-processing and post processing blocks were employed for those sub-filter blocks that accommodate no symmetric coefficients whereas the proposed FIR filter structures are still applied to the rest of sub filter blocks with symmetric coefficients. An interpretation of the recommended cascading process for a four-parallel FIR filter (L=4) as an example is shown in Fig. 7, From Fig. 7 it is clarion to see that the proposed four-parallel FIR structure earns three more subfilter blocks containing symmetric coefficients than the current FFA one which means 3N/8 multipliers can be saved for an N-tap FIR filter at the liability of 11 additional adders in preprocessing and post processing blocks. By this descending accession parallel FIR filter structures with greater block factor L can be realized. The proposed six-parallel FIR filter will result in 6 more symmetric subfilter blocks, equivalently N/2 multipliers saved for an N-tap FIR filter than the current FFA at the expense of an additional 32 adders. Further the proposed eight-parallel FIR filter will lead to seven more symmetric subfilter blocks, equivalently 7N/16 multipliers saved for an N-tap filter, than the existing FFA, with the aloft of additional 54 adders. When

## © Researchscript.com



IJRE - International Journal of Research in Electronics

an L-parallel FIR filter comes with a set of symmetric coefficients of length N, the number of mandatory multipliers for the proposed parallel FIR filter structures is provided by (10) and (11)

Case 1:

When 
$$N/\prod_{i=1}^{r} L_i - 1$$
 is even  
 $M = N/\prod_{i=1}^{r} L_i - 1 (\prod_{i=1}^{r} M_i - S/2)$  (10)  
Case 2:

When  $N/\prod_{i=1}^{r} L_i$ -1 is odd

 $M = N / (\prod_{i=1}^{r} L_i - 1) (\prod_{i=1}^{r} M_i - S/2) (N / \prod_{i=1}^{r} L_i - 1)$ (11)

 $L_i$  is the small parallel block size such as (2x2) or (3x3) FFA. r is the number of FFAs used.  $M_i$  the number of sub filter blocks resulted from i-th FFA. S is the number of sub filter blocks containing symmetric coefficients. The number of the demanded adders in sub filter section can be accustomed by

$$A_{sub} = \prod_{i=1}^{r} M_i \left[ (N / \prod_{i=1}^{r} L_i) - 1 \right]$$
(12)

## 4. EXPERIMENTAL RESULTS

The existing FFA structures and the proposed FFA structures have been implemented in Verilog with word length 16-bit and filter length of 24. Carry Save, Carry Select, Binary to Excess-1 is used to implement the sub-filter blocks. Parallel FIR Filter Structure simulation result is shown in Fig.8.Detailed comparison of area, LUT's, Power, Delay, Maximum Frequency are shown in Table I, Table II, Table III, Table IV, Table V.



Fig.8. Parallel FIR Filter Simulation Result

TABLE.1. COMPARISON OF AREA

|                  |                 | Area                   |                          |                                            |                                   |
|------------------|-----------------|------------------------|--------------------------|--------------------------------------------|-----------------------------------|
| Length           | Structure       | Carry<br>Save<br>Adder | Carry<br>Select<br>Adder | Square<br>root<br>carry<br>select<br>adder | Binary<br>to<br>excess<br>1 adder |
| 24 -tap<br>(L=2) | Existing<br>FFA | 42417                  | 35502                    | 33369                                      | 31881                             |
| 24 -tap<br>(L=2) | Proposed<br>FFA | 24437                  | 28853                    | 26282                                      | 26264                             |
| 24 -tap<br>(L=2) | Existing<br>FFA | 78587                  | 77333                    | 74791                                      | 78407                             |
| 24 -tap<br>(L=2) | Proposed<br>FFA | 49782                  | 50494                    | 49164                                      | 49080                             |

Research script | IJRE Volume: 04 Issue: 01 January 2017 TABLE.2. COMPARISON OF LUT'S

|                  |                 | LUT's                  |                          |                                            |                                |  |
|------------------|-----------------|------------------------|--------------------------|--------------------------------------------|--------------------------------|--|
| Length           | Structure       | Carry<br>Save<br>Adder | Carry<br>Select<br>Adder | Square<br>root<br>carry<br>select<br>adder | Binary to<br>excess 1<br>adder |  |
| 24 -tap<br>(L=2) | Existing<br>FFA | 5413                   | 4375                     | 4163                                       | 3904                           |  |
| 24 -tap<br>(L=2) | Proposed<br>FFA | 2941                   | 3673                     | 3258                                       | 3256                           |  |
| 24 -tap<br>(L=2) | Existing<br>FFA | 9454                   | 9088                     | 8623                                       | 9353                           |  |
| 24 -tap<br>(L=2) | Proposed<br>FFA | 6146                   | 6028                     | 5827                                       | 5820                           |  |

TABLE.3. COMPARISON OF POWER

| Length           | Structure       | Power (mW)             |                          |                                            |                                |
|------------------|-----------------|------------------------|--------------------------|--------------------------------------------|--------------------------------|
|                  |                 | Carry<br>Save<br>Adder | Carry<br>Select<br>Adder | Square<br>root<br>carry<br>select<br>adder | Binary<br>to excess<br>1 adder |
| 24 -tap          | Existing        |                        |                          |                                            |                                |
| (L=2)            | FFA             | 3250                   | 2869                     | 2608                                       | 2402                           |
| 24 -tap<br>(L=2) | Proposed<br>FFA | 2356                   | 2440                     | 2421                                       | 2399                           |

TABLE.4. COMPARISON OF DELAY

|                     |                 | Delay(ns)              |                          |                                            |                                |
|---------------------|-----------------|------------------------|--------------------------|--------------------------------------------|--------------------------------|
| Length              | Structure       | Carry<br>Save<br>Adder | Carry<br>Select<br>Adder | Square<br>root<br>carry<br>select<br>adder | Binary to<br>excess 1<br>adder |
| 24 -tap             | Existing        |                        |                          |                                            |                                |
| (L=2)               | FFA             | 34.849                 | 34.849                   | 34.849                                     | 34.849                         |
| 24 -tap             | Proposed        |                        |                          |                                            |                                |
| (L=2)               | FFA             | 37.109                 | 33.905                   | 36.080                                     | 36.190                         |
| 24 -tap             | Existing        |                        |                          |                                            |                                |
| (L=2)               | FFA             | 34.849                 | 34.849                   | 34.849                                     | 34.849                         |
| 24 - tap<br>(I = 2) | Proposed<br>FFA | 37.584                 | 37,135                   | 38 486                                     | 38 659                         |

TABLE.5. COMPARISON OF FREQUENCY (MHZ)

|                  |                 | Μ                      | Maximum Frequency(MHz)   |                                            |                                |  |
|------------------|-----------------|------------------------|--------------------------|--------------------------------------------|--------------------------------|--|
| Length           | Structure       | Carry<br>Save<br>Adder | Carry<br>Select<br>Adder | Square<br>root<br>carry<br>select<br>adder | Binary to<br>excess 1<br>adder |  |
| 24 -tap<br>(L=2) | Existing<br>FFA | 79.73                  | 129.33<br>6              | 137.98                                     | 120.525                        |  |
| 24 -tap<br>(L=2) | Proposed<br>FFA | 59.96                  | 137.91<br>2              | 125.188                                    | 137.912                        |  |

# 5. CONCLUSION

In this paper, we have conferred advanced parallel FIR filter structures which are profitable to symmetric convolutions when the sum of taps is the collective of 2 or 3. Multipliers are the dominant excerpt in hardware expenditure for the parallel FIR filter implementation. The

### © Researchscript.com



suggested new structure accomplishes the essence of even symmetric coefficients and save a momentous amount of multipliers at the obligation of additional adders. Since multipliers surpass adders in hardware cost, it is lucrative to exchange multipliers with adders. Further the number of increased adders stops over still when the length of FIR filter becomes large whereas the number of diminished multipliers boosts along with the length of FIR filter. As a consequence the larger the length of FIR filters is the more the proposed structures can save from the actual FFA structures with respect to the hardware cost. Comprehensively in this paper we have contributed a new parallel FIR structures consisting of auspicious poly-phase decompositions handling with symmetric convolutions analogously better than the existing FFA structures in terms of hardware utilization.

#### REFERENCES

- D. A. Parker and K. K. Parhi, "Low-area/power parallel FIR digital filter implementations," *J. VLSI Signal Process. Syst.*, vol. 17, no. 1, pp. 75–92, 1997.
- [2] J. G. Chung and K. K. Parhi, "Frequency-spectrum-based lowarea low-power parallel FIR filter design," *EURASIP J. Appl. Signal Process.*, vol. 2002, no. 9, pp. 444–453, 2002.
- [3] K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
- [4] Z.-J. Mou and P. Duhamel, "Short-length FIR filters and their use in fast nonrecursive filtering," *IEEE Trans. Signal Process.*, vol. 39, no.6, pp. 1322–1332, Jun. 1991.
- [5] J. I. Acha, "Computational structures for fast implementation of L-path and L-block digital filters," *IEEE Trans. Circuit Syst.*, vol. 36, no. 6, pp.805–812, Jun. 1989.