This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Synthesis and Array Processor Realization of a 2-D IIR Beam Filter for Wireless Applications

Rimesh M. Joshi, Student Member, IEEE, Arjuna Madanayake, Member, IEEE, Jithra Adikari, Member, IEEE, and Len T. Bruton, Fellow, IEEE

Abstract: A broadband digital beamforming algorithm is proposed for directional filtering of temporally-broadband bandpass space-time plane-waves at radio frequencies (RFs). The enhancement of desired waves, as well as the rejection of undesired interfering plane-waves, is simulated. A systolic- and wavefront-array architecture is proposed for the real-time implementation of second-order spatially-bandpass (SBP) 2-D infinite impulse response (IIR) beam filters having potential applications in broadband beamforming of temporally down-converted RF signals.

Owing to their higher speed of operation and potentially reduced power consumption compared with conventional synchronous hardware, asynchronous wavefront-array processors (WAPs) have emerging applications in radio-astronomy, radar, navigation, space science, cognitive radio, and wireless communications. Further, the improvement in bit error rate (BER) performance, along with the reduced computational complexity, of the 2-D IIR SBP frequency-planar digital filter over a digital phased array feed (PAF) beamformer is demonstrated.

A nominal BER versus signal-to-interference ratio (SIR) gain of 10–16 dB compared to the case where beamforming is not applied, and a gain of 2–3 dB at approximately half the number of parallel multipliers of the digital PAF, are observed. The results of application-specific integrated circuit (ASIC) synthesis of the digital filter designs are also presented.

Index Terms: Array processors, bit error rate (BER), digital phased array feed (PAF), field-programmable gate array (FPGA), multidimensional digital filters, spatial modulation, systolic, wavefront, wireless.

Manuscript received May 25, 2011; revised September 05, 2011; accepted October 13, 2011. R. M. Joshi and A. Madanayake are with the Department of Electrical and Computer Engineering, University of Akron, Akron, OH 44325-3904 USA. J. Adikari is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada. L. T. Bruton is with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2011.2174167

I. INTRODUCTION

ULTRA-WIDEBAND (UWB) wireless communications [1]–[4], cognitive radio [5]–[8], and cooperative wireless sensor networks [9], [10] require highly directional and electronically steerable smart antenna arrays capable of broadband plane-wave (PW) filtering at RFs to improve the bit-error rate (BER), which is degraded by interference from multiple users and by multipath fading. These antenna arrays typically employ beamforming using analog delay-and-sum networks, fractional-delay based delay-and-sum digital networks [1], digital phased array feeds (PAFs) [11]–[13], and multi-dimensional finite impulse response/infinite impulse response (FIR/IIR) digital filters [5], [14]. Digital signal processing (DSP)-based broadband smart antenna arrays have potential applications in UWB wireless communications [1], [2], [15], cognitive radio [5]–[8], software-defined radio [16], microwave imaging [17], space science and radio astronomy [18]–[21], and remote-sensing and navigation [22], [23].

The systolic-array and scanned-array implementations of 2-D and 3-D IIR broadband frequency-planar filters for digital beamforming have been proposed in [25]–[27]. These filters are highly suitable for high-speed filtering of broadband space-time (ST) PWs based on their directions of arrival (DOAs). For example, a 2-D IIR beam filter has recently been practically verified for balanced antipodal Vivaldi antennas (BAVAs) [3], [28] using non-real-time software algorithms.

We propose a second-order 2-D IIR digital filter for the directional enhancement of temporally-broadband bandpass ST PWs [5], [29] (see Fig. 1). It is shown that the filter operates at an intermediate frequency (IF), leading to lower-speed VLSI circuits. We show that the proposed filters have lower computational complexity than conventional delay-and-sum beamformers (approximately 70% fewer multipliers for similar performance [30], [31]) and lower circuit complexity than 2-D FIR beamformers such as fan and trapezoidal filters [29], [32]–[34].

The lower computational complexity, closed-form design approach, broadband performance, electronic steerability, and availability of rapidly reconfigurable programmable logic realizations make these emerging 2-D ST digital filters attractive and promising for cognitive radio applications [5]–[8]. Massively-parallel systolic-array and wavefront-array architectures are proposed for the real-time VLSI implementation of the proposed digital filter. Systolic-array processors are well known for the implementation of real-time high-throughput beamforming algorithms [35]–[37].

These processors consist of arrays of identical processors that are highly modular, regular, and highly interconnected, making them suitable for VLSI realizations in high-speed applications, especially at RF [38]. This paper presents the BER performance of a second-order 2-D IIR spatially-bandpass (SBP) beam filter [39] and compares it with that of a digital PAF beamformer for a 16-element uniform linear array (ULA). Asynchronous implementations, which seek to address the design complexity, power consumption, and timing issues affecting modern digital circuits [40], are also explored here by extending the direct-form-I realization of a second-order 2-D IIR SBP beam filter using novel clock-free asynchronous quasi delay insensitive (a-QDI) logic devices [40]. The removal of the global clock leads to a reduction in design complexity, lower chip area, lower power consumption, and increased speed of operation compared to synchronous implementations [40].

1063-8210/$26.00 © 2011 IEEE

The wavefront-array [41]–[43] a-QDI architecture of the second-order IIR filter is implemented on a Speedster SPD60 asynchronous field-programmable gate array (FPGA) from Achronix Semiconductor [44], which uses so-called picoPIPE acceleration technology [44], [45] to deliver improved speed performance.

Consider a time-varying signal propagating in the far field in 3-D space. This signal can be approximated by a 4-D ST PW signal [46] of the form given by (1), in which a unit vector specifies the DOA in 3-D space, a 1-D temporal intensity function describes the signal along the DOA, and the speed of propagation is expressed in m/s.

When a 4-D continuous ST PW signal is received by a ULA of uniformly spaced sensors, it reduces to a 2-D ST PW signal whose spatial DOA is defined by the angle between the normal to the array axis and the normal to the 2-D wavefront, as shown in Fig. 1. The time-synchronously sampled 2-D ST PW, obtained using an array of analog-to-digital converters (ADCs) driven by a common sampling clock and represented as in [47], is given by (2). The two components in (2) represent the desired ST PW and the undesired interference from multiple users with additive noise.
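As an illustration of the sampled 2-D ST PW described above, the following minimal sketch generates the sensor-by-time sample array for a far-field plane wave on a ULA. It is not taken from the paper: the function names, the pulse shape `exp(-a t^2) cos(2 pi fc t)`, the delay model `spacing * sin(DOA) / c` with the DOA measured from broadside, and all numerical values are assumptions chosen for the example.

```python
import numpy as np

C = 3.0e8  # assumed free-space propagation speed in m/s


def gmc_pulse(t, fc, a):
    """Hypothetical Gaussian-modulated cosine pulse: exp(-a t^2) cos(2 pi fc t)."""
    return np.exp(-a * t**2) * np.cos(2.0 * np.pi * fc * t)


def sampled_plane_wave(n_sensors, n_samples, fs, spacing, doa_deg, fc, a, t0):
    """Return x[n1, n2] for sensor index n1 and time index n2.

    Assumed model: adjacent sensors see the same waveform delayed by
    spacing * sin(doa) / C, with the DOA measured from array broadside.
    """
    tau = spacing * np.sin(np.radians(doa_deg)) / C  # inter-sensor delay in seconds
    n1 = np.arange(n_sensors)[:, None]
    n2 = np.arange(n_samples)[None, :]
    t = n2 / fs - n1 * tau - t0
    return gmc_pulse(t, fc, a)


fs = 2.5e9                 # illustrative sampling rate
spacing = 2.0 * C / fs     # chosen so that a 30 degree DOA gives a one-sample delay
params = dict(fc=0.5e9, a=2.0e17, t0=20e-9)  # illustrative pulse parameters

x = sampled_plane_wave(8, 128, fs, spacing, 30.0, **params)
broadside = sampled_plane_wave(8, 128, fs, spacing, 0.0, **params)
```

With these values each row of `x` is the previous row delayed by one sample, while a broadside wave produces identical rows; this is the spatio-temporal structure that the beam filter exploits.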

Let us consider Gaussian modulated cosine (GMC) signals given by (3), defined by a carrier frequency and a constant chosen such that the signal has the required bandwidth. The desired 2-D ST PW signal is therefore given by (4), which depends on the DOA of the desired PW. Similarly, the undesired signals at the communication receivers often contain noise and interference from other signals. For a given number of interfering signals, each with its own DOA, the undesired signal can be expressed as (5).

II. 2-D IIR SBP BEAM FILTER

The second-order 2-D IIR SBP digital beam filter [39] has been proposed for implementation in RF smart antenna applications for filtering temporally-broadband bandpass signals obtained from a down-converted (or bandpass sampled [48]) array of antennas. These 2-D IIR filters have possible new applications in wireless communication base-stations [49] due to their high directional selectivity and temporally broadband nature, as well as being fully steerable and free from fractional delays. The digital implementation of the second-order 2-D IIR SBP frequency-planar filter is shown in Fig. 1. The broadband PWs received by the linear array of sensors are low-noise amplified (LNAd), bandpass filtered (BPFd), synchronously down-converted to baseband and low-pass filtered (LPFd), uniformly and synchronously time-sampled and amplitude-quantized, and then finally digitally processed using the 2-D IIR SBP beam filter. Alternatively, we could employ bandpass sampling at the LNA output using a BPF, thereby removing the need for down-conversion [48].

Fig. 1. Second-order 2-D IIR spatially-bandpass digital beam filter for asynchronous FPGA implementation [24] showing a ULA of BAVAs [3] and an SC block at each antenna (consisting of LNA, local oscillator for down-conversion, LPF for image rejection, and ADC). The outputs from the SC blocks are fed to an asynchronous FPGA implementing an array of PPCMs (described in detail in Figs. 8 and 10) for filtering a plane-wave having the desired DOA. The uncertainty in the DOA is indicated by an angle in the figure.

A. ROS of the Space-Time Plane-Wave

The region of support (ROS) of the 2-D frequency response of the ST PW lies along a line, oriented at an angle, passing through the center of the 2-D space-time frequency plane defined by the spatial and temporal frequencies [50]. This angle is referred to as the spatio-temporal (ST) DOA of the PW. If there is some uncertainty in the spatial DOA of the PW, then the 2-D ROS of the ST PW occupies a trapezoidal region as shown in Fig. 2, reflecting the variation in the spatio-temporal DOA due to the uncertainty in the spatial DOA. Temporal down-sampling causes the ST DOA to change to an angle given by [29] (6). The trapezoidal ROS [29] of the broadband ST PW following down-conversion and down-sampling is shown in Fig. 2, which also indicates the spatial shift [39] in the frequency of the PW (4).

Fig. 2. (a) 2-D frequency-domain ROS of the broadband bandpass plane-wave and (b) the ROS of the plane-wave following down-conversion and down-sampling.

B. Shape of the 2-D Filter Passband

A double-trapezoidal 2-D FIR filter has been proposed in [5], [29] for filtering temporally-broadband bandpass PWs (shown in Fig. 2). These FIR filters achieve high-performance directional enhancement of band-limited signals but are of very high order (typically 32 [5]), leading to a higher circuit complexity compared to the proposed IIR counterparts. The 2-D IIR ST digital beam filter encompasses the desired trapezoidal passband of the PW, centered on the shifted spatial frequency as shown in Fig. 2(b), while achieving a much lower computational complexity compared to the FIR filters [5].

C. Second-Order 2-D IIR SBP Beam Filter

The 2-D IIR SBP beam filter is an extension of a 2-D IIR broadband frequency-planar filter [25] for carrier-modulated signals, capable of selectively enhancing broadband PWs depending on their DOAs. The transfer function (TF) of a 2-D IIR frequency-planar beam filter [25], based on a first-order resistively terminated passive prototype network that is practical bounded-input bounded-output (P-BIBO) stable [50], [51], is given by (7), with coefficients defined in (8), in which one parameter sets the selectivity of the filter. The 2-D frequency response of the IIR beam filter (7) has an ROS along a line centered on the origin of the 2-D frequency axes.

IF broadband beamforming applications, on the other hand, require beam-shaped 2-D passbands centered on a particular spatial frequency other than the 2-D frequency origin (0,0), as is the case here for the GMC PW signal. The TF of the second-order 2-D IIR SBP filter [39] in partially separable form, described later in (10), can therefore be obtained by applying spatial modulation to the impulse response of (7), that is, by multiplying the impulse response by a modulating sequence determined by the desired spatial shift from the center of the 2-D frequency spectrum [14], [39]. Applying spatial modulation to the impulse response of the filter (7), we get the desired impulse response of the spatially modulated 2-D IIR filter (9). Using the z-transform property for modulated sequences [48], and after simplification, we obtain the TF of the second-order 2-D IIR SBP frequency-planar beam filter [39] as (10), with the associated definitions given in (11).
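The relationship between the first-order frequency-planar prototype and its spatially modulated version can be explored numerically. The sketch below is not the paper's equations (7)–(11), which are not reproduced in this text. It derives the prototype by applying the bilinear transform to an assumed resistively terminated network `R / (R + s1*L1 + s2*L2)` with the assumed choice `L1 = cos(theta)`, `L2 = sin(theta)`, and it assumes cosine modulation along the spatial index. All function names and numerical values are illustrative.

```python
import numpy as np


def prototype_coeffs(theta_deg, R):
    """Denominator coefficients b[i, j] and gain of the assumed first-order prototype."""
    L1 = np.cos(np.radians(theta_deg))  # assumed inductance for the spatial variable z1
    L2 = np.sin(np.radians(theta_deg))  # assumed inductance for the temporal variable z2
    total = R + L1 + L2
    b = np.array([[(R + (-1) ** i * L1 + (-1) ** j * L2) / total
                   for j in range(2)] for i in range(2)])
    return b, R / total


def prototype_response(b, gain, w1, w2):
    """Frequency response at spatial frequency w1 and temporal frequency w2."""
    z1, z2 = np.exp(-1j * w1), np.exp(-1j * w2)  # z^-1 evaluated on the unit circle
    num = gain * (1 + z1) * (1 + z2)
    den = b[0, 0] + b[1, 0] * z1 + b[0, 1] * z2 + b[1, 1] * z1 * z2
    return num / den


def modulated_response(b, gain, w1, w2, shift):
    """Response after multiplying h[n1, n2] by cos(shift * n1) (assumed modulation)."""
    return 0.5 * (prototype_response(b, gain, w1 - shift, w2)
                  + prototype_response(b, gain, w1 + shift, w2))


def prototype_filter(x, b, gain):
    """Direct 2-D recursion of the prototype under zero initial conditions."""
    rows, cols = x.shape
    xp = np.pad(x.astype(float), ((1, 0), (1, 0)))  # zero border models the ZICs
    y = np.zeros_like(xp)
    for n1 in range(1, rows + 1):
        for n2 in range(1, cols + 1):
            ff = xp[n1, n2] + xp[n1 - 1, n2] + xp[n1, n2 - 1] + xp[n1 - 1, n2 - 1]
            fb = (b[1, 0] * y[n1 - 1, n2] + b[0, 1] * y[n1, n2 - 1]
                  + b[1, 1] * y[n1 - 1, n2 - 1])
            y[n1, n2] = gain * ff - fb
    return y[1:, 1:]


b, gain = prototype_coeffs(theta_deg=30.0, R=0.05)
impulse = np.zeros((4, 4))
impulse[0, 0] = 1.0
h = prototype_filter(impulse, b, gain)
```

Under these assumptions the prototype has unit gain along a line through the frequency origin, and the modulated response moves that passband to the chosen spatial frequency `shift`, which is the behaviour the SBP filter is designed to provide. Because cosine modulation of a first-order recursion produces a second-order one, the sketch is also consistent with the "second-order" label used for the SBP filter.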

Fig. 3. Magnitude frequency response of the second-order 2-D IIR SBP beam filter for the chosen design parameters. Here, the ST DOA is given by (6).

Fig. 4. 2-D magnitude frequency response of the input signal containing the desired signal having spatial DOA 30° and two other interfering waves having DOAs 10° and 60°.

Here, the TF relates the 2-D discrete input to the 2-D discrete output. Therefore, the complete TF of the second-order 2-D IIR SBP filter is given by (12). The filter coefficients, as computed in [14], [39], are given by (13).

Fig. 5. 2-D magnitude frequency response of the filtered output signal showing the directional enhancement of the desired PW.

In (13), the filter coefficients are expressed in terms of the required spatio-temporal DOA (after down-sampling) and the beam-width (selectivity), together with the desired spatial frequency shift from the origin of the 2-D frequency spectrum. The magnitude frequency response of the second-order 2-D IIR SBP beam filter [39], as given by the TF (12), is shown in Fig. 3; it encompasses the trapezoidal ROS of the desired down-converted down-sampled (DCDS) PW shown in Fig. 2.

III. VERIFICATION: AN EXAMPLE OF BROADBAND INTERFERENCE REJECTION

Let us consider a ULA and a partially-broadband GMC PW signal (3) having a spatial DOA of 30° (see Fig. 4), a single-sided bandwidth of 250 MHz (double-sided bandwidth of 500 MHz), and a sampling frequency of 2.5 GHz. The chosen PW signal has a fractional bandwidth (FBW) of 50%. Two other interfering PWs, identical to the desired PW except for their spatial DOAs of 10° and 60°, are considered to verify the rejection of undesired PWs. The signal received (2) by the ULA (considering a noise-free case for simplicity) is therefore given by (14), which involves the sample delay between sensors.

The received signal (14) is down-converted using oscillators at 1 GHz, low-pass filtered for image rejection, and down-sampled before being fed to the filter. Here, a down-sampling factor of 5 is used, which reduces the required clock frequency to 500 MHz and expands the baseband spectrum of the input signal by 5 times, as shown in Fig. 4. The 2-D magnitude frequency response of the resulting output signal from the second-order 2-D IIR SBP filter is shown in Fig. 5, which shows the directional enhancement of the PW having the desired DOA and the suppression of PWs with other DOAs. The temporal cross-correlation of the input and output signals with a reference Gaussian wave representing the ideal desired signal is shown in Fig. 6, which demonstrates the directional enhancement capability of the 2-D IIR digital beamformer. Observe that the PWs with DOAs of 10° and 60° have been attenuated by about 40 and 38 dB, respectively, enhancing the PW with the desired DOA. This verifies the ideal performance of the 2-D IIR SBP beam filter.

Fig. 6. Temporal cross-correlation of the input signal (dashed line) and the filtered output signal (solid line) with a reference Gaussian pulse showing the attenuation of undesired signals.

IV. REVIEW OF DIGITAL PHASED ARRAY FEED BEAMFORMER

The delay-and-sum beamformer is based on the concept that, for a ULA with broadband antenna elements, the outputs of different sensors differ only by a time delay. Therefore, if the output of each antenna is delayed appropriately (with a proper weight vector [31] applied) and the outputs are summed together, the effective radiation pattern of the array is reinforced in the desired direction while waves coming from other directions are suppressed [31], [52].

The continuous-time output of the delay-and-sum beamformer is given by [30], [47], [52], [53] (15) and (16), where the steering angle is the spatial DOA of the desired PW as shown in Fig. 7. The delay-and-sum beamformer can be implemented in both the time and frequency domains [54]. In the time domain, the beamformer performs time-based delay-and-sum operations, delaying the incoming signal from each array element by a certain fractional amount of time and then adding the delayed signals together.
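The time-domain delay-and-sum operation described above can be sketched as follows. This is an illustrative example rather than the paper's implementation: the function name, the test pulse, and the use of linear interpolation (`np.interp`) as a crude stand-in for a proper fractional-delay filter are all assumptions.

```python
import numpy as np


def delay_and_sum(x, delay_samples):
    """Time-domain delay-and-sum over x[sensor, time].

    Sensor n is advanced by n * delay_samples (which may be fractional) to undo
    the propagation delay, then all sensors are averaged. Linear interpolation
    is only a rough approximation of a fractional-delay filter.
    """
    n_sensors, n_samples = x.shape
    k = np.arange(n_samples)
    out = np.zeros(n_samples)
    for n in range(n_sensors):
        out += np.interp(k + n * delay_samples, k, x[n], left=0.0, right=0.0)
    return out / n_sensors


# Hypothetical test scene: one pulse arriving with a 3-sample inter-sensor delay.
k = np.arange(256)
pulse = np.exp(-0.01 * (k - 60.0) ** 2) * np.cos(2 * np.pi * 0.2 * (k - 60.0))
true_delay = 3
x = np.stack([np.interp(k - n * true_delay, k, pulse, left=0.0, right=0.0)
              for n in range(8)])

steered = delay_and_sum(x, true_delay)       # steered at the arriving wave
missteered = delay_and_sum(x, -true_delay)   # steered at the mirror direction
```

When the steering delay matches the arrival delay the sensor signals add coherently and the pulse is recovered; for a mismatched steering delay the copies partially cancel and the output energy drops.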

The time-domain beamformer requires fractional delays, whose digital implementation requires accurate approximation, leading to the high computational complexity of the digital fractional-delay based delay-and-sum beamformer [14]. The temporal-frequency domain delay-and-sum beamformer, on the other hand, applies different complex phasor multipliers to each frequency bin of the 1-D frequency response of the signal from each sensor. The beamformer is steered to a specific direction by selecting appropriate phases for each sensor. The resulting array and beamformer is termed a PAF beamformer [11], since the output of each sensor is phase shifted prior to summation [31].

Fig. 7. Phased-array delay-sum beamformer implementation illustrating the transformation of the data at each sensor into the frequency domain along with the weighted combinations.

The time-domain output of the digital delay-and-sum beamformer is given by (17), which involves the sampling period of the ADCs at each sensor and the spacing between the sensors, chosen to satisfy the Nyquist criterion [25]. The 1-D discrete Fourier transform (DFT) of (17) is given by (18), evaluated at the bin frequencies of the fast Fourier transform (FFT).
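The frequency-domain form of the beamformer can be sketched in a few lines. The example below is an assumption-laden illustration, not the paper's equations (17)–(23): it applies one FFT over the whole record of each sensor (rather than short 4-point or 8-point blocks), uses a circularly delayed test pulse so that the phase compensation is exact, and all names and values are hypothetical.

```python
import numpy as np


def paf_beamform(x, delay_samples):
    """Frequency-domain delay-and-sum over x[sensor, time].

    Each sensor record is transformed with an FFT, every bin is multiplied by a
    complex phasor that compensates the delay n * delay_samples of sensor n,
    the sensors are averaged, and the result is returned to the time domain.
    """
    n_sensors, n_samples = x.shape
    freqs = np.fft.fftfreq(n_samples)      # bin frequencies in cycles/sample
    spectra = np.fft.fft(x, axis=1)        # one FFT per sensor
    n = np.arange(n_sensors)[:, None]
    weights = np.exp(2j * np.pi * freqs[None, :] * n * delay_samples)
    combined = (weights * spectra).sum(axis=0) / n_sensors
    return np.fft.ifft(combined).real


# Hypothetical test scene: 16 sensors, 2-sample circular delay between neighbours.
k = np.arange(256)
pulse = np.exp(-0.01 * (k - 128.0) ** 2) * np.cos(2 * np.pi * 0.2 * (k - 128.0))
x = np.stack([np.roll(pulse, 2 * n) for n in range(16)])

steered = paf_beamform(x, 2.0)       # phases matched to the arriving wave
missteered = paf_beamform(x, -2.0)   # phases for the mirror direction
```

Each sensor contributes one complex multiplication per frequency bin in addition to its FFT, which is the source of the multiplier count discussed for the digital PAF later in the paper.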

The digital PAF implementation for filtering a partially-broadband GMC PW is shown in Fig. 7. The phase compensation required at RF (before down-conversion) for each FFT bin of each sensor of the beamformer is given by (19). After down-conversion and down-sampling, the baseband frequency depends on the carrier frequency and the down-sampling factor, while the phase remains the same as at RF, as given by (20).

The 1-D frequency response of the output of the digital PAF is therefore given by (21). Expressing the output frequency response in terms of frequency bins and complex multiplier coefficients, we get (22), with the coefficients defined in (23). The time-domain output of the digital PAF beamformer can be obtained by taking the inverse fast Fourier transform (IFFT) of (22). The frequency of operation of the FFT-based digital PAF beamformer circuit is thereby reduced below the sampling frequency. The BER performance of the digital PAF beamformer for a GMC partially-broadband PW, and its computational complexity, are compared with those of the proposed second-order 2-D IIR SBP beam filter in Section VII.

V. HARDWARE ARCHITECTURE OF THE SECOND-ORDER 2-D IIR SBP BEAM FILTER

The proposed filter, having closed-form coefficients (13), can be implemented in digital VLSI hardware employing the difference equation obtained by the inverse 2-D z-transform of (12) under zero initial conditions (ZICs), given by (24).

The architectural block diagram for implementing (24) in direct-form-I using a systolic interconnection of parallel processing core modules (PPCMs) is shown in Fig. 8. A cascaded interconnection of the PPCM blocks leads to the desired massively-parallel array processors for real-time RF throughputs. It should be noted that the number of PPCMs in the filter circuit should equal the number of sensors used in the implementation. Systolic-array implementations allow a throughput of one frame per clock cycle (OFPCC), unlike the throughput of one pixel per clock cycle (OPPCC) in scanned-array implementations [25], [27]. Concurrent architectures for 2-D digital IIR filters proposed in [38] utilize 1-D block processing for raster-scanned image processing and are suitable for video processing applications.

Fig. 8. Block diagram of a PPCM of the second-order 2-D IIR SBP beam filter with the CP shown. The marked block is reused in multiple realizations of PPCMs with different CPs.

The modified direct-form-I implementation of the filter TF (10) in partially-separable form achieves lower computational complexity in terms of the number of adders in the design (13 adders compared to 19 adders in the direct-form-I implementation), as shown in Fig. 9. This leads to the systolic-array processor implementation [35], [36] in which the 2-D non-separable filter is implemented using an array of PPCMs and the separable 1-D component is implemented trivially using a delay buffer and adder at the output. The normalized 2-D space-time difference equation for this implementation is given by (25). The final filtered 1-D output signal using the modified direct-form-I implementation of the filter is obtained by feeding the output of the last PPCM in the array through a 1-D FIR filter, and is given by (26).

Fig. 9. Block diagram of a PPCM in modified direct-form-I implementation, including the temporal feedback loop as shown in Fig. 8.

VI. SPEED OPTIMIZATION OF THE ARRAY PROCESSOR

The feed-forward path of (24) can be pipelined by adding first-in first-out (FIFO) blocks in between combinational logic blocks in order to increase the speed of operation of the filter. Likewise, look-ahead (LA) optimization can be applied for pipelining the feedback paths [55].

A. Intra- and Inter-PPCM Pipelining

The PPCMs can be pipelined in order to obtain critical paths (CPs) equal to that of a single multiply operation. The intra-PPCM pipelining consists of FIFO buffers in between the combinational logic blocks in the feed-forward signal paths. The pipelining latency for all the feed-forward paths is as shown in Fig. 8. The depth of pipelining is increased until the CP is determined by the temporal feedback loop in the PPCM.

The CP for the temporal feedback loop can also be reduced to a single multiply operation using deep intra-PPCM pipelining with the LA techniques discussed in Section VI-B. The intra-PPCM pipelining latency is complemented by inter-PPCM pipelines, obtained by inserting FIFO buffers of matching lengths at the PPCM inputs. Pipeline compensation FIFO buffers are also placed at the outputs of the PPCMs for aligning the output of the 2-D filter.

B. Look-Ahead to 2-D Circuits

LA is an optimization technique for clocked 1-D IIR digital filters in VLSI circuits [55]. In this paper, LA optimization techniques for higher-order IIR filters, such as clustered LA (CLA) and scattered LA (SLA) [25], [55], which are usually used for synchronous logic, are applied in a novel way to asynchronous feedback loops. This helps reduce the critical path delay (CPD) by introducing extra pipelining stages in the feedback path, thereby increasing the speed of operation. The original systolic-array implementation of the 2-D IIR SBP beam filter is now converted to a fully asynchronous massively-parallel processor architecture, in which the PPCMs are interconnected via FIFO pipelines in order to realize the recursive computation of the difference equation. Such asynchronous parallel processors are known as WAPs [41]–[43].

The 1-D LA optimization technique for higher-order TFs [55] has been extended here to the 2-D case despite the non-separability of the input-output TF given in (12). The wavefront-array implementation of the 2-D IIR filter, using interconnected PPCMs, nevertheless allows 1-D LA optimization of non-separable, practical-BIBO stable multi-dimensional filters [25], [39]. Let us define 1-D z-transforms of the signals within a PPCM. Then (12) can be expressed in the form (27), which contains the temporal feedback loop shown in Fig. 8. The second-order TF in (27) has double zeros and double poles.

Here, the condition given by (8) ensures that the poles are within the temporal-frequency unit circle, satisfying the 1-D digital stability criteria [48]. The PPCMs are identical to each other and have the same second-order TF, which can be optimized for speed using 1-D LA for higher-order IIR filters.

1) Clustered LA Optimization: CLA pipelining [55] is based on the addition of cancelling poles and zeros to the TF (27) such that the lowest-order delay coefficients in the denominator of the TF are zero for the chosen number of CLA pipelining stages. The output of (24) can then be written in terms of two past outputs for a second-order filter, leading to a loop consisting of delay elements and a single multiplication operation. CLA pipelining of a certain order/delay could produce an unstable filter even if the original filter (without LA) was stable in the first place. However, it has been shown that CLA produces a stable filter beyond some critical delay, such that stability is assured for all larger delays [55]. So, if the desired CLA pipeline order does not produce a stable filter, the order should be increased until a stable filter is obtained [55].

a) CLA of Order 2 [24]: Multiplying both the numerator and denominator of (27) by the appropriate cancelling pole-zero factor, we get (28).
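The clustered look-ahead step can be illustrated for a generic 1-D second-order section. The sketch below follows the standard CLA construction (multiplying numerator and denominator by the truncated power series of the reciprocal denominator); it is not the paper's equations (28)–(30), and the coefficient values, which give a double zero at z = -1 and a double pole at z = 0.4, are hypothetical.

```python
import numpy as np


def cla_factor(den, stages):
    """First `stages` terms of the power series of 1/D, with D given in powers of z^-1."""
    series = np.zeros(stages)
    series[0] = 1.0 / den[0]
    for n in range(1, stages):
        acc = sum(den[k] * series[n - k]
                  for k in range(1, min(n, len(den) - 1) + 1))
        series[n] = -acc / den[0]
    return series


def impulse_response(num, den, length):
    """Impulse response of num/den (coefficients in powers of z^-1), direct recursion."""
    h = np.zeros(length)
    for i in range(length):
        acc = num[i] if i < len(num) else 0.0
        for k in range(1, min(i, len(den) - 1) + 1):
            acc -= den[k] * h[i - k]
        h[i] = acc / den[0]
    return h


# Hypothetical second-order section: double zero at z = -1, double pole at z = 0.4.
num = np.array([1.0, 2.0, 1.0])
den = np.array([1.0, -0.8, 0.16])

factor = cla_factor(den, stages=2)       # cancelling pole-zero factor
num_cla = np.convolve(num, factor)       # augmented numerator
den_cla = np.convolve(den, factor)       # z^-1 coefficient is now zero

# The added (cancelling) poles must lie inside the unit circle for stability.
stable = bool(np.all(np.abs(np.roots(factor)) < 1.0))
```

After the transformation the recursion depends only on outputs two or more samples in the past, which is what allows an extra pipeline register inside the feedback loop. The `stable` flag reflects the caveat noted above: for other pole locations or stage counts the added poles may fall outside the unit circle, in which case a higher CLA order has to be tried.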

The single-delay feedback path of the feedback loop implied by (12), which has a CP of a multiply-then-add operation as shown in Fig. 2, has now been reduced to a single multiplier because of the additional delay added to the feedback path, thereby reducing the CP from a multiply-then-add delay to a single multiply delay [55].

b) CLA of Order 3 and Higher [24]: The 3-stage CLA optimized TF can be obtained by multiplying both the numerator and denominator of (27) by the corresponding cancelling factor and is given by (29). This leads to a correspondingly shorter ideal CP [55]. Similarly, multiplying both the numerator and denominator of (27) by a higher-order factor, we get the LA optimization of order 4 given by (30), which allows four levels of pipelining within the loop and again shortens the ideal CP [55]. The resulting signal flow graphs (SFGs) of three 2-D IIR SBP filter hardware circuits after implementing CLA optimization of stages 2, 3, and 4, respectively, are shown in Fig. 10 (stage 5 CLA not shown).

Fig. 10. CLA optimized PPCM hardware architectures of the 2-D IIR spatially-bandpass beam filter in direct-form-I realization of stages 2, 3, and 4, corresponding to (28)–(30), respectively [24]. For brevity, the repeated block is described in Fig. 8.

2) SLA Optimization: SLA pipelining [55] requires the denominator of the TF (27) to be transformed so that it contains only two delay terms. The output of (24) can then be written in terms of two past outputs for a second-order filter. SLA optimization always leads to stable realizations, provided that the original TF is BIBO stable. We can express the TF from (27) in terms of its poles as (31). The multi-stage SLA optimized TF can be described by [55] (32). It can be shown from (32) that 2-stage SLA leads to (33). Similarly, the 3-stage SLA optimized TF is given by (34).

VII. FILTER PERFORMANCE IN A WIRELESS COMMUNICATION SYSTEM

The multipath and interference suppression capabilities of the 2-D IIR frequency-planar beam filter [25] in a multi-user environment have been described in [4], with potential applications in UWB communication systems. Here, the performance of the second-order 2-D IIR SBP beam filter [39] for varying levels of interference was evaluated by conducting BER simulations involving the filter and the ULA. The ability of the filter to reject the undesired multi-user interference and improve the signal-to-interference ratio (SIR) of the desired signal was assessed, and the result was compared with the delay-and-sum beamformer (implemented as a digital PAF in the frequency domain) and the non-beamformer case. The result showed a reduction in BER relative to both the digital PAF beamformer (of similar computational complexity) and the non-beamformer case, implying potential applications in wireless communication base-stations.

A series of Monte Carlo simulations of the 2-D IIR SBP frequency-planar filter and the digital PAF beamformer were carried out for the sensor array. The test signal considered here was a partially-broadband GMC pulse (3). The composite signal received by the sensors comprised one desired PW and four other interfering, identical partially-broadband PWs with different spatial DOAs, all bi-phase modulated (BPSK modulation) by random streams of data bits [49]. The composite partially-broadband PW signal received by the sensor array was first down-converted to intermediate baseband, then low-pass filtered for image rejection, down-sampled, and finally applied to the 2-D IIR SBP beam filter to obtain the desired signal at the output.

The resulting DCDS signal from each sensor is given by [15], [49] (35), whose parameters are the total number of users, the total number of symbols, the number of samples per symbol, and the random data streams modulating the input signals. One term in (35) represents the desired signal, while the others represent the undesired interference from other users. The remaining term is the additive white Gaussian noise (AWGN), at a level of 18 dB relative to the received signal, which models the effect of quantization noise at the ADCs.

Fig. 11. BER curve for varying levels of SIR for different beamforming cases.

Fig. 12. BER curve for varying levels of SIR for different beamforming cases.

Fig. 13. BER curve for varying levels of SIR for different beamforming cases.

A. BER Simulation Example

Let us consider a 16-element ULA and a partially-broadband GMC PW signal (3) at a carrier frequency of 1 GHz, arriving from the desired spatial DOA, with a single-sided bandwidth of 500 MHz and a double-sided bandwidth of 1 GHz (such that the FBW is 100%).

Let us consider four other interfering, identical partially-broadband PWs with spatial DOAs of 20°, 50°, 65°, and 80°, respectively. Let us choose a sampling frequency of 3 GHz, satisfying the Nyquist sampling criterion, a down-sampling factor that reduces the clock frequency to 1 GHz (allowing the implementation at a lower clock frequency), and a number of samples per symbol that gives a bit rate of 50 Mbps. For detection, we used a cross-correlation detector at the output of the beam filter, while the cross-correlation detector for the non-beamformer case was used at the first sensor location.

To compare the performance of the second-order 2-D IIR SBP filter against the conventional phased array beamformer, simulations of a digital PAF beamformer (described in Section IV) with the same number of elements (i.e., 16 elements) were carried out for 4-point and 8-point FFTs. The simulated BER versus SIR plot for a PW having an FBW of 100% is shown in Fig. 11. It is observed from the figure that the gain due to the second-order 2-D IIR SBP digital filter is approximately 17 dB compared to the non-beamforming case, and roughly 4 dB compared to the 4-point and 8-point FFT digital PAFs, at the target BER. The BER performance of the IIR filter for a PW of 100% FBW but with different desired spatial DOAs of 50° and 65° is shown in Figs. 12 and 13, which show performance similar to that of the 8-point FFT digital PAF but a gain of 2 dB compared to the 4-point FFT digital PAF at the target BER. For the case of a desired PW spatial DOA of 50°, the interferers were chosen to have spatial DOAs of 20°, 35°, 65°, and 80°. Likewise, the interfering PWs were chosen with spatial DOAs of 20°, 35°, 50°, and 80° for a desired spatial DOA of 65°.
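The cross-correlation detection step used in these simulations can be sketched in isolation. The example below is a simplified illustration, not the paper's simulation: it contains no interferers and no beam filter, the pulse shape and noise levels are hypothetical, and the value of 20 samples per symbol is derived here from the quoted 1 GHz clock and 50 Mbps bit rate rather than stated in the text.

```python
import numpy as np


def detect_bits(received, reference):
    """Correlate each symbol interval with the reference pulse and slice the sign."""
    sps = len(reference)
    usable = len(received) // sps * sps
    symbols = received[:usable].reshape(-1, sps)
    return (symbols @ reference > 0).astype(int)


def simulate_ber(n_bits, noise_std, seed=0):
    """BER of BPSK-modulated pulses in white Gaussian noise (no interferers)."""
    rng = np.random.default_rng(seed)
    sps = 20  # assumed: 1 GHz clock / 50 Mbps bit rate
    k = np.arange(sps) - (sps - 1) / 2.0
    reference = np.exp(-0.05 * k ** 2) * np.cos(2 * np.pi * 0.25 * k)  # hypothetical pulse
    bits = rng.integers(0, 2, n_bits)
    tx = np.repeat(2 * bits - 1, sps) * np.tile(reference, n_bits)    # BPSK: +/- pulse
    rx = tx + noise_std * rng.standard_normal(tx.size)
    return float(np.mean(detect_bits(rx, reference) != bits))


ber_clean = simulate_ber(1000, noise_std=0.0)
ber_noisy = simulate_ber(1000, noise_std=5.0)
```

In a fuller simulation the input to `detect_bits` would be the beamformer output (or the first sensor signal for the non-beamformer case), and the BER would be recorded as the interferer power is swept to obtain BER versus SIR curves.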

Similar sets of simulations for desired DOAs of 35°, 50°, and 65° were carried out for PWs having an FBW of 50% (single-sided bandwidth of 250 MHz at a carrier frequency of 1 GHz). The corresponding BER versus SIR plots for the various cases of beamforming (for PWs with an FBW of 50%) are shown in Figs. 14–16. From Fig. 14, it is observed that the gain of the 2-D IIR filter is about 17 and 1 dB compared to the non-beamformer and the 4-point/8-point FFT digital PAFs, respectively, at the reference BER. The BER performance for a PW with a desired DOA of 50°, shown in Fig. 15, is similar to the case discussed above. The performance of the 2-D IIR filter for a PW with a desired DOA of 65° at an FBW of 50%, shown in Fig. 16, is observed to be similar to that of both the 4-point and 8-point digital PAFs. The gain of the beamformers at the reference BER, for the various cases of fractional bandwidth and desired DOA angle described above, is shown in Table I. The table clearly shows an improved BER performance due to the 2-D IIR SBP beam filter.

Fig. 14. BER curve for varying levels of SIR for different beamforming cases (desired DOA of 35°, FBW of 50%).

Fig. 15. BER curve for varying levels of SIR for different beamforming cases (desired DOA of 50°, FBW of 50%).

Fig. 16. BER curve for varying levels of SIR for different beamforming cases (desired DOA of 65°, FBW of 50%).

TABLE I BEAMFORMER GAIN AT THE REFERENCE BER FOR DIFFERENT CASES OF DESIRED DOA AND FRACTIONAL BANDWIDTH

It is observed from Table I that as the desired DOA angle increases, the gain of the beamformer reduces, because the high steering angle increases the warping effect of the 2-D IIR beam filter [25]. On the other hand, the BER performance of the beamformer is better for PWs having a smaller fractional bandwidth. The BER for various practical combinations of finite internal word lengths and ADC precisions for 2-D IIR beam filters with TF (7) has been investigated in [56]. Here, however, for the proposed second-order 2-D IIR SBP beam filter, the internal precision of the processors has been chosen large enough that the effect of internal quantization noise on the BER is negligibly small. The effect of lower precision on the BER of the proposed 2-D IIR SBP filter is therefore a topic for future research.

B. Computational Complexity

The computational complexity of a circuit directly depends on the number of adder and multiplier blocks it requires. The direct-form implementation of the second-order 2-D IIR SBP beam filter [39] requires 13 multipliers per PPCM as described in Section V, totaling 208 multipliers for a 16-element ULA.

The digital PAF beamformer described in Section IV requires additional complex multipliers along with the multipliers of the FFT [48], for each element. The conventional method of complex number multiplication requires four real multiplications and two real additions, whereas the Gauss complex multiplication algorithm [57] requires only three real multiplications and five real additions. The computational complexity in terms of the number of multipliers for the above-mentioned beamformers is shown in Table II. The table shows that the second-order 2-D IIR SBP filter has a better BER performance than the 4-point FFT digital PAF at similar computational complexity, whereas the digital PAF beamformer implementing the 8-point FFT has a much higher computational complexity than the 2-D IIR SBP beam filter but a slightly poorer or similar BER performance. 2-D FIR trapezoidal filters [5] for filtering temporally-broadband bandpass signals have very high interference rejection ability, but at the cost of a larger number of multipliers in the design. The 2-D IIR SBP beam filter, on the other hand, still provides a significant reduction in the interference signal with fewer multipliers and provides a better BER performance than digital PAFs of similar computational complexity.

TABLE II NUMBER OF MULTIPLIERS PER SENSOR FOR DIFFERENT BEAMFORMERS

VIII. ASYNCHRONOUS LOGIC REALIZATION WITH ACHRONIX FPGA

FPGAs from Achronix Semiconductor [44], which employ a-QDI logic [45], facilitate asynchronous implementation of conventional circuits. The removal of the global clock in a-QDI circuits results in reduced design complexity, lower power consumption, and higher speed of operation [40].

The clock in an a-QDI implementation using the Achronix FPGA refers to the clock present at the synchronous input/output (I/O) frame (as shown in Fig. 1), which consists of synchronous-to-asynchronous converters at the input and asynchronous-to-synchronous converters at the output. Therefore, the design register transfer level (RTL) does not need to be targeted to picoPIPE technology and is the same as for conventional synchronous implementations. It is the core fabric of the Achronix FPGA that performs the a-QDI implementation. The core contains a large number of fine-grain pico-pipeline stages called “picoPIPE”, used for both logic and routing [44], [45], which lead to high-throughput architectures. Unlike in synchronous circuits, where the global clock is used to sequence the computation and for synchronization, the sequencing and synchronization in asynchronous circuits are achieved using a local handshake protocol between adjacent pipeline stages. Data are passed through the pipeline stages as messages called “data tokens”. A three-wire channel, consisting of two data wires (“wire 0” and “wire 1”) and one “enable” wire, is present between the pipeline stages [40].

The data tokens are encoded in the wires such that setting “wire 0” represents “logic 0” and setting “wire 1” represents “logic 1”, while resetting both wires represents the “no-data” state. The third wire is an acknowledge signal used for the asynchronous handshake protocol. The handshake protocol employed by the a-QDI logic consists of the following four phases [40]:
1) the sender sends the data by setting one of the data wires;
2) the receiver latches the data and lowers the “enable” wire;
3) the sender lowers all data wires;
4) the receiver raises the “enable” wire when it is ready to accept new data.
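The four phases listed above can be traced with a small behavioral model. The following is a minimal sketch and not a model of the Achronix fabric: the `Channel` class, the `transfer` function, and the trace format are hypothetical names introduced for illustration, and the sender and receiver are collapsed into one sequential function, so no real concurrency or timing is represented.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Three-wire channel: two data rails plus an enable (acknowledge) wire."""
    wire0: int = 0
    wire1: int = 0
    enable: int = 1  # high means the receiver is ready to accept a token


def decode(channel):
    """Return 0 or 1 for a valid token, or None for the no-data state."""
    rails = (channel.wire0, channel.wire1)
    if rails == (0, 0):
        return None
    if rails == (1, 0):
        return 0
    if rails == (0, 1):
        return 1
    raise ValueError("both data wires set: not a valid dual-rail code")


def transfer(channel, bit, trace):
    """Run one four-phase handshake and return the bit latched by the receiver."""
    if channel.enable != 1 or decode(channel) is not None:
        raise RuntimeError("channel is not idle")
    # Phase 1: the sender sets exactly one data wire.
    channel.wire0, channel.wire1 = (1, 0) if bit == 0 else (0, 1)
    trace.append((channel.wire0, channel.wire1, channel.enable))
    # Phase 2: the receiver latches the data and lowers the enable wire.
    latched = decode(channel)
    channel.enable = 0
    trace.append((channel.wire0, channel.wire1, channel.enable))
    # Phase 3: the sender lowers all data wires (no-data state).
    channel.wire0, channel.wire1 = 0, 0
    trace.append((channel.wire0, channel.wire1, channel.enable))
    # Phase 4: the receiver raises the enable wire, ready for new data.
    channel.enable = 1
    trace.append((channel.wire0, channel.wire1, channel.enable))
    return latched
```

Each call to `transfer` appends four `(wire0, wire1, enable)` snapshots to the trace, one per phase, and leaves the channel in the idle no-data state, which is why consecutive tokens never overlap on the wires.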

The time required to complete one four-phase handshake is referred to as the cycle time of a pipeline stage, and the inverse of the cycle time is the throughput, i.e., the rate at which tokens travel through the pipeline, provided the dataflow netlist is free from loops and reconvergent paths [58]. Since the pipeline stages can contain a “no-data” state along with the conventional “logic 0” and “logic 1”, pipelining the dataflow path does not affect the functionality of the design, making these a-QDI pipelines “slack elastic” [40], in contrast to conventional synchronous designs.

When pipeline stages are inserted, they are initialized to the “no-data” state at global reset, whereas actual registers defined in the RTL design are initialized to either “logic 1” or “logic 0” based on the description. The two constraints that determine the final operation speed of a-QDI circuits are loops and reconvergent paths [58]. A loop is a feedback path in the dataflow. IIR filter circuits such as the proposed 2-D IIR SBP filter contain loops in the design. If a loop is the critical path, as in our case, the frequency of operation is set by the number of initialized tokens within the loop and the entire loop delay.
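The dependence of the loop-limited speed on token count and loop delay can be made concrete with a short calculation. The sketch below assumes the commonly used relation in which the frequency equals the number of initialized tokens divided by the total loop delay; this relation, the function name `loop_frequency`, and the numbers in the usage note are assumptions for illustration and are not taken from the synthesis results.

```python
def loop_frequency(tokens, loop_delay_s):
    """Loop-limited operating frequency in Hz.

    Assumed relation: frequency = initialized tokens / total loop delay.
    """
    if tokens < 1:
        raise ValueError("a loop needs at least one initialized token")
    if loop_delay_s <= 0.0:
        raise ValueError("loop delay must be positive")
    return tokens / loop_delay_s
```

Under this assumption, a loop with a hypothetical 10 ns delay runs at 100 MHz with one token and at 200 MHz with two, which is the sense in which adding initialized tokens to the feedback loop raises the speed of operation.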

Increasing the number of initialized tokens increases the speed of operation. A reconvergent path occurs at a fan-in node where one of the paths has fewer pipeline stages (shorter path) than the other (longer path). In this situation, the data token that arrives earlier on the shorter path has to wait for the coherent data token on the longer path, making it a critical path. This can be eliminated by balancing the pipeline stages in the two paths, adding extra delay to the shorter path. The Achronix CAD Environment (ACE) tool [58] allows elimination of reconvergent paths by adding a constraint to the place-and-route tool.
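The balancing step described above amounts to padding every path into a fan-in node up to the depth of the longest one. The following sketch shows only that bookkeeping; it is not how the ACE tool implements its constraint, and the function name and the path labels are hypothetical.

```python
def balance_fan_in(path_stages):
    """Return the extra pipeline stages each path needs to match the longest.

    path_stages maps a path label to its number of pipeline stages
    on the way to a common fan-in node.
    """
    if not path_stages:
        return {}
    longest = max(path_stages.values())
    return {name: longest - stages for name, stages in path_stages.items()}
```

For example, `balance_fan_in({"short": 2, "long": 5})` reports that three stages must be added to the shorter path and none to the longer one, after which tokens on both paths arrive at the fan-in node in step.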

The RTL design of the second-order 2-D IIR SBP filter is sent through the ACE tool flow for final place and route on the Achronix SPD60 FPGA, which is described in Section IX-A.

IX. VHDL IMPLEMENTATION, SIMULATION, VERIFICATION AND

Five prototype designs (with CLA up to stage 5) of the second-order 2-D IIR SBP filter in direct-form-I realization, consisting of PPCMs, have been implemented using VHDL. First, the difference equations of the filter for the different CLA optimized designs (see Section VI) were implemented using behavioral VHDL. The circuit employed 2’s complement fixed-point binary arithmetic, with the word length and the position of the binary point chosen large enough to make quantization noise effects small [56]. The VHDL was then imported into a Simulink model using a black-box implementation provided by the Xilinx System Generator (XSG) design tool.

TABLE III SPEEDS OF SECOND-ORDER 2-D IIR SBP FILTER IN DIRECT-FORM-I [24]

The complete design with five of these black boxes and the necessary ZICs was created in Simulink. Finally, VHDL-mappable, bit-true, cycle-accurate Simulink models of the designs were created using the XSG design tool, with all Xilinx-specific optimizations turned off. The resulting VHDL code would therefore be generated using behavioral VHDL. These behavioral VHDL designs were passed through synchronous and asynchronous FPGA flows as well as application-specific integrated circuit (ASIC) synthesis to obtain the timing and power analysis.

A. FPGA Prototyping

The five prototype designs for CLA optimized a-QDI filter circuits employing PPCMs have been tested on an SPC60 development board with a 65-nm CMOS Achronix Speedster SPD60 asynchronous FPGA. The VHDL design is passed through Mentor Graphics’ Precision Synthesis tool for RTL synthesis, and the resulting post-synthesis netlist is then fed to the ACE tool [58] for place and route on the asynchronous fabric. A wrapper RTL for communication with a PC through a USB port is used to read the filtered output from the Achronix FPGA. To obtain a speed comparison of the asynchronous implementation on the Achronix FPGA with conventional designs, the five designs were also synthesized, placed, and routed using the synchronous FPGA design tool for a high-capacity device in the same 65-nm CMOS technology, the Xilinx Virtex-5 LX330FF1760-2, using the timing-driven place-and-route algorithms (ensuring the CP to be the tightest loop via pipelining). The results for the synchronous and asynchronous implementations of the second-order 2-D IIR SBP beam filter employing 5 PPCMs are shown in Table III. We can observe a significant improvement in speed with the asynchronous implementation using Achronix FPGAs for the CLA optimized designs, with the improvement reaching as much as 31%.

The correct operation of the 2-D IIR SBP filter was verified for both the synchronous and asynchronous implementations by exciting the inputs of the filter with a 2-D unit impulse function and measuring the impulse response from the on-chip realizations. The 2-D magnitude frequency response of the measured filter output within the 2-D Nyquist square is shown in Fig. 17. For the verification of the synchronous implementation of the filter, the design was realized in hardware using a 90-nm CMOS Xilinx Virtex-4 SX35FF668-10 synchronous FPGA on the Xilinx ML402 board, which facilitates on-chip hardware co-simulation (HCS) with MATLAB/Simulink, while the same 65-nm CMOS Achronix Speedster SPD60 asynchronous FPGA on an SPC60 development board was used for the verification of the asynchronous implementation.

Fig. 17. Magnitude frequency response of the 2-D IIR SBP beam filter obtained from (a) closed-form expressions with infinite precision and (b) the prototype FPGA implementation on the Xilinx ML402 board for a word size of 16 bits. The response approaches the ideal case as the number of PPCMs and the word size are increased.

B. ASIC Realizations at 65- and 90-nm CMOS

The behavioral VHDL designs (conventional synchronous designs) of the 2-D IIR SBP filter circuits with a word size of 16 bits were also synthesized for ASIC in 65- and 90-nm CMOS technology. The ASIC synthesis was performed with the Synopsys Design Compiler Version D-2010.03 using DesignWare building block libraries, with the TSMC TCBN65G standard-cell library version 140b and the TSMC TCBN90GHP standard-cell library version 210a for the 65- and 90-nm technologies, respectively. The optimization goal was to maximize the speed of operation of the filter. The global operating voltage for both technologies was 900 mV. The results of the ASIC synthesis of the CLA optimized filter designs for the 65- and 90-nm technologies are shown in Table IV. It can be seen that the speed of operation of the filter increases as the CLA pipelining is increased (at the cost of added hardware complexity and therefore increased cell area and power). Once the CLA pipelining reaches a certain limit, however, there is no further speed improvement in the filter circuit. For the 65-nm CMOS technology, the speed of the filter remained the same (1.064 GHz) beyond this limit. Similarly, for the 90-nm CMOS technology, the highest speed reached with CLA was 694 MHz.

We consider the area-time complexity measure of [59]. Since our optimization goal was to improve the speed of operation of the circuit, we chose the form of the measure that weights time more heavily than area. Alternatively, for area-efficient designs, the area-weighted form of the measure can be compared among the various designs. The area-time complexity, total gate count, and power details of the prototype designs obtained from the ASIC synthesis are also indicated in Table IV, which identifies the optimum CLA design in terms of this performance measure based on the synthesis results.
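How the choice of area-time measure changes the ranking of designs can be seen with a small numerical sketch. It assumes a measure of the form A·T^(2α) with 0 ≤ α ≤ 1, in the style of the VLSI complexity theory in [59], so that α = 1 gives A·T² (speed-oriented) and α = 0.5 gives A·T (area-oriented). The exact form used in the synthesis comparison is an assumption here, and the two designs and their area and period values are hypothetical, not entries from Table IV.

```python
def area_time(area, period, alpha):
    """Area-time measure A * T**(2*alpha), assumed form with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return area * period ** (2.0 * alpha)


def best_design(designs, alpha):
    """Return the name of the design with the smallest area-time measure.

    designs maps a name to an (area, clock period) pair in arbitrary units.
    """
    return min(designs, key=lambda name: area_time(*designs[name], alpha))


# Hypothetical designs: a small slow one and a large fast one.
EXAMPLE_DESIGNS = {"compact": (1.0, 2.0), "pipelined": (2.5, 1.0)}
```

With these illustrative numbers, `best_design(EXAMPLE_DESIGNS, 1.0)` selects the pipelined design, because the squared period dominates, while `best_design(EXAMPLE_DESIGNS, 0.5)` selects the compact one. This is the trade-off between speed-oriented and area-oriented comparison described in the text.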

For the power analysis, a PW input (partially-broadband GMC pulse) with one desired signal and four interfering signals including noise (as used for the BER simulation in Section VII) was modeled as test patterns. A switching activity information format (SAIF) file was generated using 10 000 test vectors for the gate-level simulation in Cadence NCSim version 06.11. The SAIF file was then back-annotated with the gate-level netlist. Finally, Power Compiler was used to calculate the power consumption of the circuit. It can be seen in Table IV that as the number of CLA stages in the design is increased, the total power consumption of the design also increases. Therefore, there exists a trade-off between power-efficient and timing-efficient designs, and the choice of design is based on the specific requirements on speed or power.

TABLE IV RESULTS OF ASIC SYNTHESIS OF THE SECOND-ORDER 2-D IIR SBP FILTER CIRCUITS FOR 65-nm AND 90-nm CMOS TECHNOLOGIES

X. CONCLUSION

A second-order 2-D IIR SBP beam filter is proposed for the directional enhancement of a PW based on its DOA. The filter is highly steerable, algebraically defined in terms of the desired DOA, computable, and based on the impulse response modulation of a practical-BIBO stable 2-D IIR frequency-planar beam PW filter. The performance of the 2-D IIR digital filter for interference rejection is verified for PWs in the presence of interfering PWs at different DOAs. Further, the BER versus SIR performance of a phased array beamformer and of the second-order 2-D IIR SBP beam filter was studied and simulated for a partially broadband PW.

The performance improvement of the second-order 2-D IIR SBP beam filter (which is based on a systolic/wavefront array implementation) over a digital PAF beamformer is reported here. Also, a massively-parallel array architecture of the filter is proposed for real-time implementations using synchronous and asynchronous FPGAs, which enables spatial filtering of broadband PWs at a very high throughput, with potential applications in wireless communications, cognitive radio, radio-astronomy aperture arrays, and radar. The ASIC synthesis of the filter designs was also carried out for 65- and 90-nm CMOS technologies.

The results show that the speed of operation of the filter is as high as 1.064 GHz for a CLA pipelined design in 65-nm CMOS technology. Higher order 2-D IIR filters and their performance improvement are a topic for future research.

REFERENCES

[1] S. Ries and T. Kaiser, “Towards beamforming for UWB signals,” in Proc. EUSIPCO, 2004, pp. 829–832.
[2] UWB Communication Systems: A Comprehensive Overview, M.-G. D. Benedetto, T. Kaiser, A. F. Molisch, I. Oppermann, C. Politano, and D. P., Eds. New York: Hindawi, 2006.
[3] L. Liang and S. V. Hum, “Experimental characterization of UWB beamformers based on multidimensional beam filters,” IEEE Trans. Ant. Propag., vol. 59, no. 1, pp. 304–309, Jan. 2011.
[4] S. V. Hum, A. Madanayake, and L. T. Bruton, “UWB beamforming using 2D beam digital filters,” IEEE Trans. Ant. Propag. (TAP), vol. 57, no. 3, pp. 804–807, Mar. 2009.
[5] T. Gunaratne and L. Bruton, “Adaptive complex-coefficient 2D FIR trapezoidal filters for broadband beamforming in cognitive radio systems,” Circuits, Syst., Signal Process., vol. 30, pp. 587–608, 2011.
[6] K. Hamdi, W. Zhang, and K. Ben Letaief, “Joint beamforming and scheduling in cognitive radio networks,” in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), 2007, pp. 2977–2981.
[7] G. Zheng, S. Ma, K. kit Wong, and T.-S. Ng, “Robust beamforming in cognitive radio,” IEEE Trans. Wirel. Commun., vol. 9, no. 2, pp. 570–576, Feb. 2010.
[8] K. Cumanan, L. Musavian, S. Lambotharan, and A. Gershman, “SINR balancing technique for downlink beamforming in cognitive radio networks,” IEEE Signal Process. Lett., vol. 17, no. 2, pp. 133–136, Feb. 2010.
[9] Y. Zhao, R. Adve, and T. Lim, “Beamforming with limited feedback in amplify-and-forward cooperative networks,” IEEE Trans. Wirel. Commun., vol. 7, no. 12, pp. 5145–5149, Dec. 2008.
[10] Y. Zhang, X. Li, and M. Amin, “Distributed beamforming in multiuser cooperative wireless networks,” in Proc. 4th Int. Conf. Commun. Network. China (ChinaCOM), 2009, pp. 1–5.
[11] B. Jeffs, K. Warnick, J. Landon, J. Waldron, D. Jones, J. Fisher, and R. Norrod, “Signal processing for phased array feeds in radio astronomical telescopes,” IEEE J. Sel. Topics in Signal Process., vol. 2, no. 5, pp. 635–646, Oct. 2008.
[12] M. Elmer and B. D. Jeffs, “Beamformer design for radio astronomical phased array feeds,” in Proc. IEEE Int. Acoust. Speech Signal Process. (ICASSP) Conf., 2010, pp. 2790–2793.
[13] K. F. Warnick, B. D. Jeffs, J. Landon, J. Waldron, D. Jones, J. R. Fisher, and R. Norrod, “Beamforming and imaging with the BYU/NRAO L-band 19-element phased array feed,” in Proc. 13th Int. Symp. Ant. Technol. Appl. Electromagn. Canadian Radio Sci. Meet. (ANTEM/URSI), 2009, pp. 1–4.
[14] A. Madanayake, “Real-time FPGA architectures for frequency-planar MDSP,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Calgary, Calgary, AB, Canada, 2008.
[15] Z. N. C. Huseyin Arslan and M.-G. D. Benedetto, Ultra Wideband Wireless Communication. Hoboken, NJ: Wiley-Interscience, 2006.
[16] T. H. Khine, K. Fakuwa, and H. Suzuki, “Systolic OMF-RAKE: Linear interference canceller-utilizing systolic array for mobile communications,” IEICE Trans. Commun., vol. E88-B, no. 5, pp. 2128–2135, May 2005.
[17] E. M. Staderini, “UWB radars in medicine,” IEEE Aerosp. Electron. Syst. Mag., vol. 17, no. 1, pp. 13–18, 2002.
[18] A. V. Ardenne, “Concepts of the square kilometre array; Toward the new generation radio telescopes,” in Proc. IEEE Int. Symp. Ant. Propag., 2000, pp. 158–161.
[19] S. W. Ellingson, “A DSP engine for a 64-element array,” in Proc. Perspectives for Radio Astronomy: Technol. for Large Ant. Arrays, 1999, pp. 235–242.
[20] M. C. VanBeurden, A. B. Smolders, and M. E. J. Jeuken, “Design of wideband phased antenna arrays,” in Proc. Perspectives for Radio Astronomy: Technol. for Large Ant. Arrays, 1999, pp. 347–352.
[21] A. Faulkner, P. Alexander, A. Van-Ardenne, R. Bolton, J. Bregman, A. V. Es, M. Jones, D. Kant, S. Montebugnoli, P. Picard, S. Rawlings, S. Torchinsky, J. G. B. D. Vaate, and P. Winlinson, “The aperture arrays for the SKA: The SKADS white paper,” SKA Memo 122, 2010. [Online]. Available: http://www.skatelescope.org
[22] K. Gold, R. Silva, R. Worrel, and A. Brown, “Space navigation with digital beam steering GPS receiver technology,” presented at the 59th Annu. Meet. ION, Albuquerque, NM, 2003.
[23] R. Silva, R. Worrel, and A. Brown, “Reprogrammable, digital beam steering GPS receiver technology for enhanced space vehicle operations,” presented at the Core Technologies for Space Syst. Conf., Colorado Springs, CO, 2002.
[24] R. M. Joshi, A. Madanayake, and L. T. Bruton, “A 2D IIR spatially-bandpass antenna beamformer on a 65 nm Achronix SPD60 asynchronous FPGA,” presented at the 54th IEEE Int. Midw. Symp. Circuits Syst. (MWSCAS), Seoul, Korea, 2011.
[25] A. Madanayake and L. T. Bruton, “A speed-optimized systolic array processor architecture for spatio-temporal 2-D IIR broadband beam filters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 7, pp. 1953–1966, Aug. 2008.
[26] A. Madanayake and L. T. Bruton, “A systolic-array architecture for first-order 3D IIR frequency-planar filters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1546–1559, Jul. 2008.
[27] A. Madanayake and L. Bruton, “A review of 2D/3D IIR plane-wave real-time digital filter circuits,” in Proc. IEEE Canadian Conf. Elect. Comput. Eng. (CCECE), 2005, pp. 1935–1941.
[28] L. Liang and S. Hum, “Experimental verification of an adaptive UWB beamformer based on multidimensional filtering in a real radio channel,” in Proc. IEEE Ant. Propag. Soc. Int. Symp. (APSURSI), 2010, pp. 1–4.
[29] T. K. Gunaratne and L. T. Bruton, “Beamforming of broad-band bandpass plane waves using polyphase 2-D FIR trapezoidal filters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 3, pp. 838–850, Mar. 2008.
[30] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[31] B. D. V. Veen and K. M. Buckley, “Beamforming: A versatile approach to spatial filtering,” IEEE ASSP Mag., vol. 5, no. 2, pp. 4–24, Apr. 1988.
[32] T. K. Gunaratne and L. T. Bruton, “Tracking broadband plane waves using 2D adaptive FIR fan filters,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2006, pp. 4923–4926.
[33] Q. Gu and M. N. S. Swamy, “On the design of a broad class of 2-D recursive digital filters with fan, diamond and elliptically-symmetric responses,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 41, no. 9, pp. 603–614, Sep. 1994.
[34] L. Khademi and L. T. Bruton, “Reducing the computational complexity of narrowband 2D fan filters using shaped 2D window functions,” in Proc. Int. Symp. Circuits Syst. (ISCAS), 2003, pp. 702–705.
[35] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[36] E. S. E. K. Bromley and S. Y. Kung, “Systolic Arrays,” presented at the 2nd Int. Conf., Los Alamitos, CA, 1988.
[37] N. Rama Murthy and M. N. S. Swamy, “On the real-time computation of DFT and DCT through systolic architectures,” IEEE Trans. Signal Process., vol. 42, no. 4, pp. 988–991, 1994.
[38] K. K. Parhi and D. G. Messerschmitt, “Concurrent architectures for two-dimensional recursive digital filtering,” IEEE Trans. Circuits Syst., vol. 36, no. 6, pp. 813–829, Jun. 1989.
[39] A. Madanayake and L. T. Bruton, “A real-time systolic array processor implementation of two-dimensional IIR filters for radio-frequency smart antenna applications,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2008, pp. 1252–1255.
[40] J. Teifel and R. Manohar, “An asynchronous dataflow FPGA architecture,” IEEE Trans. Comput., vol. 53, no. 11, pp. 1376–1392, Nov. 2004.
[41] S. Y. Kung, S. C. Lo, S. N. Jean, and J. N. Hwang, “Wavefront array processors-concept to implementation,” Computer, vol. 20, pp. 18–33, May 1987.
[42] S.-Y. Kung, K. Arun, R. Gal-Ezer, and D. Bhaskar Rao, “Wavefront array processor: Language, architecture, and applications,” IEEE Trans. Comput., vol. C-31, no. 11, pp. 1054–1066, Nov. 1982.
[43] S. Y. Kung, “VLSI array processors: Designs and applications,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 1989, pp. 13–320.
[44] Achronix Semiconductor Corporation, Santa Clara, CA, “Achronix Semiconductor Corporation website,” 2011. [Online]. Available: http://www.achronix.com
[45] S. Ramaswamy, L. Rockett, D. Patel, S. Danziger, R. Manohar, C. W. Kelly, J. L. Holt, V. Ekanayake, and D. Elftmann, “A radiation hardened reconfigurable FPGA,” in Proc. IEEE Aerosp. Conf., 2009, pp. 1–10.
[46] T. K. Gunaratne, “Beamforming of temporally broadband bandpass plane waves using 2D FIR trapezoidal filters,” M.Sc. thesis, Dept. Elect. Comput. Eng., Univ. Calgary, Calgary, AB, Canada, 2006.
[47] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[48] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[49] A. Madanayake, S. V. Hum, and L. T. Bruton, “A systolic array 2D IIR broadband RF beamformer,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 12, pp. 1244–1248, Dec. 2008.
[50] L. T. Bruton and N. R. Bartley, “Three-dimensional image processing using the concept of network resonance,” IEEE Trans. Circuits Syst., vol. 32, pp. 664–672, Jul. 1985.
[51] P. Agathoklis and L. T. Bruton, “Practical-BIBO stability of N-dimensional discrete systems,” Proc. IEE, vol. 130, no. 6, pt. G, pp. 236–242, Dec. 1983.
[52] D. Dudgeon, “Fundamentals of digital array processing,” Proc. IEEE, vol. 65, no. 6, pp. 898–904, Jun. 1977.
[53] M. Ghavami, L. B. Michael, and R. Kohno, Ultra Wideband Signals and Systems in Communication Engineering. West Sussex, U.K.: Wiley, 2004.
[54] R. Armstrong, J. Hickish, K. Adami, and M. E. Jones, “A digital broadband beamforming architecture for 2-PAD,” in Proc. Widefield Sci. Technol. for the SKA, SKADS Conf., 2009, pp. 284–288.
[55] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[56] A. Madanayake, S. V. Hum, and L. T. Bruton, “Effects of quantization in systolic 2D IIR beam filters on UWB wireless communications,” Circuits, Syst., Signal Process., pp. 1–16, Jun. 2011.
[57] M. Tull, G. Wang, and M. Ozaydin, “High-speed complex number multiplier and inner-product processor,” in Proc. 45th Midw. Symp. Circuits Syst. (MWSCAS), 2002, pp. 640–643.
[58] “Achronix CAD Environment User Guide,” ver. . 3. 0, Oct. 2009.
[59] C. D. Thompson, “A complexity theory for VLSI,” Ph.D. dissertation, Dept. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, 1980.

Rimesh M. Joshi (S’10) received the B.E. degree in electronics and communication engineering from Tribhuvan University, Kathmandu, Nepal, in 2008, and the M.S. degree in electrical engineering from the University of Akron, Akron, OH, in 2011.

Arjuna Madanayake (M’03) received the B.Sc. degree in electronic and telecommunication engineering from the University of Moratuwa, Moratuwa, Sri Lanka, in 2002, and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Calgary, Calgary, Canada, in 2004 and 2008, respectively. He is a Tenure-track Assistant Professor with the Department of Electrical and Computer Engineering, University of Akron, Akron, OH.

Jithra Adikari (M’07) received the B.Sc. degree in electronic and telecommunication engineering from the University of Moratuwa, Moratuwa, Sri Lanka, in 2002, the M.Sc. degree in information technology from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 2005, and the Ph.D. degree in electrical and computer engineering from the University of Calgary, Calgary, AB, Canada, in 2010. He is with Elliptic Technologies, Canada. He was with the University of Waterloo, Waterloo, ON, Canada.

Len T. Bruton (F’81) is Professor Emeritus with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada. Prof. Bruton was a recipient of many awards including the 2002 IEEE Circuits and Systems Education Award, the 1994 IEEE Outstanding Engineering Award, and the 1991 Manning Principal Award. In 1994, he was elected a fellow of the Royal Society of Canada. He has been featured in the 1997 Great Canadian Scientists by Barry Shell.