HEURISTIC TECHNOLOGY MAPPER FOR LUT BASED FPGA’s

Chitrasena Bhat                                                                                  Niranjan N Chiplunkar

Dept. of Comp. Sc. & Engg.,                                                               Dept. of Comp. Engg.,

S J College of Engg.,                                                                           NMAM Inst. of Tech.

Mysore 570 006                                                                                 Nitte- 574 110.            

Karnataka, India.                                                                                 Karnataka, India.

 

 

ABSTRACT

 

One of the main objective in the process of mapping a circuit onto a look-Up Table (LUT) based  FPGA is to minimize the number of LUTs required to implement the circuit. A new top-down technology mapper  algorithm is discussed in this paper which aims at minimizing number  of  LUTs needed for mapping the digital circuit. The algorithm makes use of combination  of node selection and covering  heuristics using which maximum number of DAG nodes are covered by a selected LUT. The results obtained  are better than Chortle, level-map and Flow-map-r technology mapper algorithms.

 

I.                   INTRODUCTION :

In the electronic industry it is vital reach the market with new products in the shortest possible time and hence reduction in design, development and production time is very much essential. Along with minimizing  turnaround time the cost manufacturing a chip also has to be kept minimum. The Field  Programmable Gate Array (FPGA) technology envolved in the last decade which could be used to implement  Application Specific Integrated Circuits (ASICs) has advantage of lower prototype cost and shorter production times. FPGA consists  of an array of uncommitted programming elements that can be interconnected in a general way. The interconnections between the elements  are user programmable.  The different classes of commercially available  FPGAs include symmetric array FPGA, row-based FPGA, Sea-of-Gates structured FPGA, hierarchical PLD structured FPGA etc. In symmetric array FPGA, the programmable elements are called  LUT blocks (Look-Up-Table). A K-LUT block is one which  can be implement any Boolean function of upto K-variables [1].

The FPGA design process is similar to that of conventional technologies such as standard cell or gate array. It consists of high level synthesis, logic synthesis and layout synthesis[2]. Logic level synthesis for  FPGAs include optimizing the logic gate networks such that these  circuits can be efficiently mapped  onto the target technology. The term technology mapping is used for the process  of mapping the logic  circuits on the target FPGA structure. Mapping process  in LUT based  FPGAs in  general can focus on three main issues[3]: minimizing number of LUTs used in the mapping solution, minimizing the number of levels of LUTs in the mapped network and routability of the mapping solution.

The rest of this paper is organized as follows. In the next section we define the problem of technology mapping for LUT based FPGAs. In section III we briefly describe few of the works so far carried  out in the area of technology mapping of FPGAs. In section IV, we propose our approach which is a heuristic top-down method for LUT technology mapping. Section V includes experimental results and comparison of our algorithm with Chortle, Level-map and Flow-map-r algorithms.

 

II.                PROBLEM FORMULATION

S= {V,E} is a DAG (Directed Acyclic Graph) representing a combinational circuit, where each node v Î V represent a primary input (PI) or a primary output (PO) or a logic gate; and a directed edge (v,w)  Î E indicates that output of gate v is an input to gate  w. The  level of a node v is the maximum number of edges on  any path from a PI to v  and the depth of  a network is the largest level of its primary outputs. The set of fanins of gate v is denoted |input (v)|.  Node v is K-feasible if |input (v)|<=K.

 

The task of the LUT technology mapping in logic synthesis is to cover the optimized gate-level network with LUTs such that either number of LUTs or levels of LUTs is minimized or routability is improved. This  paper discuss a heuristic algorithm for technology mapping whose objective is to  minimize the number of LUTs. In the process of LUT technology mapping  a valid  covering must satisfy the  following conditions :

 

i)                    Every PO node is an output of a LUT 

ii)                   Every PI node is an input to one to more LUTs and

iii)                 If node v is implemented by a LUT (we  call such a node as LUTv), then every non-PI node in |input (LUTv)| is forced  to get covered by the LUT at v, such  that LUT at v does not  get more  than K inputs and in the process number of LUTs gets minimized.

 

III. REVIEW OF PREVIOUS WORK

Technology  mapping transforms optimized network into the final circuit. This is done by selecting pieces of network that can each be implemented by the one of available circuit elements and specifying how these  elements are to be interconnected. Technology mapping can be  broadly classified in to two classes :  library-based mapping and direct mapping.

 

LIBRARY BASED MAPPING : In library based mapping, given function is represented as  a subject graph consisting of 2-input NAND /NOR/AND/OR gates.  The library functions are also represented in the same way.  The mapping algorithm tries to cover the subject graph by a minimum number of pattern graphs available in the library.. [4],[5],[6] and [7] are examples of library mapping strategies. 

 

DIRECT MAPPING : Library based technology mapping cannot be used effectively with LUT- based  FPGAs, because of the large number of functions that a LUT can implement.  Direct mapping algorithms for LUT based FPGAs can be divided into three classes.  Algorithms in the first class emphasize on minimizing the number of LUTs in the solution ([8],[9],[3]).  In chortle technology mapper [8], the original network is the first partition into a forest of trees and then each tree is separately mapped into subcircuits of K-input LUTs.  Final circuit is then assembled from the subcircuits implementing the trees.  The main feature of this algorithm is that it simultaneously address the decomposition and matching problem using a bin packing algorithm.. algorithms in the second class emphasize on minimizing the delay of mapping solution by reducing the number of levels of LUTs ([10],[11]).  Mapping algorithms in the third category have the objective of maximizing the routability of the mapping solutions([12],[13]). 

It has been shown that area optimal LUT mapping problem is NP-Hard for  K>=5 (where K is the number of LUT inputs) [3].  Hence the problem can be solved either by solution space enumeration (which will have exponential time complexity) or by heuristics.  There are two dimensions in the solution space : each solution has to select a subset of nodes to be implemented by LUTs and each node has to select a set of its predecessor nodes numbering less than or equal to K, to be covered by its LUT implementation.  Depending on the order in which these dimensions are considered, the enumeration mapping techniques are classified as (a) node selection based enumerations and (b) node covering based enumerations.  Heuristic mapping methods are generally regarded as approximations to these two enumeration methods and are classified accordingly ie., (a) node selection based heuristics and (b) node covering based heuristics.  There are also heuristic methods which are approximations of both.  The heuristic algorithm proposed in this work is an example for such  a combined approximation.

 

IV.              HEURISTIC TOP DOWN ALGORITHM FOR LUT TECHNOLOGY MAPPING :

This algorithm process the input DAG from primary output(PO) nodes to primary input (PI) nodes.  Hence it is a top down algorithm.  Initially the given circuit is converted  into a DAG representation.  Later this DAG is converted to a  K-restricted DAG  using a pre-processor.  A K-restricted DAG  is one for which |input (v)| for each vertex is less than or equal to K.  Later this restricted DAG is topologically sorted.  The topological order for a DAG is a sequential listing of all the vertices in G such that for all vertices (v,w) belonging to G, if there is an edge from v to w then v precedes w in the sequential listing.  K-restricted topologically ordered DAG is now scanned from PO end to PI end in level by level fashion such that a LUT covers maximum number of nodes of DAG at each step.  The algorithm is given below.

INPUT :

Topologically sorted K-restricted DAG

OUTPUT :

No. of K-LUTs required for mapping the given DAG.

Begin

            Scan the input K-restricted DAG from PO-end after

            Assigning one LUT each to each of the PO-nodes.

            /*These PO-nodes are now called LUT-nodes*/

Repeat

  K=max.number of allowable input to a   LUT.

 Select arbitrarily a LUT-node “L” from

 Input DAG.

While (K>0) do

Select a node “M” from DAG

Case “M” of

1: /*”M” is (PI-node) and (has outdegree=1) and (ML is an edge in the DAG).*/

                        begin

                                    make “M” as one of the input to the

                        current LUT at “L”

                                    K=K+1;

            End

2: /*”M” is (PI-node) and (has outdegree>1) and (ML is an edge in the DAG).*/

                        begin

                                    make “M” as one of the input to the

                        current LUT at “L”

                                    K=K-1;

                                    Reduce the outdegree of “M” by 1;

            End

3: /*”M” is (non-PI-node) and (has outdegree>1) and (ML is an edge in the DAG).*/

                        begin

                                    make “M” as one of the input to the

                        current LUT at “L”

                                    K=K-1;

                                    Reduce the outdegree of “M” by 1;

                        Mark “M” as LUT-node if not

                                    Already marked;

            End

4:/*”M” is non-PI-node having outdegree=1 and ML is an edge in the DAG and if remaining direct inputs of “L” can be connected to “L” after putting “M” inside the LUT and counting for it’s child nodes*/

begin

            If (in degree of “M” is <=K) and

            (M is non-LUT node) then

              begin

                                                Put “M” inside the LUT at L;

                        K=K-indegree of “M”;

                        Mark each non-PI input nodes of

                        “M” as LUT-nodes;

                                                reduce the outdegrees of all child

                        nodes of “M” by 1;

            end

            else

            begin

              K=K-1;

              Mark “M” as LUT-node if not

              Already marked;

            End

     End

EndCase

Endwhile

Until(all nodes of DAG are covered by LUTs)

Total K-LUTs required=count of marked LUT-nodes

End.

 

Example : Consider a DAG with 6 PI, 3 PO and 2 internal nodes.  Let the value of K be 3 ie., we use 3-LUTs for mapping.  The 3-restricted DAG is shown in the figure 1.

 

 

 1 PO                           2  PO                                          3   PO

       

 

 

4                                            5                             6                                          7

                                   

        PI                        PI

 

                                               

                                            PI                 PI           PI                        PI       

                                            8                  9               10                         11     

Fig. 1. Intial 3-restricted DAG      

 

If we apply the top-down technology mapping for the above DAG, we require four numbers of 3-LUTs as shown in figure 2. According to the algorithm, in the above example, node 4 fits into case 1(ie., PI node with outdegree=1).  Node 5 fits into case (ie.,PI node with  outdegree >1). Node 6 fits into case 3(ie., non PI node with outdegree>1) . Node 7 fits into case 4(ie., non PI node with outdegree=1).  Since node 7 has only two child nodes (ie., 10 and 11), it goes inside the LUT at node 3, making node 10 and 11 direct inputs to the LUT at 3.

 

 

 

 

1                                  2                                              3

           

 

 

 

 


            4            5                                6                                      10          11

 

 


                                                8            9       

Fig.2. Four no.of 3-LUTs are required

 

V.                 RESULTS AND CONCLUSION :

We have tested our algorithm coded in C on many logic synthesis benchmark examples. The table-I shows the results of our testing on a few of them, for which, results of Chortle, Level-map and Flow-map-r algorithms are available for comparison.[3] (Value of K was taken as 5 for the experimentation) We see that our algorithm gives better results.  Runtime comparison of our algorithm with other algorithms is not given because of different platforms used for implementation.

 

Bench mark ckts.

No. of 5-LUTs using Top-down

No. of 5-LUTs using  Level-map

No. of 5-LUTs using Flow-map-r

No. of 5-LUTs using Chortle

 Z4ml   

6

9

9

9

  count

31

31

57

64

   apex7

77

78

76

76

  rot

198

219

209

236

   c499

98

77

130

166

  apex6

254

232

220

240

  misx1

12

15

15

20

  vg2

52

24

38

34

    duke2

129

164

151

150

5xp1

15

24

22

34

TOT

872

873

927

1029

CMP

1.000

1.001

1.063

1.180

 

TABLE-1. Comparison of mapping results.

 

While implementing the algorithm, we did not consider local optimization like, exploiting reconvergent paths and replicating logic at fan-out nodes.  This means that we have restricted our search for the nodes for possible coverage by a LUT to only one level in the DAG.  Because of this c499, apex6 and vg2 bench mark circuit examples require few more number of 5-LUTs than those required by other algorithms.  Hence there is scope for further optimization.  We have obtained a K-restricted graph from the input DAG using a pre-processor developed by us, which does not take advantage of Boolean properties in splitting the node inputs.  We feel that the results may be improved by using a better node  decomposition method.  We also conjecture that an initial circuit optimization at the premapping stage will improve the overall performance of our algorithm because this will reduce the number of literals in the network.

 

BIBLIOGRAPHY

 

1.      Stephen D Brown, Robert J. Francis, Jonathan Rose and Zvonko G. Vranesic     ““Field Programmable Gate Arrays”, Kluwer Academic Publishers, Boston 1995.

2.      Jason Cong and Y.Ding “Tutorial and Survey paper on Combinational Logic Synthesis for LUT based FPGAs” ACM transactions on Design Automation of Electronic Systems, Vol 1,No.2, April 1996, pages 145-204.

3.      Amir H. Farrahi and Majid Sarrafzadeh “ Complexity of the Lookup-Table minimization problem for FPGA Technology Mapping “ IEEE transactions on CAD of Integrated Circuits and Systems, Vol 13 No. 11, Nov.1994.

4.      Keutzer K. “Dagon: Technology binding and local optimization by DAG mapping” proceedings of 24th design automation conference 1987 pp 341-347.

5.      Detjens E., Gannot G., Rudell R., Sangiovanni-Vincentelli A. and Wang A. “Technology mapping  in MIS”, IEEE International Conference on CAD 1987 pp 116-119.

6.      S. Chattopadhyay, S Roy, P. Pal Choudhari “KGPMAP: library based tech. Mapping technique for antifuse based FPGAs” IEEE trans. On CAD, May 93,pp 599-620.

7.      F. Mailhot, G.D. Micheli “ Algorithms for technology mapping based on Binary Decision Diagrams and on Boolean operation “ IEEE trans. On CAD, May 1993, pp 599-620.

8.      R.J. Francis , J.Rose, K Chung “ CHORTLE : A technology mapping program  for table based field based field programmable gate arrays” in proc. Of 27th ACM/IEEE design automation conf. 1990 pp 613-619.

9.      R.J. Francis, J.Rose and Z.Vranesic “ CHORTLE-CRF : Fast  technology mapping for  lookup table based FPGAs” proc.  28th ACM/IEEE design automation conf. 1991 pp 613-619.

10.  R.J.Francis, J. Rose and Z.Vranesic “Technology mapping for lookup table FPGAs for performance” proc. of  IEEE trans. On CAD 1991 pp 568-571.

11.  J.Cong and Y.Ding “FLOMAP : an optimal technology mapping algorithm for delay optimization in lookup table based FGPA designs” IEEE trans. On CAD, Jan.94 pp 1-11.

12.  N.Bhat and D.Hill “ Routable technology mapping for FGPAs” in 1st ACM/SIGDA workshop on FPGAs, 1992 pp 143-148.

13.  M.Schlay, J.Kong, P.K. Chan “ Routability  driven technology mapping for LUT based FPGAs “ IEEE Int. Conference on CAD proc. 1992.

 

 

 

 

 

 

Hosted by www.Geocities.ws

1