Chitrasena Bhat Niranjan N Chiplunkar
Dept. of Comp. Sc. & Engg., Dept. of Comp. Engg.,
S J College of Engg., NMAM Inst. of Tech.
Mysore 570 006 Nitte- 574 110.
Karnataka, India. Karnataka, India.
One of the main objective in the process of mapping a circuit onto a look-Up Table (LUT) based FPGA is to minimize the number of LUTs required to implement the circuit. A new top-down technology mapper algorithm is discussed in this paper which aims at minimizing number of LUTs needed for mapping the digital circuit. The algorithm makes use of combination of node selection and covering heuristics using which maximum number of DAG nodes are covered by a selected LUT. The results obtained are better than Chortle, level-map and Flow-map-r technology mapper algorithms.
I.
INTRODUCTION :
In the electronic industry it is vital reach the market with new products in the shortest possible time and hence reduction in design, development and production time is very much essential. Along with minimizing turnaround time the cost manufacturing a chip also has to be kept minimum. The Field Programmable Gate Array (FPGA) technology envolved in the last decade which could be used to implement Application Specific Integrated Circuits (ASICs) has advantage of lower prototype cost and shorter production times. FPGA consists of an array of uncommitted programming elements that can be interconnected in a general way. The interconnections between the elements are user programmable. The different classes of commercially available FPGAs include symmetric array FPGA, row-based FPGA, Sea-of-Gates structured FPGA, hierarchical PLD structured FPGA etc. In symmetric array FPGA, the programmable elements are called LUT blocks (Look-Up-Table). A K-LUT block is one which can be implement any Boolean function of upto K-variables [1].
The FPGA design process is similar to that of conventional technologies such as standard cell or gate array. It consists of high level synthesis, logic synthesis and layout synthesis[2]. Logic level synthesis for FPGAs include optimizing the logic gate networks such that these circuits can be efficiently mapped onto the target technology. The term technology mapping is used for the process of mapping the logic circuits on the target FPGA structure. Mapping process in LUT based FPGAs in general can focus on three main issues[3]: minimizing number of LUTs used in the mapping solution, minimizing the number of levels of LUTs in the mapped network and routability of the mapping solution.
The rest of this paper is organized as follows. In the next section we define the problem of technology mapping for LUT based FPGAs. In section III we briefly describe few of the works so far carried out in the area of technology mapping of FPGAs. In section IV, we propose our approach which is a heuristic top-down method for LUT technology mapping. Section V includes experimental results and comparison of our algorithm with Chortle, Level-map and Flow-map-r algorithms.
II.
PROBLEM FORMULATION
S= {V,E} is a DAG (Directed Acyclic Graph) representing a combinational circuit, where each node v Î V represent a primary input (PI) or a primary output (PO) or a logic gate; and a directed edge (v,w) Î E indicates that output of gate v is an input to gate w. The level of a node v is the maximum number of edges on any path from a PI to v and the depth of a network is the largest level of its primary outputs. The set of fanins of gate v is denoted |input (v)|. Node v is K-feasible if |input (v)|<=K.
The task of the LUT technology mapping in logic synthesis is to cover the optimized gate-level network with LUTs such that either number of LUTs or levels of LUTs is minimized or routability is improved. This paper discuss a heuristic algorithm for technology mapping whose objective is to minimize the number of LUTs. In the process of LUT technology mapping a valid covering must satisfy the following conditions :
i) Every PO node is an output of a LUT
ii) Every PI node is an input to one to more LUTs and
iii) If node v is implemented by a LUT (we call such a node as LUTv), then every non-PI node in |input (LUTv)| is forced to get covered by the LUT at v, such that LUT at v does not get more than K inputs and in the process number of LUTs gets minimized.
III. REVIEW
OF PREVIOUS WORK
Technology mapping transforms optimized network into the final circuit. This is done by selecting pieces of network that can each be implemented by the one of available circuit elements and specifying how these elements are to be interconnected. Technology mapping can be broadly classified in to two classes : library-based mapping and direct mapping.
LIBRARY BASED MAPPING : In library based mapping, given function is represented as a subject graph consisting of 2-input NAND /NOR/AND/OR gates. The library functions are also represented in the same way. The mapping algorithm tries to cover the subject graph by a minimum number of pattern graphs available in the library.. [4],[5],[6] and [7] are examples of library mapping strategies.
DIRECT MAPPING : Library based technology mapping cannot be used effectively with LUT- based FPGAs, because of the large number of functions that a LUT can implement. Direct mapping algorithms for LUT based FPGAs can be divided into three classes. Algorithms in the first class emphasize on minimizing the number of LUTs in the solution ([8],[9],[3]). In chortle technology mapper [8], the original network is the first partition into a forest of trees and then each tree is separately mapped into subcircuits of K-input LUTs. Final circuit is then assembled from the subcircuits implementing the trees. The main feature of this algorithm is that it simultaneously address the decomposition and matching problem using a bin packing algorithm.. algorithms in the second class emphasize on minimizing the delay of mapping solution by reducing the number of levels of LUTs ([10],[11]). Mapping algorithms in the third category have the objective of maximizing the routability of the mapping solutions([12],[13]).
It has been shown that area optimal LUT mapping problem is NP-Hard for K>=5 (where K is the number of LUT inputs) [3]. Hence the problem can be solved either by solution space enumeration (which will have exponential time complexity) or by heuristics. There are two dimensions in the solution space : each solution has to select a subset of nodes to be implemented by LUTs and each node has to select a set of its predecessor nodes numbering less than or equal to K, to be covered by its LUT implementation. Depending on the order in which these dimensions are considered, the enumeration mapping techniques are classified as (a) node selection based enumerations and (b) node covering based enumerations. Heuristic mapping methods are generally regarded as approximations to these two enumeration methods and are classified accordingly ie., (a) node selection based heuristics and (b) node covering based heuristics. There are also heuristic methods which are approximations of both. The heuristic algorithm proposed in this work is an example for such a combined approximation.
IV.
HEURISTIC TOP DOWN ALGORITHM FOR LUT TECHNOLOGY MAPPING
:
This algorithm process the input DAG from primary output(PO) nodes to primary input (PI) nodes. Hence it is a top down algorithm. Initially the given circuit is converted into a DAG representation. Later this DAG is converted to a K-restricted DAG using a pre-processor. A K-restricted DAG is one for which |input (v)| for each vertex is less than or equal to K. Later this restricted DAG is topologically sorted. The topological order for a DAG is a sequential listing of all the vertices in G such that for all vertices (v,w) belonging to G, if there is an edge from v to w then v precedes w in the sequential listing. K-restricted topologically ordered DAG is now scanned from PO end to PI end in level by level fashion such that a LUT covers maximum number of nodes of DAG at each step. The algorithm is given below.
INPUT :
Topologically sorted K-restricted DAG
OUTPUT :
No. of K-LUTs required for mapping the given DAG.
Begin
Scan the input K-restricted DAG from PO-end after
Assigning one LUT each to each of the PO-nodes.
/*These PO-nodes are now called LUT-nodes*/
Repeat
K=max.number of allowable input to a LUT.
Select arbitrarily a LUT-node “L” from
Input DAG.
While (K>0) do
Select a node “M” from DAG
Case “M” of
1: /*”M” is (PI-node) and (has outdegree=1) and (ML is an edge in the DAG).*/
begin
make “M” as one of the input to the
current LUT at “L”
K=K+1;
End
2: /*”M” is (PI-node) and (has outdegree>1) and (ML is an edge in the DAG).*/
begin
make “M” as one of the input to the
current LUT at “L”
K=K-1;
Reduce the outdegree of “M” by 1;
End
3: /*”M” is (non-PI-node) and (has outdegree>1) and (ML is an edge in the DAG).*/
begin
make “M” as one of the input to the
current LUT at “L”
K=K-1;
Reduce the outdegree of “M” by 1;
Mark “M” as LUT-node if not
Already marked;
End
4:/*”M” is non-PI-node having outdegree=1 and ML is an edge in the DAG and if remaining direct inputs of “L” can be connected to “L” after putting “M” inside the LUT and counting for it’s child nodes*/
begin
If (in degree of “M” is <=K) and
(M is non-LUT node) then
begin
Put “M” inside the LUT at L;
K=K-indegree of “M”;
Mark each non-PI input nodes of
“M” as LUT-nodes;
reduce the outdegrees of all child
nodes of “M” by 1;
end
else
begin
K=K-1;
Mark “M” as LUT-node if not
Already marked;
End
End
EndCase
Endwhile
Until(all nodes of DAG are covered by LUTs)
Total K-LUTs required=count of marked LUT-nodes
End.
Example : Consider a DAG with 6 PI, 3 PO and 2 internal nodes. Let the value of K be 3 ie., we use 3-LUTs for mapping. The 3-restricted DAG is shown in the figure 1.
1 PO 2 PO 3
PO
4
5 6 7
PI PI
PI PI PI PI
8 9 10 11
Fig. 1. Intial 3-restricted
DAG
If we apply the top-down technology mapping for the above DAG, we require four numbers of 3-LUTs as shown in figure 2. According to the algorithm, in the above example, node 4 fits into case 1(ie., PI node with outdegree=1). Node 5 fits into case (ie.,PI node with outdegree >1). Node 6 fits into case 3(ie., non PI node with outdegree>1) . Node 7 fits into case 4(ie., non PI node with outdegree=1). Since node 7 has only two child nodes (ie., 10 and 11), it goes inside the LUT at node 3, making node 10 and 11 direct inputs to the LUT at 3.
1 2 3
4
5 6 10 11
8
9
Fig.2. Four no.of 3-LUTs are
required
V.
RESULTS AND CONCLUSION :
We have tested our algorithm coded in C on many logic synthesis benchmark examples. The table-I shows the results of our testing on a few of them, for which, results of Chortle, Level-map and Flow-map-r algorithms are available for comparison.[3] (Value of K was taken as 5 for the experimentation) We see that our algorithm gives better results. Runtime comparison of our algorithm with other algorithms is not given because of different platforms used for implementation.
Bench mark ckts. |
No. of 5-LUTs using Top-down |
No. of 5-LUTs using Level-map |
No. of 5-LUTs using Flow-map-r |
No. of 5-LUTs using Chortle |
Z4ml |
6 |
9 |
9 |
9 |
count |
31 |
31 |
57 |
64 |
apex7 |
77 |
78 |
76 |
76 |
rot |
198 |
219 |
209 |
236 |
c499 |
98 |
77 |
130 |
166 |
apex6 |
254 |
232 |
220 |
240 |
misx1 |
12 |
15 |
15 |
20 |
vg2 |
52 |
24 |
38 |
34 |
duke2 |
129 |
164 |
151 |
150 |
5xp1 |
15 |
24 |
22 |
34 |
TOT |
872 |
873 |
927 |
1029 |
CMP |
1.000 |
1.001 |
1.063 |
1.180 |
TABLE-1. Comparison of mapping
results.
While implementing the algorithm, we did not consider local optimization like, exploiting reconvergent paths and replicating logic at fan-out nodes. This means that we have restricted our search for the nodes for possible coverage by a LUT to only one level in the DAG. Because of this c499, apex6 and vg2 bench mark circuit examples require few more number of 5-LUTs than those required by other algorithms. Hence there is scope for further optimization. We have obtained a K-restricted graph from the input DAG using a pre-processor developed by us, which does not take advantage of Boolean properties in splitting the node inputs. We feel that the results may be improved by using a better node decomposition method. We also conjecture that an initial circuit optimization at the premapping stage will improve the overall performance of our algorithm because this will reduce the number of literals in the network.
BIBLIOGRAPHY
1. Stephen D Brown, Robert J. Francis, Jonathan Rose and Zvonko G. Vranesic ““Field Programmable Gate Arrays”, Kluwer Academic Publishers, Boston 1995.
2. Jason Cong and Y.Ding “Tutorial and Survey paper on Combinational Logic Synthesis for LUT based FPGAs” ACM transactions on Design Automation of Electronic Systems, Vol 1,No.2, April 1996, pages 145-204.
3. Amir H. Farrahi and Majid Sarrafzadeh “ Complexity of the Lookup-Table minimization problem for FPGA Technology Mapping “ IEEE transactions on CAD of Integrated Circuits and Systems, Vol 13 No. 11, Nov.1994.
4. Keutzer K. “Dagon: Technology binding and local optimization by DAG mapping” proceedings of 24th design automation conference 1987 pp 341-347.
5. Detjens E., Gannot G., Rudell R., Sangiovanni-Vincentelli A. and Wang A. “Technology mapping in MIS”, IEEE International Conference on CAD 1987 pp 116-119.
6. S. Chattopadhyay, S Roy, P. Pal Choudhari “KGPMAP: library based tech. Mapping technique for antifuse based FPGAs” IEEE trans. On CAD, May 93,pp 599-620.
7. F. Mailhot, G.D. Micheli “ Algorithms for technology mapping based on Binary Decision Diagrams and on Boolean operation “ IEEE trans. On CAD, May 1993, pp 599-620.
8. R.J. Francis , J.Rose, K Chung “ CHORTLE : A technology mapping program for table based field based field programmable gate arrays” in proc. Of 27th ACM/IEEE design automation conf. 1990 pp 613-619.
9. R.J. Francis, J.Rose and Z.Vranesic “ CHORTLE-CRF : Fast technology mapping for lookup table based FPGAs” proc. 28th ACM/IEEE design automation conf. 1991 pp 613-619.
10. R.J.Francis, J. Rose and Z.Vranesic “Technology mapping for lookup table FPGAs for performance” proc. of IEEE trans. On CAD 1991 pp 568-571.
11. J.Cong and Y.Ding “FLOMAP : an optimal technology mapping algorithm for delay optimization in lookup table based FGPA designs” IEEE trans. On CAD, Jan.94 pp 1-11.
12. N.Bhat and D.Hill “ Routable technology mapping for FGPAs” in 1st ACM/SIGDA workshop on FPGAs, 1992 pp 143-148.
13. M.Schlay, J.Kong, P.K. Chan “ Routability driven technology mapping for LUT based FPGAs “ IEEE Int. Conference on CAD proc. 1992.