Mapping a Nano-Programmable Logic Array

Michael James Wilson

Mentor: André DeHon

Implementation of Computation Laboratory, Computer Science Department, California Institute of Technology, Pasadena, California

Miniaturization of computing systems leads to speed gains and the ability to manufacture smaller devices.  As the limits of miniaturization on conventional silicon manufacturing are approached it becomes necessary to look for other ways to make computing systems smaller.  Current technology allows the production of carbon nanotubes and silicon nanowires which are a few nanometers in diameter.  An array-based architecture has been proposed to organize nanoscale wires into the basic building blocks for a computing system.  To perform complex tasks, higher level circuits such as Programmable Logic Arrays (PLAs) should be mappable to this architecture.  This project explores generating PLAs for the nanoscale architecture.  Experimenting on standard benchmarks and constructing examples yields information on the best, worst, and average-case characteristics of these circuits.  This information is important for determining the types of nanoscale arrays that must be constructed to implement various types of logic.

Programmable logic devices are now mostly produced through lithography.  Circuits are etched onto a silicon chip.  Advances in technology have resulted in impressive degrees of miniaturization with these methods, but we may soon reach the bottom limit of size with lithographic techniques.  New bottom-up synthesis techniques may allow us to build complex computing devices with features that are a few atoms across. 

There is a proposed architecture for organizing arrays of silicon nanowires into a computational system[1], and methods for addressing nanoscale wires with more conventionally-sized microscale wires[2].  A natural extension of this work is mapping logic components to the architecture.  A programmable logic array (PLA) is a circuit which can be programmed to perform arbitrary boolean logic.  Working at the nanoscale presents challenges such as faulty wires and random assembly of arrays.  A programmable circuit can work around many instances of these problems.  PLAs are also well-studied in conventional architectures, and a good deal of benchmarks and data for comparison exist.  These factors suggest PLAs as a good starting point for mapping logic to nanoscale arrays.

Nanoscale architecture

The nanoscale architecture1 supports building OR planes with inverting and buffering planes at right angles to these OR planes.  This allows for a single buffered NOR.  Connecting several of these planes together produces cascaded NOR gates which may be used to implement PLAs.  It has been shown how four NOR planes can be used to produce one type of PLA[3], and this idea extends to arbitrarily many planes.

Figure 1:  A simple nanoscale PLA

Other analysis has shown that building nanoscale PLAs with 60 output terms yields the least size consumed per output term.

Mapping

Synthesis and mapping basically generated an arbitrary number of nand planes like the nand planes found in Whirlpool PLAs3 and without optimization beyond that provided by CAD tools such as SIS[4].  Since two NAND planes are essentially equivalent to two NOR planes this mapping extends to the nanoscale architecture outlined above.  To accommodate multiple planes on a single PLA, the planes are “wrapped” into a spiral, as show in Figure 2:

Figure 2:  Simplified example of a “wrapped” design.  On the left, we have two planes and no wraps.  On the right, we have eight planes but only six are used; thus the design is “wrapped” three times.

The mappings presented here followed these conventions:  all inputs come into the first plane, all outputs leave through the last plane, and the PLA outputs do not feed back into the PLA.  Allowing inputs and outputs to enter and leave at any point in the PLA allows for more optimization, and feedback is important in finite state machines, but the details of supporting these features have yet to be worked out.  To account for these conditions, any inputs that were first used by a plane later than the first plane were buffered through all the intermediate planes, and outputs from planes earlier than the last plane were likewise buffered through to the end.  Each buffer consumed an input signal and an output signal in the buffering plane.  For the finite state machines, only one loop through the machine was considered (i.e., no feedback signals).

Using more NOR planes should give better results for many designs, because each NOR plane in this context corresponds to a level of logic.  In a circuit such as an xor, there is a certain level of logic where the least number of terms are needed.  Using less levels of logic (NOR planes) results in exponential growth in the number of terms used by the circuit.  Thus, “wrapping” the design should give better results.

Selections from three types of benchmarks were mapped:  datapath examples composed of adders, xor circuits, and other basic building blocks; small finite state machines from the IWLS93 benchmark suite[5]; and PLA book examples which came with the logic minimization tool espresso[6].

Results

Design

Size

Number of Wraps

add

3

1

add

4

4

add

8

8

alu181

2

2

enable_counter

9

1

enable_counter

15

2

multiply

3x3

1

register_file

8x1

1

register_file

3x4

1

shiftreg

60

1

xor

6

1

xor

30

5

Figure 3:  Datapath results.

Refer to figure 3.  Note that for several designs (add, enable counter, register file, xor) it was possible to fit larger designs in the 60 output array if the design was wrapped multiple times.  For instance, wrapping the add design 8 times allowed an 8-bit adder to fit, whereas a 3-bit adder was the largest adder that would fit without any wrapping.

Design

Inputs

Outputs

States

tbk

6

3

32

pma

8

8

24

s1

8

6

20

ex1

9

19

20

keyb

7

2

19

s420

19

2

18

s208

11

2

18

sse

7

7

16

kirkman

12

6

16

bbase

7

7

16

cse

7

7

16

dk512

1

3

15

mark1

5

16

15

ex4

6

9

14

s386

7

7

13

bbara

4

2

10

opus

5

6

10

dk17

2

3

8

shiftreg

1

1

8

ex6

5

8

8

dk14

3

5

7

beecount

3

4

7

dk27

1

2

7

s27

4

1

6

bbtas

2

2

6

mc

3

5

4

lion

2

1

4

dk15

3

5

4

tav

4

4

4

Figure 4:  FSM results.

All the IWLS93 finite state machines5 with less than 30 states were mapped.  Multiple encodings were used, and the encoding yielding the best result for each design was kept. Refer to figure 4 for examples of the types of finite state machines that were mappable to a 60 output array.  The PLA benchmarks produced similar results for logic densities.

These results could be improved through further optimization, but they give baseline numbers for what fits in a single 60 output array.

Conclusions

The data in this report suggests that small PLA designs can be implemented on nanoscale arrays.  “Wrapping” PLA designs yields an improvement in the number of signals consumed in many designs.

Suggestions for further research

To further increase the efficiency of the mapping, more optimization methods such as Doppio-Espresso3 could be used.  Adjustment of parameters in the optimization process could yield slightly better results as well.

Since the optimal size of the nanoscale arrays is so small (60 output terms), some circuits will simply not fit on a single array.  It will be important to investigate using nanoscale PLAs as blocks in larger circuits and develop block-level placement and routing techniques.

Finally, as nanoscale technologies evolve and change new structures will become feasible that were impossible to build before.  Taking advantage of the advances in this field will be important in designing the most efficient mappings and arrays.

Methods

File formats.  All input files and benchmarks were in espresso format.  Output files were in a propriety “nand format” which is very similar to an espresso file, differing in that it has only one plane and the operation performed in the plane is nand as opposed to product or sum.  For verification and as intermediate files, blif files were generated.

Optimization and nand plane generation.  The nand planes were generated as follows.  The input was run through espresso.  Then the optimized input was run through SIS with the following commands:  “remove_latches; collapse; source script.rugged; reduce_depth -d <depth>; write_blif” where <depth> is the target depth (2 times the number of levels in the final network).  This generates a blif file, which is then parsed and broken into each individual level of logic, and then each level is written out as a nand plane.  Blif files were then generated from the output nand files and the initial input file, and MVSIS was used to verify their equivalence.



[1]. DeHon, A. Array-Based Architecture for FET-based, Nanoscale Electronics. IEEE Transactions on Nanotechnology, 2(1), 23-32 (March 2003)

[2]. DeHon, A., Lincoln, P., Savage, J. E. Stochastic Assembly of Sublithographic Nanoscale Interfaces. IEEE Transactions on Nanotechnology, 2(3), 165-174 (September 2003)

[3]. Mo, F., Brayton, R. K., Whirlpool PLAs:  A Regular Logic Structure and Their Synthesis.  IEEE/ACM International Conference on ICCAD 2002, 543-550 (2002)

[4]. Sentovich, E. M., Singh, K. J., Lavagno, L., Moon, C., Murgai, R., Saldanha, A., Savoj, H., Stephan, P., Brayton, R. K., Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis, UCB/ERL M92/41, University of California, Berkeley (May 1992)

[5]. McElvain, K., IWLS’93 Benchmark Set: Version 4.0, <http://www.cbl.ncsu.edu/pub/Benchmark_dirs/LGSynth93/doc/iwls93.ps> (May 1993)

[6]. <ftp://ic.eecs.berkeley.edu/pub/Espresso/espresso-book-examples.tar.gz>

Acknowledgements I thank André DeHon for mentorship.  I also thank Rafi Rubin and Michael Wrighton for technical assistance.

Competing interests statement The author declares that he has no competing financial interests.

Correspondence and requests for materials should be addressed to M.J.W. (e-mail:  mwilson@caltech.edu).