Priyadarsan Patra
Intel Microprocessor Technology Labs
E-mail: dpatra at yahoo.com
Alt-Email: Priyadarsan.Patra at intel.com
_____________________________________________________________________________
Currently my team and I are investigating a novel, scalable and modular hardware and software architecture for post-silicon validation of multi/many-core based SoCs. Another emerging area of research is "Runtime Validation". My recent past research at Intel includes modeling and estimation for design exploration of on-chip interconnects, low-power design, high-performance circuit synthesis and optimization, constructive and formalized design refinement and verification, and synthesis of domino logic & register-files. My dissertation involved research on reversible and delay-insensitive computational structures with emphasis on low-power. I am a believer in social entrepreneurship: in my spare time, I lead a grass-roots organization promoting sustainable, equitable socio-economic and educational development.
Short Bio:
Dr. Priyadarsan Patra holds a B.S. in
Physics and a B.E. in Electronics and Telecommunication Engineering (Indian
Institute of Science). He obtained his M.S. in Computer and Information
Sciences from the
SELECTED TECHNICAL PUBLICATIONS
|
S. P. Mohanty, N. Ranganathan, E. Kougianos, Priyadarsan Patra. “Low-Power High-Level Synthesis for Nanoscale CMOS Circuits”. Springer Verlag Book in preparation. Jan, 2008.
K. Chen, S. Malik and P. Patra. Runtime Validation of Memory Ordering Using Constraint Graph Checking. Accepted to appear in High Performance Computer Architecture, 2008.
S. Mohanty, E. Kougianos, D. Ghai and P. Patra. “Interdependency Study of Process and Design Parameter Scaling for Power Optimization of Nano-CMOS Circuits Under Process Variation.” IWLS 2007.
K. Chen, S. Malik and Priyadarsan Patra. Runtime Validation of Transactional Memory Systems. Submitted to ISQED 2008.
Priyadarsan Patra. On the Cusp of a Validation Wall. Invited Article for IEEE Design & Test Magazine. 2007.
Bin Li, P. Patra and L. Peh. Network-on-chip modeling and estimation under process and temperature variation. Under preparation.
S. Mohanty, E. Kougianos, and P. Patra. “Process Variation Aware Simultaneous Leakage and Dynamic Power Minimization during Nano-CMOS Behavioral Synthesis.” Submitted to IEEE Trans. on CAD 2007.
Min Pan, Priyadarsan Patra and Chris Chu. A Novel Performance-Driven Topology Design Algorithm. 12th
Priyadarsan Patra. Scoping of and my thoughts on CAD research imperatives in the many-core regime, Intel Virtual Research Library. 2006.
Priyadarsan Patra. Relative Placement as an Invariant Representation for Physical Design Refinement. Intel Virtual Research Library. 2005
|
|
Priyadarsan Patra, Charles E. Dike, Increased frequencies and higher levels of integration for microprocessors have posed significant challenges for clock generation and distribution. Large die area causes the distribution path to be long and widely dispersed across the die. This distribution produces large latency in the clock path which is greatly impacted by variations in loading on clock lines, temperature shifts, voltage swings, cross-talk and across-die process fluctuations. These combine to create large clock skew and jitter that effectively shorten the clock cycle. Our goal is to reduce the clock skew to recover some of the clock cycle timing in order to improve performance. We provide a detailed proof that the system is stable under a practical set of conditions. |
|
Kavel Buyuksahin, Priyadarsan Patra, and Farid Najm. Estima: An architectural-level power estimator for multi-ported pipelined register files. In Proc. Int’l Symposium On Low Power Electronics and Design, 2003. We introduce an architectural-level power, area, and latency estimator for multi-ported, pipelined register files. Strengths of the proposed approach include the handling of pipelined operation and clock power, the simulation-based device size estimation, and the ability to handle user-specified timing constraints. The model proposed can be used as a stand-alone estimation and design exploration tool for register files and register-file type structures, or it can be incorporated into a high-level performance simulator to add power estimation capabilities. |
|
Charles E. Dike, Unintentional clock skews between clock domains represent an increasing and costly overhead in high-performance VLSI chips. We describe a novel yet easy-to-implement design that reduces skew between local clock domains dynamically or statically by sensing clock-delay differences and then tuning the clock of each domain relative to its neighbors. Lowering local clock skew is accomplished without compromising worst-case global skew. |
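The abstract above describes tuning each clock domain relative to its neighbors. The sensing and tuning circuits themselves are not described on this page, so the following is only a toy numerical model under stated assumptions: domains sit on a chain, each has a hypothetical arrival offset in picoseconds, and each round nudges every domain toward its neighbors with an assumed `gain`. The names `tune_step`, `local_skew` and `global_skew` are illustrative, not taken from the paper.

```python
def tune_step(delays, gain=0.25):
    """One synchronous tuning round on a chain of clock domains.

    Each domain moves toward its immediate neighbors by `gain` times the
    sensed delay difference. With at most two neighbors and gain <= 0.5,
    every new value is a convex combination of old values.
    """
    new = list(delays)
    for i, d in enumerate(delays):
        for j in (i - 1, i + 1):
            if 0 <= j < len(delays):
                new[i] += gain * (delays[j] - d)
    return new


def local_skew(delays):
    """Largest skew between adjacent domains."""
    return max(abs(a - b) for a, b in zip(delays, delays[1:]))


def global_skew(delays):
    """Largest skew between any two domains."""
    return max(delays) - min(delays)


# Hypothetical clock arrival offsets (ps) for five neighboring domains.
history = [[0.0, 40.0, 10.0, 55.0, 20.0]]
for _ in range(20):
    history.append(tune_step(history[-1]))

print("local skew :", local_skew(history[0]), "->", round(local_skew(history[-1]), 2))
print("global skew:", global_skew(history[0]), "->", round(global_skew(history[-1]), 2))
```

In this model the local skew shrinks round by round while the global skew never grows, because each update stays inside the range of the previous values. Whether the actual design behaves this way depends on circuit details the abstract does not give.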
|
B. Chappell, X. Wang, P. Patra, P. Saxena, J. Vendrell, S. Varadarajan, S. Gupta, W. Gomes, S. Hussain, M. Venkateshmurthy, H. Krishnamurthy, and S. Jain. A system-level solution to domino synthesis with 2 GHz application. In Proc. International Conf. Computer Design (ICCD), October 2002. System structure and the taped-out 2 GHz application results are described for a domino synthesis capability covering all aspects of domino design from estimation to silicon-ready layout with custom-class optimization. The described optimization flow, abstraction modes, and key cost factors deliver power-optimized, noise-correct domino performance on complex logic. |
|
Priyadarsan Patra, Unni K. Narayanan, and Taewhan Kim. Phase assignment for synthesis of low power domino circuits. ACM Transactions on Design Automation of Electronic Systems, 2002. (Submitted for publication). |
|
Priyadarsan Patra. Power issues in ULSI circuits. In International Conference on Information Technology, December 2000. Giga-scale designs are expected in the near future, but with a concomitant host of design and process engineering challenges. Instead of speed and density as main measures of value, power profile and form factor are becoming major considerations in high-end processors. |
|
Mahesh Ketkar, Sachin Sapatnekar, and Priyadarsan Patra. Convexity-based optimization for power-delay tradeoff using transistor sizing. In ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2000. This paper solves the delay-power optimization problem by employing accurate optimization techniques. A new class of functions, called generalized posynomials, which is a superset of the set of posynomials, is used to model the delay and the power. The mapping of these generalized posynomials to regular posynomials, which allows the use of existing posynomial solvers, is shown. We show how all constraints representing the circuit can be represented compactly in posynomial form. Finally, the results of power optimization are presented. |
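The mapping from generalized to regular posynomials mentioned above can be illustrated with a small numerical sketch. The paper's exact definition of a generalized posynomial is not given on this page; the example assumes one common case, a posynomial raised to a positive fractional power, and uses a made-up expression `g`. The auxiliary variable `t` and the helper `posynomial` are illustrative names.

```python
import math


def posynomial(terms, x):
    """Evaluate sum_k c_k * prod_i x_i**a_ki, assuming all c_k > 0 and x_i > 0.

    `terms` is a list of (coefficient, exponent-tuple) pairs.
    """
    return sum(c * math.prod(xi ** a for xi, a in zip(x, exps)) for c, exps in terms)


def g(x1, x2, x3):
    """Hypothetical delay-like expression; not a posynomial because of the 0.5 power."""
    return x3 * (1.0 + x1 / x2) ** 0.5


# Equivalent form over (x1, x2, x3, t) using only ordinary posynomials:
#   minimise   t * x3
#   subject to t**-2 + x1 * x2**-1 * t**-2 <= 1     (i.e. t**2 >= 1 + x1/x2)
objective = [(1.0, (0, 0, 1, 1))]
constraint = [(1.0, (0, 0, 0, -2)), (1.0, (1, -1, 0, -2))]

x1, x2, x3 = 2.0, 0.5, 3.0
t = math.sqrt(1.0 + x1 / x2)  # smallest t that satisfies the constraint
point = (x1, x2, x3, t)

print(posynomial(constraint, point))             # constraint is tight at this t
print(posynomial(objective, point), g(x1, x2, x3))
```

At the tightest feasible `t` the posynomial objective equals the original expression, which is why an existing posynomial (geometric programming) solver can be applied to the rewritten problem.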
|
Priyadarsan Patra. Estima: An architecture-level estimator for multi-ported multi-cycle register files. Technical report, Intel Corp., December 2002. Strategic CAD Labs technology newsletter. Register files play an increasingly significant role in the micro-architectural design of processors today. Large-signal arrays represented by register files (RF) have risen both in number and chip-area. The number of register files is nearly tripling from the most recent Intel processor to the next. Moreover, it is no longer sufficient to explore the design space for high performance alone: decisions made at the architectural level have the potential for the biggest impact on the power consumption of the final chip. Thus, it is very important to expose the architects to parameterizable, fast and relatively accurate estimation of power, performance and floorplan attributes of uarch blocks early on. |
|
Priyadarsan Patra and Xinning Wang. Overview of DCAL logic and timed synthesis. Technical report, Intel Corp., December 2001. Strategic CAD Labs technology newsletter. Dynamic logic circuit families have long been employed in high performance microprocessor and other commercially important semiconductor products as a solution for logic blocks needing more speed and functionality than found with standard CMOS circuits. Although standard CMOS gates have been and no doubt will continue for the foreseeable future to be the most widely used logic family, a higher-speed, higher-function circuit family used on some of the logic can provide important leverage. Among the many blocks in a complex pipelined logic system, perfectly equalizing the amount of logic and drive development required in a time period of interest is nearly impossible. If the most aggressive standard CMOS design is used on very many paths, then there is increasing probability that some paths, during convergence or due to ECO, will overflow the CMOS capability. |
|
B. Chappell, X. Wang, P. Patra, J. Vendrell, S. Rangavirsan, S. Otto, and E. Zahavi. DCAL domino design techniques for efficient synthesis of correct, high speed control logic. In Design and Test Technology Conference (Intel Corp.), 1997. In order to improve designer productivity and design convergence schedules, we present a serious alternative to custom domino design in the form of a suite of correct-by-construction techniques, specialized circuits, library collaterals, and logical/physical partitioning and optimization algorithms. As currently no relevant vendor tools even exist, we propose to build a unified and effective prototype synthesis system embodying these techniques and to research their applicability to the next generation processor designs. |
|
Priyadarsan Patra and Unni K. Narayanan. Automated phase assignment for the synthesis of low power domino circuits. In Proc. ACM/IEEE Design Automation Conference, June 1999. The advent of portable digital devices such as laptop computers and cellular phones has made low power circuit design an increasingly important research area. For example, laptop computers have a limited battery life, and so the circuitry in the computer must be designed to dissipate as little power as possible without sacrificing performance in terms of speed. Furthermore, simultaneous low power and high performance designs are needed beyond the realm of the microprocessors. For example, ASICs in computer chipsets or cellular phones must also approach microprocessor-like frequency targets, but are constrained by even tighter power budgets. The problem, of course, is that the objectives of low power and high performance are often contradictory. Consider, for example, the use of domino or dynamic logic which is a necessity in high speed designs. |
|
Priyadarsan Patra, Unni K. Narayanan, and Taewhan Kim. Phase assignment for synthesis of low power domino circuits. In Electronics Letters, June 2001. |
|
Priyadarsan Patra, Stanislav Polonsky, and Donald S. Fussell. Delay insensitive logic for RSFQ superconductor technology. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, April 1997. It is reasonable to project that continuing progress in micro-electronics will lead to computing systems based on circuit elements with switching times on the order of a few picoseconds. Such speeds are likely beyond the capabilities of CMOS. One promising technology is Rapid Single Flux Quantum (RSFQ) circuits based on super-conducting Josephson junction devices. However, for such high-speed technologies, clock skew even over short distances can make synchronous circuit design prohibitively difficult. We have introduced a variant of Delay Insensitive (DI) asynchronous logic called Conservative Delay Insensitive (CDI) logic which has particularly nice properties for use in RSFQ technology. Not only does it solve high-speed clocking problems, but its primitive elements appear to be more efficiently implementable in RSFQ technology than are traditional Boolean logic primitives and its property of minimizing the creation and destruction of signal pulses avoids some difficult implementation issues in RSFQ. |
|
Priyadarsan Patra and Donald S. Fussell. Efficient delay-insensitive RSFQ circuits. In Proc. International Conf. Computer Design (ICCD), October 1996. It is reasonable to project that continuing progress in micro-electronics will lead to computing systems based on circuit elements with switching times on the order of a few picoseconds. Such speeds are likely beyond the capabilities of CMOS. One promising technology is Rapid Single Flux Quantum (RSFQ) circuits based on super-conducting Josephson junction devices. However, for such high-speed technologies, clock skew even over short distances can make synchronous circuit design prohibitively difficult. We have introduced a variant of Delay Insensitive (DI) asynchronous logic called Conservative Delay Insensitive (CDI) logic which has particularly nice properties for use in RSFQ technology. Not only does it solve high-speed clocking problems, but its primitive elements appear to be more efficiently implementable in RSFQ technology than are traditional Boolean logic primitives and its property of minimizing the creation and destruction of signal pulses avoids some difficult implementation issues in RSFQ. |
|
Priyadarsan Patra and Donald S. Fussell. Power-efficient delay-insensitive codes for data transmission. In Proc. of 28th We have introduced and formalized the notion of dynamic delay-insensitive codes for data communication. We describe several codes and protocols designed to optimize switching energy expended at the data pins during data transmission in asynchronous systems. These include adaptations of some existing communication methods as well as some new techniques for reducing energy used in dynamic data communication between delay-insensitive circuits. |
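As background for the entry above, a minimal sketch of two standard ideas: a code is delay-insensitive when it is unordered (no codeword's set of 1-positions contains another's), and switching energy at the data pins scales with the number of wire transitions. The dynamic codes introduced in the paper are not reproduced here; the dual-rail and transition-counting helpers below are textbook illustrations with assumed names.

```python
from itertools import combinations


def is_unordered(codewords):
    """True if no codeword's set of 1-positions is contained in another's."""
    sets = [frozenset(i for i, b in enumerate(w) if b == "1") for w in codewords]
    return not any(a <= b or b <= a for a, b in combinations(sets, 2))


def dual_rail(bits):
    """Encode each data bit on two wires: 0 -> '01', 1 -> '10'."""
    return "".join("10" if b == "1" else "01" for b in bits)


def transitions_four_phase(word):
    """Return-to-zero signalling: every asserted wire rises and later falls."""
    return 2 * word.count("1")


def transitions_two_phase(word):
    """Transition signalling: every asserted wire toggles once."""
    return word.count("1")


two_bit_values = ["00", "01", "10", "11"]
dual = [dual_rail(v) for v in two_bit_values]
one_of_four = ["0001", "0010", "0100", "1000"]

print(is_unordered(two_bit_values))   # plain binary is not delay-insensitive
print(is_unordered(dual), is_unordered(one_of_four))
print(transitions_four_phase(dual[0]), transitions_two_phase(one_of_four[0]))
```

Sending two data bits costs four data-wire transitions with four-phase dual-rail but a single transition with two-phase 1-of-4 signalling, which is the kind of trade-off an energy-aware code has to weigh.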
|
Priyadarsan Patra and Donald S. Fussell. Conservative delay-insensitive circuits. In Workshop on Physics and Computation, pages 248-259, November 1996. Asynchronous circuit elements are quiescent whenever they are not actually performing a computation, and thus, they potentially waste less power than synchronous circuits. However, previous research on asymptotically non-dissipative computation has concentrated exclusively on synchronous computing models while researchers on asynchronous circuits have ignored the issues of conservative, reversible computing inherent in ultra-low-power systems. We show that delay insensitive asynchronous systems can be made asymptotically non-dissipating. Our construction achieves this without the need for explicit “uncomputation” of results that has characterized previous synchronous approaches. |
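For readers unfamiliar with the terms "conservative" and "reversible" used above, the classical synchronous example is the Fredkin (controlled-swap) gate. The sketch below checks both properties by enumeration. It is a standard illustration and is not one of the delay-insensitive primitives from the paper.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: pass (a, b) through unchanged if c == 0, swap them if c == 1."""
    return (c, b, a) if c else (c, a, b)


inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*bits) for bits in inputs]

# Reversible: the mapping is a bijection, and here the gate is its own inverse.
reversible = sorted(outputs) == sorted(inputs)
self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in inputs)

# Conservative: the number of 1s is the same on inputs and outputs.
conservative = all(sum(i) == sum(o) for i, o in zip(inputs, outputs))

print(reversible, self_inverse, conservative)
```

Because no signal is created or destroyed, such a gate does not erase information, which is the property that non-dissipative computing models rely on.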
|
Priyadarsan Patra and Donald S. Fussell. On efficient adiabatic design of MOS circuits. In Workshop on Physics and Computation, pages 260-269, November 1996. An extended research report is also available (adia-long.pdf). Power dissipation has become one of the most significant limits to increases in the density and speed of CMOS circuits. As a result, increasing attention is being devoted to unconventional CMOS circuit designs which dissipate far less power. One promising class of methods is based on the so-called adiabatic mode of operation of CMOS circuits. These methods attempt to recover as much as possible of the power supplied to the circuit during its operation. However, current adiabatic methods ignore or fail to recover energies supplied to internal capacitance. Moreover, the penalty of function reversibility required in these techniques is very substantial in terms of area. We propose simple and efficient adiabatic circuit design schemes which address both of these problems. Furthermore, we demonstrate a technique for adiabatic storage of values and finally, show that suitable design trade-offs can make adiabatic design of datapath logic, such as adders, more attractive. |
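The appeal of adiabatic operation can be seen from two first-order textbook estimates: conventional charging of a capacitance C to voltage V through a switch dissipates about C·V²/2 regardless of resistance, whereas charging through resistance R with a slow linear ramp of duration T (with T much larger than RC) dissipates roughly (RC/T)·C·V². The sketch below simply evaluates these approximations for assumed component values; none of the numbers come from the paper.

```python
def conventional_energy(c, v):
    """Energy dissipated when a capacitance c is charged abruptly to v (joules)."""
    return 0.5 * c * v * v


def adiabatic_energy(r, c, v, t_ramp):
    """First-order dissipation for a linear supply ramp of duration t_ramp.

    Only a reasonable approximation when t_ramp is much larger than r * c.
    """
    return (r * c / t_ramp) * c * v * v


# Hypothetical values: 1 kOhm switch resistance, 100 fF load, 1 V swing, 10 ns ramp.
R, C, V, T = 1e3, 100e-15, 1.0, 10e-9

e_conv = conventional_energy(C, V)
e_adia = adiabatic_energy(R, C, V, T)
print(f"conventional: {e_conv:.2e} J, adiabatic: {e_adia:.2e} J, ratio: {e_adia / e_conv:.3f}")
```

With these values RC is 0.1 ns, so the ramp is 100 times slower than the time constant and the estimated dissipation drops to about 2% of the conventional figure. Lengthening the ramp lowers it further, at the cost of speed.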
|
Priyadarsan Patra. Approaches to Design of Circuits for Low-Power Computation. PhD thesis, The University of Texas at Great advances in technology over the last few decades have led to a convergence of computing and communications, and to quantum leaps in the capabilities of the resulting information hardware. Until recently, these improvements have been largely based on ever decreasing sizes of the features that can be fabricated on a silicon substrate. However, further improvements are now faced with a new limiting factor - power consumption. Both energy supply and its dissipation pose serious problems to realizing ultra-dense, ultra-fast machines of the future. Decreasing power dissipation in both static and active operations of electronic systems depends on synergistic advancements in the development of materials technology, device technology, circuit architecture, and overall power management strategies. One architectural approach to power conservation in digital systems involves selectively clocking portions of a circuit, so that they consume power only when they are being used. While this technique is currently employed in ad hoc ways in modern synchronous processors, delay-insensitive (DI) circuits provide a natural way to achieve these benefits automatically. This dissertation develops a theory of completeness and minimality of sets of primitives with respect to a large class of DI circuits. We design many useful DI modules with efficient and novel decompositions that minimize the number of internal switching events - which translate to lower energy consumption - and/or increase circuit throughput. Switch-level designs of some DI primitives are demonstrated. Furthermore, we develop a notion of dynamic, delay-insensitive data transmission and present various protocols to reduce energy usage in such communications.
In this dissertation, we study synchronous as well as self-timed circuits under the general rubric of “low-power computation.” We introduce a theory of conservative and delay-insensitive computing as the basis for a design approach to ultra low-power computation in circuits where “destruction of information” - which fundamentally involves energy dissipation - is minimized. Moreover, we argue that these DI circuits will likely be competitive in size, if not distinctly superior, when implemented on an event-based technology such as the Charge Coupled Device, or Superconducting Single Flux Quantum devices. Finally, we investigate area-efficient design techniques for clocked, adiabatic circuits and identify some sources of energy dissipation that are shown to be avoidable. |
|
Priyadarsan Patra and Donald S. Fussell. Fully asynchronous, robust, high-throughput arithmetic structures. In International Conference on VLSI Design. IEEE Computer Society, January 1995. This paper presents novel circuit designs for bit serial adders and multipliers. The circuits are fully delay-insensitive and provide excellent reliability and speed, while verification remains simple. The structures are shown to be very flexible (handles inputs of arbitrary lengths) while being optimal in speed, and asymptotically optimal in area. The presented structure can also be adapted to circuits using other asynchronous techniques while trading off absolute delay-insensitivity for area and some speed gains. The scalability of these circuits makes them very attractive for applications such as RSA cryptosystems, which need very large operands and fast multiplication. We have explored data circuit designs using fully asynchronous components - an area that has not been well explored. The circuits are built out of a very small set of basic primitives with the concomitant benefits. Finally, we provide efficient transistor implementations and HSPICE results of several primitives used here. |
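The bit-serial principle behind the adders described above can be shown with a short software model: operands arrive least-significant bit first, one bit pair per step, and only a single carry bit of state is kept, so operand length is unbounded. This is only a functional sketch with assumed helper names. The circuits in the paper are delay-insensitive handshake designs, which this sequential model does not capture.

```python
from itertools import zip_longest


def bit_serial_add(a_bits, b_bits):
    """Add two LSB-first bit streams of arbitrary (possibly different) lengths."""
    carry = 0
    out = []
    for a, b in zip_longest(a_bits, b_bits, fillvalue=0):
        out.append(a ^ b ^ carry)                  # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))        # carry into the next step
    if carry:
        out.append(1)
    return out


def to_bits(n):
    """Non-negative integer to an LSB-first list of bits (0 maps to [])."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits


def from_bits(bits):
    """LSB-first list of bits back to an integer."""
    return sum(b << i for i, b in enumerate(bits))


print(from_bits(bit_serial_add(to_bits(13), to_bits(7))))
```

The same loop handles operands of cryptographic size without modification, since Python integers and lists grow as needed.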
|
Priyadarsan Patra and Donald S. Fussell. Efficient building blocks for delay insensitive circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 196-205, November 1994. We introduce a set of primitive elements for delay-insensitive (DI) circuit design. This set is shown to be universal and minimal, that is, any DI circuit can be constructed using only these primitives, and no proper subset of them is sufficient for constructing all such circuits. A few open questions from Keller's 1974 work are resolved as well. Methods for constructing some important DI circuits in the literature such as an arbitrary-sized Join module are shown. We also give area-efficient, fast, and robust switch-level implementations of key primitives. |
|
M. B. Josephs, Priyadarsan Patra, and J. Yantchev. Converting I2C symbols into handshake symbols, Sept. 1992. Technical Note to ESPRIT 6143 - EXACT. This short paper addresses the problem of converting I^2C symbols into handshake symbols. This problem arises in the design of hardware for interfacing to the I^2C bus which is driven by a source of synchrony (a clock). The solution enables asynchronous hardware (suitable for reduction in power consumption) to interface with such a bus available from |
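To make "I^2C symbols" concrete, the sketch below recognises the four bus symbols from sampled (SCL, SDA) levels using the standard bus rules: SDA falling while SCL is high is a START, SDA rising while SCL is high is a STOP, and otherwise the SDA level held during an SCL high period is a data bit. It covers only symbol detection in software; the conversion to handshake signals described in the note is not reproduced, and the function name and sample trace are assumptions.

```python
def decode_i2c(samples):
    """Turn a list of (scl, sda) level samples into START / STOP / '0' / '1' symbols."""
    symbols = []
    pending = None                      # data bit latched on the last SCL rising edge
    prev_scl, prev_sda = samples[0]
    for scl, sda in samples[1:]:
        if prev_scl == 0 and scl == 1:
            pending = sda               # rising clock edge: tentatively latch a bit
        elif prev_scl == 1 and scl == 1 and sda != prev_sda:
            # SDA changed while SCL stayed high: a control symbol, not data
            symbols.append("START" if sda == 0 else "STOP")
            pending = None
        elif prev_scl == 1 and scl == 0 and pending is not None:
            symbols.append(str(pending))  # falling clock edge confirms the bit
            pending = None
        prev_scl, prev_sda = scl, sda
    return symbols


# Hypothetical trace: idle, START, bit 1, bit 0, STOP.
trace = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 1),
         (0, 0), (1, 0), (0, 0), (0, 0), (1, 0), (1, 1)]
print(decode_i2c(trace))
```

Each recognised symbol is the kind of event an asynchronous interface would then need to pass on to the rest of the circuit.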
Technical Reports
|
Priyadarsan Patra and Donald S. Fussell. Optimization of delay-insensitive circuits - a case study. Technical report, Dept. of Computer Sciences, The Univ. of Texas at We explore ways of constructing efficient delay-insensitive (DI) networks built from a set of primitive DI elements by means of a case study. Several approaches that improve on recent designs of a modulo-N counter in the literature are illustrated. We obtain low constant latency, low constant response time, constant power consumption and optimal area-complexity designs for this circuit. For moderately large N, the area complexity compares well even with standard designs under synchronous (clocked) discipline. Many of these efficiencies derive from the exploitation of the powerful property of timing-independent composability of DI circuits. |
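A behavioural sketch may help picture the modulo-N counter and the composability mentioned above. It assumes one common specification: every input event is answered by an output `p`, except each N-th event, which is answered by `q`. Chaining a modulo-M cell into a modulo-K cell then behaves like a modulo-(M·K) counter. The class names and the `p`/`q` labels are illustrative, and this is not any of the designs from the report.

```python
class ModCounter:
    """Answers each event with 'p', except every n-th event, answered with 'q'."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def event(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return "q"
        return "p"


class Chain:
    """Feed the inner counter's 'q' events into an outer counter."""

    def __init__(self, inner, outer):
        self.inner = inner
        self.outer = outer

    def event(self):
        if self.inner.event() == "p":
            return "p"
        return self.outer.event()


composed = Chain(ModCounter(3), ModCounter(4))
outputs = [composed.event() for _ in range(24)]
print("".join(outputs))
```

The composed behaviour depends only on the order of events at each cell, not on how long any cell takes to respond, which is the intuition behind timing-independent composition.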
|
Mark B. Josephs and Priyadarsan Patra. An asynchronous bit-serial adder and its delay-insensitive decomposition. Technical report, Oxford University Computing Lab., This paper takes an adder that is intended to handle numbers of arbitrary sizes as the basis for a small case study in delay-insensitive design for bit-serial data processing. The design of the adder proceeds through two levels of decomposition, down to the level of standard asynchronous components, namely, Decision-Wait elements and XOR-gates. The paper also serves to demonstrate (i) the use of Decision-Wait elements to route and compute data as well as control signals, and (ii) DI Algebra as a tool for specification and verification. |
|
Priyadarsan Patra. From parallel programs to asynchronous VLSI. Technical report, Dept. of Computer Sciences, This paper motivates and discusses the issues involved in design of asynchronous circuits. Starting from a conventional engineering approach, several extant formal methodologies for design, verification and analysis of asynchronous circuits (data-flow networks) are presented and their salient points explored. Relationships among competing theories are drawn. A preliminary approach to map UNITY programs into asynchronous logic circuits is attempted. |
|
T. Hsu, P. Patra, and H. Yang. Parallel graph algorithms. Technical report, Dept. of Computer Sciences, The Univ. of Texas at We survey the area of parallel algorithms for graph problems. We also give a novel, efficient parallel algorithm under PRAM model for solving the “Point Classification” problem on Binary Space Partitioned (BSP) Trees - an important problem for 3-dimensional Computer Graphics. |
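For orientation, here is a sequential baseline for locating a point in a BSP tree, assuming that "Point Classification" means finding the leaf region that contains a query point. Each internal node stores a splitting plane, and the search descends to the front or back child according to the sign of the point's distance from that plane. This is not the parallel PRAM algorithm from the report; the `BSPNode` layout and region labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class BSPNode:
    """Internal node: plane normal . p = offset; children are nodes or leaf labels."""
    normal: Tuple[float, float, float]
    offset: float
    front: Union["BSPNode", str]
    back: Union["BSPNode", str]


def classify(node, point):
    """Walk from the root to the leaf region containing `point`."""
    while isinstance(node, BSPNode):
        side = sum(n * p for n, p in zip(node.normal, point)) - node.offset
        node = node.front if side >= 0 else node.back
    return node


# Hypothetical tree: split on x = 0, then split the x >= 0 half on y = 0.
tree = BSPNode(
    normal=(1.0, 0.0, 0.0), offset=0.0,
    front=BSPNode(normal=(0.0, 1.0, 0.0), offset=0.0, front="x+ y+", back="x+ y-"),
    back="x-",
)

print(classify(tree, (1.0, 2.0, 0.0)), "|", classify(tree, (-1.0, 5.0, 0.0)))
```

The sequential walk costs time proportional to the tree depth per query, which is the cost a parallel algorithm would aim to reduce.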
|
Priyadarsan Patra and Donald S. Fussell. Building blocks for designing DI circuits. Technical report tr93-23, Dept. of Computer Sciences, The Univ. of Texas at We introduce a set of primitive elements for delay-insensitive (DI) circuit design. This set is shown to be universal and minimal, that is, any DI circuit can be constructed using only these primitives, and no proper subset of them is sufficient for constructing all such circuits. A few open questions from Keller's 1974 work are resolved as well. Methods for constructing some important DI circuits in the literature such as an arbitrary-sized Join module are shown. We also give area-efficient, fast, and robust switch-level implementations of key primitives. |
Original papers of some of the following technical publications are here.
Selected Patents & Inventions:
|
Priyadarsan Patra. Spatial Curvature Techniques for Multiobjective Routing Optimization. Patent Pending, 2006. |
|
Priyadarsan Patra. Technique for High Fidelity and Efficient Random Number Generation. Trade secret. |
|
Priyadarsan Patra. A method to reduce network costs and its application to domino circuits. US Patent# 6529861. |
|
Priyadarsan Patra and Unni K. Narayanan. Phase optimization for low power domino circuits. US Patent# 6556962. |
|
Priyadarsan Patra and Barbara Chappell. Timed synthesis for power optimization of high performance circuits. US Patent# 6721924. |
|
Priyadarsan Patra. Smart Checker ALU for Dynamically Validating Architectures. Invention Disclosed. |
Selected Invited Presentations/Articles On Social Development Issues
|
Priyadarsan Patra. Sustainable Economic and Educational Development Society. Catalyst For Human Development – A platform for People, Projects and Progress, 2007. Publisher address: 208 Parkway Dr., |
|
Priyadarsan Patra. Education and Technology as a tool for sustainable and equitable development. |
|
Priyadarsan Patra. Computing for the Masses. Kalinga Institute of Industrial Technology, |
|
Priyadarsan Patra. On Tertiary Education In Orissa. (The IISER fiasco in Orissa: don’t let it be a missed opportunity.) Souvenir of the Orissa Society of |
Industrial Liaison/Mentor/Reviewer
1998-- Member of High Performance Experts Group (Intel) and SRC/GSRC/DSTC projects
2000-- Industrial advisor, Center for Low Power Electronics,
2003-- Dissertation Committee Member,
2005--07 GSRC, C2S2, SRC consortia, NSF
TPC and Organizing Committee Member and/or Chair: DAC, ICIT, ISQED, ICCD, ICCAD, VLSIDesign
Copyright: Priyadarsan Patra