III. Research Agenda
Our goal is to create a computing environment that supports application specific abstractions. This is feasible only if two conditions are satisfied. First, since each application specific processor is unique, there must be enough area on a die to support many different such processors. Second, since the non-recurring engineering (NRE) costs of ASIC development are extremely high (over $40 million for a modern state-of-the-art ASIC [4]), these costs must be drastically reduced. We observe that the overabundance of transistors can help satisfy the first condition and that automation can help with the second. We present three major research directions that together can make application specific abstractions a reality throughout the modern computing stack.
Direction 1 – Informed Lower Level Stacks: Given the depth of modern computing stacks (Figure 2), having ASAs implies that all abstraction layers are specially defined for the single application at the top of the stack.
Instead of designing processors that are specialized for individual applications, Turakhia et al. proposed an architectural synthesis framework named HaDeS that uses benchmarks to model the expected runtime behavior of various application types and then uses the model to algorithmically determine the optimal allocation of cores in a heterogeneous chip multiprocessor [7]. The only requirement is that a library of general purpose cores, each able to process all of the benchmark applications albeit with different performance characteristics, is already available. Thus, HaDeS makes use of the overabundance of transistors, and NRE costs remain low as long as the library of cores already exists.
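The allocation step can be illustrated with a minimal sketch. It is not the HaDeS algorithm itself; it assumes a simple linear throughput model and an exhaustive search, and the core library, workload mix, area figures and function names are all hypothetical.

```python
from itertools import product

# Hypothetical core library: area cost and per-benchmark throughput.
CORE_LIBRARY = {
    "big":    {"area": 4, "perf": {"compute": 8.0, "memory": 3.0}},
    "little": {"area": 1, "perf": {"compute": 1.5, "memory": 1.0}},
}

# Hypothetical workload model: fraction of jobs of each benchmark type.
WORKLOAD = {"compute": 0.5, "memory": 0.5}


def expected_throughput(allocation):
    """Workload-weighted throughput of an allocation {core name: count}."""
    total = 0.0
    for bench, weight in WORKLOAD.items():
        total += weight * sum(
            count * CORE_LIBRARY[name]["perf"][bench]
            for name, count in allocation.items()
        )
    return total


def best_allocation(area_budget):
    """Exhaustively search core counts that fit within the area budget."""
    names = list(CORE_LIBRARY)
    ranges = [range(area_budget // CORE_LIBRARY[n]["area"] + 1) for n in names]
    best, best_score = None, -1.0
    for counts in product(*ranges):
        area = sum(c * CORE_LIBRARY[n]["area"] for n, c in zip(names, counts))
        if area > area_budget:
            continue  # this mix does not fit on the die
        allocation = dict(zip(names, counts))
        score = expected_throughput(allocation)
        if score > best_score:
            best, best_score = allocation, score
    return best, best_score
```

For example, `best_allocation(10)` searches every mix of "big" and "little" cores that fits in ten area units. A realistic framework would replace the linear model with one derived from benchmark behavior and would use a search that scales beyond a handful of core types.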
The library could be automatically generated using the Genesis 2 chip generator [4], with the benchmark program models used as constraints to Genesis 2. Since Genesis 2 embodies the experience and expertise of chip designers during processor template creation, it is possible to generate not only a library of cores, but a library of cores that are optimized for each application in the benchmark suite or for each desired computational kernel feature. In Genesis 2, the only major NRE costs are the development of the processor template and verification; all processor instantiations and supporting software are generated automatically and therefore incur negligible additional cost.
Instead of designing processors for classes of applications as represented by the benchmarks, Goulding-Hotta et al. designed a processor, called GreenDroid, that is optimized for certain popular functions in the Android software stack [6]. They first analyzed the Android framework software to identify heavily used components such as the Dalvik Virtual Machine and web browsing related libraries. Once the hot code blocks were identified, they automatically synthesized conservation cores, or c-cores, each of which efficiently implements a portion of a hot component (i.e., a subgraph of the control flow graph) in hardware. Each c-core is then associated with a special instruction, and the original Android software is patched to take advantage of these application specific c-cores. The rest of the software executes normally on the general purpose core. GreenDroid is both automated and makes use of the extra transistors. The only drawback is that it requires the applications to have already been implemented.
Observation: Overall, researchers are making strides toward a future where the design of processors is informed directly by specific instances of expected applications; however, there are still limitations. The processor designs are still only application class specific instead of application specific. Moreover, the work presented above is focused on performance, which is much easier to model, quantify and compare than security. Thus, future research in this direction must not only seek to reduce NRE costs such as verification, it must also be capable of incorporating more complicated features, constraints and requirements such as security and reliability.
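The first step of this flow, identifying hot code, can be sketched as follows. This is only an illustration and not the GreenDroid toolchain: it assumes the profile is available as a flat trace of executed block names, and the function name and the coverage threshold are hypothetical.

```python
from collections import Counter


def select_hot_blocks(trace, coverage=0.8):
    """Pick the hottest blocks until they cover `coverage` of the trace.

    `trace` is a list of block names, one entry per execution
    (a hypothetical profile format chosen for this sketch).
    """
    counts = Counter(trace)
    total = len(trace)
    hot, covered = [], 0
    for block, executions in counts.most_common():
        if covered >= coverage * total:
            break  # the selected blocks already cover enough of the run
        hot.append(block)
        covered += executions
    return hot
```

The selected blocks would then be candidates for hardware synthesis, while everything else continues to run on the general purpose core.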
Direction 2 – Configurable Abstraction Layers: Just as compiler optimization levels affect whether unstable code is generated, abstraction layers that are inherently generic could present knobs for individual applications to customize. Secure Computing (SECCOMP [8]) is a representative example. SECCOMP is a Linux kernel feature that allows applications to specify a set of system call filters. These filters can prevent the invocation of system calls based on the system call number and parameters. This effectively removes unnecessary functionality and reduces the attack surface.
Instead of starting with a generic abstraction and then blacklisting unwanted functionality, it is also possible to create an abstraction template that is instantiated in accordance with the user’s needs. Software Fault Isolation (SFI [9]) and Control Flow Integrity (CFI [10]) are examples of new abstraction concepts that use program analysis techniques to implement protection mechanisms that are unique to each application.
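The filtering idea can be modeled in a few lines. The sketch below is a user-space model of the decision a filter makes, not the kernel interface: the rule table, verdict names and `evaluate` function are hypothetical, the policy shown is default-deny for simplicity, and the system call numbers are used only as labels (they follow the x86-64 Linux numbering as we understand it).

```python
from typing import Callable, Dict, Optional, Sequence

ALLOW, KILL = "ALLOW", "KILL"

# A rule maps a system call number to an optional predicate over its
# arguments. None means the call is allowed regardless of arguments.
ArgCheck = Optional[Callable[[Sequence[int]], bool]]

# Labels only; assumed to match x86-64 Linux numbering.
SYS_READ, SYS_WRITE, SYS_EXIT_GROUP = 0, 1, 231

FILTER: Dict[int, ArgCheck] = {
    SYS_READ: None,
    SYS_WRITE: lambda args: args[0] in (1, 2),  # only stdout and stderr
    SYS_EXIT_GROUP: None,
}


def evaluate(rules: Dict[int, ArgCheck], nr: int, args: Sequence[int]) -> str:
    """Return the verdict for system call `nr` invoked with `args`."""
    if nr not in rules:
        return KILL  # functionality the application never asked for
    check = rules[nr]
    if check is None or check(args):
        return ALLOW
    return KILL  # right call, but disallowed parameters
```

Under this model, a write to file descriptor 1 is allowed, a write to descriptor 3 is rejected because of its parameter, and any system call absent from the table is rejected outright.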
SFI is a software-only technique that partitions a single process’s memory space into multiple ranges. Protection instructions are then automatically inserted around memory accesses to enforce the partitioning scheme. CFI is the control flow counterpart to SFI: it ensures that all control flow transfers (e.g., branches and function calls) go to expected and allowed locations, which are determined through automated analysis of the application itself. While the techniques have some important technical limitations, such as the handling of indirect branches, they demonstrate that templates can be used to synthesize abstraction layers, just as Genesis 2 uses them to synthesize processor designs.
Observation: Filtering can be applied to other abstractions as well, and these software based techniques could be incorporated directly into future hardware designs. The remaining question is, given a set of techniques, which should be implemented in hardware and which in software. In a world with an overabundance of transistors, a related question is what roles horizontal abstraction layers can play.
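Both checks can be sketched in software. The following is an illustrative model rather than the implementations of [9] or [10]: the segment size, the names `sandbox_address` and `checked_indirect_call`, and the allowed-target table (which a real system would derive from program analysis) are assumptions made for this example.

```python
SEGMENT_BITS = 16                       # hypothetical 64 KiB fault domain
OFFSET_MASK = (1 << SEGMENT_BITS) - 1


def sandbox_address(addr, segment_base):
    """SFI-style masking: force `addr` into the segment at `segment_base`."""
    if segment_base & OFFSET_MASK:
        raise ValueError("segment base must be aligned to the segment size")
    # Keep the offset bits and overwrite the upper bits with the segment id.
    return segment_base | (addr & OFFSET_MASK)


class CFIViolation(RuntimeError):
    """Raised when a control transfer targets a disallowed location."""


# Hypothetical result of static analysis: call site -> permitted targets.
ALLOWED_TARGETS = {"dispatch_site": {"handler_a", "handler_b"}}


def checked_indirect_call(site, target, functions):
    """CFI-style check inserted before an indirect call."""
    if target not in ALLOWED_TARGETS.get(site, set()):
        raise CFIViolation(f"{site} may not transfer to {target}")
    return functions[target]()
```

An out-of-range address such as `0x12345678` is redirected into the segment rather than trapped, which keeps the inserted check short, while the control flow check rejects any target the analysis did not approve for that call site.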
Direction 3 – Hardware/Software Co-Design, Co-Synthesis and Co-Verification: Hardware/software codesign, synthesis and verification have been applied in embedded systems for decades and are a good embodiment of our overall goals. Recent work has demonstrated that it is feasible to specify an application’s behavior in a single language and have tools automatically partition the behavioral description into hardware and software components as well as automatically synthesize them. The Bluespec Codesign Language (BCL) is such a language [11], [12].
In BCL, a programmer specifies an application’s behavior in Bluespec SystemVerilog (BSV) extended with Guarded Atomic Actions, which control whether the state of the program is updated at runtime. These annotations help the BCL compiler determine which portions of the behavioral specification must be sequenced and which can be executed in parallel.
A programmer also annotates the specification with knowledge of where the natural communication boundaries between the application’s modules lie. The BCL toolchain can then determine which modules are suitable for hardware versus software implementation based on predictive performance models, and it proceeds to automatically generate C++ code for the software modules, BSV for the hardware modules, and all the glue code necessary to schedule the components and tie them together.
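The semantics of guarded atomic actions can be modeled in ordinary software. The sketch below is not BCL or BSV syntax; it is a small interpreter, with hypothetical names `Rule` and `run`, in which a rule may fire only when its guard holds and its state updates are committed all at once. The two rules compute a greatest common divisor for positive inputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]     # may the rule fire in this state?
    action: Callable[[State], State]   # updates computed from the old state


def run(rules: List[Rule], state: State, max_steps: int = 10_000) -> State:
    """Fire one enabled rule at a time until no guard holds."""
    state = dict(state)
    for _ in range(max_steps):
        ready = [rule for rule in rules if rule.guard(state)]
        if not ready:
            return state               # quiescent: nothing can fire
        updates = ready[0].action(state)
        state.update(updates)          # commit the whole update atomically
    raise RuntimeError("rules did not reach quiescence")


GCD_RULES = [
    Rule("swap",
         lambda s: s["x"] > s["y"] and s["y"] != 0,
         lambda s: {"x": s["y"], "y": s["x"]}),
    Rule("subtract",
         lambda s: s["x"] <= s["y"] and s["y"] != 0,
         lambda s: {"y": s["y"] - s["x"]}),
]
```

Calling `run(GCD_RULES, {"x": 15, "y": 6})` leaves the result in `x` once `y` reaches zero. Because the two guards are mutually exclusive, these rules must be sequenced; rules that touch disjoint state could instead be scheduled in parallel, which is the kind of information a compiler can extract from the guards.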
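The partitioning decision can be illustrated with a toy cost model. This is a sketch under stated assumptions and not the BCL toolchain's method: the module names, latency and area figures, channel costs and the exhaustive search are all hypothetical.

```python
from itertools import product

# Hypothetical predicted latency per module in software ("sw") and
# hardware ("hw"), plus the area a hardware implementation would need.
MODULES = {
    "parser":  {"sw": 10.0, "hw": 8.0, "area": 5},
    "fft":     {"sw": 40.0, "hw": 4.0, "area": 6},
    "encoder": {"sw": 25.0, "hw": 5.0, "area": 7},
}

# Annotated communication boundaries; the cost is paid only when the two
# endpoints end up on different sides of the hardware/software split.
CHANNELS = {("parser", "fft"): 3.0, ("fft", "encoder"): 3.0}


def latency(assignment):
    """Predicted latency of an assignment {module: "sw" or "hw"}."""
    compute = sum(MODULES[m][assignment[m]] for m in MODULES)
    crossing = sum(cost for (a, b), cost in CHANNELS.items()
                   if assignment[a] != assignment[b])
    return compute + crossing


def partition(area_budget):
    """Try every split and keep the fastest one that fits the area budget."""
    names = list(MODULES)
    best = None
    for sides in product(("sw", "hw"), repeat=len(names)):
        assignment = dict(zip(names, sides))
        area = sum(MODULES[m]["area"] for m in names if assignment[m] == "hw")
        if area > area_budget:
            continue
        cost = latency(assignment)
        if best is None or cost < best[1]:
            best = (assignment, cost)
    return best
```

With these numbers, `partition(13)` moves the two modules that benefit most from hardware across the boundary and pays a single channel crossing. Exhaustive search is only workable for a handful of modules; it is used here to keep the example short.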
In addition to generating the hardware and software artifacts, the BCL toolchain also generates simulators that can be used to debug the final design. This further helps reduce development costs.
Observation: BCL can be seen as the basis of future ASA toolchains as long as two problems are addressed. First, the BCL language is at the Register Transfer Level and will need to be abstracted further through translation; otherwise it will not be suitable for developing high level applications. Second, it has yet to be shown whether the same techniques scale beyond embedded systems to settings where the computing stacks are deeper and software applications are more numerous.
IV. Summary
In summary, we presented background information on the problems of generic abstractions and argued that application specific abstractions are necessary to attain better security, reliability and performance guarantees. We argued that, since there is an overabundance of transistors, it will be feasible to host many specialized processing cores (up to one per application) on a single die. We then presented related work in three research directions that contribute to the end goal of application specific abstractions: Informed Lower Level Stacks, Configurable Abstraction Layers, and Hardware/Software Co-Design, Co-Synthesis and Co-Verification.
References
[1] J. B. Dennis, “The design and construction of software systems,” in Software Engineering, An Advanced Course, Reprint of the First Edition [February 21 – March 3, 1972]. London, UK: Springer-Verlag, 1975.
[2] J. Gosling, B. Joy, G. Steele, G. Bracha, and A. Buckley, The Java Language Specification, Java SE7 Edition. Oracle, 2013, ch. 10: Arrays. [Online]. Available: https://docs.oracle.com/javase/specs/jls/se7/html/index.html
[3] X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama, “Towards optimization-safe systems: Analyzing the impact of undefined behavior,” in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP ’13. New York, NY, USA: ACM, 2013.
[4] O. Shacham, “Chip multiprocessor generator: Automatic generation of custom and heterogeneous compute platforms,” Ph.D. dissertation, Stanford University, 2011.
[5] B. Raghunathan and S. Garg, “Job arrival rate aware scheduling for asymmetric multi-core servers in the dark silicon era,” in Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, ser. CODES ’14. New York, NY, USA: ACM, 2014.
[6] N. Goulding-Hotta, J. Sampson, G. Venkatesh, S. Garcia, J. Auricchio, P.-C. Huang, M. Arora, S. Nath, V. Bhatt, J. Babb, S. Swanson, and M. Taylor, “The greendroid mobile application processor: An architecture for silicon’s dark future,” IEEE Micro, vol. 31, no. 2, Mar. 2011.
[7] Y. Turakhia, B. Raghunathan, S. Garg, and D. Marculescu, “Hades: Architectural synthesis for heterogeneous dark silicon chip multiprocessors,” in Design Automation Conference (DAC), 2013 50th ACM/EDAC/IEEE, May 2013.
[8] “SECure COMPuting with filters.” [Online].
[9] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham, “Efficient software-based fault isolation,” in Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, ser. SOSP ’93. New York, NY, USA: ACM, 1993.
[10] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow integrity,” in Proceedings of the 12th ACM Conference on Computer and Communications Security, ser. CCS ’05. New York, NY, USA: ACM, 2005.
[11] N. Dave, “A unified model for hardware/software codesign,” Ph.D. dissertation, Massachusetts Institute of Technology, 2011.
[12] M. King, “A methodology for hardware-software codesign,” Ph.D. dissertation, Massachusetts Institute of Technology, 2011.
RELEASE STATEMENT
Approved for Public Release; Distribution Unlimited : 88ABW-2015-4644 20150929