Exploring early and late ALUs for single-issue in-order pipelines
Paper in proceedings, 2015

In-order processors are key components in energy-efficient embedded systems. One important design aspect of in-order pipelines is the sequence of pipeline stages. First, the position of the execute stage, in which arithmetic logic unit (ALU) operations and branch prediction are handled, affects the number of stall cycles caused by data dependencies between data memory instructions and their consuming instructions, and by address generation instructions that depend on an ALU result. Second, the position of the ALU inside the pipeline affects the branch penalty. This paper considers the question of how best to use ALU resources inside a single-issue in-order pipeline. We begin by analyzing the most efficient way of placing a single ALU in an in-order pipeline. We then evaluate the most efficient way to use two ALUs, one early and one late, a technique that has revitalized commercial in-order processors in recent years. Our architectural simulations, which are based on 20 MiBench and 7 SPEC2000 integer benchmarks and a 65-nm postlayout netlist of a complete pipeline, show that utilizing two ALUs in different stages of the pipeline gives better performance and energy efficiency than any pipeline configuration with a single ALU.
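The trade-off described in the abstract can be illustrated with a toy stall model. This is a minimal sketch and not the simulation setup used in the paper: the `Pipeline` class, the stage numbering (fetch is stage 0), the function names, and the assumption of full forwarding with no other hazards are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Toy in-order pipeline; stage numbers are illustrative assumptions."""
    addr_stage: int  # stage in which load/store addresses are generated
    mem_stage: int   # stage at the end of which load data becomes available
    alu_stage: int   # stage in which ALU ops execute and branches resolve


def _stalls(produce_stage: int, consume_stage: int, distance: int) -> int:
    # The consumer is issued `distance` instructions after the producer.
    # The producer's result is usable one cycle after its producing stage,
    # so the consumer waits until that cycle if it arrives earlier.
    return max(0, produce_stage + 1 - consume_stage - distance)


def load_use_stalls(p: Pipeline, distance: int = 1) -> int:
    """Stalls when an ALU instruction consumes the result of a load."""
    return _stalls(p.mem_stage, p.alu_stage, distance)


def addr_gen_stalls(p: Pipeline, distance: int = 1) -> int:
    """Stalls when address generation depends on an ALU result."""
    return _stalls(p.alu_stage, p.addr_stage, distance)


def branch_penalty(p: Pipeline) -> int:
    """Cycles lost on a misprediction resolved in the ALU stage."""
    return p.alu_stage


# Assumed configurations: ALU beside address generation vs. ALU after memory.
EARLY = Pipeline(addr_stage=2, mem_stage=3, alu_stage=2)
LATE = Pipeline(addr_stage=2, mem_stage=3, alu_stage=4)

if __name__ == "__main__":
    for name, p in (("early", EARLY), ("late", LATE)):
        print(name, load_use_stalls(p), addr_gen_stalls(p), branch_penalty(p))
```

Under these assumed stage numbers, the early ALU incurs one load-use stall, no address generation stalls, and a branch penalty of 2 cycles, while the late ALU removes the load-use stall at the cost of two address generation stalls and a branch penalty of 4 cycles.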

Energy efficient

Pipeline configuration


Arithmetic logic unit

Architectural simulation

Data dependencies

Different stages

Branch prediction

Embedded systems

Energy efficiency

Address generation


Alen Bardizbanyan

Chalmers, Computer Science and Engineering, Computer Engineering

Per Larsson-Edefors

Chalmers, Computer Science and Engineering, Computer Engineering

Proceedings of the 33rd IEEE International Conference on Computer Design, ICCD 2015. New York City, United States, 18-21 October 2015

Pages 543-548, article no. 7357163


Information and Communication Technology



Other Electrical Engineering and Electronics