Per Stenström

Full Professor (L2) at Computer Engineering (Chalmers)

Publications (198)
Projects (20)

2026

CrossFetch: A Prefetching Scheme for Cross-Page Prefetching in the Physical Address Space

Qi Shao, Per Stenström

IEEE Computer Architecture Letters. Vol. 25 (1), p. 1-4

Journal article

2026

BATCH-DNN: Adaptive and Dynamic Batching for Multi-DNN Accelerators

Piyumal Ranawaka, Per Stenström

Lecture Notes in Computer Science. Vol. 15901 LNCS, p. 103-117

Paper in proceeding

2025

ASaP: Automatic Software Prefetching for Sparse Tensor Computations in MLIR

Konstantinos Ioannis Sotiropoulos Pesiridis, Jonas Skeppstedt, Per Stenström

Proceedings of the 2025 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2025 Workshops, p. 1017-1027

Paper in proceeding

2024

HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory

Qi Shao, Angelos Arelakis, Per Stenström

Proceedings of the International Conference on Supercomputing, p. 74-84

Paper in proceeding

2024

DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators

Piyumal Ranawaka, Muhammad Waqar Azhar, Per Stenström

Proceedings of the 21st ACM International Conference on Computing Frontiers, CF 2024, p. 126-137

Paper in proceeding

2023

eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem

Lluc Alvarez, Abraham Ruiz, Arnau Bigas-Soldevilla et al

Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023, p. 309-314

Paper in proceeding

2023

Approx-RM: Reducing Energy on Heterogeneous Multicore processors under Accuracy and Timing Constraints

Muhammad Waqar Azhar, Madhavan Manivannan, Per Stenström

Transactions on Architecture and Code Optimization. Vol. 20 (3)

Journal article

2023

SoK: Analysis of Root Causes and Defense Strategies for Attacks on Microarchitectural Optimizations

Nadja Holtryd, Madhavan Manivannan, Per Stenström

Proceedings - 8th IEEE European Symposium on Security and Privacy, EuroS&P 2023, p. 631-650

Paper in proceeding

2023

SCALE: Secure and Scalable Cache Partitioning

Nadja Holtryd, Madhavan Manivannan, Per Stenström

Proceedings of the 2023 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2023, p. 68-79

Paper in proceeding

2022

GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases

Alexandra Angerd, Angelos Arelakis, Vasilis Spiliopoulos et al

Proceedings - International Symposium on High-Performance Computer Architecture. Vol. 2022-April, p. 1115-1127

Paper in proceeding

2022

Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack between QoS-Constrained Applications

Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas et al

Transactions on Architecture and Code Optimization. Vol. 19 (2)

Journal article

2022

Bounding the execution time of parallel applications on unrelated multiprocessors

Petros Voudouris, Per Stenström, Risat Pathan

Real-Time Systems. Vol. 58 (2), p. 189-232

Journal article

2022

Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints

Muhammad Waqar Azhar, Miquel Pericas, Per Stenström

Transactions on Architecture and Code Optimization. Vol. 19 (1)

Journal article

2021

CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling

Nadja Holtryd, Madhavan Manivannan, Per Stenström et al

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. Vol. 2021-September, p. 213-225

Paper in proceeding

2021

Federated Scheduling of Sporadic DAGs on Unrelated Multiprocessors

Petros Voudouris, Per Stenström, Risat Pathan

Transactions on Embedded Computing Systems. Vol. 20 (5s)

Journal article

2020

Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems

Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas et al

Journal of Parallel and Distributed Computing. Vol. 144, p. 246-259

Journal article

2020

Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints

Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas et al

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020, p. 590-601

Paper in proceeding

2020

DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors

Nadja Holtryd, Madhavan Manivannan, Per Stenström et al

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020, p. 578-589

Paper in proceeding

2020

A GPU Register File using Static Data Compression

Alexandra Angerd, Erik Sintorn, Per Stenström

ACM International Conference Proceeding Series

Paper in proceeding

2019

QoS-driven coordinated management of resources to save energy in multi-core systems

Mehrzad Nejat, Miquel Pericas, Per Stenström

Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019, p. 303-313

Paper in proceeding

2019

Trends on heterogeneous and innovative hardware and software systems

Alba Melo, Jesus Carretero, Per Stenström et al

Journal of Parallel and Distributed Computing. Vol. 133, p. 362-364

Other text in scientific journal

2019

SaC: Exploiting execution-time slack to save energy in heterogeneous multicore systems

Muhammad Waqar Azhar, Miquel Pericas, Per Stenström

ACM International Conference Proceeding Series

Paper in proceeding

2018

Global dead-block management for task-parallel programs

Madhavan Manivannan, Miquel Pericas, Vasileios Papaefstathiou et al

Transactions on Architecture and Code Optimization. Vol. 15 (3)

Journal article

2018

Scheduling parallel real-time recurrent tasks on multicore platforms

Risat Pathan, Petros Voudouris, Per Stenström

IEEE Transactions on Parallel and Distributed Systems. Vol. 29 (4), p. 915-928

Journal article

2018

ProFess: A Probabilistic Hybrid Main Memory Management Framework for High Performance and Fairness

Dmitry Knyaginin, Vasileios Papaefstathiou, Per Stenström

Proceedings - International Symposium on High-Performance Computer Architecture. Vol. 2018-February, p. 143-155

Paper in proceeding

2017

Runtime-Assisted Global Cache Management for Task-based Parallel Programs

Madhavan Manivannan, Miquel Pericas, Vasileios Papaefstathiou et al

IEEE Computer Architecture Letters. Vol. 16 (2), p. 145-148

Journal article

2017

Rock: A framework for pruning the design space of hybrid main memory systems

Dmitry Knyaginin, Per Stenström

ACM International Conference Proceeding Series. Vol. Part F131197, p. 337-347

Paper in proceeding

2017

SLOOP: QoS-Supervised Loop Execution to Reduce Energy on Heterogeneous Architectures

Muhammad Waqar Azhar, Per Stenström, Vasileios Papaefstathiou

Transactions on Architecture and Code Optimization. Vol. 14 (4), Article No. 41

Journal article

2017

Timing-anomaly free dynamic scheduling of task-based parallel applications

Petros Voudouris, Per Stenström, Risat Pathan

Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, p. 365-376

Paper in proceeding

2017

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

Alexandra Angerd, Erik Sintorn, Per Stenström

Transactions on Architecture and Code Optimization. Vol. 14 (4)

Journal article

2016

PATer: A Hardware Prefetching Automatic Tuner on IBM POWER8 Processor

Minghua Li, Guancheng Chen, Qijun Wang et al

IEEE Computer Architecture Letters. Vol. 15 (1), p. 37-40

Journal article

2016

RADAR: Runtime-assisted dead region management for last-level caches

Madhavan Manivannan, Vasileios Papaefstathiou, Miquel Pericas et al

Proceedings - International Symposium on High-Performance Computer Architecture. Vol. 2016-April, p. 644-656

Paper in proceeding

2016

A Cache System and a Method of Operating a Cache

Angelos Arelakis, Per Stenström

Patent

2016

A Safe and Tight Estimation of the Worst-Case Execution Time of Dynamically Scheduled Parallel Applications

Petros Voudouris, Risat Pathan, Per Stenström

Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2016), p. 6-

Other conference contribution

2016

Adaptive row addressing for cost-efficient parallel memory protocols in large-capacity memories

Dmitry Knyaginin, Vasileios Papaefstathiou, Per Stenström

MEMSYS 2016: International Symposium on Memory Systems. Vol. 03-06-October-2016, p. 121-132

Paper in proceeding

2016

Timing-anomaly free dynamic scheduling of task-based parallel applications

Petros Voudouris, Per Stenström, Risat Pathan

Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2017), Pittsburgh, PA, April 18-21, 2017, p. 365-376

Paper in proceeding

2016

RADAR: Run-time assisted Dead-Region Management for Last-Level Caches

Madhavan Manivannan, Miquel Pericas, Vasileios Papaefstathiou et al

IEEE International Symposium on High Performance Computer Architecture, p. 11-

Paper in proceeding

2016

ProF: Probabilistic Hybrid Main Memory Management for High Performance and Fairness

Dmitry Knyaginin, Per Stenström, Vasileios Papaefstathiou

Report

2016

EUROSERVER: Share-anything scale-out micro-server design

Manolis Marazakis, John Goodacre, Didier Fuin et al

19th Design, Automation and Test in Europe Conference and Exhibition, DATE 2016, Dresden, Germany, 14-18 March 2016, p. 678-683

Paper in proceeding

2016

A Case for Runtime-Assisted Global Cache Management

Madhavan Manivannan, Miquel Pericas, Vasileios Papaefstathiou et al

Report

2015

A Primer on Compression in the Memory Hierarchy

Somayeh Sardashti, Angelos Arelakis, Per Stenström et al

Book

2015

Enhancing Garbage Collection Synchronization using Explicit Bit Barriers

Jochen Hollmann, Ruben Titos Gil, Per Stenström

Proceedings of the International Conference on Parallel Processing. Vol. 2015-December, p. 769-778

Paper in proceeding

2015

RADAR: Runtime-Assisted Dead Region Management for Last-Level Caches

Madhavan Manivannan, Vasileios Papaefstathiou, Miquel Pericas et al

Report

2015

Performance Impact of Batching Web Application Requests using Hot-spot Processing on GPUs

Tobias Fjälling, Per Stenström

29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015, Hyderabad, India, 25-29 May 2015, p. 989-999

Paper in proceeding

2015

HyComp: A Hybrid Cache Compression Method for Selection of Data-Type-Specific Compression Methods

Angelos Arelakis, Fredrik Dahlgren, Per Stenström

Proceedings of the Annual International Symposium on Microarchitecture, MICRO. Vol. 05-09-December-2015, p. 38-49

Paper in proceeding

2014

ZEBRA: Data-Centric Contention Management in Hardware Transactional Memory

Ruben Titos Gil, Anurag Negi, M. E. Acacio et al

IEEE Transactions on Parallel and Distributed Systems. Vol. 25 (5), p. 1359-1369

Journal article

2014

Characterizing and Exploiting Small-Value Memory Instructions

Mafijul Islam, Per Stenström

IEEE Transactions on Computers. Vol. 63 (7), p. 1640-1655

Journal article

2014

What's Next

Yves Robert, Viktor Prasanna, Per Stenström

Edited book

2014

Introduction to the JPDC special issue on Perspectives on Parallel and Distributed Processing

V. K. Prasanna, Y. Robert, Per Stenström

Journal of Parallel and Distributed Computing. Vol. 74 (7), p. 2543

Other text in scientific journal

2014

Performance and energy analysis of the restricted transactional memory implementation on Haswell

Bhavishya Goel, Ruben Titos Gil, Anurag Negi et al

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS, p. 615-624

Paper in proceeding

2014

Crystal: A design-time resource partitioning method for hybrid main memory

Dmitry Knyaginin, Georgi Gaydadjiev, Per Stenström

Proceedings of the International Conference on Parallel Processing. Vol. 2014-November, p. 90-100

Paper in proceeding

2014

A Design-Time Resource Partitioning Method for Hybrid Main Memory

Dmitry Knyaginin, Georgi Gaydadjiev, Per Stenström

REPRODUCE 2014: Workshop on Reproducible Research Methodologies

Paper in proceeding

2014

A Case for a Value-Aware Cache

Angelos Arelakis, Per Stenström

IEEE Computer Architecture Letters. Vol. 13 (1), p. 1-4

Journal article

2014

Temporal Partitioning on Multicore Platform

Risat Pathan, Feysal Hadji Hashi, Per Stenström et al

European Space Agency, (Special Publication) ESA SP. Vol. SP 725

Paper in proceeding

2014

Proceedings of the 2014 ACM International Conference on Supercomputing

Arndt Bode, Michael Gerndt, Per Stenström

Edited book

2014

Removal of Conflicts in Hardware Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

International Journal of Parallel Programming. Vol. 42 (1), p. 198-218

Journal article

2014

Overhead-Aware Temporal Partitioning on Multicore Processors

Risat Pathan, Per Stenström, Lars-Göran Green et al

Real-Time Technology and Applications - Proceedings. Vol. 2014-October, p. 251-262

Paper in proceeding

2014

SC2: A statistical compression cache scheme

Angelos Arelakis, Per Stenström

Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA, p. 145-156

Paper in proceeding

2014

Effective Resource Management Towards Efficient Computing

Per Stenström

Design, Automation and Test in Europe Conference and Exhibition (DATE), Dresden, Germany, March 24-28, 2014

Paper in proceeding

2014

Runtime-guided cache coherence optimizations in multi-core architectures

Madhavan Manivannan, Per Stenström

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS, p. 625-636

Paper in proceeding

2013

Efficient Forwarding of Producer-Consumer Data in Task-based Programs

Madhavan Manivannan, Anurag Negi, Per Stenström

Report

2013

Towards automatic resource management in parallel architectures

Per Stenström

IEEE Parallel Architectures and Compilation Techniques

Paper in proceeding

2013

Improving Data Access Efficiency by Using a Tagless Access Buffer (TAB)

Alen Bardizbanyan, Peter Gavin, David Whalley et al

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013, p. 269-279

Paper in proceeding

2013

Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures

Madhavan Manivannan, Per Stenström

Report

2013

Moving from Petaflops to Petadata

M. J. Flynn, O. Mencer, V. Milutinovic et al

Communications of the ACM. Vol. 56 (5), p. 39-42

Review article

2013

Efficient Forwarding of Producer-Consumer Data in Task-based Programs

Madhavan Manivannan, Anurag Negi, Per Stenström

Proceedings of the International Conference on Parallel Processing, p. 517-522

Paper in proceeding

2013

A Cache System and a Method of Operating a Cache

Angelos Arelakis, Per Stenström

Patent

2013

Eager Beats Lazy: Improving Store Management in Eager Hardware Transactional Memory

Ruben Titos Gil, Anurag Negi, M. E. Acacio et al

IEEE Transactions on Parallel and Distributed Systems. Vol. 24 (11), p. 2192-2201

Journal article

2013

HARP: Adaptive Abort Recurrence Prediction for Hardware Transactional Memory

Adria Armejach, Osman Unsal, Anurag Negi et al

20th Annual International Conference on High Performance Computing, HiPC 2013, p. 196-205

Paper in proceeding

2012

Transactional Prefetching: Narrowing the Window of Contention in Hardware Transactional Memory

Adria Armejach, Anurag Negi, Adrian Cristal et al

TRANSACT

Other conference contribution

2012

Transactions on Architectures and Code Optimizations

Koen De Bosschere, Per Stenström

Edited book

2012

Transactional Prefetching: Narrowing the Window of Contention in Hardware Transactional Memory

Anurag Negi, Adria Armejach, Adrian Cristal et al

International Conference on Parallel Architectures and Compilation Techniques (PACT)

Paper in proceeding

2012

Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory

Anurag Negi, Ruben Titos Gil, M. E. Acacio et al

Proceedings - International Symposium on High-Performance Computer Architecture, p. 141-151

Paper in proceeding

2012

Parallel Computer Organization and Design

Michel Dubois, Murali Annavaram, Per Stenström

Book

2012

A Data Forwarding Scheme for Task-based Programming Models

Madhavan Manivannan, Anurag Negi, Per Stenström

Proceedings of the Fifth Swedish Workshop on Multicore Computing

Other conference contribution

2012

Critical lock analysis: Diagnosing critical section bottlenecks in multithreaded applications

Guancheng Chen, Per Stenström

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Paper in proceeding

2012

Transactional prefetching: Narrowing the window of contention in hardware transactional memory

Anurag Negi, A. Armejach, A. Cristal et al

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, p. 181-190

Paper in proceeding

2011

Implications of Merging Phases on Scalability of Multi-core Architectures

Madhavan Manivannan, Ben Juurlink, Per Stenström

Proceedings of the International Conference on Parallel Processing. 40th International Conference on Parallel Processing, ICPP 2011, Taipei City, 13-16 September 2011, p. 622-631

Paper in proceeding

2011

Transactions on High Performance and Embedded Architectures and Compilers - Vol 4

Per Stenström

Edited book

2011

Coherence-Less Model for Shared-Memory, Speculative Multi-core Processors

Andras Vajda, Per Stenström

FASPP’11 (in conj. with 2011 ACM/IEEE ISCA)

Other conference contribution

2011

Transactions on High-Performance Embedded Architectures and Compilers Vol 3

Per Stenström

Edited book

2011

Transactions on Architectures and Code Optimization

Per Stenström, Koen De Bosschere

Edited book

2011

Techniques for Reduction of Conflicts in Hardware Transactional Memory

M.M. Waliullah, Per Stenström

2011 MULTIPROG Workshop (in conjunction with the HiPEAC conference)

Other conference contribution

2011

Method and mechanism for cache compaction and bandwidth reduction

Per Stenström

Patent

2011

ZEBRA: A data-centric, hybrid-policy hardware transactional memory design

R. Titos-Gil, Anurag Negi, M. E. Acacio et al

Proceedings of the International Conference on Supercomputing, ICS 2011. Tucson, 31 May-4 June 2011, p. 53-62

Paper in proceeding

2011

A Unified Scheme to Cancel Memory Accesses Early

Mafijul Islam, Per Stenström

Report

2011

Diagnosing Critical Section Bottlenecks in Multithreaded Applications

Guancheng Chen, Per Stenström

2011 MULTIPROG workshop (in conjunction with 2011 HiPEAC Conference)

Other conference contribution

2011

A Unified Approach to Eliminate Memory Accesses Early

Mafijul Islam, Per Stenström

Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES'11, Taipei, 9-14 October 2011, p. 55-64

Paper in proceeding

2011

The Impact of Non-coherent Buffers on Lazy Hardware Transactional Memory Systems

Anurag Negi, Ruben Titos, M. E. Acacio et al

APDCM 2011 (in conj. with 2011 IEEE IPDPS)

Paper in proceeding

2011

Eager meets lazy: The impact of write-buffering on hardware transactional memory

Anurag Negi, R. Titos-Gil, M. E. Acacio et al

Proceedings of the International Conference on Parallel Processing. 40th International Conference on Parallel Processing, ICPP 2011, Taipei City, 13-16 September 2011, p. 73-82

Paper in proceeding

2011

Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory

Anurag Negi, Per Stenström, Ruben Titos Gil et al

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (Article number 6113816), p. 203-204

Paper in proceeding

2011

Implications of Merging Phases on Scalability of Multicore Architectures

Madhavan Manivannan, Ben Juurlink, Per Stenström

International Conference on Supercomputing (ICS), p. 380

Conference poster

2011

The impact of non-coherent buffers on lazy hardware transactional memory systems

Anurag Negi, Ruben Titos Gil, M. E. Acacio et al

IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011; Anchorage, AK; 16 May 2011 through 20 May 2011, p. 700-707

Paper in proceeding

2011

Hints Based Speculative Execution for Exploiting Probabilistic Parallel Execution

Andras Vajda, Per Stenström

WANDS’11 (in conjunction with 2011 IEEE PACT)

Other conference contribution

2011

Classification and Elimination of Conflicts in Hardware-Transactional Memory Systems

M.M. Waliullah, Per Stenström

23rd International Conference on Computer Architecture and High Performance Computing (SBAC-PAD 2011), p. 96-103

Paper in proceeding

2010

Semantic Information Driven Speculative Execution

Andras Vajda, Per Stenström

ACM/IEEE Workshop on New Directions in Computer Architecture

Other conference contribution

2010

Characterization and Exploitation of Narrow-Width Loads: The Narrow-Width Cache Approach

Mafijul Islam, Per Stenström

IEEE/ACM International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES 2010), p. 227-236

Paper in proceeding

2010

LV*: A Class of Lazy-Versioning HTMs for Low-Cost Integration of Transactional Memory Systems

Anurag Negi, Mridha Mohammad Waliullah, Per Stenström

2nd IEEE Int. Forum of Next-Generation Multicore/Many-Core Technologies (IFMT’2010)

Paper in proceeding

2010

LV*: A Low Complexity Lazy Versioning HTM Infrastructure

Anurag Negi, Mridha Mohammad Waliullah, Per Stenström

Proceedings - 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2010, p. 231-240

Paper in proceeding

2010

Implications of Serial Reduction Phases in Data Mining Applications on Scalability of Multi-core Designs

Madhavan Manivannan, Per Stenström

Proceedings of the Third Swedish Workshop on Multicore Computing

Other conference contribution

2010

Diagnosing Serialization Bottlenecks in Multi-threaded Applications on Multi-core Processors

Per Stenström, Guancheng Chen

Proceedings of the Third Swedish Workshop on Multicore Computing

Other conference contribution

2010

Semantic based speculative parallel execution

András Vajda, Per Stenström

Third IEEE Workshop on Parallel Execution of Sequential Programs on Multicore Architectures (PESPMA 2010)

Paper in proceeding

2010

Characterization and Exploitation of Silent Loads

Mafijul Islam, Per Stenström

3rd Swedish Workshop on Multicore Computing (MCC'10)

Paper in proceeding

2010

The VELOX Transactional Memory Stack

Adrian Cristal, Ulrich Drepper, Stephan Diestelhorst et al

IEEE Micro. Vol. 30 (5), p. 76-87

Journal article

2010

Simple Performance Optimization Techniques for Hardware Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

Proceedings of the Third Swedish Workshop on Multicore Computing

Other conference contribution

2010

A Unified Approach to Cancel Memory Instructions Early

Mafijul Islam, Per Stenström

Report

2010

Generating and Comparing Memory Access Ranges for Speculative Throughput Computing

Alexander Busck, Mikael Engbom, Per Stenström et al

Patent

2010

System and Method for Memory Compression

Magnus Ekman, Per Stenström

Patent

2009

Using Hoarding to Increase the Availability in Shared File Systems

Jochen Hollmann, Per Stenström

8th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2009; Shanghai; China; 1 June 2009 through 3 June 2009, p. 422-429

Paper in proceeding

2009

Zero-Value Caches: Cancelling Loads that Return Zero

Mafijul Islam, Per Stenström

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, p. 237-245

Paper in proceeding

2009

FlexCore: Utilizing Exposed Datapath Control for Efficient Computing

Martin Thuresson, Magnus Själander, Magnus Björk et al

Journal of Signal Processing Systems. Vol. 57 (1), p. 5-19

Journal article

2009

Method and System for Process Memory Management

Per Stenström

Patent

2009

A Flexible Code-Compression Scheme using Partitioned Look-Up Tables

Martin Thuresson, Magnus Själander, Per Stenström

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5409 LNCS, p. 95-109

Paper in proceeding

2009

Cancellation of Loads that Return Zero Using Zero-Value Caches

Mafijul Islam, Sally A McKee, Per Stenström

23rd International Conference on Supercomputing, ICS'09; Yorktown Heights, NY; United States; 8 June 2009 through 12 June 2009, p. 493-494

Conference poster

2009

SimWattch and Learn

Jianwei Chen, Michel Dubois, Per Stenström

IEEE Potentials. Vol. 28 (1), p. 17-23

Journal article

2009

Schemes for avoiding starvation in transactional memory systems

Mridha Mohammad Waliullah, Per Stenström

Concurrency Computation Practice and Experience. Vol. 21 (7), p. 859-873

Journal article

2009

Semantic information driven speculative execution

András Vajda, Per Stenström

IEEE/ACM MICRO "New Directions in Computer Architecture"

Paper in proceeding

2009

Zero-Value Caches: Cancelling Loads that Return Zero

Mafijul Islam, Sally A McKee, Per Stenström

Report

2009

Transactions on High-Performance Embedded Architectures and Compilers

Per Stenström

Edited book

2008

A Flexible Code Compression Scheme using Partitioned Look-Up Tables

Martin Thuresson, Magnus Själander, Per Stenström

Report

2008

Early Detection and Bypassing of Trivial Operations to Improve Energy Efficiency of Processors

Mafijul Islam, Magnus Själander, Per Stenström

Microprocessors and Microsystems, Elsevier. Vol. 42 (4), p. 183-196

Journal article

2008

Memory Link Compression Schemes: A Value Locality Perspective

Martin Thuresson, Per Stenström, Lawrence Spracklen

IEEE Transactions on Computers

Journal article

2008

System and Method for Coherence Prediction

Per Stenström

Patent

2008

Intermediate Checkpointing with Conflicting Access Prediction in Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

IPDPS 2008 - 22nd IEEE International Parallel and Distributed Processing Symposium; Miami, FL; United States; 14 April 2008 through 18 April 2008

Paper in proceeding

2008

Efficient Management of Speculative Data in Hardware Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

2008 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2008; Samos; Greece; 21 July 2008 through 24 July 2008, p. 158-164

Paper in proceeding

2008

A Micro-Architectural Power-Saving Technique for D-NUCA Caches

Alessandro Bardine, Pierfrancesco Foglia, G Gabrielli et al

4th IEEE Workshop on Unique Chips and Systems

Journal article

2008

Dual-thread Speculation: A Simple Approach to Uncover Thread-level Parallelism on a Simultaneous Multithreaded Processor

Fredrik Warg, Per Stenström

International Journal of Parallel Programming. Vol. 36 (2), p. 166-183

Journal article

2008

Simple Penalty-Sensitive Cache Replacement Policies

J Jeong, Per Stenström, Michel Dubois

Journal of Instruction-Level Parallelism. Vol. 10, p. 1-24

Journal article

2008

Leveraging data promotion for low power D-NUCA caches

Alessandro Bardine, Manuel Comparetti, Pierfrancesco Foglia et al

11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, DSD 2008; Parma; Italy; 3 September 2008 through 5 September 2008

Paper in proceeding

2008

Accommodation of the Bandwidth of Large Cache Blocks using Cache/Memory Link Compression

Martin Thuresson, Per Stenström

International Conference on Parallel Processing

Paper in proceeding

2008

Zero Loads: Canceling Load Requests by Tracking Zero Values

Mafijul Islam, Per Stenström

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. Vol. 310, p. 16-23

Paper in proceeding

2008

The worst-case execution-time problem - overview of methods and survey of tools

Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl et al

ACM Transactions on Embedded Computing Systems. Vol. 7 (3)

Journal article

2008

Reducing Roll-back Overhead in Transactional Memory Systems by Checkpointing Conflicting Accesses

Mridha Mohammad Waliullah, Per Stenström

2008 IEEE International Symposium on Parallel and Distributed Processing Systems

Paper in proceeding

2008

Proceedings of the Third International Conference on High-Performance Embedded Architectures and Compilers

Per Stenström, Michel Dubois, Manolis Katevenis et al

Edited book

2008

Cache Coherency Protocol Including Generic Transient States

Per Stenström

Patent

2008

Proceedings of the 14th IEEE Symp. on High-Performance Computer Architecture

Per Stenström, John Carter, Antonio Gonzalez

Edited book

2007

Effectiveness of Caching in a Distributed Digital Library

Jochen Hollmann, Per Stenström, Anders Ardö

Journal of Systems Architecture. Vol. 53 (7), p. 403-416

Journal article

2007

Transactions on HiPEAC

Per Stenström, Marcelo Cintra, Michael O'Boyle et al

Edited book

2007

Loop-Level Speculative Parallelism in Embedded Applications

Mafijul Islam, Alexander Busck, Mikael Engbom et al

2007 International Conference on Parallel Processing

Paper in proceeding

2007

Starvation-Free Transactional Memory System Protocols

Mridha Mohammad Waliullah, Per Stenström

2007 EUROPAR Conference

Paper in proceeding

2007

Implicit Transactional Memory in Kilo-Instruction Processors

Enrique Vallejo, Marco Galluzi, Adrian Cristal et al

12th Asia-Pacific Computer Systems Architecture Conference (ACSAC07)

Paper in proceeding

2007

An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

Haakon Dybdahl, Per Stenström

2007 IEEE International Symp. on High-Performance Computer Architecture

Journal article

2007

Improving Power Efficiency of D-NUCA Caches

Alessandro Bardine, Pierfrancesco Foglia, G Gabrielli et al

ACM Computer Architecture News

Journal article

2007

Characterization of Apache web server with Specweb2005

Jose Maria Llaberia, Ana Bosque, Pablo Ibanez et al

2007 IEEE MEDEA

Paper in proceeding

2007

Energy and Performance Tradeoffs between Instruction Reuse and Trivial Computations for Embedded Applications

Mafijul Islam, Per Stenström

IEEE International Symposium on Embedded Computer Systems

Paper in proceeding

2007

Intermediate Checkpointing with Conflicting Access Prediction in Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

Report

2007

Limits on Thread-Level Speculative Parallelism in Embedded Applications

Mafijul Islam, Alexander Busck, Mikael Engbom et al

IEEE INTERACT 2007

Paper in proceeding

2007

The Paradigm Shift to Multi-Cores: Opportunities and Challenges

Per Stenström

Applied and Computational Mathematics. Vol. 6 (2), p. 253-257

Journal article

2007

Efficient Management of Speculative Data in Hardware Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

Report

2007

FlexCore: Utilizing Exposed Datapath Control for Efficient Computing

Martin Thuresson, Magnus Själander, Magnus Björk et al

IEEE SAMOS 2007, p. 18-25

Paper in proceeding

2007

Proceedings of the 2007 International Conference on HiPEAC

Koen De Bosschere, David Kaeli, Per Stenström et al

Edited book

2007

Starvation-Free Commit Arbitration Policies for Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

ACM Computer Architecture News

Paper in proceeding

2007

SimWattch: Integrating complete-system and user-level performance and power simulators

Per Stenström, Michel Dubois, Jianwei Chen

IEEE Micro. Vol. 27 (4), p. 34-48

Journal article

2007

Exposed Datapath for Efficient Computing

Magnus Björk, Magnus Själander, Lars Svensson et al

2007 HiPEAC Workshop on Reconfigurable Computing

Paper in proceeding

2007

Proceedings of the 2007 ACM International Conference on Computing Frontiers

Michel Dubois, Per Stenström

Edited book

2006

High-Performance Embedded Architecture and Compilation Roadmap

Koen De Bosschere, Georgi Gaydadjiev, Xavier Martorell et al

Transactions on High-Performance Embedded Architectures and Compilers. Vol. 1 (3)

Journal article

2006

Value-Cache Based Compression Schemes for Multiprocessors

Martin Thuresson, Per Stenström

18th International Conference on Computer Architecture and High Performance Computing

Journal article

2006

Data Link Compression in Multiprocessor Systems

Martin Thuresson, Per Stenström

Report

2006

Enhancing Lower Level Cache Performance by Early Miss Determination and Bypassing

Haakon Dybdahl, Per Stenström

11th Asia-Pacific Computer Systems Architecture Conference

Journal article

2006

Starvation-Free Commit Arbitration Policies for Transactional Memory Systems

Mridha Mohammad Waliullah, Per Stenström

2006 IEEE Workshop on Design, Architecture and Simulation of Chip Multi-Processors

Journal article

2006

A Cache-Partition Aware Replacement Policy for Chip Multiprocessors

Haakon Dybdahl, Per Stenström

ACM 2006 Conference on High Performance Computing

Journal article

2006

Exploitation of Value Locality for Memory Link Compression

Martin Thuresson, Lawrence Spracklen, Per Stenström

Report

2006

Two Threads in the Machine is Better than Eight in the Bush

Fredrik Warg, Per Stenström

18th Symposium on Computer Architecture and High Performance Computing

Journal article

2006

Exposed Datapath for Efficient Computing

Magnus Björk, Magnus Själander, Lars Svensson et al

Report

2006

Reduction of Energy Consumption in Processors by Early Detection and Bypassing of Trivial Operations

Mafijul Islam, Per Stenström

6th IEEE Conference on Embedded Computer Systems: Architectures, Modelling, and Simulation

Journal article

2006

A Cache Replacement Algorithm based on Frequency and Recency for Chip Multiprocessors

Haakon Dybdahl, Lasse Natvig, Per Stenström

2006 IEEE MEDEA workshop

Journal article

2006

Penalty-Sensitive Replacement Policies for Caches

J Jeong, Michel Dubois, Per Stenström

2006 ACM Int. Conf. on Computing Frontiers

Journal article

2005

A Cost-Effective Memory Organization for Future Servers

Magnus Ekman, Per Stenström

IEEE IPDPS

Paper in proceeding

2005

Enhancing Simulation Speed using Matched-Pair Comparison

Magnus Ekman, Per Stenström

IEEE ISPASS

Paper in proceeding

2005

Keynote 2: The chip-multiprocessing paradigm shift: Opportunities and challenges

Per Stenström

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3793, p. 5-

Other conference contribution

2005

Evaluation of Extended Dictionary-Based Static Code Compression Techniques

Martin Thuresson, Per Stenström

ACM Computing Frontiers

Paper in proceeding

2005

Implementing Kilo-Instruction Multiprocessors

Per Stenström, Marco Vallejo, Mateo Valero et al

IEEE Pervasive Computing

Journal article

2005

A Robust Memory Compression Scheme

Magnus Ekman, Per Stenström

IEEE/ACM ISCA

Journal article

2005

Languages Compilers and Tools for Embedded Systems

Per Stenström, Frank Mueller

Edited book

2005

Reducing Misspeculation Overhead for Module-Level Speculative Execution

Fredrik Warg, Per Stenström

ACM Computing Frontiers

Paper in proceeding

2004

Multiprocessor system for reducing the power consumption of logic in connections to the processors in the system

Magnus Ekman, Fredrik Dahlgren, Per Stenström

Patent

2004

Self-Correcting LRU Replacement Policies

Martin Kämpe, Michel Dubois, Per Stenström

ACM Computing Frontiers

Journal article

2004

A Cache Block Reuse Prediction Scheme

Jonas Jalminger, Per Stenström

Microprocessors and Microsystems. Vol. 28 (7), p. 373-385

Journal article

2004

A Comparative Evaluation of Hardware-Only and Software-Only Directory Protocols in Shared-Memory Multiprocessors

Håkan Grahn, Per Stenström

Journal of Systems Architecture. Vol. 50 (9), p. 537-561

Journal article

2003

Performance and Power Impact of Issue-width in Chip-Multiprocessor Cores

Magnus Ekman, Per Stenström

2003 International Conference on Parallel Processing

Journal article

2003

Improving Speculative Thread-Level Parallelism Through Module Run-Length Prediction

Fredrik Warg, Per Stenström

Proceedings of the International Parallel and Distributed Processing Symposium, p. 12-

Paper in proceeding

2003

An Evaluation of Document Prefetching in a Distributed Digital Library

Jochen Hollmann, Anders Ardö, Per Stenström

Report

2003

SimWattch: An Approach to Integrate Complete-System with User-Level Performance/Power Simulators

Jianwei Chen, Per Stenström, Michel Dubois

2003 IEEE International Symposium on Performance Analysis of Systems and Software

Journal article

2003

Evaluation of Document Prefetching in a Distributed Digital Library

Jochen Hollmann, Per Stenström

7th European Conference and Research on Advanced Technology for Digital Libraries

Paper in proceeding

2003

Empirical Observations regarding Predictability in User Access-Behavior in a Distributed Digital Library System

Jochen Hollmann, Anders Ardö, Per Stenström

Preprint

2003

A Novel Approach to Cache Block Reuse Prediction

Jonas Jalminger, Per Stenström

2003 International Conference on Parallel Processing

Paper in proceeding

2003

FlexSoC: Combining Flexibility and Efficiency in SoC Designs

John Hughes, Kjell Jeppson, Per Larsson-Edefors et al

Proceedings of 21st Norchip Conference, Riga, Latvia, p. 52-55

Paper in proceeding

2003

Speculative Lock Reordering: Optimistic Out-of-Order Execution of Critical Sections

Peter Rundberg, Per Stenström

6th IEEE International Symposium on Parallel and Distributed Processing Symposium

Paper in proceeding

2003

Reducing Misspeculation Overhead for Module-Level Speculative Execution

Fredrik Warg, Per Stenström

Report

2003

An Evaluation of Document Prefetching in a Distributed Digital Library

Jochen Hollmann, Anders Ardö, Per Stenström

7th European Conference on Research and Advanced Technology for Digital Libraries

Paper in proceeding

2003

Coherence Predictor Cache: A Resource Efficient Coherence Message Prediction Infrastructure

Jim Nilsson, Anders Landin, Per Stenström

6th IEEE International Symposium on Parallel and Distributed Processing Symposium

Journal article

2002

Improvement of energy-efficiency in off-chip caches by selective prefetching

Jonas Jalminger, Per Stenström

Microprocessors and Microsystems. Vol. 26 (3), p. 107-121

Journal article

2002

An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors

Peter Rundberg, Per Stenström

Journal of Instruction-Level Parallelism. Vol. 3

Journal article

2002

The FAB Predictor: Using Fourier Analysis to Predict the Outcome of a Conditional Branch

Martin Kämpe, Per Stenström, Michel Dubois

Proceedings - 8th International Symposium on High-Performance Computer Architecture, Cambridge, Feb 02-06, 2002, p. 223-232

Paper in proceeding

2002

Empirical Observations regarding Predictability in User Access-Behavior in a Distributed Digital Library System

Jochen Hollmann, Anders Ardö, Per Stenström

Proceedings of the 16th International Parallel and Distributed Processing Symposium, p. 221-228

Paper in proceeding

2002

TLB and Snoop Energy-Reduction using Virtual Caches for Low-Power Chip-Multiprocessors

Magnus Ekman, Fredrik Dahlgren, Per Stenström

Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002. ISLPED '02, p. 243-246

Paper in proceeding

2001

Limits on Speculative Module-level Parallelism in Imperative and Object-oriented Programs on CMP Platforms

Fredrik Warg, Per Stenström

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, p. 221-230

Paper in proceeding

1998

SimICS/sun4m: A virtual workstation

Peter S. Magnusson, Fredrik Dahlgren, Håkan Grahn et al

USENIX 1998 Annual Technical Conference

Paper in proceeding

Showing 20 research projects

2023–2029

classIC - Chalmers Lund Center for Advanced Semiconductor System Design

Christian Fager Microwave Electronics

Per Larsson-Edefors VLSI Systems

Gregor Lasser Microwave Electronics

Marianna Ivashina Antennas

Thomas Eriksson Communication Systems

Per Stenström Computer and Network Systems

Lars Svensson VLSI Systems

Swedish Foundation for Strategic Research (SSF)

9 publications exist

2022–2024

EPI SGA2

Ioannis Sourdis Computer Systems

Miquel Pericas Computer Systems

Pedro Petersen Moura Trancoso Computer Systems

Per Stenström Computer and Network Systems

European Commission (EC)

3 publications exist

2021–2025

Pilot using Independent Local & Open Technologies (The European PILOT)

Ioannis Sourdis Computer Systems

Pedro Petersen Moura Trancoso Computer Systems

Per Stenström Computer and Network Systems

Miquel Pericas Computer Systems

Swedish Research Council (VR)

European Commission (EC)

1 publication exists

2021–2025

Principles for Computational Memory Devices (PRIDE)

Per Stenström Computer and Network Systems

Ioannis Sourdis Computer Systems

Miquel Pericas Computer Systems

Pedro Petersen Moura Trancoso Computer Systems

Swedish Foundation for Strategic Research (SSF)

11 publications exist

2021–2024

European, extendable, energy-efficient, energetic, embedded, extensible, Processor Ecosystem (eProcessor)

Ioannis Sourdis Computer Systems

Per Stenström Computer and Network Systems

Pedro Petersen Moura Trancoso Computer Systems

Miquel Pericas Computer Systems

European Commission (EC)

11 publications exist

2020–2022

eProcessor: European, extendable, energy-efficient, extreme-scale, extensible, Processor Ecosystem

Per Stenström Computer Engineering (Chalmers)

Swedish Research Council (VR)

2 publications exist

2019–2023

PRIME: Principled Designs of Processing-in-Memory Parallel Systems

Per Stenström Computer Engineering (Chalmers)

Pedro Petersen Moura Trancoso Computer Systems

Swedish Research Council (VR)

4 publications exist

2019–2022

High Performance Embedded Architecture and Compilation

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2018–2021

The European Processor Initiative (EPI)

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2 publications exist

2017–2020

High Performance and Embedded Architecture and Compilation (HiPEAC5)

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2017–2021

TEchnology TRAnsfer via Multinational Application eXperiments (TETRAMAX)

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2017–2021

Tetramax

Henrik Berglund Entrepreneurship and Strategy

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2016–2017

High Performance and Embedded Architecture and Compilation (HiPEAC4)

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2015–2016

Blaze Memory Project

Per Stenström Computer Engineering (Chalmers)

VINNOVA

2015–2018

ACE: Approximate Algorithms and Computing Systems

Per Stenström Computer Engineering (Chalmers)

Johan Karlsson Computer Science and Engineering (Chalmers)

Sally A McKee Computer Engineering (Chalmers)

Ulf Assarsson Computer Engineering (Chalmers)

Ioannis Sourdis Computer Engineering (Chalmers)

Devdatt Dubhashi Computing Science (Chalmers)

Christos Dimitrakakis Computing Science (Chalmers)

Alexandra Angerd Computer Engineering (Chalmers)

Jacob Lidman Computer Engineering (Chalmers)

Behrooz Sangchoolie Computer Engineering (Chalmers)

Fatemeh Ayatolahi Computer Engineering (Chalmers)

Albin Eldstål Damlin Computer Engineering (Chalmers)

Miquel Pericas Computer Engineering (Chalmers)

Erik Sintorn Computer Engineering (Chalmers)

Swedish Research Council (VR)

9 publications exist

2014–2017

Embedded Multi-Core Systems for Mixed Criticality Applications in Dynamic and Changeable Real-Time Environments (EMC2)

Per Stenström Computer Engineering (Chalmers)

Ioannis Sourdis Computer Engineering (Chalmers)

European Commission (EC)

VINNOVA

1 publication exists

2014–2019

Meeting Challenges in Computer Architecture (MECCA)

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

18 publications exist

2013–2016

Green Computing Node for European micro-servers (EUROSERVER)

Bhavishya Goel Computer Engineering (Chalmers)

Per Stenström Computer Engineering (Chalmers)

Ioannis Sourdis Computer Engineering (Chalmers)

Sally A McKee Computer Engineering (Chalmers)

European Commission (EC)

2 publications exist

2013–2016

A Framework for Fine-Grain Resource Management in Heterogeneous Parallel Architectures

Per Stenström Computer Engineering (Chalmers)

Wolfgang Ahrendt Software Technology (Chalmers)

Swedish Research Council (VR)

2012–2015

High Performance and Embedded Architecture and Compilation (HiPEAC)

Per Stenström Computer Engineering (Chalmers)

European Commission (EC)

2 publications exist