A Unified Scheme to Cancel Memory Accesses Early
                
Report, 2011
                
            
                    
The execution efficiency of memory instructions remains critically important. To this end, a plethora of techniques aim to satisfy load and store requests as soon as they are issued to the first-level cache. This paper unifies the diverse approaches that eliminate memory accesses early by contributing a new architectural scheme. We first introduce the notion of silent loads, which are load accesses that can be satisfied by values already available in the physical register file, and propose a new architectural concept to exploit such loads. We then show that, in addition to silent loads, our unified approach also covers previously proposed techniques such as forwarded loads, which obtain values through load-to-load and store-to-load forwarding, and small-value loads, which return small values (coded with 8 bits or less). We find that 22%, 31% and 24% of all dynamic loads are forwarded, small-value and silent, respectively. We demonstrate that the prevalence of such loads is mostly inherent in applications rather than an artifact of compilers. We find that a hypothetical scheme that encompasses all the categories can eliminate as many as 42% of all dynamic loads. Finally, we contribute a new architectural technique to implement the unified scheme and show that it provides noticeable speedup with very low area overhead. We also show that the scheme substantially reduces overall energy dissipation and memory traffic.
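The three load categories can overlap, which is consistent with the combined figure (42%) being smaller than the sum of the individual ones. The following Python sketch illustrates the classification on a toy memory trace. It is not the hardware scheme proposed in the report: the function name `classify_loads`, the trace format, the forwarding window size, the register-file model (a short queue of recently loaded values), and the unsigned reading of "8 bits or less" are all assumptions made for illustration.

```python
from collections import deque


def classify_loads(trace, window=4, num_regs=4):
    """Count forwarded, small-value and silent loads in a toy trace.

    trace: list of (op, address, value) tuples, op is "load" or "store".
    window: assumed number of recent memory operations that can forward.
    num_regs: assumed number of register values tracked (toy model).
    """
    recent_addrs = deque(maxlen=window)  # addresses of recent loads/stores
    reg_values = deque(maxlen=num_regs)  # values assumed to sit in registers
    counts = {"loads": 0, "forwarded": 0, "small": 0, "silent": 0, "any": 0}

    for op, addr, value in trace:
        if op == "load":
            counts["loads"] += 1
            # Forwarded: a recent load or store touched the same address.
            forwarded = addr in recent_addrs
            # Small-value: fits in 8 bits (unsigned interpretation assumed).
            small = 0 <= value < 256
            # Silent: the value is already present in the register model.
            silent = value in reg_values

            counts["forwarded"] += forwarded
            counts["small"] += small
            counts["silent"] += silent
            counts["any"] += forwarded or small or silent

            # Simplification: only loaded values enter the register model.
            reg_values.append(value)
        recent_addrs.append(addr)
    return counts


# Hypothetical trace, invented for this example.
TRACE = [
    ("store", 0x100, 7),
    ("load", 0x100, 7),      # forwarded (store-to-load) and small-value
    ("load", 0x200, 70000),  # none of the categories
    ("load", 0x300, 70000),  # silent: 70000 is already in a register
    ("load", 0x400, 5),      # small-value only
]

counts = classify_loads(TRACE)
print(counts)
```

In this toy trace, the per-category counts sum to 4 while only 3 of the 4 loads are covered by at least one category, because the second load is both forwarded and small-value. A unified scheme is credited once per load, not once per category.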