Performance Impact of Batching Web Application Requests using Hot-spot Processing on GPUs
Paper in proceedings, 2015

Web applications are a good fit for many-core servers because of their inherently high degree of request-level parallelism. Yet processing-intensive web-server requests can lead to low quality of service due to hot-spots, which calls for methods that improve single-thread performance. This paper explores how to use off-chip GPUs to speed up web application hot-spots written in productivity-friendly environments (e.g., C#). First, we apply a number of straightforward optimizations by refactoring commercial-strength web application code. This yields a speedup of 7.6x in a multi-threaded, multi-core CPU test. Second, we gather similar requests from different threads of the optimized code, using a technique called batching, to exploit the SIMD parallelism provided by GPUs. Surprisingly, there is ample parallelism to be exploited in the already optimized code, yielding a speedup of between 2x and 3x over the best optimized CPU version.

code optimization

Cloud computing

data parallelism


Tobias Fjälling

Per Stenström

Chalmers, Computer Science and Engineering, Computer Engineering

29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015, Hyderabad, India, 25-29 May 2015

1530-2075 (ISSN)

978-147998648-4 (ISBN)


Information and Communication Technology


Computer and Information Sciences

Electrical Engineering and Electronics




