Polyglots: Crossing Origins by Crossing Formats
Paper in proceedings, 2013
In a heterogeneous system like the web, information is exchanged between components in versatile formats. A new breed of attacks is on the rise that exploit the mismatch between the expected and provided content. This paper focuses on the root cause of a large class of attacks: polyglots. A polyglot is a program that is valid in multiple programming languages. Polyglots allow multiple interpretation of the content, providing a new space of attack vectors. We characterize what constitutes a dangerous format in the web setting and identify particularly dangerous formats, with PDF as the prime example. We demonstrate that polyglot-based attacks on the web open up for insecure communication across Internet origins. The paper presents novel attack vectors that infiltrate the trusted origin by syntax injection across multiple languages and by content smuggling of malicious payload that appears formatted as benign content. The attacks lead to both cross-domain leakage and cross-site request forgery. We perform a systematic study of PDF-based injection and content smuggling attacks. We evaluate the current practice in client/server content filtering and PDF readers for polyglot-based attacks, and report on vulnerabilities in the top 100 Alexa web sites. We identify five web sites to be vulnerable to syntax injection attacks. Further, we have found two major enterprise cloud storage services to be susceptible to content smuggling attacks. Our recommendations for protective measures on server side, in browsers, and in content interpreters (in particular, PDF readers) show how to mitigate the attacks.