Black Ostrich: Web Application Scanning with String Solvers
Paper in proceeding, 2023

Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers' limited ability to pass input validation checks when web applications require data of a certain format, such as email, phone number, or zip code. This paper develops Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint solving capabilities to dynamically infer suitable inputs from regular expression patterns in web applications and thereby pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to process JavaScript regular expressions. We implement our approach by extending and combining the Ostrich constraint solver with the Black Widow web crawler. We evaluate Black Ostrich on a set of 8,820 unique validation patterns gathered from over 21,667,978 forms from a combination of the July 2021 Common Crawl and Tranco top 100K. For these forms and reconstructions of input elements corresponding to the patterns, we demonstrate that Black Ostrich achieves 99% coverage of the form validations compared to an average of 36% for the state-of-the-art scanners. Moreover, out of the 66,377 domains using these patterns, we solve all patterns on 66,309 (99%) while the combined efforts of the other scanners cover 52,632 (79%). We further show that our approach can boost coverage by evaluating it on three open-source applications. Our empirical studies include a study of email validation patterns, where we find that 213 (26%) out of the 825 found email validation patterns liberally admit XSS injection payloads.

string constraint solving

web application scanning

Authors

Benjamin Lundblad

Chalmers, Computer Science and Engineering, Information Security

Amanda Stjerna

Uppsala University

Riccardo De Masellis

Uppsala University

Philipp Rümmer

Universität Regensburg

Andrei Sabelfeld

Chalmers, Computer Science and Engineering, Information Security

CCS 2023 - Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

549-563
9798400700507 (ISBN)

30th ACM SIGSAC Conference on Computer and Communications Security, CCS 2023
Copenhagen, Denmark

NewGen: Next-generation scanning for a more secure web

Swedish Research Council (VR) (2021-06327), 2021-12-01 -- 2024-11-30.

WebSec: Security in web-driven systems

Swedish Foundation for Strategic Research (SSF) (RIT17-0011), 2018-03-01 -- 2023-02-28.

Subject Categories (SSIF 2011)

Computer Science

Computer Vision and Robotics (Autonomous Systems)

DOI

10.1145/3576915.3616582

More information

Latest update

2024-01-25