Black Ostrich: Web Application Scanning with String Solvers
Paper in proceedings, 2023

Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short when it comes to deep crawling. A major roadblock is the crawlers' limited ability to pass input validation checks when web applications require data of a certain format, such as an email address, phone number, or zip code. This paper develops Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint solving capabilities, so that the crawler can dynamically infer suitable inputs from the regular expression patterns in web applications and thereby pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to process JavaScript regular expressions. We implement our approach by extending and combining the Ostrich constraint solver with the Black Widow web crawler. We evaluate Black Ostrich on a set of 8,820 unique validation patterns gathered from 21,667,978 forms from a combination of the July 2021 Common Crawl and the Tranco top 100K. For these forms, and for reconstructions of input elements corresponding to the patterns, we demonstrate that Black Ostrich achieves 99% coverage of the form validations, compared to an average of 36% for the state-of-the-art scanners. Moreover, out of the 66,377 domains using these patterns, we solve all patterns on 66,309 (99%), while the combined efforts of the other scanners cover 52,632 (79%). We further show that our approach can boost coverage by evaluating it on three open-source applications. Our empirical studies include a study of email validation patterns, where we find that 213 (26%) of the 825 email validation patterns found liberally admit XSS injection payloads.
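To make the validation problem concrete, the sketch below illustrates the two checks the abstract describes: whether a candidate input passes an HTML-style `pattern` validation, and how a crawler might search for an input that passes it. It is a minimal illustration under stated assumptions, not Black Ostrich's method: the function names `html_pattern_accepts` and `brute_force_input` and all example patterns are hypothetical, Python's `re` only approximates JavaScript regular expression semantics, and the naive enumeration is a stand-in for the automata-based string constraint solving (via Ostrich) that the paper actually uses.

```python
import itertools
import re
from typing import Optional


def html_pattern_accepts(pattern: str, value: str) -> bool:
    # HTML `pattern` attributes must match the entire input value, so we
    # approximate that anchoring with re.fullmatch. Assumption: Python's `re`
    # is close enough to JavaScript regex semantics for these simple patterns.
    return re.fullmatch(pattern, value) is not None


def brute_force_input(pattern: str, alphabet: str, max_len: int) -> Optional[str]:
    # Hypothetical naive stand-in for a string solver: enumerate candidate
    # strings over `alphabet` in order of increasing length and return the
    # first one that passes validation, or None if none exists up to max_len.
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if html_pattern_accepts(pattern, candidate):
                return candidate
    return None


# A hypothetical five-digit zip-code pattern: enumeration finds "00000".
print(brute_force_input(r"[0-9]{5}", "0123456789", 5))

# A hypothetical, overly liberal email pattern admits an XSS payload,
# while a stricter (also hypothetical) character class rejects it.
payload = "<script>alert(1)</script>@example.com"
print(html_pattern_accepts(r".+@.+", payload))
print(html_pattern_accepts(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", payload))
```

Enumeration grows exponentially with input length and alphabet size, so it is only workable for toy patterns such as the zip code above; deriving a witness directly from an automaton for the pattern, as a string constraint solver does, avoids this blow-up.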

Keywords

string constraint solving

web application scanning

Authors

Benjamin Lundblad

Chalmers, Computer Science and Engineering (Chalmers), Information Security

Amanda Stjerna

Uppsala University

Riccardo De Masellis

Uppsala University

Philipp Rümmer

University of Regensburg

Andrei Sabelfeld

Chalmers, Computer Science and Engineering (Chalmers), Information Security

CCS 2023 - Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

549-563
9798400700507 (ISBN)

30th ACM SIGSAC Conference on Computer and Communications Security, CCS 2023
Copenhagen, Denmark

NewGen: New Generation Crawling for Secure Web

Swedish Research Council (VR) (2021-06327), 2021-12-01 -- 2024-11-30.

WebSec: Securing Web-driven Systems

Swedish Foundation for Strategic Research (SSF) (RIT17-0011), 2018-03-01 -- 2023-02-28.

Subject Categories

Computer Science

Computer Vision and Robotics (Autonomous Systems)

DOI

10.1145/3576915.3616582

Latest update

1/25/2024