SQL Injection (SQLI) is a pervasive web attack in which malicious input is used to dynamically build SQL queries in a way that tricks the database (DB) engine into performing unintended, harmful operations. Among many possible exploits, the attacker may opt to exfiltrate the application DB. Exfiltration is straightforward when the web application returns data in response to injected queries. When the content is not exposed, the attacker can still deduce it using Blind SQLI (BSQLI), an inference technique based on response differences or time delays. Unfortunately, a common drawback of BSQLI is its low inference rate (one bit per request), which severely limits the volume of data that can be extracted.
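To make the one-bit-per-request limit concrete, the following is a minimal sketch of the classic binary-search inference that BSQLI relies on. It does not touch a real database: the `Oracle` class, its `char_greater_than` method, and the `SECRET` value are hypothetical stand-ins that simulate an injection point answering a single yes/no question per request. The sketch also assumes 7-bit ASCII content and a string length that is already known.

```python
SECRET = "admin"  # hypothetical hidden value, standing in for a DB cell


class Oracle:
    """Simulated blind injection point: one yes/no answer per request."""

    def __init__(self, secret):
        self._secret = secret
        self.requests = 0  # counts how many questions were asked

    def char_greater_than(self, index, threshold):
        # Conceptually: "is the character code at this position > threshold?"
        self.requests += 1
        return ord(self._secret[index]) > threshold


def infer_char(oracle, index, lo=0, hi=127):
    """Binary search over the 7-bit ASCII range; each request yields one bit."""
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle.char_greater_than(index, mid):
            lo = mid + 1  # the code is in the upper half
        else:
            hi = mid      # the code is in the lower half (or equals mid)
    return chr(lo)


def infer_string(oracle, length):
    """Infer a string of known length, one character at a time."""
    return "".join(infer_char(oracle, i) for i in range(length))
```

Because the search range holds 128 values, this sketch always spends exactly seven requests per character, regardless of how predictable the content is. That fixed cost is the baseline that smarter guessing tries to beat.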
This research proposes Hakuin, a novel approach based on machine learning and statistics to optimize BSQLI. To infer DB schemas effectively, Hakuin uses a probabilistic language model trained on millions of table and column names extracted from Stack Exchange questions. To infer DB content (rows) in all its diversity, Hakuin uses several strategies, most importantly adaptive language models and opportunistic string guessing. To maximize efficiency, Hakuin evaluates all supported strategies and dynamically chooses the best one. Compared to other public BSQLI exfiltration tools, our method offers a significant performance improvement: Hakuin is about 6 times faster in exfiltrating DB schemas, up to 3.2 times faster in exfiltrating normal DB columns, and up to 26 times faster in exfiltrating columns with limited values.
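The intuition behind model-guided inference can be illustrated with a toy example. The sketch below is not Hakuin's implementation: it swaps in a simple character bigram model, trained on a tiny hypothetical list of names, and tries the most probable next characters first before falling back to binary search. The `Oracle` class, the `TRAINING_NAMES` list, the `END` marker, and the `max_guesses` parameter are all assumptions made for illustration, and the oracle is simulated locally.

```python
from collections import Counter, defaultdict

# Hypothetical training corpus; a real model would use far more names.
TRAINING_NAMES = ["users", "user_id", "username", "orders", "order_id", "user_roles"]
END = "\0"  # assumed end-of-string marker


def train_bigrams(names):
    """Count which character follows which ("" marks the start of a name)."""
    model = defaultdict(Counter)
    for name in names:
        prev = ""
        for ch in name + END:
            model[prev][ch] += 1
            prev = ch
    return model


class Oracle:
    """Simulated blind injection point: one yes/no answer per request."""

    def __init__(self, secret):
        self._secret = secret + END
        self.requests = 0

    def char_equals(self, index, ch):
        self.requests += 1
        return self._secret[index] == ch

    def char_greater_than(self, index, threshold):
        self.requests += 1
        return ord(self._secret[index]) > threshold


def binary_search_char(oracle, index):
    """Fallback: plain binary search over 7-bit ASCII (seven requests)."""
    lo, hi = 0, 127
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle.char_greater_than(index, mid):
            lo = mid + 1
        else:
            hi = mid
    return chr(lo)


def infer_char(oracle, index, model, prev, max_guesses=3):
    """Try the most likely successors of `prev` first, then fall back."""
    for ch, _count in model[prev].most_common(max_guesses):
        if oracle.char_equals(index, ch):
            return ch
    return binary_search_char(oracle, index)


def infer_string(oracle, model):
    """Infer characters until the end marker is found."""
    out, prev = [], ""
    while True:
        ch = infer_char(oracle, len(out), model, prev)
        if ch == END:
            return "".join(out)
        out.append(ch)
        prev = ch
```

For a name that resembles the training data, such as `user_name`, most characters are confirmed with a single request instead of seven. The trade-off is visible in `infer_char`: when every guess misses, the failed guesses are pure overhead on top of the binary search, which is one reason a tool might compare several strategies and pick whichever is performing best on the data at hand.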
The presentation describes the internal design of Hakuin and the challenges we faced in implementing our ideas. We then present our benchmarking results and compare Hakuin with three industry-standard BSQLI tools. Finally, we give a live demo showing how Hakuin can quickly exfiltrate a DB schema and content from a vulnerable web application.
Hakuin will be released with full source code after our talk.