COMMSEC: Big Match – How I Learned to Stop Reversing and Love the Strings


August 24, 2023




CommSec Track

We’ve all been there: after a month of reversing, you realize you are looking at open-source code. Why? Because you didn’t copy-paste the correct string into Google. So we asked ourselves: “Can we not just grep all strings from GitHub and stop this nonsense?”

In this talk you’ll get a taste of Big Match – our library recognition engine, and how we discovered its secret ingredients: string hashing, repository embeddings, deduplication, and vector similarity – all featuring 0% machine learning! I’ve been working on this since the end of 2020 but decided to wait before submitting it to conferences, and here it is!
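To give a rough feel for how those ingredients could fit together, here is a minimal sketch. It is not Big Match's actual implementation: the names `string_hash`, `embed`, `cosine`, and `rank_repos`, the 1024-slot vector size, and the use of SHA-256 are all assumptions made for illustration. Each repository's strings are deduplicated, hashed into a fixed-size vector (a plain hashing trick, no learning involved), and compared with the strings pulled from a binary using cosine similarity.

```python
import hashlib
import math

DIM = 1024  # assumed vector size, chosen only for this illustration


def string_hash(s: str) -> int:
    """Map a string to a stable 64-bit integer (SHA-256 is an assumption)."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def embed(strings, dim=DIM):
    """Build a fixed-size vector by counting hashed, deduplicated strings."""
    vec = [0.0] * dim
    for s in set(strings):  # deduplication: each distinct string counts once
        vec[string_hash(s) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_repos(binary_strings, repos):
    """Rank hypothetical repositories by similarity to a binary's strings."""
    query = embed(binary_strings)
    scores = {name: cosine(query, embed(strs)) for name, strs in repos.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example data: two tiny "repositories" and strings from a binary.
repos = {
    "zlib-like": [
        "invalid block type",
        "incorrect header check",
        "unknown compression method",
    ],
    "other": ["hello world", "usage: %s file"],
}
binary_strings = [
    "incorrect header check",
    "unknown compression method",
    "custom app string",
]
ranked = rank_repos(binary_strings, repos)
```

In this toy setup the repository sharing the most strings with the binary ends up first in `ranked`. A real engine would need to handle strings that appear in thousands of repositories, which is presumably where the deduplication step earns its keep.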

If you’re a reverse engineer, you’re likely going to enjoy this talk!