We’ve all been there: after a month of reversing, you realize you are looking at open-source code. Why? Because you didn’t copy-paste the correct string into Google. So we asked ourselves: “Can we not just grep all strings from GitHub and stop this nonsense?”
In this talk you’ll get a taste of Big Match, our library recognition engine, and learn how we discovered its secret ingredients: string hashing, repository embeddings, deduplication, and vector similarity – all featuring 0% machine learning! I’ve been working on this since the end of 2020 but decided to wait before submitting it to conferences, and here it is!
If you’re a reverse engineer, you’re likely going to enjoy this talk!