Using Machine-Learning to Investigate Web Campaigns at Large

Web defacement is the practice of altering a website after its compromise. The altered pages, called defaced pages, can negatively affect the reputation and business of the victim. While investigating several campaigns, we observed that the artifacts left by the attackers allow an expert analyst to investigate their modus operandi and social structure, and to expand from single attacks to a group of related incidents. However, manually performing such analysis on millions of events is tedious and does not scale.

From these observations, we conceived an automated system that efficiently builds intelligence information out of raw events. Our approach streamlines the analyst's job by automatically recognizing web campaigns and assigning meaningful textual labels to them. Applied to a comprehensive dataset of 13 million incidents, our approach allowed us to conduct what we believe to be the first large-scale investigation of this form. In addition, our approach is meant to be adopted operationally by analysts to identify live campaigns in the real world.

We analyze the social structure of modern web attackers, which includes lone individuals as well as actors that cooperate in teams. We look into their motivations, and we draw a parallel between the timeline of world-shaping events and web campaigns, which reflect the evolution of the interests and orientation of modern attackers.

Location:
Date: November 2, 2018
Time: 11:30 am - 12:30 pm

Marco Balduzzi