In this talk we present novel methods for using Generative AI, specifically Large Language Models (LLMs), to enhance the ability of cybersecurity investigators to trace and deter unauthorized exfiltration of text data that involves an air gap (a shift in transmission medium that resists digital forensic analysis). We review the definition of an air gap in this context and describe the current state of the art in digital watermarking and Data Loss Prevention (DLP) to frame the discussion.
We then introduce two practical applications, one simple and naive and one more sophisticated, that leverage an LLM (tested on Senku 70B, possibly others by the time of the presentation) to inject what we term "semantic watermarking": a watermark that, regardless of the exfiltration method, is preserved with relatively high integrity and can be deterministically associated with an individual actor. This enables an investigative team to identify either malicious insiders or compromised users within their environment.
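To make the core idea concrete: the naive variant of this approach amounts to deterministically varying the wording of a document per recipient, so that the specific word choices in a leaked copy identify who received it. The sketch below illustrates this without an LLM, using a small hand-built synonym table keyed to a user ID; the table, function names, and attribution logic are hypothetical illustrations, not the talk's actual implementation (which would use an LLM to generate the per-user paraphrases).

```python
import hashlib

# Hypothetical synonym table: each entry lists interchangeable wordings.
SYNONYMS = {
    "begin": ["begin", "start", "commence"],
    "quickly": ["quickly", "rapidly", "swiftly"],
    "important": ["important", "critical", "vital"],
}

def pick(word, user_id):
    """Deterministically pick a variant of `word` keyed to `user_id`."""
    options = SYNONYMS.get(word.lower())
    if not options:
        return word
    # Keyed hash makes the choice stable per (user, word) pair.
    h = hashlib.sha256(f"{user_id}:{word.lower()}".encode()).digest()
    return options[h[0] % len(options)]

def watermark(text, user_id):
    """Rewrite `text` with word choices unique-ish to `user_id`."""
    return " ".join(pick(w, user_id) for w in text.split())

def attribute(leaked, original, candidates):
    """Score each candidate by how many word choices match the leak."""
    scores = {}
    for uid in candidates:
        marked = watermark(original, uid).split()
        scores[uid] = sum(a == b for a, b in zip(marked, leaked.split()))
    return max(scores, key=scores.get)
```

Because the mark lives in the semantics of the text rather than in bytes or formatting, it survives air-gap exfiltration paths such as photographing a screen or retyping the document, which is exactly what defeats conventional digital watermarks. The obvious weakness of this naive scheme is its tiny, fixed vocabulary; the LLM-based variants discussed in the talk generalize the same principle to arbitrary paraphrase.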
We also review tradeoffs in deploying these applications, and close with a discussion of more sophisticated implementations that could extend this capability to other forms of data, such as audio or video.