FloCon 2019 has ended
Tuesday, January 8, 2019 • 10:00am - 10:30am EST
Using Triangulation to Evaluate Machine Learning Models

Few industries have more at stake in their use of machine learning models than network security. A high-performing statistical model is critical: a false positive creates unnecessary work for the network security team, while a false negative increases exposure to malware, threat actors, and other threats. Since there are no perfect machine learning models, our task as data scientists is to convince first ourselves and then others that we have a statistical model worthy of defending the network. Persuasion, though, can be difficult, because many of the steps and assumptions that go into training a statistical model from data are difficult, if not impossible, to accurately share with the ultimate consumers of the model. As machine learning and other advanced statistical techniques become more widespread within the network analysis community, the need for accurate assessment of models for threat detection grows as well.

Drawing on ideas from the philosophy of science such as falsifiability and counterfactuals, we present a framework for triangulating the performance of machine learning models using a series of questions designed to establish the validity of performance claims. In navigation, triangulation determines one's current location from the angles and distances to landmarks of known position. We believe triangulation of a different sort is necessary to determine the performance of machine learning models. Each of the steps that goes into making a machine learning model, including input data selection, sampling, outcome variable selection, feature creation, model selection, and evaluation criteria, shapes the final model and provides necessary context for interpreting its performance results. Our framework highlights ways to uncover assumptions hidden in those choices, identify higher-performing models, and ultimately better defend our networks.
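As a small illustration of why sampling choices matter when interpreting performance claims (a hypothetical sketch, not material from the talk: the detector, scores, and class mixes below are all invented), the following evaluates the same toy detector on a class-balanced sample and on a sample with a realistic 1% base rate of malicious traffic. The per-class error rates are identical in both cases, yet the reported precision differs dramatically:

```python
import random

def confusion(labels, preds):
    """Return (tp, fp, fn, tn) counts for binary labels/predictions."""
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, preds))
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, preds))
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, preds))
    tn = sum(y == 0 and p == 0 for y, p in zip(labels, preds))
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

# Toy "detector": alert on any flow whose anomaly score exceeds 0.5.
def alert(score):
    return 1 if score > 0.5 else 0

random.seed(7)

def sample(n_malicious, n_benign):
    # Malicious flows score uniformly in [0, 1); benign flows in [0, 0.7),
    # so the detector makes both false positives and false negatives.
    data = [(1, random.random()) for _ in range(n_malicious)]
    data += [(0, 0.7 * random.random()) for _ in range(n_benign)]
    return data

results = {}
for name, data in [("balanced", sample(500, 500)),    # 50% malicious
                   ("realistic", sample(10, 990))]:   # 1% malicious
    labels = [y for y, _ in data]
    preds = [alert(s) for _, s in data]
    tp, fp, fn, tn = confusion(labels, preds)
    results[name] = precision(tp, fp)

# Same detector, same per-class error rates, very different precision:
# the sampling choice is a hidden assumption behind the reported number.
print(results)
```

An operator shown only the balanced-sample precision would badly overestimate how the detector behaves on production traffic, which is exactly the kind of hidden assumption a triangulation question about sampling is meant to surface.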

Attendees will learn:
Attendees will be given a series of questions and data queries that can be used to determine the parameters of effectiveness for a machine learning model. Using our framework helps security operators better understand the performance characteristics of machine learning models and avoid unnecessary errors.


Andrew Fast

Chief Data Scientist, CounterFlow AI, Inc
Andrew Fast is the Chief Data Scientist and co-founder of CounterFlow AI, where he leads the implementation of streaming machine learning algorithms on CounterFlow AI's ThreatEye cloud-native analytics platform for Encrypted Traffic Analysis. Previously, Dr. Fast served as the Chief...

Grand Ballroom 300 Bourbon St, New Orleans, LA 70130