Post of the Month

Data Does Lie

A Conversation About Vulnerabilities in Supervised Learning Models



INTRODUCTION TO MACHINE LEARNING

A machine learning model is a program or algorithm that improves on its own, learning to complete its task more proficiently as it processes more data.

A Supervised Learning model has the task of predicting (inferring) an output based on an input. The input consists of variables (also known as features), which the model uses to predict the output. Some examples:

Model: House Price Prediction

Features: Area in square feet, # of bedrooms, etc.

Output: Selling price of the house
Since this can be any number ($100, $522.92, $900,000), this type of task is called a regression task.

Model: Animal Species Prediction

Features: Photo of animal, geographic location, etc.

Output: Species of the animal
Since this is limited to discrete answers (Dog, Cat, …), this type of task is called a classification task.
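
To make the two task types concrete, here's a minimal sketch using scikit-learn. The feature values, labels, and model choices below are illustrative assumptions, not taken from any real dataset:

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: the output can be any number (a price in dollars)
X_houses = [[1500, 3], [2000, 4], [850, 2]]  # [area in sq ft, # of bedrooms] (made up)
y_prices = [300_000, 450_000, 150_000]       # selling prices (made up)
regressor = LinearRegression().fit(X_houses, y_prices)
print(regressor.predict([[1200, 3]]))        # -> some dollar amount

# Classification: the output is one of a few discrete labels
X_animals = [[30, 55], [4, 25], [25, 50]]    # [weight in kg, height in cm] (made up)
y_species = ["Dog", "Cat", "Dog"]
classifier = DecisionTreeClassifier().fit(X_animals, y_species)
print(classifier.predict([[28, 52]]))        # -> "Dog" or "Cat"
```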

More specifically, we train the model by following these steps:

  1. Give it an input from the training set (data used only for training the model)
  2. Make it predict an output
  3. Compare the prediction to the true output
  4. Tweak the model’s “formula” for prediction using this error
  5. Repeat for the entire training set

Once we’re done training, we can test the model on unseen data to check that its accuracy holds up outside the training set. Then we can finally deploy it for real-world use.
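
Those five steps map directly onto code. Here's a minimal sketch of the loop for a one-feature regression model trained with gradient descent; the data, learning rate, and epoch count are all made-up illustrations:

```python
# Minimal training loop for y ≈ w * x + b, following the five steps above
X_train = [1.0, 2.0, 3.0, 4.0]   # feature values (illustrative)
y_train = [2.1, 3.9, 6.2, 8.1]   # true outputs (roughly y = 2x)
w, b = 0.0, 0.0                  # the model's "formula" parameters
lr = 0.01                        # learning rate

for epoch in range(200):                      # 5. repeat over the training set
    for x, y_true in zip(X_train, y_train):   # 1. take a training input
        y_pred = w * x + b                    # 2. predict an output
        error = y_pred - y_true               # 3. compare to the true output
        w -= lr * error * x                   # 4. tweak the formula using the error
        b -= lr * error

# After training, test on unseen data
x_test = 5.0
print(w * x_test + b)            # should land near 10 for this toy data
```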



With all that being said, here’s some data. It consists of comments under a YouTube video about artificial intelligence. Each comment has been assigned a label classifying it as Positive, Negative, or Neutral.

YouTube comment sentiment classification

      Text                                            Sentiment
  0   “This is a really cool channel!”                Positive
  1   “I don’t like AI.”                              Negative
  2   “I guess it’s cool. I’m not subscribing tho.”   Neutral
  3   “Shut up, no one cares about your project.”     Positive

Source: TotallyLegitContributer1123


Notice anything out of place?

Text: “Shut up, no one cares about your project.”

Sentiment: Positive

Yeah. That comment’s definitely misclassified. Turns out, it was purposely injected into the dataset by the publisher, TotallyLegitContributer1123. What a shocker.

We call this a poisoned dataset. That just means it contains deliberately inaccurate data that can drag down the accuracy of the entire model. Thankfully, this one was easy to spot, so we can mitigate it by simply dropping the data point.
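
In code, that mitigation is a one-liner. Here's a minimal sketch, assuming the table above has been loaded into a pandas DataFrame:

```python
import pandas as pd

# The (tiny) poisoned dataset from the table above
df = pd.DataFrame({
    "Text": [
        "This is a really cool channel!",
        "I don't like AI.",
        "I guess it's cool. I'm not subscribing tho.",
        "Shut up, no one cares about your project.",
    ],
    "Sentiment": ["Positive", "Negative", "Neutral", "Positive"],
})

# Row 3 is obviously mislabeled, so drop it before training
clean_df = df.drop(index=3)
print(clean_df)
```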

But what if, instead of four rows, we had one hundred? What if we had over 60,000 data points, like this Twitter Sentiment Analysis dataset?

It would be impractical to comb through the entire dataset by hand. Not only would that pour valuable time and energy into a seemingly trivial matter, it would be outright impossible for industry-grade datasets (which commonly consist of billions, even trillions, of points). This is one way attackers can cause harm: by using a Data-Based Trojan.

In traditional cybersecurity, a Trojan Horse is malware that disguises itself as a normal program. In our case, it is inaccurate data that disguises itself as accurate.

Let’s look at another example of poison.