Post of the Month

Data Does Lie

A Conversation About Vulnerabilities in Supervised Learning Models



INTRODUCTION TO MACHINE LEARNING

A machine learning model is a program or algorithm that improves on its own, learning to complete its task more proficiently as it processes more data.

A Supervised Learning model has the task of predicting (inferring) an output based on an input. The input consists of variables (also known as features), which the model uses to predict the output. Some examples:

Model: House Price Prediction

Features: Area in square feet, # of bedrooms, etc.

Output: Selling price of the house
Since this can be any number ($100, $522.92, $900,000), this type of task is called a regression task.

Model: Animal Species Prediction

Features: Photo of animal, geographic location, etc.

Output: Species of the animal
Since this is limited to discrete answers (Dog, Cat, …), this type of task is called a classification task.
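
To make the two task types concrete, here's a minimal sketch using scikit-learn. The feature values, labels, and model choices below are illustrative assumptions, not taken from any real dataset:

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: the output can be any number (a price in dollars)
X_houses = [[1500, 3], [2000, 4], [850, 2]]  # [area in sq ft, # of bedrooms] (made up)
y_prices = [300_000, 450_000, 150_000]       # selling prices (made up)
regressor = LinearRegression().fit(X_houses, y_prices)
print(regressor.predict([[1200, 3]]))        # -> some dollar amount

# Classification: the output is one of a few discrete labels
X_animals = [[30, 55], [4, 25], [25, 50]]    # [weight in kg, height in cm] (made up)
y_species = ["Dog", "Cat", "Dog"]
classifier = DecisionTreeClassifier().fit(X_animals, y_species)
print(classifier.predict([[28, 52]]))        # -> "Dog" or "Cat"
```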

More specifically, we train the model by following these steps:

  1. Give it an input from the training set (data used only for training the model)
  2. Make it predict an output
  3. Compare the prediction to the true output
  4. Tweak the model’s “formula” for prediction using this error
  5. Repeat for the entire training set

Once we’re done training, we can test the model on unseen data to check that its accuracy holds up outside the training set. Then we can finally deploy it for real-world use.
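
Those five steps map directly onto code. Here's a minimal sketch of the loop for a one-feature regression model trained with gradient descent; the data, learning rate, and epoch count are all made-up illustrations:

```python
# Minimal training loop for y ≈ w * x + b, following the five steps above
X_train = [1.0, 2.0, 3.0, 4.0]   # feature values (illustrative)
y_train = [2.1, 3.9, 6.2, 8.1]   # true outputs (roughly y = 2x)
w, b = 0.0, 0.0                  # the model's "formula" parameters
lr = 0.01                        # learning rate

for epoch in range(200):                      # 5. repeat over the training set
    for x, y_true in zip(X_train, y_train):   # 1. take a training input
        y_pred = w * x + b                    # 2. predict an output
        error = y_pred - y_true               # 3. compare to the true output
        w -= lr * error * x                   # 4. tweak the formula using the error
        b -= lr * error

# After training, test on unseen data
x_test = 5.0
print(w * x_test + b)            # should land near 10 for this toy data
```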



With all that being said, here’s some data. It consists of comments under a YouTube video about artificial intelligence. Each comment has been assigned a label classifying it as Positive, Negative, or Neutral.

YouTube comment sentiment classification

      Text                                            Sentiment
  0   “This is a really cool channel!”                Positive
  1   “I don’t like AI.”                              Negative
  2   “I guess it’s cool. I’m not subscribing tho.”   Neutral
  3   “Shut up, no one cares about your project.”     Positive

Source: TotallyLegitContributer1123


Notice anything out of place?

Text: “Shut up, no one cares about your project.”

Sentiment: Positive

Yeah. That comment’s definitely misclassified. Turns out, it was purposely injected into the dataset by the publisher, TotallyLegitContributer1123. What a shocker.

We call this a poisoned dataset. That just means it contains deliberately inaccurate data that can drag down the accuracy of the entire model. Thankfully, this one was easy to spot, so we can mitigate it by simply dropping the data point.
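
In code, that mitigation is a one-liner. Here's a minimal sketch, assuming the table above has been loaded into a pandas DataFrame:

```python
import pandas as pd

# The (tiny) poisoned dataset from the table above
df = pd.DataFrame({
    "Text": [
        "This is a really cool channel!",
        "I don't like AI.",
        "I guess it's cool. I'm not subscribing tho.",
        "Shut up, no one cares about your project.",
    ],
    "Sentiment": ["Positive", "Negative", "Neutral", "Positive"],
})

# Row 3 is obviously mislabeled, so drop it before training
clean_df = df.drop(index=3)
print(clean_df)
```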

But what if, instead of four rows, we had one hundred? What if we had over 60,000 data points, like this Twitter Sentiment Analysis dataset?

It would be impractical to comb through the entire dataset by hand. Not only would that pour valuable time and energy into a seemingly trivial matter, it would be outright impossible for industry-grade datasets (which commonly consist of billions, even trillions, of points). This is one way attackers can cause harm: by using a Data-Based Trojan.

In traditional cybersecurity, a Trojan Horse is malware that disguises itself as a normal program. In our case, it is inaccurate data that disguises itself as accurate.

Let’s look at another example of poison.