How to Trust Data You Know Has Been Manipulated


How do you extract trustworthy insights from data you know has been deliberately manipulated?

It's a challenge most data scientists never face.

We're used to cleaning messy data, but deliberately manipulated data? That's completely next level.

Yet if you're working with social media data, manipulation is the reality you're dealing with.

Tim O'Hearn, a reformed social media hacker who generated millions of followers through bot manipulation, recently shared with me the harsh reality:

"During what I would describe as the golden age of Instagram botting, (the proportion of fake accounts) was probably as high as 40%."

Let that sink in for a moment.

If you're making business decisions based on social media data, as much as 40% of what you're analysing could be artificial bot activity.

And if you're attributing value to social media accounts without filtering for bots, you're potentially wasting 10-20% of your marketing budget on fake audiences.

The good news is there are ways to identify and filter out this manipulated data - techniques that can also apply to identifying suspicious records in any dataset.

In the latest episode of Value Driven Data Science, Tim joins me again to reveal practical strategies for identifying and filtering out bot activity from social media datasets to extract trustworthy business insights.

This Value Boost episode uncovers:

  1. The telltale patterns in social media data that reveal bot activity [03:10]
  2. How machine learning classifiers can identify bot accounts [05:20]
  3. Why removing bot activity can increase marketing ROI by 10-20% [06:41]
  4. The broader application of these techniques beyond social media for identifying "dodgy" data records in any dataset [07:25]
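
As a rough illustration of the third point, here is a minimal Python sketch of the arithmetic behind filtering flagged records. It is not the method discussed in the episode: the sample accounts, the `is_bot` flag and the helper names `bot_share` and `spend_on_bots` are all hypothetical, and it assumes spend is spread evenly across accounts.

```python
# Illustrative only: these records, field names and is_bot flags are made up
# for this sketch. In practice the flag would come from your own bot-detection
# rules or classifier.
accounts = [
    {"handle": f"acct_{i:02d}", "is_bot": i in (3, 7)}
    for i in range(10)
]


def bot_share(records):
    """Return the fraction of records flagged as bots (0.0 for an empty list)."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if r["is_bot"])
    return flagged / len(records)


def spend_on_bots(budget, share):
    """Budget reaching flagged accounts, assuming spend is spread evenly."""
    return budget * share


share = bot_share(accounts)
# Keep only the unflagged records for any downstream analysis.
real_accounts = [r for r in accounts if not r["is_bot"]]

print(f"Flagged as bots: {share:.0%}")
print(f"Spend reaching flagged accounts: ${spend_on_bots(10_000, share):,.0f}")
print(f"Accounts kept for analysis: {len(real_accounts)}")
```

The same pattern (flag, measure the share, filter) applies to suspicious records in any dataset. Only the rule that sets the flag changes.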

Essential listening for anyone working with social media data.

🎧 Listen now on Apple Podcasts or Spotify, or click the link below:
Episode 73: How to Trust Social Media Data When You Can't Trust Social Media

Talk again soon,

Dr Genevieve Hayes.

P.S. Next month, in the Data Science Impact Sprint, I'm teaching 3-5 data scientists my complete process for creating their own high-value data science opportunities. It's a 4-week, 1-on-1 coaching program that will boost your strategic influence and help position you for career advancement.

Reply with "SPRINT" and I'll send you the details.

Doors close at 9am on Saturday 2nd August, Melbourne, Australia time (7pm Friday 1st August US EDT), or when all the places fill.

First published: July 23, 2025

Data Science Impact Algorithm

Twice weekly, I share proven strategies to help data scientists get noticed, promoted, and valued. No theory — just practical steps to transform your technical expertise into business impact and the freedom to call your own shots.
