Why, you ask?
Because you cannot get clean, smart insights from dirty data. You need a good base of data to start. Otherwise, you’ll be analyzing spambot messages, and how would that help you make marketing or strategy decisions? What about ads for counterfeit Louis Vuitton bags and Viagra from overseas pharmacies? Or conversations that use the words you’re searching for but don’t reflect the meaning you’re after? No, this irrelevant and distracting content will not be helpful.
It will be the opposite of helpful.
In social media monitoring, there are two ways to get a good base of data:
- Bring in broad data; validate and clean it to get the bulk of spam and irrelevance out, or
- Create kick-butt queries that bring in data that is all (well, mostly) pertinent. We aim for 90%+ relevance.
Either way, once you have that base you can start categorizing and slicing and dicing and digging for patterns and trends and pearls. The reality is, even at this point we always uncover more content that needs to be excluded. So we retool and refine. And the best argument for the second approach (creating those kick-butt queries) is that you get ongoing quality results, and you can keep refining the queries as you learn.
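To make the query approach a bit more concrete, here is a minimal sketch in Python. It is not how any particular monitoring tool works: the include and exclude terms, the sample posts, and the hand-assigned relevance labels are all made up for illustration. The idea is simply to keep posts that mention a topic term, drop posts that hit a spam term, and then check what share of the matches a human reviewer would call relevant.

```python
import re

# Hypothetical terms for illustration; a real query is tuned to your own topic.
INCLUDE = ["handbag", "tote"]
EXCLUDE = ["replica", "counterfeit", "cheap viagra", "click here"]

def _pattern(terms):
    # Match any of the terms as a whole word or phrase, ignoring case.
    joined = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)

INCLUDE_RE = _pattern(INCLUDE)
EXCLUDE_RE = _pattern(EXCLUDE)

def matches_query(text):
    """True when the post mentions an include term and no exclude term."""
    return bool(INCLUDE_RE.search(text)) and not EXCLUDE_RE.search(text)

def relevance_rate(posts, is_relevant):
    """Share of query matches that a reviewer marked as relevant."""
    matched = [p for p in posts if matches_query(p)]
    if not matched:
        return 0.0
    return sum(1 for p in matched if is_relevant[p]) / len(matched)

# Made-up sample posts with made-up reviewer labels (True = relevant).
SAMPLE = {
    "Love my new tote, fits my laptop perfectly": True,
    "Replica handbag sale, click here": False,
    "Which handbag brand holds up best over time?": True,
    "I tote my kids around all day": False,  # right word, wrong meaning
    "Cheap Viagra shipped overnight": False,
}

if __name__ == "__main__":
    rate = relevance_rate(list(SAMPLE), SAMPLE)
    print(f"Relevant share of matches: {rate:.0%}")
```

In this toy sample the exclude terms catch the spam, but the "I tote my kids around" post still slips through, which is exactly the kind of thing you find once you start digging. When the relevant share falls short of your target, that is the cue to retool the terms and run the check again.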
It’s all about continual improvement. (But that’s more than just a data conversation, right? I promise to come back and talk about that soon.)
Before embarking on analysis – social media or text analytics or anything, really – make sure you’re starting off with a good base.
Because really, it’s all about the base.
(I’m bringing data back...)