10 amazing things your internet search data is used for

In the days before the internet, libraries were a much more important source of free information for many people. What they lent, when and where from was, until recently, recorded by the Public Lending Right (PLR) body in the UK.

I (vaguely) remember the data that the PLR used to collect being used by the media as a gauge of public interest in a particular topic, book or genre. However, as it was by its nature out of date by the time it was published, it was of limited use.

Google is our librarian now and the data that it gathers not only tells us what people were interested in, but what in particular (a bit like being able to tell which specific paragraph of a book people were interested in, rather than just the fact they were interested in that book).

The data also tells us when specifically they were interested in that topic, where they were, if they were satisfied with the information they received and even what demographic they were likely to belong to.

Coupled with all of this extra information, we also know what they wanted last week, rather than waiting for six months and, as the internet provides information on virtually any topic, we have a full on data set that details virtually anything that anyone has ever been interested in knowing.

So what I’m trying to say is that the data collected by search engines and ISPs when you search for something online is a bit of a treasure chest. That little query string at the end of your URL may not mean much taken on its own (apart from to those closest to you), but when combined with many others, then aggregated, sliced, analysed and interpreted, it becomes one of the most powerful sources of potential knowledge and insight into human behaviour that we have today.

In my opinion, that knowledge is largely untapped due to lack of knowledge of its existence, the expense of getting more than a limited view of the data through third party tools and partly, perhaps, due to the lack of high profile use cases.

There are some researchers and organisations however that are making use of this data, mostly in Health, Finance and Marketing, paving the way for others.

I’ve made a start in listing them here. If you know of any other examples, feel free to comment and I’ll add them into the list.

1. Predicting unemployment

In March 2013, four academics from Beijing’s Renmin and Tsinghua universities published a paper detailing how using search engine data had outperformed traditional methods of predicting unemployment.

Similar results were achieved by German researchers from Bonn University in May 2009.

2. Knowing when people are abusing drugs

In November 2012, the journal Clinical Toxicology (Philadelphia) published a paper detailing how internet search data could be used to detect outbreaks of people abusing drugs known as “bath salts”.

3. Measuring public awareness of Erectile Dysfunction

The Journal of the British Association of Urological Surgeons, BJU International, published a paper in December 2012 looking at public awareness of erectile dysfunction in Ireland, following a series of public awareness campaigns.

4. Predicting outbreaks of Dengue fever

In August 2011, a paper in PLOS Neglected Tropical Diseases concluded that “Internet search terms predict incidence and periods of large incidence of dengue with high accuracy and may prove useful in areas with underdeveloped surveillance systems.”

5. Predicting outbreaks of the flu

Google.org have long been predicting flu outbreaks and have a sleek website that really brings the data to life.

6. Making loads of money from the stock market

Okay, so there’s a little bit of supposition in that but there have been studies linking search data to stock market activity and if anyone knows how to use data to make money, it’s got to be stockbrokers, right?

7. Helping computers understand humans

Microsoft looked at using search data to help machines understand human speech in this paper.

8. Predicting house prices

A study by researchers from MIT said: “We found evidence that queries submitted to Google’s Search Engine are correlated with both the volume of housing sales as well as a house price index”.

9. Knowing when we’re more likely to spend

The Bank of England were reportedly using search data to help them understand consumer confidence in the UK.

10. Selling you things online

Google and other search engines have long made their search data available for advertisers to research what their website visitors are most likely to search for, and so shape their ads, content and even website architecture accordingly.

Most searched new year’s resolutions

Recently, I’ve noticed a group of search terms that can be classified as ‘self improvement’ type searches.  For instance, whilst looking into what languages people wanted to learn online, I noticed that pretty much all of them had a surge of interest in the first week of the new year.

I decided to look at all of these searches together to see what the overall most searched for new year’s resolution was and if this corresponds to any other data.

I grouped the ‘self improvement’ searches together into 5 categories…

and then popped them into Google insights for search

**Yellow = Job**  **Green = Money** **Blue = Weight** **Purple = Gym** **Red = Learn**

(I did include ‘volunteer’ as a group and searches did peak in January for this term. However, the number of people searching for it was so low, you could barely make it out on the chart.)

You can see they all have peaks in January, but what is interesting here is the effect of the recession, first pointed out to me by www.nicholine.com in her paper looking at the recession’s effect on search behaviour.

If you look at each year, searches including the term ‘jobs’ not only had the biggest overall volume but also the biggest difference in volume between the year as a whole and the spike in January. Last year, however, this appeared to drop off dramatically and almost get overtaken for the first time by ‘money’ searches…

**yellow = searches for ‘jobs’** and **green = searches for ‘money’**

My interpretation of this is that fewer people were seeking to change career in the new year last year, looking to hold on to what they had during a gloomy outlook whilst more people were interested in ‘money’ whether that was saving, not spending or earning.
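The ‘difference between the year as a whole and the spike in January’ can be put into a single number. The snippet below is only a rough sketch of one way to do that, not how the charts above were made: the monthly figures are invented, and the function name january_uplift is my own.

```python
def january_uplift(monthly_index):
    """Ratio of January's search index to the average of all twelve months.

    monthly_index is a list of 12 numbers, January first. A value of 1.0
    means January was an average month; 1.5 means it was 50% busier.
    """
    if len(monthly_index) != 12:
        raise ValueError("expected 12 monthly values, January first")
    yearly_average = sum(monthly_index) / 12
    return monthly_index[0] / yearly_average

# Invented index values for 'jobs' searches in one year (not real data).
jobs = [100, 70, 65, 60, 60, 55, 55, 50, 60, 60, 55, 30]

print(round(january_uplift(jobs), 2))
```

Comparing this ratio year on year for ‘jobs’ and ‘money’ would show whether the January spike really is shrinking for one and growing for the other.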

Steady learning and rising weight

People searching to learn a new language or skill in the new year have remained steady since 2004 and yet people looking to lose weight in the new year have slowly climbed in a very consistent way since 2004 with last year being a particularly high peak, coinciding with the relatively high number of increased ‘gym’ searches.

How they compare

It’s difficult to find reliable data online. However, this report from Marketing Charts seems to back up our findings, whereas this from Bing suggests fitness is first (although they don’t mention whether or not they actually looked at any finance related data).

US searches for new year’s resolutions

All the above data was looking at UK searches. However, if you compare them to searches in the US, you can see that weight is a much (weightier) concern…

Again, **Yellow = Job**  **Green = Money** **Blue = Weight** **Purple = Gym** **Red = Learn**

Interesting to see also in this case that whilst searches for ‘weight’ inc weight loss etc are a lot higher than the UK, searches for the gym are actually a lot smaller comparatively – do you want to tell them or shall I?

I love getting and answering questions so feel free to comment using the form below.

Why do we need help?

Around 5,000,000 people search for help every month in the UK, but what do they want help with?  I analysed the words people typed into Google, Bing etc to see how often certain words were used.  Here are the top 1,500 from last month (the more popular a word, the bigger it is).
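For anyone curious, the counting behind a word cloud like this is simple. Here is a minimal sketch in Python; the sample searches are made up and the list of ignored words is my own assumption, so treat it as an illustration rather than the exact process used here.

```python
from collections import Counter

# Made-up sample of 'help' searches; the real data set was far larger.
queries = [
    "help with debt",
    "help with homework",
    "debt help",
    "help to buy",
    "homework help maths",
]

# Words too common to be interesting (an assumed, illustrative list).
STOP_WORDS = {"help", "with", "to", "the", "a"}

def word_counts(queries):
    """Count how often each non-stop word appears across all queries."""
    counts = Counter()
    for query in queries:
        for word in query.lower().split():
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

counts = word_counts(queries)
# The most frequent words are the ones drawn largest in the cloud.
print(counts.most_common(3))
```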

See all of the data visualisations here

Thanks to wordle.org

Using search engine data to shape and create online content

Keyword research (analysing what people search for on the internet) has  been used by savvy marketers and webmasters for some time now in order to assess the phrases people use to find products or services and then to match their descriptors, content and links accordingly in order to achieve more relevance in search engine results.

Historically it’s been little different in theory (although more sophisticated in practice) to the act of naming your company in an alphabetically astute way in order to appear high up in the phone book listings under your profession.

Perhaps because of what some perceive as the unsavoury, commercially driven motivation of early keyword research, the use of search data for more serious research has been overlooked. There are a few notable exceptions, such as Google, who have for a number of years been applying search data to real world problems, for example recognising flu outbreaks weeks before the Department of Health is able to.

Recently, however, in what can be seen as a greater acceptance of search data, the Bank of England announced that it was using search data in its economic indicators, for example measuring demand for desirable goods through monitoring searches such as ‘flatscreen televisions’.

The Bank of England’s report points to the advantage of search data over other, more traditional methods of gathering population sentiment.  In particular, it cites the timeliness, vast sample of respondents (60% of the UK population) and avoidance of complications associated with traditional surveys.  The same report does mention potential weaknesses with the data too, such as the ambiguity of intent behind certain searches (increases in searches for ‘flatscreen television’ could be an indicator that more people feel they have cash to burn, or perhaps that people are becoming more savvy by shopping online).  It also recognises the importance of filtering out ‘noise’ from the search data and the limits of the data provided by the search engines themselves (they use the Google insights for search tool, which allows for far less malleability of the data than Hitwise).

Applying search demand to a content production model.

So search demand is becoming an important tool in assessing real world sentiment.  How is it then applied to a content production process?  What are the obstacles involved? And the potential benefits?

Applying search demand to a content production process.

The first question is ‘why should search demand be applied to a content production process?’

Online content has traditionally been driven by a range of factors.  What content there has been commissioned outside of traditional drivers has often been, to put it bluntly, made mostly on a whim.  This has had mixed results and, unlike TV commissioning, any bad decisions made around online content stay online for a long time.

So let’s look at what search demand offers us as content producers.  If we could magically see the complete online demand around a subject, what would that give us?

  • An understanding of a subject’s demand in proportion to other topics
  • A compass to help people new to the topic navigate the important areas of demand
  • A reminder to those familiar with a topic of the vernacular, thus engaging a wider audience
  • The ability to ‘drill down’ into topics to fully explore their content potential
  • The ability to connect with audiences’ desires
  • ‘Evergreen’ audiences that will come to the site month after month
  • A wealth of inspiration

All compelling reasons to use search demand as a powerful tool in our everyday commissioning and micro-commissioning processes.

Search data is not, in my opinion, a panacea.  It shouldn’t be used in isolation to inform us what content to produce but should be used as a useful tool alongside our editorial skill and other traditional drivers.

What are the obstacles?

Search demand is, at its core a record of the terms that people use to search and how often those terms were used.  This results in literally thousands of rows of data, often in spreadsheet format which can seem overwhelming and indecipherable to a lay person who’ll often take the most basic information within the data (the most popular search term) and either reject or accept it based on pre existing views of content commissioning.

Getting the most out of data is an ART that requires SKILL and PRACTICE.  The obstacles encountered are therefore often due to a lack of concrete direction within the data itself (the art is to assimilate all strands of evidence and make decisions based on them, which takes skill, practice and a strong will).

Overcoming the obstacles

Overcoming the obstacles therefore requires the data to be made valuable, accessible and understandable.  It can’t be ‘dumbed down’ to such an extent as to offer only the most basic insights. However, we do need an entry point where lay people can easily see the value of the data without wading through reams of it.  The visualisations below are an attempt to create such an entry point, using bright colours, interesting categories and simplified language in order to hook people into the idea of using search data as an aid.

How does it work in practice?

Let’s say we were in the realm of science.  The first question we should ask ourselves is ‘what are people interested in?’

It seems reasonable to look at the search engine traffic going to science related websites: what are people searching for, and how often?

Using Google insights, Experian Hitwise (if you can afford it) or other keyword research tools you then get the data and turn it into insights by grouping similar search terms, looking for trends and patterns and measuring volumes.
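The grouping and measuring step can be sketched in a few lines of Python. This is a simplified illustration with assumptions throughout: the rows are invented rather than exported from any real tool, and the topic groups and their trigger words are my own.

```python
from collections import defaultdict

# Invented export from a keyword tool: (search term, monthly searches).
rows = [
    ("phases of the moon", 9000),
    ("moon landing", 4000),
    ("planets in the solar system", 6000),
    ("periodic table", 5000),
    ("human skeleton", 3000),
]

# Assumed topic groups, each defined by words that signal the topic.
GROUPS = {
    "space": ["moon", "planet", "solar", "sun"],
    "chemistry": ["periodic", "element"],
    "biology": ["skeleton", "cell"],
}

def group_volumes(rows, groups):
    """Sum search volume per topic group; unmatched terms go to 'other'."""
    totals = defaultdict(int)
    for term, volume in rows:
        for group, keywords in groups.items():
            if any(keyword in term for keyword in keywords):
                totals[group] += volume
                break
        else:
            # No group matched this term.
            totals["other"] += volume
    return dict(totals)

totals = group_volumes(rows, GROUPS)
for group, volume in sorted(totals.items(), key=lambda item: -item[1]):
    print(group, volume)
```

Simple substring matching like this will misfile some terms (‘sun’ also matches ‘sunday’), which is exactly why the grouping needs a human eye on it.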

Once you’ve done that you should have a pretty good idea of audience demand around a topic which you can then turn into a pretty picture like below!

This shows us that ‘space’ is the most popular reason people look at science related content online and that they’re mainly interested in things within our solar system such as the moon, planets, sun etc.

We then dig a little deeper.  Let’s say the moon landings have an anniversary coming up and so we decide that actually search demand and events are in alignment.

We can use Google insights for search to explore topics around the moon that people are most interested in.

Here, we’ve done a little bit of filtering to weed out any ambiguity of intent behind the searches, e.g. were people interested in ‘Moon’ the movie?  I did this by restricting our view to only those searches that went on to visit a science related website.

We can see that actually the majority of people who search for ‘the moon’ are interested in the phases of the moon / moon calendar.  See why it’s important to dig a little deeper?  If we’d taken the initial insight at face value, we may have gone off and created a whole load of content around the moon landings thinking that’s where the biggest demand lay (of course that’s not to say we can’t still create content around the moon landings, it’s just that we now know where interest in the landings stands in relation to the moon’s phases).

It’s important to keep digging and digging at the data in this way until you’ve exhausted it to get a full picture, rather like pass the parcel, taking away and examining layers until you find the prize of a great content idea inside.  You might find that most people who’re interested in the moon landings are interested in the astronauts, or one of them or even if the landings were hoaxed.  We just don’t know until we’ve explored the data fully.

Once you understand the search data, you then understand your web visitor and can communicate with them confidently.  For example, if you know that most people are interested in the moon phases then make sure that it’s mentioned in your meta description so it appears in search engine results pages.  You could make the video / image / calendar of the phases of the moon the first thing you link to on your moon page and describe / link out to other content accordingly.

The future?

Search data and content creation have not had a happy marriage so far.  Search data is the brash, swaggering trickster trying to pull down the high minded values of editorial insights.  Yet they’re starting to speak each other’s language, albeit slowly.  Search data has refined its language and content creators are starting to be less snobbish and more accepting of the potential benefits.  It seems that large organisations such as BT, Channel 4 and AOL, to name but a few, have put search data at the heart (or very close to the heart) of their content strategies. Smaller, more nimble operators have been doing so for a while, not to mention the journalists who’re quickly becoming adept at spotting zeitgeist in reams of social and search data.

The jury is out but my money’s on a surge in demand for skills and training in analysis, visualisation and interpretation of this data.  Watch this space.

Can search engines predict the future?

When we analyse what people search for on the internet, we’re trying to pick up on trends and topics people are interested in.

A recent post I did about the rise of depression in the UK showed a rise in people searching for depression treatments in the UK over the last four years and just yesterday, a report into the rise of depression by the BBC came to the same conclusion.

On reviewing the two reports, I was struck by the accuracy of my search insight when placed alongside the BBC’s data.  Searches for depression treatment had indeed risen by around 40%, mirroring the report’s data from the Department of Health almost exactly.

So that got me thinking.  If a freely available tool such as Google insights for search can predict with accuracy the state of the nation months before official figures (as Google’s flu prediction tool can), can it be used to predict the future?

Taking stock

Studies have already tentatively shown that Twitter could be used to predict stock market ups and downs, and Google are obviously looking into something similar here but have probably concluded that it would be illegal to do such a thing, and no doubt would keep quiet about it even if they did find a trend.

For anyone wanting to pursue the idea of using search behaviour to predict stock market trends, however, there is hope. This study found a correlation between rises in search volume for a particular company and the volume of shares traded in that company the following week, although it couldn’t predict the price (rise or fall).  I suspect that further sentiment analysis of those search terms could well provide indicators as to whether a share price would rise or fall (so if suddenly a large number of people are generally saying IBM sucks or searching for refunds online, share prices would be expected to dip).
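If you want to try this kind of check yourself, a ‘following week’ relationship can be tested with a lagged correlation. The sketch below is my own illustration and not the method used in the study: the weekly figures are invented and the function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(searches, trades, lag=1):
    """Correlate searches in week t with trades in week t + lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1 week")
    return pearson(searches[:-lag], trades[lag:])

# Invented weekly figures for one company (not real market data).
searches = [100, 120, 90, 150, 110, 130]
trades = [50, 52, 61, 46, 74, 56]

print(round(lagged_correlation(searches, trades, lag=1), 2))
```

A value close to 1 would suggest that busy search weeks tend to be followed by busy trading weeks. Like the study, it says nothing about whether the price goes up or down.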

It’s not just the stock market that can be predicted though.  The health of the nation, what we eat, crowd numbers, housing demand, political motivations… The list is only limited by our imaginations and how much access we have to Google’s data. 

I strongly believe that teaching people how to mine this data will help to empower citizens and help them to make their own decisions about what’s going on with the country rather than relying on spin and the media.  The data revolution starts here!*  Power to the people!

Oh, and by the way, if anyone is inspired to carry out further research into predicting the stock market by this blog post and is successful, just remember who inspired you and that my cut is 10%!

*Note to future researchers trying to find out where the data revolution started – it wasn’t here, sorry.

Sick jokes

Or, which country has the darkest sense of humour?

Reading an interesting BBC news feature the other day about why people tell sick jokes, I found myself naturally wanting to know the sorts of things that the article couldn’t quite tell me, such as: How many people actually like these jokes or look for them online? And are they a universal phenomenon, or do certain countries have a tendency towards black humour?

These questions can be answered to a point by looking at what people search for on the internet, which is great at giving you information on proportionality, time and geography.  It’s a huge source of data that we can dig into to find out things about the world we live in.

The recent spate of high profile natural disasters gives us an opportunity to look into the questions about sick jokes by analysing searchers’ behaviour online.

Do a lot of people like sick jokes?

In a word,  yes.  ‘Tsunami jokes’ was the 12th most popular tsunami search term in the UK around the time of the disaster.

Which countries have the darkest humour?

I’m ashamed to say this but it looks like the UK leads the way when it comes to sick jokes around the recent tsunami, followed by Australia and the US (the darker the blue on the map below, the more searches there were for ‘sick jokes’)

The chart below shows general searches for ‘sick jokes’ by country

What does this tell us?

I’m not a sociologist (my old sociology teacher would attest to that) and when I interpret the above it’s just my own conclusions that I’m drawing based on my own experience and not any particular social science / psychological background.

1.  I suspect the people seeking and telling these jokes aren’t a particularly nasty bunch and nor do they find the events funny (they’d be horrified if someone affected by the tragedy actually read or heard one).

2.  There may also be a perceived kind of kudos attached to the person telling the sick joke  “look at me, I’m not affected by it”,  “I’m strong and can laugh at it”.

3.  The British are a reserved, outwardly cold bunch however we all need to let out things that affect us emotionally.  Whilst some nations cry, wail, get angry upon being emotionally charged, maybe our release valve is to laugh?

That’s not to say that I agree sick jokes are necessary or right just because they may serve a purpose.  In frontline emergency services, dark humour helps staff to cope, because to face up to the tragedy they see every day would just be too much without some psychological escape route. In the case of natural disasters, however, there’s little chance of the general population being traumatised by events happening thousands of miles away, and more empathy with the victims is needed rather than the detachment made possible through sick jokes.

ADHD – Mapping the spread

I was watching / listening to a fantastic lecture by the RSA the other day in which the speaker, Sir Ken Robinson, talked about modern education systems.  Part of his lecture claimed that ADHD was actually a bit of a myth (this was actually a bit of a sideline to his main point).

As part of his argument to back this up he stated that a diagnosis of ADHD was far more prevalent in the east coast of the USA and the animator drew a smart infographic to illustrate this.

RSA chart showing the distribution of ADHD cases across the USA

(you can watch the whole video at the bottom of this page – highly recommended).

I decided to see if we could use search insights to confirm this.  If it were true, we’d expect far more people to search for words such as ‘ADHD’ in the east coast of the USA.

The maps below show search volume for people searching Google with the words ‘ADHD treatment’.  The darker the blue, the more searches there were for those keywords.  The data is taken from Google’s ‘insights for search’ tool.

As you can see, the data from people searching for ADHD treatment would seem to broadly match what the professor is saying. However, it becomes more interesting when the data is divided into the year by year views below.

2004

2005

2006

2007

2008

2009

2010

So from this we can see, interestingly, that although the East coast gets most of the blame for the ADHD phenomenon, it actually looks as if the trend for ADHD treatment may have started in California.

There are other explanations of course.  ADHD had been around since way before 2004 – it may have been that people in California were more prone to searching the internet in 2004 or they may have even been more concerned about the effects of ADHD treatment.

You may also want to read another post on this blog about the most popular questions asked on search engines.

The intent behind search – Top 6 ways to tell

Do posh people get into trouble when they look for ways to ‘groom’ their children online? 

 “Officer, I was merely searching for Brylcreem and Old Spice I can assure you”.

What we say isn’t always what we mean, especially as sometimes the English language seems to be deliberately developed for the sole purpose of carry on style double entendres. 

Fnar! Fnar!

People who study and try to make sense of internet searches (like me) can sometimes find getting to the actual meaning of searches difficult (for example, does someone searching for ‘laptop’ want to buy one? Get one repaired? Read reviews? Complain? Or did they mean ‘lap dance’?).

Even the boffins at Google HQ find this a difficult subject.  Their ‘did you mean’ function is used in a surprisingly small number of instances and mainly to correct spellings (which are relatively easy ground).  When they stray from this safe ground, they find that determining the intent behind people’s searches through a computer algorithm can be incredibly difficult (and sometimes amusing, see below).

Google's latest 'sexist' version didn't quite take off in the way they'd hoped

So why is this a problem?  In a nutshell: time and money.  Getting the intent behind a search wrong costs the user time (they have to redo their search or scroll through a heap of irrelevant results), which then costs the search engine money as there’s a risk the user will use a rival or get frustrated and give up altogether.

The problem is decreasing as searchers become more sophisticated, though there will always be a need to determine search intent.

So here are my top 6 tips on ‘determining search intent’ (or simply finding out what people mean)

1.  Keyword research.  

Often, when people search, they’ll quickly follow it with a more defined search as they realise their original search was maybe too generic. So instead of the brain thinking hmmm ‘laptop’, ‘buy laptop’, ‘buy sony laptop’, ‘buy sony laptop on finance’, many people will actually search for these phrases in succession without clicking a result. In effect, they’re using the search engine to bounce ideas off as they go.

We can capitalise on this by using keyword research tools that allow us to order our keywords by search volume:

So we can surmise by looking at the right hand number column that the large proportion of people looking for ‘laptop’ will be split between ‘laptop bags’ and ‘laptops’ followed by people searching by brand and so on.  

2.  Search by category

Searching by category (as shown below) can also help you to see the intent behind a search.  For example, a search for laptop under computer hardware would give you a fair indication of the intent, i.e. they were after hardware over accessories.

3.  Allow for country variations

Searches will have different intent depending upon the country of origin of that search.  For example, when we compare UK and Russian searches, we note that Russians have different brand intent as well as type intent, i.e. the vast majority of Russian laptop searchers actually mean they want a notebook.

UK laptop searches
Russian laptop searches

4.  Allow for news

It’s important to know that your numbers aren’t being skewed in any way when you look at them for search intent.  For example, if we were looking at the intent behind a search for oil using figures from over the last month or so, we may well be fooled into thinking that most people are concerned about oil spills rather than oil shares due to the amount of searches this takes up.  Google insights for search can show us peaks and troughs in search patterns and allows us to see where major headlines may have skewed things slightly so we can take this into account.
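If you have the weekly numbers in a spreadsheet, spotting a news spike doesn’t have to be done by eye. Below is a rough sketch of one simple approach: flag any week that is more than double the typical (median) week. The threshold of 2.0 and the sample figures are assumptions of mine, not anything Google insights for search does for you.

```python
from statistics import median

def find_spikes(weekly_volumes, threshold=2.0):
    """Return the indexes of weeks whose volume exceeds threshold x the median."""
    baseline = median(weekly_volumes)
    return [
        week
        for week, volume in enumerate(weekly_volumes)
        if volume > threshold * baseline
    ]

def without_spikes(weekly_volumes, threshold=2.0):
    """Return the weekly volumes with the spike weeks removed."""
    spikes = set(find_spikes(weekly_volumes, threshold))
    return [v for week, v in enumerate(weekly_volumes) if week not in spikes]

# Invented weekly index for 'oil spill' searches (not real data).
oil_spill_searches = [40, 42, 38, 41, 300, 260, 45, 39]

print(find_spikes(oil_spill_searches))
print(without_spikes(oil_spill_searches))
```

The median is used rather than the average because a couple of huge news weeks would drag the average up and hide the very spikes we’re looking for.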

Google insights for search

5.  Use other stats

If you use Pay Per Click advertising, you can test messages in real time and see how they affect your numbers.  For example, including ‘free delivery’ in one advert and ‘cheapest on the web’ in another could give you some indication as to a customer’s preferences depending on which one the majority click on.

Similarly, look at your own web visitor stats to see where people went on your site after using a particular keyword.
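Comparing two advert messages comes down to comparing their click through rates (clicks divided by the number of times the advert was shown). Here is a minimal sketch with invented numbers; the figures and the function name are mine, and a real test would need enough impressions to be sure the difference isn’t just chance.

```python
def click_through_rate(clicks, impressions):
    """Share of impressions that resulted in a click (0.0 if never shown)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions

# Invented results for two advert messages: (clicks, impressions).
ads = {
    "free delivery": (120, 4000),
    "cheapest on the web": (90, 4100),
}

rates = {
    message: click_through_rate(clicks, impressions)
    for message, (clicks, impressions) in ads.items()
}

# The message with the highest click through rate.
winner = max(rates, key=rates.get)

for message, rate in rates.items():
    print(f"{message}: {rate:.1%}")
print("Most clicked message:", winner)
```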

6.  Get advanced

There are new tools coming out on the market that aim to help us with search intent.  If you can afford the likes of Hitwise, then you’ll already have a valuable source of information at your fingertips in the form of ‘success rates’ or the percentage of people who clicked on a result after searching for a particular keyword.  

Microsoft have recently introduced a new tool that aims to tell you what people’s commercial intentions are behind a search.  The commercial intent tool shows the likelihood of people purchasing after typing in a particular keyword.

So we can’t please all of the people all of the time

But take heart.  As long as we keep this in mind when writing and creating content for searchers, there are ways and means to make sure that we please the majority of people, the majority of times.

Why don’t you take a look at some of the other posts on my blog?  Or make a comment below?  You can also subscribe to it by email using the really simple form in the top right corner of this page.