Search Engine Optimisation: Knowledge Graphs, Schema.org, Instant Answers, Siri, Provenance and the Economy of Attribution
The BBC Internet blog recently ran a series of posts on search engine optimisation at the BBC:
- Duncan blogged about insights to be found in search logs, the benefits of well-written meta descriptions and the importance of honesty and trust
- Oli blogged about the /food site rebuild and the importance of a redirect strategy
- Martin tackled copywriting and particularly headline writing for the BBC News site
Over in R&D our job is to experiment with possible futures, so this post covers some work we've been doing to help describe TV and radio in schema.org, but also some thoughts on how search (and optimisation for search) works now and how that might be about to change. So post number 4 in a series of 3.
How search (and optimisation) works now
Most modern search engines work in the same way. They send out a "bot" to crawl the web, following links from one page to the next (hence the importance of redirects when pages move) and indexing the content they find on their travels (hence the importance of audience appropriate copywriting). I've blogged about search bots in the past, describing them then as "your least able user" and that still holds true today. Build a site that's accessible to all users and you're 99% of the way to having a site that's optimised for search engines.
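To make the "least able user" point concrete, here is a minimal sketch of the link-following step a bot performs, using only Python's standard `html.parser` and `urllib.parse`. It is not how any particular search engine crawls; the `LinkExtractor` class, the `extract_links` function and the sample page are made up for illustration. It only sees `href` attributes in the markup it is handed and runs no JavaScript, so a "link" that exists only in script is invisible to it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the links a simple crawler would follow."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only plain <a href="..."> links are seen; no JavaScript is executed,
        # so links added by script (or buried in Flash) are never found.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page being crawled
            self.links.append(urljoin(self.base_url, href))


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Illustrative page: one crawlable link and one script-only "link"
page = '<a href="/food/recipes">Recipes</a> <a onclick="openMenu()">Menu</a>'
print(extract_links("https://www.bbc.co.uk/food/", page))
# prints ['https://www.bbc.co.uk/food/recipes']
```

The script-only menu never makes it into the list, which is the same reason progressive enhancement matters for search.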
The results are also always the same. A user asks a question and the search engine returns a list of pages, some of which can hopefully answer the question. SEO has two aspects:
- Increasing visibility of your content by getting your pages to appear further up search engine results than the pages of your competitors
- Making your results more attractive to users in the hope they're more likely to click through
There are various techniques employed to meet the former, predominantly around:
- Increasing link density around your content (inbound and outbound)
- Making the titles of those links as descriptive of the content as possible
- Making pages about all the things you know your users are interested in
- Not duplicating content across many URLs
- Not hiding content where search bots can't get to it (CSS hiding, reliance on JavaScript rather than progressive enhancement, Flash)
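On the duplicate-content point, a common first step is to normalise URLs so that trivially different addresses for the same page can be grouped, then pick one and redirect the others to it. A sketch, assuming that scheme and host case, a trailing slash, parameter order and the (hypothetical) tracking parameters in `IGNORED_PARAMS` never change what a page shows. Whether those assumptions hold is a property of your own site, not a general rule, and the function names are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters never change the content of a page
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def normalise(url):
    """Reduce a URL to a canonical form for duplicate detection."""
    parts = urlsplit(url)
    # Scheme and host are case-insensitive; the path may not be, so keep its case
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Assumption: /food and /food/ serve the same page
    path = parts.path.rstrip("/") or "/"
    # Drop ignored parameters and sort the rest so their order doesn't matter
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS
    ))
    # The fragment (#...) is never sent to the server, so discard it
    return urlunsplit((scheme, host, path, query, ""))


def find_duplicates(urls):
    """Group URLs that normalise to the same address; return only real groups."""
    groups = {}
    for url in urls:
        groups.setdefault(normalise(url), []).append(url)
    return {canonical: found for canonical, found in groups.items() if len(found) > 1}


urls = [
    "https://www.bbc.co.uk/food/",
    "HTTPS://WWW.BBC.CO.UK/food?utm_source=newsletter",
    "https://www.bbc.co.uk/news",
]
print(find_duplicates(urls))
```

Each group found this way is a candidate for a single canonical URL plus redirects, which ties back to the redirect strategy mentioned at the top of the post.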
And there are a couple of techniques to meet the latter: crafting page titles and meta descriptions (which both appear on search engine result pages) to make the content appealing whilst conforming to the constraints of search engine result display and using to mark up your pages in a way that allows search engines to extract meaning and display appropriate information. bbc.co.uk/food used the latter with great success to . Elsewhere it was widely reported that ; and .
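Conforming to result-page constraints mostly means keeping titles and descriptions short enough that they aren't cut off mid-word. Below is a sketch that trims a description on a word boundary. The 155-character default is an assumption for illustration, not a published limit, and in practice it's better to write copy that fits than to rely on trimming. The `fit_snippet` name is made up.

```python
def fit_snippet(text, limit=155, ellipsis="..."):
    """Trim text to at most `limit` characters without splitting a word."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    cut = text[: limit - len(ellipsis)]
    # Back up to the last space so the final word isn't cut in half
    if " " in cut:
        cut = cut[: cut.rindex(" ")]
    return cut.rstrip(" ,;:") + ellipsis


print(fit_snippet("Quick and easy recipes for every occasion, from midweek "
                  "suppers to showstopping desserts.", limit=60))
```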
In all of this the metrics for success are also the same: more users clicking more links to your content, more traffic, more page impressions, more uniques etc. But as Rich Snippets evolves into schema.org, knowledge graphs and particularly instant answers, these metrics might be set to change.
Freebase, Knowledge Graphs, Google Maps, Google+...
In 2010 and with it , a community maintained graph database of information taken from Wikipedia, MusicBrainz and . It was their first step away from search engines as an index of pages and toward search engines as a repository of knowledge. In similar moves , and with .
The various acquisitions and partnerships were intended to bootstrap the search companies' "knowledge engines" (the obvious example being ) with a baseline of "facts". Rather than seeing any of these search companies as a set of products (maps, social networks, product search) it probably makes more sense to see them as a massively intertwingled graph of data. Everything they do is designed to expand this graph and make more links in it (even the move into mobile operating systems could be seen as a way to stitch contextual information around location into the graph). In that sense Google+ is less a social network and more an additional source of links between people and people, and between people and things.
...and schema.org
The final piece in the jigsaw was the , with support from the , providing a way for websites to mark up content so that search engines can extract meaning from it.
Out of the box, HTML provides a way to mark up document semantics (headings, paragraphs, lists, tables etc.). If you want additional semantics to describe real-world things (people, businesses, events, postal addresses) the choices on offer can seem like a bit of a minefield. It's much simpler if you break it down into two layers: the syntax and the vocabularies. On the syntax layer there are three choices (all of which can be parsed to ):
- is a syntax but with a built-in set of community defined vocabularies
- provides a way to embed the RDF model into HTML using any combination of RDF vocabularies
- is the standard HTML5 approach
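Taking microdata (the HTML5 approach) as the example syntax, the sketch below shows roughly what "extracting meaning" involves. It walks the markup, starts a new item at each `itemscope` and records `itemprop` values against it. It is deliberately minimal and assumes flat, non-nested items with text or `content` values. Real microdata also has nested items, `itemid`, `itemref` and URL-valued properties, and RDFa or microformats would each need their own parser. The class name and sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Minimal microdata reader: flat items, text or content-attribute values."""

    def __init__(self):
        super().__init__()
        self.items = []
        # Simplification: an item stays "current" until the next itemscope,
        # rather than ending when its element closes
        self._item = None
        self._prop = None  # an itemprop still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # itemscope is a boolean attribute: its value is None
        if "itemscope" in attrs:
            self._item = {"type": attrs.get("itemtype"), "properties": {}}
            self.items.append(self._item)
        elif "itemprop" in attrs and self._item is not None:
            if "content" in attrs:
                # e.g. <meta itemprop="startDate" content="...">
                self._item["properties"][attrs["itemprop"]] = attrs["content"]
            else:
                self._prop = attrs["itemprop"]

    def handle_data(self, data):
        # Simplification: a property's value is the first non-blank text after it
        if self._prop and data.strip():
            self._item["properties"][self._prop] = data.strip()
            self._prop = None


html = """
<div itemscope itemtype="http://schema.org/TVSeries">
  <h1 itemprop="name">Example Series</h1>
  <meta itemprop="startDate" content="2012-09-21">
</div>
"""
parser = MicrodataParser()
parser.feed(html)
print(parser.items)
```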
On top of these are a set of vocabularies, some community defined, some more tied to specific implementations. is a vocabulary built on top of RDFa. build on top of Open Graph. And schema.org is , built on top of . The schema.org vocabularies published so far are really straw-men building blocks for what's needed and not meant to be complete. If you have an interest or are a domain expert in any area covered or not covered there's a .
By marking up your content with schema.org vocabularies, when a search bot crawls your pages it can pick out the entities and their properties and stitch your assertions into the wider "knowledge graph" so search engines can answer more questions, more quickly, from more users.
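Once entities and properties have been extracted, "stitching them into the graph" can be pictured as turning each item into (subject, predicate, object) triples and merging them, so that statements about the same subject from different pages accumulate. This is a sketch under two assumptions: each entity already has a shared identifier (the made-up `urn:example:series-1` below), and the graph is a plain in-memory dictionary rather than anything a search engine actually uses. All function names are hypothetical.

```python
from collections import defaultdict


def new_graph():
    """A graph stored as subject -> predicate -> set of objects."""
    return defaultdict(lambda: defaultdict(set))


def item_to_triples(subject, item):
    """Turn one extracted item into (subject, predicate, object) triples."""
    triples = [(subject, "type", item["type"])]
    triples += [(subject, p, v) for p, v in item["properties"].items()]
    return triples


def merge(graph, triples):
    """Add triples; identical statements from different pages are stored once."""
    for subject, predicate, obj in triples:
        graph[subject][predicate].add(obj)
    return graph


graph = new_graph()
# Two hypothetical pages describing the same series
merge(graph, item_to_triples("urn:example:series-1", {
    "type": "http://schema.org/TVSeries",
    "properties": {"name": "Example Series"},
}))
merge(graph, [
    ("urn:example:series-1", "name", "Example Series"),
    ("urn:example:series-1", "startDate", "2012-09-21"),
])
print(dict(graph["urn:example:series-1"]))
```

Using sets means a repeated assertion adds nothing, while a new property from a second page extends what is known about the subject.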
Using schema.org to describe TV and radio
In amongst the schema.org vocabularies were definitions for , and which we were asked to comment on. Our , and the .
Being the BBC, the first request was to put radio on a level footing with TV. Over the years we've found (with PIPs and the Programmes Ontology) that, at least in terms of data description, radio and TV have more similarities than differences. So we've requested the addition of Series, Season and Episode, with TVSeries, RadioSeries, TVSeason, RadioSeason, TVEpisode and RadioEpisode as subclasses.
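One way to picture the requested structure is as a class hierarchy in which TV and radio share parent types, so anything written against `Series`, `Season` or `Episode` works for both media. The Python below only illustrates the shape of the proposal and is not the schema.org definitions themselves. The `season` and `series` linking fields and the `series_of` helper are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class Series:
    name: str


@dataclass
class Season:
    name: str
    series: Series  # assumed link up to the parent series


@dataclass
class Episode:
    name: str
    season: Season  # assumed link up to the parent season


# Medium-specific subclasses, as requested; they add nothing of their own here
class TVSeries(Series): pass
class RadioSeries(Series): pass
class TVSeason(Season): pass
class RadioSeason(Season): pass
class TVEpisode(Episode): pass
class RadioEpisode(Episode): pass


def series_of(episode: Episode) -> Series:
    """Walk episode -> season -> series; the same code serves TV and radio."""
    return episode.season.series


radio = RadioSeries("Example Radio Series")
episode = RadioEpisode("Episode 1", RadioSeason("Season 1", radio))
print(series_of(episode).name, isinstance(episode, Episode))
```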
We've asked for clearer definitions of start and end dates for series and seasons although that's probably not immediately apparent from . If you read it slowly it does make sense...
We've also asked for better linking between episodes, series and seasons; first publication dates for episodes; the addition of clips; and the addition of broadcast services, broadcasts and ondemands (e.g. the data which determines catch-up availability in iPlayer).
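An "ondemand" in this sense is essentially a time window attached to an episode. Here is a minimal sketch of checking catch-up availability with a hypothetical `OnDemand` record; the field names are made up for illustration. The window is half-open, so the episode stops being available exactly at `available_until`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OnDemand:
    """Hypothetical record of when an episode can be played on demand."""
    episode: str
    available_from: datetime
    available_until: datetime

    def is_available(self, at: datetime) -> bool:
        # Half-open window: available from the start up to, not including, the end
        return self.available_from <= at < self.available_until


catch_up = OnDemand(
    episode="Episode 1",
    available_from=datetime(2012, 9, 21, 21, 0),
    available_until=datetime(2012, 9, 28, 21, 0),
)
print(catch_up.is_available(datetime(2012, 9, 25, 12, 0)))  # True
```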
There are still things missing which would be nice to see, mostly around programme segmentation (interviews, scenes, tracklists), but what's there looks like a good starting point to build on.
Finally, yes it does use . A schema.org series is something like a po:Brand and a schema.org season is something like a po:Series. But from our search logs lots of people in the UK seem to use season for series these days...
Knowledge Graphs, Instant Answers and Siri
So what might the future of search look like? One possibility is something like Siri. By which I don't mean all search will be done by voice. It could be voice or screen or a pair of location-aware, web-connected glasses which provide the interface. But the interface is less important than the switch from "ask a question, get some links to web pages" to "ask a question, get an answer". There, then, directly from the search engine. In use Siri doesn't feel much like traditional search but, bootstrapped by Wolfram Alpha, that's the job it's doing.
And . If you search Google for the search results have an info box with a picture, date and place of birth, weight and height, education, spouse and children. There are similar results for and . In a similar fashion DuckDuckGo brings in images and information from Wikipedia: , , .
All this is powered by an underlying knowledge graph and the knowledge graph will be further powered by the (schema.org) semantic assertions you make in your HTML. For at least some class of questions there's going to be less and less need for users to want links through to web pages, when the questions they've asked can be answered directly by the search engine. Probably the best example of a direct question and answer is a Google search for . 29 apparently.
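The shift from "get some links" to "get an answer" can be sketched as a lookup that tries the knowledge graph first and only falls back to a list of pages. This assumes the hard part, turning a natural-language question into an entity and a property, has already been done. The data structures, names and sample data are all hypothetical.

```python
# Hypothetical knowledge graph: (entity, property) -> value
FACTS = {
    ("Example Series", "startDate"): "2012-09-21",
}

# Hypothetical page index: entity -> URLs of pages about it
PAGES = {
    "Example Series": ["https://example.org/example-series"],
}


def search(entity, prop, facts=FACTS, pages=PAGES):
    """Answer directly from the graph if possible, otherwise return links."""
    value = facts.get((entity, prop))
    if value is not None:
        return {"answer": value}
    return {"links": pages.get(entity, [])}


print(search("Example Series", "startDate"))  # a direct answer
print(search("Example Series", "theme"))      # falls back to links
```

Every question the first branch answers is one where the user never needs the second, which is the shift in metrics discussed next.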
The missing pieces: provenance and the economy of attribution
There are still some things which are unclear about the schema.org / knowledge graph / instant answers combination. From a broadcaster's perspective, how will territorially specific questions (when's Gardeners' World next on?) get answered? And how will provenance be flagged (this Friday at 20:30 on BBC Two, say the BBC)?
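Both open questions can be made concrete by storing each assertion with its source and the territory it applies to, then filtering on the user's region and quoting the source back in the answer. This is a sketch only. The `Assertion` fields and the `answer` function are made up, and the sample data is the post's own example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    source: str                   # provenance: who made the claim
    region: Optional[str] = None  # territory it applies to; None means everywhere


def answer(assertions, subject, predicate, region):
    """Return the first value valid in `region`, attributed to its source."""
    for a in assertions:
        if (a.subject, a.predicate) == (subject, predicate) and a.region in (None, region):
            return f"{a.value}, say {a.source}"
    return None


claims = [
    Assertion("Gardeners' World", "nextBroadcast",
              "this Friday at 20:30 on BBC Two", "the BBC", region="UK"),
]
print(answer(claims, "Gardeners' World", "nextBroadcast", "UK"))
```

A user outside the UK gets no answer rather than a wrong one. How far the source string itself can be trusted is the separate provenance problem raised below.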
But the interesting part is how all this might change the metrics, not only for SEO but for websites in general. An example: in the main there are only two reasons I ever visit restaurant (or dentist or doctor or cattery) websites: to get the phone number and the opening hours. If you're prepared to accept anecdote as evidence and say I'm not alone in this, and if those sites can express this information semantically and search engines can extract it and present it directly on search results... then why would many people ever have to visit the website? The end result would be fewer visitors to the website but no fewer customers for the business.
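schema.org's `openingHours` property uses short strings such as `Mo-Fr 09:00-17:30`, which is exactly the kind of information a search engine could use to answer "is it open now?" without a site visit. The sketch below handles only the simple forms: a single day or a day range, with one time range per entry. Ranges that wrap around the week or past midnight are out of scope, and the function names are made up.

```python
from datetime import datetime, time

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]  # same order as datetime.weekday()


def parse_entry(entry):
    """Parse 'Mo-Fr 09:00-17:30' or 'Sa 10:00-13:00' into (days, opens, closes)."""
    day_part, time_part = entry.split()
    if "-" in day_part:
        first, last = day_part.split("-")
        days = DAYS[DAYS.index(first): DAYS.index(last) + 1]
    else:
        days = [day_part]
    opens, closes = (time.fromisoformat(t) for t in time_part.split("-"))
    return set(days), opens, closes


def is_open(entries, when):
    """True if `when` falls inside any of the opening-hours entries."""
    day = DAYS[when.weekday()]
    for entry in entries:
        days, opens, closes = parse_entry(entry)
        if day in days and opens <= when.time() < closes:
            return True
    return False


hours = ["Mo-Fr 09:00-17:30", "Sa 10:00-13:00"]
print(is_open(hours, datetime(2012, 9, 22, 11, 0)))  # a Saturday morning
```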
Again, from a broadcaster's perspective we know (from offline audience enquiries and search logs) that there are a number of common questions users want to ask: when does programme X return / when's it broadcast, what was the theme music for programme Y, who played X in Y, can I buy programme X? Again, if this information is available from search results (in a consistent fashion across broadcasters) you might get fewer website visitors but no drop (and possibly a rise) in listening / viewing / purchase figures. Your website is still important but for very different reasons. And in some extreme future the only visitors your website might ever have are search engine bots...
Obviously all this works less well for businesses which rely on page views for advertising revenue. Or even businesses built around original content online. And there's still a gap around provenance (who said that) and trust in that provenance. But it feels like we're moving from a web economy of page visits, uniques and reach toward an economy of attribution. How that economy works is still unclear but the is probably a good starting point.
Comment number 1.
At 17th Sep 2012, Jamie Tetlow wrote: Thanks Michael - 'in a nutshell' once again :-)
Major point for me being: If you (or your organisation) is not in the business of provenance (publishing your own data in a knowledge graph consumable format) then others will certainly fill that hole.