Introduction To Structured Data


Searching web sites for keywords has been around now for many years. And yet, we are only just learning how to make the most of the knowledge published on the big, global web. Using new Structured Data standards helps tell search engines what real things a web page contains, beyond just the words – taking us from a web of keywords, to a powerful web of things.

After this tutorial, you should be able to:

  • Understand the term structured data and how it allows search engines to index the things on a page, not just the words.
  • Understand some of the key structured data standards currently available.
  • Understand some of the ways search engines and smart bots can use structured data to enrich search.
  • See how structured data can help promote your website and its content in new, smarter ways.

Estimated time: 5 minutes

As internet users, almost all of us today have searched for things on the web via keywords.

For many years, search engines such as Google have indexed the information web pages contain by analysing the keywords on the page. Among other factors, when a page contains the keywords you are searching for, a search engine will rank that page with respect to others in its search results.

1.1 From A Web Of Keywords To A Web Of Things

As we will see, though, relying on words alone to learn what a page contains is limiting. It closes the door on other ways your information could be used, perhaps in ways you had never imagined. Using structured data on web pages is one example of Tim Berners-Lee’s semantic web vision: allowing machines, not just humans, to understand the knowledge on the web.

Note Do not confuse our phrase ‘web of things’ with the similar term ‘internet of things’. The ‘internet of things’ typically refers to connecting electronic devices together over the internet. It does not refer to placing structured data on web pages.

Let’s take a typical web page, such as the one you are reading right now. A search engine would typically look at this page and try to decide roughly what it contains by looking at the words on the page. It might find phrases such as structured data and knowledge graph.

This can be very useful information for a search engine to determine the context of the knowledge on the page. But the words on a page alone do not tell a search engine about the real things that the page contains. What things do we mean? Examples of ‘things’ you might find on a typical page:

  • People (name, profile photo, contact information).
  • Places such as businesses, cities, or parks.
  • Events (an event list, including time and location details).
  • Products (such as something we are selling, or a product we are interested in – including price and photograph).
  • Articles about various topics.

Search engines cannot look at a list of words on a page and reliably pull out these ‘things’, or any facts about them, automatically. Mere paragraphs of words are not structured enough by themselves, so search engines need help (remember, a search engine is a machine and not a human being).

This is where structured data comes in – by giving us a set of tools and standards to help search engines extract these contextual things from web pages. We then have an index not just of the keywords on a page, but the things and facts it contains, too. Smart, huh?
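The difference between an index of keywords and an index of things can be sketched in a few lines of Python. This is only an illustration: the page text, the event and the people in it are made up, and real search engines are far more sophisticated. The type and property names (Event, Place, Person, startDate and so on) are taken from the Schema.org vocabulary.

```python
import json
import re

# Hypothetical page text, invented for this illustration.
page_text = "Jane Doe will speak at the Structured Data Meetup in Leeds on 4 May 2025."

# A keyword index sees only an unordered bag of words.
keywords = sorted(set(re.findall(r"[a-z0-9]+", page_text.lower())))

# Structured data names the things on the page and the facts about them.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Structured Data Meetup",
    "startDate": "2025-05-04",
    "location": {"@type": "Place", "name": "Leeds"},
    "performer": {"@type": "Person", "name": "Jane Doe"},
}

print(keywords)
print(json.dumps(event, indent=2))
```

From the keyword list alone, a machine cannot tell whether Leeds is a place, a person or a product. The structured version states it outright.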

You might be getting some feel for just how powerful this paradigm-shifting approach to publishing information on the web might be. We will be exploring some of the possibilities for using structured data in this tutorial series.

Remember Search engines such as Google cannot reliably extract things and facts from websites or web pages without help. Structured data standards give us the tools to highlight the things on a page in a machine-readable way.

1.2 Some Examples Of Using Structured Data

Why go through the effort of migrating your web pages over to a structured data standard? Here are some examples of how structured data is used by one of its primary consumers – the search engine.

Rich Snippets

You may have noticed that in recent years, when you search for a famous person or place, a search engine such as Google shows not only a list of related pages but also a separate box of key facts about that person or place.

As an example, let’s search for the famous 1964 film Mary Poppins on Google.

When you run the search query you’ll notice that, in addition to the usual titles of matching pages, some results carry useful widgets showing key facts from the page (in this case, a rating given to the movie by users of the movie website IMDb).

This key fact, the movie rating, can be extracted from the listed page because it has been embedded in the page as structured data. If you look to the right of the page listings, you will see many other facts about the movie as well (more on this in a second).
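To give a feel for how a machine might read such a rating, here is a minimal Python sketch that pulls a JSON-LD block out of a page and reads the rating from it. Everything in the sample page is hypothetical: the film name and rating figures are invented, and this is not the markup any real movie site uses. The JsonLdExtractor class is our own helper built on Python’s standard html.parser module, and the property names (aggregateRating, ratingValue, bestRating) come from Schema.org.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


# Hypothetical page: the film and its rating figures are invented.
html_page = """
<html>
  <head>
    <title>Example Movie (1964)</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Movie",
      "name": "Example Movie",
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.2,
        "bestRating": 5,
        "ratingCount": 120
      }
    }
    </script>
  </head>
  <body><p>A film page with ordinary text.</p></body>
</html>
"""

extractor = JsonLdExtractor()
extractor.feed(html_page)

movie = extractor.items[0]
rating = movie["aggregateRating"]["ratingValue"]
best = movie["aggregateRating"]["bestRating"]
print(f'{movie["name"]}: {rating} out of {best}')
```

The parser never has to guess which number on the page is the rating, because the markup labels it.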

Note Google is not the only search engine to introduce rich snippets. Other search engines, such as Bing, also show key information via structured data this way.

Global Knowledge Graphs

Search engines such as Google and Bing are also increasingly building a knowledge graph from the structured data they find and, for some searches, you will see a knowledge panel to the right of the page search results.

As an example, let’s do a search for the famous English playwright William Shakespeare.

To the right, you’ll see a key set of facts about William Shakespeare. These facts are taken from the Google Knowledge Graph, which, if you are a developer, you can query yourself using the Google Knowledge Graph API.
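As a rough sketch of what such a query looks like, the Python below builds a request URL for Google’s Knowledge Graph Search API. The function name build_kg_search_url is our own, and YOUR_API_KEY is a placeholder: you would need a real key from Google to send the request. Sending the request and reading the response are left out here.

```python
from urllib.parse import urlencode

# Endpoint of the Knowledge Graph Search API.
KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_search_url(query: str, api_key: str, limit: int = 1) -> str:
    """Return a search URL for the given query text (helper name is our own)."""
    params = {"query": query, "key": api_key, "limit": limit}
    return f"{KG_ENDPOINT}?{urlencode(params)}"


# YOUR_API_KEY is a placeholder, not a working key.
url = build_kg_search_url("William Shakespeare", api_key="YOUR_API_KEY")
print(url)
```

Only the URL is built here, so the sketch runs without a key or a network connection.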

And Google are not the only search engine to build a knowledge graph – Microsoft for example have a knowledge graph called Satori.

One of the sources of data for the Knowledge Graph is structured data from web pages.

Smart Bots And AI

Because structured data is machine readable, the facts it holds can also be used by smart bots such as Cortana or Siri to answer human questions such as ‘what is the weather going to be like today?’.

Once the bot has interpreted your question, it can respond with key facts looked up from structured data – in this case, a basic weather forecast for today where you live.

Note Because structured data can cause search engines or smart bots to refer to information you have on your web site, structured data may become increasingly important to Search Engine Optimization (SEO) strategies.

1.3 Structured Data Standards

So, how do you publish your information so that search engines can understand what things (people, places, articles etc.) a page contains? We use a structured data standard.

We will not look at the detail of any structured data standard in this tutorial, but typically structured data is embedded in existing web pages, within the HTML code for the page.

We will look at how this works in practice in our next tutorial.

Structured data standards are still emerging, but some key terms and phrases come up again and again in this area. The ones you need to know about include:

  • Schema.org, a standard dictionary of ‘things’ defined by key players such as Google.
  • Microdata, an HTML specification for embedding structured data within the markup already on a page, using attributes such as itemscope, itemtype and itemprop.
  • JSON-LD, a key JSON-based syntax used to embed structured data in web pages, typically inside a script element in the head or body of the page.
  • RDFa, a specific flavor of RDF designed for embedding in HTML markup.
  • Knowledge Graph, a huge structured index of things found on the web according to structured data standards. An example would be the emerging Google Knowledge Graph.
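To make the three syntaxes a little more concrete, here is a minimal Python sketch that writes the same ‘thing’ out as JSON-LD, Microdata and RDFa. The person described is invented, and the helper names (to_json_ld, to_microdata, to_rdfa) are our own. The generated markup is deliberately bare, and each standard offers far more than is shown here.

```python
import json
from html import escape

# A hypothetical person, described with Schema.org property names.
person = {"name": "Jane Doe", "jobTitle": "Librarian"}


def to_json_ld(props):
    """JSON-LD: a separate block of JSON inside a script element."""
    data = {"@context": "https://schema.org", "@type": "Person", **props}
    # Escape "</" so the JSON can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


def _spans(attribute, props):
    """One span per property, labelled with the given attribute name."""
    return "\n".join(
        f'  <span {attribute}="{escape(key)}">{escape(value)}</span>'
        for key, value in props.items()
    )


def to_microdata(props):
    """Microdata: itemscope, itemtype and itemprop attributes on HTML."""
    spans = _spans("itemprop", props)
    return f'<div itemscope itemtype="https://schema.org/Person">\n{spans}\n</div>'


def to_rdfa(props):
    """RDFa: vocab, typeof and property attributes on HTML."""
    spans = _spans("property", props)
    return f'<div vocab="https://schema.org/" typeof="Person">\n{spans}\n</div>'


for render in (to_json_ld, to_microdata, to_rdfa):
    print(render(person))
    print()
```

All three outputs describe the same Person using the same Schema.org terms. Only the way the description is attached to the page differs.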

We will learn more about these terms as our tutorials progress.

You have completed this lesson. You should now understand the following:

  • What the term structured data means.
  • How software such as a search engine can use structured data.
  • Some of the key terms and phrases surrounding structured data.

You should now be able to start the next tutorial, Introduction To Schema.org.