In our last tutorial, we learned how structured data can enable software, such as a search engine, to extract key things and facts from a web page, and a little of what that knowledge can be used for. In this lesson, we will learn how to embed structured data in a page using one of the most popular standards on the web: schema.org.
After this tutorial, you should be able to:
- Navigate the schema.org standard to find the appropriate vocabulary for the information on your web site.
- Use the popular microdata syntax to embed schema.org vocabulary directly within your web pages.
Estimated time: 5 minutes
You should have already understood the following lesson (and pre-requisites) before you begin:
- Tutorial 1: Introduction To Structured Data
If this is the first time you have come across structured data, the chances are you are not familiar with schema.org. Schema.org was started in June 2011 by search giants Bing, Google and Yahoo!.
Its main goal is to define a standard, machine-readable vocabulary that describes the key things and facts on a page in a way that search engines can understand.
We begin by looking at a first example that embeds schema.org vocabulary within HTML markup. In doing so, we assume that you have some basic understanding of HTML.
2.1 A First Example
<div itemscope itemtype="http://schema.org/Book">
<link itemprop="bookFormat" href="http://schema.org/EBook"/>
<meta itemprop="publisher" content="Linked Data Tools"/>
<meta itemprop="inLanguage" content="English"/>
<p itemprop="offers" itemscope itemtype="http://schema.org/Offer">Only <b><span itemprop="price" content="2.95">$2.95</span></b>
<meta itemprop="priceCurrency" content="USD" /></p>
<h4>In <span itemprop="name">Semantic Web Primer (First Edition)</span></h4>
</div>
Important: Our first example of embedding structured data in a web page uses a syntax called microdata, which places the structured data directly within the HTML. There are other ways of embedding structured data in a web page; we will look at these later.
Do not concern yourself with the details for now; we will cover them gradually in this tutorial. Notice, though, that this is (simplified) HTML taken from our own website.
As usual, this HTML defines the layout of the information on the page, but it also includes schema.org markup that tells search engines what the page contains.
In this case, the page describes one of our e-books, including its price, title, and the currency in which it is sold.
As an exercise, before we look more closely, study the HTML above and see whether you can work out how it does this.
Note: As we saw in our semantic web and RDF primer, embedding structured data in markup this way is another manifestation of the semantic web. Structured data embeds machine-readable semantics in a page alongside its textual content.
2.2 Microdata Format
The example above uses one of the most popular ways of expressing structured data on the web: microdata syntax. Unlike some other syntaxes for structured data, microdata embeds the structured data within the HTML of the page itself.
This makes it well suited to annotating existing websites: you add attributes to the HTML you already have instead of rewriting your web pages.
The example above shows several microdata concepts in action. Let’s learn each of them by building up the example step by step.
2.3 Starting With The HTML Markup
We begin by first removing all the microdata from our starting example, leaving us only with the bare HTML markup for the page:
<div>
<p>Only <b><span>$2.95</span></b></p>
<h4>In <span>Semantic Web Primer (First Edition)</span></h4>
</div>
This markup describes a simple layout showing the price and title of a book. Viewed as a web page, it would look like this:
Only $2.95
In Semantic Web Primer (First Edition)
2.4 Start By Defining The ‘Thing’ We Are Describing
Here’s the problem for search engines. Whilst a major search engine can guess that ‘$2.95’ might be a price, and can tell from the title ‘Semantic Web Primer’ that the page may have something to do with the semantic web, nothing in the markup above says what the page is actually describing.
For example, it has no reliable way of knowing that the price $2.95 belongs to ‘Semantic Web Primer’. Nor can it infer from the title and price alone that, taken together, they describe a book.
Presented with the simple layout shown above, a human reader could probably guess that it describes a book, but a search engine cannot reliably make that kind of inference. This is the gap that structured data fills.
We are describing a book in the markup above, so let’s tell the search engines that. We do this by looking up the matching term for ‘Book’ in the schema.org vocabulary.
If you go to the schema.org website, you will see that the schema.org vocabulary already has a URI that describes a book: http://schema.org/Book.
Note: Just as we saw in our Introducing Graph Data tutorial on RDF, every type and property we take from schema.org is identified by a globally unique URI. Because URIs are globally unique and machine readable, a search engine can tell unambiguously what each term on a page means. Contrast this with the difficulty of working that out from human language and keywords alone. Given the enormous number of pages search engines must index, microdata offers a very efficient way to index the information on a page quickly.
Now that we have a URI that identifies the type of thing we are describing, we can insert it into the HTML like so:
<div itemscope itemtype="http://schema.org/Book">
<p><span>Only $2.95</span></p>
<p><span>Semantic Web Primer (First Edition)</span></p>
</div>
The itemscope attribute on the opening <div> tag tells search engines (or other microdata readers) that the <div> element defines the scope of an item, that is, a thing. The itemtype attribute then says that the type of the item is http://schema.org/Book.
Now that we have defined the item’s scope in the markup, we can define some of the item’s properties within that scope.
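To make the role of these two attributes concrete, here is a minimal Python sketch of the first step a microdata reader takes: finding every element that opens an item and noting its type. It uses only the standard library’s html.parser module; the names ItemScopeFinder and find_items are our own, and a real reader (such as a search engine’s) does far more. Run against the bare markup from 2.3 it finds nothing; run against the annotated markup it finds one Book.

```python
from html.parser import HTMLParser


class ItemScopeFinder(HTMLParser):
    """Records (tag, itemtype) for every element carrying an itemscope attribute."""

    def __init__(self):
        super().__init__()
        self.items = []  # one entry per itemscope, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes such as itemscope map to None
        if "itemscope" in attrs:
            self.items.append((tag, attrs.get("itemtype")))


def find_items(html):
    finder = ItemScopeFinder()
    finder.feed(html)
    finder.close()
    return finder.items


BARE = """<div>
<p>Only <b><span>$2.95</span></b></p>
<h4>In <span>Semantic Web Primer (First Edition)</span></h4>
</div>"""

ANNOTATED = """<div itemscope itemtype="http://schema.org/Book">
<p><span>Only $2.95</span></p>
<p><span>Semantic Web Primer (First Edition)</span></p>
</div>"""

print(find_items(BARE))       # []
print(find_items(ANNOTATED))  # [('div', 'http://schema.org/Book')]
```

The bare markup gives a machine nothing to hold on to; two attributes are enough to declare that a Book is being described.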
2.5 Adding Item Properties
So far, we have told the search engines that a book is defined within our <div> tag. Within this tag, we can also enrich our definition of the book by adding some properties of the book. Again, using microdata syntax and schema.org vocabulary we will define:
- The format of the book is an e-Book.
- The price is $2.95.
- The currency is the US dollar (USD).
- The publisher is Linked Data Tools.
- The language of the book is English.
- The book’s title is ‘Semantic Web Primer (First Edition)’.
Notice that not all of these properties appear in the visible HTML layout. Some of them will only ever be seen by search engines, never by a human reader.
Important: Using microdata, you can add information to your HTML that is visible only to search engines. We will see an example of this now as we define our book.
The properties of an item are added within the item scope by using the ‘itemprop’ attribute. The name of the property itself (for example ‘bookFormat’, ‘publisher’) is taken directly from the corresponding schema.org vocabulary. In this case, we find the corresponding list of valid properties on the schema.org page http://schema.org/Book – conveniently the exact same URI as our original itemtype URI.
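Because schema.org property names live in the same namespace as its types, a reader can turn each short itemprop name into a full, globally unique URI. The sketch below collects the itemprop names from part of our example and expands them. The helper names collect_names and expand are hypothetical, and expand assumes (as holds for schema.org) that a property’s URI is the vocabulary prefix of the itemtype URI followed by the property name.

```python
from html.parser import HTMLParser


class PropertyNameCollector(HTMLParser):
    """Collects itemprop names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("itemprop"):
            # itemprop may hold several space-separated names
            self.names.extend(attrs["itemprop"].split())


def collect_names(html):
    collector = PropertyNameCollector()
    collector.feed(html)
    collector.close()
    return collector.names


def expand(name, itemtype):
    # Assumption: property and type share a namespace,
    # e.g. http://schema.org/Book -> http://schema.org/publisher
    namespace = itemtype.rsplit("/", 1)[0] + "/"
    return namespace + name


SNIPPET = """<div itemscope itemtype="http://schema.org/Book">
<link itemprop="bookFormat" href="http://schema.org/EBook"/>
<meta itemprop="publisher" content="Linked Data Tools"/>
<meta itemprop="inLanguage" content="English"/>
</div>"""

for name in collect_names(SNIPPET):
    print(name, "->", expand(name, "http://schema.org/Book"))
# bookFormat -> http://schema.org/bookFormat
# publisher -> http://schema.org/publisher
# inLanguage -> http://schema.org/inLanguage
```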
<div itemscope itemtype="http://schema.org/Book">
<link itemprop="bookFormat" href="http://schema.org/EBook"/>
<meta itemprop="publisher" content="Linked Data Tools"/>
<meta itemprop="inLanguage" content="English"/>
<p itemprop="offers" itemscope itemtype="http://schema.org/Offer">Only <b><span itemprop="price" content="2.95">$2.95</span></b>
<meta itemprop="priceCurrency" content="USD" /></p>
<h4>In <span itemprop="name">Semantic Web Primer (First Edition)</span></h4>
</div>
Only $2.95
In Semantic Web Primer (First Edition)
Some of these properties are fairly self-explanatory, and so we will not go through each one separately. However, there are some key points to note.
The title of the book (‘Semantic Web Primer (First Edition)’) appears on the page. To tell search engines that it is also the book’s ‘name’ property, a <span> element has been added around it, and its itemprop attribute tells search engines that the text inside the <span> is the value of the ‘name’ property.
In this way, we have shown the book’s title in the page layout and, at the same time, told search engines that the book’s name is ‘Semantic Web Primer (First Edition)’. Being able to do both at once is one of the things that makes microdata syntax popular.
Remember: One of the great things about microdata syntax is that the visible HTML layout of the page and the structured data can be expressed in the same HTML elements, with no need for a separate block of structured data.
We have also added some properties that are not visible to the user in the page layout but are still visible to search engines as structured data.
2.6 Adding Simple Properties Using Microdata
The ‘publisher’ property, for example, is not shown in the visible page layout. It is a simple literal property (for example, some text or a number). In microdata syntax, such literal properties can be expressed using the standard HTML <meta> element: the ‘itemprop’ attribute, as usual, specifies the property name, and the ‘content’ attribute specifies the literal value. The ‘bookFormat’ property works the same way, except that its value is a URL (http://schema.org/EBook), so it uses a <link> element whose ‘href’ attribute holds the value.
Note: You can use microdata to add properties and information that are not visible in the HTML layout.
Lastly, notice that not all of our item properties are simple literal values. Some of them are items in their own right, with their own properties.
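Which attribute a reader takes a property’s value from depends on the element. The following Python sketch implements a simplified subset of the value rules in the HTML microdata specification: <meta> values come from ‘content’, <link> and <a> values from ‘href’, <img> values from ‘src’, and every other element’s value is its text content. It deliberately ignores nested items, and the names FlatPropertyReader and read_flat are hypothetical.

```python
from html.parser import HTMLParser

# Element -> attribute holding the property value (a subset of the HTML
# microdata rules); any other element's value is its text content.
VALUE_ATTRIBUTE = {"meta": "content", "link": "href", "a": "href", "img": "src"}


class FlatPropertyReader(HTMLParser):
    """Reads itemprop values from a fragment that has no nested items."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._open = []  # stack of [tag, property name or None, text parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if tag in VALUE_ATTRIBUTE:
            # Value comes from an attribute; these elements are not tracked for text
            if name:
                self.values[name] = attrs.get(VALUE_ATTRIBUTE[tag]) or ""
            return
        self._open.append([tag, name, []])

    def handle_data(self, data):
        # Text counts towards every text-valued property still open
        for _, name, parts in self._open:
            if name:
                parts.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, name, parts = self._open.pop()
            if name:
                # Whitespace trimmed for readability; the spec keeps it
                self.values[name] = "".join(parts).strip()


def read_flat(html):
    reader = FlatPropertyReader()
    reader.feed(html)
    reader.close()
    return reader.values


SNIPPET = """<div>
<link itemprop="bookFormat" href="http://schema.org/EBook"/>
<meta itemprop="publisher" content="Linked Data Tools"/>
<meta itemprop="inLanguage" content="English"/>
<h4>In <span itemprop="name">Semantic Web Primer (First Edition)</span></h4>
</div>"""

print(read_flat(SNIPPET))
```

The hidden ‘publisher’ and the visible ‘name’ end up side by side in the same result, which is exactly the point of mixing <meta> elements with annotated visible text.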
2.7 Adding Item (Complex) Properties Using Microdata
Look at the ‘offers’ property of the book. It is a standard property of a creative work in the schema.org vocabulary, and its value carries properties of its own, such as price and currency.
Important: Properties of items in structured data can also be items themselves; these are sometimes referred to as ‘nested items’. For now, just note that not all properties are simple properties.
In microdata, the same syntax we used on the outer <div> block to say that it defines a book (the ‘itemscope’ attribute together with ‘itemtype’) is used to say that the value of the ‘offers’ property is an item of type http://schema.org/Offer. Look at the <p> element carrying itemprop="offers" in the example above and check that you can follow the microdata syntax.
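Putting the pieces together, the sketch below reads the full example and builds nested Python dictionaries, so the ‘offers’ value becomes an item of its own. It is a minimal illustration under simplifying assumptions, not a conforming microdata parser: it only recognises <meta> and <link> as attribute-valued elements, ignores itemref and itemid, and, as an assumption matching how the example’s price <span> is written, honours a ‘content’ attribute on any element (the HTML microdata rules read ‘content’ only from <meta>). The names NestedItemReader and read_microdata are hypothetical.

```python
import json
from html.parser import HTMLParser

# Attribute-valued elements this sketch understands (a small subset of the spec)
VALUE_ATTRIBUTE = {"meta": "content", "link": "href"}


class NestedItemReader(HTMLParser):
    """Builds nested dicts from microdata; a sketch, not a conforming parser."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self._stack = []  # per open element: [tag, item opened here, text property, text parts]

    def _current_item(self):
        # The innermost item whose element is still open owns new properties
        for _, item, _, _ in reversed(self._stack):
            if item is not None:
                return item
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._current_item()
        prop = attrs.get("itemprop") if parent is not None else None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop:
                parent["properties"][prop] = item  # a nested item is the value
            else:
                self.items.append(item)
            self._stack.append([tag, item, None, []])
            return
        if tag in VALUE_ATTRIBUTE:  # void elements: value from an attribute
            if prop:
                parent["properties"][prop] = attrs.get(VALUE_ATTRIBUTE[tag]) or ""
            return
        if prop and "content" in attrs:
            # Assumption: honour content on any element, as the price <span> expects
            parent["properties"][prop] = attrs["content"] or ""
            prop = None
        self._stack.append([tag, None, prop, []])

    def handle_data(self, data):
        for _, _, prop, parts in self._stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        if not self._stack or self._stack[-1][0] != tag:
            return  # void element, or markup this sketch does not track
        _, _, prop, parts = self._stack.pop()
        if prop:
            self._current_item()["properties"][prop] = "".join(parts).strip()


def read_microdata(html):
    reader = NestedItemReader()
    reader.feed(html)
    reader.close()
    return reader.items


EXAMPLE = """<div itemscope itemtype="http://schema.org/Book">
<link itemprop="bookFormat" href="http://schema.org/EBook"/>
<meta itemprop="publisher" content="Linked Data Tools"/>
<meta itemprop="inLanguage" content="English"/>
<p itemprop="offers" itemscope itemtype="http://schema.org/Offer">Only <b><span itemprop="price" content="2.95">$2.95</span></b>
<meta itemprop="priceCurrency" content="USD" /></p>
<h4>In <span itemprop="name">Semantic Web Primer (First Edition)</span></h4>
</div>"""

print(json.dumps(read_microdata(EXAMPLE), indent=2))
```

For the tutorial’s example, this yields a single Book whose ‘offers’ value is itself an Offer dictionary holding price "2.95" and priceCurrency "USD", mirroring the nesting in the markup.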
As a final exercise, go to the schema.org page http://schema.org/Book and see if you can add some new simple and item (complex) properties to the markup above. Be creative and make up your own values for the e-book’s new properties.
2.8 Validating Microdata Syntax
So you’ve defined your book using microdata and are excited to publish it on the web for search engines to index.
How can you be sure that your microdata syntax is valid? Luckily, free tools are available, some from major search engines such as Google, that validate structured data and show you how search engines see your microdata.
One popular validator was Google’s Structured Data Testing Tool. Google has since retired it in favour of its Rich Results Test, and its general-purpose checks now live on as the Schema Markup Validator (validator.schema.org). Once you have completed the final exercise in 2.7, try giving one of them a go.
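Real validators check far more than syntax, but one of their simplest checks is easy to sketch: is each itemprop a property that schema.org defines for the item it sits in? In the Python toy below, KNOWN_PROPERTIES is a hypothetical, hand-picked subset of schema.org’s Book and Offer properties; a real tool would load the full vocabulary. It catches a typo such as ‘pubisher’, which a browser would silently accept.

```python
from html.parser import HTMLParser

# Hypothetical hand-picked subset of schema.org properties per type;
# a real validator would load the full vocabulary.
KNOWN_PROPERTIES = {
    "http://schema.org/Book": {"name", "bookFormat", "publisher", "inLanguage",
                               "offers", "author", "isbn", "numberOfPages"},
    "http://schema.org/Offer": {"price", "priceCurrency", "availability"},
}
VOID = {"meta", "link", "img", "br", "hr", "input"}  # common void elements


class PropertyChecker(HTMLParser):
    """Reports itemprop names not listed for their enclosing item's type."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self._stack = []  # (tag, itemtype or None) for each open element

    def _current_type(self):
        for _, itemtype in reversed(self._stack):
            if itemtype is not None:
                return itemtype
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._current_type()  # item that this element's itemprop belongs to
        for name in (attrs.get("itemprop") or "").split():
            if owner is None:
                self.problems.append(f"'{name}' is outside any itemscope")
            elif name not in KNOWN_PROPERTIES.get(owner, set()):
                self.problems.append(f"'{name}' is not a known property of {owner}")
        if tag not in VOID:
            itemtype = (attrs.get("itemtype") or "") if "itemscope" in attrs else None
            self._stack.append((tag, itemtype))

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            self._stack.pop()


def check(html):
    checker = PropertyChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


GOOD = """<div itemscope itemtype="http://schema.org/Book">
<meta itemprop="publisher" content="Linked Data Tools"/>
<p itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="priceCurrency" content="USD" /></p>
</div>"""
TYPO = GOOD.replace('"publisher"', '"pubisher"')

print(check(GOOD))  # []
print(check(TYPO))  # ["'pubisher' is not a known property of http://schema.org/Book"]
```

Note that ‘offers’ is checked against Book while ‘priceCurrency’ is checked against Offer: the property’s owner is always the innermost enclosing item, which is why nesting matters to a validator.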
You have completed this lesson. You should now understand the following:
- How to search schema.org for machine readable terms that match what you are describing.
- How to use schema.org item pages to find the appropriate property names for your item.
- How to use microdata syntax to embed schema.org vocabulary within your HTML pages.
- Some of the key terms and phrases surrounding schema.org and microdata.
- What tools are available to validate microdata syntax.
You should now be able to start the following tutorial:
- Tutorial 3: Introduction To JSON-LD