What we learned from indexing 500M SKUs

Introduction

Pricesearcher was invited to speak at Brighton SEO to present our recent research into product feeds. As part of that work we produced a whitepaper that can be downloaded here.

This blog post covers the same research in text form, if you would rather read it here than download the PDF.

What research have we done?

The findings of this research are based on 4 years of experience working with 500m+ individual SKUs and thousands of individual product feeds across 10 countries. To summarise our findings we’ve taken representative samples for each of the 10 countries where Pricesearcher has active sub-domains (UK / USA / Ireland / Denmark / Finland / France / Germany / Italy / Norway / Sweden). Each sample contains at least 10m SKUs. Any international findings are presented separately. The main findings are UK centric and are based on a sample of over 10m SKUs from thousands of retailers.

EXECUTIVE SUMMARY
The research conducted has been distilled into the following top 10 data insights that we believe are the most useful to the ecommerce professional.

At the Product Level
1. Average Product Title string length (characters + blanks) is 48 which is approximately 8 words.
2. The relevant brand term is included in 53% of all product titles.
3. Average product description has a string length (characters + blanks) of 552 which is approximately 90 words.
4. Shipping costs are included in 44.9% of products as an additional field in the feed.
5. Regarding the use of bar codes – 66.9% of all products have a GTIN barcode number in the product feed as a unique identifier.
6. Category level information is not provided in 7.9% of products.
7. Additional Product information (Size, colour, dimensions, etc) is not provided in 40.2% of products.

At the Product Feed Level
8. Methods of sharing a product feed are 61% URL and 21% via API Plug-in (Magento, Shopify, WooCommerce) and 18% FTP (Upload or download).
9. Product Feed File Formats are 51% TXT/CSV and 49% XML. Google Shopping feeds are typically XML based, whereas non-Google Shopping feeds are typically TXT / CSV based.

Product Price Changing Frequency – International Comparison
10. The UK is the most dynamic environment, with the price of every product in our data set changing every 6 days on average. France and Germany are not far behind, with prices changing on average every 8 days. However, the USA has less than half the level of price movements of the UK, with a change every 14 days.

Product Feed Detailed Analysis

Diving into the detail, we go through the different variables involved in product feeds and draw out the data insights that we are able to share. Taking each topic in turn:

Product Title
The table below shows the distribution of title string length (characters + blanks) within
Pricesearcher for the UK territory.

Product Title Plot

Data Insight #1. Pricesearcher data shows an average product title string length
(characters + blanks) of 47.5 for UK products. This is typically 8 words.

However, string length should not be considered as having an absolute bearing on the subsequent performance of the product feed. What matters more is including the correct keywords in an accurate sentence. A very short title may unnecessarily limit your ability to include the full range of keywords that consumers may use to search for your products.

For reference regarding Product Titles:
Google Shopping Feed for Product Title – Max 150 characters.
Bing Ads Feed for Product Title – Max 150 characters.
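
As a rough illustration of how figures like the average title length and word count can be measured on your own feed, here is a minimal sketch. The function name and the sample titles are made up for this example and are not taken from our data.

```python
from statistics import mean


def title_stats(titles):
    """Return (mean string length, mean word count) for a list of titles."""
    lengths = [len(title) for title in titles]              # characters + blanks
    word_counts = [len(title.split()) for title in titles]  # whitespace-separated words
    return mean(lengths), mean(word_counts)


# Hypothetical titles, for illustration only.
sample_titles = [
    "Acme Trail Running Shoes Mens Blue Size 9",
    "Acme Waterproof Jacket Womens Black Medium",
]

avg_length, avg_words = title_stats(sample_titles)
print(f"average length: {avg_length:.1f} characters, average words: {avg_words:.1f}")
```

Running the same function over the title column of a real feed gives numbers that can be compared with the averages quoted above.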

Data Insight #2. Our research shows that the relevant brand term is only included in 52.9% of all product titles.
Product titles should include the keywords that you wish to rank for, including brand, model and other key attributes that consumers may use to discover your products. Having the correct keywords within the title is an obvious consideration, yet our analysis shows that the brand is mentioned in only just over half of all our UK SKUs.
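
A simple way to check your own feed for this is sketched below. It assumes each product is a dictionary with `title` and `brand` fields (hypothetical field names) and uses a plain case-insensitive substring match, which is a simplification and not necessarily how the figure above was produced.

```python
def title_contains_brand(title, brand):
    """Case-insensitive check that the brand term appears in the title."""
    return brand.strip().lower() in title.lower()


def brand_coverage(products):
    """Share of products (0.0 to 1.0) whose title contains their brand.

    Products with an empty or missing brand count as not covered.
    """
    if not products:
        return 0.0
    covered = sum(
        1
        for product in products
        if product.get("brand", "").strip()
        and title_contains_brand(product.get("title", ""), product["brand"])
    )
    return covered / len(products)


# Hypothetical products, for illustration only.
sample_products = [
    {"title": "Acme Trail Running Shoes", "brand": "Acme"},
    {"title": "Trail Running Shoes Mens Blue", "brand": "Acme"},
]
print(f"brand in title: {brand_coverage(sample_products):.0%}")
```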

Product Description
For text based searches this is another very important element of your product feed which will impact its performance.

Data Insight #3. The average product description has a string length of 552 which is
approximately 87 – 93 words.

Product Description Plot

The long tail of the distribution starts at a string length of roughly 1,000, above which only a small proportion of feeds tend to go. A very small number (<5) of feeds were excluded as outliers because their product descriptions had a string length significantly above 2,000.
Google Shopping Feed for Product Description – Max 5,000 character limit
Bing Ads Feed for Product Description – Max 10,000 character limit.
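
The limits above can be checked automatically before a feed is submitted. The sketch below uses the two character limits quoted in this post; the dictionary and function names are our own for the example.

```python
# Character limits as quoted above.
DESCRIPTION_LIMITS = {
    "Google Shopping": 5000,
    "Bing Ads": 10000,
}


def channels_exceeded(description, limits=DESCRIPTION_LIMITS):
    """Return the channels whose description limit this text exceeds."""
    length = len(description)  # characters + blanks
    return [channel for channel, limit in limits.items() if length > limit]


# A 6,000-character description is too long for Google Shopping only.
print(channels_exceeded("x" * 6000))
```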

The table below shows the distribution of product description string length (characters + blanks) within Pricesearcher for the UK.

Product Picture
Image based searches are increasing and are now being offered by ecommerce companies in a bid to help shoppers find similar products to those seen in magazines, articles, other offline / online locations where a photo file exists. A lot of the innovation in this space has been pushed forward by the investment in driverless car technology by Google and Uber as it relates to how computers are able to identify and classify images e.g. other cars, traffic lights, pedestrians, cyclists, animals, etc.

It's vital that your product feed has imagery that is visually appealing and also gives the consumer reassurance that they are ordering the right product. Consumer expectations are different for different categories but certainly lifestyle images (with a model shot against a white background) are the preferred standard for fashion and other similar lifestyle goods. It is a requirement for products listed in Pricesearcher to have an image as part of the feed and so there is no distribution data to be displayed as they exist in 100% of cases (unless there is a fault with the feed).

Product Price (+ Shipping)
The price of a product or service may not be straightforward, as it can be made up of different component parts including sales taxes / VAT, shipping, bulk order discounts, promotional pricing and other similar factors.

At Pricesearcher we display one price, which is the current selling price, but it's exclusive of shipping costs. All taxes must already be included and any promotional pricing must be accurate. Shipping is then an additional cost that the consumer may have to pay.

Data Insight #4. From our sample, 44.9% of products have shipping cost as an additional field in the product feed, which may give the consumer a fuller picture of the total cost earlier in the journey.
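
To show why the extra field matters, here is a minimal sketch of working out a total cost only when shipping is present. The `price` and `shipping` field names are assumptions for the example; real feeds name and format these fields in different ways.

```python
from decimal import Decimal


def total_cost(product):
    """Return price + shipping as a Decimal, or None if shipping is not in the feed."""
    shipping = product.get("shipping")
    if shipping is None or str(shipping).strip() == "":
        return None  # total cost is unknown without a shipping field
    return Decimal(str(product["price"])) + Decimal(str(shipping))


# Hypothetical products, for illustration only.
print(total_cost({"price": "19.99", "shipping": "3.50"}))
print(total_cost({"price": "19.99"}))
```

Decimal is used rather than float so that prices add up exactly to the penny.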

Product URL
The link that will take customers through to the product landing page is a requirement of any product feed and so there is no distribution data to be shown here.

Bar Codes – GS1
Unique product identifiers, or bar codes, are an important part of a product feed.
GS1 is the not-for-profit organisation that issues bar code numbers, known as Global Trade Item Numbers (GTINs), across 150 countries, with GS1 UK responsible for issuing bar codes here in the UK.

Data Insight #5. From our data we see that GTINs are present in 66.9% of products in our sample.

The guidance from Google is that “A GTIN uniquely identifies your product. This specific number helps us to make your ad richer and easier for users to find. Products submitted without any unique product identifiers are difficult to classify and may not be able to take advantage of all Google Shopping features”. Google Shopping feeds, where GTINs are typically included, are a very common (55%) feed format for Pricesearcher. Some additional bar code acronyms have been included in the glossary of terms for reference.
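
A GTIN's last digit is a check digit, so a feed can be screened for mistyped bar codes before it is submitted. The sketch below follows the standard GS1 check digit calculation (weights of 3 and 1 alternating from the right) for 8, 12, 13 and 14 digit codes. The function name is our own, and it only tests that a number is well formed, not that it has actually been issued to a product.

```python
def is_valid_gtin(code):
    """Check the length and GS1 check digit of a GTIN-8/12/13/14 string."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(char) for char in code]
    body, check_digit = digits[:-1], digits[-1]
    # Working from the digit next to the check digit, weights alternate 3, 1, 3, ...
    total = sum(
        digit * (3 if position % 2 == 0 else 1)
        for position, digit in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10 == check_digit


print(is_valid_gtin("4006381333931"))  # well-formed 13 digit code
print(is_valid_gtin("4006381333932"))  # same code with a wrong last digit
```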

Product Category
Product categorisation has not been standardised in ecommerce, and different platforms like Amazon, Google, Alibaba, etc. have adopted different ways of categorising products. Amending product feeds so they’re set up and optimised for different sales channels and different marketplaces is a key part of what many third party integration companies offer.
Pricesearcher does not require any particular product feed taxonomy, so retailers and brands are free to submit any feed they have in any format and our technology is able to process it and standardise the data. This means that we receive thousands of different feeds in different formats and structures and are able to deliver aggregate, anonymised data insights.

Data Insight #6. Category level information is not provided in 7.9% of products.

Additional Product Information
Across the different categories of ecommerce, from apparel, electronics, home, garden, toys, gaming and many others, there is a lot of additional information that is of importance to the consumer as they go through the purchase consideration process.
For example, in footwear, size (UK, EUR, US) and colour are attributes that may be included in a more detailed product feed so that consumers can filter by attribute earlier in the journey.

Data Insight #7. Additional Product information is not provided in 40.2% of products.
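
The same kind of completeness check, for category or for any additional attribute, can be run against your own feed. This is a minimal sketch assuming each product is a dictionary; the field names `category` and `colour` are hypothetical.

```python
def missing_share(products, field):
    """Share of products (0.0 to 1.0) where the given field is absent or blank."""
    if not products:
        return 0.0
    missing = sum(
        1 for product in products if not str(product.get(field) or "").strip()
    )
    return missing / len(products)


# Hypothetical products, for illustration only.
sample_products = [
    {"title": "Acme Trainers", "category": "Footwear", "colour": "Blue"},
    {"title": "Acme Jacket", "category": "Apparel", "colour": ""},
    {"title": "Acme Tent", "category": "", "colour": None},
    {"title": "Acme Lamp", "category": "Home"},
]
for field in ("category", "colour"):
    print(f"{field} missing in {missing_share(sample_products, field):.0%} of products")
```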

Methods of distributing your product feed
There are several different methods by which a product feed can be shared. We are able to accept any feed in any format and so have a mixture of URL, FTP and Plug-in APIs used in our sample.

Methods of Feed Distribution Plot

Data Insight #8. Sharing the feed on a URL is the most common method (61% of products), followed by plug-ins (21% in total: Shopify 12%, Magento 1 – 5%, WooCommerce 4%) and then FTP (18%).

Sharing a feed on a URL has advantages in that it can be easily shared and updated on a regular basis, although this will not be pulled through to your end locations on a real-time basis. At Pricesearcher we re-crawl the URL 3 times a day as a default and for some very large feeds as often as every 15 minutes.

File Transfer Protocol (FTP) requires username and password credentials to be shared in order to access the information, so there is more control. Changes to the product feed will not be picked up in real time, as a ‘fetch’ of the data typically takes place 3-4 times a day.

Plug-ins – APIs (Application Programming Interfaces) update Pricesearcher in real time, as soon as products are updated by the retailer on their own site. Once installed they run in the background. They are the most convenient method and take minutes to set up.

The File Format – TXT / CSV, XML
Our research has shown the following distributions of feeds submitted to Pricesearcher by file format. We have merged TXT and CSV due to their closeness of format.
The ‘Google’ feeds are in the Google Shopping format. This is a very commonly used format for Pricesearcher because it’s a feed that most retailers already have and it is typically in good condition in terms of completeness and accuracy.

Content Management Systems (CMS) typically output product feeds in TXT format, hence the high proportion of TXT among Custom (i.e. non-Google Shopping) feeds. However, neither XML nor TXT/CSV has an inherent advantage over the other. It merely comes down to the system being used and the exports that are generated.

Feed File Format Distribution Plot

Data Insight #9. Google Shopping feeds are typically XML based, whereas non-Google Shopping feeds are typically TXT / CSV based. Overall we see a split of 51% TXT/CSV to 49% XML.
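
To illustrate why the file format makes little practical difference, the sketch below reads the same product from a CSV feed and from an XML feed into an identical structure using only the Python standard library. The XML layout here (a `<products>` element containing `<product>` entries) is a simplified, made-up example and is not the Google Shopping schema, and this is not a description of how Pricesearcher processes feeds.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_feed(text):
    """Parse CSV or simple XML feed text into a list of dicts, one per product."""
    if text.lstrip().startswith("<"):
        # Assumed XML layout: <products><product><title>..</title>..</product></products>
        root = ET.fromstring(text)
        return [
            {field.tag: (field.text or "").strip() for field in product}
            for product in root.iter("product")
        ]
    # Otherwise treat the text as CSV with a header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical feeds describing the same product, for illustration only.
csv_feed = "title,price\nAcme Shoes,19.99\n"
xml_feed = (
    "<products><product>"
    "<title>Acme Shoes</title><price>19.99</price>"
    "</product></products>"
)
print(parse_feed(csv_feed) == parse_feed(xml_feed))
```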

How often do prices change – International Comparison
Pricesearcher operates in 10 countries and so we are able to offer some insight into different behaviours regarding price movements. Of particular interest is how frequently prices change in the following countries.

Price Update Frequency Plot

Data Insight #10. The UK is the most dynamic environment, with the price of every product in our data set changing every 6 days on average. France and Germany are not far behind, with prices changing on average every 8 days. However, the USA has less than half the level of price movements of the UK, with a change every 14 days.
As mentioned at the start of this blog, the samples used to determine these findings are based on 10m+ SKUs per country as a minimum. Evidence suggests that the larger retailers with feeds of more than 10k products are more dynamic in their pricing and use re-pricing software to maintain competitive positioning, whereas smaller operators / brands with a relatively stable product range are more established in their pricing and do not change it as much.
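
To put those intervals on a common footing, the short calculation below converts "days between price changes" into an approximate number of changes per year and compares each country with the UK. It uses only the four averages quoted above. The 365-day year and the function name are our own choices for the example.

```python
# Average days between price changes, as quoted in Data Insight #10.
DAYS_BETWEEN_CHANGES = {"UK": 6, "France": 8, "Germany": 8, "USA": 14}


def changes_per_year(days_between_changes):
    """Approximate number of price changes in a 365-day year."""
    return 365 / days_between_changes


for country, days in DAYS_BETWEEN_CHANGES.items():
    relative_to_uk = DAYS_BETWEEN_CHANGES["UK"] / days
    print(
        f"{country}: about {changes_per_year(days):.0f} changes per year, "
        f"{relative_to_uk:.0%} of the UK rate"
    )
```

On these figures the USA works out at roughly 43% of the UK rate, i.e. a little under half.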

Conclusions
Product feeds are part of the fabric of ecommerce and a vital part of your
communication with online shoppers. The more compelling, accurate and complete
the information put into them the better performance you should get out of them.

Our survey was based on over 10m product SKUs from the different countries in
which we operate including the UK, USA, Germany and France. The 10 data insights
given should be used as guidelines and benchmarks to help you better understand
the wider ecommerce environment and hopefully arm you with more information
for your role in ecommerce.

Thanks for taking the time to read this blog. We’d love to hear your feedback
if you have any suggestions for us.

If you would like your products to appear on Pricesearcher and receive free referral traffic, then simply upload your feed here.

About Ben Morgan

Head of Commercial for Pricesearcher
