Should Investors Be Skeptical of Alternative Data Set-Based Investing?

by Plotinus

As society becomes increasingly data-oriented, it may be time for investors to take a breath and look at the complications that surround alternative data. Sophisticated investors should want to verify whether investing based on such data is the fount of investment edge that some believe it to be.

This subject, and the challenge it poses to the prospective investor, brings to mind a statement ascribed to IBM in 2012 and repeated time and again by promoters of Big Data: that 90% of all data had been created in the previous two years.1 Such statements are meant to evoke the awesome growth of new data and to persuade you, as an investor, to buy into profiting from this exponentially growing pool of information.

Such statements, of course, require two important assumptions:

a) the statement was correct when made; and

b) the statement continues to be true.

It can be illuminating for the interested but skeptical investor to consider what an illustration of this exponential growth shows. For this, let's start the clock in 2010, the beginning of the two-year window covered by the cited 2012 statement, and assume that at every point thereafter 90% of all data was created in the previous two years.

At first blush, this is an awesome accumulation of data.

Upon further consideration, however, it should occur to the sophisticated and skeptical investor that this is an awesome accumulation of data about which we remain unsure on four key counts:

1. its quality

2. its longevity

3. its accessibility; and

4. its relevance

All four factors are extremely difficult to gauge rigorously in the absence of historical verification. As a result, only an awesomely small amount of this accumulated data will meet the threshold of a verifiable minimum historical record of 3, 5, 10 or 20 years, precisely the kind of track record that sophisticated investors typically look for and take into consideration.
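The arithmetic behind these thresholds is simple enough to sketch. The example below is a minimal illustration, assuming that the "90% of all data was created in the last two years" claim holds at every point in time, so that the total stock of data grows tenfold every two years and no data is ever lost. Under that assumption, the share of today's data that already existed N years ago is 10^(-N/2). The helper name `share_with_history` is hypothetical.

```python
# Minimal sketch (assumption): the "90% of all data was created in the last two
# years" claim holds at every point in time, so the total stock of data grows
# tenfold every two years. share_with_history is a hypothetical helper.

GROWTH_PER_TWO_YEARS = 10.0  # 90% new in two years => stock(t) = 10 * stock(t - 2)


def share_with_history(years: float,
                       growth_per_two_years: float = GROWTH_PER_TWO_YEARS) -> float:
    """Fraction of today's data stock that already existed `years` years ago,
    assuming no data is ever lost (100% continuity)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    # stock(t - years) / stock(t) = growth_per_two_years ** (-years / 2)
    return growth_per_two_years ** (-years / 2)


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n:>2} years of history: {share_with_history(n):.8%}")
```

Under this assumption, a 3-year history covers roughly 3.2% of today's data, a 5-year history about 0.32%, a 10-year history about 0.001%, and a 20-year history about one ten-billionth of it.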

The percentage of available data that has a history is shown in the following bar chart:

Although some novel alternative data sources may offer an advantage, the edge they provide will be equally novel: most likely short-term in nature and of benefit on a first-come, first-served basis, before it evaporates.

Investors who look to allocate to portfolio managers deploying a successful, repeatable investment process that relies on such ‘new’, accumulated alternative data will find themselves facing a very difficult task.

Alternative sources of data may hold the keys to unlocking long-term alpha, but an investor who demands proof of this must be prepared to wait in a cloud of uncertainty for 3, 5, 10 or 20 years while the chosen alternative data sets mature and prove their importance and relevance to a particular investment process.

Whatever optimism the line graph illustrated above may inspire can quickly turn into disenchantment once one considers that the figures shown in the bar chart make no attempt to account for attrition. In other words, the bar chart assumes 100% data continuity from year to year, so the 3% of data with a minimum of 3 years of history is actually the overly optimistic, ideal scenario.
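Attrition can be folded into the same back-of-the-envelope arithmetic. The sketch below is illustrative only and rests on two assumptions that are not figures from this article: the total stock of data grows tenfold every two years, and each year a fixed, hypothetical fraction of surviving data series is discontinued or loses continuity, independently of its age. The continuous-history share is then the no-attrition share multiplied by `annual_survival ** years`. The function name and the survival rates used are hypothetical.

```python
# Minimal sketch with two illustrative assumptions (not figures from the article):
#   1. the total stock of data grows tenfold every two years (the "90% of all
#      data was created in the last two years" claim, held constant);
#   2. each year a fixed, hypothetical fraction of surviving data series is
#      discontinued or loses continuity, independently of its age.


def share_with_continuous_history(years: float, annual_survival: float,
                                  growth_per_two_years: float = 10.0) -> float:
    """Fraction of today's data stock with an unbroken history of `years` years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    if not 0.0 <= annual_survival <= 1.0:
        raise ValueError("annual_survival must lie in [0, 1]")
    # Share that existed `years` ago, assuming 100% continuity ...
    no_attrition = growth_per_two_years ** (-years / 2)
    # ... scaled by the probability of surviving every intervening year.
    return no_attrition * annual_survival ** years


if __name__ == "__main__":
    for survival in (1.0, 0.9, 0.8):
        share = share_with_continuous_history(3, survival)
        print(f"annual survival {survival:.0%}: {share:.2%} with 3 years of history")
```

With `annual_survival=1.0` the function reproduces the ideal scenario of roughly 3.2% for three years; a hypothetical 90% annual survival rate cuts that to roughly 2.3%, and the gap widens quickly for longer histories.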

In the days before Big Data, curating and maintaining a database, to ensure that data was continuous or that discontinuities were flagged, was a labor-intensive exercise in quality control. In today's Big Data world, much more of that quality control necessarily has to be left to automation. Unfortunately, automated quality control is difficult to perform on alternative datasets, so the share of data sources that are properly curated has shrunk significantly.
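As a simple illustration of the kind of check that has to be automated, the sketch below flags discontinuities in a feed that is assumed to deliver one observation per day. The function `flag_gaps` and the sample feed are hypothetical. Even this basic rule hints at why automation is hard on alternative data: a real feed might legitimately skip weekends or holidays, change its schema or coverage, or backfill history, and each of those cases would need its own rule.

```python
from datetime import date, timedelta


def flag_gaps(observation_dates, expected_step=timedelta(days=1)):
    """Return (previous, next) date pairs where consecutive observations are
    further apart than `expected_step`, i.e. candidate discontinuities."""
    # De-duplicate and sort so that the comparison runs in time order.
    ordered = sorted(set(observation_dates))
    return [
        (prev, nxt)
        for prev, nxt in zip(ordered, ordered[1:])
        if nxt - prev > expected_step
    ]


if __name__ == "__main__":
    # Hypothetical daily feed with 3 and 4 January 2021 missing.
    feed = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5), date(2021, 1, 6)]
    for start, end in flag_gaps(feed):
        print(f"discontinuity between {start} and {end}")
```

For the sample feed, the only flagged pair is 2 January and 5 January 2021; a weekday-only feed run through the same check would be flagged every weekend, which is exactly the kind of false alarm a human curator used to filter out.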

Investors are, of course, more familiar with data from public markets, and such data could not be more different from newly minted alternative data. Data from public markets is battle-hardened, shaped by the demands and rigors of market participants and regulatory oversight. What it lacks in novelty it more than makes up for in reliability and utility. This is why, for the skeptical investor, public-market data is the most likely benchmark against which alternative data will be measured to determine whether it has a significant role to play in a long-term investment process.

That being the case, the challenge for technologists who wish to offer investors new ways of generating alpha is to craft investment processes that best harness the power of technology. This will most likely require the ability to acquire and analyze some of the vast volumes of new data now appearing, but in a way that lets the money management firm use that data in conjunction with traditional, more reliable data sources.

1 Bringing Big Data to the Enterprise, IBM 2012.

© 2021 Plotinus Asset Management. All rights reserved.
Unauthorized use and/or duplication of any material on this site without written permission is prohibited.

Image Credit: Pitinan at Can Stock Photo.