Have AI Data Wars Begun?
Did Elon Musk fire the first salvo of an AI data usage war? We are referring to his April 19, 2023, post “They trained illegally using Twitter data. Lawsuit time.” in relation to Microsoft’s use of Twitter data to train its AI model.
Whether or not this provocation proves to be an empty threat, it is the kind of comment that should put investors on the alert, whether they are already allocated to one or more AI-based investment strategies or are in the midst of due diligence to determine which prospective AI-based strategies to add to their investments.
Three Potential Data Vulnerabilities For AI-based Portfolio Managers
For the money manager deploying AI to run all or part of an investment strategy, data vulnerability has three major arenas where it can manifest: ownership, enduring quality/relevance, and regulation.
1. OWNERSHIP
Mr Musk’s tweet illustrates how a portfolio manager’s investment methodology can be exposed to the first type of data vulnerability: ownership. Imagine the repercussions of such litigation. An AI investment model trained on data that is later determined to have been illegally obtained cannot unlearn what it has learned from that data, so how would that be dealt with? What if an AI investment model trained on “illegal” data depends on the continued availability of that data to function properly, and the data suddenly can no longer be used? What happens then? We are barely scratching the surface here in looking at the potential legal complications.
It is a mire. In fact, all three arenas of data vulnerability are mires, the kind that could bring a technically perfectly functioning AI-based trading system to a halt in an instant.
2. ENDURING QUALITY/RELEVANCE
Digital storage and processing capability have grown steadily and are now far less of an impediment to data analysis. Naturally, the volumes of stored data have expanded exponentially over time. This is what is termed Big Data. The intrigue of Big Data has a lot to do with the newness of these expanding data sets and their potential to unlock insights that would previously have been inaccessible. The bulk of current AI-based investment methodologies rely on voracious consumption of Big Data.
The data vulnerability risk with Big Data is that, because it is so new, very little of this data has a provable history. In fact, only 3% of all current data has a three-year provable history and 0.3% a five-year history (something we discussed in a previous commentary). In a portfolio strategy construction context, with such a small fraction of data being historically verifiable, the kind of rigorous analysis and testing that an educated, sophisticated investor would expect of a more traditionally constructed investment strategy cannot be performed.
This recognition is important from two different perspectives: quality and relevance. Quality is fairly obvious; the AI analytics sector is dependent on high quality data. This automatically requires that this data can be validated. New data may appear to be of high quality, but this simply cannot be known, given the short time horizon for which it has been in existence. In other words, its use in an analytic portfolio management system brings with it the introduction of an additional unknown in terms of quality.
Relevance as a data vulnerability risk is a little more complex. The actual impact of the novelty of new data sets, sometimes referred to as alternative data sets, is extremely hard to quantify. If new data sets bestow an advantage (in an investment context), is that because the data is actually very valuable, or is it because the initial users were the first to discover it and therefore have the advantage of first exploitation? If this data then becomes commonplace, does it lose all, some or none of its advantage? From a longer-term standpoint, it is very difficult to determine whether such data will remain relevant when there is no precedent on which to base an assessment. If it cannot remain relevant, then any temporary “investment edge” it generates could dissipate and then disappear.
3. REGULATION
Regulation is already governing data use; take the EU’s vanguard data protection law, the General Data Protection Regulation (GDPR), or, in the pipeline, the American Data Privacy and Protection Act. Given the sudden awareness of all things AI and its potential future impact on societies globally, AI is very much in regulators’ sightlines.
This month, the Biden Administration said it was exploring regulation with regard to requiring accountability in the use of AI. The Spanish government recently pledged to establish AI regulation during its forthcoming presidency of the EU. Then from the corporate space, the Business Software Alliance (BSA), a business advocacy group consisting of some (but not all) of Silicon Valley’s heavyweights, suggested that Congress should pass legislation on the use of AI.
Much of this regulation will focus on the privacy of data and the use of personal data for exploitative private gain. Prospective investors in AI-based investment strategies need to recognize that this is the very kind of focus that could eliminate parts of, or entire, datasets that are currently being used by AI systems running some of today’s AI-based investment strategies. From the money manager’s point of view, it would be good practice to understand what the data used in running a particular strategy looks like and what might or might not be possible to use in the not-so-distant future. Again, this is not the kind of thing prospective investors would typically consider when analyzing an AI-based strategy from an investment standpoint: a well-functioning trading strategy today could be wiped out tomorrow by regulatory restrictions on data use.
Vetting Managers with Potential Data Vulnerabilities
Diligent, sophisticated investors who appreciate the potential benefits of an AI-based investment strategy allocation within their portfolio will want to look for portfolio managers who, without prompting, demonstrate maturity of thought regarding AI model construction. Where such managers employ potentially vulnerable data, they should be able to explain how they aim to disentangle those elements from their investment process and replace them while still having a viable strategy and without exposing investors to strategy drift.
AI-Based Investment Methodology Built to Avoid These Data Vulnerabilities
Anyone familiar with our views at Plotinus will know that the issue of data vulnerability is something we are very sensitive to. Awareness of the risks inherent in constructing investment models whose building blocks expose the portfolio manager and investor to the potential data vulnerabilities of ownership, enduring quality/relevance, and regulation helped shape Plotinus’s design and use of AI in its trade decision-making approach.
Our firm’s aim is to craft AI-based investment strategies that not only produce edge for investors but do so in a manner that will ensure long-term deployment and usability. In that regard, understanding the use of data is key, and that means understanding how secure that data is and how dependable access to it will remain.
Such “thinking ahead,” if you will, played a role in our risk management protocol considerations and was among the factors that led us to focus our investment process development on the use of derived data. Our derived data approach to employing AI in our investment strategy means that we retain control over the data that our AI system needs in order to function now and, uninterrupted, into the future.
Be in no doubt: in the sphere of money management, AI is a transformative tool when thoughtfully and correctly applied, and investment strategies employing AI clearly have a place in the sophisticated investor’s total portfolio. Now, as investors are beginning to undertake due diligence analysis of prospective AI-based portfolio managers and their investment strategies, is the right moment to look to those that deal capably with the data vulnerability issue. Paying attention to this investment methodology risk exposure at this stage can help identify those AI-based strategies that may be better able to withstand changes in the marketplace regarding data ownership, enduring quality/relevance, and regulation. ■
© 2023 Plotinus Asset Management. All rights reserved.