Prattle – A New Generation of Sentiment Analysis

Founded in 2014, Prattle produces tradable signals by analyzing global central bank communications using a domain-specific linguistic algorithm in real time. Communications can include press releases, meeting minutes, speeches, papers, and more.

Built to treat central bank content and communications as data, Prattle’s methodology produces comprehensive, unbiased, quantitative assessments of central bank sentiment. Because central bankers are adept at forecasting economic downturns and, of course, at changing interest rates, central bank statements move markets. (The co-founders of Prattle are co-authors of a book called How the Fed Moves Markets.)

Importantly, Prattle’s dictionaries have been assembled not just by programmers but by analysts with domain expertise. Using machine learning techniques, the Prattle algorithm is trained on how each central bank’s specific words, phrases, sentences, paragraphs, and whole communications affected markets.

One output is a score that reflects central bank inflationary expectations and can be used by quants as a plug-in for their multi-factor models. Scores are normalized around zero and range between -3 and 3, and, as you’d expect, negative numbers indicate a negative outlook and positive numbers indicate a positive outlook.

Prattle’s software produces sentiment analysis scores far faster than a human analyst, or even a team of analysts, and avoids human bias in interpretation that can lead analysts and investors astray.

Now Prattle is preparing to roll out sentiment analysis for more than 3,000 publicly traded U.S. companies, with data drawn from quarterly earnings transcripts.

In January of 2017, Prattle announced a $3.3 million seed round led by GCM Grosvenor with participation from other investors including New Enterprise Associates, Correlation Ventures, and Plug and Play Ventures.

Evan Schnidman is Prattle’s co-founder and CEO. Prattle has 18 employees in Boston, New York, and St. Louis.

Evan Schnidman

Q.   Evan, what’s new?

A.   We have officially set September 18th as the launch date for our new product. We’re very excited about that. We’re just finishing the new build and adding in some new features.

Q.   Who is your target market?

A.   Professional asset managers. We started out targeting the buy side, specifically midsize and large hedge funds. We’ve expanded into larger long-only firms, and now we’re starting to target a bit of the sell side, as well.

Q.   And how are you reaching them? Direct sales?

A.   Almost exclusively with direct sales, with the massive caveat that we’re on the Nasdaq Analytics Hub, which is for high-quality alternative datasets that are directly tradable, which they provide to much the same client base. So we have that channel partnership, but our core distribution mechanism is still very much direct sales.

Q.   How do you deliver your analysis and scores?

A.   We deliver our scores through three different mechanisms. Quant clients tend to prefer our API.

The primary mechanism discretionary or fundamental investors utilize is our portal. It provides graphing and charting tools, a whole bunch of metadata and a variety of ways to test how that data interacts with market information – price data or other forms of data that are standardized.

The third mechanism of transmission is actually a derivative of the second, which is to say we also do push notifications. Right now, it’s primarily email alerts. I’m always shocked to find that people still like to interact with email – my email box is always full – but we focus on getting them the things they need to know in an efficient way that fits within their existing work flow. For larger, enterprise clients we can also do desktop and browser notifications and mobile notifications, too.

We think the interesting thing about this is that a lot of the technology-enabled research platforms out there have built really impressive technology, but the fact that they’ve made them search-based is a real challenge for the user experience. Most clients just don’t have time to interact with a search-based system. They want tradable information delivered to them directly, based on the parameters they’ve set for things they think are interesting. Some of this is about being able to target portfolio managers and traders, not just researchers.

Q.   What kind of back-testing did you do to demonstrate the predictive capabilities of your scores?

A.   I am a recovering academic, so a lot of this came directly from my academic research. I was a game theorist, originally, and starting grad school in 2008 I decided to model the behavior of the most interesting small group of decision makers in the world, which at that moment in time was obviously the Federal Reserve. I built a model analyzing how the Fed interacts with their various principals and how they interact internally.

Modelling the internal decision dynamics of the Fed was a solvable problem. Modelling how the markets respond to Fed communications, however, was not. I dug into why that was and focused on what people were looking at and on the decision mechanism. We quickly figured out how Fed watching had been done. The state of the art at that point in time, and frankly even still today, is very much a form of close reading, similar to the way you might read poetry. It struck me that there had to be a better solution.

In building out the model, we ended up building out a unique lexicon for every individual central bank based entirely on historical market behavior. In some sense, our model is based on years of testing the best ways to actually get information about a central bank. Instead of using a dictionary-based approach, and trying to fit and refit that dictionary to what we think would be a market outcome, we’ve actually built the market outcome directly into training the lexicon itself.

Beyond that, we’ve done a great deal of backtesting. My business partner Bill MacMillan and I published a book on this. We’ve written internal whitepapers and done our own testing and we also have the validation of the testing Nasdaq has done. They outsourced some of that to Lucena Research and Lucena has done dozens of backtests on our data, as well.

We write a weekly research piece and publish it every Sunday. We put together a calendar of upcoming central bank policy statements, speeches, etc. – any communication coming out of the G-10 central banks for the next week. We identify what those are, what the data is saying, and what will be the likely policy outcome for each one of those. Based on that weekly piece, which we’ve been doing now for almost a year and a half, we are 98% accurate in predicting what policy is going to be. Mind you, we publish it on Sunday but I typically write it on Friday or Saturday, so we are projecting almost a week in advance in some cases. Same-day market futures pricing is only 90% accurate.

Q.   You mentioned dictionaries. You started building your dictionaries not with machines but with people who have domain expertise. Why?

A.   We did and we didn’t. Human domain expertise is extraordinarily important for the purposes of dimension reduction. What is the most salient dimension upon which people are making decisions? With central banks, that actually varies a great deal. Most central banks are single mandate around inflation but they actually shade that in one direction or another. The U.S. Fed is a dual mandate central bank – inflation and employment. That makes things even more complicated.

Traditional sentiment analysis is predicated on a dictionary-based approach where you typically have a dictionary of positive words, a dictionary of negative words. You go through, add up all the positive words, add up all the negative words, subtract the negative from the positive, and call it a score.

It works reasonably well when you’re analyzing Twitter or any other simple, short-form communication. For anything longer-form or more complex, you need to have a lot more training involved.
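The counting approach described above can be sketched in a few lines of Python. This is a minimal illustration only: the word lists and the function name `dictionary_score` are invented for the example and are not taken from any real sentiment dictionary.

```python
import re

# Tiny illustrative word lists (assumptions); real dictionaries are far larger.
POSITIVE = {"growth", "strong", "improve", "gain"}
NEGATIVE = {"weak", "decline", "risk", "loss"}


def dictionary_score(text: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


# "not strong" is still counted as positive, because single words carry no context.
print(dictionary_score("Growth is not strong"))
print(dictionary_score("Weak growth and a decline in exports"))
```

The first call shows why this style of scoring struggles with more complex text: negation and context are invisible to a simple word count.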

What people have traditionally done is built a more nuanced dictionary. You rank order terms from positive to negative. You build in not only words but phrases, often known as n-grams. That works reasonably well if you have a single expert who is very good, but the problem is you’ve endogenized that person’s bias.

The fix is to bring in a team of experts, which helps mitigate bias (although you do have the bias of the group). But it also means you have a static dictionary which potentially works extremely well at time T+1 but by time T+2 and T+3 that performance degrades. You then have to reassemble that team of experts to re-weight that dictionary continually, which is obviously cost prohibitive. It also means you don’t have a historical record of data that’s particularly useful.
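A ranked dictionary that includes phrases as well as words might look something like the sketch below. The weights, the phrases, and the longest-match rule are all assumptions made for illustration; they stand in for the judgments an expert or a team of experts would supply.

```python
import re

# Hypothetical expert-assigned weights for words and phrases (n-grams).
LEXICON = {
    ("downside", "risks"): -1.5,
    ("not", "strong"): -1.0,
    ("strong",): 1.0,
    ("weak",): -1.0,
}
MAX_N = max(len(gram) for gram in LEXICON)


def ngram_score(text: str) -> float:
    """Sum lexicon weights, preferring the longest phrase at each position."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    i = 0
    while i < len(tokens):
        for n in range(MAX_N, 0, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in LEXICON:
                score += LEXICON[gram]
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no entry starts here; move on
    return score


print(ngram_score("Growth is not strong"))
print(ngram_score("Strong demand amid downside risks"))
```

Because the weights are fixed by hand, the table is static: it has to be re-weighted by people whenever language drifts, which is the cost problem described above.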

We took a different approach. We decided that we knew the salient dimensions. We were able to reduce down to the most relevant information, the most relevant thing that a central bank was deciding upon, and unfortunately, that’s not necessarily policy movement. In fact, even if it were policy movement, policy doesn’t change enough. It’s market outcomes. The thing that’s driving speech about policy – I should say text about policy –  is actually market outcomes. We trained our model to build a unique lexicon for each central bank based entirely off of historical market response.

In the case of the U.S., that’s the two-year, the 10-year, the S&P 500, and the trade-weighted dollar. We’re looking at fixed income, F/X, and equities. All of which, when weighted appropriately, allow us to scale a unique lexicon – not just individual words or phrases but everything out to sentences and paragraphs and how all those entities relate to one another.  We’ve actually built our lexicon based on that, which helps mitigate bias. It also means we’re able to natively build in a machine learning component so that the whole system learns. When new language is used or old language is used in a new way, the system is continually updating based off of how that new language relates to other known entities.
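To make the idea of training a lexicon on market response concrete, here is a deliberately simplified sketch. It is not Prattle's method: the term counts, market moves, instrument weights, and the use of ordinary least squares are all assumptions chosen to keep the example small.

```python
import numpy as np

# Hypothetical data: each row is a past communication, each column a term count.
terms = ["accommodative", "tightening", "transitory"]
term_counts = np.array([
    [2, 0, 1],
    [0, 3, 0],
    [1, 1, 2],
    [0, 2, 1],
    [3, 0, 0],
], dtype=float)

# Hypothetical market moves after each communication, one column per instrument:
# two-year yield, 10-year yield, S&P 500, trade-weighted dollar (arbitrary units).
market_moves = np.array([
    [-0.4, -0.2,  0.5, -0.3],
    [ 0.6,  0.3, -0.4,  0.5],
    [-0.1,  0.0,  0.1,  0.0],
    [ 0.4,  0.2, -0.2,  0.3],
    [-0.6, -0.3,  0.7, -0.5],
])

# Assumed instrument weights; the equity sign is flipped so that a positive
# composite corresponds to a "tighter" market reaction.
instrument_weights = np.array([0.4, 0.2, -0.2, 0.2])
composite = market_moves @ instrument_weights

# Least-squares fit: the weight per term that best explains the composite response.
weights, *_ = np.linalg.lstsq(term_counts, composite, rcond=None)
lexicon = dict(zip(terms, weights))

for term, weight in lexicon.items():
    print(f"{term}: {weight:+.3f}")
```

In this toy setup the term weights come from the market data rather than from an expert's ranking, and refitting on new communications would update them, which is the general spirit of the approach described in the interview.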

Q.   Is there value in Prattle’s analysis beyond the immediate move in response to central bank communications or company earnings reports?

A.   That’s a great question. We have to scale on short-, medium-, and long-term market response. We look at short term as anything under two days, medium term two days to two weeks, and long term as two weeks all the way out to six months. Our backtesting has demonstrated relevant correlations all the way out to 193 days, so six months is not arbitrary. That fits with macro-economic theory, which indicates that we expect central bank activity to take six months to fully manifest in the markets.
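The three horizons can be written down as a small helper. The function name and the handling of the exact boundaries are assumptions; the 193-day upper limit is taken from the backtesting figure quoted above.

```python
from datetime import timedelta


def horizon_bucket(elapsed: timedelta) -> str:
    """Classify time since a communication into the horizons described above."""
    if elapsed < timedelta(days=2):
        return "short"
    if elapsed <= timedelta(weeks=2):
        return "medium"
    if elapsed <= timedelta(days=193):  # furthest correlation cited in backtests
        return "long"
    return "beyond tested range"


print(horizon_bucket(timedelta(hours=6)))
print(horizon_bucket(timedelta(days=5)))
print(horizon_bucket(timedelta(days=90)))
```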

Q.   Is the market for analysis of earnings transcripts larger than the market for analysis of central bank statements?

A.   Absolutely. The central bank product is near and dear to my heart. I love it. It came directly from my academic research, but it is a fairly narrow market. That’s not to say there isn’t a lot of money sloshing around in things related to central banks. In fact, there’s more money. The issue is that a lot of it is passive investing, not people who are making algorithmic decisions. We love that product and it’s highly sustainable in the market.

But where it gets really interesting is applying the same technology, the same ability to build a lexicon unique to each individual organization’s corporate communications. We started out with earnings calls. We’ve now built a unique lexicon for 4,000 publicly traded companies in the U.S. alone.

We are launching the product with 3,000 because we are obsessive about data quality and want to make sure everything is as perfect as it can possibly be. We’ll be slowly burning in the additional equities. Despite the existence of the Wilshire 5000, there are only about 4,000 publicly traded equities in the U.S. that actually produce earnings.

Q.   Many of the equities you cover are small caps or micro caps for which there is little or no traditional analyst coverage. This would seem to have very interesting implications.

A.   I’ll start with the massive caveat that you’re not getting a lot of institutional investor flow in the micro-cap world. Companies with $25MM market caps – and they do exist – aren’t getting large institutional investors throwing money into them. It’s just not happening.

Apple has 72 or 73 analysts covering it right now. Once you get to the middle of the S&P 500, that number drops to the single digits. By the bottom of the S&P 500, it’s two or three. By the time you get to the middle of the Russell 2000 – these aren’t small caps; they’re easily in the mid-cap universe – there’s no analyst coverage.

Our data can serve as short-form analyst coverage in areas where it’s just cost prohibitive for traditional analysts to cover those stocks. There isn’t enough cash flowing into them; there isn’t enough demand. But people might want to know things about them, or at least know what’s going on in that industrial sector in the mid cap/small cap universe. We think we can fill that information void. In fact, we know we can fill it.

The other sort of supplement here is that we’re not just producing a single quantitative score representing how positive or negative that earnings call was, we’re also producing what is effectively a one-page analyst report. In addition, we’re providing the most salient remarks from that call. We’ve algorithmically identified the things that are most likely to move markets, as well as who spoke, what percentage of time they spoke, and the sentiment of every speaker including not just corporate officers but the analysts.

We’re actually able to analyze the analysts. We think there are some really interesting implications of that on the large cap side. On the small cap side, where there aren’t a lot of analysts, we’re providing information that no one else has.

There are also timing implications to that. The information expressed on Apple’s earnings call, well that gets digested by the market very, very quickly. You get to the mid cap/small cap universe, that signal decays much more slowly because there’s just less information out there. We, in fact, might be the only information out there.

Q.   Can you feed this information directly into quantitative trading models?

A.   Absolutely. I think in terms of quantitative models, still. I can’t help it. I’m a mathematician more than anything else. I love the applications of this in training a model. If you have your standard multi-factor model investing on quantitative data in the equity space, that model might include 10 or 15 or 20 variables: earnings beat/miss, quantitative guidance, what’s the broader market doing, what’s the industrial sector doing, what are ROI, ROA, ROIC, ROE.

The problem is that even when you add up all these known factors, there’s some percentage of price movement that’s unexplained. For most stocks, that’s somewhere between 10% and 20% of price movement. You could argue some of that is macro-market influence, but if you’re controlling for broader market movement and sector movement, that shouldn’t really be problematic. We think that unexplained portion is the sentiment expressed in their corporate communications. So, by adding in a measure of that sentiment – a factor that was not previously quantifiable – you can vastly improve your model.
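The effect of adding one more factor to a multi-factor model can be illustrated on synthetic data. Everything in this sketch is an assumption: the factor loadings, the simulated sentiment series, and the helper `r_squared` are made up to show the mechanics, not to reproduce any real result.

```python
import numpy as np


def r_squared(factors: np.ndarray, returns: np.ndarray) -> float:
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(returns)), factors])
    beta, *_ = np.linalg.lstsq(design, returns, rcond=None)
    residuals = returns - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((returns - returns.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic data: three "known" factors plus a sentiment factor that also
# drives returns but is missing from the baseline model.
rng = np.random.default_rng(0)
n = 500
known_factors = rng.normal(size=(n, 3))
sentiment = rng.normal(size=n)
returns = (
    known_factors @ np.array([0.5, -0.3, 0.2])
    + 0.25 * sentiment
    + rng.normal(scale=0.1, size=n)
)

r2_without = r_squared(known_factors, returns)
r2_with = r_squared(np.column_stack([known_factors, sentiment]), returns)
print(f"R^2 without sentiment: {r2_without:.3f}")
print(f"R^2 with sentiment:    {r2_with:.3f}")
```

In-sample R^2 never falls when a regressor is added, so a real evaluation would look at out-of-sample performance; the sketch only shows where a sentiment score would plug in.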

Q.   How could regulators use Prattle’s sentiment analysis? For example, if sentiment is moving down while all other factors are trending positive, is this possibly a red flag?

A.   The regulatory world is extraordinarily cash strapped. The SEC’s been understaffed for years and they’re not getting more budget any time soon. One of the questions is how they pick where they’re going to investigate because they can’t investigate everywhere.

One of the things we think a lot about and are really interested in is the fact that when signals diverge, there’s something going on. When all of the known quantitative factors are going up and sentiment is going down, that spread is indicative that you probably want to pay attention here. That’s not only an investment signal. That’s also a signal that, well, wait a second – is there a regulatory issue here? We think this can help regulators streamline their processes and identify areas where they need to pay more attention.
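A divergence screen of this kind could be as simple as the following sketch. The tickers, the numbers, and the rule that every factor must be rising are illustrative assumptions, not a description of any production system.

```python
def is_divergent(factor_changes, sentiment_change):
    """True when every known factor is rising while sentiment is falling."""
    factors_rising = bool(factor_changes) and all(c > 0 for c in factor_changes)
    return factors_rising and sentiment_change < 0


# Hypothetical tickers with period-over-period changes.
observations = {
    "AAA": {"factors": [0.04, 0.02, 0.01], "sentiment": -0.6},
    "BBB": {"factors": [0.03, 0.01, 0.02], "sentiment": 0.4},
    "CCC": {"factors": [-0.02, 0.01, 0.00], "sentiment": -0.3},
}

flagged = [
    ticker
    for ticker, obs in observations.items()
    if is_divergent(obs["factors"], obs["sentiment"])
]
print(flagged)
```

Only the first ticker is flagged here: its factors all rise while its sentiment falls, which is the spread described above.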

Q.   Obviously, the market for your service knows no geographic boundaries. How are you addressing markets in Asia and Europe?

A.   It’s still a direct sales effort, and as I said we do have that channel through Nasdaq that helps; they do have a sales team all over the world. Realistically, it is a challenge. In Asia certainly there is a FinTech movement afoot, and we think there’s a lot of growth there. But in the near term, medium term, the real growth will be in Europe, and that’s a story about MiFID.

With MiFID II, the new regulatory framework, one of the obvious things happening is that they’re going to have to line-item out the cost of research. You have the sell side panicking because all of a sudden they’re going to have to start sending out seven-figure bills to places that have never seen a bill before because it was lumped into trading costs. The sell side is trying to figure out how they cut costs, improve quality, and expand coverage all at once.

In the short term, this is going to be a win for the independent research providers that can undercut on price. In the long run, investment banks win these wars. They have too much money to not figure it out. What will happen is that a lot of the investment banks are going to start using tools like ours to help streamline their research processes. They don’t have to pay an equity analyst to cover eight or ten stocks; they can pay an equity analyst to cover 50 or 100, know where to pay attention based on our data, and produce better quality research because they have a quantitative overlay.

Q.   Outside of English speaking countries, Prattle is analyzing sentiment of a bank’s comments that have been translated from a native language. What does this say about the importance of translation?

A.   The vast majority of central banks around the world communicate in English. We are very lucky that English is the language of international markets.

We’ve done a great deal of testing on this. We took the most likely case where native language and English language instances would diverge, which is Japan. A huge percentage of their debt is domestically held and not a lot of people internationally speak Japanese. We think that if sentiment is going to diverge between the native language communications and the English language communications, Japan would be the place where that’s going to happen.

There’s no divergence. The only time we found divergence was about a year and a half ago for a six-week span. We couldn’t figure out what was going on until finally a client of ours in Asia said, “Oh yeah – they have a new translator and he’s not very good.” So, there was a brief period where things did decouple, but we’ve done extensive testing on this and the central banks are very careful to make sure that native language and English language overlap nicely.

I’ll give you the caveat that some central banks don’t release every communication in English. This has become a bit challenging in parts of Latin America. We’ve seen this issue pop up in Brazil and in Mexico at various points in time, particularly in the last year, year and a half. We’ve built the ability to natively analyze in both Portuguese and Spanish just to solve that problem.

Q.   Last year, Prattle merged with LH Meyer, Inc., the research firm founded and led by former Federal Reserve Board Governor Larry Meyer, and you created a parent company called Quiet Signal. What are the implications of that? Will you continue to use that structure?

A.   There are a number of interesting implications. We were going to form a holding company either way. We brought on Larry and Ken Meyer. Larry is a former Fed governor who was leaving Macroeconomic Advisers, his prior firm, to go out and start his own, independent brand. We said, “Hey, why don’t you come join us?” There are synergies between our central bank sentiment data and what he had been doing in terms of analyzing central bank speak. We helped him get that off the ground and ultimately spun that out so we could be a pure technology company. Larry continues to run that brand successfully.

We kept Ken Meyer. Ken ran sales and biz dev for Macroeconomic Advisers for a number of years and we are very grateful to have him and his sales contacts. We’ve really benefited from that relationship and been able to sell to the discretionary or fundamental world in large part because of the credibility that granted us.

Here’s why the structure is interesting. When we were out raising capital at the tail end of last year, we knew we were going to be a pure technology company. That meant we were going to build on the things that were going to scale. That meant applying our core technology to other nuanced, market-moving communications. We raised capital for that purpose. But in the process, we had also built a number of other things that are, potentially, separate business lines. Quiet Signal is our parent company and Prattle is a wholly-owned subsidiary.

We’ve also incorporated a separate brand, called Portend, which is our underlying data science platform. We have not marketed this and don’t intend to any time soon. But, we may at some point choose to commercialize it, so we’ve placed the IP into a separate, wholly-owned sub.

When we started the company, we had no money. A pretty common problem. The other problem that we encountered was that people who are great quants are rarely great developers. What happens is, you build this complex, beautiful, quantitative model and have no clue how to properly deploy it.

At big companies, you have a quant team and a dev team. The quant team builds something and the dev team deploys it. But typically, something gets screwed up. We couldn’t go out and hire a bunch of unicorns who have both quant and dev skill sets – they cost too much. (Though thankfully my partner Bill MacMillan is one of those unicorns, as is our Director of Quant Analytics, Joe Sutherland.) But we ran into the problem of how you hire junior-level folks and have them contribute to core code. So, we built technology that allows us to go from a static model in R or Python through a series of what are effectively just dropdown menus. It’s a scheduled queue. It’s also a micro ETL framework – our own, proprietary micro ETL framework – that gets you from a static model to a fully-deployed production piece of code that is highly stable and can go directly into your core code base, basically without any coding effort at all. It means we can hire junior-level folks with limited dev skills but great quant skills and have them contribute to our core code base.

We think this has accelerated our development somewhere between 4X and 5X. There are a number of tools out there that do bits and pieces of what Portend does, but we wanted something that did everything, soup to nuts, for us.

Q.   What are your plans for the capital you’ve raised?

A.   The idea was to grow our team to build out this wholly-new equity capability, applying our core methodology to analyzing these corporate communications the same way we analyzed central bank communications. We had initially planned to launch with earnings-call data in September of this year and we are fully on schedule. In fact, we got ahead of schedule in large part because we had converted our systems to Portend. We ended up building in not only data from earnings calls but also analyzing 10-K and 10-Q regulatory filings. And we have a partnership now with FactSet, where we’ll be distributing FactSet Fundamentals, as well.

Q.   Grosvenor is an asset manager specializing in alternative assets and not a traditional seed-round investor. How did the investment come about, and why did you select a strategic investor as your lead?

A.   Grosvenor is a nearly $50 billion alternative asset manager focused mostly on the buy-side world. They know about 1,000 hedge funds intimately. They have been able to do two really important things for us. One is to help make introductions.

The other, frankly I think more valuable thing has been their guidance around how we frame a product so it really resonates with that client base. We used to talk, almost exclusively, about improving performance. Occasionally we’d talk a little bit about cost cutting. We still talk about performance a lot because people want to have their investment portfolios perform better. But if you can do it at a lower cost, and cover more stocks in a better way than you ever have before, that’s obviously an important thing to say.

The other factors were things we genuinely had never spent time on or talked about. One of the things people at Grosvenor think a lot about – and I think anyone who runs a hedge fund thinks a lot about – is fiduciary duty. If there’s data out there on every publicly-traded equity, and you’re trading equities and not looking at it, that’s a fundamental problem. We can talk to people about helping them fulfill their fiduciary duty. And because what we do is analyze only publicly available communications, we are never in danger of dealing in material, non-public information.

The other piece, and I think we’ve seen this more over the last year, is it’s really hard for a hedge fund to market itself to investors if it doesn’t have machine learning or AI as part of the process. We check that box in a big way.

Q.   Prattle’s dev team is in St. Louis. What are the pros and cons of the St. Louis ecosystem for a startup like Prattle?

A.   I went to Wash U for undergrad and got my master’s there, as well. St. Louis is a fantastic place. It’s a fantastic place to have the bulk of your staff. Having developers in St. Louis is about the same price as outsourcing to Eastern Europe. And you have people on your team you can directly manage, who are extremely loyal, and deeply skilled. We’re pulling high-quality people from Wash U and elsewhere, and we’ve been able to build out our core dev team there. The thought for now is that we really won’t ever see a reason to pull out of that location.

Q.   Will the pool of qualified engineers be big enough as you expand?

A.   If I had to hire 200 developers tomorrow I’d be in real trouble. But I’d be in trouble in Boston, New York, or Silicon Valley, too. If I needed to hire ten or twenty, it would not be hard.

It’s a big enough ecosystem. There’s a bent to the community there – biotech has dominated for a long time. Part of that is the Wash U med school is fantastic. But Wash U went from having no financial engineering – they had a quant finance program but no financial engineering – to having a financial engineering lab as part of the engineering school that is now its own major. We’ve been able to ride that wave a little bit. We think FinTech is and should be growing in St. Louis, and we’re part of that.

Q.   You participated in the Plug and Play FinTech accelerator. What was that experience like and what did you get out of it?

A.   It’s a deeply Silicon Valley thing. I really, really like their setup because they force the big corporates to put their money where their mouths are. If you want to engage with the startup community, you have to put money into it. Their partners are big investment banks, insurance companies. Anyone in financial services can pay to join and get access to innovative startups.

From the startups’ perspective, there are a whole bunch of people there who want to meet with you. We’ve done a number of accelerator programs. We’re very familiar with what they offer. By and large, you’re not going to get great ideas about product out of an accelerator program. That just isn’t what they’re for. You get better at pitching, you get better at framing yourself. But what you really get, from Plug and Play in particular, is intros.

Q.   What else would you like people to know about Prattle?

A.   Our long-term vision is to build a comprehensive corporate intelligence framework. An entire database devoted to alerting people when something happens that they need to know about at various corporations, or with various potential investment opportunities.

We embarked on this with earnings calls, the things that obviously move stock prices. The next step was Ks and Qs; we’ve built that as well. The next move for us is to analyze other publicly available corporate communications in the U.S.: press releases, speeches by corporate officers, etc. Primary source communications that move stock price. Then do that for every developed nation in the world.

The next piece is regulatory documents. Something that people don’t know is that when the EPA filed suit against VW for the emissions scandal, eight months prior the Korean regulator had taken action against Volkswagen for the exact same issue. Unless you were reading the Korean regulator’s web site, in Korean, the day that this happened, you had no way of knowing. Talk about a hell of a short signal!

Part of our belief about building a comprehensive corporate intelligence database is not just tracking what companies are saying about themselves, but what the regulators are saying about companies. Being able to identify regulatory mentions and regulatory actions and being able to overlay positive/negative sentiment. Not all mentions are bad: passing a stress test is good, getting a drug approved is great. Being able to overlay that is important.

The longer-term vision is an alert-based system that can notify a trader or portfolio manager in real time that, hey, this thing happened with a company you care about.

It gets even more interesting if you do some rudimentary supply chain analysis. Say you’re invested in a steel company that is the core steel provider to VW, which just had the Korean environmental regulator file suit. That chain is a clear link to dumping that stock.

Q.   Does that require a graph database?

A.   Yes. That’s not a problem for us. We’ve already begun building this regulatory database. I should note that our regulatory database – just for the U.S. and just for the last couple of years – is bigger than our database for every publicly available corporate communication and every publicly available central bank communication for the last 15 years. Regulators produce a lot of documentation, so this is not a trivial task.
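The supply chain idea from the steel example can be sketched with an in-memory graph rather than a graph database. The company names other than VW, the edge list, and the two-hop limit are all hypothetical, chosen only to show the traversal.

```python
from collections import deque

# Hypothetical supplier -> customers edges.
SUPPLIES = {
    "SteelCo": ["VW"],
    "OreCo": ["SteelCo"],
    "ChipCo": ["PhoneCo"],
}


def exposed_suppliers(event_company, supplies, max_hops=2):
    """Walk upstream from a company hit by an event; return suppliers and hop counts."""
    # Reverse the edges so we can look up who supplies a given customer.
    upstream = {}
    for supplier, customers in supplies.items():
        for customer in customers:
            upstream.setdefault(customer, []).append(supplier)

    exposure = {}
    queue = deque([(event_company, 0)])
    while queue:
        company, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not search further than max_hops away
        for supplier in upstream.get(company, []):
            if supplier != event_company and supplier not in exposure:
                exposure[supplier] = hops + 1
                queue.append((supplier, hops + 1))
    return exposure


# A regulatory action against VW surfaces its direct and second-tier suppliers.
print(exposed_suppliers("VW", SUPPLIES))
```

A breadth-first walk like this is fine for a toy edge list; at the scale of the regulatory database described above, a dedicated graph store would be the more natural fit.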

# # #