The first instinct of many academics, when given a natural language processor, is to feed it something profoundly unnatural. How else to explain the recent spate of investigations pitting AI against Fedspeak?
To be fair, there are very good reasons for wanting to ignore central bankers, and the desire to delegate the task of listening to their strangled ambiguities predates ChatGPT by at least a century. But it’s the ability to automate the process that has been causing excitement recently, as in articles here, here, here, and here, as well as an Alphaville post here.
It’s now the subject of a JPMorgan flagship note that we can’t link to, so we’ll summarize here. It opens with a big claim:
With central bank communications now at the forefront of policy setting, everything from official policy statements to individual speeches is scrutinized for hints of policy direction. It is in this context that machine learning and natural language processing (NLP) find fertile ground.
The use of NLP to assess central bank communications has been around for some time. However, previous attempts failed to gain traction because they lacked the sophistication to generate actionable results. Simply put, the technology wasn’t ready for prime time yet. This has changed. We believe that NLP is ready for the successful application that many have long awaited.
This, supposedly, is what many have long been waiting for:
Yeah? If monetary policy stance is of such fundamental importance, and if clear communication is the post-GFC prerequisite, why turn the job over to an algorithm? JPMorgan suggests five reasons.
AI offers a second opinion to human economists; its interpretations are systematic and transparent; it reaches conclusions faster; it spits out metrics instead of essays; and its findings are immune to hindsight. “While an economist can offer an informed judgment of a particular central bank speech, this assessment is often multidimensional and context-dependent, and can be lost in a matter of weeks or even days,” says JPMorgan. “By contrast, the HDS is unique and permanent in the historical record, making it ideal for measuring how central bank thinking is changing over time and how it compares with past episodes.”
The HDS referenced above is the JPMorgan Hawk-Dove Score. It’s an update of the bank’s 2019 attempt to process Fedspeak using BERT, a language model developed by Google. The rebuild uses ChatGPT and could theoretically make sense of any central bank on the planet, because it frames their entire world around three rules.
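JPMorgan doesn’t publish the HDS prompt or weighting, but the general shape of such a score is easy to imagine: classify each sentence of a speech as hawkish, dovish or neutral, then average. Here’s a minimal sketch of that idea, with a toy keyword classifier standing in for the language-model call — all names and keyword lists are hypothetical, not JPMorgan’s:

```python
# Hypothetical sketch of a sentence-level hawk-dove score.
# A real pipeline would ask a language model to label each sentence;
# this toy keyword lookup is a stand-in so the sketch runs on its own.

HAWKISH = {"tighten", "tightening", "inflation", "hike", "restrictive"}
DOVISH = {"accommodation", "easing", "cut", "unemployment", "patient"}

def classify_sentence(sentence: str) -> int:
    """Return +1 (hawkish), -1 (dovish) or 0 (neutral) for one sentence."""
    words = {w.strip(".,;:").lower() for w in sentence.split()}
    hawk = len(words & HAWKISH)
    dove = len(words & DOVISH)
    if hawk > dove:
        return 1
    if dove > hawk:
        return -1
    return 0

def hawk_dove_score(speech: str) -> float:
    """Average the sentence labels, so the score lies in [-1, +1]."""
    sentences = [s for s in speech.split(".") if s.strip()]
    if not sentences:
        return 0.0
    return sum(classify_sentence(s) for s in sentences) / len(sentences)

speech = ("Inflation remains too high and policy may need to tighten further. "
          "The labor market is strong.")
print(hawk_dove_score(speech))  # → 0.5
```

One hawkish sentence plus one neutral sentence averages to +0.5 — a number that, as with the real HDS, hides exactly the context the humans later have to add back.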
And here’s how it rates individual members of the Federal Open Market Committee, based on recent speeches, with positive numbers meaning hawkish and negative numbers meaning dovish . . .
…which is not how humans see things at all. Bullard and Kashkari are generally considered the most hawkish, Cook the most dovish, and Barr sits in the middle of the pack alongside Powell. Fortunately, because the FOMC is the most micro-analysed committee in the world, it’s reasonably easy to guess where the garbage going in becomes garbage coming out:
One of the reasons Governor Barr appears as the most dovish member of the FOMC is that he is the vice-chair for supervision, so many of his speeches are less relevant to monetary policy while containing numerous references to financial stability, a concept that can be interpreted as dovish (though not always). ( . . . ) President Bullard is often noted for holding a wide range of views that are difficult to pin down. And because he often presents in slide format without a prepared speech, his stance is harder to quantify.
The hawk-dove scores for the committees of the European Central Bank and the Bank of England also need humans to add context.
Schnabel probably sits too close to the ECB’s centre because her most hawkish speeches are recent, so they haven’t yet moved her average. Broadbent probably scores excessively dovish because he speaks rarely and cautiously. The opposite may be true of Pill and his distinctive presentation style. Etc.
Whether it is possible to apply this level of granular analysis to less studied rate-setting committees is a question the paper does not investigate.
An interesting theme in the JPMorgan hawk-dove study is that on all three committees examined, the chairs lean toward the hawks. That’s a surprise, as the chairs are expected to be in the middle of the pack, but it’s probably an accurate reflection of recent communications. Chairpersons may need to play a larger role in communicating their committee’s direction of travel due to background noise, although a speech-by-speech analysis does not make it easy to spot any patterns:
To be clear, the hawk-and-dove detector works. JPMorgan’s machine is at least as good as the average economist at spotting shifts in the mood music:
But the paper only briefly addresses whether it is a leading indicator or just a thermometer, and its findings are complicated by periods when rates were stuck at the zero lower bound.
Roughly speaking, JPMorgan finds that when the three-month average of its hawkishness measure across speakers rises by 10 points between meetings, it’s worth about 10 basis points on short-term interest rates a week ahead. That is what the following chart apparently illustrates:
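Read naively, that rule of thumb is a one-for-one mapping from index points to basis points. A back-of-the-envelope sketch, treating the coefficient as an illustrative assumption rather than the paper’s actual regression:

```python
# Back-of-the-envelope reading of the rule of thumb above.
# The one-for-one coefficient is an assumption for illustration, not
# JPMorgan's fitted regression.

BP_PER_HDS_POINT = 1.0  # ~10bp per 10-point rise, i.e. 1bp per point

def implied_rate_move_bp(hds_change: float) -> float:
    """Implied one-week-ahead move in short rates, in basis points."""
    return BP_PER_HDS_POINT * hds_change

print(implied_rate_move_bp(10))   # a 10-point hawkish shift → 10.0bp
print(implied_rate_move_bp(-25))  # a 25-point dovish shift → -25.0bp
```

Which is to say: the signal is small, linear, and — as the zero-lower-bound caveat suggests — regime-dependent.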
“The debate over these rankings is about as resolvable as the debate over who is the better soccer player or baseball player,” JPMorgan says, accurately.
While ten basis points is not to be sniffed at, the exercise is reminiscent of sports performance metrics such as expected goals, which often seem more useful for prolonging arguments than predicting outcomes. And in the end, isn’t a drawn-out argument what (human) economists want most?