Bank of England publishes EcoFinBench NLP benchmark and finds dictionary sentiment models lag modern approaches

The Bank of England published a staff working paper introducing EcoFinBench, a natural language processing (NLP) benchmark suite for economics and finance that tests a range of model classes on sentence-level classification. The paper also releases two new monetary-policy-focused datasets, Bluebook (text-only) and Greenbook (text plus numeric metadata), and reports that widely used dictionary-based methods substantially underperform more data-driven models. EcoFinBench evaluates dictionary models, word-count models, topic models and transformer models (including BERT-base, FinBERT and FLANG-BERT) on text-only datasets and in a multimodal setting. On text-only tasks, financial transformer models generally achieve the strongest macro F1 scores, while relatively simple word-count models can be competitive, matching or exceeding transformer performance on some benchmarks. By contrast, on the multimodal Greenbook dataset, transformer models perform poorly relative to an AutoGluon ensemble that fuses text features with numeric data, and the authors highlight that existing solutions for multimodal economic and financial datasets remain weak. Across datasets, Loughran and McDonald dictionary approaches lag materially, with the simplest variant trailing FinBERT by 33 to 72 macro F1 points on text-only tasks. The authors invite additions from the research community and outline planned extensions, including additional datasets and tasks, models that can handle longer text sequences, and a more interactive benchmark interface, with potential future expansion beyond English and to generative models.
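
For readers unfamiliar with the metric behind these comparisons, the sketch below shows how macro F1 is conventionally computed: F1 is calculated separately for each class and then averaged with equal weight, so small classes count as much as large ones. This is a generic illustration, not the paper's evaluation code. The label names and predictions are hypothetical, and reading a "point" as one unit on a 0 to 100 scale is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical sentence labels and predictions, for illustration only.
gold = ["hawkish", "dovish", "neutral", "neutral", "hawkish", "dovish"]
model_a = ["hawkish", "dovish", "neutral", "neutral", "hawkish", "neutral"]
model_b = ["neutral", "neutral", "neutral", "neutral", "hawkish", "neutral"]

# Gap expressed in points, assuming scores are scaled to 0-100.
gap_points = 100 * (macro_f1(gold, model_a) - macro_f1(gold, model_b))
print(f"model_a leads model_b by {gap_points:.1f} macro F1 points")
```

Because each class contributes equally, a model that defaults to the majority label (as `model_b` mostly does here) is penalised heavily under macro F1 even if its raw accuracy looks tolerable.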
