Scottish Business Sentiment Index

Provides an overview of the development of the Scottish Business Sentiment Index, including the evolution of the index in response to global, UK and Scottish events.


3 Technical Details & Construction

This section details the methodology used to construct the Business Sentiment Index (BSI), covering both overall sentiment analysis and topic-specific sentiment extraction. It outlines the use of the Loughran and McDonald (2011) dictionary for measuring sentiment in business news and explains how the Latent Dirichlet Allocation (LDA) model is applied to identify relevant topics. Additionally, it describes how topic-specific BSIs are derived and discusses the rationale behind the selection of the optimal number of topics for analysis.

3.1 Overall Business Sentiment Analysis

The overall sentiment of business news is measured using the Loughran and McDonald (2011) (LM) dictionary, whose word classifications are tailored to corporate and financial contexts. Unlike general-purpose sentiment dictionaries, the LM dictionary adjusts the classification of financial terms that might otherwise be misinterpreted as negative (e.g., “liability” and “capital”) and refines positive and uncertain wording to better reflect the language used in regulatory filings, earnings announcements, and media coverage of firms.

In practice, each article’s sentiment score is calculated by taking the difference between its count of positive and negative words and dividing this difference by the total word count in the article. These article-level sentiment scores are then averaged for each month and multiplied by one hundred to produce the monthly Business Sentiment Index (BSI), expressed in percentage terms. This index is designed to capture fluctuations in the tone of business reporting over time.
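The calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used for the published index: the two word sets are tiny stand-ins for the LM positive and negative lists, the tokenisation is a simple regular expression, and the function names `article_score` and `monthly_bsi` are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stand-ins for the LM positive and negative word lists
# (assumption: the real dictionary is far larger).
POSITIVE = {"gain", "growth", "improve", "strong"}
NEGATIVE = {"loss", "decline", "weak", "closure"}


def article_score(text):
    """Return (positive count - negative count) / total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def monthly_bsi(articles):
    """Average article scores within each month and multiply by 100.

    `articles` is an iterable of (month, text) pairs, e.g. ("2013-10", "...").
    """
    scores = defaultdict(list)
    for month, text in articles:
        scores[month].append(article_score(text))
    return {month: 100 * sum(s) / len(s) for month, s in scores.items()}


sample = [
    ("2013-10", "loss loss gain today"),  # score (1 - 2) / 4 = -0.25
    ("2013-10", "gain today"),            # score (1 - 0) / 2 = 0.5
]
print(monthly_bsi(sample))  # {'2013-10': 12.5}
```

Because the monthly figure is an average of article-level ratios, a short article counts as much as a long one in this sketch.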

Figure 3a: Overall Business Sentiment Index (monthly series)
The line chart tracks monthly business sentiment in Scotland from 2006 to 2024. Sentiment fluctuates over time, with the lowest point (-1.12%) in October 2013, indicating heightened negativity in business news.

Notes: The figure plots the monthly series of the Overall Business Sentiment Index covering the period from January 1, 2006, to September 30, 2024. The index is constructed using Factiva data.

For example, as shown in Figure 3a, the BSI value of -1.12% in October 2013 (the lowest in our sample) indicates that, on average across the relevant articles of that month, negative words outnumbered positive words by the equivalent of about 1.1% of an article's total word count, reflecting an overall negative sentiment in business news. Compared to the previous month’s BSI score of -0.64%, sentiment in October 2013 was roughly 1.75 times as negative, a decline of about 0.5 percentage points. In numerical terms, this means that, relative to September 2013, the excess of negative over positive words grew by about a further 0.5 percentage points of the total word count.
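The month-on-month comparison above can be checked directly from the two BSI values quoted in the text. The variable names below are illustrative only.

```python
bsi_sep_2013 = -0.64  # BSI for September 2013, in % (from the text)
bsi_oct_2013 = -1.12  # BSI for October 2013, in % (from the text)

# Change between the two months, in percentage points.
change_pp = bsi_oct_2013 - bsi_sep_2013

# How many times as negative October was relative to September.
ratio = bsi_oct_2013 / bsi_sep_2013

print(round(change_pp, 2))  # -0.48
print(round(ratio, 2))      # 1.75
```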

To assist policymakers and stakeholders in interpreting the BSI, it is important to understand how sentiment fluctuations translate into economic trends. A positive BSI reflects business optimism and may indicate economic expansion, while a negative BSI suggests deteriorating business sentiment. To enhance interpretability, Figure 3b provides a classification of sentiment trends, ranging from mild to strong improvements or declines compared to the previous period. For further detail on the aggregation and normalization procedures affecting the BSI, see Section 4.6.

Figure 3b: Interpretation of changes in Overall Business Sentiment Index compared to last period
A stylized guide categorizing changes in the BSI as mild/strong improvements or declines relative to the prior period. For example, a +0.5% change signifies a "mild improvement."

Notes: This is a stylised figure on how to interpret changes in the Overall Business Sentiment Index compared to the previous period.

3.2 Topic-specific Business Sentiment Analysis

To construct topic-specific Business Sentiment Indices (BSIs), relevant topics must first be identified. This is achieved using the Latent Dirichlet Allocation (LDA) model, an unsupervised machine-learning method designed to uncover latent thematic structures in a corpus (Blei, Ng, and Jordan 2003). It does not require manually labelled training data, which can be highly labour-intensive to produce. The LDA model autonomously learns the structure of the data and detects latent patterns or thematic groupings. In this context, the model identifies underlying topics in the corpus and automatically associates news content with these topics.

A key advantage of the LDA model is its ability to be readily updated as new data become available, making it well-suited for tracking the evolution of topic-specific content over time in a cost-effective and near real-time manner, depending on the highest frequency at which data are available.[2]

Formally, the LDA characterizes each document as a probabilistic mixture of latent topics, with each topic modelled as a probability distribution over all words in the model's vocabulary. In simpler terms, a topic consists of a group of words, each contributing to varying degrees in the topic. The model estimation process involves placing Bayesian priors on these distributions, enabling a principled inference of coherent themes.
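The "mixture of topics" idea can be made concrete by simulating how LDA assumes a document is generated. The sketch below is illustrative only: the two topics and their word probabilities are hypothetical, the topic-word distributions are held fixed rather than drawn from their own prior, and estimating an LDA model means inverting this process, which is not shown here.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet prior via gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Generate a document in the way the LDA model assumes.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  Dirichlet prior parameters for the document's topic mixture
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic shares
    words = []
    for _ in range(n_words):
        # First choose a topic according to the document's topic shares,
        # then choose a word according to that topic's word distribution.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


rng = random.Random(0)
topics = [
    {"tax": 0.5, "budget": 0.3, "policy": 0.2},      # hypothetical topic
    {"profit": 0.6, "revenue": 0.3, "shares": 0.1},  # hypothetical topic
]
theta, words = generate_document(topics, alpha=[0.5, 0.5], n_words=8, rng=rng)
print(theta, words)
```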

The LDA model generates two primary outputs: (1) the probability distribution of words within each topic, which is used to identify the most significant words—those with the highest probabilities—for labelling each topic (see Appendix for details on topic modelling results); and (2) the probability distribution of topics within each document, which facilitates the construction of article-specific time-series shares of topic coverage.
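As a rough illustration of how the two outputs might be used, the sketch below ranks the words of each topic by probability (for labelling) and averages the article-level topic shares by month (for coverage series). All numbers, topic names, and function names are hypothetical, and the monthly averaging is an assumption about how coverage series could be built rather than a description of the published procedure.

```python
from collections import defaultdict

# Output (1), hypothetical values: word probabilities within each topic.
topic_word = {
    "topic_0": {"tax": 0.30, "budget": 0.25, "minister": 0.20, "rate": 0.15, "plan": 0.10},
    "topic_1": {"profit": 0.40, "revenue": 0.25, "shares": 0.20, "rate": 0.10, "plan": 0.05},
}

# Output (2), hypothetical values: topic shares within each article,
# stored as (month, [share of topic_0, share of topic_1]).
doc_topic = [
    ("2024-01", [0.75, 0.25]),
    ("2024-01", [0.25, 0.75]),
    ("2024-02", [0.50, 0.50]),
]


def top_words(word_probs, n=3):
    """Return the n highest-probability words of a topic, for labelling."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def monthly_coverage(docs):
    """Average each topic's share across the articles of each month."""
    by_month = defaultdict(list)
    for month, shares in docs:
        by_month[month].append(shares)
    return {
        month: [sum(col) / len(col) for col in zip(*rows)]
        for month, rows in by_month.items()
    }


print(top_words(topic_word["topic_0"]))  # ['tax', 'budget', 'minister']
print(monthly_coverage(doc_topic))
```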

Once topics have been identified, topic-specific BSIs are constructed at a monthly frequency as a weighted average of the article-specific Overall BSIs (computed earlier), weighted by the article-specific shares of topic coverage (as determined by the LDA model).
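The weighting step can be sketched as follows for a single topic in a single month. This is a minimal sketch that assumes the weights are normalised by their sum, as is usual for a weighted average; the function name `topic_bsi` and the input values are hypothetical.

```python
def topic_bsi(article_bsis, topic_shares):
    """Topic-share-weighted average of article-level BSIs.

    article_bsis: overall BSI of each article in the month (in %)
    topic_shares: share of each article attributed to the topic by LDA
    Returns None when no article in the month covers the topic.
    """
    total_share = sum(topic_shares)
    if total_share == 0:
        return None
    weighted = sum(b * w for b, w in zip(article_bsis, topic_shares))
    return weighted / total_share


# Three hypothetical articles in one month.
article_bsis = [-2.0, 1.0, 0.5]
topic_shares = [0.5, 0.25, 0.25]

print(topic_bsi(article_bsis, topic_shares))  # -0.625
```

In this example the unweighted mean of the three article BSIs is about -0.17, but the topic-specific value is more negative because the most negative article carries half of the topic's weight.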

Operationalising the LDA model requires specifying the total number of topics, a key parameter in model selection. The primary consideration in topic selection is the extent to which topics of interest are meaningfully grouped. To assess this, the LDA model is applied to configurations with 30, 40, 50, 60, 70, and 80 topics,[3] revealing that 50 topics provide the most coherent and interpretable groupings.

In particular, a smaller number of topics tends to merge distinct themes, while an excessive number of topics results in topics that are difficult to interpret. For example, in a 30-topic model, the topic on taxes combines tax-related discussions with political themes, as indicated by the keywords in this topic, i.e. “government uk minister snp labour tax party leader policy public scotland sturgeon country people eu secretary vote independence support political” (see Topic 24). Conversely, in a 60-topic model, the keyword 'tax' appears both in Topic 18—“government business uk tax support small industry scheme economy sector minister rate eu economic scotland policy budget plan country change”—and Topic 53—“pay gbp cost tax pension bonus million scheme paid salary increase payment wage charge price benefit income executive annual rate”—diluting its thematic clarity.

Upon manually reviewing the 50 topics generated by the model, Topic 24 is determined to be uninformative and irrelevant to the context of Scottish business. The words in this topic do not form a coherent or actionable theme within the business domain, making it difficult to label or utilise effectively for analysis.[4] Therefore, the final output includes 49 topic-specific BSIs suitable for further analysis.

As illustrated in Figure 4a, the shares of topic coverage (averaged across all periods) range from 4.7% for the topic “Company Profits and Financial Results” to 0.5% for the topic “Sports Championships and Competitions”. While some topics receive more coverage than others, the dispersion is not substantial, with an average topic coverage of 2% (as expected when 50 topic shares sum to 100%), a standard deviation of 0.9% over all periods, and the top 10 topics collectively covering approximately 32% of the content.

Figure 4a: Shares of topic coverage for 50 topics, averaged across all periods (%)
Bar chart showing the distribution of coverage for 50 topics in Scottish business news. "Company Profits and Financial Results" (4.7%) is the most covered, while "Sports Championships" (0.5%) is the least.

Notes: The figure plots the % share of topic coverage averaged across all periods for each of the 50 topics produced by the LDA topic modelling using Factiva data. Note that the topic “Other” refers to Topic 24 in the 50-topic model, which is uninformative in the context of Scottish business news and is therefore not used for topic-specific sentiment analysis.

To provide a broader analytical scope, results are also presented based on an aggregation of the 50 topics into 20 broader categories. This aggregation is performed manually, with a detailed mapping of the topics provided in the Appendix. Figure 4b displays the share of topic coverage averaged across all periods for the 20 broad topics. The most prominent topics are “Business” and “Finance” with coverage shares of 22% and 9%, respectively, while the least covered topics, “Education and Training” and “Automobiles and Vehicles,” each account for 1.2%. Overall, the top 10 topics represent approximately 75% of the content.[5]

Figure 4b: Shares of topic coverage for 20 aggregated topics, averaged across all periods (%)
Bar chart of 20 broader topic categories. "Business" (22%) and "Finance" (9%) dominate, while "Education" and "Automobiles" (1.2% each) are least covered.

Notes: The figure plots the % share of topic coverage averaged across all periods for each of the 20 aggregated topics produced by the LDA topic modelling using Factiva data. The aggregation to 20 broad topics is done manually, and a detailed list of the exact topics used from the 50-topic model is provided in the Appendix. Note that the topic “Other” refers to Topic 24 in the 50-topic model, which is uninformative in the context of Scottish business news and is therefore not used for topic-specific sentiment analysis.

It is important to note that a topic-specific BSI at the monthly level is constructed as a topic-share-weighted average of the article-specific overall BSIs in each month. Recall also that the Overall BSI is a direct outcome of the Loughran and McDonald (LM) sentiment dictionary, which is applied to the whole article and is not explicitly linked to the extent to which a given topic is covered within that article. Consequently, variation in topic-specific BSIs can be driven by both the overall BSI of individual articles and the shares of topic coverage within those articles. Figure 5 illustrates this point: there is no clear-cut correspondence between how strongly each of the 50 topic-specific BSIs correlates with the Overall BSI and the respective share of topic coverage. For instance, Topic 46 “Tourism and Travel” has a low share of topic coverage (1.24%), yet its topic-specific BSI is highly correlated with the Overall BSI, with a correlation coefficient of 0.7.
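The comparison behind Figure 5 can be sketched by computing, for each topic, the Pearson correlation between its monthly BSI series and the overall BSI series, and setting it alongside the topic's coverage share. The series and coverage figures below are invented purely for illustration and are not taken from the published data; the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Invented monthly series (in %), for illustration only.
overall = [-0.6, -0.2, 0.1, 0.4, -0.1]
topics = {
    "low-coverage topic": {"coverage_pct": 1.2, "bsi": [-0.5, -0.1, 0.2, 0.5, 0.0]},
    "high-coverage topic": {"coverage_pct": 4.7, "bsi": [0.1, -0.3, 0.2, -0.1, 0.3]},
}

for name, info in topics.items():
    corr = pearson(overall, info["bsi"])
    print(name, info["coverage_pct"], round(corr, 2))
```

In this invented example the low-coverage topic tracks the overall series more closely than the high-coverage one, which is the kind of pattern the text describes for “Tourism and Travel”.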

Figure 5: Correlation of each of the 50 topic-specific Scottish Business Sentiment Indices (BSIs) with the Scottish Overall BSI, and shares of topic coverage for 50 topics, averaged across all periods (%)
Two scatter plots: (1) Correlation between topic-specific BSIs and the Overall BSI (e.g., Tourism: 0.7). (2) Topic coverage shares (e.g., Tourism: 1.24%). No clear link between coverage and correlation strength.

Notes: The figure plots the correlation coefficient of each of the 50 topic-specific Scottish Business Sentiment Indices (BSIs) with the Scottish Overall BSI (left panel) and the % share of topic coverage averaged across all periods for each of the 50 topics (right panel) produced by the LDA topic modelling using Factiva data. Note that the topic “Other” refers to Topic 24 in the 50-topic model, which is uninformative in the context of Scottish business news and is therefore not used for topic-specific sentiment analysis.

Contact

Email: economic.statistics@gov.scot
