Scottish Business Sentiment Index
Provides an overview of the development of the Scottish Business Sentiment Index, including the evolution of the index in response to global, UK and Scottish events.
3 Technical Details & Construction
This section details the methodology used to construct the Business Sentiment Index (BSI), covering both overall sentiment analysis and topic-specific sentiment extraction. It outlines the use of the Loughran and McDonald (2011) dictionary for measuring sentiment in business news and explains how the Latent Dirichlet Allocation (LDA) model is applied to identify relevant topics. Additionally, it describes how topic-specific BSIs are derived and discusses the rationale behind the selection of the optimal number of topics for analysis.
3.1 Overall Business Sentiment Analysis
The overall sentiment of business news is measured using the Loughran and McDonald (2011) (LM) dictionary, which was developed specifically for corporate and financial text. Unlike general-purpose sentiment dictionaries, the LM dictionary reclassifies financial terms that would otherwise be misread as negative (e.g., “liability” and “capital”) and refines its positive and uncertainty word lists to better reflect the language used in regulatory filings, earnings announcements, and media coverage of firms.
In practice, each article’s sentiment score is calculated by taking the difference between its count of positive and negative words and dividing this difference by the total word count in the article. These article-level sentiment scores are then averaged for each month and multiplied by one hundred to produce the monthly Business Sentiment Index (BSI), expressed in percentage terms. This index is designed to capture fluctuations in the tone of business reporting over time.
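The calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used to produce the published index: the tokeniser (lower-case alphabetic tokens), the function names and the tiny POSITIVE and NEGATIVE word sets are hypothetical stand-ins for the full Loughran and McDonald word lists and for whatever preprocessing is applied to the Factiva articles.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical stand-ins for the Loughran-McDonald positive and negative word
# lists (illustration only; the real lists are far longer).
POSITIVE = {"growth", "strong", "improve", "gain"}
NEGATIVE = {"loss", "decline", "weak", "closure"}

def article_score(text: str) -> float:
    """(positive count - negative count) / total word count for one article."""
    tokens = re.findall(r"[a-z]+", text.lower())  # simplistic tokeniser (assumption)
    if not tokens:
        return 0.0  # treat empty text as neutral (assumption)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)

def monthly_bsi(articles):
    """Average article scores within each month and multiply by 100 (percent).

    `articles` is an iterable of (month, text) pairs, e.g. ("2013-10", "...").
    """
    scores_by_month = defaultdict(list)
    for month, text in articles:
        scores_by_month[month].append(article_score(text))
    return {month: 100 * mean(scores) for month, scores in sorted(scores_by_month.items())}
```

For instance, `monthly_bsi([("2013-10", "Weak demand and a plant closure led to a loss"), ("2013-10", "Strong growth in exports")])` scores the two toy articles at -0.3 and 0.5 and returns a value of 10.0 for "2013-10".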

Notes: The figure plots the monthly series of the Overall Business Sentiment Index covering the period from January 1, 2006, to September 30, 2024, and constructed using Factiva data.
For example, as shown in Figure 3a, the BSI value of -1.12% in October 2013 (the lowest in our sample) indicates that, on average across that month's articles, negative words outnumbered positive words by 1.12% of an article's total word count, reflecting an overall negative tone in business news. Compared with the previous month's BSI of -0.64%, sentiment in October 2013 was about 1.75 times as negative, a decline of roughly 0.5 percentage points: relative to September 2013, the excess of negative over positive words rose by a further 0.5 percentage points of the total word count.
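The month-on-month comparison uses only the two BSI values quoted above, so it can be checked directly. A short Python check of the change in percentage points and of the ratio between the two months:

```python
# Monthly BSI values quoted in the text (percent)
bsi_sep_2013 = -0.64
bsi_oct_2013 = -1.12

change_pp = bsi_oct_2013 - bsi_sep_2013  # month-on-month change, percentage points
ratio = bsi_oct_2013 / bsi_sep_2013      # how many times as negative October is

print(f"change: {change_pp:+.2f} pp")  # change: -0.48 pp
print(f"ratio: {ratio:.2f}")           # ratio: 1.75
```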
To help policymakers and stakeholders interpret the BSI, it is important to understand how movements in sentiment relate to economic trends. A positive BSI reflects business optimism and may signal economic expansion, while a negative BSI suggests weakening business sentiment. To aid interpretation, Figure 3b classifies changes in sentiment relative to the previous period, ranging from mild to strong improvements or declines. For further detail on the aggregation and normalisation procedures affecting the BSI, see Section 4.6.

Notes: This stylised figure shows how to interpret changes in the Overall Business Sentiment Index relative to the previous period.
3.2 Topic-specific Business Sentiment Analysis
To construct topic-specific Business Sentiment Indices (BSIs), relevant topics must first be identified. This is done using the Latent Dirichlet Allocation (LDA) model (Blei, Ng, and Jordan 2003), an unsupervised machine-learning method that uncovers latent thematic structure in a corpus without requiring manually labelled training data, which can be highly labour-intensive to produce. The model learns the structure of the data on its own: in this context, it identifies the underlying topics in the corpus and automatically associates news content with these topics.
A key advantage of the LDA model is that it can readily be updated as new data become available, making it well suited to tracking how topic-specific content evolves over time, cost-effectively and in near real time, subject to the highest frequency at which the data are available.[2]
Formally, LDA represents each document as a probabilistic mixture of latent topics, and each topic as a probability distribution over all words in the model's vocabulary. In simpler terms, a topic is a group of words, each contributing to the topic to a different degree. Estimation places Bayesian (Dirichlet) priors on these distributions, which allows coherent themes to be inferred in a principled way.
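For reference, the generative process commonly written for LDA is given below in standard notation, with K topics, D documents, N_d words in document d, and Dirichlet hyperparameters α and β. These symbols are not defined elsewhere in this document; the block follows the usual smoothed form of Blei, Ng and Jordan (2003), with a Dirichlet prior placed on each topic's word distribution as well as on each document's topic mixture, and is not a description of any implementation detail of the BSI. For the selected model, K = 50. The estimated φ_k and θ_d correspond to the two model outputs described in the following paragraph.

```latex
\begin{aligned}
\varphi_k &\sim \operatorname{Dirichlet}(\beta), && k = 1,\dots,K && \text{(word distribution of topic } k\text{)}\\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && d = 1,\dots,D && \text{(topic shares of document } d\text{)}\\
z_{d,n} &\sim \operatorname{Categorical}(\theta_d), && n = 1,\dots,N_d && \text{(topic of the } n\text{th word in } d\text{)}\\
w_{d,n} &\sim \operatorname{Categorical}(\varphi_{z_{d,n}}) && && \text{(observed word)}
\end{aligned}
```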
The LDA model produces two main outputs: (1) the probability distribution of words within each topic, used to identify the most significant words (those with the highest probabilities) for labelling each topic (see the Appendix for detailed topic modelling results); and (2) the probability distribution of topics within each document, which gives each article's shares of topic coverage and allows these shares to be tracked over time.
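The sketch below shows how the two outputs are typically handled: output (1) is used for labelling by reading off the highest-probability words of each topic, and output (2) supplies each article's share of a given topic. The matrices are made-up toy values (3 topics, 6 words) and `top_words` and `topic_share` are hypothetical helpers. In scikit-learn's `LatentDirichletAllocation`, for example, the corresponding quantities are the rows of `components_` (after normalising each row to sum to one) and the output of `transform`, although the text does not say which software was used for the BSI.

```python
# Toy LDA outputs (made-up numbers): 3 topics over a 6-word vocabulary.
VOCAB = ["profit", "revenue", "tax", "budget", "match", "league"]

# Output (1): topic-word distribution, one row per topic, each row sums to 1.
TOPIC_WORD = [
    [0.45, 0.40, 0.05, 0.05, 0.03, 0.02],
    [0.05, 0.05, 0.50, 0.35, 0.03, 0.02],
    [0.02, 0.03, 0.02, 0.03, 0.50, 0.40],
]

# Output (2): document-topic distribution, one row per article, each row sums to 1.
DOC_TOPIC = {
    "article_1": [0.70, 0.25, 0.05],
    "article_2": [0.10, 0.10, 0.80],
}

def top_words(topic_word, vocab, n=2):
    """Return the n highest-probability words of each topic (used for labelling)."""
    labels = []
    for row in topic_word:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        labels.append([vocab[i] for i in ranked[:n]])
    return labels

def topic_share(doc_topic, topic):
    """Return each article's share of coverage for one topic."""
    return {doc: shares[topic] for doc, shares in doc_topic.items()}

print(top_words(TOPIC_WORD, VOCAB))
# [['profit', 'revenue'], ['tax', 'budget'], ['match', 'league']]
print(topic_share(DOC_TOPIC, 2))
# {'article_1': 0.05, 'article_2': 0.8}
```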
Once topics have been identified, topic-specific BSIs are constructed at a monthly frequency as a weighted average of the article-level Overall BSIs (computed as in Section 3.1), with each article weighted by its share of coverage of the topic as estimated by the LDA model.
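Written out, the topic-specific BSI for topic k in month m is 100 × Σ_d θ_{d,k} s_d / Σ_d θ_{d,k}, where the sums run over the articles d published in month m, s_d is the article's overall sentiment score (positive minus negative words, divided by total words) and θ_{d,k} is its LDA share of topic k. The Python sketch below implements this under two assumptions: the weights are normalised by their monthly sum, as in a standard weighted average, and the same ×100 scaling as the Overall BSI applies. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def topic_bsi(articles, topic):
    """Monthly topic-specific BSI (in %) for one topic index.

    `articles` holds (month, score, shares) tuples, where `score` is the article's
    unscaled overall sentiment, (positive - negative) / total words, and `shares`
    is its LDA topic distribution. Months with zero total weight are skipped.
    """
    weighted_sum = defaultdict(float)  # sum of share * score per month
    weight_total = defaultdict(float)  # sum of shares per month
    for month, score, shares in articles:
        weighted_sum[month] += shares[topic] * score
        weight_total[month] += shares[topic]
    return {
        month: 100 * weighted_sum[month] / weight_total[month]
        for month in sorted(weight_total)
        if weight_total[month] > 0
    }

# Two toy articles in one month: a negative one mostly about topic 0,
# a mildly positive one mostly about topic 1.
toy = [("2024-01", -0.02, [0.8, 0.2]), ("2024-01", 0.01, [0.2, 0.8])]
print({m: round(v, 2) for m, v in topic_bsi(toy, 0).items()})  # {'2024-01': -1.4}
print({m: round(v, 2) for m, v in topic_bsi(toy, 1).items()})  # {'2024-01': 0.4}
```

By contrast, the unweighted monthly average of the same two scores (the Overall BSI) is -0.5%, so the two topic-specific BSIs move away from it in opposite directions purely because of the topic weights.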
Operationalising the LDA model requires specifying the total number of topics, a key modelling choice. The main criterion is how meaningfully the topics of interest are grouped. To assess this, the LDA model is estimated with 30, 40, 50, 60, 70, and 80 topics,[3] and the 50-topic configuration yields the most coherent and interpretable groupings.
In particular, using too few topics tends to merge distinct themes, while using too many produces fragmented topics that are difficult to interpret. For example, in the 30-topic model, the tax topic (Topic 24 of that model) combines tax-related discussion with political themes, as its keywords show: “government uk minister snp labour tax party leader policy public scotland sturgeon country people eu secretary vote independence support political”. Conversely, in the 60-topic model, the keyword “tax” appears both in Topic 18 (“government business uk tax support small industry scheme economy sector minister rate eu economic scotland policy budget plan country change”) and in Topic 53 (“pay gbp cost tax pension bonus million scheme paid salary increase payment wage charge price benefit income executive annual rate”), diluting its thematic clarity.
On manual review of the 50 topics generated by the model, Topic 24 was judged to be uninformative and irrelevant to Scottish business. Its words do not form a coherent or actionable business theme, making the topic difficult to label or use in the analysis.[4] The final output therefore comprises 49 topic-specific BSIs suitable for further analysis.
As illustrated in Figure 4a, the shares of topic coverage, averaged across all periods, range from 4.7% for “Company Profits and Financial Results” to 0.5% for “Sports Championships and Competitions”. Although some topics receive more coverage than others, the dispersion is modest: the average topic share is 2%, the standard deviation is 0.9 percentage points, and the top 10 topics together account for approximately 32% of the content.

Notes: The figure plots the % share of topic coverage, averaged across all periods, for each of the 50 topics produced by LDA topic modelling of Factiva data. Note that the topic “Other” refers to Topic 24 in the 50-topic model, which is uninformative in the context of Scottish business news and is therefore not used for topic-specific sentiment analysis.
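Summary statistics of the kind quoted for Figure 4a (mean share, dispersion and the combined share of the largest topics) can be computed from the vector of time-averaged topic shares. The sketch below uses made-up shares and a hypothetical helper, `coverage_summary`; the text does not state whether the population or sample standard deviation is reported, and the population form (`statistics.pstdev`) is assumed here.

```python
from statistics import mean, pstdev

def coverage_summary(avg_shares, top_n=10):
    """Summarise time-averaged topic coverage shares (in %, one value per topic)."""
    ranked = sorted(avg_shares, reverse=True)
    return {
        "mean": mean(avg_shares),
        "sd": pstdev(avg_shares),            # population standard deviation (assumption)
        "largest": ranked[0],
        "smallest": ranked[-1],
        "top_n_total": sum(ranked[:top_n]),  # combined share of the top_n topics
    }

# Made-up shares for five topics (they sum to 100%)
summary = coverage_summary([40.0, 25.0, 20.0, 10.0, 5.0], top_n=2)
print(summary["mean"], summary["top_n_total"])  # 20.0 65.0
```

Because the shares sum to 100%, the mean share is mechanically 100 divided by the number of topics; with 50 topics this gives the 2% average reported above.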
To broaden the analytical scope, results are also presented for an aggregation of the 50 topics into 20 broad categories. The aggregation is performed manually; the detailed mapping is provided in the Appendix. Figure 4b displays the share of topic coverage, averaged across all periods, for the 20 broad topics. The most prominent are “Business” and “Finance”, with coverage shares of 22% and 9% respectively, while the least covered, “Education and Training” and “Automobiles and Vehicles”, each account for 1.2%. Overall, the top 10 broad topics represent approximately 75% of the content.[5]

Notes: The figure plots the % share of topic coverage, averaged across all periods, for each of the 20 aggregated topics derived from LDA topic modelling of Factiva data. The aggregation to 20 broad topics is done manually; the exact topics from the 50-topic model that make up each broad topic are listed in the Appendix. Note that the topic “Other” refers to Topic 24 in the 50-topic model, which is uninformative in the context of Scottish business news and is therefore not used for topic-specific sentiment analysis.
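Because the mapping from the 50 detailed topics to the 20 broad topics is done manually, one simple way to apply it is a lookup table whose shares are summed within each broad category. The mapping below is a hypothetical fragment (the actual mapping is in the Appendix), and summing shares is an assumption about how broad-topic coverage is formed; summed shares of this kind could also serve as the weights for broad-topic BSIs.

```python
from collections import defaultdict

# HYPOTHETICAL fragment of a detailed-topic -> broad-topic mapping (illustration only;
# the real 50-to-20 mapping is given in the Appendix).
BROAD_TOPIC = {0: "Business", 1: "Business", 2: "Finance", 3: "Tourism"}

def aggregate_shares(topic_shares, mapping):
    """Sum an article's detailed topic shares into broad-topic shares."""
    broad = defaultdict(float)
    for topic, share in enumerate(topic_shares):
        broad[mapping[topic]] += share
    return dict(broad)

print(aggregate_shares([0.3, 0.2, 0.4, 0.1], BROAD_TOPIC))
# {'Business': 0.5, 'Finance': 0.4, 'Tourism': 0.1}
```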
It is important to note that a topic-specific BSI at the monthly level is constructed as a topic-share-weighted average of article-specific overall BSIs. Recall also that the Overall BSI is derived directly from the Loughran and McDonald (LM) dictionary and is not explicitly linked to the extent to which a given topic is covered within an article. Consequently, variation in topic-specific BSIs can be driven both by the overall BSI of individual articles and by the shares of topic coverage within those articles. Figure 5 illustrates this point: there is no clear-cut ranking that links each topic-specific BSI's correlation with the Overall BSI to that topic's share of coverage. For instance, Topic 46 “Tourism and Travel” has a low share of topic coverage (1.24%), yet its topic-specific BSI is highly correlated with the Overall BSI, with a correlation coefficient of 0.7.

Notes: For each of the 50 topics produced by LDA topic modelling of Factiva data, the figure plots the correlation coefficient between the topic-specific Scottish Business Sentiment Index (BSI) and the Scottish Overall BSI (left panel), and the % share of topic coverage averaged across all periods (right panel). Note that the topic “Other” refers to Topic 24 in the 50-topic model, which is uninformative in the context of Scottish business news and is therefore not used for topic-specific sentiment analysis.
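The two quantities in Figure 5 can be computed along the lines below: a Pearson correlation between each topic-specific BSI series and the Overall BSI, and, across topics, a check of whether those correlations line up with coverage shares. The helper names are hypothetical and all series are toy numbers. The Spearman rank correlation in the last step is one possible way to quantify the absence of a clear-cut ranking; it is not a statistic reported in the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank positions 1..n (assumes no ties, which keeps the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy monthly series: Overall BSI and two topic-specific BSIs
overall = [-0.6, -1.1, -0.4, 0.2, 0.5]
topic_a = [-0.5, -1.0, -0.3, 0.1, 0.6]  # moves closely with the Overall BSI
topic_b = [0.3, -0.2, 0.4, -0.1, 0.0]   # largely unrelated to it
print(round(pearson(overall, topic_a), 2), round(pearson(overall, topic_b), 2))

# Across (hypothetical) topics: correlation with the Overall BSI vs coverage share
corr_by_topic = [0.9, 0.2, 0.7, 0.4, 0.6]
share_by_topic = [3.0, 4.0, 2.5, 0.8, 1.0]  # % coverage, made up
print(spearman(corr_by_topic, share_by_topic))  # 0.0, i.e. no rank relationship
```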
Contact
Email: economic.statistics@gov.scot