While financial institutions (FIs) have come a long way in streamlining their financial data, they now face a familiar challenge in repeating the process with Environmental, Social and Governance (ESG) data. One of the main differences in sourcing ESG data is the heavy dependence on external providers. Given the myriad uses of this data, FIs need to ensure its quality to drive adoption across their lines of business. ESG datasets are generally categorized as private[1], public[2], internal[3] and specific[4]. The trick lies in balancing existing and upcoming data sources so that the data ingested is relevant and aligned with the business needs of the firm. To that end, any firm must address three questions: which framework to use to build an ESG ecosystem, which data sources to subscribe to, and whether those sources should be integrated with an existing information ecosystem instead of creating a new one.
In this point of view, the first of a two-part series, we explore how organizations can build an integrated, future-ready data ecosystem. The key highlights of this article include:
- ESG related challenges faced by FIs
- How FIs can approach ESG data sourcing by leveraging technologies such as cloud and AI/ML to transform the acquired data
Challenges
The convergence towards harmonized taxonomies and standardized disclosures is inevitable[5], so it is important for FIs to prepare in advance. Building data repositories with access to consistent and comparable data is crucial to building ESG analytical models, creating forward-looking indicators, pricing climate risk appropriately, and enabling accurate reporting and disclosures. This is not straightforward, as FIs face several challenges with ESG data.
Building a robust ESG data ecosystem
As FIs continue to make progress on incorporating ESG into their value chain, they should also start thinking about the ESG data that is crucial to achieving these objectives. In this section, we introduce an ESG data framework complemented by tech enablers such as ESG-focused cloud stacks, AI/ML analytics, and automation of ESG data quality processes, to provide a glimpse into what FIs are doing and the ways in which we can help other FIs.
Framework: Addressing the ESG data challenge
With the standardization of disclosures by firms, and of data distribution by providers, remaining a work in progress, FIs have been forced to create pools of specialized data analysts who integrate multiple ESG data sources and scour for opportunities (e.g., web scraping) to find alternative data. This manual process is error-prone, which is a growing risk given the increasing focus on mandatory climate risk disclosures[6]. To remedy this, we suggest a series of steps that FIs need to perform to take control of ESG data:
By combining these steps with our proposed framework (refer Figure 3), FIs can address key data challenges while building core/foundational ESG infrastructure to support scalability and repeatability.
Details of the framework include:
- Creating an ecosystem of diverse and relevant data sources
- Ensuring the quality, accuracy, and reliability of the data to be used by downstream applications
- Using advanced technologies to fill the gaps and improve accuracy
- Leveraging transparent and accurate data to drive a purposeful, fully aligned firm
Cloud as an ESG enabler
The exponential growth in the use of cloud (refer Figure 4), with its distinctive data analytics tools, AI technologies and collaborative ecosystem, can enable FIs to source data from the four types of datasets (private, public, specific, internal) and introduces the possibility of integrating real-time data sources.
Additionally, there is an opportunity to integrate with the existing ecosystem (including data lakes and data meshes) and create new value propositions. For example, to unlock funds, certain firms would prefer to share information directly with their investors, creditors, and insurers. This valuable channel opens new opportunities for asset managers. Cloud hyperscalers are already offering data standardization services to process ESG data. For example, AWS Data Exchange simplifies the discovery, subscription, and use of third-party sustainability data in the cloud.
Hyperscalers also collaborate with organizations to provide ESG, public, weather, air quality, satellite imagery, and other such data to their subscribers. Services offered by the hyperscalers are secure and transparent. Applying AI and analytics services to this data is critical to making the approach robust and laying the foundations for a sound ESG ecosystem.
AI-ML and Automation of DQ
It is important to create good, clean data under the principles of data quality (DQ) dimensions[7]. The pressing issue for FIs lies in gleaning high-quality data and making the best use of the unstructured and non-standardized ESG data made available to them. Machine learning (ML) algorithms can identify gaps in the data supplied by third-party data providers.
| No. | DQ Issue | ML Technique |
| --- | --- | --- |
| 1 | Improving data quality | Supervised learning, regression, classification, and anomaly detection algorithms can resolve common issues such as character encoding, multiple rows of data and inconsistent formatting, among others |
| 2 | Filling data gaps | Techniques such as imputation, deep latent variable models (DLVMs) and other unsupervised learning algorithms use patterns, clusters, statistical correlations, and causal structures to create meaningful synthetic data[8] |
| 3 | Removing incongruous records | Random Forest algorithms or a Logit model[9] can flag or remove incorrect records that may have been reported inadvertently |
| 4 | Extracting ESG insights | Natural Language Processing, reinforcement learning and other emerging AI/ML methods[10] can convert text[11] and images[12] to data |
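Rows 1 to 3 of the table can be illustrated with a deliberately simple sketch. It uses a median-absolute-deviation rule as a stand-in for the anomaly detection and Random Forest approaches, and sector-median imputation as a stand-in for the deep latent variable models. Neither is the method used in the cited sources. The record fields (`company`, `sector`, `scope1_tco2e`), the threshold and all values are hypothetical.

```python
from statistics import median

# Hypothetical vendor records; None marks a value the provider did not supply.
records = [
    {"company": "A", "sector": "Utilities", "scope1_tco2e": 1200.0},
    {"company": "B", "sector": "Utilities", "scope1_tco2e": None},
    {"company": "C", "sector": "Utilities", "scope1_tco2e": 1100.0},
    {"company": "D", "sector": "Utilities", "scope1_tco2e": 1150.0},
    {"company": "E", "sector": "Utilities", "scope1_tco2e": 98000.0},  # suspicious
    {"company": "F", "sector": "Banks", "scope1_tco2e": 40.0},
    {"company": "G", "sector": "Banks", "scope1_tco2e": None},
    {"company": "H", "sector": "Banks", "scope1_tco2e": 44.0},
]


def flag_outliers(rows, field, threshold=3.5):
    """Mark rows whose value is far from the sector median (modified z-score)."""
    by_sector = {}
    for row in rows:
        if row[field] is not None:
            by_sector.setdefault(row["sector"], []).append(row[field])
    flagged = []
    for row in rows:
        value = row[field]
        if value is None:
            flagged.append({**row, "suspect": False})
            continue
        values = by_sector[row["sector"]]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        # 0.6745 scales the MAD so the score is comparable to a z-score.
        score = 0.0 if mad == 0 else 0.6745 * abs(value - med) / mad
        flagged.append({**row, "suspect": score > threshold})
    return flagged


def impute_sector_median(rows, field):
    """Fill missing values with the median of non-suspect peers in the sector."""
    peers = {}
    for row in rows:
        if row[field] is not None and not row["suspect"]:
            peers.setdefault(row["sector"], []).append(row[field])
    filled = []
    for row in rows:
        if row[field] is None and row["sector"] in peers:
            filled.append({**row, field: median(peers[row["sector"]]), "imputed": True})
        else:
            filled.append({**row, "imputed": False})
    return filled


cleaned = impute_sector_median(flag_outliers(records, "scope1_tco2e"), "scope1_tco2e")
for row in cleaned:
    print(row)
```

Keeping the `suspect` and `imputed` flags on each record, rather than silently overwriting or deleting values, lets reviewers see exactly what the pipeline changed and challenge it.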
Finally, any new ESG data framework will revolve around the “known unknowns” such as data governance, data lineage, data controls, and a rules engine, all of which strive to continuously improve the sourced data and the reliability of the data provider. These features can be built on top of any existing ecosystem or introduced into a new framework from its initial stages (refer Figure 5). Task automation can solve the issues of timeliness and remove manual errors. Data quality platforms (DQP) with dashboards and remediation platforms aim to create a unified view of data sources and enable the continuous monitoring of DQ and process metrics. Users close to this data will patrol the framework and improve the platform. They will have free rein to contest the massaged and streamlined data created by the ML models. Their inputs will serve the double purpose of ensuring the quality of the current dataset and improving the accuracy of the models. Data controls and ESG-specific metrics will help in measuring the accuracy, consistency, and relevance of the data.
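As a rough illustration of the kind of DQ metrics such a platform might monitor, the sketch below computes completeness, timeliness and cross-provider consistency for a handful of records. The provider names, the field names, the 365-day freshness window and the 15-point tolerance are all hypothetical, and the consistency check assumes both providers report scores on the same 0 to 100 scale, which is often not the case in practice.

```python
from datetime import date

# Hypothetical records from two providers covering the same companies.
observations = [
    {"company": "A", "provider": "vendor_x", "esg_score": 71.0, "as_of": date(2023, 3, 31)},
    {"company": "A", "provider": "vendor_y", "esg_score": 48.0, "as_of": date(2023, 3, 31)},
    {"company": "B", "provider": "vendor_x", "esg_score": None, "as_of": date(2021, 12, 31)},
    {"company": "B", "provider": "vendor_y", "esg_score": 55.0, "as_of": date(2023, 1, 31)},
]


def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)


def timeliness(rows, today, max_age_days=365):
    """Share of rows refreshed within the allowed age."""
    return sum((today - r["as_of"]).days <= max_age_days for r in rows) / len(rows)


def consistency_breaks(rows, field, tolerance=15.0):
    """Companies whose providers disagree by more than the tolerance."""
    values = {}
    for r in rows:
        if r[field] is not None:
            values.setdefault(r["company"], []).append(r[field])
    return sorted(c for c, v in values.items() if max(v) - min(v) > tolerance)


today = date(2023, 6, 30)  # fixed date so the example is reproducible
dq_report = {
    "completeness": completeness(observations, "esg_score"),
    "timeliness": timeliness(observations, today),
    "consistency_breaks": consistency_breaks(observations, "esg_score"),
}
print(dq_report)
```

Metrics like these could feed a dashboard, with the list of consistency breaks routed to the users who review and contest the data.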
Conclusion
The core expertise of FIs offering front-end client services lies in offering innovative products and allocating capital effectively. Hence, it is not realistic to expect them to put a lot of time and effort into sourcing ESG data. We expect this gap to be filled by external vendors or participants in a financial market infrastructure (e.g., custodians[13]). But it is important for such firms to ensure that a strong ecosystem exists to source different types of data from diverse sources, to build a single source of truth for themselves, and to map their organizational objectives and business requirements to data availability. While this can also be built on top of an existing data ecosystem, care should be taken to ensure that it supports the different jurisdiction-specific taxonomies and enables the comparability of data both within taxonomies (minimum viable) and across them (utopian).
In Part 2 of this series, we will focus on how firms can consume this ESG data to improve portfolio performance, manage risk and comply with reporting and disclosure requirements. We will explore how this data can be used to create forward-looking indicators, build stronger models, increase granularity in research and improve aggregation and analysis.
Additional contributors: Vishesh Mangal and Nivedha Elango
[1] Data exclusive to the individual firms in which FIs are investing/lending/insuring
[2] Financial disclosures, social media, news articles and other publicly available information
[3] Data that is generated within the FIs
[4] Data supplied by specialized ESG data providers such as MSCI, Refinitiv, Sustainalytics
[5] https://www.ngfs.net/sites/default/files/medias/documents/progress_report_on_bridging_data_gaps.pdf
[6] SEC, Statement on Proposed Mandatory Climate Risk Disclosures
[7] Accuracy, consistency, completeness, timeliness, integrity, conformity, and veracity
[8] https://www.bloomberg.com/professional/blog/imputation-of-missing-esg-data-using-deep-latent-variable-models/
[9] https://www.bis.org/ifc/events/ifc_nbb_workshop/ifc_nbb_workshop_2d3.pdf
[10] https://www.nature.com/articles/s41561-020-0582-5
[11] https://www.econstor.eu/bitstream/10419/230148/1/FMII_FMII12132.pdf
[12] https://www.bis.org/ifc/publ/ifcb56_23.pdf
[13] https://www.ssga.com/investment-topics/environmental-social-governance/2019/03/esg-data-challenge.pdf

Amitabh Nangia
Associate Partner
20+ years of experience across asset management, commercial & investment banking, investment operations & research. He has worked on client engagements including ESG, automation, business & IT transformations (in areas such as investment ops, finance, retirements, wealth management, etc.). Contact him at amitabh_nangia@infosys.com.

Karthikeyan RJ
Senior Principal
20+ years of experience in the financial services industry advising clients on banking, risk and regulatory compliance solutions. He has played the role of a product manager for core banking solutions, regulatory tech and business intelligence solutions. Karthik also has a strong data advisory background helping clients solve business problems built on data insights. More recently, he has been helping FS organizations transition to a low-carbon economy by building green portfolios and complying with sustainability regulations. Contact him at karthikeyan.rj@infosys.com.

Aadarsh Raghavan
Principal
15+ years of strategy & technology consulting experience in asset and wealth management, asset servicing, risk management and regulatory compliance. He has worked on various advisory and implementation engagements for large financial institutions in the UK, the US and India across business transformation, product strategy and design, process consulting, data management, and program management. He is also a certified FRM. Contact him at aadarsh.raghavan@infosys.com.