How ESG ratings are built

The process of creating comprehensive ratings for thousands of companies is mind-boggling. It's also fraught with a multitude of challenges.

By Joel Makower

May 11, 2022

Part Two of a three-part series.

Rating a company on its environmental, social and governance performance and policies is a daunting task.

It can require mining mountains of data from dozens of sources, eventually boiling it all down to a single metric — a letter grade or numeric score, in most cases. And it means doing this for thousands of companies across dozens of sectors — and keeping it current, reflecting changes in company leadership, strategy and circumstances, not to mention sectoral trends — year in and year out.

That’s not all. The scores, and some underlying data, need to be assessed comparably and transparently against a consistent methodology, adjusted for each sector to reflect the nature of the rated companies’ business operations. And because some large companies operate multiple businesses (General Motors, for example, makes vehicles but also has a financial services business, GM Financial, that underwrites leases), rating a given company may require assessing it from multiple angles.


As mind-boggling as the process seems to be, it’s also fraught with a multitude of other challenges, say regulators, industry experts and many rated companies themselves. At stake are potentially trillions of investment dollars seeking to align with society’s expectations, not to mention the carrying capacity of the earth’s natural systems.

Over the past few months, as I’ve dived into the world of ESG ratings, I’ve come to appreciate the magnitude and the complexity of the task confronting ESG ratings firms. I’ve also begun to understand some shortcomings and limitations of these firms — and of the ratings themselves.

While those criticisms don’t by any means disqualify ESG ratings as a key tool for investors, companies and others, they should give pause to those who rely on them as a measure of companies’ progress in sustainability.

Sprawling landscape

To begin to understand the nature of ESG ratings, one must appreciate the sprawling landscape they aim to describe.

Let’s start with environmental issues, the "E" in ESG. Among the topics about which a ratings firm might inquire are: greenhouse gas emissions; other air emissions; water use and discharges; carbon footprint; policies to end the use of fossil fuels; soil contamination; compliance with the Paris Agreement targets; energy consumption and intensity; use of renewable energy; production of hazardous waste; deforestation; product reusability and recyclability; reliance on declining natural capital stocks; operations affecting protected or threatened species; and susceptibility of facilities to extreme weather events.

Now for "S," the social dimensions, which, by definition, focus largely on people. Among the topics: human rights, use of child or forced labor in supply chains, employee and supplier diversity and inclusion, equitable representation and compensation, discrimination, personal data security and privacy, product safety, worker safety and well-being, cybersecurity risks, community relations, human capital development, family leave policies and animal welfare.

Finally, there’s "G," for governance, a topic that looks at the organizational structures, policies and behaviors of a company across a broad range of fronts, some of which don’t fit easily within the environmental and social buckets. Among the issues: company compliance with local and national laws, board diversity, executive compensation, board engagement and oversight of ESG issues, business ethics, conflicts of interest, transparency and accountability, codes of conduct, corruption and bribery, tax reporting and policy engagement.

Keep in mind that these are merely examples of the myriad topics that may be considered under each of the three buckets when assessing a company. A typical ratings firm may evaluate companies on as many as 700 criteria, which explains why the questionnaires they send to companies can be 300 to 400 pages long.


Also note that some of these topics are short-term in nature, while others are longer-term. Some involve activities that can be controlled directly by the company, while others lie outside of the company’s direct influence — in supply chains or in customers’ use of products, for example, or in host governments’ laws and customs.

Still other considerations in creating an ESG score are a company’s business model, financial strength, geography and "incident" history — that is, the number of accidents, lawsuits, fines and other circumstances that could indicate sloppy or unethical practices and, thus, increased risk to a company and its shareholders.

You might ask how it’s even possible to assess and score such a vast menu of diverse items, let alone roll them all up into a single metric. Understanding how this is done involves delving into the methodology documents publicly available on most raters’ websites. For the hardy, such documents are published by five of the most commonly used ratings agencies: ISS, Moody’s, MSCI, S&P Global and Sustainalytics.

As you’ll see, these are mostly dense documents, describing in various levels of detail the research process and the scoring calculations. They do not make for casual reading.

But don’t confuse all this verbiage, which the ratings organizations uniformly tout as transparency, with clarity.

"Just because it's published doesn't mean it's understandable," said Suzanne Fallender, who spent 15 years in corporate responsibility at Intel before moving to head global ESG at the real estate and logistics firm Prologis earlier this year. She spent the early part of her career at the ratings firm ISS. She’s looked at ratings from both sides now.

"I think some have done a pretty good job of explaining the methodology," she told me. "Some are not as transparent. And even when firms are transparent on the methodology, it's a lot to go through. From the corporate side, teams need to take the time to really understand not just what's getting measured, but how it's getting factored, how they're weighted and how that might change over time. And because these things evolve, even if you think you understand something now, the methodology may change next year."

3 buckets

While each ratings organization has its own process and methodology to create ratings, in broad strokes the process involves three major buckets of activity:

Materiality: Determining which indicators are relevant for a given company and sector
Data harvesting: Gathering information about the company from various sources
Scoring: Weighting and evaluating the data to create a rating

Let’s take a brief look at each.

Materiality. To assess a given company requires understanding what’s material for that company — that is, what environmental, social and governance issues are deemed fundamental to a company’s financial success or that can create legal, regulatory, reputational or other risks. According to the Value Reporting Foundation, "a matter is material if it could substantively affect the organization’s ability to create value in the short, medium and long term."

Obviously, that means starting with a company’s sector; there tends to be a high level of commonality among companies doing similar things. But it also means understanding the company itself: Where in the world it has facilities, what activities take place at each location, the kinds of resources it uses and where it sources them, and other issues.

As noted earlier, some companies don’t fall neatly into a single sector because they operate multiple businesses or span several industries. That adds a degree of difficulty and can unwittingly bias a company’s rating, positively or negatively.
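One way to picture the multi-business problem is a hypothetical sector "materiality map" blended across a company's segments. The sketch below assumes, purely for illustration, that a rater weights each sector's issues by the segment's share of revenue; the sector maps, issue weights, revenue shares and the name blended_materiality are all invented, and real raters may weight by assets, risk exposure or judgment instead.

```python
# Hypothetical sketch: blending sector-level issue weights for a company
# that operates in two sectors. All maps, weights and shares are invented.

SECTOR_MATERIALITY = {
    "automakers": {"product_emissions": 0.5, "worker_safety": 0.3, "supply_chain": 0.2},
    "consumer_finance": {"data_privacy": 0.5, "responsible_lending": 0.3, "business_ethics": 0.2},
}


def blended_materiality(revenue_shares):
    """Combine each sector's issue weights in proportion to that segment's revenue share."""
    blended = {}
    for sector, share in revenue_shares.items():
        for issue, weight in SECTOR_MATERIALITY[sector].items():
            blended[issue] = blended.get(issue, 0.0) + share * weight
    return blended


# An invented automaker that earns 15% of revenue from a finance arm.
weights = blended_materiality({"automakers": 0.85, "consumer_finance": 0.15})
for issue, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{issue:20s} {w:.3f}")
```

Under this assumption, data privacy ends up with a small but nonzero weight. A rater that ignored the finance segment would drop it entirely, which is one way the bias described above could creep in.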

Data harvesting. Information can come from a wide range of sources, primarily the rated company itself, with other data harvested from regulatory filings, proprietary databases, media reports and in-house research. For any number of reasons, not all companies engage with raters, and even companies that do engage don’t share all the data a rater might be seeking, either because they don’t have it or because they don’t want to.

Filling in those gaps requires the rating organization to engage in something called "imputation" — essentially, making educated guesses, albeit very sophisticated ones, involving statistical regression models, input-output calculations and other techniques. This is a dirty secret of ESG ratings: Half or more of the data used to create them is imputed, not actual, verifiable information.

The ratings organizations are emphatic that their years of experience yield accurate results. But each rater uses its own methodology, each with its own built-in biases, and that can lead to vastly different assumptions about a given company. (More on that later.) And not all raters use imputation to fill in gaps. As a result, one could falsely assume that a company has no carbon emissions, and therefore no risk in that regard, simply because it did not disclose any such information.
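To make the contrast concrete, here is a minimal Python sketch. It assumes a single-variable linear regression of emissions on revenue within one sector, which is far cruder than the models raters actually use. The companies, figures and function names are hypothetical. It compares an imputed estimate for a non-disclosing company with the naive alternative of treating "not disclosed" as zero.

```python
# Hypothetical illustration: imputing a missing emissions figure from a
# peer-group regression versus naively treating "not disclosed" as zero.
# All company names and numbers are invented for this sketch.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b


# Peers in one sector: (revenue in $bn, reported emissions in Mt CO2e).
# None marks a company that did not disclose.
peers = {
    "PeerA": (10.0, 2.1),
    "PeerB": (20.0, 3.9),
    "PeerC": (30.0, 6.2),
    "PeerD": (40.0, 8.0),
    "NonDiscloser": (25.0, None),
}


def impute(peers):
    """Fill missing emissions with the regression estimate from disclosing peers."""
    reported = [(rev, em) for rev, em in peers.values() if em is not None]
    a, b = fit_line([r for r, _ in reported], [e for _, e in reported])
    return {name: (em if em is not None else a + b * rev)
            for name, (rev, em) in peers.items()}


def zero_fill(peers):
    """Treat any undisclosed figure as zero emissions."""
    return {name: (em if em is not None else 0.0)
            for name, (rev, em) in peers.items()}


print(round(impute(peers)["NonDiscloser"], 2))  # 5.05
print(zero_fill(peers)["NonDiscloser"])         # 0.0
```

The imputed figure is only an estimate, but the zero-filled one makes the non-discloser look like the cleanest company in the group.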

One benefit of imputation is that it creates a strawman analysis that the rater can leverage to get the attention of a company that has provided incomplete information or none at all. "The role of imputation is to fill the gaps where there is no disclosure but also to provide a juxtaposition for the company to view how we have actually analyzed their business, despite them not reporting on that," Richard Mattison, president of S&P Global Sustainable1, explained. He added that when viewing S&P’s ESG ratings, readers have the option to include or exclude imputed data from their view. "We're very transparent about the difference that imputation makes in our scoring."

No data company really wants to do imputation, "but they have to do it," Mattison said. "We'd far rather as a group take really consistent, regulated, well-disclosed information and use that as a starting point for our assessments and gather that intelligence without anything else. That would reduce a lot of the variance you might see between some of these scores."

Scoring. Finally comes the creation of the actual scores. Again, each rater has its own methodology for doing this. And the challenge is considerable: How do you roll up all this information into an aggregate score? How much weight do you give each item, let alone the individual environmental, social and governance buckets? Do you weight them equally or prioritize one over the others? At most agencies, the weightings and scoring systems are continually evolving as modeling improves and is tested against real-world data.
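A simplified roll-up might look like the following Python sketch. It assumes indicator scores on a 0-to-100 scale that are averaged into E, S and G pillar scores with indicator weights, and then into one overall score with pillar weights. The indicators, weights and values are all invented. Actual raters use their own, far larger schemes.

```python
# Hypothetical sketch of rolling indicator scores (0-100) up into pillar
# scores and then one overall score. Indicators, weights and values are
# invented; each real rater uses its own scheme.


def weighted_average(scores, weights):
    """Weighted mean of the given scores; weights need not sum to 1."""
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight


indicators = {
    "E": {"ghg_emissions": 40, "water_use": 70, "renewables": 55},
    "S": {"worker_safety": 80, "data_privacy": 60},
    "G": {"board_oversight": 90, "anti_corruption": 75},
}
indicator_weights = {
    "ghg_emissions": 3, "water_use": 1, "renewables": 2,
    "worker_safety": 2, "data_privacy": 1,
    "board_oversight": 1, "anti_corruption": 1,
}


def overall_score(pillar_weights):
    """Return (overall score, per-pillar scores) for the given pillar weights."""
    pillars = {p: weighted_average(scores, indicator_weights)
               for p, scores in indicators.items()}
    return weighted_average(pillars, pillar_weights), pillars


equal, pillars = overall_score({"E": 1, "S": 1, "G": 1})
env_heavy, _ = overall_score({"E": 0.6, "S": 0.2, "G": 0.2})
print({p: round(v, 1) for p, v in pillars.items()})  # {'E': 50.0, 'S': 73.3, 'G': 82.5}
print(round(equal, 1), round(env_heavy, 1))          # 68.6 61.2
```

The same underlying data yields 68.6 under equal pillar weights and 61.2 when the environmental pillar counts for 60%. The choice of weights alone moves the headline number.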

Another lesser-known fact is that companies generally are ranked against their peers, not against the entire universe of companies. So, a high-scoring oil and gas firm may be exactly that: a fossil-fuel extractor, refiner, transporter and retailer that is leading its sector in how it addresses environmental, social and governance risks.
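Peer-relative ranking can be illustrated with a hypothetical percentile calculation. The sketch assumes a company's relative score is the percentage of same-sector peers it outscores; the function name, sectors and raw scores are invented. Best-in-class schemes at real raters differ in their details.

```python
# Hypothetical sketch of peer-relative (best-in-class) ranking: a company
# is compared only with companies in its own sector. All data are invented.


def peer_percentile(raw_scores, sector_of, company):
    """Percentage of same-sector peers (excluding the company) with a lower raw score."""
    sector = sector_of[company]
    peers = [c for c in raw_scores if sector_of[c] == sector and c != company]
    if not peers:
        raise ValueError(f"{company} has no peers in sector {sector!r}")
    beaten = sum(1 for c in peers if raw_scores[c] < raw_scores[company])
    return 100.0 * beaten / len(peers)


raw_scores = {"OilCo1": 45, "OilCo2": 30, "OilCo3": 35,
              "SolarCo1": 70, "SolarCo2": 80, "SolarCo3": 60}
sector_of = {"OilCo1": "oil_gas", "OilCo2": "oil_gas", "OilCo3": "oil_gas",
             "SolarCo1": "solar", "SolarCo2": "solar", "SolarCo3": "solar"}

print(peer_percentile(raw_scores, sector_of, "OilCo1"))    # 100.0
print(peer_percentile(raw_scores, sector_of, "SolarCo1"))  # 50.0
```

Here the oil company with a raw score of 45 tops its sector, while the solar company at 70 lands mid-pack. Read without context, the relative scores would suggest the oil firm is the stronger performer.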

The product of all this work shows up as a single score but also as a substantive report. Kristina Rüter, global head of ESG methodology at ISS, talked me through an example. "For every single indicator, you see the score, you see the weight and you see a little sentence or text about what has been assessed and how. So, for example, there is text that says there's a comprehensive due diligence system implemented with regard to human rights, and what elements that includes. This text also discloses what is lacking for a better assessment."

Black box?

Not everyone is a fan of the process by which ratings are created. "It's very points-based. It's not performance-based," said Katie Schmitz Eulitt, director of investor relationships at the Value Reporting Foundation, the successor to the Sustainability Accounting Standards Board. "Without knowing what's happening in the black box, one could infer that it seems like they're still looking at points and policies, not at performance."


That term, "black box," came up in many conversations I had for this series, in particular with rated companies, many of which did not want to be quoted in order to maintain good relationships with the ratings firms. The whole process seems shrouded in mystery to many, they said, despite the raters’ uniformly touting their transparency. They say they don't understand how raters arrive at their particular ratings, let alone what it will take for a company to rate better next time.

"Some of it is, ‘How did they come up with that number? We thought we provided the top answers for that,’" said Doug Sabo, chief sustainability officer at Visa. "We want to go in the right direction on ESG. The ratings firms want companies to go in the right direction. Investors want the same thing. There's alignment of mission. But can you make it easier for us to understand if we aren't performing at the top, what more could we do? Sometimes it's hard to get that feedback."

The corporate view of the opacity of the ratings process is underscored by how difficult it can be to get incomplete or erroneous data corrected.

"I've worked with companies of all sizes and many different sophistication levels when it comes to ESG performance, and rankings and ratings," said Evan Harvey, chief sustainability officer at Nasdaq, which itself is a rated company. "And this is far and away the No. 1 topic of conversation. I mean, now it might be the [Securities and Exchange Commission] proposal. But up until very, very recently, ‘Our company is being unfairly ranked or rated by these firms’ was the No. 1 concern. ‘Can you fix that for me?’"

Companies, Harvey said, feel like "they are totally out of control on their own narrative and getting this data fair. And whether their score is positive or negative, they feel like it's just misrepresented a lot of the time."

One challenge has been that many raters require information to be public in order for it to be counted, according to Visa’s Doug Sabo.

"There's an inherent challenge in some cases where their methodology requires public information. And if it's a topic that has sensitivity to it, then it may result in your getting scored lower than from other assessors who do look under the hood and under NDA, and really understand it." Sabo cited cybersecurity policies as one example of a sensitive topic that his company did not disclose publicly.

Getting to talk to a real person at the agencies could ease a lot of companies’ frustration, Sabo said. "In a lot of ratings agencies, it's often a mailbox and not an individual. It's hard to know where's the doorbell to ring to say, ‘Hey, can we actually have a live conversation about something because we have some more that we can share?’"

"You have to stay on them," Emilio Tenuta, chief sustainability officer at Ecolab, counseled about the raters. "There's a response rate that makes sense. You have to stay with it and build a rapport with them and really connect with them. To their credit, they are more than ever willing to collaborate on where the gaps are and how to remediate them."

The great divide

And then there’s the challenge of divergence: how much ratings for the same company differ among ratings firms.

"The ecosystem is an overtaxed system," said Evan Harvey. "These firms now have disproportionate influence in the investing space. They often have small teams, they often have relatively limited resources and/or acumen when it comes to judging things. The most frustrating thing for companies is, ‘The same data goes in ISS, the same data goes in Sustainalytics, and I get two different ratings out. How is that possible? How could one firm say I am a sustainable company and the other firm say I'm essentially on the verge of failing the planet?’"

That’s yet another constant refrain, and one that State Street Global Advisors documented back in 2019. Credit ratings from Moody’s and Standard & Poor’s showed a nearly perfect 0.99 correlation, meaning that the two firms’ ratings were almost perfectly aligned. By contrast, the correlation across four major sustainability-related ratings firms was as low as 0.48, meaning that their ratings of the same companies were only loosely aligned.
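For readers unfamiliar with the statistic, a correlation coefficient summarizes how closely two sets of scores move together. It runs from 1, where the scores move in perfect lockstep, through 0, where there is no linear relationship, to -1. It is not the share of companies on which two raters agree. The Python sketch below computes a standard Pearson correlation for two hypothetical raters' scores of the same five invented companies. The scores were chosen to produce a coefficient near the 0.48 cited above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented 0-100 scores for the same five hypothetical companies.
rater_a = [30, 45, 60, 75, 90]
rater_b = [50, 30, 60, 40, 70]

r = pearson(rater_a, rater_b)
print(round(r, 2))  # 0.5
```

At a correlation of 0.5, the two raters still agree on which company is best. But the company that rater A ranks second-best (75) is second-worst for rater B (40), the kind of split Harvey describes.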

If you ask the raters about this, they’ll say that it comes with the territory.

"I think that's not only something we have to live with, but I think it's also beneficial," Kristina Rüter of ISS explained. "There are different approaches on the market." For example, she said, some raters don’t consider the company’s business model. "We see some competitors go out with very positive scores for tobacco companies. They do not consider the business model as such, and they apply a purely relative scoring system, a best-in-class system where the best has a positive scoring."

She continued: "Given the difference in approaches, it is obvious that they will yield different results. And it's important that we have transparency on these approaches and that investors make an informed decision on what approach they want to use, which is most aligned with their investment strategy."

S&P’s Rich Mattison agrees that there’s strength in diversity. "In a perfect world, you might say that every agency has the same purpose for an ESG score, and that every agency is using consistently disclosed information. So therefore, what you're revealing is a difference in opinion around a tightly defined parameterized set of criteria. But that would also crowd out innovation around actually why we're looking at a range of different things. It is because these are emerging topics."

"I mean, it's a business," said Suzanne Fallender at Prologis. "It's a business. They're all trying to be the highest-quality or most useful rating. They're trying to differentiate on their side."

Fallender is another who believes that rated companies may have to live with the differences.

"From a corporate perspective, how you need to interpret that is not trying to chase each one of them to the same level. But really understanding what's consistent across those, what can you glean that makes sense, what is coming across multiple ratings. I very rarely look at one rating and it tells me everything I need to know."

But Katie Schmitz Eulitt of the Value Reporting Foundation thinks we may be too focused on the wrong things. "What we're really aiming for here is the improvement of real-world outcomes, right? If you're spending so much time focusing on ‘Well, you got this wrong on our rating,’ that diverts attention away from improving performance on whatever you're being graded on in the first place. I think there's so much attention paid to improving the score and not the actual outcome."

Next: Are ESG ratings really necessary?

Thanks for reading. I invite you to follow me on Twitter and LinkedIn, subscribe to my Monday morning newsletter, GreenBuzz, from which this was reprinted, and listen to GreenBiz 350, my weekly podcast, co-hosted with Heather Clancy.