The Unintended Consequences of Data Standardization

Building a more equitable, effective, and efficient social sector will require understanding and addressing these risks.

The benefits of data standardization within the social sector, and indeed just about any industry, are multiple, important, and undeniable. Access to the same type of data over time makes it possible to track progress and increase accountability. For example, over the last 20 years, my organization, Candid, has tracked grantmaking by the largest foundations to assess changes in giving trends. The data allowed us to demonstrate philanthropy’s disinvestment in historically Black colleges and universities. Data standardization also creates opportunities for benchmarking, allowing individuals and organizations to assess how they stack up against their colleagues and competitors. Moreover, large amounts of standardized data can help predict trends in the sector. Finally, and perhaps most importantly to the social sector, data standardization can reduce the significant reporting burdens placed on nonprofits.

Yet, for all of its benefits, data is too often proposed as a universal cure that will allow us to unequivocally determine the success of social change programs and processes. The reality is far more complex and nuanced. Left unchecked, the unintended consequences of data standardization pose significant risks to achieving a more effective, efficient, and equitable social sector.

Three Common Challenges With Data Standardization

Fundamentally, when data and information become standardized, they tend to reflect and normalize the lived experience of those in power. After all, it is generally those in power who create the standards. For example, standardized testing in schools promised to create a fair assessment of students’ knowledge across schools year-over-year, increasing the collective ability to benchmark, measure progress, and reward positive performance. However, decades of research have shown that standardized tests often disadvantage those who are different from the test designers (e.g. in vernacular, in test-taking preferences, or in access to resources). Moreover, some studies suggest that US standardized tests aren’t as accurate in predicting performance for women and people of color.

Another potential flaw in standardized data lies in valuing certain data points or results over process and understanding, which can incentivize gaming the system. In standardized testing, research has found that schools and teachers anxious about standardized testing started teaching to the test, perhaps at the cost of actual learning. Another example is the rise of p-hacking: As more academic journals required a standardized threshold of statistical significance (called a “p-value”) to get published, researchers began tweaking analyses, or “hacking,” to increase their odds of finding statistically significant results, despite the fact that this practice decreases the validity of the findings.
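To make the p-hacking point concrete, here is a minimal arithmetic sketch. It assumes something the article does not state: that each alternative analysis is an independent test, that there is no real effect to find, and that the threshold is the conventional 0.05. The function name is hypothetical and chosen only for illustration.

```python
def chance_of_false_positive(num_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_analyses` independent tests
    falls below `alpha` when no real effect exists (illustrative assumption)."""
    return 1 - (1 - alpha) ** num_analyses


# Show how the chance of a spurious "significant" result grows
# as more analyses are tried on the same question.
for k in (1, 5, 20):
    print(f"{k} analyses: {chance_of_false_positive(k):.3f}")
```

Under these assumptions, trying more analyses steadily raises the odds that at least one clears the threshold by chance alone, which is why tweaking analyses until one "works" weakens the validity of the finding.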

Data standardization can also fall short of its intended purpose because people tend to overestimate the universality of their data, metrics, and approaches. It’s easy to assume that humans are interchangeable and that the findings from one study can be generalized to people of different identities, circumstances, or cultures. Such extrapolations—even when they seem benign—can lead to unintended inequities and inaccuracies. For example, studies have shown cultural differences in how people respond to Likert scales (e.g. survey questions asking participants to respond on a scale from one to seven). Compared to white respondents, Black and Brown participants are more likely to choose extreme response categories, while Asian respondents are more likely to avoid extreme responses. The rapid rise of new technologies adds increased risk by creating circumstances where erroneous extrapolations about data can lead to standards that are exclusionary and dangerous. For example, motion detectors calibrated to pale skin may fail to recognize dark skin, which could be deadly if self-driving cars don’t recognize dark-skinned pedestrians. Meanwhile, virtual reality headsets designed to fit male skulls result in women being more prone to cybersickness, a substantial disadvantage as virtual reality gains popularity in the workplace and everyday life.

The social sector is not immune to these challenges. Normalizing the perspectives of those in power is practically woven into the fabric of the US social sector. Grantmakers, who control a large proportion of the sector’s resources, are often the ones who determine what data is collected in grant proposals as well as how to measure the success of a given program. Consequently, nonprofits may feel tempted to focus on achieving a specific data point, metric, or result that grantmakers value, even at the cost of what would most benefit their communities and missions. Similarly, universalist thinking can lead to the incorrect assumption that data about one program or nonprofit can be taken out of context and applied elsewhere.

Creating More Accurate, Equitable, and Inclusive Data Standards

The path toward a more equitable, effective, and efficient social sector with data at its center must be more inclusive, transparent, and introspective. It will require those in power—those setting the standards—to dig deeper than surface-level universal metrics and to look beyond the scope of their own experience and knowledge. It will not be simple, fast, or easy. However, there are practical steps we can take to minimize the negative effects of data standardization.

Standardize the “building blocks” rather than the “whole house.” Standardizing small pieces of data or information that can then be used in diverse ways can deliver many of the benefits of standardization while reducing the risks that can come from overvaluing or universally applying specific data points. For example, rather than forcing a completely standardized grant application process, we can identify smaller aspects of grant applications that can be standardized. Candid has been experimenting with this approach with our Demographics via Candid campaign. This initiative invites nonprofits to share a short overview of key demographic information about their organizations on Candid’s nonprofit profile and allows grantmakers to import that data into their various grant application processes, essentially starting a social sector demographic data registry. This approach eliminates repetitive demographic data entry for grantseekers and offers the social sector a way to benchmark and assess demographic information. At the same time, it also allows for customization when context requires it—for example, collecting additional demographic information if it is relevant for the specific grant in question.

Be clear about definitions and operationalizations. In general, a lack of clear definitions is frustrating, but it becomes far more important to address when creating standardized data and information used across various people and contexts. After all, what value is there in collecting standardized data if everyone is interpreting questions or answers differently? A best practice is to include definitions for any jargon or key words as part of the data collection, interpretation, and dissemination process. Candid does this in our demographic data collection guidance, which provides organizations with standard definitions and best practices to use when collecting demographic data within their organization. Another good example is the public definitions in the NTEE IRS Activity Codes, a part of the taxonomy system the IRS and NCCS use to classify nonprofit organizations.

A slightly more nuanced but equally important step is to be clear about operationalizations. Operationalization refers to how researchers or evaluators measure and convert abstract concepts into data. For example, “happiness” can be defined as a positive emotion ranging from contentment to joy. However, this definition can be operationalized in many different ways: tracking behaviors such as smiles and laughter, asking people on a survey whether they feel happy, coding responses to journal entries, etc. Similarly, in the social sector, BIPOC-serving nonprofits may be defined as nonprofits that serve the BIPOC community. However, this can be operationalized in many ways: whether it is stated in nonprofits’ missions or programs, whether the local community meets a certain percentage of BIPOC representation, whether a specific program meets a certain percentage of BIPOC representation, etc. Similarly, the term “BIPOC-led” can be defined as organizations whose leaders identify as BIPOC. But this definition can also be operationalized in various ways depending on the definition of “leader.” Is it the CEO, the founder, the board of governors, or the entire executive team? What proportion of leadership is required? Being clear about operationalizations is a prerequisite for knowledge-building in any field. Without it, it is impossible to tell whether differences in metrics and results are due to progress in the field or differences in measurement.
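A small sketch can show how much the choice of operationalization matters. Everything in it is hypothetical: the three organizations, the field names, and the two rules (CEO identity versus a majority of the executive team) are illustrative assumptions, not Candid’s method or data.

```python
from dataclasses import dataclass


@dataclass
class Nonprofit:
    # Hypothetical fields, invented for this illustration.
    name: str
    ceo_is_bipoc: bool
    bipoc_executives: int
    total_executives: int


def led_by_ceo(org: Nonprofit) -> bool:
    """Operationalization 1: the CEO identifies as BIPOC."""
    return org.ceo_is_bipoc


def led_by_executive_majority(org: Nonprofit, threshold: float = 0.5) -> bool:
    """Operationalization 2: more than `threshold` of executives identify as BIPOC."""
    if org.total_executives == 0:
        return False
    return org.bipoc_executives / org.total_executives > threshold


# Made-up example records, not real organizations.
orgs = [
    Nonprofit("Org A", ceo_is_bipoc=True, bipoc_executives=1, total_executives=4),
    Nonprofit("Org B", ceo_is_bipoc=False, bipoc_executives=3, total_executives=4),
    Nonprofit("Org C", ceo_is_bipoc=True, bipoc_executives=3, total_executives=5),
]

by_ceo = [o.name for o in orgs if led_by_ceo(o)]
by_majority = [o.name for o in orgs if led_by_executive_majority(o)]

print("CEO rule:", by_ceo)
print("Executive-majority rule:", by_majority)
```

In this made-up sample, the two rules agree on only one organization. A count of “BIPOC-led” nonprofits therefore means little unless the rule behind it is reported alongside it.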

Support data transparency in methodologies and limitations. Research, data, and information are too often expected to be taken at face value. Organizations leveraging data or research in any capacity should always clearly state the methods used, limitations, and any known assumptions that could have influenced the data or findings. In particular, data used as a benchmark or a standard for the field should be explicit and transparent about how the data was created, what tests of validity were conducted, and any best practices regarding its use. Doing so will help end users understand what perspectives and norms were included, what sample was used to create the data, and what additional questions remain. Numerous resources exist for data collectors aiming for transparency, such as the American Association for Public Opinion Research’s Transparency Initiative and the National Institutes of Health’s guide to writing about limitations.

Embrace participatory research methods. When attempting to create standardized data, it’s important to hear directly from the groups those standards will impact. Participatory methodologies seek to include and center a broader range of voices and perspectives when designing research and data initiatives. This approach is valuable as it can test assumptions that initial researchers and analysts might be making based on their own knowledge, expertise, or experience. It also challenges assumptions and overgeneralization, and can prevent normalizing and centering the experience of those in power. For example, a recent report on BIPOC-led nonprofits in New York convened a group of nonprofit leaders in New York to determine what operationalizations of “BIPOC-led” to include in the analysis.

Question assumptions early and often. It might sound cliché, but one of the most important elements to support solid data standards and center equity is to challenge assumptions and unpack potential biases. This could be as simple as having a list of questions that those in power should be asking any time data standards are created or analyzed. For example:

Despite an influx of data analytics and tools, the human element intrinsic to the social sector means that no matter how much data we have or how much effort we put into standardization, we are still susceptible to misleading information, hidden biases, and inaccurate conclusions about the people and communities we are committed to serving. However, by adopting approaches that incorporate more voices, clarify terms, are transparent about methodologies, and reduce burdens and barriers, the social sector will build more equity and improve its ability to deliver lasting, meaningful change.

Cathleen Clerkin
Cathleen Clerkin, PhD, is associate vice president of research at Candid.