When I try to check in for international flights online, I’m not always able to. The country of birth listed on my passport — Hong Kong — is only sometimes listed as an option when selecting a country, depending on the airline. When it isn’t, I have to head to the airport early, explain to the staff the unique characteristics of my passport and pray that they’re willing to help me out. Usually, this is more of an inconvenience to me than anything else, but it raises the question: Who gets to decide what a country is? And if someone is arbitrarily making that decision, what lines are being drawn in the context of other, even more fluid realms of data, like gender, sexuality or race?
The answer lies in the history of the database. When the relational model, which organizes data into tables of rows and columns linked by shared keys, was proposed by Edgar F. Codd in 1970, it went on to become the industry standard. Even database systems introduced later to address technical limitations of the classic relational database (e.g., NoSQL databases or distributed SQL databases) were built with the same assumptions about data as their predecessors. As a result, many of today’s data systems still operate under societal norms from five decades ago, which is especially harmful to marginalized groups. Our technology is, in fact, holding us back from social progress. Societal views have evolved to become more inclusive; software developers must start building systems that can adapt to changing views about the human experience.
One of the most eye-opening examples of database design’s harm to marginalized groups is its reductive representation of gender identity. In the mid-20th century, predominantly male and cisgender software engineering circles seldom recognized nonbinary gender identities. As such, when standard database schemas were created, gender was invariably represented as a binary (zero or one), setting up a system that ignores the idea that gender is a spectrum. Storing gender as a single binary value saved space, so the practice became the norm, despite the fact that it could not capture all gender identities. In fact, this design was probably considered elegant and clever.
But since the 1970s, even as perceptions about gender identity have changed dramatically, databases have, for the most part, remained static in how they represent gender. Although many tech companies have made surface-level changes in their user interfaces that acknowledge nonbinary gender identities, few have actually overhauled the ways they store this information in their databases. Take Facebook, for example: Even after users were given the opportunity to specify their genders freely, their identities were still eventually flattened to a zero, one or null — meaning “nothing” — in the site’s internal database. Nearly all entities that use gender data, from external advertisers to internal algorithms, are therefore forced to handle gender as a binary, even though much of society now sees it as a spectrum.
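This flattening is easy to see in miniature. The sketch below is hypothetical, not any company’s real schema or code: it imagines a legacy table whose gender column only admits zero, one or NULL, so that a freely specified identity must be collapsed before it can be stored.

```python
import sqlite3

# Hypothetical sketch (not any real company's schema): a legacy users
# table stores gender as 0, 1 or NULL, so a free-text identity chosen
# in the user interface must be flattened before it can be saved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender INTEGER)")

def flatten_gender(self_description: str):
    """Collapse a free-text gender into the only values the column allows."""
    mapping = {"female": 0, "male": 1}
    # Anything outside the binary is silently reduced to NULL ("nothing").
    return mapping.get(self_description.lower())

for user_id, described in [(1, "Female"), (2, "Male"), (3, "Genderqueer")]:
    conn.execute("INSERT INTO users VALUES (?, ?)",
                 (user_id, flatten_gender(described)))

print(conn.execute("SELECT id, gender FROM users").fetchall())
# The nonbinary user's self-description is lost: their row stores NULL.
```

However the interface dresses it up, every downstream consumer of this table sees only the binary column, which is the dynamic the Facebook example describes.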
The same can be said about the ways our predecessors chose to use databases to store information about race (most databases offer a limited set of categories that don’t capture everyone) and even names (databases often assume a first-last name structure that a number of cultures don’t use). Databases habitually abstract away their data’s insignificant details so that the data can be accessed, analyzed and processed more efficiently. However, if the developers who designed these systems in the 1970s have the final say on which details are insignificant, we trap ourselves in an outdated worldview.
Fixing this issue is more complicated than it seems at first glance, though. Since tech stakeholders rely on receiving gender data as a binary, tech companies have found it difficult to fundamentally revamp their databases and services to be more inclusive of the gender spectrum. Conversely, as long as tech companies don’t change their databases, tech stakeholders aren’t pushed to think of gender as nonbinary, either. Breaking this cycle would mean replacing all the legacy code that expects gender as a binary. Chains of functions that feed each other information under that assumption would all need to be rebuilt, yet ripping them out all at once, rather than gradually, would break the entire system. Cleaning out legacy code will therefore be a slow and painstaking process.
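One common way around the rip-and-replace problem, sketched below with hypothetical table and column names, is an incremental migration: add a more expressive column alongside the legacy binary one, backfill it from the old encoding, and keep the old column consistent so legacy readers don’t break while callers are updated one at a time.

```python
import sqlite3

# Hypothetical incremental migration: instead of deleting the legacy
# binary column (which would break every function that reads it), we
# add a free-text column beside it and backfill, so old and new code
# can coexist while callers are migrated one at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 0), (2, 1), (3, None)])

# Step 1: widen the schema without touching the legacy column.
conn.execute("ALTER TABLE users ADD COLUMN gender_identity TEXT")

# Step 2: backfill the new column from the old encoding where possible.
conn.execute("UPDATE users SET gender_identity = 'female' WHERE gender = 0")
conn.execute("UPDATE users SET gender_identity = 'male' WHERE gender = 1")

# Step 3: new writes go to the expressive column; a shim keeps the
# legacy column consistent for code that still expects a binary.
def set_gender_identity(user_id: int, identity: str) -> None:
    legacy = {"female": 0, "male": 1}.get(identity)  # None if nonbinary
    conn.execute(
        "UPDATE users SET gender_identity = ?, gender = ? WHERE id = ?",
        (identity, legacy, user_id))

set_gender_identity(3, "genderqueer")
print(conn.execute("SELECT id, gender, gender_identity FROM users").fetchall())
# The new column preserves the identity the old one could not.
```

Even in this toy form, the slowness the paragraph describes is visible: the legacy column and its shim must survive until every consumer has been rewritten, which in a real system can take years.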
All that said, making small, surgical fixes to the standard database design will not be a permanent solution. Inevitably, social conventions will evolve, and we’ll have to make new changes. Our databases should reflect those progressions. What is important is that technology is given the flexibility to accommodate society’s developments as we see fit. Software engineers need to maintain a capacity for adaptability in order to foster true inclusivity for future generations.
While this is a daunting challenge, none of this should be taken as an excuse to be complacent in the meantime. Maximizing inclusivity to the extent that we can is an undeniably worthy goal. Moving forward, we must be more cognizant of the fact that when we release software to the public, its impact will be ongoing. The software we build today should reflect our best judgments on what trade-offs are intelligent and what design decisions are fair, but ensuring that we produce flexible code is what will guarantee that these judgments don’t impede our future progress.
Anika Bahl ’24 can be reached at firstname.lastname@example.org. Please send responses to this opinion to email@example.com and other op-eds to firstname.lastname@example.org.