In the early years of the post-genomic era, an intriguing study published in Nature analyzed DNA sequences from over a thousand individuals from regions across Europe. Using principal component analysis (PCA), a mathematical dimensionality reduction technique, the researchers condensed this genetic data into two dimensions for visualization. When each participant’s point on this two-dimensional map was labeled by country of origin, a rough outline of the European continent emerged*. This was a remarkable finding: purely biological data, processed entirely through mathematical and computational means, was enough to predict participants’ regional origins, often to within a few hundred kilometers.
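To make the idea concrete, here is a minimal sketch of how a two-dimensional projection can recover geography from genotypes. It does not reproduce the cited study's pipeline or data: the genotypes are simulated, the sample sizes and gradient strengths are arbitrary assumptions, and principal component analysis is computed directly with a singular value decomposition in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 400, 2000  # hypothetical sample and marker counts

# Hypothetical 2-D home coordinates (think scaled longitude and latitude).
coords = rng.uniform(-1, 1, size=(n_people, 2))

# Assumption: each marker's allele frequency drifts gently along a random
# geographic direction, mimicking limited historical mobility.
gradients = rng.normal(0, 0.1, size=(2, n_snps))
base_freq = rng.uniform(0.2, 0.8, size=n_snps)
freq = np.clip(base_freq + coords @ gradients, 0.05, 0.95)

# Genotype = number of copies (0, 1, or 2) of the allele at each marker.
genotypes = rng.binomial(2, freq)

# PCA via SVD of the column-centered genotype matrix.
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :2] * s[:2]  # each person's position on the first two components

# How well do the two components linearly recover the true coordinates?
design = np.column_stack([pcs, np.ones(n_people)])
weights, *_ = np.linalg.lstsq(design, coords, rcond=None)
predicted = design @ weights
r2 = 1 - ((coords - predicted) ** 2).sum() / ((coords - coords.mean(axis=0)) ** 2).sum()
print(f"share of geographic variance recovered by 2 PCs: {r2:.2f}")
```

The geographic coordinates are never shown to the decomposition; they are used only afterwards to check how much of the map the two components recover.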
The underlying reason is rather simple: aside from major wars and large-scale migrations, human mobility has historically been quite limited. People traditionally married within their villages or, at most, neighboring villages. Even in cultures where consanguineous marriages were uncommon, the gene pool rarely extended beyond a radius of 20–30 km. Thanks to this shared genetic legacy, we can still observe region-specific facial features and physical traits. Yet the same legacy sets a trap when we try to determine which genes or genetic regions influence specific traits. We don’t have a user manual for what each gene does; instead, we rely on crude statistical comparisons. For example, researchers may compare the DNA of 50 individuals with multiple sclerosis (MS) to that of 50 healthy individuals to try to identify genes or genomic regions associated with the disease. This is exactly where the trap lies: some traits, and even some diseases, can be correlated with geography or environment. In such cases, genetic markers that merely reflect geography or environment may be falsely interpreted as causally related to the trait or disease being studied. There are, in fact, ways to eliminate such confounding effects, but these methods tend to be more labor-intensive, time-consuming, and expensive.
Gene-centric, reductionist, and militantly atheistic scientific circles often show reluctance to address such confounding effects. In introductory statistics courses, we illustrate the distinction between correlation and causation with humorous examples. For instance, in the U.S., there is an almost perfect correlation between ice cream consumption and shark attacks throughout the year—but no one thinks that eating ice cream causes shark attacks. The true driver behind both is hot weather: we eat more ice cream and swim in the ocean more often, increasing the likelihood of shark encounters. In statistics, this kind of hidden variable is called a confounder.
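The ice cream and shark example can be simulated in a few lines. The sketch below is illustrative only: the temperature curve, effect sizes, and noise levels are invented assumptions, not real data. Both quantities are generated from temperature alone, and the correlation between them is computed before and after removing the linear effect of temperature.

```python
import numpy as np

def simulate_year(n_days=365, seed=0):
    """Simulate a year in which temperature drives both outcomes (all numbers hypothetical)."""
    rng = np.random.default_rng(seed)
    day = np.arange(n_days)
    # The confounder: a seasonal temperature curve plus day-to-day noise.
    temperature = 15 + 10 * np.sin(2 * np.pi * (day - 100) / 365) + rng.normal(0, 2, n_days)
    # Each outcome depends on temperature, never on the other outcome.
    ice_cream = 2.0 * temperature + rng.normal(0, 5, n_days)
    shark_encounters = 0.5 * temperature + rng.normal(0, 2, n_days)
    return temperature, ice_cream, shark_encounters

def residuals(y, x):
    """Return what is left of y after subtracting its best straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

temperature, ice_cream, sharks = simulate_year()

# Naive correlation: strong, although neither variable affects the other.
raw_r = np.corrcoef(ice_cream, sharks)[0, 1]

# Correlation after adjusting both variables for the confounder.
adjusted_r = np.corrcoef(residuals(ice_cream, temperature),
                         residuals(sharks, temperature))[0, 1]

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

Once temperature is accounted for, the apparent link between the two outcomes largely disappears, which is what adjusting for a confounder means in practice.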
The errors in measuring gene-trait associations are similar—though far less obvious—making them easier to overlook. Geography and environment imprint patterns on the genome, and those same environments can independently influence physical traits. When we compare DNA sequences associated with those traits, we may find correlations that are as causally unrelated as the ice cream–shark attack example. So why are gene-centric scientists more prone to this fallacy? Because imagining the gene as something passive—shaped by and reactive to its environment—undermines the reductionist narrative. It weakens the credibility of a reductionist theory of evolution. A Platonic view of life as a complex, indivisible whole can better capture biological reality and threaten the ideology that’s been constructed over the past 150 years.
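The same logic can be sketched for a gene-trait comparison. In the simulation below, every number is a made-up assumption: two regions differ both in the frequency of a genetic marker and in an environmentally driven disease risk, while the marker has no effect on the disease at all. Comparing cases with controls across the pooled sample still produces a clear difference in allele frequency, and the difference shrinks toward zero when the comparison is made within each region.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000  # individuals per region (hypothetical)

# Assumed regional differences: marker frequency and environmental disease risk.
allele_freq = {"A": 0.2, "B": 0.6}
disease_risk = {"A": 0.05, "B": 0.20}

region = np.repeat(["A", "B"], n)
# Genotype = copies of the marker allele (0, 1, or 2), drawn per region.
genotype = np.concatenate([rng.binomial(2, allele_freq[r], n) for r in ("A", "B")])
# Disease status depends only on the region's environment, never on genotype.
disease = np.concatenate([rng.random(n) < disease_risk[r] for r in ("A", "B")])

def case_control_gap(geno, sick):
    """Allele frequency in cases minus allele frequency in controls."""
    return geno[sick].mean() / 2 - geno[~sick].mean() / 2

# Naive pooled comparison: the marker looks associated with the disease.
pooled_gap = case_control_gap(genotype, disease)

# Stratified comparison: within each region the gap is close to zero.
within_gaps = {r: case_control_gap(genotype[region == r], disease[region == r])
               for r in ("A", "B")}

print(f"pooled gap: {pooled_gap:.3f}")
for r, gap in within_gaps.items():
    print(f"gap within region {r}: {gap:.3f}")
```

Stratifying by region is only one way to handle such structure, and it works here because the confounder is known and recorded; in real data the relevant geography or environment is often only partly observed.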
Gene-centered circles sometimes persist in this view even at the risk of ridicule. For instance, a study published as recently as late 2024 in the prestigious journal PNAS attempted to find a “wealth gene” by comparing the DNA of over 350,000 individuals in the UK Biobank with their socioeconomic status**. Indeed, the authors identified patterns in the genome that correlated with wealth. Although the paper briefly acknowledged as a limitation that it did not fully account for familial and environmental confounders, this omission should have invalidated the study altogether. Of course, centuries of aristocratic gene pools would form certain patterns, and of course a correlation between the DNA of these socioeconomically privileged groups and their wealth would yield another classic ice cream–shark attack fallacy.
So far, we’ve only discussed DNA sequences. The epigenetic dimension of biology also deserves attention: here the DNA sequence remains unchanged, but environmental factors can trigger molecular modifications that alter gene expression, and these changes can be inherited. For example, in certain genetically predisposed forms of diabetes, lifestyle choices like healthy eating and regular exercise can cause molecular alterations without changing your DNA code. These modifications might lead to milder forms of the disease and can even be passed on to your children. Of course, epigenetic changes can also be negative. This means that what we biologically pass on to the next generation isn’t limited to DNA sequences; it may also include the molecular consequences of our diet, environmental exposure, stress, or perhaps even our spirituality.
Findings like these, which reverse the assumed direction of biological information flow and force us to explain biology as an inseparable whole rather than as reducible parts, continue to challenge reductionist biology.
Ultimately, the relationship between geography and genes can be interpreted not just through DNA sequence similarities shaped by gene pools, but also through epigenetic mechanisms. While reductionist biologists continue to chase after supposed genes for wealth, immortality, homosexuality, and religious belief, or even genes that determine where we settle and thus our geography, life continues to unfold along a complex, unpredictable trajectory shaped by our inherited legacy, our environmental context, and our acquired choices. In the end, echoing “Geography is destiny,” the slogan often misattributed to Ibn Khaldun, we might as well say: “Geography is the gene.” And we wouldn’t be entirely wrong.
* doi: 10.1038/nature07331
** doi: 10.1073/pnas.2414018122