A clade is a group of organisms consisting of a common ancestor and all of its descendants. Identifying clades in human genetics maps out the ancestry of human groups. Like most genetic analysis of human populations, it is based on identifying alleles, the variant forms of human genes.
As far as I am aware, the most complete study currently available identifying the structure of human clades is Human population history revealed by a supertree approach. Compiled by two researchers from the University of South Bohemia, the 2016 study incorporates the following chart.
The geographic mapping of large clades obviously has some connection to our current folk notion of race, but is hardly a close match. It is true that cluster analysis, as in this 2002 study, can get us to groupings of human populations something like the folk notion of race. But cluster analysis simply looks at similarities, while clade analysis is based on identifiable underlying causal structure (specifically, ancestry).
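To make the distinction concrete, here is a minimal toy sketch in Python. Everything in it is invented for illustration: the population labels (P1 to P4), the ancestry tree, and the allele frequencies are hypothetical and come from neither study, and the "clustering" is just a nearest-pair lookup rather than the method used in the 2002 paper. It reads clades off the tree, then finds the most similar pair of populations by allele frequency, and shows that the two need not agree.

```python
from itertools import combinations

# Hypothetical ancestry tree as nested tuples; leaves are made-up labels.
TREE = (("P1", "P2"), ("P3", "P4"))

# Hypothetical allele frequencies at four loci (invented numbers).
FREQS = {
    "P1": [0.10, 0.20, 0.80, 0.30],
    "P2": [0.40, 0.45, 0.55, 0.50],
    "P3": [0.45, 0.50, 0.50, 0.55],
    "P4": [0.90, 0.85, 0.20, 0.80],
}


def clades(tree):
    """Return (leaves under this node, set of all clades in this subtree)."""
    if isinstance(tree, str):
        # A single leaf: no multi-member clade to record.
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    # Every internal node defines a clade: all leaves descended from it.
    found.add(leaves)
    return leaves, found


def distance(a, b):
    """Mean absolute difference in allele frequency across loci."""
    return sum(abs(x - y) for x, y in zip(FREQS[a], FREQS[b])) / len(FREQS[a])


def closest_pair(names):
    """The two populations that look most alike, ignoring ancestry."""
    return frozenset(min(combinations(sorted(names), 2),
                         key=lambda pair: distance(*pair)))


_, ALL_CLADES = clades(TREE)
CLUSTER = closest_pair(FREQS)

print("clades:", sorted(sorted(c) for c in ALL_CLADES))
print("most similar pair:", sorted(CLUSTER))
print("is that pair a clade?", CLUSTER in ALL_CLADES)
```

In this made-up data, P2 and P3 look most alike even though they sit on different branches of the tree, so the similarity grouping is not a clade. Similarity can arise without shared exclusive ancestry, which is why the two approaches can disagree.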
In the 2002 study, Genetic Structure of Human Populations, by biologists from a range of institutions and countries, a relatively mathematically robust grouping was found at K=5 clusters, which does match the folk notion of race quite well. But mathematically robust groupings were also found at various other values of K. As the authors conclude:
The challenge of genetic studies of human history is to use the small amount of genetic differentiation among populations to infer the history of human migrations. Because most alleles are widespread, genetic differences among human populations derive mainly from gradations in allele frequencies rather than from distinctive “diagnostic” genotypes. Indeed, it was only in the accumulation of small allele-frequency differences across many loci that population structure was identified. Patterns of modern human population structure discussed here can be used to guide construction of historical models of migration and admixture that will be useful in inferential studies of human genetic history.
Which is what identifying clades does much more directly.
So clades, not clusters. If the human biodiversity folk are intellectually serious, they should base their analysis on clades, not on whatever clustering seems otherwise convenient. While the folk notion of race is not entirely silly, it is nowhere near analytically robust enough to be of use in analysing, well, anything, really.
In particular, classifying people by race strips them of their cultural and civilisational legacies, which are much more important collections of causal factors than genetic clusters that match patterns of ancestry fairly poorly. As the authors of the 2016 study note:
The linguistic classification fits rather poorly on the supertree topology, supporting a view that direct coevolution between genes and languages is far from universal.
Thus, for example:
The poor fit of Macro-Altaic and the families that constitutes it (especially the Turkic) is in agreement with the fact that there is only a weak unifying genetic signal for the Turkic-speaking populations across Eurasia. The expansion of Turkic languages has probably been largely mediated by language replacements rather than demic expansion.
We are the cultural species. A basic reality that race talk both ignores and gets in the way of understanding. Even ancestry is at best a partial match with culture.
Race talk is pretty dreadful for analysis of social patterns but remains good for one thing: racial stigmatisation (brilliantly analysed by economist Glenn Loury). All sorts of people have found race talk useful for that purpose, and still do, but that is not remotely a recommendation for race talk. Indeed, it remains true that implicit or explicit racial stigmatisation is by far the dominant reason for the use of race talk. Hence, the best way to understand race talk is to look for the patterns of stigmatisation that underlie it.
So, clades, not clusters, and even clades don’t get us all that far, analytically speaking.
[Cross-posted from Thinking Out Aloud.]