Monday, 15 October 2012

Macro-haplogroup L (mtDNA)

From Wikipedia, the free encyclopedia
Haplogroup L
Time of origin: 151,600–233,600 YBP[1]
Place of origin: Eastern Africa[2]
Descendants: L0, L1-6
In human mitochondrial genetics, L is the mitochondrial DNA macro-haplogroup at the root of the human mtDNA phylogenetic tree. As such, it represents the most ancestral mitochondrial lineage of all currently living modern humans and is, by definition, the mitochondrial haplogroup carried by Mitochondrial Eve.
It is generally taken as evidence of an ultimate African origin of modern humans. Its major subclades include L0, L1, L2, L3, L4, L5 and L6, and all non-Africans descend exclusively from haplogroup L3.
Setting aside the descendants of haplogroup L3, the designation "haplogroup L" is typically used for the family of mtDNA clades most frequently found in Sub-Saharan Africa. However, all non-African haplogroups coalesce onto either haplogroup M or haplogroup N, and both of these macrohaplogroups are simply sub-branches of haplogroup L3. Consequently, L in its broadest definition is a paragroup containing all of modern humanity, and all human mitochondrial DNA lineages from around the world are subclades of haplogroup L. Haplogroups M and N are sometimes referred to as haplogroups L3M and L3N, respectively. Mitochondrial Eve is defined as the female human ancestor who is the most recent common ancestor of the most deep-rooted lineages of humanity: haplogroups L, L0 and L1-6.
An alternative theory maintains that it is incorrect to call mtDNA macrohaplogroup L a paragroup containing all of humanity, or to assert that the other macrohaplogroups M and N are subsets or subclades of L. Although M and N may have evolved from L, each macrohaplogroup (L, M and N) is definitively distinct, separate and independent. In addition, the Haplogroup L phylogeny diagram (below, right) incorrectly depicts macrohaplogroup M as a linear extension of haplogroup L3; M should instead be depicted as a sub-branch (like macrohaplogroup N) that is above and perpendicular to L3.[citation needed]
Haplogroup L phylogeny

Studies of human mitochondrial (mt) DNA genomes demonstrate that the root of the human phylogenetic tree occurs in sub-Saharan Africa. The data suggest that Tanzanians have high genetic diversity and possess ancient mtDNA haplogroups, some of which are either rare or absent in other regions of Africa. A large and diverse human population has persisted in eastern Africa, and that region may have been an ancient source of dispersal of modern humans both within and outside of Africa.[2]
Mitochondrial Eve is the ancestor of this macro-haplogroup and she is estimated to have lived approximately 190,000 years ago.[1]


Setting aside its sub-branches M and N, L haplogroups predominate throughout sub-Saharan Africa, where L reaches 96-100% except in areas into which Afroasiatic languages have spread, where it is lower. Low frequencies are found in North Africa, the Arabian Peninsula, the Middle East and Europe.

Sub-Saharan Africa

With the exception of a number of lineages that returned to Africa from Eurasia after the out-of-Africa migration, all sub-Saharan African lineages belong to haplogroup L. The "back-to-Africa" haplogroups, including U6, X1 and possibly M1, may have returned to Africa as far back as 45,000 years ago.[3] Haplogroup H, which is common among Berbers, is also believed to have entered Africa from Europe during the post-glacial expansion.[4]
The mutations used to identify the basal lineages of haplogroup L are ancient and may be 150,000 years old. The great time depth of these lineages means that the substructure of this haplogroup within Africa is complex and, at present, poorly understood.[5] The first split within haplogroup L occurred 140-200 kya, with the mutations that define macrohaplogroups L0 and L1-6. These two haplogroups are found throughout Africa at varying frequencies and thus exhibit an entangled pattern of mtDNA variation. However, the distribution of some subclades of haplogroup L is structured around geographic or ethnic units. For example, the deepest clades of haplogroup L0, L0d and L0k, are almost exclusively restricted to the Khoisan of southern Africa. L0d has also been detected among the Sandawe of Tanzania, which suggests an ancient connection between the Khoisan and East African populations.[6]

Macro-haplogroup L (mtDNA) composition within Africa. Approximate frequencies in: 1. North Africa.[7][8] 2. Sudan.[8] 3. Ethiopia.[8][9] 4. West Africa.[7] 5. East Africa (Kenya, Uganda, Tanzania).[8][10][11] 6. Southeast Africa (Mozambique).[12] 7. Native Southern Africans (!Xung, !Kung and Khwe Khoisan).[10][13] 8. Mbenga Pygmies (Baka, Bi-Aka and Ba-Kola).[10][14] 9. Ba-Mbuti Pygmies.[10] 10. Hadza/Sandawe.[10]

North Africa

Haplogroup L is also found at moderate frequencies in North Africa. For example, the various Berber populations have frequencies of haplogroup L lineages ranging from 3% to 45%.[15][16] Haplogroup L has also been found at a low frequency of 2.2% in North African Jews from Morocco, Tunisia and Libya, with the highest frequency in Libyan Jews (3.6%).[17]

West Asia

Haplogroup L is also found in West Asia at low to moderate frequencies, most notably in Yemen, where frequencies as high as 60% have been reported.[18] It is also found at 15.50% in Bedouins from Israel, 13.68% in Palestinians, 12.55% in Jordan, 9.48% in Iraq, 9.15% in Syria, 6.66% in Saudi Arabia, 2.84% in Lebanon, 2.60% in Druzes from Israel, 2.44% in Kurds and 1.76% in Turks.[17][19][20]


Europe

L lineages are relatively infrequent (1% or less) throughout Europe, with the exception of Iberia, where frequencies as high as 22% have been reported, and some regions of Italy, where frequencies of 2 to 3% have been found. According to a 2012 study by Cerezo et al., about 65% of the European L lineages most likely arrived in fairly recent historical times (the Romanization period, the Arab conquest of the Iberian Peninsula and Sicily, and the Atlantic slave trade), while about 35% of L mtDNAs form European-specific subclades, revealing that there was gene flow from sub-Saharan Africa toward Europe as early as 11,000 years ago.[21]
In Iberia the frequency is higher in Portugal (5.83%) than in Spain (1.61%). Furthermore, in western Iberia, frequencies increase from Galicia (3.26%) and northern Portugal (3.21%) through the center (5.02%) to the south of Portugal (11.38%).[22] Relatively high frequencies of 7.40% and 8.30% were also reported by Casas et al. 2006 in South Iberia (Spain and Portugal) and in the present-day population of Priego de Cordoba, respectively.[23] Significant frequencies were also found in the autonomous regions of Portugal, with L haplogroups constituting about 13% of the lineages in Madeira and 3.4% in the Azores. In the Spanish archipelago of the Canary Islands, frequencies of 6.6% have been reported.[24] According to some researchers, L lineages in Iberia are associated with the Islamic invasions, while for others they may reflect more ancient processes as well as more recent ones, such as the introduction of these lineages through the modern slave trade. The highest frequencies of sub-Saharan lineages found so far in Europe were observed by Alvarez et al. 2010 in the comarca of Sayago (18.2%), which according to the authors is "comparable to that described for the South of Portugal",[25][26] and by Pereira et al. 2010 in Alcacer do Sal (22%).[27]
In Italy, haplogroup L lineages are present at frequencies between 2 and 3% in some regions: Latium (2.90%), Tuscany,[19] Basilicata and Sicily.[28]

The Americas

Haplogroup L lineages are found throughout the African diaspora of the Americas and are predominant among African Americans, Afro-Caribbeans and Afro-Latin Americans. In Brazil, Pena et al. report that 85% of self-identified Afro-Brazilians carry haplogroup L mtDNA sequences.[29] Haplogroup L lineages are also found at moderate frequencies in self-identified White Brazilians: Alves-Silva et al. report that 28% of a sample of White Brazilians belong to haplogroup L.[30] In Argentina, a minor contribution of African lineages was observed throughout the country.[31] Haplogroup L lineages have also been reported at 8% in Colombia[32] and at 4.50% in North-Central Mexico.[33] In North America, haplogroup L lineages were reported at a frequency of 0.90% in White Americans of European ancestry.[34]

Frequencies (> 1%)

Region | Population or country | Number tested | Reference | Haplogroup L frequency
East Africa | Somalis | 26 | Watson et al. (1997) | 50.00%
East Africa | Sudan | 112 | Afonso et al. (2008) | 72.50%
East Africa | Ethiopia | 270 | Kivisild et al. (2004) | 52.20%
North Africa | Libya (Jews) | 83 | Behar et al. (2008) | 3.60%
North Africa | Tunisia (Jews) | 37 | Behar et al. (2008) | 2.20%
North Africa | Morocco (Jews) | 149 | Behar et al. (2008) | 1.34%
North Africa | Tunisia | 64 | Turchi et al. (2009) | 48.40%
North Africa | Tunisia (Takrouna) | 33 | Frigi et al. (2006) | 3.03%
North Africa | Tunisia (Zriba) | 50 | Turchi et al. (2009) | 8.00%
North Africa | Morocco | 56 | Turchi et al. (2009) | 25.00%
North Africa | Morocco (Berbers) | 64 | Turchi et al. (2009) | 3.20%
North Africa | Algeria (Mozabites) | 85 | Turchi et al. (2009) | 12.90%
North Africa | Algeria | 47 | Turchi et al. (2009) | 20.70%
Europe | Italy (Latium) | 138 | Achilli et al. (2007) | 2.90%
Europe | Italy (Volterra) | 114 | Achilli et al. (2007) | 2.60%
Europe | Italy (Basilicata) | 92 | Ottoni et al. (2009) | 2.20%
Europe | Italy (Sicily) | 154 | Ottoni et al. (2009) | 2.00%
Europe | Spain | 312 | Alvarez et al. (2007) | 2.90%
Europe | Spain (Galicia) | 92 | Pereira et al. (2005) | 3.30%
Europe | Spain (North East) | 118 | Pereira et al. (2005) | 2.54%
Europe | Spain (Priego de Cordoba) | 108 | Casas et al. (2006) | 8.30%
Europe | Spain (Zamora) | 214 | Alvarez et al. (2010) | 4.70%
Europe | Spain (Sayago) | 33 | Alvarez et al. (2010) | 18.18%
Europe | Spain (Catalonia) | 101 | Alvarez-Iglesias et al. (2009) | 2.97%
Europe | South Iberia | 310 | Casas et al. (2006) | 7.40%
Europe | Spain (Canaries) | 300 | Brehm et al. (2003) | 6.60%
Europe | Spain (Balearic Islands) | 231 | Picornell et al. (2005) | 2.20%
Europe | Portugal | 594 | Achilli et al. (2007) | 6.90%
Europe | Portugal (North) | 188 | Achilli et al. (2007) | 3.19%
Europe | Portugal (Central) | 203 | Achilli et al. (2007) | 6.40%
Europe | Portugal (South) | 203 | Achilli et al. (2007) | 10.84%
Europe | Portugal | 549 | Pereira et al. (2005) | 5.83%
Europe | Portugal (North) | 187 | Pereira et al. (2005) | 3.21%
Europe | Portugal (Central) | 239 | Pereira et al. (2005) | 5.02%
Europe | Portugal (South) | 123 | Pereira et al. (2005) | 11.38%
Europe | Portugal (Madeira) | 155 | Brehm et al. (2003) | 12.90%
Europe | Portugal (Azores) | 179 | Brehm et al. (2003) | 3.40%
Europe | Portugal (Alcacer do Sal) | 50 | Pereira et al. (2010) | 22.00%
Europe | Portugal (Coruche) | 160 | Pereira et al. (2010) | 8.70%
Europe | Portugal (Pias) | 75 | Pereira et al. (2010) | 3.90%
West Asia | Yemen | 115 | Kivisild et al. (2004) | 45.70%
West Asia | Yemen (Jews) | 119 | Behar et al. (2008) | 16.81%
West Asia | Bedouins (Israel) | 58 | Behar et al. (2008) | 15.50%
West Asia | Palestinians (Israel) | 117 | Achilli et al. (2007) | 13.68%
West Asia | Jordan | 494 | Achilli et al. (2007) | 12.50%
West Asia | Iraq | 116 | Achilli et al. (2007) | 9.48%
West Asia | Syria | 328 | Achilli et al. (2007) | 9.15%
West Asia | Saudi Arabia | 120 | Abu-Amero et al. (2007) | 6.66%
West Asia | Lebanon | 176 | Achilli et al. (2007) | 2.84%
West Asia | Druzes (Israel) | 77 | Behar et al. (2008) | 2.60%
West Asia | Kurds | 82 | Achilli et al. (2007) | 2.44%
West Asia | Turkey | 340 | Achilli et al. (2007) | 1.76%
South America | Colombia (Antioquia) | 113 | Bedoya et al. (2006) | 8.00%
North America | Mexico (North-Central) | 223 | Green et al. (2000) | 4.50%
South America | Argentina | 246 | Corach et al. (2009) | 2.03%

See also

The African Diaspora: Mitochondrial DNA and the Atlantic Slave Trade


Between the 15th and 19th centuries AD, the Atlantic slave trade resulted in the forced movement of ∼13 million people from Africa, mainly to the Americas. Only ∼11 million survived the passage, and many more died in the early years of captivity. We have studied 481 mitochondrial DNAs (mtDNAs) of recent African ancestry in the Americas and in Eurasia, in an attempt to trace them back to particular regions of Africa. Our results show that mtDNAs in America and Eurasia can, in many cases, be traced to broad geographical regions within Africa, largely in accordance with historical evidence, and raise the possibility that a greater resolution may be possible in the future. However, they also indicate that, at least for the moment, considerable caution is warranted when assessing claims to be able to trace the ancestry of particular lineages to a particular locality within modern-day Africa.


Slavery has been a feature of human societies since antiquity, but the scale of the Atlantic slave trade was unprecedented. This African diaspora was the result of the enslavement in the Americas of probably ∼11 million people, and at least 2 million more died during the middle passage. The traders spread from northwestern Africa around the Atlantic coast, reaching Angola by the 17th century and Mozambique by the 19th. Historical records suggest that western Africa contributed ∼8 million people, west-central Africa (from Cameroon down to Angola) ∼4 million, and Mozambique/Madagascar in the southeast ∼1 million more, although agreement on details is lacking (Curtin 1969; Fage 1969; Thomas 1998). The scale of the displacements and the European prejudice against African populations meant that, rather than the assimilation of the imported populations, as had often been the case in antiquity (Lewis 1990), the result was often the creation of new and distinctive populations in the Americas (Sans 2000).
African mtDNAs fall into several distinctive paragroups (paraphyletic clusters of mtDNA lineages) and monophyletic haplogroups. The nomenclature has been amended following the suggestion of Mishmar et al. (2003), so that L1a, L1d, L1f, and L1k are here renamed L0a, L0d, L0f, and L0k, respectively. These haplogroups, along with L1b’c, L1e, L2, and L3A, evolved in sub-Saharan Africa (Chen et al. 1995, 2000; Watson et al. 1997), whereas U6 is thought to have evolved in northern Africa (Rando et al. 1998; Macaulay et al. 1999). Each of these major clusters can be divided on the basis of various diagnostic mutations into subhaplogroups (Torroni et al. 2001; Salas et al. 2002), which can be used as markers of recent migrations when found in other parts of the world.
Alves-Silva et al. (2000), for example, found that 28% of a sample of mainly “white” subjects from Brazil were of recent African maternal ancestry, with substantial variation from region to region. Some lineages could be readily attributed to arrivals from western Africa, which had already been extensively sampled, but almost 50% belonged to haplogroups L1c and L3e, which are rare in western Africa. Their presence at much higher frequencies in small west-central African data sets suggested that the Brazilian L1c and L3e mtDNAs might be of largely west-central African origin (see also Bandelt et al. 2001). Indeed, major sources for Brazilian slaves are thought, on the basis of historical records, to have been Congo and Angola (e.g., Curtin 1969; Thomas 1998). Alves-Silva et al. (2000) predicted that, when these regions were sampled for mtDNA variation, L1c and L3e would be found at high frequencies.
Brehm et al. (2002) identified mtDNA lineages transported proximally from the western Atlantic coast to the Cabo Verde islands, many en route to the Caribbean and Brazil, and Pereira et al. (2001) initiated work on the slave trade from southeastern Africa. They discovered numerous matches between Mozambican mtDNAs and those of America and, to a lesser extent, those of recent African ancestry in Europe (see also Salas et al. [2002]).
There have also been several attempts, using autosomal, mtDNA, and Y-chromosomal markers, to address the question of the extent of recent African versus European and Native American ancestry in American populations (Chakraborty et al. 1992; Parra et al. 1998, 2001; Mesa et al. 2000). However, attempts have not hitherto been made to trace American ancestry to particular regions within Africa. The accumulation of mtDNA data from both African populations and American populations with putative African ancestry has now reached the point where this can be pursued. In this study, we synthesize this body of evidence and attempt to quantify the contributions of the major subregions of Africa to mtDNAs in the Americas and Eurasia.

Material and Methods

Population Samples

We compiled a database from 481 individuals carrying African mtDNAs belonging to seven available American samples harboring a major African component. The sample from North America included 101 African Americans from the United States (HvrBase database; for details, see Handt et al. 1998). The sample from Central America/Caribbean included 8 from Mexico (Green et al. 2000), 25 Carib from Belize (Monsalve and Hagelberg 1997), 112 from the Dominican Republic (Torroni et al. 2001; A.T., unpublished data), 41 Chocó from Colombia, and 37 Garífuna from Panama and Belize (A. Salas, M. Richards, M.-V. Lareu, S. Silva, M. Matamoros, V. Macaulay, and A. Carracedo, unpublished data). The sample from South America included 157 Brazilians: 29 from the study by Bortolini et al. (1997), 68 from the study by Alves-Silva et al. (2000), and 60 from the study by Santos et al. (2002). The northern African sample (650 total) included 30 Mauritanians, 25 Saharans, 60 Berbers from Morocco, and 32 Moroccans (Rando et al. 1998); 50 Moroccans from the Souss Valley (Brakez et al. 2001); 68 Egyptians (Krings et al. 1999); 85 Algerian Berbers (Côrte-Real et al. 1996); and 300 Canarians (246 from the study by Rando et al. [1998] and 54 from the study by Pinto et al. [1996]).

The western African sample (694 total) included 20 Hausa, 14 Kanuri, 60 Fulbe, 10 Songhai, 23 Tuareg, and 21 Yoruba (Watson et al. 1996); 14 further Yoruba (Vigilant et al. 1991); 23 Serer, 50 Senegalese, and 48 Wolof (Rando et al. 1998); 119 Mandenka (Graven et al. 1995); and 292 individuals from the Cabo Verde islands (Brehm et al. 2002). The west-central African sample (179 total) included 50 individuals from the islands of São Tomé and Príncipe and 45 Bubi from the island of Bioko (Mateu et al. 1997); 11 Fang from Guinea (Pinto et al. 1996); 43 Angolans (Santos et al. 2002); 17 Biaka and 13 Mbuti (Vigilant et al. 1991). The eastern African sample (335 total) included 37 Turkana, 27 Somali, and 25 Kikuyu (Watson et al. 1996); 96 Nubians and 76 Sudanese (Krings et al. 1999); and 74 Ethiopians (Thomas et al. 2002). The southeastern African sample included 417 individuals from Mozambique (Pereira et al. 2001; Salas et al. 2002). The southern African sample (99 total) included 25 !Kung and 31 Khwe (Chen et al. 2000); and 43 further !Kung (Vigilant et al. 1991).

We also surveyed the mtDNA first hypervariable segment (HVS-I) of ∼15,000 individuals from Eurasia and found 113 mtDNAs of recent African ancestry. We excluded another 28 sequences from the database, possibly belonging to L3 but not characterized by any diagnostic sites in our data.
All sequences correspond to HVS-I of the control region; the nucleotide positions considered included at least positions 16090–16365, according to the numbering system of the Cambridge Reference Sequence (Andrews et al. 1999). For some purposes, we also used HVS-I information from outside the minimum segment, as well as RFLP and HVS-II data. Length variation was excluded. Transitions are indicated by the nucleotide position minus 16,000, and transversions are indicated by a suffix.
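As an illustration of the shorthand just described, the following minimal Python sketch turns a single substitution, given in Cambridge Reference Sequence numbering, into its short HVS-I label. The helpers `hvs1_label` and `in_minimum_segment` are hypothetical names introduced here; the sketch assumes the reference and observed bases are already known and, like the analysis, ignores length variation.

```python
# Minimal sketch (hypothetical helpers) of the HVS-I shorthand:
# transitions -> position minus 16,000; transversions -> same, plus the new base.

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence


def hvs1_label(position: int, ref_base: str, alt_base: str) -> str:
    """Return the short label for one substitution at an rCRS position."""
    ref, alt = ref_base.upper(), alt_base.upper()
    if not 16001 <= position <= RCRS_LENGTH:
        raise ValueError("expected an rCRS position in the 16,001-16,569 range")
    if ref == alt or {ref, alt} - set("ACGT"):
        raise ValueError("expected two different bases from A, C, G, T")
    short = position - 16000
    # Transitions (purine<->purine, pyrimidine<->pyrimidine) carry no suffix.
    if (ref, alt) in TRANSITIONS:
        return str(short)
    # Transversions keep the observed base as a suffix.
    return f"{short}{alt}"


def in_minimum_segment(position: int) -> bool:
    """True if the position lies in the 16090-16365 segment covered by all sequences."""
    return 16090 <= position <= 16365


if __name__ == "__main__":
    print(hvs1_label(16223, "C", "T"))   # transition   -> 223
    print(hvs1_label(16114, "C", "A"))   # transversion -> 114A
    print(in_minimum_segment(16400))     # False
```

For example, a C-to-T transition at 16223 is written 223, whereas a hypothetical C-to-A transversion at 16114 is written 114A.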

Phylogeographic Analysis

The haplogroup classification of mtDNA sequences was updated from Salas et al. (2002); see table 1 for HVS-I motifs (cf. Chen et al. 1995; Watson et al. 1997; Rando et al. 1998; Quintana-Murci et al. 1999; Alves-Silva et al. 2000; Bandelt et al. 2001; Pereira et al. 2001; Torroni et al. 2001). Richards et al. (2003) discuss mtDNA haplogroups that may be regarded as of recent African ancestry.
Table 1
HVS-I Sequence Motifs Used for Haplogroup Classification
We constructed phylogenetic networks (Bandelt et al. 1995, 1999) by hand, on the basis of the information from Salas et al. (2002), with the aid of the Network 3.0 package (Fluxus Engineering Web site). We performed principal component (PC) analysis based on haplogroup frequencies, broken down as follows: U6, M1, L0a, L0d/k, L1b, L1c, L2a, L2b, L2c, L3b, L3d, L3e, L1/L2 remainder, L3A remainder, and non-African haplogroups. The entire data sets were used in the case of Africans, whereas only mtDNAs of recent African ancestry were used in the case of Americans. We excluded some outliers from the PC analysis, as in Salas et al. (2002), including the Khoisan samples, with their high levels of L0d/L0k; some northern African populations, with moderate levels of U6; and the Biaka, with their very high frequency of L1c. However, it should be noted that three of the L1c HVS-I sequence types in the Biaka sample match American types (see fig. 3). We also excluded the Mexican sample because of its small size.
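The principal component step can be sketched as below. This is a minimal illustration that assumes a covariance-based PCA on centred but unscaled haplogroup frequencies, computed with NumPy's SVD; the text does not state whether frequencies were standardized, and the toy profiles are invented. In a real analysis each row would be one population and the columns would be the 15 haplogroup categories listed above.

```python
import numpy as np


def haplogroup_pca(freqs, n_components=2):
    """PCA of population haplogroup-frequency profiles (rows = populations).

    Assumes a covariance-based PCA: columns are centred but not rescaled.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                           # centre each haplogroup column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # population coordinates
    variance = S ** 2
    ratio = variance / variance.sum()                 # share of variance per component
    return scores, ratio[:n_components]


if __name__ == "__main__":
    # Invented toy profiles: 4 populations x 3 haplogroup categories.
    toy = [[0.6, 0.3, 0.1],
           [0.5, 0.3, 0.2],
           [0.1, 0.4, 0.5],
           [0.2, 0.2, 0.6]]
    scores, ratio = haplogroup_pca(toy)
    print(scores.round(3))
    print(ratio.round(3))   # fractions of variance carried by PC1 and PC2
```

The first column of `scores` plays the role of PC1 and the second of PC2; the variance shares are the analogue of the 30% and 19% quoted for the real data.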
Figure 3
Phylogenetic network of shared mtDNA HVS-I sequence types of recent African ancestry in America, Eurasia, and Africa. The charts show: (a) the proportion in each main region of those specific haplotypes shared by >15 individuals (yellow frame ...
We estimated sequence diversity as $H = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$, where $p_i$ is the frequency of each of the $k$ different sequences in the sample and $n$ is the sample size, and we evaluated the mean pairwise nucleotide difference, π, as well as the number of segregating sites, S, through use of Arlequin 2.0 (Arlequin's Home on the Web). We also computed haplotypic and segregation site mutation-drift statistics (θk and θS, respectively) (Helgason et al. 2003).
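The diversity statistics named above can be approximated with textbook estimators in a short, dependency-free sketch. It assumes aligned HVS-I sequences of equal length with no missing data, uses Watterson's estimator for θS, and solves Ewens' relation between the number of distinct haplotypes and θ for θk. Arlequin's own implementation may differ in details, so this is illustrative rather than a reproduction of the published values.

```python
import math
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """H = n/(n-1) * (1 - sum_i p_i^2) over the k distinct sequences."""
    n = len(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in Counter(seqs).values()))


def mean_pairwise_differences(seqs):
    """pi: mean number of differing positions over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)


def segregating_sites(seqs):
    """S: number of aligned positions that are not identical in every sequence."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def theta_s(seqs):
    """Watterson's estimator: S / a_n with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n


def theta_k(seqs, rel_tol=1e-10):
    """Solve Ewens' relation k = sum_{i=0}^{n-1} theta / (theta + i) by bisection."""
    n, k = len(seqs), len(set(seqs))
    if k == 1:
        return 0.0
    if k == n:
        return math.inf          # every sequence distinct: no finite solution

    def expected_k(theta):
        return sum(theta / (theta + i) for i in range(n))

    lo, hi = 0.0, 1.0
    while expected_k(hi) < k:    # expected_k increases with theta
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if expected_k(mid) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    sample = ["AAAA", "AAAA", "AATA", "GATA"]   # invented aligned fragments
    print(haplotype_diversity(sample), mean_pairwise_differences(sample),
          segregating_sites(sample), theta_s(sample), theta_k(sample))
```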

Admixture Analysis

To quantify the magnitude of the impact of each African region on cluster frequencies in the Americas, we fitted the following model. The number of mtDNAs in each cluster in the sample from a region of the Americas ($n_i$, $1 \le i \le C$, where $C$ is the number of clusters) was assumed to be a draw from a multinomial distribution with parameters $N = \sum_{i=1}^{C} n_i$, the sample size in the American region, and $q_i = \sum_{j=1}^{R} a_j f_{ji}$ ($1 \le i \le C$), where $R$ is the number of source regions in Africa, $f_{ji}$ is the frequency of the $i$th cluster in the $j$th source region (assumed to be known), and the $a_j$ are the admixture coefficients. This model describes samples from an urn with $C$ different kinds of ball, where the urn has been created by mixing together $R$ other urns in proportions given by the admixture coefficients. We chose to analyze this model in a Bayesian framework, which meant that we had to explore the distribution of the admixture coefficients, given the data. The prior distribution of the admixture coefficients was taken to be uninformative, namely uniform on the simplex $\{(a_1, \dots, a_R) : a_j \ge 0, \sum_{j=1}^{R} a_j = 1\}$. The posterior distribution of $a_j$ was explored with the Metropolis-Hastings algorithm, using a simple proposal, and was summarized by the posterior mean of each $a_j$ and its root-mean-square deviation about the mean.
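A minimal sketch of such an admixture sampler follows. The text says only that a "simple proposal" was used; the proposal here, which moves a small amount of mass between two randomly chosen source regions and rejects moves that leave the simplex, is an assumption. Because that proposal is symmetric and the prior is uniform, the Metropolis-Hastings acceptance ratio reduces to the likelihood ratio. Function names and tuning values (`step`, `n_iter`, `burn_in`) are hypothetical, and at least two source regions are assumed.

```python
import math
import random


def log_likelihood(a, f, counts):
    """Multinomial log-likelihood, up to a constant, of cluster counts n_i
    with cell probabilities q_i = sum_j a_j * f[j][i]."""
    total = 0.0
    for i, n_i in enumerate(counts):
        q_i = sum(a_j * f_j[i] for a_j, f_j in zip(a, f))
        if n_i > 0:
            if q_i <= 0.0:
                return -math.inf
            total += n_i * math.log(q_i)
    return total


def sample_admixture(f, counts, n_iter=20000, burn_in=5000, step=0.05, seed=1):
    """Random-walk Metropolis-Hastings for admixture coefficients a_j under a
    uniform prior on the simplex. Returns (posterior means, RMS deviations)."""
    rng = random.Random(seed)
    R = len(f)                               # number of source regions (>= 2)
    a = [1.0 / R] * R                        # start at the centre of the simplex
    ll = log_likelihood(a, f, counts)
    draws = []
    for it in range(n_iter):
        j, k = rng.sample(range(R), 2)       # pick two regions to trade mass
        delta = rng.uniform(-step, step)
        proposal = a[:]
        proposal[j] += delta
        proposal[k] -= delta
        if proposal[j] >= 0.0 and proposal[k] >= 0.0:   # stay on the simplex
            ll_new = log_likelihood(proposal, f, counts)
            # Symmetric proposal and flat prior: accept with prob min(1, L'/L).
            if ll_new >= ll or rng.random() < math.exp(ll_new - ll):
                a, ll = proposal, ll_new
        if it >= burn_in:
            draws.append(a[:])               # rejected moves repeat the state
    m = len(draws)
    means = [sum(d[j] for d in draws) / m for j in range(R)]
    rms = [math.sqrt(sum((d[j] - means[j]) ** 2 for d in draws) / m)
           for j in range(R)]
    return means, rms
```

For instance, with two hypothetical source regions whose cluster frequencies are `f = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]` and American counts of `[460, 240, 300]` (exactly what a 60:40 mixture would give for 1,000 individuals), the posterior mean of the first coefficient should lie close to 0.6.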
To assess model fit, we examined plots of standardized residuals. The analysis was first performed with all African regions present, and then, to test robustness, was repeated with those regions removed whose admixture coefficients were within 2 SDs of 0.
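The two checks in this paragraph might look like the sketch below. The text does not define its standardized residuals; the binomial standardization used here, (n_i - N q_i) / sqrt(N q_i (1 - q_i)), is an assumption, as are the helper names. The pruning rule keeps a region only if its posterior mean exceeds two posterior standard deviations, mirroring the "within 2 SDs of 0" criterion.

```python
import math


def standardized_residuals(a, f, counts):
    """Per-cluster residuals (n_i - N q_i) / sqrt(N q_i (1 - q_i)), where
    q_i = sum_j a_j f[j][i]; an assumed, binomial-style standardization."""
    N = sum(counts)
    residuals = []
    for i, n_i in enumerate(counts):
        q_i = sum(a_j * f_j[i] for a_j, f_j in zip(a, f))
        variance = N * q_i * (1.0 - q_i)
        residuals.append((n_i - N * q_i) / math.sqrt(variance)
                         if variance > 0 else math.nan)
    return residuals


def supported_regions(names, means, sds, k=2.0):
    """Drop regions whose coefficient lies within k SDs of 0 (means are >= 0)."""
    return [name for name, m, s in zip(names, means, sds) if m > k * s]
```

With hypothetical regions A, B and C whose coefficients are 0.55 ± 0.08, 0.40 ± 0.08 and 0.05 ± 0.04, only A and B would be retained for the refit. A cluster with a large residual would flag a frequency that no mixture of the sampled sources reproduces.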


Data Summaries and Demography

Genetic diversity indices are shown in table 2. The haplotype mutation-drift statistic (θk) and haplotype diversity (H) show a distinctive pattern that is less clearly reflected in the segregating sites mutation-drift statistic (θS) or the mean number of pairwise differences (π). θk and H are similar, with eastern Africa having the highest values and southern Africa the lowest, and with minor variations in ranking. They suggest that founder effects were not important during the formation of slave communities of the Americas but that reductions in population size may have subsequently affected populations in Central America.
Table 2
HVS-I Diversity Indices in the African mtDNAs from North, Central, And South American Population Samples and in Africans from Different Regions in Africa
The reduction in diversity for Central American lineages is also indicated by the estimator π, but it is not reflected in this statistic for southeastern Africa (where founder effects associated with Bantu dispersals have evidently elevated the frequencies of some types in several haplogroups [Salas et al. 2002]). This is likely to be because, as a measure of pairwise nucleotide differences, π exaggerates the effect of the genetic divergence between haplogroups that may survive even after a substantial reduction in haplotype diversity; thus, π is a poor indicator of population size reduction (Helgason et al. 2003). This is evident in southeastern Africa—for example, between L0a and L3e—and is even more pronounced in west-central Africa, where, for example, L0a, L1c, and L3e are all significant. Since the haplogroup compositions of North, Central, and South America are very similar, the reduction in diversity in Central America becomes evident even with this statistic. The mutation-drift parameter based on segregating sites (θS), by contrast, shows a reduction in southeastern Africa but is not sensitive to the effect in Central America. In general, it therefore appears that neither of these latter two statistics is consistently sensitive to demographic processes.
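The point about π can be made concrete with two invented toy samples in a short Python sketch. Sample A contains only two haplotypes, but they differ at ten sites, mimicking two divergent haplogroups that survive a loss of haplotype diversity; sample B contains ten distinct haplotypes that each differ by a single site from a shared founder. The sequences are hypothetical and serve only to show how the two statistics can rank the same samples in opposite order.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """H = n/(n-1) * (1 - sum_i p_i^2)."""
    n = len(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in Counter(seqs).values()))


def mean_pairwise_differences(seqs):
    """pi: mean number of differing positions over all pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)


# Toy sample A: two haplotypes from deeply divergent lineages (10 sites apart).
sample_a = ["A" * 10] * 5 + ["C" * 10] * 5
# Toy sample B: ten distinct haplotypes, each one step from a shared founder.
sample_b = ["A" * i + "C" + "A" * (9 - i) for i in range(10)]

if __name__ == "__main__":
    for name, s in [("A", sample_a), ("B", sample_b)]:
        print(name, round(haplotype_diversity(s), 3),
              round(mean_pairwise_differences(s), 3))
```

Sample A has the lower haplotype diversity but the higher mean pairwise difference, which is the pattern that makes π a poor indicator of population size reduction.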

Haplogroup Frequency Profiles

Haplogroup frequency profiles for the regions studied are shown in figure 1, and a scatterplot of the first two principal components is shown in figure 2. The first principal component (PC1), accounting for 30% of the variance, separates the populations into a spectrum with northern Africans at one pole, followed by northeast Africans, eastern Africans, western Africans, and west-central Africans, with Bioko at the opposite pole. Southeastern Africans and Angolans sit tightly together within the west-central African cluster. The American samples also cluster fairly tightly together in PC1, with the Brazilian and several of the Central American samples localized very closely with the southeastern Africans, whereas the African American, São Tomé, and Caribbean samples are positioned closer to the western Africans.
Figure 1
Frequency profiles of the major haplogroups in Africa and haplogroups of recent African ancestry in the Americas and Eurasia
Figure 2
Scatterplot of the first two principal components based on haplogroup frequencies. For practical reasons, the “Other-L3” and “Other non Afr” haplogroup categories, with coordinates (0.0171, 0.4210) and (0.6680, 0.0768), ...
PC2 accounts for 19% of the variance and shows an approximately east-west gradient, with northern Africans, west-central Africans, and southeastern Africans all positioned towards the eastern fringe of the western African variation. The Caribbean sample sits with the southeastern Africans, whereas the other American samples are positioned more towards the western pole. The African Americans, Brazilians, and mainland Central Americans cluster tightly together and with the São Toméan and Dominican samples, further towards the western pole.

Admixture Analysis

To quantify the proportion of ancestry from each of the major regions of Africa present in the African component of the modern American mtDNA pool, we fitted an admixture model of the haplogroup profiles in the African regions to the profiles in the Americas (table 3). The estimated admixture coefficients for all regions other than western and west-central Africa were within 2 SDs of 0, and so we repeated the analysis with a more parsimonious model by suppressing those unsupported African regions (table 4).
Table 3
Estimated Admixture Coefficients for American Samples, in Terms of the Source African Regions
Table 4
Estimated Admixture Coefficients, when Unsupported Source Regions Are Excluded
The results for the Americas overall accord broadly with the picture built up by historians, in attributing major contributions from both western and west-central Africa. The proportion of western African ancestry is not significantly greater than that of west-central Africa, and a significant southeastern African component is not detected at this level of analysis.
Again in agreement with the historical evidence—and confirming the indications from the frequency profiles—the largest component in both North and Central America appears to derive from western Africa and to be somewhat greater in Central than North America.

The southeast also seems to have left a minor signature, although the error margins are wide. South America is, however, problematic. A higher—and possibly a majority—contribution from west-central Africa is indicated, but the admixture model provides a poor description of the data, as judged from a plot of standardized residuals (not shown). For example, the frequency of L3e in South America is larger than in any of the potential source regions, an observation incompatible with the admixture model. (It is most closely approached by the frequency of L3e in Bioko, which, however, may be an unrepresentative sample, since it appears to have been subject to extensive drift [θk=8.4].) Several other population samples that contribute to our west-central African sample are unlikely slave sources and have also been subject to heavy drift, in particular the Mbuti (θk=2.5). This suggests that a major source region for South America remains unsampled; more data will be necessary to make further progress here. A further sampling issue arises because South America is represented only by Brazil (although this country bears the largest community of African descendants in America). The existing data suggest that Brazilian slaves came largely from west-central Africa rather than western Africa, but data from other parts of South America will be needed to generalize this observation.

Haplogroup Composition of American mtDNAs of African Ancestry

Frequencies of mtDNA lineages in Africa are shown in figure 1, alongside mtDNA lineages of African ancestry in North, Central, and South Americans (cf. Salas et al. 2002). A network of HVS-I sequence types shared between Africans, Americans, and Eurasians is shown in figure 3 (excluding U6 and one unclassified American L3* type, which matches a western African mtDNA). Eighty-eight mtDNA types from these clades are shared between the African samples and America. Although this is only 36% of the total different American sequence types of African origin in our database, it represents more than half (54%) of the total sample size of individuals, since a number of types are present in more than one individual in the database.
Remember that sequence types of Eurasian ancestry (within haplogroups M and N) are also excluded. However, in agreement with the historical evidence, the contribution of northern Africa (which includes many western Eurasian sequence types [Rando et al. 1998, 1999; Macaulay et al. 1999]) to the American gene pool appears to be very minor. This is indicated by the low frequency of the indigenous northern African haplogroup U6 in Americans: this haplogroup is found in <2% of American lineages of African origin, which is lower than the frequency of U6 in western Africa, the likely source for most American U6 mtDNAs.
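The gap noted earlier between the share of distinct American sequence types shared with Africa (36%) and the share of individuals carrying them (54%) arises because the shared types tend to be the more common ones. The following sketch, with hypothetical type labels and counts, computes both fractions from a table of type frequencies.

```python
def sharing_fractions(type_counts, shared_types):
    """Return (fraction of distinct types shared, fraction of individuals
    carrying a shared type).

    type_counts: dict mapping sequence type -> number of carriers
    shared_types: set of types that also occur in the African samples
    """
    n_types = len(type_counts)
    n_people = sum(type_counts.values())
    shared_t = sum(1 for t in type_counts if t in shared_types)
    shared_p = sum(c for t, c in type_counts.items() if t in shared_types)
    return shared_t / n_types, shared_p / n_people


if __name__ == "__main__":
    # Hypothetical toy data: two common types are shared, three singletons are not.
    toy_counts = {"t1": 4, "t2": 3, "t3": 1, "t4": 1, "t5": 1}
    print(sharing_fractions(toy_counts, {"t1", "t2"}))   # (0.4, 0.7)
```

In the toy example, two of five types (40%) are shared, but they account for seven of ten individuals (70%).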
Most, but not all, of the known African subhaplogroups within L0–L3A are present in America. Overall, the continental frequency profiles of the Americas, especially those of North and Central America, are strikingly similar to those of western and west-central Africa (fig. 1), confirming that these were the major source regions for American mtDNA lineages of African ancestry. South America has a markedly higher proportion of L3e and L0a lineages, which likely reflects a relatively higher contribution of both west-central and southeastern African mtDNAs.
The great majority of haplogroup L0a mtDNAs, concentrated in South America, belong to subclusters L0a1 and L0a2, both of which are typically southeastern African. All American L0a types are shared with individuals from Mozambique, where two are elevated to extremely high frequencies (fig. 3) (Pereira et al. 2001; Salas et al. 2002). Several of the southeastern African L0a founder types and a number of derived types are also found in the Angolan sample, indicating substantial commonality between the more southerly Eastern and Western Bantu-speaking communities and pointing also to a possible west-central African contribution, especially to South America.
In contrast, American L1b sequences are most often shared with samples from western Africa, where L1b is concentrated (fig. 3; see also fig. 4 of Salas et al. 2002). Shared L1b types are also found in northern Africa; however, in all cases except for one, these also match western Africans, and in the one exception there is also a match with southeastern Africa. Haplogroup L1c, which is likely to be of west-central African origin, is the African mtDNA clade with the most unmatched representatives in America, particularly in South America—only ∼17% of American L1c types are shared with Africans. The Angolan sample, however, carries L1c at a frequency of ∼21%, which confirms the prediction of Alves-Silva et al. (2000) based on the pattern of L1c lineages in Brazil, where the frequency among mtDNAs of recent African ancestry is ∼19%. One L1c3 type matches eight individuals from Bioko (mislabeled as “Bi” in figure 5 of Salas et al. 2002). However, it also matches a number of individuals in Mozambique, suggesting a possible origin in west-central Africa for a lineage that was probably dispersed by the Bantu expansions (Salas et al. 2002). This is reinforced by the discovery of a derivative of this type in Angola. L1c is rare in Mozambique (∼5%), suggesting that it may be a “west-central Bantu” marker that has been brought to southeastern Africa by interaction with western Bantu-speaking communities.
Haplogroup L2a is the most common and widely distributed sub-Saharan African haplogroup and is also frequent in the Americas (∼19%). The wide distribution of L2a in Africa makes identifying geographical origins of lineages difficult. Nevertheless, almost all of the American L2a types match with or are one-step derivatives of some western African L2a type. Several common and widespread American types are shared with eastern Africa as well, but none are shared uniquely with eastern Africans alone. A number are also shared with southeastern Africans; again, however, all of these types are also present in western Africa. A largely western African provenance, with a possible minor southeastern African contribution, seems to be the simplest explanation for this pattern. The much less frequent subhaplogroups—L2b, L2c, and L2d—are not found in eastern Africa and are again most likely to be of largely western African provenance in the Americas, although two American types are shared only with southeastern Africans and may, therefore, have their immediate origin there.
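Statements such as "shared with eastern Africa, but never uniquely" amount to recording, for each American type, the set of African regions that contain either the same type or its immediate ancestor. The Python sketch below assumes that a type is represented as the set of HVS-I positions at which it differs from the reference sequence, and treats a one-step derivative as an African type plus exactly one additional variant (back mutations are ignored). The positions, region assignments and function names are hypothetical illustrations, not data from this study.

```python
# Each HVS-I type is the frozenset of positions where it differs from the
# reference sequence. All positions and regions below are hypothetical.


def relation(american, african):
    """'match', 'one-step' (African type plus exactly one extra variant), or None."""
    if american == african:
        return "match"
    if african < american and len(american - african) == 1:
        return "one-step"
    return None


def linked_regions(american, african_by_region):
    """Regions containing a match or a one-step ancestor of the American type."""
    return {
        region
        for region, types in african_by_region.items()
        if any(relation(american, t) is not None for t in types)
    }


african_by_region = {
    "western": [frozenset({16223, 16278}), frozenset({16189, 16223})],
    "eastern": [frozenset({16223, 16278, 16294})],
    "southeastern": [frozenset({16129, 16223})],
}

type_a = frozenset({16223, 16278, 16294})  # matches eastern, one step from western
type_b = frozenset({16189, 16223, 16311})  # one step from a western type only

for name, t in [("type_a", type_a), ("type_b", type_b)]:
    regions = linked_regions(t, african_by_region)
    label = "unique" if len(regions) == 1 else "shared"
    print(name, sorted(regions), label)
```

Here `type_a` is linked to eastern Africa but not uniquely, since it is also one step from a western type, whereas `type_b` is linked to western Africa alone.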
L3A types in the Americas are, again, probably of largely western African provenance, with lesser west-central and southeastern African components. L3b and L3d mtDNAs are mainly western African, with a few types shared with eastern, southeastern (and occasionally southern, through interaction between Bantu-speaking and Khoisan-speaking groups), and (for L3b) west-central Africans. The same is true of L3e2, which includes one of the most common and widespread American mtDNA types with a number of derivatives observed only in America, as well as the much less frequent L3e4. One American L3e3 type is shared with the Bubi of Bioko but is more common in Mozambique, with a one-step derived type in the Angolan sample; thus, it most likely derives from west-central or southeastern Africa. Other L3e types are also found in Bioko and neighboring São Tomé, as well as in Angola and Mozambique. American L3e1 lineages, however, largely match southeastern Africans, implying a likely origin either there or somewhere on the west-central coast of Africa, between Cameroon and Angola.
The proportion of L3e in the Angolan sample is 14%, of which 5/6 are members of L3e1, the subclade that is also particularly frequent in Mozambique, where the frequency is similar (Salas et al. 2002). On the basis of the phylogeographic distribution (see fig. 9 of Salas et al. 2002), L3e2 appears to be more northern and to have spread into western Africa, where the derived subclade L3e2b predominates, possibly accounting for its greater prevalence in the Caribbean. The minor subclade L3e4 seems to have a similar distribution. The wide distribution of L3e1 and L3e3 is likely to be due mainly to Bantu dispersals (Salas et al. 2002). However, the 18th-century transfer of slaves from Mozambique and Angola to work in the sugar plantations of São Tomé and probably also Bioko (Iliffe 1995) and their subsequent transfer to the Americas may also help to account for the sharing of mtDNA types across this wide area.
A few haplogroup L3f types in America match those of eastern Africans—again, however, never uniquely; they are also shared by western Africans, southeastern Africans, or both. At the same time, several L3f types are shared uniquely by western Africans only. L3f is likely of eastern African origin (Salas et al. 2002), but the derived subhaplogroup L3f1 is also present in western Africa, and it is this component that is most commonly found in Americans.
Some mtDNA clades are notable for their absence (or virtual absence) in the Americas. Neither of the Khoisan-specific haplogroups, L1d and L1k, is found in America, nor are the eastern African haplogroups L1e and L1f. This supports the historical view that neither Khoisan-speaking populations nor eastern Africa contributed significantly to the Atlantic slave trade. However, the eastern African haplogroup L3g is found in several American individuals, implying either a small eastern African influence (Thomas 1998, p. 706), more recent immigration from Africa into America, or hitherto undetected gene flow into western or southeastern Africa and thence to the Americas. There is indeed evidence for low levels of transcontinental gene flow in the modern mtDNA patterns of Africa (for example, “accidental” or “erratic” lineages of western African origin in eastern and southeastern Africa [Bandelt et al. 2001; Salas et al. 2002]). Eastern African L3f types, for example, may also have had an immediate origin in the eastern African component of west-central African Bantu-speaking populations.
Some of the American U6 types (those belonging to U6b1) are most likely of Canarian ancestry and not a direct consequence of the Atlantic slave trade. They probably reflect the genetic admixture of Spanish men with Guanche women after the Spanish colonization of the Canary Islands and the subsequent movements of these admixed populations to the Caribbean islands during the process of Spanish colonization of the region.

African mtDNAs in Eurasia

Eurasian matches to African mtDNAs are also indicated in figure 3, representing <1% of Eurasian mtDNAs. More than half of Eurasian U6 lineages occur in western Iberia (Salas et al. 1998; Pereira et al. 2000), and most of the remainder occur in southwest Asia. Both Portugal and the Near East are regions with known historical gene flow from northern Africa, but both were also centers for the importation of slaves. Eurasian L0–L3A lineages are similarly concentrated in the Near East and Iberia. Eurasian L0a types match or are one-step derivatives of eastern African L0a types and are mainly Near Eastern or southern European. The L1b types can be readily derived from western or northern Africa and are similarly distributed. As in America, it is difficult to assign a provenance to L2a types in Eurasia; two types have a likely southeastern African origin, whereas the remainder could be from either eastern or western Africa.

The precise ancestry of Eurasian L3b types is similarly difficult to trace. The eastern African–specific L3f is found mainly in the Near East and southern Europe.
African types in Eurasia, unlike those in America, can therefore be attributed to gene flow from both eastern Africa—perhaps partly via the Arab slave trade (Richards et al. 2003)—and western and southeastern Africa, more likely as a result of the Atlantic slave trade. This latter scenario is made more probable by the surprisingly high proportion of shared types across all three continents, an observation made more remarkable by the fact that the shared types are not necessarily particularly common ones in Africa.

A contribution from the medieval Arab/Berber conquest of Iberia and Sicily is also possible (Semino et al. 1989; Côrte-Real et al. 1996). All of the Iberian L0–L3A types are shared with either western or southeastern Africans, so that it is possible that all of them might be accounted for by the Atlantic slave trade (Pereira et al. 2000). More than a quarter of these are, however, also shared with northern Africans, suggesting that a proportion may alternatively be attributed to the Arab/Berber conquest, especially since the northern African sample size for these types is much smaller than that for western Africa. The ratio of the frequencies of haplogroup U6 to L0–L3A in Iberia (35%) seems too small to be attributed solely to northern African immigration but too large to be accounted for by western African immigration alone (P=.0013 and P<10⁻⁵, respectively, in a one-tailed Fisher's exact test). Therefore, it seems unlikely that the impact of mtDNAs of recent African ancestry on Iberia was the result of a single process; indeed, earlier gene flow may also have contributed some of the lineages.
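The P values quoted for the Iberian U6 to L0–L3A ratio come from one-tailed Fisher's exact tests on 2x2 tables of haplogroup counts. Those counts are not given in this excerpt, so the Python sketch below only shows how a one-tailed Fisher test can be computed from the hypergeometric distribution with the standard library. The function name is hypothetical; `scipy.stats.fisher_exact`, with its `alternative` argument set to `"less"` or `"greater"`, is a well-tested equivalent.

```python
from math import comb


def fisher_one_tailed(a, b, c, d, alternative="greater"):
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Under the null, with all margins fixed, the top-left count X is
    hypergeometric. 'greater' returns P(X >= a); 'less' returns P(X <= a).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = comb(n, row1)

    def pmf(x):
        # Ways to put x of the column-1 items into row 1, times ways to fill
        # the rest of row 1 from column 2, over all ways to fill row 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    if alternative == "greater":
        xs = range(a, hi + 1)
    elif alternative == "less":
        xs = range(lo, a + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(pmf(x) for x in xs)


print(fisher_one_tailed(3, 1, 1, 3, "greater"))  # 17/70, about 0.243
```

The demo uses the textbook tea-tasting table [[3, 1], [1, 3]] rather than any counts from this study. In an analysis like the one above, the rows would be two populations and the columns the U6 and L0–L3A counts.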


In this study, we have identified mtDNA lineages with a recent African ancestry in contemporary American populations. Of course, all modern human mtDNAs have what is often described in the literature as “recent African ancestry.” That is to say, Eurasian mtDNAs originated in a founder effect and dispersal of haplogroup L3 lineages (rapidly differentiating into Eurasian haplogroups M and N) from sub-Saharan Africa that probably took place ∼60,000–80,000 years ago (Watson et al. 1997; Quintana-Murci et al. 1999).

In this article, however, we define “recent African ancestry” rather as the result of gene flow within historical times; in the case of the Americas, this gene flow is largely (but not necessarily exclusively) due to the effects of the Atlantic slave trade.
A recent summary of the historical literature suggests that ∼62% of slaves came from western Africa to America (∼8 million slaves), ∼30% from west-central Africa (4 million), and ∼8% from southeastern Africa (1 million) (Thomas 1998, p. 806).
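The historical percentages follow directly from the approximate counts. The Python sketch below recomputes them and then, under the strong and purely hypothetical assumption that the maternal gene pool mixed in exactly these proportions, shows the haplogroup frequency such a mixture would predict. The source-region frequencies in the demo are placeholders, not data.

```python
# Approximate numbers of enslaved people shipped to the Americas, in millions,
# as quoted in the text (Thomas 1998, p. 806).
shipped_millions = {
    "western Africa": 8,
    "west-central Africa": 4,
    "southeastern Africa": 1,
}

total = sum(shipped_millions.values())
shares = {region: n / total for region, n in shipped_millions.items()}
for region, share in shares.items():
    print(f"{region}: {share:.1%}")


def expected_frequency(shares, source_freqs):
    """Haplogroup frequency predicted if mtDNAs mixed in proportion to the
    historical shares (a strong, hypothetical assumption)."""
    return sum(shares[r] * source_freqs[r] for r in shares)


# Placeholder source frequencies for some haplogroup; not data from this study.
placeholder = {"western Africa": 0.10, "west-central Africa": 0.20, "southeastern Africa": 0.30}
print(round(expected_frequency(shares, placeholder), 3))  # 0.146
```

The shares come out at about 61.5%, 30.8% and 7.7%, consistent with the quoted ∼62%, ∼30% and ∼8% given that the counts themselves are rounded. Comparing such predictions with observed American frequencies is one way to see where female-line data and the historical record agree or diverge.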

The North American and Central American source regions are thought likely to have been mainly in western Africa, supplemented substantially in South America by sources in west-central Africa and, to some extent, southeastern Africa.
The mtDNA composition in America, although indicating only the female line of descent, broadly corroborates historical research. In particular, and unsurprisingly, our admixture estimates, PC plots, and phylogenetic networks all stress the overwhelming impact of western and west-central Africa on the composition of American mtDNAs with recent African ancestry, with a likely small southeastern African component.
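The PC plots mentioned above summarize each population's haplogroup frequency profile in a few dimensions. A minimal NumPy sketch of principal component analysis, via a singular value decomposition of the column-centered frequency matrix, follows. It assumes rows are populations and columns are haplogroup frequencies; the function name and the toy matrix are hypothetical, and this is not necessarily the exact procedure behind the authors' plots.

```python
import numpy as np


def principal_components(freqs, k=2):
    """PCA of a populations x haplogroups frequency matrix.

    Returns the population scores on the first k components and the
    fraction of total variance each of those components explains.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)  # center each haplogroup column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]  # coordinates used for a PC plot
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:k]


# Hypothetical profiles (rows: populations; columns: two haplogroups).
toy = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
scores, explained = principal_components(toy)
print(scores.round(3))
print(explained.round(3))  # the first component carries all the variance here
```

Populations with similar haplogroup profiles end up close together on the first components, which is how such plots can show American samples clustering with particular African source regions. The sign of each component is arbitrary.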

Western Africa appears to have been the most important source for North and Central America, although, perhaps surprisingly, North America appears to harbor the larger (but probably still a minority) west-central African component. The results from South America are more problematic, but the likely picture appears to be of a large, possibly majority west-central African contribution, with a substantial western African component as well, at least for Brazil. However, the west-central African contribution most likely derives largely from an area that so far has not been sampled for mtDNA variation, such as the Congo basin. A contribution in Brazil from southeastern Africa also seems likely, as suggested by the high level of matching between Brazilian and Mozambican lineages within L3e1 (fig. 3).
The contributions of northern, eastern, and southern Africa to the American mtDNA gene pool appear to be very small, which, again, is in good agreement with the historical picture. Eastern Africa is, by contrast, an important donor of mtDNAs to the Near East. This may be the result, at least in part, of more ancient movements of slaves, such as the Arab trade through Red Sea and Indian Ocean ports (Richards et al. 2003). However, there are also discernible western and southeastern African components to the (relatively few) mtDNAs of recent African ancestry within Europe, which are likely to be mainly attributable to the more recent Atlantic trade. Portuguese western, southwestern, and southeastern Africa were the main sources for the Atlantic slave trade to Europe (Thomas 1998, p. 805). A striking finding of this study is the high number of three-way sequence matches between African, American, and European mtDNAs.

Almost all of these are of likely western African (such as those in L1b) or southeastern African (such as those in L0a) origin. These matches are particularly prevalent in Portugal, which was indeed the principal destination for slaves within Europe. Nevertheless, the composition of Iberian mtDNAs of recent African ancestry suggests that other processes, such as the medieval Arab/Berber conquest, must also have been influential.
Overall, these results show that mtDNAs in America and Eurasia can, in many cases, be traced to broad geographical regions within Africa. This raises the possibility that a greater resolution may be possible in the near future. However, there are many difficulties with such an endeavor. A major problem at the moment is the poor sampling coverage in some parts of the continent, especially central Africa. Nevertheless, even when this is resolved, there are many features of African mtDNA variation that will persist to confound such an undertaking. In the first place, some of the major African mtDNAs, such as haplogroup L2a, seem to have been widely distributed within the continent in prehistoric times, and individual mtDNA types are often difficult to localize geographically.

Another important factor is the effect of the Bantu dispersals, which resulted in a fairly small number of mtDNA types from various haplogroups being widely dispersed quite recently throughout subtropical Africa. Further confounding influences for phylogeographic analyses result from the large-scale and widespread movements of people within the continent itself during the period at which the slave trade was in operation. It is likely that improved phylogenetic resolution, with the aid of more complete mtDNA sequences, will increase the phylogeographic resolution to some extent, but the problems of recent widespread movements will remain. The implication is that, although great progress can be made towards understanding the broad pattern of human dispersals using mtDNA, tracking the lineages of individuals is fraught with problems. Considerable caution is warranted when assessing claims of being able to trace the ancestry of certain American or European lineages to a particular region or population within modern-day Africa.


We thank James Walvin for a critical reading of the text. This work was supported by Ministerio de Ciencia y Tecnología grant DGCYT-P4. BIO2000-0145-P4-02 and by Ministerio de Sanidad y Consumo (Fondo de Investigación Sanitaria, Instituto de Salud Carlos III) grant PI030893; SCO/3425/2002. Financial support was also provided by the Italian Ministry of the University (Progetti Ricerca Interesse Nazionale 2002 and 2003) (to A.T., R.S., and A.C.), Progetto CNR-MIUR Genomica Funzionale-Legge 449/97 (to A.T.), Grandi Progetti di Ateneo (to R.S.), and the Instituto Pasteur Fondazione Cenci Bolognetti (to R.S.). A.S. is supported by the Isidro Parga Pondal program (Xunta de Galicia). V.M. was partly supported by a Research Career Development Fellowship from the Wellcome Trust.

Electronic-Database Information

The URLs for data presented herein are as follows:
Arlequin's Home on the Web (for Arlequin 2.0, software for population genetic data analysis)
Fluxus Engineering (for Network 3.0, software for median network construction)