Characterization of human ancestry has been of interest for decades, as information about population structure can provide novel insight into the human past and remains an important topic in the rapidly evolving biomedical field. For example, genetic variants conferring risk for a particular disease may be geographically restricted as a result of evolutionary forces such as mutation, genetic drift, migration and natural selection, so assessing the genetic background of the individuals chosen for a study is crucial in genetic epidemiology.
While still a topic of controversy, there is ample evidence that self-reported race, as used for example in the US Census, can predict ancestral clusters in a population sample. However, it does not fully capture how genetic variation is apportioned within and between racial groups, nor does information on race reveal the extent of admixture [2, 3].
Especially in the context of mapping disease genes, more objective and accurate methods of defining homogeneous populations are required for the investigation of specific population-disease associations. This is not only paramount for specific mapping approaches such as admixture mapping, but has also been recognized as a crucial prerequisite for genetic association studies, as the presence of undetected population structure can lead to both false-positive results and failures to detect genuine associations. Furthermore, it has been shown that the consequences of population structure on association outcomes increase markedly with sample size, and even modest levels of population structure within population groups cannot safely be ignored in the large studies needed to detect typical genetic effects in common diseases.
In order to assess genetic background diversity, a large number of ancestry informative marker (AIM) panels have been developed for particular applications. Genome-wide panels for admixture mapping have been developed for Hispanic populations, African Americans or three-way admixture in the Americas, and smaller AIM panels have been designed to discern ancestry at either the global level [10–12] or within specific populations such as Native and Mexican Americans [13–15], Europeans [16–20] or African Americans [21, 22]. In addition, genome-wide association studies (GWAS) are able to leverage ancestral information from the allele frequencies of the several thousand SNPs generated for whole-genome applications, alleviating the need for specific AIM panels.
However, determining ancestry and controlling for population structure are just as important in smaller genetic association studies. Examples include candidate gene studies involving only a few genetic markers, replication of GWAS findings, and studies of smaller, highly valuable collections of rare pathological phenotypes or historical collections with limited amounts of DNA. Genotyping these samples on large AIM panels or leveraging ancestry information from preexisting genotyping is often not practical or possible.
To address this specific need, we set out to develop a highly informative AIM panel that would allow us to infer a subject’s ancestral origin at the continental level and estimate admixture proportions among at least seven main geographic regions: Africa, the Middle East, Europe, Central and South Asia, East Asia, Oceania and the Americas. To achieve the desired resolution at the continental level, the selection of such AIMs has to focus on SNPs with the largest allele frequency differences between the continental regions of interest. Such high resolution is required because the genetic diversity of human populations follows gradients or geographic clines within and among continents rather than forming specific clusters or clades [3, 23, 24].
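As an illustration of the selection criterion described above, the following minimal sketch ranks candidate SNPs by their largest pairwise allele frequency difference between regions (often called delta). It is not the selection procedure used in this study; the function names, the marker identifiers and the allele frequencies are hypothetical and serve only to show the principle.

```python
from itertools import combinations


def max_pairwise_delta(freqs):
    """Largest absolute allele frequency difference between any two regions.

    `freqs` maps a region name to the frequency of one reference allele.
    """
    return max(abs(freqs[a] - freqs[b]) for a, b in combinations(freqs, 2))


def rank_markers(marker_freqs, top_n):
    """Return the `top_n` marker names with the largest pairwise delta."""
    scored = {snp: max_pairwise_delta(f) for snp, f in marker_freqs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


# Hypothetical allele frequencies, invented for illustration only.
example = {
    "snp_A": {"Africa": 0.95, "Europe": 0.10, "East Asia": 0.05},
    "snp_B": {"Africa": 0.50, "Europe": 0.45, "East Asia": 0.55},
    "snp_C": {"Africa": 0.20, "Europe": 0.25, "East Asia": 0.90},
}

top = rank_markers(example, 2)
print(top)
```

In this toy input, snp_B has nearly identical frequencies in all three regions and is therefore uninformative for ancestry, whereas snp_A and snp_C each separate one region from the other two. A real panel would additionally need markers that jointly discriminate all regions of interest, which a simple per-marker ranking does not guarantee.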
We further aimed to develop a feasible method to determine ancestry, as resources such as funding and available DNA are often limited for these applications. We therefore developed panels of ancestry informative SNPs (AISNPs) suitable for multiplex application on two commonly used platforms, the ABI SNPlex and Sequenom iPLEX systems. Additionally, all markers are included on the Illumina HumanHap550 array, thus allowing for a combined analysis with studies genotyped on the Illumina whole-genome arrays.
Lastly, we specifically focused on the applicability of our panel to determine the ancestry of subjects from any worldwide geographic origin. To date, most research involving genetic association studies has focused on populations of European descent, where longer LD blocks require fewer genetic markers to be genotyped. However, current gene-mapping efforts specifically call for more global research, thus increasing the need for global AIM panels. Furthermore, global ancestry determination is especially important in clinical samples ascertained in specific geographic regions such as Southern California, which are inhabited by individuals with very diverse and often heavily admixed ancestries.
Here we describe the development of AIM panels based on the well-studied global reference populations from the HGDP-CEPH panel, which include 52 geographically diverse populations collected from seven continental regions. We then greatly expanded the reference population set by genotyping the AIMs in over 2,000 additional subjects of known ancestry, with the goal of achieving the most comprehensive global reference collection possible. We report on these efforts and describe highly discriminative ancestry informative 41- and 31-marker panels for multiplex applications.