Developing criteria and data to determine best options for expanding the core CODIS loci
© Ge et al; licensee BioMed Central Ltd. 2012
Received: 2 November 2011
Accepted: 6 January 2012
Published: 6 January 2012
Recently, the Combined DNA Index System (CODIS) Core Loci Working Group established by the US Federal Bureau of Investigation (FBI) reviewed and recommended changes to the CODIS core loci. The Working Group identified 20 short tandem repeat (STR) loci (composed of the original CODIS core set loci (minus TPOX), four European recommended loci, PentaE, and DYS391) plus the Amelogenin marker as the new core set. Before selecting and finalizing the core loci, some evaluations are needed to provide guidance for the best options of core selection.
The performance of the current and newly proposed CODIS core loci sets was evaluated with simplified analyses for adventitious hit rates in reasonably large datasets under single-source profile comparisons, mixture comparisons and kinship searches, and for international data sharing. Informativeness (for example, match probability, average kinship index (AKI)) and mutation rates of each locus were some of the criteria to consider for loci selection. However, the primary factor was performance with challenged forensic samples.
The current battery of loci provided in already validated commercial kits meets the needs for single-source profile comparisons and international data sharing, even with relatively large databases. However, the 13 CODIS core loci are not sufficiently powerful for kinship analyses and searching potential contributors of mixtures in larger databases; 19 or more autosomal STR loci perform better. Y-chromosome STR (Y-STR) loci are very useful to trace paternal lineage, deconvolve female and male mixtures, and resolve inconsistencies with Amelogenin typing. The DYS391 locus is of little theoretical or practical use. Combining five or six Y-chromosome STR loci with existing autosomal STR loci can produce better performance than the same number of autosomal loci for kinship analysis and still yield a sufficiently low match probability for single-source profile comparisons.
A more comprehensive study should be performed to provide the necessary information to decision makers and stakeholders about the construction of a new set of core loci for CODIS. Finally, selection of loci should be driven by the concept that the needs of casework should be supported by the processes of CODIS (or for that matter any forensic DNA database).
DNA database searching is now a fundamental tool for developing investigative leads. The purpose of a DNA database is to collect and store DNA profiles (for example, from crime scenes, offenders, or missing-persons cases) and enable comparison of the profiles. Because of recidivism, DNA databases essentially are designed to help solve future crimes. As of June 2011, searches on the Combined DNA Index System (CODIS) database have produced over 147,200 hits assisting in more than 141,300 investigations. Currently, the CODIS database contains more than 10 million forensic, offender and arrestee reference profiles, and the number of profiles continues to increase. The rapid growth of the database presents the following new challenges for CODIS, as for other DNA criminal databases: 1) to address the potential of increased adventitious hits; 2) to be able to increase power for current and new applications, such as missing-persons identification and familial searching; and 3) to enable international data exchange. However, the last of these may be of more limited value, for example between the US and Europe or the US and Asia. Most associations are likely to be within a country, or between neighboring or open-border countries such as those in Europe.
Table 1. General information on the STR loci selected by Hares, including chromosomal location, loci in kits or panels, mutation rates, and match probabilities, based on a Caucasian population¹⁻³
13 Core loci
New FBI core loci⁴
1.54 × 10⁻³
3.70 × 10⁻⁴
1.54 × 10⁻³
3.70 × 10⁻⁴
1.36 × 10⁻³
2.49 × 10⁻⁴
1.68 × 10⁻³
2.55 × 10⁻⁴
3.71 × 10⁻³
4.93 × 10⁻⁴
1.66 × 10⁻³
2.69 × 10⁻⁴
1.98 × 10⁻³
3.19 × 10⁻⁴
1.37 × 10⁻³
7.23 × 10⁻⁵
2.06 × 10⁻³
3.33 × 10⁻⁴
1.54 × 10⁻³
3.70 × 10⁻⁴
5.20 × 10⁻⁵
6.03 × 10⁻⁵
1.54 × 10⁻³
3.70 × 10⁻⁴
3.25 × 10⁻³
4.68 × 10⁻⁴
1.74 × 10⁻³
4.03 × 10⁻⁴
2.60 × 10⁻⁴
2.53 × 10⁻⁴
1.03 × 10⁻³
5.25 × 10⁻⁴
2.23 × 10⁻³
7.93 × 10⁻⁴
9.75 × 10⁻⁴
5.48 × 10⁻⁴
1.75 × 10⁻³
1.18 × 10⁻³
1.70 × 10⁻³
1.65 × 10⁻⁴
1.05 × 10⁻⁴
6.40 × 10⁻³
3.00 × 10⁻³
2.59 × 10⁻⁴
2.53 × 10⁻⁴
1.54 × 10⁻³
3.70 × 10⁻⁴
20 + 4
12 + 4
The consideration of expanding and/or replacing the core loci is lauded. The CODIS system should be reviewed on a routine basis to improve capabilities and efficiencies with database searches. However, Hares and his Working Group provided limited or no data or justifications for their selections. Indeed, some of the recommendations seem to be in conflict with the selection criteria originally defined by the Working Group. The purpose of expanding and deselecting CODIS core loci was to respond to current and projected challenges and improve performance to meet the needs of forensic applications. However, based on the selections, the resultant choice of loci may not provide the optimum performance of such a DNA database. Countries interested in establishing a DNA database and selecting their core loci might wish to proceed with caution if using the model described by Hares. Given the selection process described by Hares, it is worth asking whether the needs of CODIS should drive casework requirements, or the needs of casework should drive CODIS requirements. We believe the latter position is the correct one to take; however, the selection of new FBI CODIS core loci seems to be a greater reflection of the former position. The quality of casework evidence will always be the limiting factor and should be a primary driver for selecting loci. In addition, the power of the set of loci should be evaluated with regard to the number of potential hits for a given application (for example, direct single source, mixtures, and indirect familial searching) and database size. The primary application of the CODIS database has been to search for the 'single source match' in the database, and most investigation leads fall into this category. The national level of CODIS, the National DNA Index System (NDIS), requires that a forensic profile contain a minimum of 10 loci.
This allowance for fewer than 13 loci for forensic samples is a clear recognition that forensic DNA can be compromised, and full profiles are not always obtainable. Profiles with fewer than 10 loci are not permitted for upload, to avoid generating too many adventitious hits. NDIS does accept additional loci beyond the 13 core loci, but currently does not use these loci in the initial search parameters. Currently, NDIS only accepts mixture profiles that meet the '4 by 4 rule' (that is, a forensic profile can have up to 4 alleles at a maximum of four core loci and no more than 2 alleles at any of the remaining 9 core loci, or 6 loci if only the minimum of 10 loci is submitted). Hares did not describe whether the '4 by 4 rule' (better described as a 9 by 2 rule) would still apply if the new core loci are adopted. Currently, it is assumed that the rule will continue, probably because the selection of new loci does not seem to account for the effect on quality and quantity of DNA derived from forensic samples. The criteria for additional loci should be considered as they apply to single-source data, mixture results (if this is a required search condition), and projected kinship applications given a database of size N. Although it is obvious that adding more loci in a virtual sense will increase power, changes to the CODIS core loci first should be based on the power and efficiency of the current loci and, equally important (if not more so), on whether they meet the needs of forensic applications. If they do not, then the alternative loci that would be most applicable to those needs should be selected. For example, the TPOX locus was relegated to the second-tier level, and although we agree with this based on the power of discrimination (PD), the TPOX locus may in fact perform much better in casework than more informative loci, such as the FGA locus.
The FGA locus is a large amplicon locus and is more likely to drop out with degraded or inhibited samples compared with the TPOX locus (at least for some kit configurations). Even when the amplicon sizes of the TPOX and FGA loci overlap, the wider spread of alleles for the FGA locus yields greater heterozygote peak height imbalance and allele dropout than the TPOX locus (and other STR loci), particularly for challenged samples. It does not appear that locus performance in casework analyses was taken into account during selection of the chosen loci; if a locus in a compromised sample cannot be typed, it cannot be uploaded to a database. The selection of core loci is therefore more complex than just determining what loci are available, and most importantly, the needs of casework should be considered in the selection process.
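The NDIS upload conditions described above (a minimum of 10 typed core loci, and the '4 by 4 rule' for mixtures) can be expressed as a simple check. The following is only a minimal sketch of the rule as worded in this paper; the function name, the data layout, and the locus labels are hypothetical and do not represent any CODIS software.

```python
from typing import Dict, Set

MIN_LOCI = 10          # minimum number of typed core loci (as stated in the text)
MAX_ALLELES = 4        # no locus may show more than 4 alleles
MAX_MIXED_LOCI = 4     # at most four loci may show more than 2 alleles

# Hypothetical placeholder labels for the 13 core loci.
HYPOTHETICAL_LOCI = [f"L{i:02d}" for i in range(1, 14)]


def meets_upload_rule(profile: Dict[str, Set[str]]) -> bool:
    """Return True if the profile satisfies the rule as described in the text.

    `profile` maps each core locus to the set of alleles observed there;
    loci with an empty set are treated as not typed.
    """
    typed = {locus: alleles for locus, alleles in profile.items() if alleles}
    if len(typed) < MIN_LOCI:
        return False
    if any(len(alleles) > MAX_ALLELES for alleles in typed.values()):
        return False
    mixed_loci = sum(1 for alleles in typed.values() if len(alleles) > 2)
    return mixed_loci <= MAX_MIXED_LOCI


# Illustrative profile: 13 typed loci, four of which show three alleles.
example = {locus: {"10", "11"} for locus in HYPOTHETICAL_LOCI}
for locus in HYPOTHETICAL_LOCI[:4]:
    example[locus] = {"10", "11", "12"}
print(meets_upload_rule(example))
```

Written this way, the rule makes explicit that a profile with the minimum of 10 loci and four mixed loci leaves only 6 loci with single-source-like results.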
The criteria that the Working Group  used to base its selection of core loci are: 1) No known association with medical conditions or defects (refers to whether there is a reported association of the locus with a medical condition or disease status); 2) low mutation rate (a locus with a mutation rate preferably of less than 0.30%); 3) high level of independence (refers to linkage equilibrium (LE) of the loci on the list to enable multiplication of genotype frequencies); 4) high level of discrimination (a locus with a probability of identity preferably of less than 0.10%) (note: this value is obviously a typographical error, and is more likely to be 0.1); 5) use by the international forensic DNA community (refers to the use - widespread or limited - of the loci by forensic DNA laboratories outside the USA); 6) number of loci versus discrimination factor (refers to balancing the total number of loci recommended with the level of discrimination they offer); 7) compliance with quality assurance standards (refers to the loci satisfying the requirements of the FBI Director's Quality Assurance Standards such as validation, being human-specific, etc.).
These are reasonable criteria, except for the omission of the potential effect on test performance with DNA degradation and inhibition. However, no systematic and scientific assessments of the selected loci, or how they comport with the selection criteria, were described. Additionally, the selection process did not provide any data on a number of issues, such as the power of the current core loci and the projected database sizes, the limitations invoked by the quality of casework materials, the perceived need for resolving Amelogenin Y-amplicon drop-out when searching for candidates, the justification of suggesting the low PD Y-STR locus DYS391 (particularly given the downgrading of the TPOX locus because of its low PD), alternative applications (for example, familial searching and missing-persons identification), and the reduction in sensitivity of detection that can occur if multiplexes become larger.
In this paper, we provide examples and simplified analyses as potential considerations while the community moves forward in modifying the CODIS core loci and for countries that are currently instituting DNA databases. We did not attempt to address all criteria in depth. Instead, we analyze and discuss the issues with examples to make the point that selection is a more complex process than Hares  seems to have taken into account, and the process should be given more in-depth consideration with wider community input. Indeed, the European Network of Forensic Science Institutes (ENFSI) used input from its multi-country members to produce a consensus-built standard clearly driven by the demands of typing challenged samples (supported by selecting a number of mini-STRs [4–6]). The examples provided in this paper are simplified analyses on the performance of various combinations of STR loci for their adventitious hit rates in reasonably large datasets for the primary forensic applications of single-source profile comparisons, mixture comparisons, and kinship searches, and for international data sharing. These examples could provide a basis for the issues to consider and the work that might be performed to support core STR loci selection criteria.
Methods and Results
We obtained the allele frequencies of the Caucasian population for the loci D10S1248, D12S391, D16S539, D18S51, D19S433, D1S1656, D21S11, D22S1045, D2S1338, D2S441, D3S1358, D8S1179, FGA, TH01, and vWA from Budowle et al.; for D13S317, D7S820, D5S818, and CSF1PO from Budowle et al.; for DYS391 from Budowle et al.; for PentaD and PentaE from Budowle et al.; and for SE33 from Butler et al. The mutation rates in Caucasian populations for the loci CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, D2S1338, D19S433, PentaD, and PentaE were taken from the American Association of Blood Banks (AABB) annual report for 2008, and those for the SE33 locus from STRBase. The mutation rates of the loci D10S1248, D12S391, D1S1656, D2S441, and D22S1045 were not available; thus, their rates were assigned as the average of the tetranucleotide markers. The mutation rates of 16 Y-STR loci of Caucasian and world population data were from Ge et al. and YHRD, respectively. Chromosomal locations of the STR loci were from NCBI and STRBase (Table 1).
Evaluation of the autosomal STR loci
Independence between loci
Current autosomal STR-based forensic applications assume independence between the core CODIS STR loci, so that the match probability, kinship index (KI), or likelihood ratio (LR) of each locus can be multiplied together. The community seems to favor independent loci, except where this is not possible, such as the lineage markers on the Y chromosome and the mitochondrial DNA (mtDNA) genome. The desire for independent autosomal STR loci is presumably due to the ease of calculation compared with a more complicated estimation of haplotype frequencies. We do not comment on the position of selecting independent loci; we merely acknowledge it and note that it was a criterion of the Working Group. It is likely that using systems that are relatively independent is easier for the community, and there are sufficient STR loci to select ones that meet the criterion of biologic independence.
Independence between loci usually requires that the loci are not genetically linked and that they are in LE. However, Hares set a criterion of LE but seemed to neglect genetic linkage between the loci. This misunderstanding was also espoused by O'Connor et al., although those authors later provided a correction. LE between the alleles at two loci may sometimes be met at the population level, and the loci can be assumed to be independent for direct single-source and mixture comparison calculations without corrections. However, genetic linkage describes the situation where loci that are physically close to each other tend to be inherited together in families. Genetic linkage, measured by recombination fraction, should be considered before assuming independence for kinship analyses. Ideally, the recombination fraction should be close to 50% before two loci are assumed to be unlinked and used independently in kinship analysis. The loci VWA and D12S391 do not significantly deviate from LE at the population level; however, they reside on the same chromosome about 6 Mb apart, and the recombination fraction is approximately 11%. The data indicate that the KIs of the VWA and D12S391 loci cannot be directly multiplied together. Indeed, these two loci do not meet the Working Group's third criterion regarding independence, that is, 'High level of independence (refers to linkage equilibrium of the loci on the list to enable multiplying genotype frequencies)' or the motivation 'to increase discrimination power to aid missing persons cases'. As can be seen from the chromosome locations of the 24 STR loci (Table 1), in addition to the VWA and D12S391 loci, the distances between the D5S818 and CSF1PO loci and between the D21S11 and PentaD loci are about 26 Mb and 24 Mb, respectively. These additional two pairs may also be genetically linked. Phillips et al. described, with reasonable assumptions, recombination between the loci D5S818 and CSF1PO and between D21S11 and PentaD of 25.22% and 35.68%, respectively, based on HapMap data. Family-based linkage studies should be carried out to confirm the recombination fractions before selecting core loci that meet the criterion of independence. The effect of close linkage on forensic applications should be investigated further. We do recognize that it may not be possible to satisfy all desired criteria; however, given the large battery of available loci, there is no need to compromise the 'independence criterion' for autosomal STRs if building a better (that is, more informative) system for the future growth of databases is desired.
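To give a sense of how far the recombination fractions quoted above are from the unlinked value of 50%, they can be converted to genetic map distances. The sketch below uses Haldane's map function, which assumes no crossover interference; this choice of map function is our assumption for illustration and is not necessarily the one used in the cited studies.

```python
import math


def haldane_distance_cm(r: float) -> float:
    """Haldane map distance in centimorgans for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_recombination(d_cm: float) -> float:
    """Inverse mapping: recombination fraction for a distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


# Recombination fractions quoted in the text for the three syntenic pairs.
pairs = {
    "VWA / D12S391": 0.11,
    "D5S818 / CSF1PO": 0.2522,
    "D21S11 / PentaD": 0.3568,
}
for name, r in pairs.items():
    print(f"{name}: r = {r:.4f}, Haldane distance = {haldane_distance_cm(r):.1f} cM")
```

Under this function the recombination fraction approaches 50% only as the map distance becomes very large, which is why pairs at a few tens of centimorgans cannot simply be treated as unlinked in pedigree calculations.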
Single-source profile comparison
The primary application in CODIS searches is single-source profile comparisons. Discrimination power or match probability (MP) of the current and proposed STR loci should be evaluated. Most autosomal loci have MP values of less than 0.1 (Table 1). The SE33 and D1S1656 loci have the lowest MP (that is, are the most informative) of all loci in sections A and B. The TPOX locus has the highest MP (that is, is the least informative) of all the autosomal loci listed by Hares. The D22S1045 locus has the highest MP among the European loci. Based on PD or MP, these two loci seem to be better suited to section B, as the Working Group recommended.
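For readers unfamiliar with how a per-locus MP is obtained from allele frequencies, a minimal sketch follows. It assumes Hardy-Weinberg proportions within a locus and independence between loci, and the allele frequencies shown are hypothetical values chosen for illustration, not data from Table 1.

```python
from itertools import combinations
from math import prod
from typing import Dict


def locus_match_probability(freqs: Dict[str, float]) -> float:
    """Sum of squared genotype frequencies at one locus under Hardy-Weinberg."""
    p = list(freqs.values())
    homozygotes = sum(x ** 4 for x in p)                      # (p_i^2)^2
    heterozygotes = sum((2 * a * b) ** 2 for a, b in combinations(p, 2))
    return homozygotes + heterozygotes


def combined_match_probability(loci: Dict[str, Dict[str, float]]) -> float:
    """Product of per-locus MPs, valid only if the loci are independent."""
    return prod(locus_match_probability(f) for f in loci.values())


# Hypothetical allele frequencies for two illustrative loci.
hypothetical_loci = {
    "locus_A": {"10": 0.5, "11": 0.5},
    "locus_B": {"6": 0.25, "7": 0.25, "8": 0.25, "9": 0.25},
}
for name, freqs in hypothetical_loci.items():
    mp = locus_match_probability(freqs)
    print(f"{name}: MP = {mp:.4f}, PD = {1 - mp:.4f}")
print(f"combined MP = {combined_match_probability(hypothetical_loci):.6f}")
```

The example shows why loci with more, evenly distributed alleles are more informative: the four-allele locus has a much lower MP than the two-allele locus.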
The expected match probability (EMP) of the kits/panels.¹
Panel (number of STR loci) | Fst = 0² | Fst = 0.01 | Fst = 0 | Fst = 0.01 | Fst = 0 | Fst = 0.01
New FBI core (24)³ | 6.28 × 10⁻³⁰ | 5.12 × 10⁻²⁹ | 3.63 × 10⁻¹⁸ | 1.15 × 10⁻¹⁷ | 3.49 × 10⁻¹¹ | 4.86 × 10⁻¹¹
New FBI core section A (20)³ | 9.54 × 10⁻²⁵ | 4.77 × 10⁻²⁴ | 3.83 × 10⁻¹⁵ | 9.37 × 10⁻¹⁵ | 1.74 × 10⁻⁹ | 2.29 × 10⁻⁹
13-loci CODIS core (13) | 2.34 × 10⁻¹⁵ | 5.83 × 10⁻¹⁵ | 1.74 × 10⁻⁹ | 2.86 × 10⁻⁹ | 3.39 × 10⁻⁶ | 4.05 × 10⁻⁶
 | 5.93 × 10⁻¹⁸ | 1.73 × 10⁻¹⁷ | 5.04 × 10⁻¹¹ | 9.17 × 10⁻¹¹ | 4.21 × 10⁻⁷ | 5.17 × 10⁻⁷
 | 2.43 × 10⁻¹⁸ | 7.48 × 10⁻¹⁸ | 3.06 × 10⁻¹¹ | 5.74 × 10⁻¹¹ | 3.61 × 10⁻⁷ | 4.45 × 10⁻⁷
 | 1.12 × 10⁻¹⁹ | 4.15 × 10⁻¹⁹ | 5.68 × 10⁻¹² | 1.17 × 10⁻¹¹ | 2.03 × 10⁻⁷ | 2.52 × 10⁻⁷
One essential criterion not addressed by the Working Group or described by Hares  is what may be considered 'manageable'. There should be some discussion on the number of associations per search that can be tolerated, as this will assist in determining the power needed. This concept of manageability is not a simple one to address, but obviously has an effect on performance goals.
Even with the larger databases that are expected in the near future, the current battery of loci provided in already validated commercial kits (for example, Identifiler, PowerPlex16, and NGM) meets the needs for single-source profile comparisons, including those with a significant proportion of relatives and subpopulations. Adding more loci in a virtual sense will increase the PD, but on a practical level, little efficiency is gained for single-source comparisons even for a database containing more than 100 million reference profiles.
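The scale of the argument can be illustrated with a simple calculation of expected adventitious hits. The sketch below takes the EMP of 2.34 × 10⁻¹⁵ listed for the 13-loci CODIS core set (first Fst = 0 column, read here as applying to unrelated individuals, which is our assumption) and treats all comparisons as independent; the function names are illustrative only.

```python
import math


def expected_hits_single_search(n_profiles: int, emp: float) -> float:
    """Expected adventitious hits when one profile is searched against the database."""
    return n_profiles * emp


def expected_hits_all_pairs(n_profiles: int, emp: float) -> float:
    """Expected adventitious hits over every pairwise comparison in the database."""
    return n_profiles * (n_profiles - 1) / 2 * emp


def prob_at_least_one_hit(n_profiles: int, emp: float) -> float:
    """Probability of one or more adventitious hits in a single search."""
    # 1 - (1 - emp)^n, computed in a numerically stable way for tiny emp
    return -math.expm1(n_profiles * math.log1p(-emp))


EMP_13_CORE = 2.34e-15   # value taken from the EMP table; column meaning assumed
for n in (10_000_000, 100_000_000):
    print(
        f"N = {n:>11,}: per search = {expected_hits_single_search(n, EMP_13_CORE):.3g}, "
        f"all pairs = {expected_hits_all_pairs(n, EMP_13_CORE):.3g}, "
        f"P(at least one per search) = {prob_at_least_one_hit(n, EMP_13_CORE):.3g}"
    )
```

Relatives and population substructure raise the effective match probability, so values computed this way should be read as lower bounds on the adventitious hit rate rather than as predictions.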
Average kinship index (AKI) of the short tandem repeat (STR) loci for full-sibling (FS) and parent/child (PC) relationships with Caucasian population data.1,2
Mixture profiles are very common in casework and are likely to increase as more high-volume crime evidence is subjected to DNA typing. Currently, the CODIS upload criteria preferentially select for single-source profiles, and thus mixtures are not of great concern. However, to increase the number of developed investigative leads, the effect of mixtures should be considered when selecting core loci and in the context of how they are accommodated for uploading and searching within CODIS. Multiple potential contributors to a mixture profile may be found in a database search. The goal should be that the number of potential contributors is small and manageable for investigative purposes (we note, as stated above, that the term 'manageable' has not been defined by the Working Group, and this is something that perhaps should be addressed prior to evaluating the power of the loci).
It seems that, in a database with 1 million profiles, most 2-person mixtures will yield a small number of candidate contributors with 13 CODIS loci. With additional loci included in the core set, fewer candidate contributors are expected from a search with a mixture profile. For a database with 10 million or more profiles, the distributions are expected to move towards an increased number of candidate contributors. More precise distributions for larger databases can be obtained with more powerful computational resources.
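The notion of a 'candidate contributor' can be made concrete with a simple inclusion test: a database profile is a candidate if all of its alleles are present in the mixture at every locus. The sketch below pairs that test with the standard random-inclusion probability, from which an expected number of candidates for a database of size N follows. It ignores allele dropout and peak-height information, assumes Hardy-Weinberg proportions and independent loci, and uses hypothetical allele frequencies; it is not the simulation procedure used to generate the distributions discussed above.

```python
from math import prod
from typing import Dict, Set

Mixture = Dict[str, Set[str]]          # locus -> alleles observed in the mixture
Profile = Dict[str, Set[str]]          # locus -> alleles of one database profile
Freqs = Dict[str, Dict[str, float]]    # locus -> allele -> population frequency


def is_candidate_contributor(mixture: Mixture, profile: Profile) -> bool:
    """True if, at every mixture locus, all of the profile's alleles are in the mixture.

    Assumes the profile is typed at every locus present in the mixture.
    """
    return all(profile[locus] <= alleles for locus, alleles in mixture.items())


def random_inclusion_probability(mixture: Mixture, freqs: Freqs) -> float:
    """Probability that a random unrelated person is not excluded from the mixture."""
    return prod(
        sum(freqs[locus][allele] for allele in alleles) ** 2
        for locus, alleles in mixture.items()
    )


def expected_candidates(n_profiles: int, mixture: Mixture, freqs: Freqs) -> float:
    """Expected number of non-excluded database profiles among unrelated persons."""
    return n_profiles * random_inclusion_probability(mixture, freqs)


# Hypothetical single-locus illustration.
freqs = {"locus_A": {"10": 0.2, "11": 0.3, "12": 0.1, "13": 0.4}}
mixture = {"locus_A": {"10", "11", "12"}}
print(is_candidate_contributor(mixture, {"locus_A": {"10", "12"}}))
print(is_candidate_contributor(mixture, {"locus_A": {"10", "13"}}))
print(expected_candidates(1_000_000, mixture, freqs))
```

Because the per-locus inclusion probability is multiplied across loci, each added locus reduces the expected candidate list by a roughly constant factor, which is consistent with the trend described above for larger core sets.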
International data sharing
International data sharing across countries is another reason espoused to expand the CODIS core loci. This criterion is more important for neighboring and open-border countries, such as those in Europe. The number of anticipated hits between, for example, Europe and the USA, is expected to be very small compared with the number from within-country searches. Thus, the requirement for international compatibility may not be as important as other selection criteria. The USA might be better served by ensuring compatibility with Canada and Mexico. More data are needed on the expected number of searches between the US and other areas such as Europe, Asia, and Latin America to determine the effect of compatibility. Regardless, most data sharing focuses mainly on single-source profile comparisons, and the 13 CODIS core loci share 7 loci in common with the European Standard Set ('S' in Table 1) and 8 loci in common with the new European Standard Set (including both 'D' and 'S' in Table 1). The EMPs of the shared 7 and 8 loci are 1.2 × 10⁻⁹ and 1.2 × 10⁻¹⁰, respectively, which on average seems to be practical for data exchange with current and larger sized databases, as the number of adventitious associations is expected to be low for single-source profile comparisons. However, a large proportion of the database profiles (in the USA and presumably in other countries) also contain the loci D2S1338 and D19S433; thus, the EMP reduces to 10⁻¹³. Adding D1S1656, D2S441, D10S1248 and D12S391 (which are also generally well suited to typing relatively degraded samples) to the core loci, as Hares suggested, can reduce the EMP to 10⁻¹⁶, but on a practical level, international data sharing with European databases may not need these additional loci.
China has the single largest forensic DNA database, which currently contains almost 12 million profiles. There was no discussion by Hares on compatibility with China. There are five major commercial kits used in China, and 11 loci (see Table 1) are shared by all five of these kits. These 11 loci are all within the current CODIS core loci. The EMP of these 11 loci can reach 1.6 × 10⁻¹³ and 1.5 × 10⁻¹³ for Chinese Han and Caucasian populations, respectively, and is sufficiently low for data sharing between China and the USA. These 11 loci include 6 loci in common with the European Standard Set and 7 loci with the new European Standard Set, with EMPs of 1.5 × 10⁻⁸ and 1.5 × 10⁻⁹, respectively. China will continue to move forward and formalize its core set of loci, and perhaps compatibility with those loci should be considered. Regardless, there may be sufficient compatibility for most single-source searches between the US and China.
There are two points that we do not address here, which should be considered before selecting loci: 1) many adventitious matches can be excluded by non-genetic information and thus, how that information would be used with the genetic data should be explored for practicality; and 2) the percentage of cases that would be facilitated by international sharing should be assessed. Most crimes will occur within a country or bordering countries. Although we personally do not have the data to resolve the value of international sharing, the utility of this should be considered.
Evaluation of Y-chromosome short tandem repeats
Match probability and mutation rates per Y-STR locus.
Mutation rates × 10⁻³
Y-chromosome short tandem repeat (Y-STR) combinations with minimum match probability (MP) for a specified number of Y-STR markers.¹
Y-STR combinations with minimum MP²
KI = 1/MP³
3, 5, 15
1, 3, 5, 15
1, 2, 3, 5, 15
1, 2, 3, 5, 13, 15
1, 2, 3, 5, 9, 13, 15
1, 2, 3, 5, 9, 13, 14, 15
1, 2, 3, 5, 9, 10, 13, 14, 15
1, 2, 3, 5, 8, 9, 10, 13, 14, 15
1, 2, 3, 4, 5, 8, 9, 10, 13, 14, 15
0, 1, 2, 3, 4, 5, 8, 9, 10, 13, 14, 15
1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 15
0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 15
0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
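The search for the marker combination with the minimum MP for a given number of Y-STRs can be sketched as an exhaustive comparison over all marker subsets, with the haplotype MP estimated as the sum of squared sub-haplotype frequencies in a reference sample. The code below is an illustration under those assumptions only: the haplotype data are hypothetical, and the marker indices are positions in the hypothetical data rather than the numbering used in the table above.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence, Tuple

Haplotype = Tuple[int, ...]   # one repeat number per Y-STR marker


def haplotype_match_probability(
    haplotypes: Sequence[Haplotype], markers: Sequence[int]
) -> float:
    """Sum of squared sample frequencies of the sub-haplotypes at `markers`."""
    counts = Counter(tuple(h[m] for m in markers) for h in haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def best_combination(haplotypes: Sequence[Haplotype], k: int) -> Tuple[int, ...]:
    """Marker subset of size k with the lowest haplotype MP (first one on ties)."""
    n_markers = len(haplotypes[0])
    return min(
        combinations(range(n_markers), k),
        key=lambda ms: haplotype_match_probability(haplotypes, ms),
    )


# Hypothetical reference sample: four haplotypes typed at four markers.
sample = [(14, 10, 21, 1), (14, 11, 21, 2), (15, 10, 21, 3), (15, 11, 21, 4)]
for k in (1, 2):
    best = best_combination(sample, k)
    mp = haplotype_match_probability(sample, best)
    print(f"k = {k}: markers {best}, MP = {mp:.3f}, KI = 1/MP = {1 / mp:.1f}")
```

Because Y-STR markers are inherited as a haplotype, the MP is computed on the joint haplotype rather than as a product of per-locus values; with 16 markers the exhaustive search covers at most 12,870 subsets for any one value of k, which is computationally trivial.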
Clearly, Y-STR loci are not as good as autosomal STRs for single-source profile comparisons, and as the current battery of autosomal STRs is sufficient for large database searches, there would be no need to include a set of Y-STRs. However, Y-STRs are very good for kinship analysis and for power of exclusion in familial searching and missing-persons identification. At this time Y-STR loci are not included in reference profiles (other than for missing persons) in the CODIS database, thus a familial search candidate list requires substantial work by the laboratory to eliminate a number of candidates. Currently, the DNA of familial search candidates is retrieved and typed for Y-STRs, and samples with non-matching Y-STR profiles are excluded. Substantial labor is required, and turnaround times can be slow. Faster turnaround times for investigative leads could be achieved if the new core loci included several Y-STRs instead of adding more autosomal loci. Indeed, only a small number of Y-STR loci are needed (probably only around 6).
Y-STR haplotypes can also be useful in interpretations of mixtures, especially when DNA from a single male is mixed with female DNA. Ge et al. estimated the power of exclusion of 16 Y-STR haplotypes with a relatively small database size, and found that 95% of 2-person mixtures had 10 or fewer candidate haplotypes in the database. Further studies need to be carried out with fewer Y-STRs (around 6) in a larger Y-STR database to estimate the power of exclusion and number of possible contributors when using solely Y-STRs. The Y-STRs could then be combined with autosomal STRs for further evaluation. Consideration should include the effect of maintaining the current autosomal STR systems (that are in extant commercial kit formats) and of combining them with five or six informative Y-STRs. Increasing the number of investigative leads should be a primary motivation of the core loci selection.
Combining autosomal and Y-chromosome short tandem repeats
As described in the two sections above, both autosomal STRs and Y-STRs have their places in forensic applications. Combining both autosomal STRs and Y-STRs may best meet the needs of forensic applications for single-source and kinship searches in large databases. Thus, we evaluated the performance of a combination of autosomal STRs and Y-STRs when the total number of core loci is limited because of the quality and quantity of forensic DNA. The loci in section B were not included because of their limitations in independence, MP, and/or mutation rates.
Match probabilities (MPs) of short tandem repeat (STR) loci combinations.
14 auto + 6 Y | 1.53 × 10⁻²¹
15 auto + 5 Y | 2.42 × 10⁻²²
16 auto + 4 Y | 4.48 × 10⁻²³
17 auto + 3 Y | 9.64 × 10⁻²³
18 auto + 2 Y | 4.83 × 10⁻²⁴
19 auto + 1 Y¹ | 3.38 × 10⁻²⁵
 | 9.20 × 10⁻²⁵
Identifiler + 5 Y | 7.74 × 10⁻²⁰
PowerPlex16 + 5 Y | 3.34 × 10⁻²⁰
NGM + 5 Y | 2.21 × 10⁻²¹
7 shared auto + 5 Y² | 5.37 × 10⁻¹²
11 shared auto + 5 Y³ | 6.82 × 10⁻¹⁶
6 shared auto + 5 Y⁴ | 6.71 × 10⁻¹¹
The purpose of creating criminal DNA databases is to generate investigative leads. With the growth of databases and expansion of applications, adding more STR loci into databases has been proposed or discussed in the USA, Europe [4–6], and China. Additional and alternative loci are being proffered. We promote the review of the current state of the art, and welcome recommendations for the future potential of the art. We have provided some example analyses for illustrative purposes for decision- and policy-makers and stakeholders to consider, beyond those considered by Hares. Such decisions have an important influence on developing investigative leads and could cost millions of dollars. Thus, judicious decisions with community input should be sought. The current battery of loci performs well for some applications, but is not sufficient for others. However, increasing the number of autosomal STR loci may not be the only or the best solution. For overall applications, a small set of Y-STRs with the current STR batteries may be more practical, especially if analyses for kinship (including familial searching), and possibly for mixtures, are to be part of the process. Indeed, a combination of autosomal and Y-STRs will perform well for single-source searches. The analyses described here should be expanded with larger simulations and include other relevant populations to generate data for more informed decision-making.
We strongly urge that the selection process consider casework applications as the primary driving force in the selection of core loci. The quantity and quality of DNA derived from casework evidence will always be a limiting factor. For instance, if the current loci are being reconsidered, the performance of large amplicon loci should be evaluated, especially in light of expanded analyses on forensic evidence, such as 'touch DNA'. For example, the FGA locus may provide a high discrimination power, but its performance in challenged samples may be poor compared with some less informative but smaller-sized amplicon loci. Partly the performance is due to amplicon size limitations, and partly to the wide spread of the FGA alleles. Data on success rates for the various loci (obviously in kit format) should be collected for forensic-evidence analyses.
The potential increase in resource strain on laboratories must also be weighed against the gain in power. Given the direction of casework towards typing more challenging samples (such as low-quantity and/or degraded samples), those STR loci that can be converted to mini-STRs might be considered the most desirable and thus it might be better to consider rejecting loci that cannot be converted to mini-STRs. Additionally, the FBI Working Group may have been too narrow in its STR performance review. For example, we have already pointed out that the PentaD locus, relegated to section B, is more informative than several of the STR loci in section A. However, the largest allele in the PentaD allelic ladder is a 17. Thus, it is entirely feasible that the PentaD locus (and the PentaE locus) could be converted to a mini-STR locus.
Indeed, a multiplex kit has reportedly been developed with the amplicon size of the Penta loci reduced. Perhaps the selection criteria should take into account the potential size of amplicons and avoid being constrained by current kit designs. CODIS could possibly drive the development of mini-STR configuration kits. In addition, developing very large multiplex kits may be possible for reference samples, but such kits may meet casework demands less easily. Sensitivity of detection is paramount for casework kits. Thus, the requirement for more loci may translate into two kits, putting greater demand on the casework laboratories and possibly still not increasing the number of typed loci if the DNA evidence is compromised. If more loci are to be added, it may be better to add more Y-STR loci instead of only autosomal loci, as the Y-STR loci (in concert with the core loci) can support both direct and indirect comparisons effectively. Using the criterion of casework performance, the conclusions for loci to include and exclude in a core set may change from those proffered by Hares.
Low-level population substructure is another criterion for a good forensic locus. Population substructure is usually measured by Fst (i.e., inbreeding coefficient). A high Fst can reduce the information content of the locus. The National Research Council (NRC) Report II recommended a conservative Fst value of 0.01 for major populations. Because they have multiple alleles per locus and are highly polymorphic, the most commonly used autosomal STR loci are expected to have a low average Fst. Although the effect is small if a couple of higher-Fst loci are added to a core set, it would be desirable to have population data from major populations to test for substructure effects before selecting loci. Similarly, it would be desirable to generate mutation-rate data before selecting loci. Population studies will be difficult to achieve in the current forensic arena because sufficient population data will not be generated by forensic laboratories unless the loci are part of a core set or in commercial kits. Funding could be provided to support CODIS endeavors to ensure that a robust and long-lasting system is developed.
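To illustrate why a higher Fst reduces the information content of a locus, the sketch below applies a commonly used substructure adjustment for homozygous genotypes, p² + p(1 − p)θ, with θ playing the role of Fst. The function names and the allele frequency are hypothetical and chosen only for illustration; the default θ of 0.01 is the conservative value mentioned above, and the larger value is an arbitrary comparison point, not a recommendation.

```python
def homozygote_frequency(p, theta=0.01):
    """Homozygote genotype frequency with a substructure adjustment.

    Uses p^2 + p * (1 - p) * theta, which reduces to the
    Hardy-Weinberg value p^2 when theta is 0.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be between 0 and 1")
    return p * p + p * (1.0 - p) * theta


def heterozygote_frequency(p, q):
    """Heterozygote genotype frequency, 2pq, left unadjusted in this sketch."""
    return 2.0 * p * q


# Hypothetical allele frequency, for illustration only.
p = 0.1
unadjusted = homozygote_frequency(p, theta=0.0)
conservative = homozygote_frequency(p)            # theta = 0.01
higher = homozygote_frequency(p, theta=0.03)      # arbitrary larger value

print(f"theta = 0.00: {unadjusted:.4f}")
print(f"theta = 0.01: {conservative:.4f}")
print(f"theta = 0.03: {higher:.4f}")
```

A larger θ raises the estimated homozygote frequency, so the genotype is treated as more common and the locus contributes less to the overall match probability. The relative effect is greatest for rare alleles.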
Assessing the CODIS loci is a laudable endeavor that needs to be carried out. We did not undertake all the studies necessary to evaluate the current loci and the needs that these proposed loci should meet. However, based on the discussion and simplified studies given here (generated for illustrative purposes), there are several points to consider.
The use of mitochondrial DNA (mtDNA) was not considered in these studies because most of the profiles in CODIS are from men, and different methods or technology would be required for mtDNA typing. However, the database (and other databases worldwide) continues to grow, and proportionally more women (and maternal associations) may populate the database in the future. Therefore, future discussion should consider the value of some mtDNA markers for CODIS applications. Other markers that might be discussed and evaluated for long-term benefit include single-nucleotide polymorphisms (including indels) and X-STRs. Next-generation sequencing technologies may make it possible to type autosomal STRs, Y-STRs, mtDNA and single-nucleotide polymorphisms in one analysis, and technical capability projections might be considered. Additionally, we did not address the effect of the selection criteria under moderate-stringency search parameters, or whether markers should be in the public domain. To better serve the lofty goals of improving single-source profile comparisons, mixture comparisons, kinship analyses such as missing-persons identification and familial searching, and international data sharing, more comprehensive studies are required to provide sufficient information to the decision-makers and stakeholders about constructing a new set of core loci for CODIS. Finally, the need to improve typing capabilities for casework analyses, and especially for challenged forensic samples, must be the primary criterion for selecting core loci for CODIS. The most polymorphic loci will tend to be better for mixture deconvolution, but will tend to have higher mutation rates. These loci will also have the greatest spread of alleles, and thus be more affected by degradation. Therefore, a balance may need to be sought between information content and allele spread.
We contend that most currently used STR loci that can be converted to small-sized amplicons will perform better overall for challenged casework and still be useful for mixture deconvolution (even if they are not the most polymorphic of loci) and for kinship analyses (because they will tend to have lower mutation rates).
List of abbreviations
AKI: average kinship index
CODIS: Combined DNA Index System
EMP: expected match probability
NDIS: National DNA Index System
STR: short tandem repeat.
We thank Melody Josserand for her useful contributions to the topic under discussion. Additionally, we thank the two anonymous reviewers for their comments, which enabled further clarification of the presentation of some of the issues described in our paper.
- http://www.fbi.gov/about-us/lab/codis/ndis-statistics Accessed on 8 August 2011
- Hares DR: Expanding the CODIS core loci in the United States. Forensic Sci Int Genet. doi:10.1016/j.fsigen.2011.04.012
- National DNA Index System (NDIS) DNA Data Acceptance Standards: Operational Procedures. 2005, http://www.nlada.org/Defender/forensics/for_lib/Documents/1132070952.06/RF_GN_13_NDIS_Data_Standards%252005_31_05.pdf Accessed on 8 August 2011
- Gill P, Fereday L, Morling N, Schneider PM: New multiplexes for Europe -- amendments and clarification of strategic development. Forens Sci Int. 2006, 163: 155-157. 10.1016/j.forsciint.2005.11.025.
- Schneider PM: Expansion of the European Standard Set of DNA database loci - the current situation. Profiles in DNA. 2009, 12 (1): 6-7. [http://www.promega.com/profiles/1201/1201_06.html]
- Gill P, Fereday L, Morling N, Schneider PM: The evolution of DNA databases -- recommendations for new European STR loci. Forens Sci Int. 2006, 156: 242-244. 10.1016/j.forsciint.2005.05.036.
- Budowle B, Ge J, Chakraborty R, Eisenberg AJ, Green R, Mulero J, Lagace R, Hennessy L: Population genetic analyses of the NGM STR loci. Int J Legal Med. 2011, 125 (1): 101-109. 10.1007/s00414-010-0516-7.
- Budowle B, Shea B, Niezgoda S, Chakraborty R: CODIS STR loci data from 41 sample populations. J Forens Sci. 2001, 46: 453-489.
- Budowle B, Ge J, Aranda X, Planz J, Eisenberg A, Chakraborty R: Texas population substructure and its impact on estimating the rarity of Y STR haplotypes from DNA evidence. J Forens Sci. 2009, 54 (5): 1016-1021. 10.1111/j.1556-4029.2009.01105.x.
- Budowle B, Masibay A, Anderson SJ, Barna C, Biega L, Brenneke S, Brown BL, Cramer J, DeGroot GA, Douglas D, Duceman B, Eastman A, Giles R, Hamill J, Haase DJ, Janssen DW, Kupferschmid TD, Lawton T, Lemire C, Llewellyn B, Moretti T, Neves J, Palaski C, Schueler S, Sgueglia J, Sprecher C, Tomsey C, Yet D: STR primer concordance study. Forens Sci Int. 2001, 124 (1): 47-54. 10.1016/S0379-0738(01)00563-1.
- Butler JM, Hill CR, Kline MC, Duewer DL, Sprecher CJ, McLaren RS, Rabbach DR, Krenke BE, Storts DR: The single most polymorphic STR Locus SE33 performance in U.S. populations. Forens Sci Int Genet. doi:10.1016/j.fsigss.2009.08.173
- AABB annual report. 2008, http://www.aabb.org/sa/facilities/Documents/rtannrpt08.pdf Accessed on 8 August 2011
- http://www.cstl.nist.gov/strbase/ Accessed on 8 August 2011
- Ge J, Budowle B, Aranda XG, Planz JV, Eisenberg AJ, Chakraborty R: Mutation rates at Y chromosome short tandem repeats in Texas populations. Forens Sci Int Genet. 2009, 3 (3): 179-184. 10.1016/j.fsigen.2009.01.007.
- Willuweit S, Roewer L: Y chromosome haplotype reference database (YHRD): update. Forens Sci Int Genet. 2007, 1: 83-87. 10.1016/j.fsigen.2007.01.017.
- http://www.ncbi.nlm.nih.gov/ Accessed on 8 August 2011
- O'Connor KL, Hill CR, Vallone PM, Butler JM: Linkage disequilibrium analysis of D12S391 and vWA in U.S. population and paternity samples. Forens Sci Int Genet. 2011, 5 (5): 538-540. 10.1016/j.fsigen.2010.09.003.
- O'Connor KL, Hill CR, Vallone PM, Butler JM: Corrigendum to "Linkage disequilibrium analysis of D12S391 and vWA in U.S. population and paternity samples". Forens Sci Int Genet. 2011, doi:10.1016/j.fsigen.2010.09.003
- Phillips C, Ballard D, Gill P, Syndercombe Court D, Carracedo A, Lareu M: The recombination landscape around forensic STRs: accurate measurement of genetic distances between syntenic STR pairs using HapMap high density SNP data. Forens Sci Int Genet. 2011, doi:10.1016/j.fsigen.2011.07.012
- http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/latest/rates/ Accessed on 2 September 2011
- Weir BS: Matching and partially-matching DNA profiles. Ann Appl Stat. 2007, 1 (2): 358-370. 10.1214/07-AOAS128.
- Ge J, Chakraborty R, Eisenberg AJ, Budowle B: Comparisons of familial DNA database searching strategies. J Forensic Sci. 2011, doi:10.1111/j.1556-4029.2011.01867.x
- Ge J, Budowle B, Chakraborty R: Choosing relatives for DNA identification of missing persons. J Forens Sci. 2011, 56 (s1): S23-S28.
- Ge J, Yan JW, Budowle B, Chakraborty R, Eisenberg A: Issues on China forensic DNA database. Chinese J Forens Med. 2011, 26 (3): 252-255.
- Ge J, Budowle B, Planz J, Eisenberg A, Ballantyne J, Chakraborty R: US forensic Y-chromosome short tandem repeats database. Leg Med. 2010, 12 (6): 289-295. 10.1016/j.legalmed.2010.07.006.
- Bhoopat T, Hohoff C, Steger HF: Identification of DYS385 allele variants by using shorter amplicons and Northern Thai haplotype data. J Forensic Sci. 2003, 1108-1112. 5
- Ge J, Budowle B, Chakraborty R: Interpreting Y chromosome STR haplotype mixture. Leg Med. 2010, 12 (3): 137-143. 10.1016/j.legalmed.2010.02.003.
- http://www.agcu.cn/ProductView.asp?ID=55 Accessed on 26 November 2011
- National Research Council Committee on DNA Forensic Science: An Update: the Evaluation of Forensic DNA Evidence. 1996, Washington (DC): National Academy Press
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.