Plant Working Group
UPDATE: CBOL approves matK and rbcL as the BARCODE regions for Land Plants
Statement by the Executive Committee, Consortium for the Barcode of Life 16 November 2009
The Consortium for the Barcode of Life (CBOL) received two well-documented proposals for the barcode regions for land plants. The first proposed rbcL and matK and referred to an August 2009 publication in the Proceedings of the National Academy of Sciences. This article pointed out that matK was not easy to amplify in some groups and that additional work on primer development is needed. The article argued that non-coding regions, especially trnH-psbA, have strong potential as DNA barcodes but suffer from technical problems that make automated sequence assembly difficult. The PNAS paper argued that progress with a two-locus barcode region would be cheaper and faster than with a three-locus region, especially if the third region is non-coding and requires manual sequence editing. The Plant Working Group concluded that solving the primer problems with matK is more likely than solving the sequence quality problems for trnH-psbA, and recommended rbcL and matK as the two-locus barcode.
The second proposal to CBOL put forward a three-locus barcode consisting of rbcL, matK, and trnH-psbA. The proposal argued that breakthroughs on the matK and trnH-psbA problems are difficult to predict, and that a three-locus barcode would provide a greater probability of having two barcode sequences for all species.
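The probability argument in the second proposal can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that the proposal itself does not state: each locus yields a usable sequence independently, with the same per-locus success probability p. The function name `prob_at_least` and the value of `p` are hypothetical and chosen for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n loci yield a usable sequence.

    Assumes independent loci with an equal per-locus success probability p
    (a simplification made for this sketch, not a claim from the proposals).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative per-locus success rate: an assumption, not a measured figure.
p = 0.9

# Two-locus barcode: both rbcL and matK must succeed.
two_locus = prob_at_least(2, 2, p)

# Three-locus barcode: any two of rbcL, matK, trnH-psbA must succeed.
three_locus = prob_at_least(2, 3, p)

print(f"two-locus: {two_locus:.3f}, three-locus: {three_locus:.3f}")
```

Under these assumptions the three-locus design always gives an equal or higher chance of recovering two sequences, which is the benefit the Executive Committee weighed against the added cost of a third region.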
CBOL appointed an ad hoc panel of three independent reviewers for an evaluation of both proposals. The panel questioned the assertion in the PNAS paper that a solution to the matK primer problem is more likely than automated sequence processing for a non-coding region such as trnH-psbA. The panel recommended approval of the three-locus barcode with a reassessment after 18 months. If significant progress on either the matK or trnH-psbA problem is made by then, they said that the still-problematic barcode region could be eliminated.
CBOL's Executive Committee considered both proposals and the recommendations of the review panel. The Committee was convinced that the proposals provided a solid factual basis on which to make a decision. Like the proposers and review panel, the Executive Committee noted that the 70-75% success rate of the proposed plant barcodes is significantly lower than the success of COI among animals. Nevertheless, the Committee agreed that further delay is unwarranted, unwise, and unlikely to produce a better solution.
The Executive Committee viewed the review panel's recommendation for a three-locus barcode region as scientifically defensible, but a more conservative and costly solution. The Committee was not convinced of the advantages of a three-locus barcode over a two-locus standard. The Committee therefore gave more weight to the findings of CBOL's Plant Working Group in its August 2009 PNAS paper. In the Committee's view, requiring a third region for the very large sample sizes involved in plant barcoding would add significant cost and delay to the plant barcoding initiative without adding resolving power in cases where effective matK primers have been developed.
The Executive Committee therefore concludes that only rbcL and matK are approved and required barcode regions for land plants. CBOL will inform GenBank that sequence records submitted to the International Nucleotide Sequence Database Collaborative are eligible to have the reserved keyword “BARCODE” as stipulated in the barcode data standards.
However, the Executive Committee accepted the review panel's recommendation to reassess the situation in 18 months. The current inability of the proposed plant barcode to resolve more than ~70% of species indicates that improvement in the approach is needed, along with more rbcL and matK data. A reassessment in 18 months would evaluate progress being made on matK primers and sequence assembly techniques for non-coding regions such as trnH-psbA.
As stated in the 2009 PNAS paper by the Plant Working Group, "In the short term, where further resolution and universality are required, we envisage that the core rbcL-matK barcode will be augmented in individual projects from a flexible short-list of supplementary loci including the noncoding plastid regions examined here (trnH-psbA, atpF-atpH, and psbK-psbI), and the trnL intron which has been advocated for situations involving highly degraded tissue (19). The rapidly evolving internal transcribed spacers of nuclear ribosomal DNA also represent a useful supplementary barcode in taxonomic groups in which direct sequencing of this locus is possible." For this reason, CBOL's Executive Committee encourages the community to collect data on trnH-psbA and other non-coding regions as a back-up to matK and to enhance protocols for the use of non-coding regions for DNA barcoding.
BACKGROUND: RECOMMENDING A STANDARD PLANT BARCODE (August 2009)
Members of the CBOL Plant Working Group have published an open access paper in PNAS recommending a plant barcode.
The preferred citation for this paper is: CBOL Plant Working Group (2009) A DNA barcode for land plants. Proceedings of the National Academy of Sciences, USA, 106: 12794-12797
The paper includes contributions from 52 researchers representing 25 institutions and compares the performance of seven candidate barcoding loci (rpoC1, rpoB, rbcL, matK, trnH-psbA, atpF-atpH, psbK-psbI). These were assessed against the following three criteria:
Criterion 1: Which loci show greatest universality?
This was tackled by combining pooled universality results from different laboratories with de novo generated sequences at the University of Guelph. For angiosperms, the results were based on directly comparable sequencing trials of the seven regions, using a single primer pair per locus, for 170 species. This work was coordinated by Mehrdad Hajibabaei (University of Guelph), using these protocols. For gymnosperms and cryptogams, which often require different primer sets, the results were based on pooled data from several laboratories.
Criterion 2: Which loci are most amenable to bi-directional sequencing with few or no ambiguous base calls?
Assessments of sequence trace quality were based on de novo generated sequences from 190 land plant samples at the University of Guelph. Sequence quality assessments of the resulting trace files were undertaken by Sujeevan Ratnasingham, in collaboration with Mehrdad Hajibabaei.
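As a rough illustration of what "few or no ambiguous base calls" means in practice, the sketch below computes the fraction of ambiguous calls in an already-assembled sequence. It is not the procedure used in the Guelph assessments, which worked from trace files; the function names and the 1% threshold are assumptions made for this example.

```python
# Base calls treated as unambiguous; anything else (N, R, Y, ...) counts as ambiguous.
UNAMBIGUOUS = set("ACGT")


def ambiguous_fraction(sequence: str) -> float:
    """Return the fraction of base calls that are not A, C, G, or T.

    Gap characters ('-') are ignored. An empty sequence returns 0.0.
    """
    bases = [b for b in sequence.upper() if b != "-"]
    if not bases:
        return 0.0
    ambiguous = sum(1 for b in bases if b not in UNAMBIGUOUS)
    return ambiguous / len(bases)


def passes_quality(sequence: str, max_ambiguous: float = 0.01) -> bool:
    """Check a sequence against a hypothetical ambiguity threshold (default 1%)."""
    return ambiguous_fraction(sequence) <= max_ambiguous


# Made-up example sequences, for illustration only.
print(ambiguous_fraction("ACGTACGTNN"))
print(passes_quality("ACGTACGTAC"))
```

A real pipeline would also consider per-base quality scores and agreement between the forward and reverse reads, which this sketch does not model.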
Criterion 3: Which loci enable most species to be distinguished?
Assessments of species discrimination were based on pooled sequence data from several laboratories, with the aim of maximising representation of samples trialled for all seven candidate barcoding loci. The data analyses were carried out by Damon Little (New York Botanic Garden) and John Spouge (NCBI), in collaboration with Laura Forrest (Royal Botanic Garden Edinburgh). The scripts used by Damon Little in analysing the data are available online.
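To make the idea of species discrimination concrete, the following sketch counts a species as discriminated when none of its concatenated rbcL+matK sequences is identical to a sequence from a different species. This exact-match rule is a deliberate simplification chosen for illustration and is not the method implemented in the analysis scripts mentioned above; the function name and the toy records are hypothetical.

```python
from collections import defaultdict


def discrimination_rate(records):
    """Fraction of species whose two-locus barcodes are unique to that species.

    records: iterable of (species, rbcL_sequence, matK_sequence) tuples.
    A species is counted as discriminated only if every one of its
    concatenated rbcL+matK sequences is found in no other species.
    """
    species_by_barcode = defaultdict(set)
    barcodes_by_species = defaultdict(set)
    for species, rbcl, matk in records:
        # The separator keeps the two loci distinct within the combined key.
        barcode = rbcl.upper() + "|" + matk.upper()
        species_by_barcode[barcode].add(species)
        barcodes_by_species[species].add(barcode)
    if not barcodes_by_species:
        return 0.0
    discriminated = sum(
        1
        for codes in barcodes_by_species.values()
        if all(len(species_by_barcode[code]) == 1 for code in codes)
    )
    return discriminated / len(barcodes_by_species)


# Toy records with made-up sequences, for illustration only.
toy_records = [
    ("Species A", "ACGT", "TTGA"),
    ("Species B", "ACGT", "TTGA"),  # identical to Species A at both loci
    ("Species C", "ACGA", "TTGA"),  # differs at rbcL
    ("Species D", "ACGT", "TTGC"),  # differs at matK
]
print(discrimination_rate(toy_records))
```

Combining the loci before comparison reflects the multi-locus idea: two species that share an rbcL sequence can still be separated if their matK sequences differ.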
Based on these results, and subsequent discussions, the majority preference was to recommend portions of the genes rbcL+matK to form the core barcode for land plants. The decision was a close call, as the candidate loci each have different strengths and weaknesses. It is recognised that individual research groups may choose to supplement this core-barcode with additional loci.
WHERE CAN I FIND MORE INFORMATION ABOUT LABORATORY PROTOCOLS AND THE REGIONS THAT ARE RECOMMENDED AS THE PLANT BARCODE?
The following document summarises currently available protocols and information.
IS FURTHER PROTOCOL DEVELOPMENT REQUIRED FOR rbcL+matK?
Yes. For matK, the greatest success in the Guelph sequencing trials was obtained with the 3F/1R primers designed by Ki-Joong Kim (Korea University) (see Reaction Conditions). Further primer development work is required in non-angiosperms for matK (particularly for cryptogamic plants), and it is recognised that the 3F/1R primers will not work in all angiosperms. Protocol development to enhance amplification strategies for matK, including the development of new primers and primer cocktails, is underway at laboratories including the University of Guelph, New York Botanic Garden and Royal Botanic Garden Edinburgh. Additional community efforts to enhance plant barcoding protocols are encouraged. If CBOL approves the rbcL+matK barcode, we will establish an online forum to coordinate protocol development and the sharing of information to promote this development.
WHAT ARE THE MECHANISMS FOR ASSESSING THE PERFORMANCE OF PLANT BARCODES?
We encourage community dialogue to share experiences of the performance of plant barcodes so that the approaches and protocols can be refined in light of new information. Please send feedback to Barcoding@rbge.org.uk. We will enhance mechanisms for information exchange following CBOL's decision on a plant barcode.
WHAT LEVELS OF SPECIES DISCRIMINATION ARE EXPECTED FROM rbcL+matK?
The paper in PNAS was primarily designed to assess the relative performance of the different barcoding loci, rather than their absolute discriminatory power. Nevertheless, it is clear that species discrimination is on average lower in plants than in animals. We obtained ca. 72% in this study. See Fazekas et al. (2009) for a thorough appraisal of this issue.
ARE THERE INFORMATICS SYSTEMS CAPABLE OF HANDLING MULTI-LOCUS BARCODES?
The Barcode of Life Datasystems workbench (BOLD) has been configured to handle rbcL and matK. The database is also being developed to accommodate additional supplementary loci, including non-coding regions. The ongoing development of dedicated bioinformatic tools to support plant barcoding, including the use of multi-locus barcodes, is a high priority.
RECENT COVERAGE ON THE PLANT BARCODE RECOMMENDATION
- PNAS article by CBOL's Plant Working Group
- Commentary in PNAS
- Press Release concerning the PNAS article (PDF) (23 July 2009)
- Science Magazine news report of the PNAS article (31 July 2009)
- Science Magazine Perspective (7 August 2009)
- Nature Magazine news report of the PNAS article (27 July 2009)
- Scientific American article (29 July 2009)
- BBC News report (29 July 2009)
- Washington Post news report (30 July 2009)
- UK Channel 4 News story (28 July 2009)
- Science Magazine news report on the Plant Working Group (September 2007)
PREVIOUS PAPERS PROPOSING OR DISCUSSING PLANT BARCODING REGIONS
- Chase MW, Cowan RS, Hollingsworth PM, et al. (2007) A proposal for a standardised protocol to barcode all land plants. Taxon 56, 295-299.
- Chase MW, Salamin N, Wilkinson M, et al. (2005) Land plants and DNA barcodes: short-term and long-term goals. Philosophical Transactions of the Royal Society B-Biological Sciences 360, 1889-1895.
- Cowan RS, Chase MW, Kress WJ, Savolainen V (2006) 300,000 species to identify: problems, progress, and prospects in DNA barcoding of land plants. Taxon 55, 611-616.
- Fazekas AJ, Burgess KS, Kesanakurti PR, Graham SW, Newmaster SG, Husband BC, Percy DM, Hajibabaei M, Barrett SCH (2008) Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS ONE 3, e2802.
- Hollingsworth PM (2008) DNA barcoding plants in biodiversity hot spots: progress and outstanding questions. Heredity 101, 1–2; doi:10.1038/hdy.2008.16
- Hollingsworth ML, Clark A, Forrest LL, Richardson JE, Pennington RT, Long D, Cowan R, Chase MW, Gaudeul M, Hollingsworth PM (2009) Selecting barcoding loci for plants: evaluation of seven candidate loci with species level sampling in three divergent groups of land plants. Molecular Ecology Resources 9, 439-457.
- Kress WJ, Erickson DL (2008) DNA barcodes: Genes, genomics, and bioinformatics. Proceedings of the National Academy of Sciences of the United States of America 105, 2761-2762.
- Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE 2, e508.
- Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences of the United States of America 102, 8369-8374.
- Lahaye R, van der Bank M, Bogarin D, et al. (2008) DNA barcoding the floras of biodiversity hotspots. Proceedings of the National Academy of Sciences of the United States of America, http://www.pnas.org/cgi/content/abstract/0709936105v1
- Lahaye R, Savolainen V, Duthoit S, Maurin O, van der Bank M (2008) A test of psbK-psbI and atpF-atpH as potential plant DNA barcodes using the flora of the Kruger National Park (South Africa) as a model system. Available from Nature Precedings http://hdl.handle.net/10101/npre.2008.1896.1
- Ledford H (2008) Botanical identities: DNA barcoding for plants comes a step closer. Nature 451, 616 (doi:10.1038/451616b) http://www.nature.com/news/2008/080206/full/451616b.html
- Newmaster SG, Fazekas AJ, Ragupathy S (2006) DNA barcoding in land plants: evaluation of rbcL in a multigene tiered approach. Canadian Journal of Botany-Revue Canadienne De Botanique 84, 335-341.
- Newmaster SG, Fazekas AJ, Steeves RAD, Janovec J (2008) Testing candidate plant barcode regions in the Myristicaceae. Molecular Ecology Resources 8, 480-490.
- Pennisi E (2007) Taxonomy. Wanted: a barcode for plants. Science 318, 190-191.
- Sass C, Little DP, Stevenson DW, Specht CD (2007) DNA barcoding in the cycadales: testing the potential of proposed barcoding markers for species identification of cycads. PLoS ONE 2, e1154.
- Seberg O, Petersen G (2009) How Many Loci Does it Take to DNA Barcode a Crocus? PLoS ONE 4(2): e4598. doi:10.1371/journal.pone.0004598
- Taberlet P, Coissac E, Pompanon F, et al. (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research 35: e14.