Genetic diversity of Colletotrichum lindemuthianum races based on ITS-rDNA regions

Colletotrichum lindemuthianum is the causal agent of anthracnose in common bean. Under favorable conditions, this disease can cause yield losses of up to 100%. Disease management remains one of the main challenges for common bean growers and breeders, since this pathogen exhibits wide genetic variability, probably owing to sexual recombination. The aim of this work was to study the genetic diversity of C. lindemuthianum races from different regions of Brazil by sequencing the ITS regions. The 5.8S gene and the flanking internal transcribed spacer regions (ITS1 and ITS2) of 40 C. lindemuthianum isolates collected in Brazil were amplified by PCR and sequenced to determine their genetic variability. Of the SNPs detected, 46.88% were located in the ITS1 region and 53.12% in the ITS2 region. Genetic distances estimated with the p-distance method ranged from 0.000 to 0.169 between races. The greatest distance was observed between races 10 and 73, with a value of 0.169, indicating wide genetic variability between them. The phylogenetic tree comprised three groups, and Group I was further divided into five subgroups. Population structure analysis likewise revealed three clusters. These results suggest that sequence analysis of the ITS regions of C. lindemuthianum rDNA may be a valuable tool for identifying this pathogen through the design of specific primers.


INTRODUCTION
Common bean (Phaseolus vulgaris L.) is one of the most important legumes in the human diet worldwide, especially in Latin America and Africa (Broughton et al., 2003). The socioeconomic importance of common bean is unquestionable, since in most cases this grain legume is the primary source of proteins, carbohydrates, vitamins and minerals in the human diet (Hefni, Öhrvik, Mohamed & Witthoft, 2010).
Unfortunately, the common bean crop is susceptible to several fungal diseases. Among them, anthracnose, caused by Colletotrichum lindemuthianum (Sacc. & Magnus) Briosi & Cavara (1889), has a great impact on grain yield and quality because it occurs in all three growing seasons and can severely damage the crop, with losses estimated at up to 100% in some cases (Chiorato, Carbonell, Moura, Ito & Colombo, 2006). Symptoms include necrotic or depressed lesions of various colors and shapes on petioles, pods, leaves and seeds (Figure 1). The severity of infection depends on both the pathogen race and the common bean cultivar (Kimati et al., 1997; Vieira et al., 2006).
Common bean was independently domesticated from wild beans in at least two separate geographic centers, Mesoamerica (from Mexico to Colombia) and the Andes (from Colombia to Argentina) (Gepts & Debouck, 1991), giving rise to two main gene pools. Beans from the Mesoamerican gene pool are small- to medium-seeded and exhibit significantly greater genetic diversity than the mostly large-seeded Andean beans (Beebe et al., 2000; Beebe, Rengifo, Gaitan, Duque & Tohme, 2001; Chacón, Pickersgill & Debouck, 2005). Studies of pathogenic and genetic variation revealed that the causal agents of anthracnose (C. lindemuthianum), rust (Uromyces appendiculatus) and angular leaf spot (Phaeoisariopsis griseola) segregate into two distinct groups, Andean and Mesoamerican, mirroring the genetic diversity of common bean (Guzmán et al., 1995; Pastor-Corrales, 1996). Andean isolates of C. lindemuthianum are usually recovered from common bean cultivars belonging to the Andean gene pool.
Studies of pathogenic variability revealed that different mechanisms, such as parasexuality, anastomosis and the formation of conidial anastomosis tubes (CATs), are involved (Roca, Davide, Mendes-Costa & Wheals, 2003). This accounts for the high number of physiological races and the complexity of deploying genetic resistance (Pereira, Ishikawa, Pinto & Souza, 2010).
Phenotypic and genotypic analyses are tools that help characterize pathogen variability at the inter- and intraspecific levels, providing more precise information for bean breeding programs. Sequencing specific regions of the genome can also efficiently assess genetic variability.
One approach to this investigation is based on PCR amplification of the internal transcribed spacer (ITS) regions of ribosomal DNA (rDNA). The ITS regions are transcribed as part of a precursor molecule known as 45S; when this precursor is cleaved at specific sites, the spacers (ITS1 and ITS2) are removed. The precise function of the ITSs is still unknown; however, there is good evidence that they play an important role in the biogenesis of the large ribosomal subunit and the maturation of the small subunit (Hlinka, Murrell & Barker, 2002).
A large number of ITS nucleotide sequences of Colletotrichum and Glomerella are available in international databases and are frequently used in sequence homology analyses (Moriwaki et al., 2002; Lobuglio & Pfister, 2008; Crouch, Clarke & Hillman, 2009). Sequencing of the ITS regions makes it possible to detect variation in C. lindemuthianum through SNP (single nucleotide polymorphism) markers. These markers can be used to detect species-specific mutations, allowing samples to be classified within a molecular taxonomy (Morin, Luikart & Wayne, 2004).
SNPs are considered stable markers because they mutate less frequently than other marker types. They are therefore well suited to studies of genome evolution and are easy-to-use, appropriate markers for population studies (Jehan & Lakhanpaul, 2006).
To better understand pathogen dynamics in different cultivation regions, it is necessary to detect polymorphisms within physiological races. Because the ITS regions can vary intraspecifically in their base sequence, they may be suitable for discriminating variation within and among populations of the pathogen.
Therefore, the objective of this work was to evaluate the genetic diversity of C. lindemuthianum races from different regions of Brazil through sequencing of the ITS regions, using neighbor-joining (NJ), p-distance and Markov chain Monte Carlo (MCMC) methods.

MATERIAL AND METHODS
Isolates of C. lindemuthianum
In this study, we evaluated 40 monosporic isolates of C. lindemuthianum from the states of Mato Grosso, Paraná and Santa Catarina, Brazil, which belong to the culture collection (mycotheca) of the Laboratório de Melhoramento de Feijão Comum e de Biologia Molecular do Núcleo de Pesquisa Aplicada à Agricultura (Nupagri). The 40 isolates comprised 32 previously characterized races (Table 1). The experiments were performed at Nupagri, Universidade Estadual de Maringá (UEM).

Genomic DNA extraction and quantification
A total of 40 isolates of Colletotrichum lindemuthianum (Table 1), recovered from common bean plants with anthracnose symptoms at several locations in Brazil, were used in this study (Thomazella et al., 2002b; Gonçalves-Vidigal et al., 2008; Felipin-Azevedo et al., 2014). Monosporic isolates were stored at -20 °C on filter paper impregnated with a conidium-mycelium suspension. Cultures of these isolates were grown on PDA medium and maintained at 4 °C for DNA isolation. Genomic DNA was extracted from 250 mg of hyphal tissue following Raeder and Broda (1985) with modifications. DNA samples were quantified with a Quant-iT™ fluorometer and diluted in sterile TE to a final concentration of 40 ng/μL for PCR.
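Diluting each DNA sample to the 40 ng/μL working concentration follows the usual C1V1 = C2V2 relation. The short Python sketch below computes the volumes of DNA stock and TE needed; the function name, the 50 μL default final volume and the 200 ng/μL stock reading are illustrative assumptions, not values reported in this study.

```python
def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float = 40.0,
                     final_volume_ul: float = 50.0) -> tuple[float, float]:
    """Return (DNA stock volume, TE volume) in microlitres, from C1*V1 = C2*V2.

    Hypothetical helper: the default 50 uL final volume is an assumption.
    """
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0:
        raise ValueError("concentrations must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # V1 = C2*V2/C1
    return stock_ul, final_volume_ul - stock_ul  # remaining volume is TE


# Hypothetical fluorometer reading of 200 ng/uL for one isolate
dna_ul, te_ul = dilution_volumes(200.0)
print(f"{dna_ul:.1f} uL DNA + {te_ul:.1f} uL TE")
```

For the hypothetical 200 ng/μL stock, this gives 10 μL of DNA made up to 50 μL with 40 μL of TE.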

Amplification and sequencing
The ITS-rDNA region was amplified from genomic DNA using primers ITS1 (5' TCCGTAGGTGAACCTGCGG 3') and ITS4 (5' TCCTCCGCTTATTGATATGC 3') (White, Bruns, Lee & Taylor, 1990) and ITS1F (5' CTTGGTCATTTAGAGGAAGTAA 3') (Gardes & Bruns, 1993). PCR reactions were carried out in a final volume of 50 μL containing 40 ng of genomic DNA, 1x reaction buffer (100 mM Tris-HCl, pH 9.0), 2 mM of each dNTP, 3 mM MgCl2, 5 μM of each primer and 1 U of Taq DNA polymerase. Amplification was performed in a TC-412 thermal cycler (M.J. Research Inc., Waltham, MA) with an initial denaturation at 94 °C for 1 min, followed by 30 cycles of 94 °C for 15 s, 55-58 °C for 15 s and 72 °C for 15 s, and a final extension at 72 °C for 7 min. PCR products were stained with SYBR® and resolved on 1.2% agarose gels. Bands were analyzed with an L-PIX Image EX system (Loccus Biotecnologia, Loccus do Brasil, Cotia, SP, Brazil). Amplicons were then purified with the PureLink PCR Purification Kit (Invitrogen) according to the manufacturer's recommendations. Samples were sequenced at the Centro de Estudos do Genoma Humano e Células-Tronco (CEHG-CEL) of the Universidade de São Paulo (USP), São Paulo state, using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, USA) on an ABI 3730 DNA Analyzer.

Sequence analysis of ITS region
To compare consensus sequences and to construct the genetic distance matrix and phylogenetic tree, seven sequences were retrieved from the GenBank database of the National Center for Biotechnology Information (NCBI) (Altschul et al., 1997). These sequences were selected because they were highly similar (99-100% identity) to the races analyzed in this study (Table 2).

Data analysis
Nucleotide sequences were assembled, edited and aligned with ClustalW in BioEdit (Hall, 1999), and compared with GenBank sequences through BLAST similarity searches using MEGA version 7 (Kumar, Stecher & Tamura, 2016).
Multiple sequence alignment of the ITS regions was performed using BioEdit version 7.2.5 (Hall, 1999). The phylogenetic tree was constructed with the neighbor-joining (NJ) method (Saitou & Nei, 1987), and the genetic distance matrix was computed with the p-distance method (Nei & Kumar, 2000). Phylogenetic trees were drawn and edited in MEGA 7. Nucleotide diversity (π) of the ITS region was estimated in MEGA 7 (Nei & Kumar, 2000; Kumar et al., 2016). GenBank sequences of related species were also included in the phylogenetic analysis.
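The neighbor-joining algorithm of Saitou and Nei (1987) repeatedly joins the pair of nodes that minimizes the criterion Q(i, j) = (n - 2)·d(i, j) - r(i) - r(j), where n is the number of current nodes and r(i) is the sum of distances from node i to all of them. The Python sketch below is a minimal, unoptimized illustration of that procedure, not the MEGA implementation used in this study; the function name and the five-taxon toy matrix (a standard textbook example) are assumptions for illustration only, and negative branch lengths are not corrected.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Return an NJ tree in Newick format from a symmetric distance matrix."""
    # Copy the matrix into a dict-of-dicts so new internal nodes can be added.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # net divergence
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node to every remaining node.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    half = d[a][b] / 2  # root the Newick string at the midpoint of the last edge
    return f"({a}:{half:.4f},{b}:{half:.4f});"


# Toy distances for five hypothetical taxa (not data from this study).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
```

Ties in Q are broken by keeping the first pair encountered; other programs may break ties differently and so return a differently ordered but equivalent tree.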
Bootstrap values (10,000 replicates) were calculated in MEGA version 7 (Kumar et al., 2016). Structure 2.3.4 (Pritchard, Stephens & Donnelly, 2000) was used to cluster the isolate sequences under a Bayesian model. To determine the optimal number of clusters, 10 independent runs were conducted for each K from 2 to 10. Each run consisted of a burn-in of 10,000 iterations followed by 100,000 data-collection iterations of the Markov chain Monte Carlo (MCMC) method. The optimal K was determined with Structure Harvester using the ΔK method (Evanno, Regnaut & Goudet, 2005; Earl & vonHoldt, 2012). The dataset included all races obtained in the present work and 7 sequences (controls) retrieved from GenBank.
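The ΔK statistic of Evanno et al. (2005) relates the second-order rate of change of the log-likelihood L(K) = ln P(D) to its spread across replicate runs: ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)]. The Python sketch below evaluates this formula from the run means, which is one common way of computing it; it is not a reproduction of Structure Harvester, the use of the sample standard deviation is an assumption, and the likelihood values are invented. Because ΔK needs both neighbouring K values, with K = 2 to 10 it can only be evaluated for K = 3 to 9.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Compute Evanno's delta K from replicate ln P(D) values per K.

    Uses the sample standard deviation across runs (an assumption about the
    exact estimator); K values at either end of the tested range are skipped.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second difference needs both neighbours
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])  # requires at least two runs per K
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Hypothetical ln P(D) values from three runs at each K (not study data).
runs = {
    2: [-900.0, -902.0, -898.0],
    3: [-800.0, -801.0, -799.0],
    4: [-790.0, -792.0, -788.0],
    5: [-785.0, -789.0, -781.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

The K with the largest ΔK is taken as the most likely number of clusters; with these invented values that is K = 3.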

RESULTS AND DISCUSSION
Two group-specific PCR primers and ITS-rDNA sequence analysis were used to detect and differentiate the 40 C. lindemuthianum isolates. All isolates yielded an ITS1-5.8S-ITS2 fragment of approximately 600 bp. Pairwise nucleotide sequence comparison revealed that all C. lindemuthianum isolates analyzed shared 97-100% identity with each other and with other C. lindemuthianum ITS sequences deposited in GenBank.
The sequences were compared with Colletotrichum spp. sequences retrieved from GenBank (Table 2). Regarding the individual rDNA regions, the 5.8S rDNA gene (alignment positions 168 to 350) was present in all Colletotrichum spp. sequences, whereas ITS1 and ITS2 spanned positions 2 to 167 and 351 to 536, respectively. Among the analyzed sequences, the ITS2 region of C. lindemuthianum was more divergent than ITS1 (Figure 2). Similar results were obtained by Balardin et al. (1999), who sequenced the ITS regions of C. lindemuthianum isolates collected in several parts of the world, including Brazil.

Identification of SNPs in the ITS regions
The ITS regions revealed high genetic variability, with 128 SNPs detected: 60 in ITS1 and 68 in ITS2. Although some authors have reported different patterns of ITS divergence among Colletotrichum species and other fungi (Bunting, Plumley, Clarke & Hillman, 1996; Sreenivasaprasad, Mills, Meehan & Brown, 1996; Cooke & Duncan, 1997), our results agree with previous studies that identified ITS2 as the most divergent region (Sherriff et al., 1994; Balardin et al., 1999). No length variation (in base pairs) was detected in the ITS1, 5.8S or ITS2 regions of the 40 C. lindemuthianum isolates.
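Counting SNPs per region amounts to finding the polymorphic columns of the alignment and binning them by region boundaries. The Python sketch below is a simplified illustration: it treats any column containing more than one of A, C, G or T as a SNP site and ignores gaps, whereas the counts reported here may also include indels. The function names, the toy alignment and the region coordinates are hypothetical; the last lines only check that 60 and 68 of 128 SNPs correspond to about 46.9% and 53.1%.

```python
def polymorphic_sites(alignment: list[str]) -> list[int]:
    """Return 1-based alignment columns with more than one nucleotide (gaps ignored)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for col in range(len(alignment[0])):
        bases = {seq[col].upper() for seq in alignment} & set("ACGT")
        if len(bases) > 1:
            sites.append(col + 1)
    return sites


def snps_by_region(sites: list[int], regions: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Count SNP sites falling inside each inclusive (start, end) region."""
    return {name: sum(start <= s <= end for s in sites)
            for name, (start, end) in regions.items()}


def percent_share(counts: dict[str, int]) -> dict[str, float]:
    """Express each region's SNP count as a percentage of the total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Toy alignment and hypothetical region coordinates (not study data).
toy = ["ACGTA", "ACTTA", "AC-TG"]
sites = polymorphic_sites(toy)
print(sites, snps_by_region(sites, {"ITS1": (1, 3), "ITS2": (4, 5)}))

# Share of the 128 SNPs reported in the text.
print(percent_share({"ITS1": 60, "ITS2": 68}))
```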
Interestingly, race 75 harbored a SNP in the ITS2 region that had not been described previously; for this reason, it was considered the most divergent race in comparison with all evaluated races and the sequences retrieved from GenBank. Race 73 is a common Mesoamerican race in North, Central and South America; its first occurrence in Brazil was reported by Thomazella et al. (2002b).
A comparison of ITS1 sequences of races 0, 9, 31 and 79 (Paraná) with races 67, 89, 101 and 105 (Santa Catarina) showed that these races shared the same SNPs: insertions of C (position 157 bp) and A (position 158 bp) and a C→A transversion (position 159 bp), suggesting similarity between different races from the two states. The interaction of C. lindemuthianum with host cultivars and the different environmental conditions in each region may result in broad pathogenic variability (Talamini et al., 2006). Race 10 showed SNPs in the 5.8S gene, namely a T insertion (position 177 bp) and a C→T transition (position 191 bp). In addition, race 283 showed a G insertion at position 186 bp in the same region.
Sequences of race 2 and isolate MAFF 305390 also displayed SNPs. The ITS1 region of race 2 showed a G→T transversion (position 77 bp) and a G→A transition (position 165 bp), whereas its ITS2 region revealed a C→A transversion at position 515 bp. Isolate MAFF 305390 had SNPs only in the ITS2 region: T (position 446 bp) and C (position 471 bp) insertions and a C→T transition (position 474 bp). Interestingly, Moriwaki, Tsukiboshi and Sato (2002) described similar variations in the ITS2 region of isolate MAFF 305390. In addition, Chen et al. (2007) compared ITS sequences of C. lindemuthianum races 17, 23, 31, 73, 89 and 1096 with isolate MAFF 305390 (GenBank) and concluded that these sequences were identical.
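Substitutions are classified as transitions when both bases belong to the same chemical class (purines A↔G, or pyrimidines C↔T) and as transversions otherwise. A minimal Python sketch of that rule, applied to the changes described above for race 2, race 10 and isolate MAFF 305390; the function name is hypothetical, and a gap character "-" in the alignment is treated as an indel.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single aligned base change as transition, transversion or indel."""
    ref, alt = ref.upper(), alt.upper()
    allowed = PURINES | PYRIMIDINES | {"-"}
    if ref not in allowed or alt not in allowed:
        raise ValueError(f"unsupported character in {ref}->{alt}")
    if ref == alt:
        return "identical"
    if "-" in (ref, alt):
        return "indel"  # e.g. the T and C insertions in MAFF 305390
    pair = {ref, alt}
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


# Changes reported in the text (G->T, G->A, C->A, C->T, and an insertion)
for ref, alt in [("G", "T"), ("G", "A"), ("C", "A"), ("C", "T"), ("-", "T")]:
    print(f"{ref}->{alt}: {substitution_type(ref, alt)}")
```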

Genetic diversity of C. lindemuthianum races
Genetic distance was estimated as the pairwise p-distance, the simplest measure of divergence between two aligned homologous sequences: the proportion of compared nucleotide sites at which the two sequences differ.
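For two aligned sequences, p = n_d / n, where n_d is the number of sites at which they differ and n is the number of sites compared; nucleotide diversity π can then be estimated as the average p-distance over all pairs of sequences (Nei & Kumar, 2000). The Python sketch below assumes pairwise deletion of gaps and ambiguous bases, which is only one of the gap-handling options available in MEGA; the function names and toy sequences are hypothetical. Percent identity, as used for the 97-100% comparisons, is 100 × (1 - p).

```python
from itertools import combinations

NUCLEOTIDES = frozenset("ACGT")


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping gaps/ambiguities (pairwise deletion)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named sequences."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


def nucleotide_diversity(seqs: dict[str, str]) -> float:
    """Average pairwise p-distance across all pairs (an estimate of pi)."""
    distances = distance_matrix(seqs)
    return sum(distances.values()) / len(distances)


# Hypothetical aligned fragments (not sequences from this study).
toy = {"r1": "ACGTACGTAC", "r2": "ACGTACGTAA", "r3": "ACGAACGTAA"}
print(distance_matrix(toy))
print(round(nucleotide_diversity(toy), 4))
```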
Genetic distances (Figure 4) among C. lindemuthianum races revealed close genetic identity, and all samples clustered together with the GenBank reference sequences. Nevertheless, race 10 (PR) was the most divergent, with the highest genetic distance values (0.134 to 0.169), followed by races 73 (SC) and 283 (PR).
The genetic distance between races 10 (PR) and 73 (SC) was 0.139, whereas that between races 10 (PR) and 283 (PR) was 0.169.
A genetic divergence analysis comparing the races tested here with the C. lindemuthianum sequences retrieved from GenBank showed that race 2047 was the most divergent (values ranging from 0.021 to 0.056). However, the GenBank sequences of races 31 and 89 yielded values similar to those obtained for the races in this study. Moreover, these races have been shown to overcome the anthracnose resistance present in cultivar Cornell 49-242.
Molecular analysis of the races from each state revealed genetic divergence ranging from 0.000 to 0.169 in Paraná, from 0.000 to 0.073 in Santa Catarina, and from 0.000 to 0.061 in Mato Grosso. The races from Paraná were therefore the most divergent.
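The per-state ranges reported above are the minimum and maximum of the pairwise distances among isolates from the same state. A minimal Python sketch of that summary follows; the isolate labels, state assignments and distances are hypothetical placeholders, not values from Figure 4.

```python
from itertools import combinations


def within_group_ranges(distances: dict[frozenset, float],
                        groups: dict[str, str]) -> dict[str, tuple[float, float]]:
    """Return (min, max) pairwise distance among members of each group.

    distances maps frozenset({isolate_a, isolate_b}) -> p-distance;
    groups maps isolate -> group label (e.g. state of origin).
    """
    ranges = {}
    for group in sorted(set(groups.values())):
        members = sorted(name for name, g in groups.items() if g == group)
        values = [distances[frozenset(pair)] for pair in combinations(members, 2)]
        if values:  # a group with a single isolate has no pairwise distances
            ranges[group] = (min(values), max(values))
    return ranges


# Hypothetical isolates and distances for illustration only.
groups = {"iso1": "PR", "iso2": "PR", "iso3": "PR", "iso4": "SC", "iso5": "SC"}
distances = {
    frozenset({"iso1", "iso2"}): 0.000,
    frozenset({"iso1", "iso3"}): 0.120,
    frozenset({"iso2", "iso3"}): 0.095,
    frozenset({"iso4", "iso5"}): 0.040,
}
print(within_group_ranges(distances, groups))
```

Keying the distances by frozenset makes the lookup independent of the order in which the two isolates are listed.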
According to Rodríguez-Guerra, Ramírez-Rueda, Vega and Simpson (2003), variability at a single site can be explained by several factors, such as mutation, sexual recombination, parasexuality or the introduction of a new race into the local population. The neighbor-joining method was used to reconstruct phylogenetic relationships among the C. lindemuthianum isolates and revealed three clusters (Figure 5). Group I comprised five subgroups containing races 17, 27, 72, 89, 0, 23, 91, 114, 2, 351, 1, 72, 65, 31, 31, 83, 87 and 75 and isolate MAFF 305390. Within this cluster, races 75 and 87 and isolate MAFF 305390 were the most divergent. The GenBank sequences of races 17, 23 and 31 did not differ and were therefore clustered in the same group. In addition, races 65, 72 and 114, which are able to overcome the resistance conferred by the Co-3 gene, grouped closely.
Twelve isolates of Mesoamerican races and one of an Andean race (0, 283, 9, 67, 31, 89, 79, 105, 101, 2, 346, 10 and 73) were allocated to Group III, which was further subdivided into four subgroups (Figure 5). The first subgroup comprised races 0, 9, 31, 67, 79, 89, 101 and 105, which shared the same SNPs in the ITS1 region. Race 283 was the most divergent, with 19 SNPs detected. Interestingly, two isolates of race 31 (from Paraná state) showed molecular variability relative to the GenBank nucleotide sequence of race 31. Another subgroup was formed by isolates of races 10 and 73; as they exhibited the highest number of SNPs, they were considered the most divergent races in the whole study.

Population genetic structure
The Bayesian clustering analysis implemented in Structure (Pritchard et al., 2000), together with the ΔK statistic, was used to identify the number of distinct populations, assuming an admixture ancestry model and correlated allele frequencies. Structure analysis suggested that the 47 C. lindemuthianum sequences (40 isolates and 7 GenBank sequences) were divided into three distinct clusters (Figure 6).
Figure 6. Inferred population structure of 40 isolates and 7 nucleotide sequences of C. lindemuthianum based on sequencing of the ITS regions. Each color represents one cluster, and the length of each colored segment shows the race's estimated proportion of membership in that cluster, as calculated by Structure in a representative run at K=3.
Andean races 3, 7, 17 and 55 were allocated to Cluster I as admixed individuals, whereas Mesoamerican races were allocated to all clusters. Admixture between Andean and Mesoamerican populations was therefore observed in this work (Figure 6). The GenBank sequences of races 23, 31 and 89 and isolate MAFF 305390 were placed in the same group (Cluster II).
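Structure reports, for each individual, estimated membership coefficients (q) in each of the K clusters. One simple way to summarize such output is to assign an individual to its highest-membership cluster when that coefficient reaches a cutoff and to call it admixed otherwise. The Python sketch below illustrates this; the 0.7 cutoff, the function name and the q values are assumptions for illustration, since the threshold used in this study is not stated.

```python
def assign_clusters(q_matrix: dict[str, list[float]],
                    threshold: float = 0.7) -> dict[str, str]:
    """Label each sample with its majority cluster, or 'admixed' below the cutoff.

    The 0.7 default is an illustrative assumption, not a value from this study.
    """
    labels = {}
    for sample, q in q_matrix.items():
        if abs(sum(q) - 1.0) > 1e-6:
            raise ValueError(f"membership coefficients of {sample} do not sum to 1")
        best = max(range(len(q)), key=q.__getitem__)  # index of largest q
        labels[sample] = f"Cluster {best + 1}" if q[best] >= threshold else "admixed"
    return labels


# Hypothetical membership coefficients at K = 3 (not study output).
q_values = {
    "isolate_A": [0.90, 0.05, 0.05],
    "isolate_B": [0.50, 0.30, 0.20],
    "isolate_C": [0.10, 0.10, 0.80],
}
print(assign_clusters(q_values))
```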
Races 1, 3, 7, 10, 31, 55, 65, 67, 73, 75, 83, 91, 2 and 17 were allocated to two distinct subgroups of Group I because of the variability detected in their SNPs. In addition, Mesoamerican races were allocated to all clusters, while Andean races were allocated to Clusters I and II. Mahuku and Riascos (2004) evaluated Andean and Mesoamerican isolates of C. lindemuthianum using repetitive DNA sequence patterns and did not find genetic differences between them.

Intra-race variability based on sequencing of the ITS1, 5.8S and ITS2 regions
Molecular polymorphism within the same virulence phenotype was observed in this study. Previous studies that examined C. lindemuthianum pathotypes from different countries (Sicard, Michalakis, Dron & Neema, 1997; Balardin et al., 1999) observed molecular polymorphism similar to that described here.
Data on molecular variability within C. lindemuthianum pathotypes are currently scarce, except for a few races. For example, Talamini et al. (2006), Davide and Souza (2009) and Coêlho et al. (2016) reported molecular variability between and within isolates belonging to race 65. Another study also revealed pathogenic variability in races 65, 73 and 81 (Santos, Antunes, Rey & Rossetto, 2008). These results indicate that this pathogen can exhibit high genetic variability; consequently, ITS sequencing may be helpful for detecting and identifying emerging races or sub-races.
Races 0, 2, 31, 72, 73, 75, 83 and 89 exhibited intra-race molecular variability, which suggests that these virulence types evolved independently in different geographic regions. This may reflect specific host-pathogen interactions, which would account for the lack of geographic association and the presence of molecular polymorphism in the rDNA of C. lindemuthianum.

CONCLUSIONS
Detecting polymorphism among the physiological races of C. lindemuthianum is necessary to better understand the dynamics of this pathogen in common bean growing regions, mainly in the states of Mato Grosso, Paraná and Santa Catarina. Because the ITS regions can vary intraspecifically in their base sequence, they are appropriate for discriminating variation within and among populations of the pathogen. This information is highly relevant, since the molecular diversity revealed by SNPs in the ITS regions can contribute to a better understanding of the host-pathogen relationship and to the development of new resistant cultivars.
Given this genetic variability, ITS sequencing is a promising methodology, as these regions evolve rapidly and are typically species-specific. In this study, the ITS2 region revealed the highest genetic variability among the 47 C. lindemuthianum sequences analyzed (40 isolates and 7 GenBank sequences). These results suggest that sequence analysis of ITS rDNA regions may be a valuable tool for identifying this pathogen through the design of specific primers.