SNP marker association for incrementing soybean seed protein content

Soybean seed protein content (SPC) has been decreasing throughout last decades and DNA marker association has shown its usefulness to improve this trait even in soybean breeding programs that focus primarily on soybean yield and seed oil content (SOC). Aiming to elucidate the association of two SNP markers (ss715630650 and ss715636852) to the SPC, a soybean population of 264 F5-derived recombinant inbred lines (RILs) from a bi-parental cross was tested in four environments. Through the single-marker analysis, the additive effect (a) and the portion of SPC variation due to the SNPs (r2) for single and multi-environment data were assessed, and transgressive RILs for SPC were observed. The estimates revealed the association of both markers to SPC in most of environments. The marker ss715636852 was more frequently associated to SPC, including multi-environment data, and contributed up to a = 1.30% for overall SPC, whereas ss715630650 had significant association just in two locations, with contributions of a = 0.76% and a = 0.74% to overall SPC in Vic1 and Cap1, respectively. The RILs 8413 was classified as an elite genotype due to its favorable alleles and high SPC means, which reached 53.78% in Cap1, and 46.33% in MET analysis. Thus, these results confirm the usefulness of the SNP marker ss715636852 in a soybean breeding program for SPC.


INTRODUCTION
Soybean [Glycine max (L.) Merr.] is a major crop widely cultivated worldwide, and its importance is mainly assigned to its seed protein content (SPC), denoting the relevance of this crop for human and animal nutrition, as well as economy and world food security. SPC is quantitatively inherited and correlates negatively with most of the main traits taken into account in a soybean-breeding program (Kwon & Torrie, 1964). It is complex to elevate the percentages of SPC, once the selection towards grain yield and seed oil content (SOC) has been prioritized when compared to protein increment (Bandillo et al., 2015;Patil et al., 2018). For example, SPC of ancestral soybean cultivars and early releases declined from 40% to 37% during the period of 1924 to 2004 at the United States, whereas grain yield was incremented steadily during this same time (Mahmoud et al., 2006).
Promising methodologies are being proposed to increase soybean SPC without compromising grain yield, and efforts concentrate in gathering adequate specific allelic combinations for performing marker assisted selection. By using the marker information within or in proximity to important protein-related QTLs (Quantitative Trait Loci), strategies involving the validation of molecular markers in structured populations have been combined with traditional breeding methods in order to deliver more rapid genetic gains in many soybean traits (Jun et al., 2008).
In view of the soybean protein demand, several markers have been described to have an important effect over SPC on the 20 soybean chromosomes (https://www.soybase.org/). Zhang et al. (2015) validated SSR (simple sequence repeats) markers followed by two cycles of marker-assisted selection, where parent lines were outperformed in 9% regarding SPC of the selected progenies. Rodrigues et al. (2010) performed a single-marker association analysis in two SSR markers located at D1a soybean linkage group, where 5.57% of the SPC phenotypic variation was attributed to the marker satt408.
Gains in SPC are complex for soybean breeding programs and fundamental to supply the global demand for vegetable protein. Therefore, this study aimed to evaluate the effects of two single nucleotide polymorphisms (SNPs) over soybean SPC in a single and multi-environment (MET) approach.

Plant material and field trials
In this study, 264 F5-derived soybean recombinant inbred lines (RILs) were obtained from a single cross between PMQS12 and PMQS80. The parent material belonged to BioAgro soybean breeding program germplasm of Universidade Federal de Viçosa. SPC of PMQS12 and PMQS80 averaged 45.7 and 46.4% under greenhouse conditions, respectively. The population was advanced through the single seed descent (SSD) method under greenhouse environment up to F5 generation and their derived seeds were consistently used for all phenotyping trials. The RILs population, parent lines and the checks ANSC83022, M7739 and M8372 were tested in four environments in southeastern Brazil, Viçosa (20°45'14"S -42°52'55"O, 649 m of altitude), and Capinópolis (18°40'55"S -49°34'12"O, 530 m of altitude), both at Minas Gerais state, in 2017 and 2018 crop years. The field trials were set under randomized complete block design with two replicates. Each plot consisted in a 1m row spaced 0.5m apart, with 15 plants as final plant density per plot. The crop management from planting to harvest was performed in accordance to Sediyama et al. (2015).
The soil of the experimental sites comprised red-yellow dystrophic latosol with clayey texture, and dark red eutrophic latosol with medium texture, at Viçosa and Capinópolis, respectively (Santana & Moura Filho, 1978). The climatic conditions within the experimental seasons (i.e. average temperatures and rainfall) for both locations can be checked at http://www.inmet.gov.br/. The environments Viçosa in 2017, Viçosa in 2018, Capinópolis in 2017, and Capinópolis in 2018 were referred as Cap1, Cap2, Vic1 and Vic2, respectively.

Phenotyping
After manual harvesting of all plants per plot, a 30g random seed sample was collected, milled and the SPC was assessed through a near infrared spectrometry equipment (Thermo Fisher Antaris II FT-NIR) similarly to the studies by Rodrigues et al. (2014). The SPC values were converted to dry basis, as it follows:

Genotyping
A leaf disk was sampled on a single plant at V4 stage of every RIL, and the DNA was extracted according to Dellaporta et al. (1983). After nucleic acid quantification by NanoDrop spectrophotometer, the DNA was diluted to 10 ng.µL -1 for SNP genotyping. SNPs located in proximity to SSRs, which were previously used in the BioAgro-UFV program for marker assisted SPC breeding, were chosen. Those polymorphic between PMQS12 and PMQS80 were selected to carry out this study ( Table 1). The SNP genotyping followed the KASP methodology developed by Biosearch Technology (https://www.biosearchtech.com), and was performed using the Applied Biosciences 7500 equipment. The amplification reaction comprised 1 cycle of 94 ºC for 15 minutes; 10 cycles of 94 ºC for 20 seconds, with a gradient of 61-55 ºC, decreasing 0.6 ºC every 60 seconds; 30 cycles of 94 ºC for 20 seconds and 55 ºC for 60 seconds; and a 37 ºC cycle for 60 seconds. Each reaction consisted of 2.5 µL of DNA at 10 ng.µL -1 , 2.5 µL of 2x Master Mix and 0.14 µL of Primer Mix. Allelic discrimination was performed in the AB 7500 v. 2.3 software.

Statistics
From the phenotypic data, the individual and MET analyses were performed according to the following models: y = + Z1g + Z2b + , [model 1 for individual analysis] Table 1. Summary of SNP markers used for the association study based on the information available at https://www.soybase.org/. These SNPs are part of the SoySNP50K array (Song et al., 2013  where is the vector of observed data within environment; is the mean; is the random effect of genotypes (RILs), g ~ NID (0, 2 ), where 2 is the genetic variance; b is the random vector of blocks, b ~ NID (0, 2 ), where 2 is the block variance. 1 and 2 are the design matrices for and , respectively; and where is the vector of observed data for the t trials; is the mean; is the fixed vector of trials; is the random effect of genotypes (RILs), g ~ NID (0, 2 ), where 2 is the genetic variance; is the random effect of genotype by environment interaction, ge ~ NID (0, 2 ), where 2 is the variance of genotype by environment interaction; and is the random vector of residuals, ~ NID (0, 2 ), where 2 is the residual variance. , 1 and 2 are the design matrices for , and , respectively. From both models, broad sense heritability was assessed as ℎ 2 = 1 − ̅ 2̂2 , where ̅ is the average variance of pairwise differences between the best linear unbiased predictions (BLUPs) of effects. SPC generated from RILs, parent material and checks were compared. The marker analysis was based on the single-marker association (Schuster & Cruz, 2008) through the following linear regression model: where is the predicted mean for the genotype from individual and MET data (models 1 and 2); 0 is the intercept of regression or the mean, 1 is the marker additive effect, and is the random vector or residuals. 1 is the design matrix for coding genotypes as 1 1 = 2, 1 2 = 1, 2 2 = 0. The coefficient of determination ( 2 ) obtained from the regression analysis denotes the proportion of the SPC variance due to SNP. The additive effect ( ) of an associated marker was obtained as the difference between the average of the individuals with the favorable allele and the average of the individuals with the unfavorable allele. The genetic effects were tested by the Likelihood Ratio Test (LRT; Rao, 1973) and the normal distribution of the data was verified through Lilliefors test (Razali & Wah, 2011). The precision of the experiments was assessed in concordance with Rodrigues et al. (2010), by using the coefficient of variation, presented in percentages, as it follows: represents the standard deviation of residuals for each experiment, and ̅ as the average calculated from the observed values for single-environment analyses or average from the predicted genetic values from multi-environment analysis. The statistical analysis were performed at GENES (Cruz, 2013) and R (R Core Team, 2019) softwares.

RESULTS AND DISCUSSION
Soybean SPC field data Figure 1 shows that SPC followed a normal distribution for all single-environment and MET analyses (p>0.05), presenting frequencies of the data according to the expected pattern for a quantitative inherited trait. The occurrence of genotypes with SPC values above 50% and below 40% was observed. However, the RILs average SPC are barely the same as their parent material. Cap1 and Cap2 presented the highest values for protein overall, being 47.80 and 47.60% respectively. In order to compare the environments, the SPC means of 10% superior RILs were 52. 84, 51.49, 49.39, 48.63, and 47.84% for Cap 1, Cap2, Vic1, Vic2 and MET dada, respectively. Cap1 also presented the highest frequency (18.18%) of transgressive RILs above 50% of SPC, in contrary to Vic2 and MET data that presented 1.13 and 0%.
The means of parent lines and RILs outperformed the checks in all trials, as previously expected. The LRT analysis showed significant differences (p<0.01) for the variance component of genotype effect (̂2) for both single environment and MET data ( Table 2). The broad sense heritability values were similar and ranged from 70.79 to 76.82% for individual trial data, with the superior and inferior estimates for Vic1 and Cap2, respectively. For the MET analysis, broad heritability estimate was moderate, corresponding to 53%. All % values were below 10% for all field trials, similar to the results obtained by Rodrigues et al. (2010) in a study of QTL mapping for seed protein content in soybean.
The data analyses denote a good field data quality, and reveal a satisfactory precision in terms of % and normality of data, as well as in the findings by Rodrigues et al. (2010) and Rodrigues et al. (2014). The F5-derived RILs population used in this study had enough recombination cycles that resulted in transgressive genotypes for both superior and inferior SPC means, especially towards superior ones for Capinópolis location. Genotypes with SPC means above 50% were more frequent in Cap1 and Cap2 environments. Viçosa and Capinópolis experimental sites contrast in terms of weather conditions. Capinópolis is warmer especially during soybean growing season and SPC is favored by elevated temperatures. This same fact has been addressed by Piper and Boote (1999) and Patil et al. (2017), where they revealed increases in soybean SPC under elevated temperatures. The weather data from these locations can be downloaded as abovementioned in the materials and methods section.
In Vic2, the SPC transgressive segregation range observed in our research surpassed the one by Zhang et al. (2015), which ranged from 35.89% to 49.10%, irrespective to the fact of one more recombination cycle in the NJRSXG population of F6-derived RILs in their study. The same remark is done when comparing to the study from Rodrigues et al. (2010), where SPC varied from 32.2% to 44.5%. Despite the use of contrasting SPC parent lines, Rodrigues et al. (2010) evaluated a F2:3 soybean population, in which the lack of recombination cycles may have costed the appearances of more transgressive genotypes. Thus, the superior transgressive genotypes means of this study suggest that PMQS12 x PMQS80 F5-derived RILs population is an elite germplasm for SPC selection. Furthermore, checks were outperformed in an average of 5.40% in all trial scenarios, which is inferior to the results reported by Warrington et al. (2015), where F5derived RILs mapping population exceeded the cultivars in a rate of 7.50% for SPC.
The three checks used in the present study are modern cultivars widely adopted by soybean growers in Brazil and correspond to a diverse range of maturity group. Their SPC estimates bellow to the ones from RILs population are due to the plant improvement process that prioritized higher yields and may have lost favorable alleles for SPC through breeding. The genetic correlation between SPC and yield was -0.58 in the research by Kwon and Torrie (1964). Likewise, Yesudas et al. (2013) reported a positive correlation between yield and SOC, and a negative correlation between yield and SPC. SOC means were assessed and will be mentioned briefly in the next section.
Genetic variation is responsible for the largest portion of the phenotypic variance in the population, and the significant variance estimate for genotype by environment interaction depicts the changes in performance of genotypes in terms of SPC depending on the environment (Table 2). All SPC broad sense heritability estimates were lower than the ones presented by Patil et al. (2018). In their research, SPC data from four environments were presented and the broad sense heritability values were around 90%. Instead, our individual trial means and broad sense heritability results are similar to the ones presented by Zhang et al. (2015) and fits perfectly to those from Hwang et al. (2014) that associated molecular markers with soybean quality traits, including SPC. Our MET analysis delivered lower broad sense heritability estimates than the ones from single analyses, exactly the contrary as Zhang et al. (2015) reported in the MET data. In the present study, the genotype by environment interaction consumes the overall genetic variance and results in decreases in broad sense heritability, as described by Kang (1997).

SNP marker association
The two markers tested in this association study showed their usefulness for increasing SPC in most of the studied environments (Table 3). The SNP marker ss715630650 was significantly associated with SPC in Vic1 and Cap1 environments. However, ss715636852 was associated in Cap1, Cap2 and Vic1 environments, and MET data. The total variation in SPC explained individually by these markers ranged from 1.65% (ss715630650) to 6.11% (ss715636852). The direct contribution of the additive effect of these markers to the SPC the additive effect varied from 0.74% to 1.30%, which corresponds to 1.54 and 2.77% of the overall SPC mean for ss715630650 and ss715636852, respectively. The only cases in which the two markers had significant influence over SPC was at Vic1 and Cap1 environments. The simultaneous additive effect of the favorable alleles of these markers over SPC in Vic1 and Cap1 were similar and accounted for 2.01% and 2.04, respectively.  Table 4 presents the genotypes with superior and inferior SPC means when applying 2% of selection intensity, as well as their respective the alleles for ss715630650 and ss715636852. Among the selected genotypes from each environment, eight appeared at least in two environments. The RIL 78-43 was selected at Vic1, Cap2 and MET data. Out of the five superior selected genotypes in each data set, at least four of them had the favorable allele for ss715636852.
In Cap1, all of the genotypes had that superior allele. Among the lower ones, the unfavorable allele (guanine) appearances became more frequent. For Vic1 and Cap1, where the marker ss715630650 was associated, the unfavorable allele (guanine) was present regardless the magnitude of the means. However, not all the superior genotypes presented the two favorable alleles, in which the absence of ss715630650 was remarkable.
Although this research involves a very low amount of associated markers, it is possible to observe the importance of the complementarity of parents to result in high SPC populations. The favorable alleles of ss715630650 and ss715636852 were donated by PMQS12 and PMQS80, respectively. Parent lines that donated favorable alleles with higher effects over SPC than the ones addressed in this study were used by Warrington et al. (2015). In their research, Danbaekkong was the parent line that contributed with most of the favorable alleles. This line accounted for 55% of SPC variation and was able to increment 13.64% of SPC in terms of additive effect. The QTL responsible to for this occurrence is qProt_Gm20 at chromosome 20, located at least 35 Mbp apart of ss715636852. The relevance of chromosome 20 in soybean seed content traits is once again highlighted by this research, as well as described by Patil et al. (2018), who mapped haplotypes in the region comprised between 28.5 and 33.5 Mbp of chromosome 20, being considered a hotspot for protein content. This interval overlaps haplotypes identified in the QTL mapping and genome wide association studies by Bolon et al. (2010), Hwang et al. (2014), Vaughn et al. (2014), and Bandillo et al. (2015). Table 3. SNP marker association estimates for seed protein content (SPC) obtained through the single marker analysis for single-environment (Vic1, Vic2, Cap1, and Cap2) and multi-environment (MET) approaches. Nonetheless, the stability of % 2 and estimates throughout the environments and MET data suggests that a special attention should be given to ss715636852 for its consistent results. In environments where the two SNPs had significant association, the increment in SPC was greater than in the presence of either one of them alone. From that, we can infer that ss715630650 and ss715636852 may have distinct functions in the SPC metabolic pathway. Therefore, the marker association study to SPC should be performed previously in order to discard markers that show redundant contributions to the trait.

SNP
The lack of association of any marker to Vic2 environment can be assigned to the sowing date in late December 2018, in contrast to the sowing in early October for the other environments. Then, the genotype by environment interaction can lead to differential response of genes and its respective markers to sowing date.
The minor effect of ss715630650 can be verified in the Table 4, where its favorable allele at Vic1 and Cap1 was present at the inferior genotypes but absent in the superior ones. Zhang et al. (2015) also obtained high SPC progenies with minor effect favorable alleles when compared to other progenies in the population. This fact is justified by the interaction of the associated markers with the environment, as well as the performance of other markers associated to SPC, including those with irrelevant effect. That is to say, the additive effect of each associated marker can change according to different environments. Then, the changes in ranking genotypes according to SPC means and the lack of association of specific markers for some of the environments confirms an interaction between RILs with environments (significant ̂2 estimates, Table 2), and suggests the interaction of the SPC associated markers with environments. Patil et al. (2018) justified a similar fact in their study due to the complex nature of a quantitative trait, in which its interaction with the environment implies instability of QTLs controlling soybean seed composition.
The top ranked RILs for SPC that carried favorable alleles had about 4% less SOC than the checks. In addition, the ones that had no favorable allele for SPC had about 2% less SOC. It can be considered that ss715630650 and mainly ss715636852 are not to recommended simultaneous selection as suggested by Singh (2017) and Li et al. (2017). Hence, for breeding programs seeking to increase soybean SPC and SOC simultaneously, ss715630650 and ss715636852 are not recommended.
The co-segregation and complexity in separating high SPC from low SOC evidence the effects of major pleiotropic genes (Hwang et al., 2014;Wang et al., 2014;Patil et al., 2018). However, it is possible to achieve a range of 41-43% and 20-22% for SPC and SOC respectively, especially in cases where the genes coding for each trait are tightly linked (Zhang et al., 2018). In such case, the recombination cycles for obtaining RILs, as well as further marker association studies are essential in identifying favorable alleles that can ease the breeding process. Beyond those values, pleotropic genes play a large role over these traits hampering the simultaneous selection. It reinforces the fact that simultaneous selection for both traits is complex and limited to specific situations (Zhang et al., 2018). Table 4. Seed protein content (SPC) means of transgressive genotypes under 2% of selection intensity for single-environment (Vic1, Cap1, and Cap2) and multi-environment (MET) data. The environment V2 is absent due to lack of significant marker association. The colored cells represent the favorable allele for increasing SPC (A: adenine, G: guanine). The blank cells represent the lack of association between marker and SPC at the environment. The superior genotypes are presented in the first five rows, and the inferior ones in the last five rows.
Among the 264 RILs, the RIL 84-13 was considered an elite genotype once it presented a good field performance for SPC (53.78, 49.01, 47.56, 46.61, and 46.33% for Cap1, Cap2, Vic1, Vic2, and MET, respectively) and can serve as parent material to be destined to crosses for obtaining transgressive lines. In contrary, the remaining SPC lines containing both favorable alleles must undergo to a pre-breeding process to become more adapted to possess the essential agronomic traits. Similarly, ss715636852 must have its immediate use recommended in soybean breeding programs for increasing SPC. This marker can contribute to higher genetic gains.

CONCLUSIONS
This work was useful for the association of markers that have impact on SPC and for the identification of superior RILs. The significant association of ss715636852 and its additive effect contribute to incrementing SPC in the process of marker-assisted breeding. On the other hand, the use of ss715630650 did not result in statistically detectable increment in SPC. The results showed the differential performance of genotypes in respect to the different environments, which hinder selection in breeding programs for SPC. This research do not intend to be dogmatic and recommends a previous association of ss715630650, ss715636852 or any other marker prior the marker assisted breeding process. Therefore, the use of ss715636852 was beneficial for increasing SPC.