A comprehensive in silico analysis, distribution and frequency of human Nkx2-5 mutations; A critical gene in congenital heart disease

Introduction: Congenital heart disease (CHD) affects 1% to 2% of live births. The Nkx2-5 gene is known as a significant heart marker during embryonic development and is also necessary for cardiomyocyte survival and homeostasis in adulthood. In this study, Nkx2-5 mutations were investigated to identify their frequency, distribution, and functional consequences using computational tools. Methods: A complete literature search was conducted to find Nkx2-5 mutations using the following keywords: Nkx2-5 and/or CHD and mutations. The mutations were analyzed in silico using tools that predict the pathogenicity of variants. The structure of the Nkx2-5 protein and the functional or structural effects of its variants were also modeled using I-TASSER and STRING. Results: A total of 105 mutations from 18 countries were identified. The highest (24.1%) and lowest (1.49%) frequencies of Nkx2-5 mutations were observed in Europe and Africa, respectively. The c.73C>T and c.533C>T mutations were the most widely distributed worldwide. c.325G>T (62.5%) and c.896A>G (52.9%) had the highest frequencies. The largest number of Nkx2-5 mutations was reported from Germany. c.541C>T had the highest CADD score (Phred score = 38) and c.380C>A the lowest (Phred score = 0.002). Of all mutations, 41.9% were predicted as potentially pathogenic by all prediction tools. Conclusion: This is the first worldwide evaluation of Nkx2-5 mutations. Given the high frequency of mutations in Germany, and given that some mutations were seen only in this country, Nkx2-5 mutations presumably originated mainly from Germany.


Introduction
Congenital heart disease (CHD) is the most common defect in heart structure, 1 occurring in 1%-2% of live births and 10% of abortions. 2,3 Despite numerous studies aiming to identify the causes of CHD, its exact etiology remains obscure. Studies over the past decades estimated that chromosomal abnormalities and single-gene disorders account for 8% of CHDs. 1 Several transcription factors regulate cardiac development, including GATA binding protein 4 (GATA4), T-box transcription factor (TBX) and NK2 homeobox 5 (Nkx2-5). These are among the prime causes of CHD, with Nkx2-5 topping the list. 4 The Nkx2-5 gene, highly conserved from Drosophila to humans, is located on chromosome 5q35 and contains two exons. It is known as a significant heart marker during embryonic development and is also necessary for cardiomyocyte survival and homeostasis in adulthood. Most Nkx2-5 mutations have been observed in CHD cases, including tetralogy of Fallot (TOF), ventricular septal defect (VSD), atrial septal defect (ASD), and transposition of the great arteries (TGA). 5 Studies in human and animal models indicated Nkx2-5 expression only in cardiac tissue, emphasizing its significant role in heart development. Mice lacking even one copy of the Nkx2-5 gene showed various heart abnormalities. Furthermore, Nkx2-5 has been observed to be involved in postnatal heart protection. 1 Changes in genes, especially critical genes in a specific pathway, can have a significant effect on health. Evaluation of Nkx2-5 mutations can support early diagnosis of a variety of CHDs. These mutations display

In silico analyses
The functional consequences and pathogenicity of the mutations were predicted using computational tools: Mutation Taster, 10 Sorting Intolerant From Tolerant, version 6.2.1 (SIFT), 11 Polymorphism Phenotyping, version 1.03 (PolyPhen2), 12 PROVEAN, version 2.0.23 13 and combined annotation dependent depletion (CADD), version 1.3. 14 SIFT interprets results using TrEMBL (version 34.3) and Swiss-Prot (version 51.3) and classifies mutations as deleterious (<0.05) or tolerated (≥0.05). 11 PolyPhen2 predicts the impact of an amino acid change on protein structure and function by applying protein 3D structure and multiple sequence alignment. It classifies mutations as possibly damaging, probably damaging, or benign. 12 PROVEAN likewise classifies mutations as deleterious or neutral. These tools evaluate the functional consequences of mutations at five principal levels: protein stability, posttranslational, translational, transcriptional, and splicing. The protein FASTA sequence (NP_004378.1) of the selected mutations was used as the input file for these prediction tools.
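The classification rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the study: the SIFT cutoff (0.05) and the category labels come from the text, whereas the function names and the "all tools must agree" consensus rule are assumptions made for the example. CADD is left out because the text gives no cutoff for it.

```python
SIFT_CUTOFF = 0.05  # stated in the text: <0.05 deleterious, >=0.05 tolerated

# Labels each tool uses for a damaging prediction (taken from the text).
DAMAGING_LABELS = {
    "sift": {"deleterious"},
    "polyphen2": {"possibly damaging", "probably damaging"},
    "provean": {"deleterious"},
    "mutationtaster": {"disease causing"},
}


def sift_call(score: float) -> str:
    """Turn a SIFT score into its class; scores below the cutoff are deleterious."""
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


def is_damaging(tool: str, label: str) -> bool:
    """Return True when the tool's categorical output is a damaging class."""
    return label.strip().lower() in DAMAGING_LABELS[tool]


def consensus(calls: dict) -> str:
    """Hypothetical consensus rule: unanimous calls win, anything else conflicts."""
    flags = [is_damaging(tool, label) for tool, label in calls.items()]
    if all(flags):
        return "pathogenic by all tools"
    if not any(flags):
        return "benign by all tools"
    return "conflicting"


# Illustrative input only (hypothetical SIFT score, not data from the study).
example = {
    "sift": sift_call(0.30),
    "polyphen2": "benign",
    "provean": "neutral",
    "mutationtaster": "disease causing",
}
print(consensus(example))
```

A unanimous rule is the strictest choice; a majority vote would be an equally plausible alternative and would label more variants as pathogenic.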

Protein structure prediction
I-TASSER (Iterative Threading Assembly Refinement) was applied to evaluate the Nkx2-5 protein structure/function resulting from the most frequent mutation. It is a platform that generates several models of the query protein using state-of-the-art algorithms. The quality of the predicted protein features is judged with several scores: the C-score (confidence score), where models with a C-score > -1.5 are expected to have the correct fold; the TM-score (template modeling score), which ranges over [0, 1], 15 with a higher score indicating a better structure; and the RMSD (root-mean-square deviation), reported in ångströms, which indicates the accuracy of the model, with lower values being better. I-TASSER also predicts the solvent accessibility of each residue position on a scale from 9 (highly exposed) to 0 (buried). 16
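As a reading aid, the score conventions above can be expressed as small helper functions. This is a minimal sketch under stated assumptions: the -1.5 C-score cutoff, the [0, 1] TM-score range, and the 0 to 9 accessibility scale are taken from the text, while the function names and the "intermediate" label for values between the two extremes are hypothetical.

```python
C_SCORE_FOLD_CUTOFF = -1.5  # from the text: C-score > -1.5 suggests a correct fold


def has_confident_fold(c_score: float) -> bool:
    """True when the C-score exceeds the cutoff quoted in the text."""
    return c_score > C_SCORE_FOLD_CUTOFF


def check_tm_score(tm_score: float) -> float:
    """Validate that a TM-score lies in [0, 1] and return it unchanged."""
    if not 0.0 <= tm_score <= 1.0:
        raise ValueError(f"TM-score must be in [0, 1], got {tm_score}")
    return tm_score


def describe_accessibility(score: int) -> str:
    """Describe a solvent-accessibility value on the 0 (buried) to 9 (exposed) scale."""
    if not 0 <= score <= 9:
        raise ValueError(f"accessibility must be in 0..9, got {score}")
    if score == 0:
        return "buried"
    if score == 9:
        return "highly exposed"
    return "intermediate"  # hypothetical label; the text names only the two extremes


def pick_best_model(models: list) -> dict:
    """Pick the model with the highest C-score from a list of score dictionaries."""
    return max(models, key=lambda m: m["c_score"])
```

Ranking by C-score alone is a simplification; in practice the TM-score and RMSD estimates would be inspected alongside it.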

Protein network prediction
The STRING database version 10.5 17 was used to describe the proteins that interact with the Nkx2-5 protein. This database provides a useful evaluation of protein-protein associations, including physical and functional interactions.

Literature review
A total of 59 articles were surveyed. We found 105 mutations (Table 1), comprising 80 missense mutations, 12 deletions, 3 insertions, and 10 nonsense mutations. These mutations were documented in 18 countries, among which America, Germany and China had the largest numbers of Nkx2-5 mutations, respectively (Figure 1). From these searches we obtained several items of information: mutation features at the DNA and protein sequence level, the CHD type related to each mutation, the number of reported affected cases harboring a specific mutation, the total number of studied individuals, and the place where the study was performed. The numbers of studied individuals were 2230, 1199, 2827, 335 and 146 in America, Europe, Asia, Africa and Australia, respectively. Thirty mutations were found in America (c.533C>T being the most frequent), 49 in Europe (c.896A>G being the most frequent), 46 in Asia (c.738T>A being the most frequent) and 2 each in Africa and Australia. Moreover, the frequency of Nkx2-5 mutations was 4.12% in America, 24.1% in Europe, 6.15% in Asia, 1.5% in Africa and 2% in Australia. The locations of the Nkx2-5 mutations are illustrated in Figure 2.
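The text does not spell out how the regional frequencies were computed. One plausible reading, shown here purely as an assumption, is the percentage of studied individuals who carry a mutation. The studied-individual counts below are those reported above; the carrier counts are not given in the text, so the function is shown without them.

```python
# Studied individuals per region, as reported in the text.
STUDIED = {
    "America": 2230,
    "Europe": 1199,
    "Asia": 2827,
    "Africa": 335,
    "Australia": 146,
}


def mutation_frequency(carriers: int, studied: int) -> float:
    """Percentage of studied individuals carrying a mutation (assumed definition)."""
    if studied <= 0:
        raise ValueError("studied must be a positive count")
    if not 0 <= carriers <= studied:
        raise ValueError("carriers must be between 0 and studied")
    return 100.0 * carriers / studied


def regional_frequencies(carriers_by_region: dict) -> dict:
    """Apply mutation_frequency to each region, rounding to two decimals."""
    return {
        region: round(mutation_frequency(n, STUDIED[region]), 2)
        for region, n in carriers_by_region.items()
    }
```

Calling `regional_frequencies` with carrier counts extracted from Table 1 would give values directly comparable with the percentages quoted above, provided the assumed definition matches the one used in the study.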

Frequency and distribution of the mutations
The c.73C>T mutation was detected in 8 countries: America, Spain, Brazil, Italy, Germany, Korea, Lebanon and Turkey. The c.533C>T mutation was observed in 4 countries: America, Germany, Japan and Australia. These findings indicate that c.73C>T and c.533C>T are more widely distributed worldwide than other Nkx2-5 mutations. c.325G>T (62.5%) and c.896A>G (52.9%) had the highest frequencies, although they were found only in Germany. The largest number of Nkx2-5 mutations was reported from Germany; among them, c.896A>G (94.5%) and c.547A>G (42.6%) were the most common mutations in this country. The frequencies of the Nkx2-5 mutations by continent are shown in Figure 3. The frequency of mutation in studied patients

Bioinformatics
Computational analyses of the 105 mutations predicted a pathogenic effect for most of them (

Prediction of the normal and mutant models
Five structural/functional models of the normal Nkx2-5 protein were obtained as I-TASSER output. We selected the structure with the highest scores (C-score: -4.50, TM-score: 0.25±0.07 and RMSD: 17.8±2.5 Å). We also generated three-dimensional models of the mutant protein p.R25C with I-TASSER and selected the structure with the highest scores (C-score: -3.38, TM-score: 0.34±0.11 and RMSD: 14.6±3.7 Å). The assessment showed that the solubility of the mutant protein was reduced in comparison with the normal protein, but the protein structure was the same. Indeed, solvent accessibility was predicted with a score of 6 for the native arginine residue and 4 for the variant cysteine residue, both intermediate between buried and highly exposed (Figure 4).

Discussion
CHD is the most common birth defect worldwide. 71 Although several important genes play a role in the etiology of CHD, Nkx2-5 tops the list. Nkx2-5 is one of the master transcription factors of heart development and regulates cardiac ion channels. 72,73 It was identified as the first gene involved in CHD by genetic association studies in large families. 20,74 We determined the frequency and distribution of Nkx2-5 mutations and evaluated these mutations using computational tools (Mutation Taster, SIFT, PolyPhen2, PROVEAN and CADD). The protein contains several conserved regions: the DNA-binding homeodomain (HD), the conserved TN domain near the amino terminus, and the NK2 domain located C-terminal to the HD. Studies have demonstrated that the HD has a critical role in DNA binding, interaction with other proteins and transcriptional regulation. 75,76 In the majority of reported cases, the variant is a missense mutation (33 missense mutations) located within the HD of the Nkx2-5 gene (Figure 2). Moreover, the most common CHD types resulting from Nkx2-5 mutations were ASD and VSD.
In recent years, in silico analysis has become an efficient means of classifying variants as neutral or deleterious. SIFT and PolyPhen2 are the most common in silico prediction tools applied in diagnostic laboratories. The approach used here classified variants as "Benign" or "Pathogenic" according to the combined predictions of the five computational tools. This approach was applied to ensure that no likely pathogenic variant of the Nkx2-5 gene would be missed. In the current study, we obtained high-confidence information regarding the effect of amino acid changes on Nkx2-5 structure/function using computational tools alone. The present work is the first attempt to assess all mutations of the Nkx2-5 gene; overall, we report 105 variants of this gene. Among them, the c.380C>A variant was predicted to be benign by SIFT, PolyPhen2 and PROVEAN but disease causing by Mutation Taster. However, its low CADD score (Phred score = 0.002) suggests that it may be a polymorphism. c.541C>T had the highest CADD score (Phred score = 38) and was observed only in England. Given this information, it can be inferred that c.541C>T is a founder mutation resulting in ASD in England.
The highest frequency of Nkx2-5 mutations, about 99.6%, was found in Germany. This suggests that further association studies in this country may discover more new mutations in this gene. Given that c.73C>T (p.R25C) was distributed across all continents, it appears to be a hotspot position. Although HGMD documents this mutation as a pathogenic variant, the 1000 Genomes and ExAC databases report it in the heterozygous form. Moreover, most of the software tools predicted it to be a disease-causing variant. c.325G>T and c.896A>G were observed at high frequency only in Germany, which indicates a founder effect for these mutations. c.325G>T is a nonsense mutation that generates a truncated protein, whereas c.896A>G was predicted to be a benign variant by several prediction tools, although it is reported as a pathogenic variant in HGMD and is not registered in 1000 Genomes or ExAC. Among the proteins of the Nkx2-5 network, BMP2 is a cardiac factor that elicits ectopic expression of the heart markers GATA4 and Nkx2-5. It plays a critical role in myocardial differentiation and in the regulation of proliferation. Nkx2-5 has a binding site in the SMAD4 enhancer; thus, BMP2 activity is needed for heart progenitor characteristics. 77 TBX5 and Nkx2-5 both operate as co-activators of GATA4 pathway activity, and any change in these proteins can disrupt cardiac septation. 78 This study indicates that the frequency and distribution of Nkx2-5 mutations are highest in Europe (Figure 3), followed by Asia and America. This result suggests that Nkx2-5 mutations may have originated in Europe and been transferred to other continents by migration. Finally, it should be noted that many modifier factors in the Nkx2-5 pathway might affect the manifestations resulting from Nkx2-5 mutations.
Large population studies, assessment of variant frequency in both normal and patient populations, functional studies of mutations in animal models, and evaluation of the expression levels arising from mutations can improve our understanding in this area.

Conclusion
Here, in silico analyses and a structural model of Nkx2-5 are presented for the first time. Nkx2-5 plays a critical role in embryonic cardiac development and has several important mutable regions with wide distribution. Our results indicate that the Nkx2-5 gene can be a significant candidate for investigating the etiology of CHD. Bioinformatics approaches allow large numbers of variants to be evaluated at the same time and the effects of all variants to be predicted at the protein level. It should be noted that our information is derived from databases; any pathogenic variant should therefore be experimentally confirmed. Given that some mutations were observed only in Germany, and given the high frequency and diversity of mutations in this country, Nkx2-5 appears to have a significant role in this part of the world.

Competing interests
None.

Ethical approval
The study is performed in accordance with the Helsinki Declaration and has been approved by the