Amit Kumar Banerjee, Neelima Arora, Upadhyayula S.N. Murty*
Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Tarnaka, Hyderabad-500007, Andhra Pradesh, India
The present study provides substantial evidence of variation among the ITS2 regions of the sampled species at both the sequence and the RNA secondary structure level. A bioinformatics approach was used to examine conservation and variation of ITS2 in mosquito genera. Multiple sequence alignment revealed conservation within and across species. Dimer frequencies in all species showed varying degrees of conservation depending on the type of dimer. RNA secondary structures were generated for all species. To gain insight into structural similarity and divergence, parameters such as structural energy, number of stems, and G-C, A-U and G-U pairings were calculated. Of the five types of loops derived by the Sribo program, hairpin and exterior loops were found to be more conserved than interior, bulge and multi loops. Tandem repeats with a high degree of variation were observed in these regions. An unexpected observation was the close proximity of Aedes longfilamentus and Anopheles culicifacies in the phylogenetic tree, where they shared a clade instead of Ae. longfilamentus grouping with the other Aedes species. This finding is contrary to the expected outcome and requires further rigorous investigation. These results point to the major variable and conserved regions, as well as the stretches of sequence and structural parts where selection pressure varies during ribosome biogenesis.
ITS2, Phylogenetic marker, RNA secondary structure, Mosquito, Anopheles, Aedes.
Perilous mosquito-borne diseases like malaria, chikungunya and dengue impose a massive social and economic burden globally. In the wake of the re-emergence of mosquito-borne diseases, studies at the molecular level have gained momentum during the past decades. Out of 450 known Anopheline species, only 30-40 act as vectors for malaria, while the proportion of Aedes species acting as vectors is smaller still. Anopheles and Aedes mosquitoes are extensively distributed throughout the globe. An. culicifacies, An. fluviatilis, An. minimus and An. sundaicus are predominant in the Indian peninsular region. Anopheles farauti, An. gambiae and An. darlingi are widespread in Australia, Africa and Latin America respectively. Anopheles nuneztovari is predominant in diverse geographical locations of Latin America. Aedes species are also distributed throughout the world, and many recent studies have focused on the taxonomy of these species globally.
Two internal transcribed spacers (ITS), namely ITS1 and ITS2, occur in eukaryotic organisms. ITS1 is flanked by the 18S and 5.8S genes, and ITS2 by the 5.8S and 28S genes. ITS2 is regarded as more conserved than ITS1 owing to the presence of numerous tandem repeats. The ITS2 region is now the most widely sequenced DNA region in the mosquito genera Anopheles, Culex and Aedes. Complete ITS2 sequences of nine species each of the Anopheles and Aedes genera, from diverse geographical locations, were considered for this study.
The ITS2 portion of ribosomal DNA is considered extremely conserved and species-specific. Among the widely used molecular markers for mosquito taxonomy [5-6], ITS2 is exploited extensively for distinguishing closely related species. This marker is used extensively in Anopheline complexes such as the An. maculipennis complex, the An. quadrimaculatus complex and the An. culicifacies complex. Research targeting the ITS2 region for species classification and phylogenetic analysis is gaining impetus globally. In the last decade, ITS2 regions of several Neotropical Anophelines have been sequenced and deposited in the GenBank database. The prime objective of this study is therefore to compare the conservation and variation of ITS2 sequences across species as well as genera.
ITS2 sequences of 18 mosquito species from diverse geographical locations, 9 each from the Aedes and Anopheles genera, were retrieved from GenBank and investigated. The species considered for this study, along with their NCBI accession numbers and lengths (bp), are shown in Table 1.
2.2 Sequence alignment
Multiple sequence alignment was performed using ClustalW with a gap opening penalty of 15 and a gap extension penalty of 6.66. The results of the ClustalW analysis are represented graphically using the GeneBee server (Figures 1-3).
The phylogenetic tree was constructed from the ITS2 region using the program PHYLOWIN. A total of 196 sites were generated. The distance parameter used was observed divergence, with 500 bootstrap replicates. The tree was built using the Neighbor-joining method. The consensus tree, along with branch distances and bootstrap numbers, is shown in Figure 4.
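The distance and resampling steps behind such a tree can be illustrated with a minimal Python sketch. It is not the PHYLOWIN implementation. It assumes that "observed divergence" means the proportion of differing sites between two aligned sequences, and that gap columns are skipped. The function names and the toy alignment are hypothetical.

```python
import random

def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Toy alignment (hypothetical sequences, not data from this study).
toy = {"sp1": "ACGT-ACGTA", "sp2": "ACGA-ACGTT", "sp3": "TCGA-ACCTT"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

In a full analysis, a distance matrix would be computed for each of the 500 replicates and passed to a Neighbor-joining routine. That step is left to dedicated phylogenetics software.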
All the species were examined for tandem repeats. Tandem Repeats Occurrence Locator (TROLL) was used to determine the conserved repeats in the ITS2 sequences. The maximum motif length considered was 6 and the minimum repeat length was 10. The observed results are tabulated in Table 2.
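The search constraints can be illustrated with a small Python sketch. It does not reproduce TROLL's algorithm. It is a naive scan that assumes "minimum repeat length" refers to the total length of the repeated stretch in bases, and it reports 0-based start positions. The function name is hypothetical.

```python
def find_tandem_repeats(seq, max_motif_len=6, min_repeat_len=10):
    """Naive scan for perfect tandem repeats.

    Returns a list of (start, motif, copies) tuples with 0-based start positions.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "CACA", which is already covered by the motif "CA".
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                i += 1
                continue
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            if j - i >= min_repeat_len and j - i >= 2 * k:
                hits.append((i, motif, (j - i) // k))
                i = j  # continue after the repeat to avoid phase-shifted duplicates
            else:
                i += 1
    return hits

# Toy sequence (hypothetical): six copies of "CA" flanked by other bases.
print(find_tandem_repeats("GG" + "CA" * 6 + "TT"))
```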
Base frequency is an important measure for sequence conservation analysis. Dimer frequency was calculated using Spectrum repeat finder (SRF). The most frequent dimer repeats and the range of their distribution in all the considered ITS2 sequences were calculated using SRF. The results are shown in Figures 5-8.
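As a rough illustration of what a dimer frequency is, the following Python sketch counts overlapping dinucleotides in a sequence. It is plain counting under that assumption and is not the method implemented in SRF. The function name is hypothetical.

```python
from collections import Counter

def dimer_counts(seq):
    """Count overlapping dinucleotides, ignoring any dimer with a non-ACGT character."""
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if set(seq[i:i + 2]) <= valid
    )

# Toy sequence (hypothetical): dimers are AC, CG, GC, CG, GT.
counts = dimer_counts("ACGCGT")
print(counts.most_common(1))
```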
RNA secondary structure consists of stems and loops. Five main types of loops are present in RNA secondary structure, namely interior, hairpin, exterior, multi and bulge loops. For in-depth analysis, calculation of the secondary structure and determination of structural conservation are essential.
RNA secondary structures for ITS2 were predicted using the RNADRAW program [15-16]. The dynamic programming algorithm used in RNADRAW is based on the work of Zuker and Stiegler and employs energy parameters taken from Freier and Jaeger. In the current study, complete ITS2 sequences were used for RNA structure prediction. The minimum energy structure prediction algorithm in RNADRAW was ported from the RNAFOLD program included in the Vienna RNA package. The results are shown in Tables 3 and 4.
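Structural parameters such as the number of stems and the G-C, A-U and G-U pair counts can be derived from a predicted structure. The Python sketch below assumes the structure is available in dot-bracket notation, the format used by the Vienna RNA package. It treats a stem as a maximal run of directly stacked base pairs, which may differ from how RNADRAW reports stems. The function names and the toy input are hypothetical.

```python
def pair_table(structure):
    """Map each opening-bracket index to its closing-bracket index."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs

def summarize_structure(seq, structure):
    """Return (pair-type counts, number of stems) for a sequence and its dot-bracket structure."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    pairs = pair_table(structure)
    names = {"CG": "GC", "AU": "AU", "GU": "GU"}
    counts = {"GC": 0, "AU": 0, "GU": 0, "other": 0}
    for i, j in pairs.items():
        key = "".join(sorted(seq[i] + seq[j]))
        counts[names.get(key, "other")] += 1
    # A pair (i, j) opens a new stem unless (i - 1, j + 1) is also a pair.
    stems = sum(1 for i, j in pairs.items() if pairs.get(i - 1) != j + 1)
    return counts, stems

# Toy hairpin (hypothetical): two G-C pairs and one G-U pair in a single stem.
print(summarize_structure("GGGAAAUCC", "(((...)))"))
```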
Based on a statistical sample of the Boltzmann ensemble of secondary structures, the Sribo program in Sfold (Statistical Folding and Rational Design of Nucleic Acids) was used to predict the probable target accessibility sites (loops) for trans-cleaving ribozymes in ITS2. Here, the likelihood of unpaired sites as potential ribozyme targets was assessed. Because RNA exists as a population of different structures, a stochastic approach to the evaluation of accessible sites was found to be appropriate. The probability profiling approach of Ding and Lawrence reveals target sites that are commonly accessible across a large number of statistically representative structures of the target RNA. This bypasses the long-standing difficulty in accessibility evaluation caused by limited representation of probable structures and lends high statistical confidence to the predictions. The probability profile for individual bases (W=1) was produced for the region comprising a triplet and two flanking sequences of 15 bases each at every site of the selected cleavage triplet. Figures 9 and 10 show the observed results.
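The idea of a per-base probability profile (W=1) can be sketched in Python as the fraction of sampled structures in which each base is unpaired. The sketch assumes the sampled structures are already available as dot-bracket strings. The sampling from the Boltzmann ensemble is done by Sfold and is not reproduced here. The function names and the toy sample are hypothetical.

```python
def unpaired_probability(samples):
    """Per-base fraction of sampled structures in which the base is unpaired ('.')."""
    if not samples:
        raise ValueError("at least one sampled structure is required")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all sampled structures must have the same length")
    return [sum(s[i] == "." for s in samples) / len(samples) for i in range(n)]

def mean_accessibility(profile, start, width):
    """Mean unpaired probability over a window, e.g. a triplet plus flanking bases."""
    window = profile[start:start + width]
    if len(window) != width:
        raise ValueError("window extends past the end of the profile")
    return sum(window) / width

# Toy sample of three structures (hypothetical, far smaller than a real sample).
profile = unpaired_probability(["((..))", "(....)", "......"])
print(profile)
```

With the region described above, a window of 33 bases (a triplet plus 15 bases on each side) would be passed as `width`.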
ITS2 length varies from 374 to 572 bp in the Anopheline species, with An. fluviatilis and An. minimus showing similar lengths. The minimum length was 374 bp (An. fluviatilis) and the maximum was 572 bp (An. sundaicus). ITS2 sequences in Aedes were comparatively shorter, ranging from 196 to 373 bp, with Aedes ashworthi showing the shortest ITS2.
The ClustalW results show, as expected, better alignment within the Anopheles and Aedes genera individually, and dissimilarity when all 18 species of both genera were considered simultaneously (Figures 1, 2). GC content among the Aedes species varied between 48.7 and 55.5%, while in the Anopheline species it ranged from 51.5 to 59.4%.
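GC content as reported here is the percentage of G and C among the bases of a sequence. A minimal Python sketch follows, assuming that ambiguous characters and gaps are excluded from the denominator. The function name is hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T, U) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)

# Toy sequence (hypothetical): 4 of 6 bases are G or C.
print(round(gc_content("GGCCAT"), 1))
```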
The multiple sequence alignments in Figure 1 and Figure 2 clearly depict the variation in the alignment. Species from the genus Aedes showed better conservation at the sequence level than those from the genus Anopheles. The overall alignment of the 18 species is better from position 150 to 320 (Figure 3).
The lowest distance among all the sequences is observed between Anopheles minimus and An. fluviatilis. An. culicifacies and Aedes longfilamentus share the same node, with an observed divergence of 0.1.
The tandem repeat search revealed variations. Some species in both genera (Ae. australis, Ae. ashworthi, Ae. wardangensis, An. atroparvus, An. messeae, An. culicifacies, An. minimus, An. sundaicus) do not show any tandem repeats within the constraint range.
TC and GC repeats have the same number in Ae. australis. The same dimer frequency for these repeats among all three species is another interesting feature. Ae. australis and Ae. ashworthi also show the same GC repeat number. Dimer frequency varies from 16 to 33 for CG repeats, but the other dimers maintain the same copy number. Ae. triseriatus, Ae. hendersoni and Ae. longfilamentus have the same copy number in the same repeat search region. Aedes aegypti and Ae. albopictus also possess the same copy number in their spanning region.
CA repeats have the same copy number in An. messeae and An. stephensi, even though the region ranges are different.
TG repeats are almost the same in An. fluviatilis and An. minimus but differ significantly in An. sundaicus. A difference in GT repeats is observed between An. superpictus and An. sundaicus.
Secondary structural features of the ITS2 regions are presented in Tables 3 and 4.
The ITS2 RNA structure of An. sundaicus has the highest negative free energy (-185.45 kcal), followed by An. stephensi (-133.13 kcal), while An. superpictus showed the lowest negative structural energy. Structural energy is the same in An. fluviatilis and An. messeae and nearly the same in An. culicifacies and An. darlingi. Surprisingly, GU pairing is the same in four species, viz. An. atroparvus, An. fluviatilis, An. messeae and An. minimus. Similar maximum heat formation is observed in An. culicifacies, An. fluviatilis and An. minimus.
Among Aedes, the highest negative structural energy was observed in Aedes albopictus (-116.65 kcal), followed by Ae. cretinus (-104.59 kcal), while Ae. wardangensis possesses the lowest negative energy. Structural energy is similar in Ae. australis and Ae. wardangensis (-49.7 and -49.5 respectively) and differs only trivially between Ae. aegypti and Ae. ashworthi.
The stems (double-stranded paired regions) stabilize RNA secondary structure. The Anopheles species considered in this study were grouped into classes based on the similarity of their RNA stems and loops. Class I (An. stephensi and An. sundaicus) showed similarity in secondary structural features. Similar observations were made for Class II (An. messeae and An. superpictus) and Class IV (An. atroparvus and An. fluviatilis). Despite coming from different geographical regions, An. culicifacies and An. minimus fall into Class III, sharing similar secondary structural features.
Similarly, the Aedes ITS2 structures were classified into 3 classes: Class I (Ae. wardangensis and Ae. aegypti) and Class II (Ae. ashworthi and Ae. australis), based on similar structural properties, while the remaining species, which show high variance in structural features, were grouped as Class III. An interesting observation from visual comparison of the secondary structures with the phylogenetic tree was that Anopheles atroparvus, An. fluviatilis and Aedes cretinus, which share the same stem number (28), were also found close to each other in the tree.
This convergence at the secondary structural level among a few species from different regions may be due to evolutionary pressure on ITS2 to maintain the RNA secondary structure involved in post-transcriptional processing of rRNA.
The order of preference for conservation is observed to be highest for the hairpin loop, followed by the exterior and bulge loops, but surprisingly, considerable variation is present in the interior loop type. In the exterior loop and multi loop, An. darlingi shows significant variation from the other species. In Ae. ashworthi, no bulge loop is observed. Maximum conservation among species is observed in the exterior loop followed by the hairpin loop, while maximum variation is found in the interior loop.
Wesson et al. described several homologous domains in the ITS regions of Aedes mosquito species; these domains are known to undergo base pairing to form a core region crucial to various stem attributes. This suggests a role for the conserved core region of rDNA in the functional rRNA folding pattern. ITS2 is required for the precise and efficient processing and maturation of 26S rRNA ribosomal units. The information needed for the efficient removal of ITS2 from its RNA precursor is not localized but scattered throughout the ITS2 region. Variation and changes in the rRNA folding pattern due to evolutionary sequence variation in the ITS spacer regions may play an important role in ribosomal biogenesis.
ITS2, a well-known phylogenetic marker, was studied for the contrasting features that form the basis of the evolutionary process, i.e. the general trend of variability among the species as well as the conservation among a few species. At the sequence level and in loop regions, Aedes shows less variation than the Anopheles species. The results of the present study suggest that exterior loops are the most conserved type of loop, followed by hairpin, bulge and multi loops. However, Aedes resembles Anopheles in secondary structure features by showing maximum divergence in interior loop conservation. Identifying the homologous regions and reconstructing their evolution increases the traits available for phylogenetic analysis. Construction of an evolutionary tree using more species will provide an understanding of their functional selection. The present study indicates the phylogenetic relationships among mosquito species belonging to different genera. In a nutshell, this study shows that the Aedes ITS2 region is more conserved than the Anopheles ITS2 region. This indicates a possible role of varied selection pressure in this variation, which may be a plausible cause of the greater divergence of Anopheles compared with Aedes species. This study raises a question about the reliability of ITS2 as a phylogenetic marker in Anopheles as compared to Aedes. Further rigorous investigation considering all species of Anopheles and Aedes is required to establish the reliability of ITS2 as a phylogenetic marker for mosquito species. New neutral and efficient molecular markers are required to study the origin and phylogeny of mosquitoes.
The authors are grateful to Dr. J.S. Yadav, Director, IICT, India for his constant support and encouragement. Neelima Arora thanks CSIR for a Senior Research Fellowship.