The complete mitogenome of Curculio chinensis (Chevrolat, 1878) (Coleoptera: Curculionidae: Curculioninae)

Abstract
The mitogenome of Curculio chinensis (Chevrolat, 1878) was sequenced and annotated to facilitate the identification of C. chinensis and related species. The mitogenome is 18,680 bp in length and includes the 37 typical mitochondrial genes (13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes) and two control regions (total length: 3,879 bp). Mitogenome organisation, nucleotide composition and codon usage are similar to those of previously sequenced Curculio mitogenomes. All 13 protein-coding genes use ATN or TTG as the start codon and end with TAA/G or an incomplete stop codon (single T-). Twenty-one transfer RNA genes have the typical clover-leaf structure, while the dihydrouridine (DHU) arm of trnS1 is missing. In Curculio mitogenomes, the size of the control region is highly variable. Both ML and BI analyses, based on the 13 PCGs and two rRNAs from six species of Curculioninae, strongly supported the monophyly of Curculio. Within Curculio, the relationships amongst the included species were inferred as ((C. chinensis + C. sp.) + (Curculio davidi + Curculio elephas)), with C. chinensis and C. sp. forming a clade (BS = 100; PP = 1).


Introduction
The typical insect mitogenome is a circular double-stranded DNA molecule, 15-18 kb in length, that encodes 13 protein-coding genes (PCGs), two ribosomal RNA genes (rRNAs) and 22 transfer RNA genes (tRNAs) and also includes a large non-coding region (the control region) (Boore 1999, Cameron 2014). In insects, the mitogenome has been widely used as a molecular marker to explore population genetics, phylogeny and evolution (Fenn et al. 2008, Galtier et al. 2009, Cameron 2014).
The camellia weevil, Curculio chinensis (Chevrolat, 1878), is widely distributed across most of the Camellia spp. (family Theaceae) producing areas of China (Li et al. 2015). It is one of the most serious pests of tea and causes huge economic losses (Li et al. 2015). Since different species exhibit distinct responses to specific biocontrol agents and pesticides, accurate species identification is very important in pest management (Zhou et al. 2020). However, the camellia weevil is often difficult to identify from the morphological characteristics of its larvae. Rearing larvae to adults for identification is impractical because the larvae are long-lived and difficult to rear once removed from the seed (Shu et al. 2013, He et al. 2014). Molecular identification has proven to be reliable and effective for species-level identification of insects at any life stage (Chen et al. 2016).
In this study, we sequenced and annotated the mitogenome of C. chinensis and analysed its characteristics. In addition, we reconstructed the molecular phylogenetic relationships of C. chinensis and other species of the genus Curculio. The molecular data presented here will be useful for studies on identification and evolution in C. chinensis and related species.

Sample collection and DNA extraction
Adult specimens of C. chinensis, a host-specific predator of the seeds of Camellia spp., were collected from Camellia spp. in Yunguanshan Forest Farm, Guiyang City, Guizhou Province, China (26.48208727°N, 106.75480714°E) in July 2020. All fresh specimens were preserved in 100% ethanol and stored in a −20 °C freezer at the laboratory of the Guizhou Academy of Forestry, Guiyang. Adult specimens were identified on the basis of morphological characteristics (Chao and Chen 1980). Whole genomic DNA was extracted from thoracic muscle tissue using the Biospin Insect Genomic DNA Extraction Kit (BioFlux) following the manufacturer's instructions. Voucher specimens are stored in the insect collection of the Guizhou Academy of Forestry.

Molecular phylogenetic analysis
A total of six mitogenomes from two genera of Curculioninae were used for the phylogenetic analyses (Table 1). We included as many of the Curculio mitogenomes available in NCBI as possible. Four of the six species belong to Curculio (the ingroup), while the remaining two species, from the genus Anthonomus Germar, 1817, were chosen as the outgroup. Nucleotide sequences (without stop codons) of the 13 PCGs were aligned using MAFFT v.7 (Katoh and Standley 2013) with the G-INS-i (accurate) strategy and codon alignment mode (code table: invertebrate mitochondrial genetic code). The rRNA genes (rrnL and rrnS) were aligned using MAFFT v.7 (Katoh and Standley 2013) with the Q-INS-i algorithm, which takes the secondary structure of rRNA genes into account. Ambiguously aligned regions were removed from the PCG and rRNA alignments separately using Gblocks v.0.91b (Talavera and Castresana 2007). Gene alignments were concatenated using PhyloSuite v.1.2.2 (Zhang et al. 2019). The partitioning scheme and nucleotide substitution models for the Maximum Likelihood (ML) and Bayesian Inference (BI) analyses were selected with PartitionFinder2 (Lanfear et al. 2017) using the Bayesian Information Criterion (BIC) (Suppl. materials 1, 2). ML analyses were conducted with IQ-TREE v.1.6.3 (Nguyen et al. 2015) under the ultrafast bootstrap (UFB) approximation approach (Minh et al. 2013) with 5,000 replicates. BI analysis was performed using MrBayes v.3.2.7a (Ronquist et al. 2012) in the CIPRES Science Gateway (Miller et al. 2010) with four chains (one cold chain and three heated chains). Two independent runs of 2,000,000 generations were carried out, with sampling every 1,000 generations. The first 25% of trees were discarded as burn-in. Stationarity was assumed once the average standard deviation of split frequencies fell below 0.01.
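The concatenation step described above can be illustrated with a minimal Python sketch. It is not PhyloSuite's implementation; the function name `concatenate`, the toy taxa and the toy sequences are all hypothetical and serve only to show how per-gene alignments are joined into one supermatrix while the 1-based partition boundaries needed for model selection are recorded.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence (gaps as '-')


def concatenate(genes: Dict[str, Alignment]) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Join per-gene alignments and record 1-based, inclusive partition ranges."""
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon that lacks this gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy data (hypothetical, not taken from the study)
toy = {
    "cox1": {"taxonA": "ATGGCA", "taxonB": "ATGGCT"},
    "rrnS": {"taxonA": "TTAA", "taxonB": "TT-A"},
}
supermatrix, parts = concatenate(toy)
print(parts)
```

The recorded ranges correspond to the kind of data blocks that a partitioning tool takes as input; codon-position blocks within each PCG would be a further subdivision not shown here.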

Mitogenome organisation and nucleotide composition
The mitogenome of C. chinensis is a double-stranded circular DNA molecule, containing 37 typical mitochondrial genes (13 PCGs, 22 tRNAs and two rRNAs) and two control regions […] (Xu et al. 2017). Variation in the size of the control region is the main source of length variation in Curculio mitogenomes (Fig. 2). The mitogenome of C. chinensis has the same gene order as the previously sequenced Curculio species (Xu et al. 2017). A total of 71 overlapping nucleotides were found in ten pairs of neighbouring genes, with the longest overlap (23 bp) located between trnL1 and rrnL. Furthermore, there are 151 intergenic nucleotides dispersed across 13 gene boundaries and the longest intergenic region (103 bp) is located between trnS2 and nad1.
Table 1. Mitogenomes of the six Curculioninae taxa used in this study.
[…] (Table 3). In every sequenced mitogenome of Curculio, PCGs have the lowest AT content, while the control region has the highest AT content (Table 3). All four Curculio mitogenomes have positive AT-skews (0.052-0.062) and negative GC-skews (−0.203 to −0.17), similar to other recently reported weevil mitogenomes (Apriyanto and Tambunan 2020, Song et al. 2020, Wang et al. 2021) and most other insects (Wei et al. 2010).
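The skew values reported above are not defined in the text. Assuming the conventional definitions, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), they can be computed as in the following Python sketch. The function name `composition` and the toy sequence are hypothetical; the numbers it produces are not values from the study.

```python
from collections import Counter
from typing import Tuple


def composition(seq: str) -> Tuple[float, float, float]:
    """Return (AT content, AT-skew, GC-skew) for a nucleotide sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    at_content = (a + t) / total
    # Assumed conventional definitions; a skew is undefined (0.0 here)
    # when the corresponding pair of bases is absent.
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_content, at_skew, gc_skew


# Toy sequence (hypothetical): 3 A, 2 T, 1 G, 2 C
at_content, at_skew, gc_skew = composition("AAATTGCC")
print(at_content, at_skew, gc_skew)
```

Under these definitions, a positive AT-skew means more A than T on the analysed strand and a negative GC-skew means more C than G, which is the pattern described for the four Curculio mitogenomes.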

Protein-coding genes
The total size of all 13 PCGs of C. chinensis is 11,160 bp, accounting for 59.74% of the entire mitogenome (Table 3). Of the 13 PCGs, nad2, cox1, cox2, atp8, atp6, cox3, nad3, nad5, nad4, nad4L, nad6 and cob use ATN (ATA/T/G/C) as the start codon, while nad1 is initiated by TTG, which is common in Curculio mitogenomes (Xu et al. 2017). All PCGs terminate with TAA/G or the incomplete stop codon, a single T-. The incomplete termination codon T- can be completed by post-transcriptional polyadenylation (Ojala et al. 1981). The AT-skews of the PCGs amongst Curculio range from −0.146 (C. davidi) (Xu et al. 2017) to −0.133 (C. chinensis and Curculio sp.), indicating a bias towards the T nucleotide. The relative synonymous codon usage (RSCU) of the C. chinensis mitogenome is presented in Fig. 3, which shows that Leu, Phe and Ile are the three most frequently used amino acids. In the new mitogenome, the four most frequently utilised codons are UUA-Leu, UUU-Phe, AUU-Ile and AUA-Met. The most frequently used codons consist solely of A and U nucleotides, which reflects the high AT content of the PCGs.
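RSCU is commonly computed as the observed count of a codon divided by the mean count of all codons that encode the same amino acid. The Python sketch below follows that definition under the invertebrate mitochondrial code (NCBI translation table 5). It is an illustration only: the function name `rscu` and the toy sequence are hypothetical, and the sketch pools all synonymous codons of an amino acid into one family, whereas mitogenome studies often treat the two Leu and two Ser codon families separately.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial).
# Codon order: TTT, TTC, TTA, TTG, TCT, ... (third position varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> Dict[str, float]:
    """Relative synonymous codon usage for one in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon such as T-
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Stop codons and codons with ambiguous bases are not counted.
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    result: Dict[str, float] = {}
    for aa in set(CODE.values()) - {"*"}:
        family = [codon for codon, a in CODE.items() if a == aa]
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid not present in this sequence
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result


# Toy sequence (hypothetical): three TTA codons and one TTG codon, all Leu
values = rscu("TTATTATTATTG")
print(values["TTA"], values["TTG"])
```

A value above 1 indicates that a codon is used more often than expected under equal usage of its synonymous codons, which is how a preference for A/U-ending codons such as UUA shows up in an RSCU plot.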

Transfer and ribosomal RNA genes
The typical set of 22 tRNAs was identified, with sizes ranging from 62 bp (trnR) to 71 bp (trnK) (Table 2). The AT content of the tRNAs (76.8%-78.3%) is slightly higher than that of the PCGs (75.7%-76.1%) (Table 3). Most tRNAs have the clover-leaf secondary structure, except for trnS1, in which the dihydrouridine (DHU) arm is reduced to a simple loop (Fig. 4). This feature is common in metazoan mitogenomes (Garey and Wolstenholme 1989). A total of 30 mismatched base pairs belonging to six types (U-G, U-U, A-C, A-G, U-C and A-A) were found in the arm structures of the 22 tRNAs.
The total length of the rrnS and rrnL genes ranges from 2,059 bp (C. sp.) to 2,152 bp (C. chinensis) and the AT content of the rRNAs is conserved within Curculio (Table 3). In C. chinensis, the rrnL gene (length: 1,329 bp) is encoded between trnL1 and trnV and the rrnS gene (length: 788 bp) is encoded between trnV and the control region, similar to other sequenced Curculio (Xu et al. 2017).
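Tallying mismatches by type can be sketched in Python as follows. The sketch assumes the stem pairings have already been extracted from a predicted secondary structure as a list of base pairs; the function name `mismatch_types` and the toy pairs are hypothetical. Following the text, every pair other than A-U and G-C is counted as a mismatch (including U-G), and the orientation of a pair is ignored.

```python
from collections import Counter
from typing import Iterable, Tuple

WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
LABEL_ORDER = "UAGC"  # reproduces labels such as U-G, A-C and U-C used in the text


def mismatch_types(stem_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count non-Watson-Crick pairs in tRNA stems, ignoring pair orientation."""
    counts: Counter = Counter()
    for x, y in stem_pairs:
        x = x.upper().replace("T", "U")
        y = y.upper().replace("T", "U")
        if frozenset((x, y)) in WATSON_CRICK:
            continue
        # Only A, C, G and U/T are expected; other symbols raise ValueError.
        first, second = sorted((x, y), key=LABEL_ORDER.index)
        counts[f"{first}-{second}"] += 1
    return counts


# Toy stem pairs (hypothetical, not from the annotated tRNAs)
pairs = [("G", "C"), ("U", "G"), ("G", "U"), ("A", "C"), ("U", "U"), ("A", "U")]
tally = mismatch_types(pairs)
print(tally)
```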

Phylogenetic relationships
Based on ML and BI analyses of the nucleotide data of the 13 PCGs and two rRNAs, we reconstructed the phylogenetic relationships of four species of Curculio. The trees from both analyses have congruent topologies, with all branches strongly supported (Fig. 5). The relationships recovered in our analyses are similar to those found by Song et al. (2020), although we focused only on the phylogenetic relationships within Curculio. The monophyly of the genus Curculio was recovered with strong support, consistent with the previous study (Song et al. 2020). Within Curculio, the relationships amongst the included species were inferred as ((C. chinensis + C. sp.) + (Curculio davidi + Curculio elephas Fabricius, 1781)), with C. chinensis and C. sp. forming a clade. In the Camellia spp. producing areas of China, both C. chinensis and C. sp. are host-specific predators of the seeds of Camellia spp. The topologies of our phylogenetic trees strongly supported the sister relationship between these two Curculio species (BS = 100; PP = 1), which may reflect a convergent evolutionary phenomenon in Curculio species with Camellia spp. as their host.
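The inferred topology can be written as a nested structure and queried for monophyly, as in the following Python sketch. It treats the tree as rooted on the outgroup, and the outgroup labels are placeholders because the two Anthonomus species are not named in the text; the helper names `leaves`, `clades` and `is_monophyletic` are hypothetical.

```python
from typing import FrozenSet, Iterable, Iterator, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(node: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every internal node of a rooted tree."""
    if isinstance(node, str):
        return
    yield leaves(node)
    for child in node:
        yield from clades(child)


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if the taxa are exactly the leaf set of some internal node."""
    return frozenset(taxa) in set(clades(tree))


curculio = (("C. chinensis", "C. sp."), ("C. davidi", "C. elephas"))
# Outgroup labels are placeholders, not species names from the study.
tree = (curculio, ("Anthonomus sp. 1", "Anthonomus sp. 2"))

print(is_monophyletic(tree, ["C. chinensis", "C. sp."]))
print(is_monophyletic(tree, ["C. chinensis", "C. davidi"]))
```

The check covers topology only; branch support (BS and PP) would have to be read from the tree files produced by the ML and BI analyses.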