Abstract
In brief
Sperm function is essential for fertility across humans, agriculture and wildlife, yet comparative studies remain limited. This study integrates multi-species proteomic data to identify a core sperm proteome, uncovering conserved molecular pathways and validating novel sperm proteins critical for motility and fertilization.
Abstract
Reproductive biology is often considered in three siloed research areas: humans, agriculture and wildlife. Yet, each demands solutions for treatment of subfertility, fertility biomarkers, development of assisted reproductive technologies and effective contraception. To efficiently develop solutions applicable to all species, we must improve our understanding of the common biology underpinning reproductive processes. Accordingly, we integrate proteomic data from 29 publicly available datasets (>2 TB of data) to characterize mature sperm proteomes spanning 12 vertebrate species, identifying 13,853 proteins. Although human and mouse have relatively well-annotated sperm proteomes, many non-model species rely heavily on predicted or homology-inferred identifications. Despite variation in proteome size, composition and reproductive strategies, comparative analyses revealed that vertebrates share a fundamental molecular framework essential for sperm function. A core set of 45 species-level and 135 order-level conserved proteins mapped to critical processes, including energy generation, acrosome function and novel signalling pathways (BAG2 and FAT10). Knockout mouse models further validate the significance of these conserved proteins, demonstrating that their disruption impairs sperm motility and fertilization capacity. Moreover, we discovered loss-of-function variants of two additional core sperm proteins in clinical samples, linking them to severe sperm defects. Intriguingly, in-silico analysis reveals function-driven, context-dependent diversity surpassing evolutionary patterns. Collectively, these results highlight the value of integrating publicly available datasets and underscore the need for improved genome/proteome annotation in non-model species in mammals. This work provides a foundation for developing cross-species strategies to enhance fertility treatments, assisted reproductive technologies and conservation efforts.
All data are available via ShinySpermKingdom (https://reproproteomics.shinyapps.io/ShinySpermKingdom/).
Introduction
Reproductive biology is often considered in three siloed areas: humans, domesticated animals and wildlife. Despite their differences, these areas share several common needs: efficient production, treatment of subfertility and infertility, development of assisted reproductive technologies (ARTs) and effective contraception (Comizzoli & Holt 2019, Duffy et al. 2020). To effectively meet these needs, research should focus on developing solutions that are applicable across species where possible. However, to achieve this goal, we need a better understanding of the common reproductive biology across species.
Reproductive physiology is incredibly diverse across the animal kingdom, with many strategies unique to particular evolutionary lineages. In the context of mating, examples of this diversity include the location of semen deposition (Suarez & Pacey 2006), the ability to store spermatozoa in the female tract for extended periods (Holt & Lloyd 2010) and variable sperm morphologies (Fitzpatrick et al. 2022). Even between species of the same class, some of these traits appear to differ significantly. There is potential to both take advantage of common pathways and potentially exploit strategies that are unique to some species.
This area of study has significant potential benefits on the male side of the reproductive equation. For example, high-quality transcriptomic and proteomic studies on the formation of biological sperm storage may highlight pathways that could be targeted to prolong the in vitro shelf life of spermatozoa from a variety of species. As an example, koala sperm display an exceptional longevity of up to 42 days post-ejaculation during ex vivo storage (Johnston et al. 2000, Johnston et al. 2012, Skerrett-Byrne et al. 2021a). Understanding the biochemical principles of this longevity would yield potential benefits as diverse as improving human ARTs and addressing the logistical issues of extending the shelf life of fresh semen in the beef and dairy industries (Murphy et al. 2017). Alternatively, similar studies of spermatozoa and seminal plasma from species with high sperm competition could identify proteins and non-coding RNA species that may be exploited to treat subfertility and improve ARTs. To make such advances, further basic discovery research is required.
With decreasing costs and increasing sensitivity, proteomic profiling of the male gamete has become widespread (Mohanty et al. 2015). While sperm proteomes have been published for a variety of species, we are yet to capitalize on the suggestion of Oliva, Martínez-Heredia and Estanyol (Oliva et al. 2008) to identify conserved proteins from among the wealth of proteomic data that has been collected. As yet, there has been limited exploration of cross-species analyses, with a single study having compared the sperm proteomes of rodents and ungulates (Bayram et al. 2016) and another comparing three closely related mouse species with differences in sperm competition (Vicens et al. 2017). The results of these studies suggest that sperm proteins fall into two categories: i) highly conserved ‘core’ proteins and ii) rapidly evolving proteins that are unique to species or taxonomic groups. Importantly, many of the core proteins had important biological roles (e.g. spermatogenesis and capacitation (Bayram et al. 2016)), suggesting they may provide key avenues for developing cross-species reproductive solutions.
A comprehensive cross-species sperm proteomic analysis would provide the highest quality data, which can help to build further research. However, the physical collection of gametes from a large assortment of species involves many difficulties, not the least of which are the considerable expense and logistical management of samples. Thus, as a precursor to further experimental studies, we herein present an in-silico analysis of publicly available proteomic data from 12 species, representing the most comprehensive cross-species sperm proteome published to date. Using this information, we establish an up-to-date core sperm proteome, highlighting candidate pathways that are highly conserved across species. In addition, we compare proteomes across species based on biological contexts to highlight potentially important pathways for further study.
Materials and methods
Chemicals and reagents
Unless otherwise specified, all reagents were purchased from Merck.
Proteomic data sourcing
Publicly available proteomic data was sourced from the ProteomeXchange repository (www.proteomexchange.org) (Deutsch et al. 2023), drawing data from a range of partner repositories including PRIDE (Perez-Riverol et al. 2022), iProX (Chen et al. 2022) and MassIVE (Choi et al. 2020). The search term ‘sperm’ was initially used to obtain all proteomic datasets containing this keyword. Next, datasets were refined by phylum, with only species in the Chordata phylum retained. Of the remaining datasets, any without the RAW mass spectrometry output files (e.g. .WIFF and .RAW) available were excluded. From this filtered list, the remaining datasets were refined based on the following criteria: use of bottom-up proteomics employing data-dependent acquisition on fresh, mature cauda epididymal or ejaculated spermatozoa from wildtype animals. Studies were not excluded based on sample enrichment techniques (e.g. density gradient centrifugation or isolation of plasma membrane proteins). Beyond these criteria, studies were excluded if the experimental treatment of each raw MS output file could not be determined or if the deposited RAW files were not of sufficient detail or of a suitable file type for reanalysis. In the first case, clarification was sought from the publishing authors to establish file identities before exclusion. The application of these criteria resulted in 29 datasets (Table 1) and over 2 TB of RAW data.
Table 1. Final proteomic datasets included in this study.
Study | Dataset identifier | PMID | Species |
---|---|---|---|
Byrne et al. (2012) | PXD000007 | 23081703 | Bos indicus |
Ramesha et al. (2020) | PXD014172 | 32508098 | Bos indicus |
Kasvandik et al. (2015) | PXD001096 | 25603787 | Bos taurus |
Shen et al. (2021) | PXD019435 | 34009990 | Bos taurus |
Fu et al. (2019) | PXD003859 | 31146187 | Bubalus bubalis |
Batra et al. (2021) | PXD022114 | 34174811 | Bubalus bubalis |
Nixon et al. (2019) | MSV000082258 | 30072580 | Crocodylus porosus |
Labas et al. (2015) | PXD001254 | 25086240 | Gallus gallus |
Vitorino Carvalho et al. (2021) | PXD022322 | 33898456 | Gallus gallus |
Vandenbrouck et al. (2016) | PXD003947 | 27444420 | Homo sapiens |
Schiza et al. (2018) | PXD007515 | 30097533 | Homo sapiens |
Urizar-Arenaza et al. (2019) | PXD011290 | 30622161 | Homo sapiens |
Pini et al. (2020) | PXD014849 | 32026202 | Homo sapiens |
Castillo et al. (2019) | PXD014871 | 31824947 | Homo sapiens |
Guyonnet et al. (2012) | PXD000575 | 22707618 | Mus musculus |
Guyonnet et al. (2014) | PXD000592 | 24797071 | Mus musculus |
Castaneda et al. (2017) | PXD005343 | 28630322 | Mus musculus |
Liu et al. (2019) | PXD013092 | 30867229 | Mus musculus |
Xu et al. (2020) | PXD016928 | 31969357 | Mus musculus |
Skerrett-Byrne et al. (2022) | PXD028834 | 36384108 | Mus musculus |
Casares-Crespo et al. (2019) | PXD007989 | 30753958 | Oryctolagus cuniculus |
Juárez et al. (2020) | PXD015510 | N/A | Oryctolagus cuniculus |
Leahy et al. (2020) | PXD017537 | 32383290 | Ovis aries |
Skerrett-Byrne et al. (2021a) | PXD024250 | 34411425 | Phascolarctos cinereus |
Xu et al. (2021) | PXD025607 | 34336820 | Sus scrofa |
Pérez-Patiño et al. (2019) | PXD010062 | 30257877 | Sus scrofa |
Zhang et al. (2022) | PXD030020 | 35252197 | Sus scrofa |
Fuentes-Albero et al. (2021) | PXD024588 | 34336830 | Tursiops truncatus |
Bayram et al. (2016) | PXD003164 | 26768581 | Mus musculus, Bos taurus, Sus scrofa, Rattus norvegicus |
International mouse phenotyping consortium mouse models, histology and data collection
The International Mouse Phenotyping Consortium (IMPC) database (Dickinson et al. 2016, Groza et al. 2022) was mined for genetic knockout mice overlapping with those identified as the core sperm proteome (Fig. 3A), and with special access granted, we cross referenced with the European Mouse Mutant Archive (EMMA) (Hagn et al. 2007) to restrict to those gene KOs with available in vitro fertilization (IVF) and sperm data. The mouse models were generated using the IMPC targeting strategy with CRISPR/Cas technology at Helmholtz Munich (https://www.mousephenotype.org/understand/the-data/allele-design/). After genotyping, heterozygous × heterozygous matings were set up to generate sufficient mutant mice with littermate +/+ controls for phenotyping analysis at the German Mouse Clinic, as described (Fuchs et al. 2018), and in agreement with the standardized phenotyping pipeline of the IMPC including histopathological analysis (n = 2) (https://www.mousephenotype.org/impress/PipelineInfo?id=14) for all lines except for Aldh7a1 (n = 5). We obtained data pertinent to Aldh7a1 (aldehyde dehydrogenase 7 family member A1), Echs1 (enoyl-CoA hydratase, short chain 1), Etfb (electron transfer flavoprotein subunit beta), Ndufa10 (NADH:ubiquinone oxidoreductase subunit A10) and Pebp4 (phosphatidylethanolamine-binding protein 4), with a wildtype reference control provided by EMMA. For further information on protocols used by EMMA for sperm collection, analysis and IVF, please see their publicly available resources and videos (https://www.infrafrontier.eu/emma/cryopreservation-protocols/). Briefly, histopathological analyses of formalin-fixed, paraffin-embedded and haematoxylin and eosin (H&E)-stained sections (3 μm thick) of testis, epididymis, prostate and seminal vesicles from control and mutant mice were performed blind by two pathologists. When applicable, the number of multinucleated giant cells (MGCs) in the seminiferous tubules was counted and expressed per unit area of testis.
Proteome Discoverer processing
Consistent with previous studies (Murray et al. 2021, Trigg et al. 2021, Martin et al. 2022, Skerrett-Byrne et al. 2021a, b, c, 2022, Smyth et al. 2022, Staudt et al. 2022), database searching of each study’s RAW files was performed using Proteome Discoverer 2.5 (Thermo Fisher Scientific, USA). SEQUEST HT was used to search against the appropriate UniProt database (Table S1; see section on Supplementary materials given at the end of the article), each downloaded on June 18, 2022 and including reviewed and unreviewed proteins. Highly stringent database searching criteria were utilized, including up to two missed cleavages, a precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.02 Da. Trypsin was designated as the digestion enzyme. Cysteine carbamidomethylation was set as a fixed modification, while acetylation (K, N-terminus), phosphorylation (S, T, Y) and oxidation (M) were designated as dynamic modifications. Interrogation of the corresponding reversed database was also performed to evaluate the false discovery rate (FDR) of peptide identification using Percolator on the basis of q-values, which were estimated from the target-decoy search approach. To filter target peptide spectrum matches from decoy peptide spectrum matches, a fixed FDR of 1% was set at the peptide level. The resultant protein list was exported from Proteome Discoverer 2.5 as an Excel file and further refined to include only those with a protein identification (FDR ≤0.01) supported by at least one unique peptide.
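The target-decoy logic behind the 1% FDR threshold can be illustrated with a minimal sketch. This is not the Percolator algorithm used in the study (Percolator rescores matches with a semi-supervised model); it only shows how q-values follow from a ranked list of target and decoy matches. The function name and the toy scores are assumptions made for illustration.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for peptide spectrum matches (PSMs).

    psms: list of (score, is_decoy) tuples, higher score = better match.
    Returns a list of (score, is_decoy, q_value), best score first.
    Simplified illustration only; not the Percolator implementation.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each rank = decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value = lowest FDR achievable at this rank or any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms, fdr_cutoff=0.01):
    """Keep target PSMs whose q-value is at or below the cut-off."""
    return [(s, q) for s, d, q in target_decoy_qvalues(psms)
            if not d and q <= fdr_cutoff]


# Hypothetical scores: three targets, one decoy, then another target.
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
kept = accepted_targets(toy)
```

In this toy list only the two matches ranked above the first decoy pass the 1% threshold; everything scored at or below the decoy inherits a higher q-value.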
Phylogenetic trees and UniProt mapping
NCBI taxonomy numbers were submitted to phylot (v2) to generate a phylogenetic tree, which was visualized and exported from iTOL (interactive tree of life) (Letunic & Bork 2007, 2011). Utilising UniProt (https://www.uniprot.org/), each of the sperm proteomes was mapped to the UniProt Knowledge Base to ascertain the level of evidence for each protein (Skerrett-Byrne et al. 2021a, b, c, 2022, Smyth et al. 2022). UniProt protein evidence describes the current, manually curated type of evidence that supports the existence of a protein: i) experimental evidence at protein level; ii) experimental evidence at transcript level; iii) protein inferred from homology; and iv) protein predicted.
Humanization with OmicsBox
To conduct a comparative analysis between all species, a minimum cut-off of at least 500 proteins was applied before proceeding to humanization to maximize the comparisons possible. Data from each of the remaining eight species were uploaded to UniProt to generate a FASTA file for humanization using a custom workflow on the OmicsBox software (version 2.2.4, BioBam Bioinformatics, Spain) (https://www.biobam.com/omicsbox). This workflow includes a cloud based DIAMOND BLAST protein search against the human proteome (Götz et al. 2008, Buchfink et al. 2021, Skerrett-Byrne et al. 2021a , Zhang et al. 2022), with the output restricted to an e-value cut-off of 4.07E−10 to ensure accurate homologues were obtained (97.5% average conversion).
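The e-value restriction and the conversion rate can be sketched as follows. This is a minimal illustration and not the OmicsBox workflow itself: it assumes the BLAST output has already been parsed into (query, human hit, e-value) tuples, that the cut-off is inclusive, and that the best hit per query is the one with the lowest e-value. All identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 4.07e-10  # cut-off stated in the text


def humanize(protein_ids, blast_hits, cutoff=E_VALUE_CUTOFF):
    """Map each query protein to its best human hit under the cut-off.

    protein_ids: list of query accessions for one species.
    blast_hits: iterable of (query_id, human_id, evalue) tuples.
    Returns (mapping, conversion_rate_percent).
    """
    best = {}
    for query, human_id, evalue in blast_hits:
        if evalue > cutoff:
            continue  # hit too weak to be treated as a homologue
        if query not in best or evalue < best[query][1]:
            best[query] = (human_id, evalue)

    mapping = {q: best[q][0] for q in protein_ids if q in best}
    rate = 100.0 * len(mapping) / len(protein_ids) if protein_ids else 0.0
    return mapping, rate


# Hypothetical accessions, for illustration only.
queries = ["Q1", "Q2", "Q3", "Q4"]
hits = [
    ("Q1", "HUMAN_A", 1e-50),
    ("Q1", "HUMAN_B", 1e-20),     # weaker second hit for Q1, ignored
    ("Q2", "HUMAN_C", 1e-5),      # fails the cut-off
    ("Q3", "HUMAN_D", 4.07e-10),  # exactly at the cut-off, kept
    ("Q4", "HUMAN_A", 1e-12),     # maps to the same human protein as Q1
]
mapping, rate = humanize(queries, hits)
humanized_ids = set(mapping.values())  # duplicates collapse here
```

Because several source proteins can map to one human homologue, the number of unique humanized IDs can be lower than the number of converted proteins.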
Identifying conserved and species-specific proteins
Conserved proteins were identified at both species level (i.e. proteins present in all species used for further analysis) and order level (i.e. proteins present in at least one species of all taxonomic orders used for further analysis). Humanized identifications (IDs) were employed for this analysis and lists were compared using jvenn (Bardou et al. 2014) and DeepVenn (Hulsen 2022) to identify conserved proteins. The analysis at order level included the taxonomic orders Primates (H. sapiens), Rodentia (M. musculus), Artiodactyla (B. taurus, O. aries, S. scrofa, T. truncatus), Lagomorpha (O. cuniculus), Diprotodontia (P. cinereus) and Crocodilia (C. porosus).
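The two definitions of conservation reduce to set operations, sketched below with toy data. The study used jvenn and DeepVenn for these comparisons; the function names and the protein labels here are illustrative assumptions only.

```python
def conserved_at_species_level(proteomes):
    """Proteins present in every species.

    proteomes: dict mapping species name -> set of humanized protein IDs.
    """
    sets = [set(p) for p in proteomes.values()]
    return set.intersection(*sets) if sets else set()


def conserved_at_order_level(proteomes, order_of):
    """Proteins present in at least one species of every taxonomic order.

    order_of: dict mapping species name -> taxonomic order.
    """
    by_order = {}
    for species, proteins in proteomes.items():
        # Pool all species of an order before intersecting across orders.
        by_order.setdefault(order_of[species], set()).update(proteins)
    return set.intersection(*by_order.values()) if by_order else set()


# Toy example with placeholder protein labels.
toy_proteomes = {
    "human": {"A", "B", "C"},
    "mouse": {"A", "B"},
    "bull": {"A", "C"},
    "sheep": {"A", "B"},
}
toy_orders = {
    "human": "Primates",
    "mouse": "Rodentia",
    "bull": "Artiodactyla",
    "sheep": "Artiodactyla",
}
species_core = conserved_at_species_level(toy_proteomes)
order_core = conserved_at_order_level(toy_proteomes, toy_orders)
```

Pooling species within an order relaxes the requirement, so the species-level core is always a subset of the order-level core, consistent with the 45 and 135 proteins reported.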
Comparing the sperm proteome based on biological contexts
Groups of species were compared based on several biological ‘contexts’, including location of testes (internal vs external), history of selective breeding (yes vs no) and sperm metabolism preference (glycolysis preference vs no preference). Humanized IDs were used for this analysis and species classified into each group are listed in Supplementary Table S17. To account for the stronger influence of human and mouse proteomes due to their extensive inventories, proteins were not included in the analysis if they were only identified in mouse or human spermatozoa. Lists were compared using jvenn (Bardou et al. 2014) and DeepVenn (Hulsen 2022) to identify conserved and unique proteins.
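A context comparison of this kind can be sketched in two steps: first remove proteins whose only support comes from human and/or mouse, then compare the pooled proteomes of the two groups. The reading of the exclusion rule (a protein is dropped if no species other than human or mouse contains it) is an assumption, as are the function names and the toy grouping, which does not reproduce Supplementary Table S17.

```python
def drop_human_mouse_only(proteomes, dominant=("human", "mouse")):
    """Remove proteins detected in no species other than human or mouse."""
    others = set().union(
        *(p for s, p in proteomes.items() if s not in dominant)
    )
    return {s: set(p) & others for s, p in proteomes.items()}


def compare_context(proteomes, group_a, group_b):
    """Compare pooled proteomes of two species groups."""
    a = set().union(*(proteomes[s] for s in group_a))
    b = set().union(*(proteomes[s] for s in group_b))
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy data with placeholder protein labels.
toy = {
    "human": {"A", "B", "H"},
    "mouse": {"A", "M", "H"},
    "dolphin": {"A", "D"},
    "boar": {"B", "C"},
}
filtered = drop_human_mouse_only(toy)
result = compare_context(
    filtered,
    group_a=["dolphin"],                 # illustrative grouping only
    group_b=["human", "mouse", "boar"],  # illustrative grouping only
)
```

Filtering before pooling keeps the two deeply profiled proteomes from dominating the "unique" sets of whichever group they fall into.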
Bioinformatic analyses of proteomic data
Bioinformatic analyses employed humanized IDs. High granularity pathway analysis was performed using the Ingenuity Pathway Analysis software package (IPA; Qiagen, Germany), as previously described (Murray et al. 2021, Skerrett-Byrne et al. 2021a, b, c, 2022, Trigg et al. 2021, Martin et al. 2022, Smyth et al. 2022, Staudt et al. 2022, Zhang et al. 2022). Each humanized proteomic list was analysed on the basis of predicted protein subcellular location and classification (other excluded), in addition to canonical pathways and disease and functions, using the IPA P-value enrichment score (a strict cut-off of P-value ≤0.05) (Krämer et al. 2013). The database for annotation, visualization and integrated discovery (DAVID, www.david.ncifcrf.gov, v 2021) (Huang da et al. 2009, Sherman et al. 2022) functional annotation clustering tool was used to identify enriched clusters based on gene ontology terms, protein–protein interactions, protein domains, pathways and literature. All searches were performed with default thresholds for similarity, classification and enrichment, using Homo sapiens as the background gene list. Clusters were classified as significantly enriched based on Benjamini adjusted P-values ≤0.05. Visual protein–protein interaction networks were generated using STRING (www.string-db.org, v 11.5). The humanized proteome was interrogated using UniProt to assess subcellular locations relevant to sperm cells. The following GO terms were used: acrosomal vesicle (GO:0001669), perinuclear theca (GO:0033011), nucleus (GO:0005634), mitochondrion (GO:0005739), axoneme (GO:0005930), cytoskeletal calyx (GO:0033150), sperm midpiece (GO:0097225), head-tail coupling apparatus (GO:0120212), principal piece (GO:0097228), end piece (GO:0097229), annulus (GO:0097227) and flagellum (GO:0036126).
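For readers unfamiliar with the Benjamini adjustment applied to the DAVID clusters, the standard Benjamini-Hochberg step-up procedure is sketched below. This assumes DAVID's "Benjamini" column corresponds to that procedure, and the P-values are invented for illustration; it is not a reimplementation of DAVID.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(pvalues)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P-values for four hypothetical annotation clusters.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p <= 0.05]
```

Each raw P-value is scaled by the number of tests divided by its rank, then capped by the adjusted value of the next larger P-value, so adjusted values never decrease as raw values increase.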
Clustering and network visualization
The refined pathways output from IPA were loaded into Perseus (version 1.6.10.43) (Tyanova et al. 2016) to carry out unbiased hierarchical clustering across the species. Protein networks of the core sperm proteomes at the species (45 proteins) and order (135 proteins) taxonomic levels were investigated using STRING (version 12.0) (Szklarczyk et al. 2021) and then visualized and modified using Cytoscape (version 3.8.2) (Shannon et al. 2003). Basic data handling, if not otherwise stated, was conducted using Microsoft Excel 365 (version 2211, Microsoft Corporation, USA) and GraphPad Prism version 10.4.1 (GraphPad Software; USA).
Shiny application development
In accordance with the Shiny blueprint outlined by ShinySperm (Skerrett-Byrne et al. 2024), a Shiny application was deployed to support the accessibility and interpretability of these datasets, allowing for effective data-driven insights by the field. The full coding script supporting ShinySpermKingdom (https://reproproteomics.shinyapps.io/ShinySpermKingdom/) can be downloaded from GitHub – https://github.com/DavidSBEire/ShinySpermKingdom. In brief, the ShinySpermKingdom application was built using the shiny package (version 1.9.1) in RStudio (version 2024.04.1 + 748), with base R (version 4.3.3, 2024-02-29). Supporting the functionality and aesthetics of this application are several packages, including DT, eulerr, ggplot2, openxlsx, plotly, readxl, reshape2, RColorBrewer and shinydashboard.
Male reproductive genomics (MERGE) cohort
The MERGE cohort currently comprises exome/genome data of almost 3,000 men, of whom most attended the Centre of Reproductive Medicine and Andrology (CeRA), Münster, for couple infertility. MERGE is continuously growing, and for the current study, 2,882 datasets of men with quantitative and/or qualitative sperm defects were queried for the 135 genes encoding the identified conserved sperm proteins. Specifically, 2,327 men had very few or no sperm in the ejaculate (crypto- or azoospermia, HP:0030974/HP:0000027), 437 had various grades of reduced sperm counts (oligozoospermia, HP:0000798) often combined with reduced/impaired sperm motility and/or morphology (astheno-/teratozoospermia, HP:00122077/HP:0012864), and 118 had normal sperm counts but motility and/or morphology defects. The most recent description of MERGE, including the details of sequencing, is available in Stallmeyer et al. (2024). Only well-covered (>20×), rare (minor allele frequency <0.01 in gnomAD 2.1.1), coding, homo- or hemizygous, loss-of-function variants (LoF: stop gained, frameshift, splice acceptor/donor) were prioritised.
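The prioritisation criteria can be written as a single filter, sketched below. The record layout, field names and consequence labels are hypothetical and do not reflect the MERGE pipeline or any particular annotation tool; only the thresholds come from the text.

```python
LOF_CONSEQUENCES = {
    "stop_gained", "frameshift", "splice_acceptor", "splice_donor",
}
RECESSIVE_ZYGOSITY = {"homozygous", "hemizygous"}


def prioritise(variants, core_genes, min_coverage=20, max_maf=0.01):
    """Keep well-covered, rare, homo-/hemizygous LoF variants in core genes.

    variants: list of dicts with hypothetical keys 'gene', 'coverage',
    'gnomad_maf', 'zygosity' and 'consequence'.
    """
    return [
        v for v in variants
        if v["gene"] in core_genes
        and v["coverage"] > min_coverage      # >20x, strict
        and v["gnomad_maf"] < max_maf         # MAF <0.01, strict
        and v["zygosity"] in RECESSIVE_ZYGOSITY
        and v["consequence"] in LOF_CONSEQUENCES
    ]


# Invented records for illustration; gene names are placeholders.
core = {"GENE_A", "GENE_B"}
calls = [
    {"gene": "GENE_A", "coverage": 35, "gnomad_maf": 0.0001,
     "zygosity": "homozygous", "consequence": "frameshift"},
    {"gene": "GENE_A", "coverage": 20, "gnomad_maf": 0.0001,
     "zygosity": "homozygous", "consequence": "stop_gained"},   # coverage not >20
    {"gene": "GENE_B", "coverage": 50, "gnomad_maf": 0.0,
     "zygosity": "heterozygous", "consequence": "stop_gained"}, # wrong zygosity
    {"gene": "GENE_B", "coverage": 40, "gnomad_maf": 0.002,
     "zygosity": "hemizygous", "consequence": "missense"},      # not LoF
    {"gene": "GENE_C", "coverage": 60, "gnomad_maf": 0.0,
     "zygosity": "homozygous", "consequence": "frameshift"},    # not a core gene
]
hits = prioritise(calls, core)
```

Variants absent from gnomAD would need an explicit frequency of zero (or separate handling) under this record layout.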
Results
Establishment of multispecies mature spermatozoa proteomes
A comprehensive search of the ProteomeXchange repository using the keyword ‘sperm’ yielded a total of 146 datasets (Fig. 1A). After excluding species from outside the Chordata phylum, a total of 90 datasets remained. Of these, 48 studies were excluded based on our predefined inclusion criteria, such as the restriction to studies of functional mature sperm cells (see Materials and methods). A further 13 studies were excluded because insufficient information was available for reanalysis. Datasets at each successive level of exclusion are listed in Table S1.
Figure 1. Characterization of multispecies sperm proteomes. (A) Across 12 different species, over 2 TB of RAW spectral data was sourced from public repositories and processed using Proteome Discoverer 2.5, utilising highly stringent criteria. (B) The number of proteins identified in each of the 12 species and (C) the proportion of each proteome at each level of protein evidence, as curated by the UniProt Knowledge Base: i) at protein level; ii) at transcript level; iii) protein inferred from homology; and iv) protein predicted.
Citation: Reproduction 169, 6; 10.1530/REP-25-0105
The final cohort comprised 29 datasets, representing 12 species: Homo sapiens (Vandenbrouck et al. 2016, Schiza et al. 2018, Castillo et al. 2019, Urizar-Arenaza et al. 2019, Pini et al. 2020), Mus musculus (Guyonnet et al. 2012, Guyonnet et al. 2014, Bayram et al. 2016, Castaneda et al. 2017, Liu et al. 2019, Xu et al. 2020, Skerrett-Byrne et al. 2022), Sus scrofa (Bayram et al. 2016, Pérez-Patiño et al. 2019, Xu et al. 2021, Zhang et al. 2022), Bos taurus (Byrne et al. 2012, Kasvandik et al. 2015, Bayram et al. 2016, Ramesha et al. 2020, Shen et al. 2021), Crocodylus porosus (Nixon et al. 2019), Oryctolagus cuniculus (Casares-Crespo et al. 2019, Juárez et al. 2020), Tursiops truncatus (Fuentes-Albero et al. 2021), Ovis aries (Leahy et al. 2020), Phascolarctos cinereus (Skerrett-Byrne et al. 2021a), Gallus gallus domesticus (Labas et al. 2015, Vitorino Carvalho et al. 2021), Rattus norvegicus (Bayram et al. 2016) and Bubalus bubalis (Fu et al. 2019, Batra et al. 2021) (Fig. 1A, Table S1). These datasets were reanalysed using a stringent and uniform pipeline implemented in Proteome Discoverer, and the resultant protein IDs are provided in Tables S2–S13. To enhance accessibility, all the data are also available on ShinySpermKingdom, facilitating an interactive experience with these complex datasets (https://reproproteomics.shinyapps.io/ShinySpermKingdom/).
Unsurprisingly, human (9,296 proteins) and mouse (8,645 proteins) exhibited the most comprehensive sperm proteomes (Fig. 1B, Table 2), reflecting their status as well-researched species. The boar proteome (3,298) was roughly a third the size of these larger proteomes, followed closely by bull (3,177) and crocodile (2,855), and then rabbit (1,650). The proteome of each species was first assessed using UniProt to determine its current curated level of protein evidence (Fig. 1C; Tables S2–S13). Predictably, the sperm proteomes of well-characterized species such as human (91.4%), mouse (94.4%) and rat (83.1%) were all well-annotated at protein and transcript level. Interestingly, 77.3% of buffalo sperm protein identifications were supported at the transcript level. Within the sperm proteomes of the remaining eight species, in most cases, >90% of protein identifications were only predicted or inferred from homology, indicating that experimental evidence for the existence of most proteins remains poor in non-traditional model species (Fig. 1C). Due to the low number of protein identifications in the chicken (223 IDs), brown rat (89 IDs) and buffalo (84 IDs), these species were excluded from further downstream analyses (Fig. 1B). These exclusions ensured a focus on datasets with sufficient coverage and quality for robust comparative analysis.
Table 2. Summary of protein identifications by species.
Species | Studies, n | Original protein IDs, n | Humanized IDs, n | Conversion rate (%) | Unique to species, n (%) |
---|---|---|---|---|---|
Human (Homo sapiens) | 5 | 9,296 | | | 6,093 (66.3) |
House mouse (Mus musculus) | 7 | 8,645 | 8,424 | 97.4 | 2,536 (30.1) |
Boar (Sus scrofa) | 4 | 3,298 | 3,190 | 96.7 | 111 (3.5) |
Cattle (Bos taurus/indicus) | 5 | 3,177 | 3,048 | 95.9 | 149 (4.9) |
Saltwater crocodile (Crocodylus porosus) | 1 | 2,855 | 2,743 | 96.1 | 165 (6.0) |
European rabbit (Oryctolagus cuniculus) | 2 | 1,650 | 1,633 | 99.0 | 59 (3.6) |
Common bottlenose dolphin (Tursiops truncatus) | 1 | 950 | 941 | 99.1 | 15 (1.6) |
Sheep (Ovis aries) | 1 | 743 | 726 | 97.7 | 40 (5.5) |
Koala (Phascolarctos cinereus) | 1 | 582 | 573 | 98.5 | 26 (4.5) |
Chicken (Gallus gallus) | 2 | 223 | | | |
Brown rat (Rattus norvegicus) | 1 | 89 | | | |
Domestic water buffalo (Bubalus bubalis) | 2 | 84 | | | |
Humanized sperm proteomes redefine evolutionary links
To advance the understanding of the remaining nine species, each sperm proteome was converted to its respective human homologues to allow utilization of human-focused bioinformatic tools, facilitating a standardized cross-species analysis. Humanization was achieved using the OmicsBox software, as previously described (Skerrett-Byrne et al. 2021a). Conversion rates to humanized IDs were exceptionally high, ranging from 95.9 to 99.1%, producing a total of 13,853 proteins (Fig. 2A, Tables 2 and S14). These newly generated sperm proteomes were analysed using Ingenuity Pathway Analysis (IPA) to provide an overall classification of the protein types present. Notably, despite variations in proteome size, proportional compositional analysis of each species revealed broad consistency (Fig. 2B). Enzymes were the dominant category, accounting for ∼72.5% of all sperm proteins, followed by transporters (∼15.5%), transcription (∼5.3%) and translation (∼2.7%) regulators, receptors (∼1.6%), ion channels (∼1.6%), cytokines (∼0.6%) and growth factors (∼0.5%). Whilst there were no overt differences in the proportional composition of protein classification types, the two largest sperm proteomes (i.e. human and mouse) featured proportionally more transcription regulators than those of all other species assessed (i.e. ∼9.3 vs ∼4.1%). This enrichment may indicate potential differences in sperm-specific regulatory mechanisms between traditional model organisms and less-studied species, or may reflect poorer annotation in these latter species. Further interrogation with UniProt subcellular localization and the Human Protein Atlas (Uhlén et al. 2015) sperm subcellular resource allowed mapping to key sperm locations (Fig. S1): acrosome (146 proteins), perinuclear theca (17 proteins), nucleus (3,879 proteins), calyx (15 proteins), equatorial segment (13 proteins), connecting piece (45 proteins), mid-piece (118 proteins), mitochondria (394 proteins), annulus (16 proteins), flagellum (204 proteins), flagellar centriole (21 proteins), axoneme (176 proteins), principal piece (91 proteins) and end piece (31 proteins) (Table S14).
Figure 2. Humanization of sperm proteomes. (A) Summary of the number of proteins which were successfully converted to human homologues (duplicates removed). (B) The percentage of the original species proteome retained. (C) Heatmap depicting the top 25 unique pathways significantly enriched in at least one species (P-value ≤0.05), with white denoting absence of detection. To the left, in gold, is a phylogenetic tree that depicts the evolutionary distances and relationships between the nine species (generated with phylot and iTOL). Mirrored to the right, in blue, is the unbiased hierarchical clustering based upon the full remit of 482 pathways (sperm function).
Seeking to delve further into the functional relationships among these sperm proteomes, the canonical pathways node of IPA was utilized, leading to the identification of 482 unique pathways significantly enriched in at least one species (P-value ≤0.05). Unbiased hierarchical clustering based on the conservation of these pathways revealed mouse spermatozoa to be the most functionally related to those of their human counterparts (Fig. 2C). This finding contrasts with the genomic lineage tracing, which indicated that among the species assessed, rabbits were the closest evolutionary relative to humans. Notably, this hierarchical clustering approach divided the assessed species into two broad groupings: i) crocodile, dolphin, bull and boar; and ii) sheep, koala, rabbit, mouse and human. This reorganization suggests that functional relationships based on sperm proteomes may not strictly align with the evolutionary distances derived from genomic data. Amongst the most significantly enriched pathways across all species were those related to energy metabolism (‘mitochondrial dysfunction’, ‘oxidative phosphorylation’, ‘glycolysis’, ‘gluconeogenesis’ and ‘fatty acid β-oxidation’) and capacitation (‘sirtuin signalling pathway’ and ‘protein ubiquitination pathway’).
Characterization of the core sperm proteome
To ascertain the proteins most fundamental to the functional competency of spermatozoa across the nine assessed species, a comparison of all protein identifications was conducted. This strategy uncovered a modest 45 and 135 conserved proteins at the taxonomic level of species and orders, respectively, which we hereafter refer to as the core sperm proteome (Fig. 3A, Table S15). The nine species were collapsed into six taxonomic orders, namely Primates (human; 9,186 proteins), Rodentia (mouse; 6,707 proteins), Artiodactyla (boar, bull, dolphin and sheep; 3,795 proteins), Lagomorpha (rabbit; 1,548 proteins), Diprotodontia (koala; 556 proteins) and Crocodilia (crocodile; 2,103 proteins). Of those proteins that were identified in more than one order, the largest overlaps were between Primates and Rodentia (830), Primates, Rodentia and Artiodactyla (489), and Rodentia and Artiodactyla (457) (Fig. 3A).
Figure 3. Core sperm proteome. (A) Venn diagram depicting the overlap of sperm proteomes at the level of taxonomic orders: Primates (H. sapiens), Rodentia (M. musculus), Artiodactyla (B. taurus, O. aries, S. scrofa, T. truncatus), Lagomorpha (O. cuniculus), Diprotodontia (P. cinereus) and Crocodilia (C. porosus). The UniProt alignment tool (Clustal Omega) was utilized to interrogate the sequence alignment between each species for (B) heat shock protein family A member 2 (HSPA2), (C) protein kinase cAMP-dependent type I regulatory subunit alpha (PRKAR1A), (D) voltage-dependent anion channel 3 (VDAC3), (E) acrosin (ACR), (F) calcium-binding tyrosine phosphorylation regulated (CABYR) and (G) zona pellucida-binding protein (ZPBP). The median alignment (% similarity) is denoted in gold next to each protein symbol. The conserved sperm proteomes at the level of species (45 proteins) and order (135 proteins) were subjected to analysis using Ingenuity Pathway Analysis (IPA). Heatmaps depict the comparative analysis of species and orders, with a refined focus on reproductive-related (H) molecular functions and (I) pathways. Pathways and functions known to be important to motility, energy and egg interactions are highlighted by blue, green and gold boxes, respectively.
Citation: Reproduction 169, 6; 10.1530/REP-25-0105
Further interrogation of these conserved proteins with the UniProt alignment tool (Clustal Omega (Sievers et al. 2011)) revealed that many proteins with known roles in sperm motility, mitochondrial function, capacitation and acrosome reaction have high levels of sequence similarity (Fig. 3B, C, D): heat shock protein family A member 2 (HSPA2; 98.9%), protein kinase cAMP-dependent type I regulatory subunit alpha (PRKAR1A; 97.4%) and voltage-dependent anion channel 3 (VDAC3; 96.8%). However, a notable divergence was observed for human VDAC3, which differs by ∼20% compared to the other eight species. Conversely, proteins critical for sperm motility, zona pellucida binding and penetration exhibited greater variability in sequence conservation (Fig. 3E, F, G): acrosin (ACR; 65.1%), calcium-binding tyrosine phosphorylation regulated (CABYR; 64.8%) and the zona pellucida-binding protein (ZPBP; 82.3%).
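To make the idea of a median cross-species alignment score concrete, the following is a minimal sketch under stated assumptions. It takes sequences that have already been aligned (as a tool such as Clustal Omega would produce), scores each species pair by percent identity over columns where both sequences have a residue, and reports the median of the pairwise values. Percent identity is a simplification of the "% Similarity" reported in the figure, the toy sequences are hypothetical, and whether the published medians were computed over pairwise values in this way is an assumption.

```python
from itertools import combinations
from statistics import median

# Hypothetical, pre-aligned toy sequences ("-" marks an alignment gap);
# not real protein sequences.
aligned = {
    "human": "MKT-AYIAK",
    "mouse": "MKTQAYIAK",
    "bull": "MKT-AYLAK",
}


def percent_identity(a, b):
    """Identical residues as a percentage of columns where both sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Score every unordered pair of species, then summarize with the median.
pairwise = {
    (s1, s2): percent_identity(aligned[s1], aligned[s2])
    for s1, s2 in combinations(aligned, 2)
}
median_identity = median(pairwise.values())
print(pairwise, median_identity)
```

A median is less sensitive than a mean to a single divergent species, which is why one outlying sequence (such as the human VDAC3 noted above) can coexist with a high overall score.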
Both species and order lists were subjected to analysis with IPA, focussing on molecular functions and pathways. Strong consistency between both groups was observed, with the significant enrichment (P-value ≤0.05) of key reproductive processes, including synthesis of ATP, movement of cilia, acrosome reaction, binding of sperm and zona pellucida and fertilization (Fig. 3H, Table S16). Pathway analysis of the core sperm proteomes displayed significant enrichment of pathways involved in proteostasis, metabolism and oxidative stress (Fig. 3I, Table S16). The most significantly enriched pathways were Bcl2-associated athanogene 2 (BAG2) and ubiquitin-like protein FAT10 signalling. Complementary STRING analysis identified distinct protein interaction networks (Fig. S2), with readily detectable clusters of proteins associated with chaperone functions, the proteasome, ribosome function, metabolism, sperm morphogenesis and zona pellucida binding. Further DAVID analysis of the core sperm proteome revealed significant enrichment of similar annotation clusters, including chaperone functions, the proteasome, ribosome function, glycolysis, the TCA cycle and flagella. Additional enriched annotation clusters included secretory granules, mitochondrial function and ATP binding. These findings collectively underscore the critical roles of these conserved proteins in ensuring sperm functionality across diverse species.
Knockout mouse models confirm conserved proteins affect sperm fertilization competency
To investigate the functional relevance of these 135 conserved sperm proteins, we leveraged resources from the International Mouse Phenotyping Consortium (IMPC) database (Dickinson et al. 2016; Groza et al. 2022) and the European Mouse Mutant Archive (EMMA) (Hagn et al. 2007) to obtain knockout (KO) models of these protein-coding genes of interest. This effort yielded five candidates: Aldh7a1 (aldehyde dehydrogenase 7 family member A1), Echs1 (enoyl-CoA hydratase, short chain 1), Etfb (electron transfer flavoprotein subunit beta), Ndufa10 (NADH:ubiquinone oxidoreductase subunit A10) and Pebp4 (phosphatidylethanolamine-binding protein 4). We first established the evolutionary conservation of these proteins across species (Fig. 4A), yielding relatively high conservation between species: ALDH7A1 (median 86.8%), ECHS1 (83.4%), ETFB (89.3%), NDUFA10 (77.5%) and PEBP4 (52.3%). Notably, species-specific variations were observed for ALDH7A1 (rabbit) and PEBP4 (crocodile).
Core sperm protein mouse knockouts affect sperm motility and fertilization capacity. (A) UniProt alignment tool (Clustal Omega) was used to align protein sequences for phosphatidylethanolamine-binding protein 4 (PEBP4), enoyl-CoA hydratase, short chain 1 (ECHS1), electron transfer flavoprotein subunit beta (ETFB), NADH:ubiquinone oxidoreductase subunit A10 (NDUFA10) and aldehyde dehydrogenase 7 family member A1 (ALDH7A1). Gene knockout (KO) models were generated and sperm from heterozygous males were used for IVF. From these IVF experiments, the heatmaps depict the percentage of (B) motile sperm and those with progressive motility for each KO compared to wildtype (WT). Fertilization capacity was tracked and heatmaps depict the (C) two-cell stage cleavage rate (%CR), blastocyst rate (%BR) and pregnancy rate (%PR). (D) Representative H&E images of the testis and epididymis from WT, Pebp4 −/− and Ndufa10 +/− 16-week-old mice, where multinucleated giant cells (MGCs) are indicated by arrowheads. Scale bars = 250 μm and 50 μm for inset images. Quantification of the number of MGCs/mm² is represented by a bar chart as the mean ± SEM, with individual datapoints plotted; ***P < 0.0001.
Through EMMA, we obtained unpublished in vitro fertilization (IVF) data pertaining to heterozygous and homozygous KO mice (Fig. 4). Notably, Echs1, Etfb and Ndufa10 are homozygous lethal, and as such, data presented for these genes are from heterozygous males. Total sperm motility analysis of male KO mice showed marked reductions for Aldh7a1 (27.7% decrease) and Etfb (21.7%), compared to wildtype (Fig. 4B). Forward progressive motility was further impaired, with proportional reductions of 36.5% (Etfb), 32.7% (Aldh7a1) and 25% (Ndufa10) (Fig. 4B). Milder decreases were observed for Pebp4 (13.5% loss) and Echs1 (9.6% loss). Next, using sperm from these KO males, we performed IVF to evaluate their fertilization potential. The first point of examination was the cleavage rate to the two-cell stage (Fig. 4C), which mirrored motility trends with marked reductions: Aldh7a1 (29%), Etfb (31.2%) and Ndufa10 (32.3%). Rates for Pebp4 and Echs1 remained within expected values. Blastocyst formation rates revealed severe effects, particularly for KOs of Ndufa10 and Etfb with success rates of 10 and 29%, respectively (Fig. 4C). An almost halving of success was observed for Pebp4 (53%), whilst Aldh7a1 (15%) and Echs1 (25%) were the least impacted. Pregnancy rates were low for all KOs compared to the expected rate of WTs, with Etfb being the most greatly affected (33%), followed by Echs1 (56%) and the remaining KOs ranging from 61 to 68%. Where pregnancy was achieved, litter sizes were within the range expected of age-matched wildtype controls (Fig. S3A), with no significant shifts in foetal sex distribution (Fig. S3B).
In complement to the IVF studies, the histology of the male reproductive tract was examined to provide a detailed understanding of its structure and function at the microscopic level, and to explore any potential contributing effects of these pivotal tissues. For each KO mouse line, haematoxylin and eosin (H&E) staining was carried out on sections of the testis and epididymis (spermatogenesis (Hermo et al. 2010a, b) and sperm maturation (Nixon et al. 2020)), and the prostate gland and seminal vesicles (major contributors to the seminal plasma (Robert & Gagnon 1994, Schjenken et al. 2018)) at 16 weeks of age (Figs 4D and S4). The Pebp4 KO displayed the most pronounced abnormalities, including a significant presence of multinucleated giant cells (MGCs; symplasts), associated with mild multifocal seminiferous tubular degeneration (Creasy et al. 2012), and the presence of sloughed germ cells in the epididymal caudal tubules, a good indicator of spermatogenic disruption in the testis (De Grava Kempinas & Klinefelter 2014) (Figs 4D and S4). A similar phenotype of a higher number of MGCs was found in Ndufa10 KOs (Figs 4D and S4), although this did not achieve statistical significance (P-value = 0.054), and was accompanied by mild focal vacuolation. Across the five KOs, no histopathological changes were observed for the prostate or seminal vesicles (Fig. S4), with the exception of Ndufa10, in which hyperplasia of the anterior prostate (or coagulating gland) was noted (Fig. S4).
Conserved proteins loss of function variants linked to human sperm defects
Seeking to further support the clinical relevance of these core sperm proteins, we interrogated nearly 2,900 exomes and genomes of men with quantitative and/or qualitative sperm defects from the MERGE cohort (Stallmeyer et al. 2024). This analysis returned two men with homo- or hemizygous loss of function (LoF) variants. Subject M2218 is homozygous for a stop-gain variant in parkin coregulated: PACRG; NM_152410.2:c.369T>A p.(Tyr123Ter). He repeatedly had normal sperm counts, but 99–100% sperm head defects and significantly impaired motility. Subject M3692 is homozygous for a frameshift variant in dynein axonemal light intermediate chain 1: DNALI1; NM_003462.5:c.490dup p.(Tyr164LeufsTer20). He repeatedly had normal sperm counts, but almost all sperm were immotile.
Characterization of sperm proteins solely identified in different species
The total number of protein IDs for each species was strongly correlated to the number of ‘unique’ (detectable) proteins in that species’ proteome (rs = 0.88, P = 0.002). While a considerable proportion of proteins in the human (66.3%) and mouse (30.1%) sperm proteomes were only identified in these species, all other species (with much more poorly characterized proteomes) contained <6% proteins not identified in any other species (Table 2 and S17).
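The rank correlation reported above can be sketched as follows. This is a minimal, standard-library illustration of Spearman's coefficient (the Pearson correlation of the ranks, with tied values sharing their mean rank); it does not compute the P-value, and the per-species counts used here are hypothetical placeholders rather than the values in Table 2.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical counts (total IDs, species-only IDs); not the study's values.
total_ids = [9000, 6500, 3000, 2500, 1500, 600]
unique_ids = [5000, 2000, 150, 160, 40, 20]
rs = spearman(total_ids, unique_ids)
print(round(rs, 3))
```

Because the coefficient depends only on ranks, it captures the monotonic trend (deeper proteomes yield more species-only identifications) without assuming that the relationship is linear.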
The DAVID analysis of unique-to-species proteins highlighted significantly enriched annotation clusters in some species (Table S17). In mice, a range of enriched clusters were observed, including those involved in transcription (RNA splicing (enrichment score (ES) 18.8), small non-coding RNA processing (ES 5.2), RNA helicases (ES 4.2)) and translation (ribosome/mitochondrial translation (ES 8.8)). In both cattle and the koala, there was enrichment for secreted proteins (ES 4.3 and 3.5 respectively); however, the proteins within these clusters were species-specific. Secreted proteins only identified in the cattle sperm proteome included beta defensins (DEFB108B and DEFB116), cytokines (IL12B, IL34, CCL2, and CTF1), proteins with antimicrobial activity (VIP and ADM) and RNase 1. In contrast, the enriched cluster of secreted proteins only identified in the koala sperm proteome largely contained pro-hormones (AVP, RLN2, PTH, INSL5 and POMC). Proteins uniquely identified in dolphin sperm showed weak but significant enrichment of functions associated with cell and gonadal differentiation (ES 2.6), including AMH, TSPY2 and TSPY8. There were no significantly enriched annotation clusters in unique-to-species proteins in the boar, rabbit, sheep or crocodile sperm proteomes.
As described above, depending on the species, 0.9–4.1% of protein identifications were not successfully converted to humanized IDs for further analysis. While many of these were poorly characterized proteins that may indeed have human equivalents, it is likely that some of these represent additional unique-to-species proteins. Examples of such proteins that are thought to be specific to one or several closely related species (albeit most with homologues in more diverse species) include mouse seminal vesicle proteins (SVS3A, SVS3B, SVS4, SVS5 and SVS6) (Karn et al. 2008), boar carbohydrate-binding protein AQN-1 (Kraus et al. 2005), bovine spermadhesin Z13 (Haase et al. 2005) and rabbit semen coagulum protein (SVP200) (Lundwall et al. 2020).
Reproductive strategies reflected in sperm proteomes
In seeking to investigate the influence of evolutionary and imposed reproductive strategies on sperm protein composition, we further interrogated each species’ proteome by focussing on sperm metabolism preference (glycolysis preference vs no preference), location of the testes (internal vs external) and history of selective breeding (yes vs no). Species were stratified into their appropriate groups and Venn diagram analyses were conducted to determine the proteins unique to each biological context. First, focussing on sperm metabolism, a comparison of species with a preference for glycolytic sperm metabolism against those with no preference between glycolysis and oxidative phosphorylation resulted in a 49.7% shared overlap of proteins (1,857) (Fig. 5A). The larger component of this comparison was contributed by those species with no sperm metabolism preference (cattle, rabbit and sheep), with 1,385 unique proteins. Only species that preferentially use glycolysis showed enrichment for a variety of metabolically relevant pathways, including degradation of amino acids (valine, isoleucine, tryptophan, leucine, cysteine, phenylalanine, alanine and glutamine) and N-acetylglucosamine (Table S18).
Reproductive strategies analyses. Sperm proteomes were stratified into three analyses, investigating the influence of evolutionary and imposed reproductive strategies on sperm protein composition, focussing on (A) sperm metabolism preference (glycolysis preference vs no preference), (B) location of the testes (internal vs external) and (C) history of selective breeding (yes vs no). Each analysis included Venn diagrams to determine unique proteins to each biological context, which were further subject to analysis using the ingenuity pathway analysis (IPA). Heatmaps depict the comparative analysis of the resultant molecular functions.
A subsequent comparison of sperm proteomes based on species with internal vs external testes revealed that 41.5% of proteins (1,825) were shared, with only 13.1% of proteins (574) unique to species with internal testes (crocodile and dolphin) (Fig. 5B). Species with external testes showed significant enrichment for signalling involving a range of interleukins (e.g. IL-3, -13 and -17), reactive oxygen species production and NRF2-mediated oxidative stress response pathways, as well as clathrin-mediated endocytosis signalling, adherens junction signalling, HIF1α signalling, acute phase response signalling and integrin signalling. By comparison, species with internal testes showed significant enrichment for pathways involved in glycine and cholesterol synthesis (Table S18).
Finally, species were compared based on whether they have a history of selective breeding, a strategy that revealed that 42.9% of proteins (1,853) were shared regardless of breeding practices (Fig. 5C). However, species under selective breeding pressures (cattle, boar, rabbit and sheep) yielded the greater number of unique proteins, amounting to a total of 1,853 proteins (42.1%). Conversely, sperm from those species without selective pressures were associated with the unique expression of 658 proteins (8.2%). Selectively bred species showed higher enrichment for endothelin-1, EIF2 and CDK5 signalling. In comparison, species with no history of selective breeding showed higher enrichment for adherens junction signalling (Table S18). Unique and overlapping protein IDs for each of the three biological context comparisons are supplied in Table S18.
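The stratified Venn comparisons described in this section can be sketched in a few lines. The example below assumes that identifications are pooled across all species within a group before the two groups are compared, and that percentages are expressed relative to the union of both groups; both points are assumptions about the workflow rather than details confirmed in the text. The protein identifiers are hypothetical, and `venn_two_groups` is an illustrative helper name.

```python
# Hypothetical toy identifications per species; not the study's data.
species_proteins = {
    "cattle": {"P1", "P2", "P3"},
    "sheep": {"P1", "P4"},
    "crocodile": {"P1", "P2", "P5"},
    "dolphin": {"P1", "P6"},
}

# One of the three contexts described above: location of the testes.
testis_location = {
    "cattle": "external",
    "sheep": "external",
    "crocodile": "internal",
    "dolphin": "internal",
}


def venn_two_groups(proteins, group_of, group_a, group_b):
    """Pool IDs within each group, then split into shared and group-only sets."""
    pooled = {group_a: set(), group_b: set()}
    for species, ids in proteins.items():
        pooled[group_of[species]] |= ids
    a, b = pooled[group_a], pooled[group_b]
    total = len(a | b)  # assumed denominator: all proteins seen in either group
    regions = {"shared": a & b, group_a: a - b, group_b: b - a}
    percentages = {name: 100.0 * len(ids) / total for name, ids in regions.items()}
    return regions, percentages


regions, percentages = venn_two_groups(
    species_proteins, testis_location, "external", "internal"
)
print(regions, percentages)
```

The same helper could be reused for the metabolism and selective breeding contexts by swapping in a different species-to-group mapping; the group-only sets are what would then be passed on to pathway enrichment.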
Discussion
From a total of 29 datasets across 12 vertebrate species (>2 TB of RAW data), we have generated the most comprehensive cross-species analysis of the mature sperm proteome to date, identifying a grand total of 13,853 unique proteins residing in the sperm of those species studied herein. Within this dataset, we discerned a core sperm proteome of 45 proteins conserved at the species level and 135 proteins conserved at the order level, underscoring a fundamental molecular framework essential for generating fertilization-competent spermatozoa. Enrichment analyses linked these core proteins to critical pathways in proteostasis and sperm metabolism, while knockout mouse models of selected conserved proteins confirmed their influential roles in governing sperm motility, fertilization success and overall reproductive health. Together, our findings demonstrate that despite the remarkable diversity in reproductive strategies observed across vertebrates, including differences in sperm metabolic preferences, testicular location and histories of selective breeding, a fundamental set of molecular components remains steadfastly conserved. This universal baseline provides a platform upon which species-specific adaptations are layered, shaping the unique sperm proteomes that ultimately define each organism’s reproductive biology. Complementing the full breadth of analyses discussed here, we have deployed a Shiny application, ShinySpermKingdom (https://reproproteomics.shinyapps.io/ShinySpermKingdom/), to support the accessibility and interpretability of these datasets beyond static documents, allowing for effective data-driven insights by researchers in the field (Skerrett-Byrne et al. 2024).
The use of proteomic technologies in reproductive biology has increased more than 20-fold in 20 years, parallelling advancements in mass spectrometry (MS) and bioinformatic pipelines (Skerrett-Byrne et al. 2024). As a field, proteomics is reaching an intriguing point, as these RAW datasets can be viewed as ‘living datasets’, from which we can continuously yield new insights as bioinformatic tools, database annotations and computational capacities advance (Drew et al. 2017, Dai et al. 2024). Whilst there are several variables affecting their ‘mine-ability’, early analyses often only scratch the surface of what these complex spectra contain; proteins may remain unidentified simply because the necessary reference sequences or annotation frameworks were inadequate at the time (Willems et al. 2020). Given these advancements, the continuous reanalysis of MS datasets is not merely a luxury, but a crucial endeavour to fully leverage these rich repositories of biological information, ensuring that the scientific community continues to extract meaningful knowledge from the collective body of proteomic data. In complement to this, large-scale proteomic atlases, such as the human and mouse draft proteomes (Wang et al. 2019; Giansanti et al. 2022), provide a systematic framework for characterisation that could be adapted to non-model species, ultimately capturing the full diversity of sperm proteomes across taxa.
Of the >2,000 sperm proteome studies found on PubMed (Skerrett-Byrne et al. 2024), a mere 218 studies have data deposited in a publicly available repository, a poor return of ∼10%. To our knowledge, no previous study has reanalysed the RAW MS data of earlier studies focused on mature sperm cells; here, we have demonstrated the capacity to assemble proteomic profiles for 12 taxa, achieving an unprecedented depth of coverage and unveiling numerous unique and conserved proteins. Comparing the proteomic depths of the studies included to generate these new sperm proteomes, we demonstrate improved depths in rabbit (1,360 (Juárez et al. 2020) to 1,605; 18% increase) and crocodile (1,119 (Nixon et al. 2019) to 2,855; 155% increase). Despite these strides, the proteomic annotation of non-model species still lags behind that of humans and established model organisms, highlighted best by the discrepancy between the human and koala sperm proteomes, with the latter being only 6.3% the size of the former. Consequently, the wealth of newly identified proteins reported here, while indicative of greater analytical depth, is still constrained by the uneven quality of reference data, highlighting the need for improved annotation of non-model species in the NCBI and UniProt databases, and the pursuit of new analytical tools (Heck & Neely 2020, Van den Broeck et al. 2023).
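The relative gains in proteomic depth quoted above follow from simple arithmetic, which the short sketch below reproduces from the counts given in the text. The helper name `percent_increase` is illustrative, and the deposition rate uses 2,000 as the denominator, which is only the lower bound of the ">2,000" studies cited, so the computed value is an upper estimate.

```python
def percent_increase(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


rabbit = percent_increase(1360, 1605)     # counts quoted in the text
crocodile = percent_increase(1119, 2855)  # counts quoted in the text

# Share of studies with deposited data; 2,000 is the lower bound of ">2,000",
# so the true share is at most this value.
deposited = 100.0 * 218 / 2000

print(round(rabbit), round(crocodile), round(deposited, 1))
```

Rounded to whole numbers, these reproduce the 18% and 155% increases reported for rabbit and crocodile, and a deposition rate of roughly 10%.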
Interestingly, even beyond direct protein-to-protein comparisons, our functional analyses highlight complex patterns of convergence and divergence that transcend traditional phylogenetic boundaries. For instance, despite their substantial evolutionary distance, koalas and sheep emerge as functionally aligned in terms of pathways underpinning their sperm proteomes, underscoring the powerful influence of reproductive strategies and physiological demands on proteome composition. Conversely, we observed species-specific pathway enrichment, with crocodiles displaying the greatest enrichment of fatty acid β-oxidation, an important source of long-term sustained energy as sperm ascend several metres of the female reproductive tract (Gist et al. 2008, Nixon et al. 2019). These cross-species comparisons highlight both the remarkable flexibility and deep conservation of sperm biology.
The identification of a core sperm proteome comprising 45 proteins at the species level and 135 at the order level underscores the presence of a conserved molecular foundation essential for sperm function across evolutionary lineages. Notably, given the size of the smallest proteome, the koala (556 proteins), these conserved proteins represent a 24.3% level of conservation at the order level, highlighting the functional importance of these proteins despite proteome size variation. Focussing on the sequence conservation of these core sperm proteins, we observed a median of 94.4% conservation at the amino acid level. Proteins such as heat shock protein A2 (HSPA2) and voltage-dependent anion channel 3 (VDAC3) maintain exceptionally high sequence conservation, reflecting their critical and broadly indispensable roles in processes such as protein folding, sperm maturation and capacitation (Sampson et al. 2001, Arcelay et al. 2008, Redgrove et al. 2012, Nixon et al. 2017, Li et al. 2023). By contrast, proteins critical for zona pellucida binding and penetration, such as acrosin (ACR) (Dudkiewicz 1983, Liu & Gordon Baker 1993, Hua et al. 2023), ZPBP (Lin et al. 2007, Dun et al. 2010) and calcium-binding tyrosine-phosphorylation regulated protein (CABYR) (Naaby-Hansen et al. 2002, Skerrett-Byrne et al. 2022), exhibit greater sequence variability, perhaps signifying adaptations to specific reproductive environments or selective pressures. Notably, expected proteins such as histones and protamines were not highly conserved across all species, with about 16 histone proteins retained across at least 4/9 species, and testis-specific Histone H2B type 1-A (H2BC1) (Montellier et al. 2013) detected in 6/9 species. This once more points to the need for greater annotation of non-model species.
This is supported by previous work which sought to compare the sperm proteome within three closely related Mus species that experience different levels of sexual selection (Vicens et al. 2017). Whilst not a defined difference in protein sequence, this work highlighted significant interspecific divergence in the abundance of proteins which govern sperm–egg interactions, including ACR and ZPBPs, and conversely no significant shifts in HSPA2 or protein kinase cAMP-dependent type I regulatory subunit alpha (PRKAR1A). Seeking to understand how these core sperm proteins may interact, in-silico enrichment analyses grouped them into known networks integral to sperm motility (flagellar motility), energy source of the sperm cell (oxidative phosphorylation (OXPHOS)) and fertilization, encompassing zona pellucida binding and acrosome reaction. A key study from 2016, included in our work, carried out proteomics on sperm collected from 19 placental mammalian species and identified a core sperm proteome of 623 proteins (Bayram et al. 2016). Importantly, this study leverages two advantages: i) a comparatively narrower range of species, ungulates and rodents; and ii) uniform proteomics sample preparation, MS analysis and data processing. However, there is considerable overlap with our study with regard to the significantly enriched pathways and functions which connect the core sperm proteome, namely metabolic processes such as OXPHOS, glycolysis and the tricarboxylic acid (TCA) cycle, acrosome assembly, zona pellucida binding and proteasome function (Bayram et al. 2016). Notably, the emergence of the highly enriched proteostasis-associated pathways, involving proteins such as BAG2 and FAT10, provides intriguing hints of previously underappreciated quality control processes that may safeguard sperm function by ensuring proper protein folding, timely degradation, regulation of capacitation and acrosomal reaction, thereby facilitating successful fertilization (Cafe et al. 2021, Smyth et al. 2024). For instance, the enrichment of BAG2 in the conserved sperm proteome, known to modulate HSPA2 in spermatogenesis (Yin et al. 2020), raises the potential of a post-testicular role in maintaining HSPA2 functionality in facilitating sperm–egg adhesion and binding (Nixon et al. 2015, Smyth et al. 2024).
Seeking to build upon this important database and demonstrate how these analyses can help build new knowledge on sperm biology, we were granted special access to the European Mouse Mutant Archive (EMMA) (Hagn et al. 2007) to obtain sperm and IVF data from knockout (KO) mouse models targeting selected proteins from the core proteome. We elected to focus on proteins which had not previously been implicated in sperm morphology or functional maturation: the metabolic enzyme ALDH7A1 (aldehyde dehydrogenase 7 family member A1) (Korasick & Tanner 2021); the mitochondrial enzymes ECHS1 (enoyl-CoA hydratase, short chain 1) (Burgin & McKenzie 2020) and ETFB (electron transfer flavoprotein subunit beta) (Henriques et al. 2021); the mitochondrial complex I subunit NDUFA10 (NADH:ubiquinone oxidoreductase subunit A10) (Formosa et al. 2018); and a member of the phosphatidylethanolamine-binding proteins, PEBP4, which has previously been detected in bull sperm and semen (An et al. 2012, Somashekar et al. 2017) but has no demonstrated role in sperm function. Here, for the first time, KOs of these five core genes have been shown to directly influence sperm quality, motility and fertilization potential. Intriguingly, the largest decreases observed in total sperm motility and progressive motility were associated with the four proteins located in the mitochondrial matrix, each of which plays a key role in mitochondrial function. Given the pivotal role mitochondria play in the energy required to drive sperm motility (Piomboni et al. 2012), it stands to reason that these proteins may have regulatory roles in ensuring efficient OXPHOS/ATP production in sperm mitochondria. Further to the KO mouse work, we sought to investigate the clinical relevance of these proteins in the context of human infertility.
Utilising the MERGE cohort, we queried our 135 conserved sperm proteins against the almost 2,900 males with quantitative and/or qualitative sperm defects, identifying two men with homozygous loss-of-function variants in PACRG and DNALI1. Notably, a recent mouse KO study demonstrated that PACRG interacts within a manchette-associated complex, and is essential for proper sperm assembly (Yap et al. 2023). Moreover, KO mice exhibited a significant reduction in sperm count, with the remaining sperm characterized by abnormally shaped heads and bent tails. DNALI1 is a component of the inner dynein arms and in men affected by biallelic pathogenic variants, immotile sperm exhibiting an asymmetric fibrous sheath of the flagella have been described (Sha et al. 2022). Together, these experimental models lend strong empirical support to the notion that the core sperm proteome is not merely a historical residue of evolutionary conservation, but rather a functional blueprint vital for the generation and maintenance of fertilization-competent sperm across species.
Beyond their universally conserved core, sperm proteomes also reflect diverse biological contexts that shape reproductive strategies and outcomes. For instance, contrasting testicular environments, as represented by species with either external or internal testes, impart distinct proteomic signatures on mature sperm. Species with external testes exhibit enrichments in signalling pathways associated with interleukin-mediated communication, reactive oxygen species management and NRF2-mediated oxidative stress responses, and pathways related to cell–cell interaction and endocytic processes. These findings suggest that externalized gonads, potentially evolving under selective pressures linked to temperature regulation or hypoxic conditions, have prompted the refinement of sperm’s molecular toolkit to bolster resilience and maintain fertilisation capacity. Conversely, species with internal testes show a bias towards pathways involving glycine and cholesterol synthesis, hinting at metabolic adjustments that support sperm function in a more thermally stable yet potentially resource-limited environment.
Selective historical breeding regimes and unique metabolic preferences leave distinguishable marks on the sperm proteome. For instance, microRNA biogenesis and endothelin-1 signalling are enriched in domesticated species, reflecting human-driven selection for traits impacting sperm maturation and early embryonic development through epigenetic and signalling mechanisms (Sharma et al. 2018, Trigg et al. 2021, Conine & Rando 2022, Spadafora 2023, Tomar et al. 2024). In contrast, non-selected species’ enrichment for adherens junction signalling points to a more basal state of sperm cell–cell interaction and membrane integrity (Lui et al. 2003, Wen et al. 2016). In addition, the preferential use of glycolysis as an energy source (du Plessis et al. 2015) in certain taxa is mirrored by an increased representation of amino acid degradation and N-acetylglucosamine metabolism pathways, underscoring how ecological pressures shape sperm energetics. In essence, the sperm proteome reflects not only a shared, evolutionarily ancient blueprint but also the distinct biological contexts that steer reproductive strategies and outcomes in varied environmental and evolutionary landscapes.
Despite the breadth and depth of the current dataset, several limitations constrain the interpretation and generalisability of our findings. Foremost, the use of publicly available proteomic data, generated over a span of years and employing diverse sample preparation protocols, mass spectrometry platforms and analytical pipelines, precludes direct, quantitative ‘apples to apples’ comparisons between species. Despite the advancements discussed, many datasets remain hampered by incomplete metadata and limited raw data availability, which hinder reproducibility and the clarity of species-to-species comparisons. In addition, our reliance on human orthologues for pathway analyses inevitably restricts our ability to fully explore lineage-specific adaptations or signatures of selective pressure acting on particular sperm proteins. Similarly, we have focused exclusively on chordate species, thus excluding the vast diversity of reproductive strategies found in invertebrates and other non-chordate taxa. Moreover, the marked disparity in proteome sizes across species highlights that coverage bias remains a significant concern; regardless, our current dataset offers a glimpse of the full sperm proteome complexity. Recognizing these acquisition biases paves the way for new funding proposals aimed at refining reference annotations and expanding high-quality MS coverage across diverse species. By systematically addressing these limitations, the field can move closer to an accurate and comprehensive understanding of sperm proteomes at a global scale. Until then, our findings, while important and innovative, should be viewed as a framework upon which the global sperm proteome landscape can be more thoroughly mapped and understood.
Future investigations will benefit from a more controlled and integrated experimental framework, enabling true comparisons across species. Standardising sample collection protocols, employing uniform mass spectrometry methodologies and harmonising data processing pipelines will help overcome current disparities in data quality and annotation. As proteome databases are further refined, and as more non-model species achieve higher-quality genome assemblies, future analyses will be better positioned to distinguish genuine species-specific proteins from those simply missing in incomplete datasets. In addition, exploring the role of post-translational modifications (PTMs), such as phosphorylation, acetylation and glycosylation, across multiple taxa will be critical for understanding how subtle regulatory mechanisms modulate sperm functionality. Indeed, since sperm protein composition is largely defined during epididymal maturation (Skerrett-Byrne et al. 2022), incorporating temporal and spatial sampling strategies alongside PTM profiling may reveal how species adapt their sperm at the molecular level to an array of environmental pressures.
Ultimately, comprehensive multispecies studies employing rigorous and consistent workflows have the potential to create living, expandable proteomic databases that will evolve as analytical capabilities and reference annotations improve. By repeatedly revisiting and reanalysing mass spectrometry datasets, the field can unlock layers of complexity that have, until now, remained concealed. Such iterative proteomic approaches, informed by emerging knowledge of sperm physiology and integrated with genomic, transcriptomic and epigenomic data, will open new avenues for understanding the evolutionary and functional contexts of sperm proteomes. Here, we demonstrate the power of systematically revisiting and reanalysing proteomic data, identifying a set of 135 conserved proteins critical to sperm function, including several that have not previously been implicated in sperm biology. This study underscores the importance of continuous proteomic refinement, allowing for the identification of previously overlooked but evolutionarily and functionally essential proteins. By embracing this approach, future research can not only refine our grasp of the universal underpinnings of sperm biology but also reveal how species-specific adaptations arise and shape reproductive success across diverse taxa.
Supplementary materials
This is linked to the online version of the paper at https://doi.org/10.1530/REP-25-0105.
Declaration of interest
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the work reported.
Funding
This research was supported by the National Health and Medical Research Council of Australia (NHMRC) Emerging Leadership Fellowship (APP2034392) and the College of Engineering, Science and Environment (University of Newcastle) Accelerator Fellowship, both awarded to D A S B. In addition, F T was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) Clinical Research Unit ‘Male Germ Cells’ (CRU326, project number 329621271). F T and S K were supported by the German Federal Ministry for Education and Research (BMBF) as part of the Junior Scientist Research Centre ‘ReproTrack MS’ (grant 01GR2303).
Author contribution statement
Conceptualisation was done by T P, B N and D A S B. Methodology was provided by T P and D A S B. D A S B helped with software. Investigation was performed by T P, B N, T K, R T, A S M, P S B, F T, T S, S K, V G D, H F, S M, M H A and D A S B. Formal analysis was done by D A S B. Validation was done by R T, A S M, P S B, V G D, H F, S M, M H A and D A S B. Visualisation was performed by T P and D A S B. T P and D A S B helped with the writing of the original draft. Writing of the review and editing was done by B N, T K, R T, A S M, P S B, F T, T S, S K, V G D, H F, S M and M H A. Funding acquisition was performed by B N and D A S B. Resources were maintained by B N, R T, F T, T S, S K, V G D, H F, S M, M H A and D A S B. Supervision was conducted by B N and D A S B.
Acknowledgements
We thank the Academic and Research Computing Support team, The University of Newcastle, who provided High Performance Computing Infrastructure to support the bioinformatics analyses. We also thank David MacKenzie for his graphic design input and support. We thank Steffie Dunst and Bernhard Rey for technical support with the sperm and IVF culture experiments. We thank the technicians and animal caretakers of the German Mouse Clinic. We thank Dr Wei Zhou for his insightful questions at the Society for Reproductive Biology Annual Scientific Meeting.
References
An L-P , Maeda T , Sakaue T , et al. 2012 Purification, molecular cloning and functional characterization of swine phosphatidylethanolamine-binding protein 4 from seminal plasma. Biochem Biophys Res Commun 423 690–696. (https://doi.org/10.1016/j.bbrc.2012.06.016)
Arcelay E , Salicioni AM , Wertheimer E , et al. 2008 Identification of proteins undergoing tyrosine phosphorylation during mouse sperm capacitation. Int J Dev Biol 52 463–472. (https://doi.org/10.1387/ijdb.072555ea)
Bardou P , Mariette J , Escudié F , et al. 2014 Jvenn: an interactive Venn diagram viewer. BMC Bioinformatics 15 1–7. (https://doi.org/10.1186/1471-2105-15-293)
Batra V , Bhushan V , Ali SA , et al. 2021 Buffalo sperm surface proteome profiling reveals an intricate relationship between innate immunity and reproduction. BMC Genom 22 480. (https://doi.org/10.1186/s12864-021-07640-z)
Bayram HL , Claydon AJ , Brownridge PJ , et al. 2016 Cross-species proteomics in analysis of mammalian sperm proteins. J Proteomics 135 38–50. (https://doi.org/10.1016/j.jprot.2015.12.027)
Buchfink B , Reuter K & Drost H-G 2021 Sensitive protein alignments at tree-of-life scale using diamond. Nat Methods 18 366–368. (https://doi.org/10.1038/s41592-021-01101-x)
Burgin HJ & McKenzie M 2020 Understanding the role of OXPHOS dysfunction in the pathogenesis of ECHS1 deficiency. FEBS Lett 594 590–610. (https://doi.org/10.1002/1873-3468.13735)
Byrne K , Leahy T , McCulloch R , et al. 2012 Comprehensive mapping of the bull sperm surface proteome. Proteomics 12 3559–3579. (https://doi.org/10.1002/pmic.201200133)
Cafe SL , Nixon B , Ecroyd H , et al. 2021 Proteostasis in the male and female germline: a new outlook on the maintenance of reproductive health. Front Cell Dev Biol 9 660626. (https://doi.org/10.3389/fcell.2021.660626)
Casares-Crespo L , Fernández-Serrano P & Viudes-de-Castro MP 2019 Proteomic characterization of rabbit (Oryctolagus cuniculus) sperm from two different genotypes. Theriogenology 128 140–148. (https://doi.org/10.1016/j.theriogenology.2019.01.026)
Castaneda JM , Hua R , Miyata H , et al. 2017 TCTE1 is a conserved component of the dynein regulatory complex and is required for motility and metabolism in mouse spermatozoa. Proc Natl Acad Sci U S A 114 E5370–E5378. (https://doi.org/10.1073/pnas.1621279114)
Castillo J , Bogle OA , Jodar M , et al. 2019 Proteomic changes in human sperm during sequential in vitro capacitation and acrosome reaction. Front Cell Dev Biol 7 295. (https://doi.org/10.3389/fcell.2019.00295)
Chen T , Ma J , Liu Y , et al. 2022 iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res 50 D1522–D1527. (https://doi.org/10.1093/nar/gkab1081)
Choi M , Carver J , Chiva C , et al. 2020 MassIVE.quant: a community resource of quantitative mass spectrometry–based proteomics datasets. Nat Methods 17 981–984. (https://doi.org/10.1038/s41592-020-0955-0)
Comizzoli P & Holt WV 2019 Breakthroughs and new horizons in reproductive biology of rare and endangered animal species. Biol Reprod 101 514–525. (https://doi.org/10.1093/biolre/ioz031)
Conine CC & Rando OJ 2022 Soma-to-germline RNA communication. Nat Rev Genet 23 73–88. (https://doi.org/10.1038/s41576-021-00412-1)
Creasy D , Bube A , Rijk Ed , et al. 2012 Proliferative and nonproliferative lesions of the rat and mouse male reproductive system. Toxicol Pathol 40 40S–121S. (https://doi.org/10.1177/0192623312454337)
Dai C , Pfeuffer J , Wang H , et al. 2024 quantms: a cloud-based pipeline for quantitative proteomics enables the reanalysis of public proteomics data. Nat Methods 21 1603–1607. (https://doi.org/10.1038/s41592-024-02343-1)
De Grava Kempinas W & Klinefelter GR 2014 Interpreting histopathology in the epididymis. Spermatogenesis 4 e979114. (https://doi.org/10.4161/21565562.2014.979114)
Deutsch EW , Bandeira N , Perez-Riverol Y , et al. 2023 The ProteomeXchange consortium at 10 years: 2023 update. Nucleic Acids Res 51 D1539–D1548. (https://doi.org/10.1093/nar/gkac1040)
Dickinson ME , Flenniken AM , Ji X , et al. 2016 High-throughput discovery of novel developmental phenotypes. Nature 537 508–514. (https://doi.org/10.1038/nature19356)
Drew K , Lee C , Huizar RL , et al. 2017 Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol Syst Biol 13 932. (https://doi.org/10.15252/msb.20167490)
du Plessis SS , Agarwal A , Mohanty G , et al. 2015 Oxidative phosphorylation versus glycolysis: what fuel do spermatozoa use? Asian J Androl 17 230–235. (https://doi.org/10.4103/1008-682x.135123)
Dudkiewicz AB 1983 Inhibition of fertilization in the rabbit by anti-acrosin antibodies. Gamete Res 8 183–197. (https://doi.org/10.1002/mrd.1120080207)
Duffy JMN , Adamson GD , Benson E , et al. 2020 Top 10 priorities for future infertility research: an international consensus development study. Hum Reprod 35 2715–2724. (https://doi.org/10.1093/humrep/deaa242)
Dun MD , Mitchell LA , Aitken RJ , et al. 2010 Sperm–zona pellucida interaction: molecular mechanisms and the potential for contraceptive intervention. Fertil Control 198 139–178. (https://doi.org/10.1007/978-3-642-02062-9_9)
Fitzpatrick JL , Kahrl AF & Snook RR 2022 SpermTree, a species-level database of sperm morphology spanning the animal tree of life. Sci Data 9 30. (https://doi.org/10.1038/s41597-022-01131-w)
Formosa LE , Dibley MG , Stroud DA , et al. 2018 Building a complex complex: assembly of mitochondrial respiratory chain complex I. Semin Cell Dev Biol 76 154–162. (https://doi.org/10.1016/j.semcdb.2017.08.011)
Fu Q , Pan L , Huang D , et al. 2019 Proteomic profiles of buffalo spermatozoa and seminal plasma. Theriogenology 134 74–82. (https://doi.org/10.1016/j.theriogenology.2019.05.013)
Fuchs H , Aguilar-Pimentel JA , Amarie OV , et al. 2018 Understanding gene functions and disease mechanisms: phenotyping pipelines in the German Mouse Clinic. Behav Brain Res 352 187–196. (https://doi.org/10.1016/j.bbr.2017.09.048)
Fuentes-Albero MC , González-Brusi L , Cots P , et al. 2021 Protein identification of spermatozoa and seminal plasma in bottlenose dolphin (Tursiops truncatus). Front Cell Dev Biol 9 673961. (https://doi.org/10.3389/fcell.2021.673961)
Giansanti P , Samaras P , Bian Y , et al. 2022 Mass spectrometry-based draft of the mouse proteome. Nat Methods 19 803–811. (https://doi.org/10.1038/s41592-022-01526-y)
Gist DH , Bagwill A , Lance V , et al. 2008 Sperm storage in the oviduct of the American alligator. J Exp Zool A Ecol Genet Physiol 309 581–587. (https://doi.org/10.1002/jez.434)
Götz S , García-Gómez JM , Terol J , et al. 2008 High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 36 3420–3435. (https://doi.org/10.1093/nar/gkn176)
Groza T , Gomez FL , Mashhadi HH , et al. 2022 The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Res 51 D1038–D1045. (https://doi.org/10.1093/nar/gkac972)
Guyonnet B , Zabet-Moghaddam M , SanFrancisco S , et al. 2012 Isolation and proteomic characterization of the mouse sperm acrosomal matrix. Mol Cell Proteomics 11 758–774. (https://doi.org/10.1074/mcp.m112.020339)
Guyonnet B , Egge N & Cornwall GA 2014 Functional amyloids in the mouse sperm acrosome. Mol Cell Biol 34 2624–2634. (https://doi.org/10.1128/mcb.00073-14)
Haase B , Schlötterer C , Hundrieser ME , et al. 2005 Evolution of the spermadhesin gene family. Gene 352 20–29. (https://doi.org/10.1016/j.gene.2005.04.015)
Hagn M , Marschall S & Hrabè de Angelis M 2007 EMMA – the European mouse mutant archive. Brief Funct Genomics 6 186–192. (https://doi.org/10.1093/bfgp/elm018)
Heck M & Neely BA 2020 Proteomics in non-model organisms: a new analytical frontier. J Proteome Res 19 3595–3606. (https://doi.org/10.1021/acs.jproteome.0c00448)
Henriques BJ , Katrine Jentoft Olsen R , Gomes CM , et al. 2021 Electron transfer flavoprotein and its role in mitochondrial energy metabolism in health and disease. Gene 776 145407. (https://doi.org/10.1016/j.gene.2021.145407)
Hermo L , Pelletier RM , Cyr DG , et al. 2010a Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73 241–278. (https://doi.org/10.1002/jemt.20783)
Hermo L , Pelletier RM , Cyr DG , et al. 2010b Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 2: changes in spermatid organelles associated with development of spermatozoa. Microsc Res Tech 73 279–319. (https://doi.org/10.1002/jemt.20787)
Holt WV & Lloyd RE 2010 Sperm storage in the vertebrate female reproductive tract: how does it work so well? Theriogenology 73 713–722. (https://doi.org/10.1016/j.theriogenology.2009.07.002)
Hua R , Xue R , Liu Y , et al. 2023 ACROSIN deficiency causes total fertilization failure in humans by preventing the sperm from penetrating the zona pellucida. Hum Reprod 38 1213–1223. (https://doi.org/10.1093/humrep/dead059)
Huang DW , Sherman BT & Lempicki RA 2009 Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4 44–57. (https://doi.org/10.1038/nprot.2008.211)
Hulsen T 2022 DeepVenn – a web application for the creation of area-proportional Venn diagrams using the deep learning framework Tensorflow.js. arXiv preprint arXiv: 2210.04597.
Johnston SD , McGowan MR , Phillips NJ , et al. 2000 Optimal physicochemical conditions for the manipulation and short-term preservation of koala (Phascolarctos cinereus) spermatozoa. J Reprod Fertil 118 273–281. (https://doi.org/10.1530/jrf.0.1180273)
Johnston SD , Zee YP , López-Fernández C , et al. 2012 The effect of chilled storage and cryopreservation on the sperm DNA fragmentation dynamics of a captive population of koalas. J Androl 33 1007–1015. (https://doi.org/10.2164/jandrol.111.015248)
Juárez JD , Marco-Jiménez F , Talavan A , et al. 2020 Evaluation by re-derivation of a paternal line after 18 generations on seminal traits, proteome and fertility. Livest Sci 232 103894. (https://doi.org/10.1016/j.livsci.2019.103894)
Karn RC , Clark NL , Nguyen ED , et al. 2008 Adaptive evolution in rodent seminal vesicle secretion proteins. Mol Biol Evol 25 2301–2310. (https://doi.org/10.1093/molbev/msn182)
Kasvandik S , Sillaste G , Velthut-Meikas A , et al. 2015 Bovine sperm plasma membrane proteomics through biotinylation and subcellular enrichment. Proteomics 15 1906–1920. (https://doi.org/10.1002/pmic.201400297)
Korasick DA & Tanner JJ 2021 Impact of missense mutations in the ALDH7A1 gene on enzyme structure and catalytic function. Biochimie 183 49–54. (https://doi.org/10.1016/j.biochi.2020.09.016)
Krämer A , Green J , Pollard J Jr , et al. 2013 Causal analysis approaches in ingenuity pathway analysis. Bioinformatics 30 523–530. (https://doi.org/10.1093/bioinformatics/btt703)
Kraus M , Tichá M , Železná B , et al. 2005 Characterization of human seminal plasma proteins homologous to boar AQN spermadhesins. J Reprod Immunol 65 33–46. (https://doi.org/10.1016/j.jri.2004.10.001)
Labas V , Grasseau I , Cahier K , et al. 2015 Qualitative and quantitative peptidomic and proteomic approaches to phenotyping chicken semen. J Proteomics 112 313–335. (https://doi.org/10.1016/j.jprot.2014.07.024)
Leahy T , Rickard JP , Pini T , et al. 2020 Quantitative proteomic analysis of seminal plasma, sperm membrane proteins, and seminal extracellular vesicles suggests vesicular mechanisms aid in the removal and addition of proteins to the ram sperm membrane. Proteomics 20 e1900289. (https://doi.org/10.1002/pmic.201900289)
Letunic I & Bork P 2007 Interactive tree of life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23 127–128. (https://doi.org/10.1093/bioinformatics/btl529)
Letunic I & Bork P 2011 Interactive tree of life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res 39 W475–W478. (https://doi.org/10.1093/nar/gkr201)
Li C , Yu R , Liu H , et al. 2023 Sperm acrosomal released proteome reveals MDH and VDAC3 from mitochondria are involved in acrosome formation during spermatogenesis in Eriocheir sinensis. Gene 887 147784. (https://doi.org/10.1016/j.gene.2023.147784)
Lin YN , Roy A , Yan W , et al. 2007 Loss of zona pellucida binding proteins in the acrosomal matrix disrupts acrosome biogenesis and sperm morphogenesis. Mol Cell Biol 27 6794–6805. (https://doi.org/10.1128/mcb.01029-07)
Liu DY & Gordon Baker H 1993 Inhibition of acrosin activity with a trypsin inhibitor blocks human sperm penetration of the zona pellucida. Biol Reprod 48 340–348. (https://doi.org/10.1095/biolreprod48.2.340)
Liu F , Liu X , Liu X , et al. 2019 Integrated analyses of phenotype and quantitative proteome of CMTM4 deficient mice reveal its association with male fertility. Mol Cell Proteomics 18 1070–1084. (https://doi.org/10.1074/mcp.ra119.001416)
Lui WY , Lee WM & Cheng CY 2003 Sertoli-germ cell adherens junction dynamics in the testis are regulated by RhoB GTPase via the ROCK/LIMK signaling pathway. Biol Reprod 68 2189–2206. (https://doi.org/10.1095/biolreprod.102.011379)
Lundwall Å , Persson M , Hansson K , et al. 2020 Identification of the major rabbit and Guinea pig semen coagulum proteins and description of the diversity of the REST gene locus in the mammalian clade Glires. PLoS One 15 e0240607. (https://doi.org/10.1371/journal.pone.0240607)
Martin JH , Mohammed R , Delforce SJ , et al. 2022 Role of the prorenin receptor in endometrial cancer cell growth. Oncotarget 13 587–599. (https://doi.org/10.18632/oncotarget.28224)
Mohanty G , Swain N & Samanta L 2015 Sperm proteome: what is on the horizon? Reprod Sci 22 638–653. (https://doi.org/10.1177/1933719114558918)
Montellier E , Boussouar F , Rousseaux S , et al. 2013 Chromatin-to-nucleoprotamine transition is controlled by the histone H2B variant TH2B. Genes Dev 27 1680–1692. (https://doi.org/10.1101/gad.220095.113)
Murphy EM , Murphy C , O'Meara C , et al. 2017 A comparison of semen diluents on the in vitro and in vivo fertility of liquid bull semen. J Dairy Sci 100 1541–1554. (https://doi.org/10.3168/jds.2016-11646)
Murray HC , Enjeti AK , Kahl RGS , et al. 2021 Quantitative phosphoproteomics uncovers synergy between DNA-PK and FLT3 inhibitors in acute myeloid leukaemia. Leukemia 35 1782–1787. (https://doi.org/10.1038/s41375-020-01050-y)
Naaby-Hansen S , Mandal A , Wolkowicz MJ , et al. 2002 CABYR, a novel calcium-binding tyrosine phosphorylation-regulated fibrous sheath protein involved in capacitation. Dev Biol 242 236–254. (https://doi.org/10.1006/dbio.2001.0527)
Nixon B , Bromfield EG , Dun MD , et al. 2015 The role of the molecular chaperone heat shock protein A2 (HSPA2) in regulating human sperm-egg recognition. Asian J Androl 17 568–573. (https://doi.org/10.4103/1008-682x.151395)
Nixon B , Bromfield EG , Cui J , et al. 2017 Heat shock protein A2 (HSPA2): regulatory roles in germ cell development and sperm function. Adv Anat Embryol Cell Biol 222 67–93. (https://doi.org/10.1007/978-3-319-51409-3_4)
Nixon B , Johnston SD , Skerrett-Byrne DA , et al. 2019 Modification of crocodile spermatozoa refutes the tenet that post-testicular sperm maturation is restricted to mammals. Mol Cell Proteomics 18 S58–S76. (https://doi.org/10.1074/mcp.ra118.000904)
Nixon B , Cafe SL , Eamens AL , et al. 2020 Molecular insights into the divergence and diversity of post-testicular maturation strategies. Mol Cell Endocrinol 517 110955. (https://doi.org/10.1016/j.mce.2020.110955)
Oliva R , Martínez-Heredia J & Estanyol JM 2008 Proteomics in the study of the sperm cell composition, differentiation and function. Syst Biol Reprod Med 54 23–36. (https://doi.org/10.1080/19396360701879595)
Pérez-Patiño C , Parrilla I , Li J , et al. 2019 The proteome of pig spermatozoa is remodeled during ejaculation. Mol Cell Proteomics 18 41–50. (https://doi.org/10.1074/mcp.ra118.000840)
Perez-Riverol Y , Bai J , Bandla C , et al. 2022 The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 50 D543–D552. (https://doi.org/10.1093/nar/gkab1038)
Pini T , Parks J , Russ J , et al. 2020 Obesity significantly alters the human sperm proteome, with potential implications for fertility. J Assist Reprod Genet 37 777–787. (https://doi.org/10.1007/s10815-020-01707-8)
Piomboni P , Focarelli R , Stendardi A , et al. 2012 The role of mitochondria in energy production for human sperm motility. Int J Androl 35 109–124. (https://doi.org/10.1111/j.1365-2605.2011.01218.x)
Ramesha KP , Mol P , Kannegundla U , et al. 2020 Deep proteome profiling of semen of Indian indigenous malnad gidda (Bos indicus) cattle. J Proteome Res 19 3364–3376. (https://doi.org/10.1021/acs.jproteome.0c00237)
Redgrove KA , Nixon B , Baker MA , et al. 2012 The molecular chaperone HSPA2 plays a key role in regulating the expression of sperm surface receptors that mediate sperm-egg recognition. PLoS One 7 e50851. (https://doi.org/10.1371/journal.pone.0050851)
Robert M & Gagnon C 1994 Sperm motility inhibitor from human seminal plasma: presence of a precursor molecule in seminal vesicle fluid and its molecular processing after ejaculation. Int J Androl 17 232–240. (https://doi.org/10.1111/j.1365-2605.1994.tb01248.x)
Sampson MJ , Decker WK , Beaudet AL , et al. 2001 Immotile sperm and infertility in mice lacking mitochondrial voltage-dependent anion channel type 3. J Biol Chem 276 39206–39212. (https://doi.org/10.1074/jbc.m104724200)
Schiza C , Korbakis D , Panteleli E , et al. 2018 Discovery of a human testis-specific protein complex TEX101-DPEP3 and selection of its disrupting antibodies. Mol Cell Proteomics 17 2480–2495. (https://doi.org/10.1074/mcp.ra118.000749)
Schjenken JE , Sharkey DJ & Robertson SA 2018 Seminal vesicle – secretion. In: Encyclopaedia of Reproduction. Second Edition, pp. 349–354. (https://doi.org/10.1016/b978-0-12-801238-3.64600-7)
Sha Y , Liu W , Nie H , et al. 2022 Homozygous mutation in DNALI1 leads to asthenoteratozoospermia by affecting the inner dynein arms. Front Endocrinol 13 1058651. (https://doi.org/10.3389/fendo.2022.1058651)
Shannon P , Markiel A , Ozier O , et al. 2003 Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13 2498–2504. (https://doi.org/10.1101/gr.1239303)
Sharma U , Sun F , Conine CC , et al. 2018 Small RNAs are trafficked from the epididymis to developing mammalian sperm. Dev Cell 46 481–494.e6. (https://doi.org/10.1016/j.devcel.2018.06.023)
Shen D , Zhou C , Cao M , et al. 2021 Differential membrane protein profile in bovine X- and Y-sperm. J Proteome Res 20 3031–3042. (https://doi.org/10.1021/acs.jproteome.0c00358)
Sherman BT , Hao M , Qiu J , et al. 2022 DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res 50 W216–W221. (https://doi.org/10.1093/nar/gkac194)
Sievers F , Wilm A , Dineen D , et al. 2011 Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7 539. (https://doi.org/10.1038/msb.2011.75)
Skerrett-Byrne DA , Anderson AL , Hulse L , et al. 2021a Proteomic analysis of koala (Phascolarctos cinereus) spermatozoa and prostatic bodies. Proteomics 21 e2100067. (https://doi.org/10.1002/pmic.202100067)
Skerrett-Byrne DA , Bromfield EG , Murray HC , et al. 2021b Time-resolved proteomic profiling of cigarette smoke-induced experimental chronic obstructive pulmonary disease. Respirology 26 960–973. (https://doi.org/10.1111/resp.14111)
Skerrett-Byrne DA , Trigg NA , Bromfield EG , et al. 2021c Proteomic dissection of the impact of environmental exposures on mouse seminal vesicle function. Mol Cell Proteomics 20 100107. (https://doi.org/10.1016/j.mcpro.2021.100107)
Skerrett-Byrne DA , Anderson AL , Bromfield EG , et al. 2022 Global profiling of the proteomic changes associated with the post-testicular maturation of mouse spermatozoa. Cell Rep 41 111655. (https://doi.org/10.1016/j.celrep.2022.111655)
Skerrett-Byrne DA , Teperino R & Nixon B 2024 ShinySperm: navigating the sperm proteome landscape. Reprod Fertil Dev 36 RD24079. (https://doi.org/10.1071/rd24079)
Smyth SP , Nixon B , Anderson AL , et al. 2022 Elucidation of the protein composition of mouse seminal vesicle fluid. Proteomics 22 e2100227. (https://doi.org/10.1002/pmic.202100227)
Smyth SP , Nixon B , Skerrett-Byrne DA , et al. 2024 Building an understanding of proteostasis in reproductive cells: the impact of reactive carbonyl species on protein fate. Antioxid Redox Signal 41 296–321. (https://doi.org/10.1089/ars.2023.0314)
Somashekar L , Selvaraju S , Parthipan S , et al. 2017 Comparative sperm protein profiling in bulls differing in fertility and identification of phosphatidylethanolamine-binding protein 4, a potential fertility marker. Andrology 5 1032–1051. (https://doi.org/10.1111/andr.12404)
Spadafora C 2023 The epigenetic basis of evolution. Prog Biophys Mol Biol 178 57–69. (https://doi.org/10.1016/j.pbiomolbio.2023.01.005)
Stallmeyer B , Bühlmann C , Stakaitis R , et al. 2024 Inherited defects of piRNA biogenesis cause transposon de-repression, impaired spermatogenesis, and human male infertility. Nat Commun 15 6637. (https://doi.org/10.1038/s41467-024-50930-9)
Staudt DE , Murray HC , Skerrett-Byrne DA , et al. 2022 Phospho-heavy-labeled-spiketide FAIMS stepped-CV DDA (pHASED) provides real-time phosphoproteomics data to aid in cancer drug selection. Clin Proteomics 19 48. (https://doi.org/10.1186/s12014-022-09385-7)
Suarez SS & Pacey AA 2006 Sperm transport in the female reproductive tract. Hum Reprod Update 12 23–37. (https://doi.org/10.1093/humupd/dmi047)
Szklarczyk D , Gable AL , Nastou KC , et al. 2021 The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 49 D605–D612. (https://doi.org/10.1093/nar/gkaa1074)
Tomar A , Gomez-Velazquez M , Gerlini R , et al. 2024 Epigenetic inheritance of diet-induced and sperm-borne mitochondrial RNAs. Nature 630 720–727. (https://doi.org/10.1038/s41586-024-07472-3)
Trigg NA , Skerrett-Byrne DA , Xavier MJ , et al. 2021 Acrylamide modulates the mouse epididymal proteome to drive alterations in the sperm small non-coding RNA profile and dysregulate embryo development. Cell Rep 37 109787. (https://doi.org/10.1016/j.celrep.2021.109787)
Tyanova S , Temu T , Sinitcyn P , et al. 2016 The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13 731–740. (https://doi.org/10.1038/nmeth.3901)
Uhlén M , Fagerberg L , Hallström BM , et al. 2015 Proteomics. Tissue-based map of the human proteome. Science 347 1260419. (https://doi.org/10.1126/science.1260419)
Urizar-Arenaza I , Osinalde N , Akimov V , et al. 2019 Phosphoproteomic and functional analyses reveal sperm-specific protein changes downstream of kappa opioid receptor in human spermatozoa. Mol Cell Proteomics 18 S118–S131. (https://doi.org/10.1074/mcp.ra118.001133)
Van den Broeck L , Bhosale DK , Song K , et al. 2023 Functional annotation of proteins for signaling network inference in non-model species. Nat Commun 14 4654. (https://doi.org/10.1038/s41467-023-40365-z)
Vandenbrouck Y , Lane L , Carapito C , et al. 2016 Looking for missing proteins in the proteome of human spermatozoa: an update. J Proteome Res 15 3998–4019. (https://doi.org/10.1021/acs.jproteome.6b00400)
Vicens A , Borziak K , Karr TL , et al. 2017 Comparative sperm proteomics in mouse species with divergent mating systems. Mol Biol Evol 34 1403–1416. (https://doi.org/10.1093/molbev/msx084)
Vitorino Carvalho A , Soler L , Thélie A , et al. 2021 Proteomic changes associated with sperm fertilizing ability in meat-type roosters. Front Cell Dev Biol 9 655866. (https://doi.org/10.3389/fcell.2021.655866)
Wang D , Eraslan B , Wieland T , et al. 2019 A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol Syst Biol 15 e8503. (https://doi.org/10.15252/msb.20188503)
Wen Q , Tang EI , Xiao X , et al. 2016 Transport of germ cells across the seminiferous epithelium during spermatogenesis-the involvement of both actin- and microtubule-based cytoskeletons. Tissue Barriers 4 e1265042. (https://doi.org/10.1080/21688370.2016.1265042)
Willems P , Fijalkowski I & Van Damme P 2020 Lost and found: re-searching and re-scoring proteomics data aids genome annotation and improves proteome coverage. mSystems 5 e00833-20. (https://doi.org/10.1128/msystems.00833-20)
Xu K , Yang L , Zhang L , et al. 2020 Lack of AKAP3 disrupts integrity of the subcellular structure and proteome of mouse sperm and causes male sterility. Development 147 dev181057. (https://doi.org/10.1242/dev.181057)
Xu Y , Han Q , Ma C , et al. 2021 Comparative proteomics and phosphoproteomics analysis reveal the possible breed difference in yorkshire and duroc boar spermatozoa. Front Cell Dev Biol 9 652809. (https://doi.org/10.3389/fcell.2021.652809)
Yap YT , Li W , Huang Q , et al. 2023 DNALI1 interacts with the MEIG1/PACRG complex within the manchette and is required for proper sperm flagellum assembly in mice. Elife 12 e79620. (https://doi.org/10.7554/elife.79620)
Yin Y , Cao S , Fu H , et al. 2020 A noncanonical role of NOD-like receptor NLRP14 in PGCLC differentiation and spermatogenesis. Proc Natl Acad Sci U S A 117 22237–22248. (https://doi.org/10.1073/pnas.2005533117)
Zhang M , Chiozzi RZ , Skerrett-Byrne DA , et al. 2022 High resolution proteomic analysis of subcellular fractionated boar spermatozoa provides comprehensive insights into perinuclear theca-residing proteins. Front Cell Dev Biol 10 836208. (https://doi.org/10.3389/fcell.2022.836208)