Genomic insights into the secondary aquatic transition of

Climate change drove evolution, biogeography, and demography

Phylogenetic results (Fig. 1 and Supplementary Fig. 2) confirm previous findings, recovering Aptenodytes (king and emperor penguins) as the sister clade to all other crown penguins, with brush-tailed (Pygoscelis) penguins in turn sister to two clades uniting the banded (Spheniscus) and little (Eudyptula) penguins and the yellow-eyed (Megadyptes) and crested (Eudyptes) penguins6,7,9. Biogeographical reconstructions (Fig. 1, Supplementary Figs. 3–4 and Supplementary Data 1) support a Zealandian origin for penguins6,7. Stem penguins radiated extensively in Zealandia before dispersing to South America and Antarctica multiple times, following the eastward-flowing direction of the Antarctic Circumpolar Current (ACC) (Fig. 1). Crown penguins most likely arose from descendant lineages in South America, before dispersing back to Zealandia at least three times. Interestingly, at least two such dispersals occurred before the inferred onset of the ACC system, suggesting that early stem penguins were not dependent on currents to disperse over long distances. A second pulse of speciation coincides with the onset of the ACC, though understanding whether this pattern is real or an artifact of fossil sampling requires more collecting from early Eocene localities. We infer an age of ~14 Ma for the origin of crown penguins, which is more recent than the ~24 Ma age recovered in genomic analyses, not including fossil taxa7 (Supplementary Fig. 2b) and coincides with the onset of global cooling during the middle Miocene climate transition4,10 (Supplementary Fig. 3a). This young age suggests that expansion of Antarctic ice sheets and the onset of dispersal vectors such as the Benguela Current11 during the middle to late Miocene facilitated crown penguin dispersal and speciation, as hinted at by fossil evidence12.

Incongruences between species trees and gene trees were identified, e.g., alternate topologies occurred at high frequencies (>10%) for several internal branches (Fig. 1c; Supplementary Fig. 5). These patterns indicate that gene tree discordance may be caused by incomplete lineage sorting (ILS) or introgression events. By quantifying ILS and introgression via branch lengths from over 10,000 gene trees, we found that the rapid speciation within crown penguins was accompanied by >5% ILS content within the ancestors of Spheniscus, Eudyptula, Eudyptes, and several subgroups within Eudyptes (Fig. 2a). Our dated tree provides a temporal framework for this rapid radiation: the four extant Spheniscus taxa are all inferred to have split from one another within the last ~3 Ma, and likewise the nine extant Eudyptes taxa likely split from one another in that same time (Fig. 1b). Many closely related penguin species/lineages are known to hybridize in the wild (see supplementary methods). Consistent with this, multiple analyses suggest that introgression also contributes to species tree—gene tree incongruence (Supplementary Figs. 6–9 and Supplementary Data 2; also see Supplementary Methods for further details). This could explain the most notable conflict in previous phylogenetic results, which showed inconsistency over whether Aptenodytes alone7 or Aptenodytes and Pygoscelis together4,5 represent the sister clade to all other extant penguins. Introgression was detected between the ancestor of Aptenodytes and the ancestor of other extant penguins, and is inferred to have occurred when the range of these ancestors overlapped in South America (Fig. 2a and Supplementary Data 2). Introgression (>9%) was also detected between Eudyptula novaehollandiae and Eudyptula minor, and several introgression events were especially pervasive in Eudyptes (Fig. 2a and Supplementary Fig. 6).

Fig. 2: Incomplete lineage sorting, introgression events, and demographic history among penguins.
figure 2

a Model of incomplete lineage sorting (ILS) and introgression events estimated from QuIBL and hybrid pairwise sequentially Markovian coalescent (hPSMC) results. hPSMC was only run for 20 species pairs (see b). Numbers on branches represent the proportion (%) of ILS (orange branches) or introgression (blue lines, blue dashed lines, and blue dotted lines) detected by QuIBL. Proportions <3% were marked with blue dashed lines. Aqua dotted lines represent the ongoing gene flow detected by hPSMC. b Estimated divergence times and time intervals during which gene flow ceased between closely related lineages (see Supplementary Data 2 for details). Each circle represents one species in plot a. Gray lines represent the 95% credible intervals for the divergence times from the Bayesian total-evidence dating tree. Composite oxygen stable isotope (δ18O) data were modified from10 to show the climate fluctuations. c Normalized inferred population size (as a percentage of the maximum population size for each species between 20 and 250 Kya) trends for four groups of species showing similar patterns during the LGP based on PSMC results (full PSMC results shown in Supplementary Fig. 11). WAP = West Antarctic Peninsula, SG = South Georgia, KER = Kerguelen, FAL = Falkland/Malvinas and BAN = Banks Peninsula. Source data is provided as a Source Data file.

Many extant penguin lineages began to diverge within the last 3 Ma (Fig. 1). To obtain insight into this recent phase of penguin diversification, we inferred post-speciation introgression events and estimated the time when gene flow from introgression ceased between 20 pairs of closely related lineages (see Supplementary Methods). Our results provide further evidence for recent introgression between all sampled pairings (Fig. 2b) except for Eudyptes chrysocome and E. filholi, whose ranges are geographically disparate (Fig. 1a). Almost all species exhibit a genomic signature of a period of physical isolation during the Last Glacial Period (LGP) with increased climate fluctuation and environmental uncertainty, followed by postglacial contact and introgression as Earth warmed once again (Supplementary Figs. 8–9). This strongly supports the hypothesis that penguins were impacted by ecosystem-wide, climate-driven refugia/recolonization cycles in the Southern Ocean13,14, a pattern also observed in other marine taxa during the Last Glacial Maximum (e.g.,15). As ice volumes increased during the LGP high-latitude penguin species were likely forced into isolated mid-latitude refugia. As climate warmed from the late Pleistocene to Holocene, these species moved back towards the poles, recolonizing landmasses and islands as they became habitable once again, and, notably, experiencing secondary contact with one another (e.g., on small sub-Antarctic islands).

Today, penguins are under threat from climate change and environmental disruption (see Supplementary Methods for further citations) and half of all extant species are considered either Endangered or Vulnerable (IUCN red list categories). Understanding how past climate events have impacted penguin population size during the LGP is crucial in inferring how penguin populations may respond to future climate change. We estimated the effective population size for all recent penguin taxa except for E. warhami and M. a. waitaha (where data were too limited, Supplementary Data 2) (Fig. 2c, Supplementary Figs. 10–11 and Supplementary Data 2). These analyses provide a window into long-term population histories (very recent trends cannot be accurately recovered with these methods16). Four demographic patterns emerge for this critical time interval, illuminating disparate responses of penguins to glacial-interglacial cycles (Fig. 2c). The most prevalent pattern is shared by nine lineages (Aptenodytes patagonicus, Pygoscelis antarctica, P. papua “KER”, S. demersus, S. humboldti M. a. antipodes, M. a. richdalei, Eudyptes robustus and E. pachyrhynchus), all of which show evidence of population expansion coincident with the beginning of the LGP, followed by population decline towards the end of the LGP. In contrast to this pattern, nine lineages (A. forsteri, P. adeliae, P. papua “WAP”, P. papua “SG”, S. magellanicus, E. moseleyi, E. filholi, E. chrysolophus schlegeli, and E. sclateri) show evidence of population decline coincident with the beginning of the LGP, followed by population expansion towards the end of the LGP. Almost all of the remaining lineages show strong evidence of persistent long-term declines in populations from the early LPG to the end of LPG. All three Eudyptula taxa and Eudyptes chrysolophus chrysolophus underwent a steep population decline spanning the LGP, while three taxa (P. papua “FAL”, S. mendiculus, and E. chrysocome) show evidence of continual population decline across the last 250 thousand years (ka).

Interestingly, taxa that increased in population size towards the end of the LGP (e.g., A. forsteri, P. adeliae, S. magellanicus, E. filholi, E. moseleyi, E. sclateri, and E. schlegeli are typically migratory, and tend to forage offshore (>50 km; see Supplementary Data 117), while taxa that decreased towards the end of the LGP (e.g., S. humboldti, S. demersus, M. a. antipodes and likely M. a. richdalei) tend to be residential, and forage inshore; see Supplementary Data 1. Taxa that disperse farther may have overcome local impacts of global climate cooling during the LGP (e.g., changes in sea-ice extent, prey abundance and terrestrial glaciation, however see18) largely by relocating to lower latitudes (e.g.,14), whereas locally-restricted taxa may have been more prone to sudden population collapses.

Penguins have the slowest evolutionary rates among birds

The integrated evolutionary speed hypothesis (IESH) proposes that temperature, water availability, population size, and spatial heterogeneity influence evolutionary rate19. Life history traits also impact the evolutionary rate, but such relationships remain incompletely understood in birds20. Penguins are long-lived, large-bodied, and produce few offspring, thus providing an ideal case study in how life history may impact evolutionary rate. We tested the IESH using three proxies for evolutionary rate: substitution rate, P and K2P distances between lineages and their ancestors (Supplementary Fig. 12 and Supplementary Data 3). We found that penguins and their sister group (Procellariiformes) had the lowest evolutionary rates of the 17 avian orders sampled by21 (Fig. 3a, Supplementary Fig. 13, and Supplementary Data 3). Because other aquatic orders also show slow rates (e.g., the aquatic Anseriformes show a significantly slower rate than their terrestrial sister group Galliformes), we hypothesize that the rate in penguins represents the culmination of a gradual slowdown associated with increasingly aquatic ecology. Intriguingly, we detected a trend toward decreasing rate over the first ~10 Ma of crown penguin evolution, followed by a marked uptick ~2 Ma, which suggests the onset of glacial-interglacial cycles contributed to a recent increase in evolutionary rates in penguins (Fig. 3b).

Fig. 3: Evolutionary rates in birds.
figure 3

a Evolutionary rate in avian orders based on a ~19 Mbp alignment of highly conserved genome regions. Sphenisciformes and Procellariiformes have the lowest evolutionary rate among modern bird orders (One-sided Wilcoxon Rank sum test, P values < 0.05 for all pairs except for Sphenisciformes and Procellariiformes (P-values > 0.1)). Numbers at the tips represent the sample size in each group. Numbers at nodes represent the divergence times (Ma) between each order and its sister taxon and red dots within the boxplots indicate average values. We did not attempt to estimate the evolutionary rates for orders containing less than three sampled species (gray font; Musophagiformes, Mesitornithiformes, and Struthioniformes). Boxplots show the median with hinges at the 25th and 75th percentile and whiskers extending 1.5 times the interquartile range. Some bird images were downloaded from phylopic.org and were licensed under the Creative Commons (CC0) 1.0 Universal Public Domain Dedication. b Evolutionary rates inferred for extant penguin lineages at internal nodes from the maximum clade credibility tree, calculated using a 500 Mbp genome alignment. Gray shadows represent the 95% credible intervals. ce Correlations between c, body mass and generation time (P value < 0.05), d generation time (gray dots, solid lines, P value < 0.001) or body mass (blue dots, dashed lines P value < 0.05) and average sea surface temperature, e substitution per site per generation time (gray dots, solid lines, P value < 0.001) or substitution per site per million years (purple dots, dashed lines P value < 0.01) and body mass among 18 penguins, estimated using phylogenetic correlation – Phylogenetic Generalized Least Squares Regression with the best-fitting model identified by Akaike Information Criterion. Correlations with linear models were shown with black lines. Source data is provided as a Source Data file.

Extant penguin lineages show a wide range of individual rates, and phylogenetic correlation analyses (phylogenetic generalized least squares regression) shed light on potential factors influencing this disparity (Fig. 3c–e and Supplementary Data 3). Extant penguins showed a significant negative correlation between body mass and average sea surface temperature (Fig. 3d). Despite species from warmer regions having shorter generation times (Fig. 3d), a significant negative correlation was found between evolutionary rate and average sea surface temperature (Fig. 3e), suggesting that temperature may influence penguin evolutionary rates by regulating selective pressures, but not only through its effect on metabolism22. This result is in parallel with studies that show speciation rates to be higher in polar environments than in the tropics, pointing towards faster rates of evolution and more opportunities for divergence at high latitudes23,24. We propose that these patterns together reflect the signature of climate oscillations on high latitude species: polar penguins (e.g. A. forsteri/P. adeliae) were likely forced into more northerly refugia during ice ages, subsequently recolonizing Antarctica during interglacials14. These events may have led to faster evolutionary rates as these lineages underwent population contraction-expansion cycles and were periodically forced to adapt to new environments.

Putative molecular adaptations unique to penguins

As penguins became increasingly adapted to a flightless diving ecology, they encountered novel selection pressures that required modifications to their locomotory strategy, thermoregulation, sensory perception, and diet. We tested whether these phenotypic changes have been facilitated through the evolution of the underlying protein-coding genes (Supplementary Data 4) by identifying positively selected genes (PSGs), rapidly evolving genes (REGs), and pseudogenes that relate to specific adaptations including thermoregulation, oceanic diving, oxygenation, underwater vision, shifts in diet and taste, body size and immunity (see Figs. 4, 5 and Supplementary Methods for additional details and citations). These genes either differ in all penguins compared with other birds, differ in the genus Aptenodytes compared with other penguins, or are under distinct selective pressures within penguins (Supplementary Data 4). In the branch leading to the last common ancestor (bLCA) of penguins, 27 PSGs (false discovery rate [FDR] q < 0.05) and 13 REGs (FDR q < 0.05) were detected. In the bLCA of Aptenodytes, 25 PSGs (FDR q < 0.05) and 3 REGs (FDR q < 0.05) were detected. In the bLCA of penguins and four flightless/nearly flightless birds (Nannopterum harrisi, Rhynochetos jubatus, Zapornia atra, and Laterallus rogersi, see Supplementary Fig. 16a), five PSGs (FDR q < 0.05) and 38 REGs (FDR q < 0.05) were detected. Within penguins, 275 PSGs (FDR q < 0.01) were detected (Supplementary Data 4). We related the gene pathways and known functions of 15 PSGs and six REGs to penguin-specific adaptations (Fig. 4a). We also highlight five genes containing penguin-specific substitutions, seven pseudogenes, and two gene expansions (Fig. 4a, Supplementary Figs. 14, 15).

Fig. 4: Adaptive genes in extant penguin lineages.
figure 4

a Genes with unique evolutionary signals in penguins and their putative adaptive function. b Gene regulatory pathways related to light transmission. c Phylogenetic tree of 45 avian species showing two mutation sites (HBA-αA, A140S, and HBB-βA, L87M) of hemoglobin genes in penguins (marked in red) and outgroups. d Positive selection at multiple sites (41, 62, 111, 113, 127, 141) on the bLCA of extant penguins for MB gene and the structural effects of amino acid substitutions in the chicken MB gene. Molecular models of the chicken MB gene and the MB gene with penguin-specific substitutions may affect the stabilization of MB. Source data is provided as a Source Data file.

Fig. 5: Pseudogenes and site alignments for vision, taste, diet, and immunity genes.
figure 5

a Presence/absence of vision, taste, and dietary genes in penguins. Phylogenetic tree of penguins and select outgroups indicate which species have complete or pseudogenes, related to vision (opsins; RH1, RH2, SWS1, SWS2, LWS, and CYP2J19), taste (umami; TAS1R1, TAS1R3, sweet; TAS1R2, bitter; TAS2R1, TAS2R2, TAS2R3, sour; PKD211, salty; SCNN1A, SCNN1B, SCNN1G) and diet (chitinase; CHIA). “Not found” indicates genes that could not be assembled. b Phylogenetic tree of penguins showing alignments of positively selected sites for four genes related to immunity (TLR4, TLR5, IFIT5, and CD81). Sites are shown below the alignment. The background colors are displayed for sites that have 50% conservation. Source data is provided as a Source Data file.

We identified three REGs that are shared by penguins and other flightless/nearly flightless birds. These genes are likely associated with the shortening, rigidity, and increased density of the forelimb bones which contribute to the flipper-like wing of penguins (Fig. 4a). TBXT and FOXP1 are related to the development of articular cartilage, tendons, and limb bones25,26. SMAD3 is involved in the transforming growth factor-beta signaling pathway, which is important for maintaining articular cartilage and stimulating osteogenesis and bone formation27. Perhaps most interestingly, TNMD, a PSG, is expressed during the differentiation and developmental phase of limb tendon, ligament, and collagen fibrils, and loss of TNMD can result in reduced tenocyte density28. We hypothesize that TNMD may be key to the nearly wholesale replacement of penguin distal wing musculature by tendons, which stiffens and reduces heat loss to the high surface area flipper (Supplementary Fig. 16a-d). We also identified two genes KCNU1 and KCNMA1 that are related to calcium sequestration to be expanded in the genomes of both penguins and grebes (Podiceps cristatus and Podilymbus podiceps) (Fig. 4a, Supplementary Fig. 15). These genes likely contribute to the high bone density characteristic of these taxa, which helps reduce buoyancy for deep diving.

Penguins have densely-packed waterproof feathers, thick skin, and a layer of subcutaneous fat enabling them to thermoregulate in cold environments. We identified four genes under selective pressure in common ancestors of penguins that are related to thermoregulation (Supplementary Data 4). These genes (APPL1, TRPC1, EVPL) showed evidence of positive selection or rapid rates of evolution on the bLCA of extant penguins but not in other birds (Fig. 4a). The white adipose tissue of penguins is important for survival in the cold, acting as an insulative layer and an energy reserve, particularly prior to catastrophic moult29. We hypothesize several of these genes contribute to white adipose fat storage and hence survival in cold environments. APPL1 (Supplementary Fig. 16e) and TRPC1 are related to glucose levels and fatty acid breakdown through adiponectin30,31.

Penguins function under hypoxic conditions during deep dives in part via myoglobin concentration and utilizing anaerobic metabolism32,33. We identified seven genes related to oxygenation that are under positive selection or have penguin-specific substitutions in penguins. Transferrin Receptor 1 (TFRC) shows a positive selection in penguins (Supplementary Fig. 16f). Previous experimental work in cells has reported that TFRC messenger RNA is expressed in an oxygen-dependent manner34. Importantly, TFRC is a top candidate gene for the hypoxia response of domesticated cattle35. We hypothesize that TFRC has contributed to a convergent adaptation to withstanding hypoxia in penguins. Interestingly, FIBB and ANO6, which are involved in blood coagulation, showed a signal of positive selection in Aptenodytes, but not in other genera (Supplementary Fig. 17). Among all penguins, Aptenodytes have the capacity for the deepest diving (>500 m depth)36, and thus, these gene variants may enable these species to dive to extreme depths. While none of the hemoglobin genes were PSGs (P-value: >0.05), we observed that HBA-αA (A140S) and HBB-βA (L87M) genes (Fig. 4c and Supplementary Fig. 18) show penguin-specific amino acid substitutions that are highly conserved across all penguin species, making them candidate molecular adaptations for surviving deep oceanic dives under hypoxic conditions (see also ref. 37). MB is an oxygen-binding myoglobin gene that shows positive selection at multiple sites both between penguins and other birds and among penguins (Fig. 4d and Supplementary Fig. 16g), suggesting that these penguin-specific substitutions may impact the stability of the resulting myoglobins, as seen in extreme deep-diving cetaceans38. While cormorants and petrels also undertake deep (>70 m) dives, we did not observe selection for TFRC and hemoglobin genes in these groups (Fig. 4c). Another PSG, TRPC4, is involved in the cardiovascular system39. Specifically, TRPC4 may help widen blood vessels to decrease blood pressure during deep dives40.

Penguins frequently forage in low light, and exhibit specializations for vision in dim, blue-green marine environments41,42. Morphological research has shown that at least some penguins are cone trichromats with only three functional cone photoreceptor types, blue-shifted long-wavelength visual pigments, and no red oil droplets41. Genomic data support trichromatism in all penguins, in contrast to most other birds which are tetrachromats. The inactivation of the green cone opsin gene (RH2) in the stem penguin lineage is inferred by a 12-base pair (bp) deletion, which encompasses the codon for the critical chromophore-binding lysine (K29643) (Fig. 4a and Supplementary Fig. 19a). As all penguins share this deletion, reduced color vision must have occurred in the penguin stem lineage, similar to secondarily aquatic mammals44. Although penguins lack green cones, the functional orthologs of the remaining visual opsins in penguins strongly indicate the retention of violet (SWS1), blue (SWS2), and red (LWS) cones, plus rods (RH1) (Fig. 5a). This genetic signature is concordant with our experiments on Pygoscelis papua (see Supplementary Methods), which demonstrate a capacity for ultraviolet light perception at 365 nm, likely conferred by the SWS1 opsin. Furthermore, the peak wavelength sensitivity (λmax) of penguin LWS opsins show evidence of shifts in spectral sensitivity to better match ambient underwater light. Relative to key avian model species (e.g., Taeniopygia guttata, Columba livia, Gallus gallus) and Procellariiformes, penguins possess substitutions at five key tuning sites in LWS, four of which (A180, F277, A285, and S308) are associated with blue-shifting this pigment45 (Supplementary Fig. 19b). This suggests that this opsin has been fine-tuned for marine foraging, as observed in cetaceans44. CYP2J19, which encodes a carotenoid ketolase responsible for producing red oil droplets in avian cones46, has been inactivated in most penguins (Supplementary Data 4). Colored oil droplets are thought to fine-tune color vision46, though this comes at the cost of decreased visual sensitivity. Deactivation of CYP2J19 likely allows for higher retinal sensitivity when foraging in dim light conditions, as seen in nocturnal owls and kiwis46. Beyond these key genes, we note that two scotopic photoresponse genes, TMEM30A (PSG) and KCNV2 (REG), show evidence of selection in penguins, and two others, CNGB1 and GNB3, each have a site mutation unique to penguins (Supplementary Fig. 19c, d). These genes play an important role in the transmission of light (Fig. 4b), and may further enhance visual sensitivity at low light levels, as mutations or loss of these genes impact the result in a reduced scotopic photoresponse47,48.

A wholesale reduction in gustation capacity appears to have accompanied the shift to underwater prey capture and consumption in penguins. We verified that penguins only retain genes associated with detecting sour and salty tastants, and lack functional copies of genes linked to umami, sweet and bitter tastants49 (Figs. 4a and 5a). The mutational loss of capacity for umami taste in penguins is puzzling, given the continued consumption of amino acid-rich prey. Intriguingly, the loss of umami has also been reported in secondarily aquatic mammals50. Potential explanations include a lower reliance on taste when swallowing food whole or weakened ability to taste prey due to cold temperatures and the sodium content of seawater (reviewed in50).

A strong genomic indicator of diet is presented by chitinases that are expressed in the gastrointestinal tract51. The chitinase genes (CHIAs) exist as several paralogs, and the retention or loss of these paralogs in mammals has been correlated with diet51. Retention of intact CHIAs correlates with a higher degree of insectivory, and CHIA losses tend to occur in lineages that undergo dietary shifts to carnivory or herbivory. We examined CHIAs in penguins, and in contrast to most examined birds, which have one to four intact CHIAs52, penguins have a single pseudogenized CHIA. At first glance, it is perplexing that penguins would lose CHIAs, as many species consume large amounts of crustaceans. Fossil evidence, however, reveals that stem penguins focused primarily on larger prey items like fish and squid, and that adaptations for capturing smaller planktonic prey arose as recently as the Pliocene6. We propose that the two inactivating mutations shared by extant penguins (Fig. 5) evolved during a ~50 Ma interval during which stem penguins consumed little or no arthropod prey.

Co-evolution between hosts and pathogens is pervasive in vertebrates. Given the range of different climatic niches occupied by penguins, and the differences in pathogen assemblages to which they are undoubtedly exposed, penguins may have undergone significant adaptation to local pathogen pressures53. Accordingly, we detected 51 PSGs in penguins that have a role in immunity (Supplementary Data 4). Several of these genes might be under positive selection corresponding to host-pathogen co-evolution. For instance, we confirm previous reports53,54 that the bacterial-recognizing Toll-like receptors TLR4 and TLR5 (Figs. 4a and 5b) are positively selected in penguins. Moreover, the positively selected sites located proximal (<5 Å) to the lipopolysaccharide-binding site in TLR4 (codon 276, homologous to chicken codon 30255) and at a flagellin-binding site in TLR5 (codon 3356) (Fig. 5b) are both in domains crucial for bacterial recognition. In addition, we detected several other pattern-recognition receptors, such as IFIT5, that are also under positive selection in penguins (Fig. 4a). IFIT5 is a cellular detector of viral RNA57, and we found a cluster of positively selected sites located in a connecting helix forming part of the RNA-binding cleft (codons 407, 409, 413, and 421, corresponding to human codons 412, 414, 418 and 42658,59) (Fig. 5b). This may imply that penguin IFIT5 has undergone adaptation to different viral RNA motifs in response to viral pathogen pressure. We also found evidence of positive selection at viral targets of cell entry. For example, CD81 is a co-receptor required for glycoprotein-mediated hepatitis C viral entry into cells in mammals60, and positive selection has been reported at the glycoprotein interface in bat CD8161. We also found a cluster of positively selected sites in the hepatitis C glycoprotein interface in penguin CD81 (sites 181, 182, and 186, corresponding to human sites 180, 181, and 185, and penguin site 86, corresponding to human site 185) (Fig. 5b). This may suggest that penguins have experienced co-evolution with a viral pathogen that relies on CD81 for cell entry. Finally, we detected positive selection in penguin transferrin, which is part of the “nutritional” immune system that sequesters iron from iron-scavenging pathogens62. Outbreaks of diphtheritic stomatitis in Megadyptes antipodes have caused increasing chick mortality and are hypothesized to be related to increasing susceptibility to Corynebacterium as a secondary infection63 potentially triggered by chick malnutrition due to changes in diet, and potentially iron intake. The co-evolutionary arms race to sequester and scavenge iron has also been detected in mammals and fishes (e.g.,64). Taken together, these observations illustrate that immune genes have undergone diversification in penguins. Furthermore, many positively selected sites were clustered in regions known to be involved in pathogen binding, which provides evidence for extensive host-pathogen co-evolution during the diversification of penguins into novel pathogen environments.

Extant penguins range from ~1 kg in Eudyptula spp. to 40 kg in Aptenodytes forsteri, but giant fossil penguins exceeded 100 kg65. We found two genes associated with large body size that are under positive selection in Aptenodytes compared to all other penguin lineages (Fig. 4a). CREB3L1 is important during bone development, and vertebrates lacking CREB3L1 have underdeveloped growth66. SMARCAD1 is related to the skeleton and plays a role in transcriptional regulation, maintenance of chromosome stability, and various aspects of DNA repair. Vertebrates with mutant SMARCAD1 also have underdeveloped growth67. We hypothesize that these genes have contributed to the large body size of Aptenodytes. Although genetic data are inaccessible for stem penguins, the recovery of Aptenodytes as sisters to all other extant penguins and the large size of many stem penguins (e.g., Kumimanu and Kairuku) suggests positive selection in these genes could be ancestral for crown penguins with selection relaxed in non-Aptenodytes taxa.