CNCP 2025

Invited Speakers

刘超 刘志伟 闻博 余风潮 叶子璐 周默为 杨奕 毛鹏志 赵群 李功玉 伊心培 夏俊 王宇 王海鹏 单宝珍 曾文锋 姚建华 常乘
刘超
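Beihang University

High-sensitivity analysis of DIA single-cell proteomics cohort data based on a multi-channel contrastive learning framework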

Abstract:

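The new-generation Orbitrap Astral mass spectrometer, combined with highly sensitive DIA strategies, has greatly increased the depth of proteome analysis and is steadily gaining wide use in single-cell proteomics. However, for cohort data produced on the Astral, existing DIA analysis software reports a large number of missing values, which severely limits biomedical applications. To address this, we present ApuQuant, a software tool for post-processing of identification results and for quantification, designed specifically for DIA data generated by the Orbitrap Astral mass spectrometer. By applying a high-sensitivity MBR algorithm based on a multi-channel contrastive learning model, ApuQuant reduces missing values by 11.3%-66.7% compared with DIA-NN and Spectronaut and introduces a scoring scheme for evaluating missing values. In low-input proteomics samples that mimic single-cell proteomics, it further increases identification depth by 35.5%-60.06%. In addition, ApuQuant was applied to single-cell proteome analysis of A549 cells, revealing proteome diversity across different physiological states of the cells and demonstrating its great potential for uncovering biological mechanisms at single-cell resolution.

Keywords:

DIA, single-cell proteomics, contrastive learning, missing values, mass spectrometry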

刘志伟
Westlake University

A pretrained end-to-end Transformer model for enhanced DIA proteomics data analysis

Abstract:

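Data-independent acquisition mass spectrometry (DIA-MS) is becoming increasingly important in quantitative proteomics. This study presents DIA-BERT, a software tool that analyzes DIA proteomics data with a pretrained artificial intelligence (AI) model based on the Transformer architecture. Its identification model was trained on more than 276 million high-quality peptide precursors extracted from existing DIA-MS files, while its quantification model was trained on 34 million peptide precursors from synthetic DIA-MS files. Compared with DIA-NN, DIA-BERT increased protein identifications by 51% and peptide precursor identifications by 22% on average across five human cancer sample sets (cervical cancer, pancreatic adenocarcinoma, myosarcoma, gallbladder cancer, and gastric cancer), with high quantitative accuracy. The study highlights the great potential of pretrained models and synthetic datasets for improving DIA proteomics data analysis.

Keywords:

DIA-MS, Transformer, pretrained model, proteomics, artificial intelligence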

闻博
University of Washington

Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment

Abstract:

A critical challenge in mass spectrometry proteomics is accurately assessing error control, especially given that software tools employ distinct methods for reporting errors. Many tools are closed-source and poorly documented, leading to inconsistent validation strategies. Here we identify three prevalent methods for validating false discovery rate (FDR) control: one invalid, one providing only a lower bound, and one valid but under-powered. The result is that the proteomics community has limited insight into actual FDR control effectiveness, especially for data-independent acquisition (DIA) analyses. We propose a theoretical framework for entrapment experiments, allowing us to rigorously characterize different approaches. Moreover, we introduce a more powerful evaluation method and apply it alongside existing techniques to assess existing tools. We first validate our analysis in the better-understood data-dependent acquisition setup, and then, we analyze DIA data, where we find that no DIA search tool consistently controls the FDR, with particularly poor performance on single-cell datasets.

Keywords:

false discovery rate, mass spectrometry, entrapment, tandem mass spectrometry, proteomics
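
As a rough illustration of the entrapment idea in the abstract above, the sketch below computes two commonly quoted estimates of the false discovery proportion (FDP) from counts of target and entrapment discoveries. The function names, the parameter `r` (the assumed ratio of entrapment to target database size), and the example counts are hypothetical choices made for illustration. They are not taken from the talk, and the more powerful evaluation method the abstract mentions is not reproduced here.

```python
def lower_bound_fdp(n_target: int, n_entrapment: int) -> float:
    """Lower bound on the FDP: every entrapment hit is a known false discovery.

    Computes N_E / (N_T + N_E) over the combined discovery list.
    """
    total = n_target + n_entrapment
    return n_entrapment / total if total else 0.0


def combined_fdp(n_target: int, n_entrapment: int, r: float) -> float:
    """Combined-style estimate, N_E * (1 + 1/r) / (N_T + N_E).

    Assumption: false discoveries land in the original target database
    about 1/r times as often as in the entrapment database, so each
    entrapment hit also stands in for 1/r unseen false target hits.
    """
    total = n_target + n_entrapment
    return n_entrapment * (1.0 + 1.0 / r) / total if total else 0.0


# Hypothetical counts: 9,900 target and 100 entrapment discoveries, r = 1.
low = lower_bound_fdp(9900, 100)
est = combined_fdp(9900, 100, r=1.0)
print(f"lower bound = {low:.3f}, combined estimate = {est:.3f}")
```

Comparing the two numbers against the FDR threshold a tool reports is the kind of check an entrapment experiment supports; a lower bound that already exceeds the threshold indicates a problem, while a lower bound below the threshold does not by itself show that the FDR is controlled.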

余风潮
University of Michigan

Integrated Spectrum-Centric DIA Analysis in the FragPipe Platform

Abstract:

The increasing complexity and scale of quantitative proteomics demand robust, accurate, and scalable solutions for analyzing data-independent acquisition (DIA) mass spectrometry data. We present a comprehensive, spectrum-centric computational framework integrated into the FragPipe platform that enables high-throughput, library-free peptide identification and quantification from DIA datasets, including ion mobility-enhanced diaPASEF data. This unified workflow leverages two key innovations: MSFragger-DIA, a high-speed database search engine tailored for DIA MS/MS spectra, and diaTracer, a tool designed to extract three-dimensional precursor-fragment features from diaPASEF data to generate pseudo-MS/MS spectra. By performing database searches prior to chromatographic feature detection, MSFragger-DIA improves both sensitivity and speed across a range of applications including single-cell proteomics, phosphoproteomics, and complex tissue studies. Meanwhile, diaTracer enables spectral-library-free analysis of diaPASEF datasets, supporting unrestricted identification of post-translational modifications and robust quantification. The full integration of these tools into FragPipe ensures reproducible, end-to-end DIA analysis—from spectral extraction and pseudo-spectrum generation to peptide scoring, protein inference, PTM characterization, and quantitative matrix generation. Benchmarked against state-of-the-art tools such as DIA-NN and Spectronaut, this workflow demonstrates competitive or superior performance in terms of identification depth, quantification completeness, modification coverage, and computational efficiency.

Keywords:

DIA, FragPipe, spectrum-centric, proteomics, mass spectrometry

叶子璐
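Institute of Systems Medicine, Chinese Academy of Medical Sciences / Suzhou Institute of Systems Medicine

Development and application of high-throughput low-input and single-cell proteomics methods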

Abstract:

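This talk will systematically review recent advances in mass spectrometry-based high-throughput single-cell proteomics, focusing on key innovations in automated sample preparation, optimization of MS acquisition strategies, and standardization of data analysis workflows. By building a technology platform that combines high throughput with high sensitivity, we have markedly improved the processing efficiency of single-cell samples and the depth of protein identification, helping extend the field toward large-scale, multi-tissue, multi-time-point applications. The talk will also present applications of this technology in biomedical research, such as resolving cellular heterogeneity and tracking embryonic development, highlighting its value for revealing complex life processes. We will also briefly present benchmarking results for mainstream single-cell proteomics data analysis software, covering key metrics such as protein identification performance, FDR control, and new features, and will offer a systematic assessment and recommendations for method selection to help researchers choose and apply data processing tools appropriately.

Keywords:

single-cell proteomics, high throughput, low-input analysis, mass spectrometry, method development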

刘凡
Leibniz-Forschungsinstitut für Molekulare Pharmakologie

Developing structural interactomics and its application in cell biology

Abstract:

Proteins in all biological systems are highly organized in three-dimensional space, forming membrane-enclosed or membraneless compartments, signaling pathways, dynamic assemblies, and stable complexes. Many of these structures are only viable in their native environment, making them recalcitrant to traditional biochemical characterization. Proteome-wide cross-linking mass spectrometry offers the opportunity to capture the interactions and spatial arrangement of proteins without having to extract them from their complex biological system. Over the years, we’ve advanced cross-linking mass spectrometry by developing experimental methods and software tools and generated tens of thousands of PPIs from various biological systems, such as mitochondria, synapses, cells and virus particles. These data reveal numerous aspects of living systems - for example protein subcellular localizations, interactions, and architectures of suprabiomolecular machineries. Furthermore, these findings inform functional follow-up studies to further characterize the newly discovered protein structures, interactions, and spatially resolved networks, using approaches from structural biology, molecular biology, cell biology, neuroscience, and informatics.

Keywords:

structural interactomics, cross-linking mass spectrometry, protein interactions, cell biology, proteomics

周默为
Zhejiang University

Challenges and opportunities for top-down mass spectrometry in structural proteomics

Abstract:

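Accurate characterization of protein structure is essential for studying molecular mechanisms. Because conventional commercial mass spectrometers have a limited detectable molecular-weight range, mainstream proteomics methods mostly follow a "bottom-up" strategy, in which proteins are digested into peptides before MS analysis. With continuing advances in instrumentation, "top-down" analysis of intact proteins is maturing: by isolating intact macromolecules in the mass spectrometer and fragmenting them directly, it can rapidly distinguish protein structural variants arising from post-translational modifications, higher-order structure, non-covalent binding, and other factors. However, fragmentation spectra of large molecules are complex and their fragmentation rules are not yet well understood, so current methods leave considerable room for improvement. I will use examples to discuss the combination of top-down MS with AI structure prediction, as well as the development of software for interpreting raw top-down mass spectra.

Keywords:

top-down mass spectrometry, structural proteomics, protein structure, mass spectrometry analysis, AI prediction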

杨奕
Zhejiang University

Artificial intelligence-driven glycoproteome analysis

Abstract:

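Protein glycosylation carries important biological information, but the high complexity of sample composition and chemical structure poses challenges for its analytical detection and data interpretation. Conventional database search methods generate the m/z values of theoretical fragment ions from the peptide sequence and the glycan, then score candidates by whether these ions are present in the experimental spectrum, largely ignoring fragment ion intensities. To address this, the speaker developed DeepGlyco, a deep learning-based method for predicting glycopeptide mass spectra. It uses tree-structured long short-term memory networks and graph neural networks to handle the nonlinear structure of glycans, and predicts the peak intensities of peptide b/y ions and glycan B/Y ions in glycopeptide MS/MS spectra. Spectrum prediction helps in three ways. First, it extends the scope of spectral library searching and, combined with data-independent acquisition, can increase the coverage depth of serum glycoproteome analysis. Second, it supports quality control of glycopeptide identification: by constructing decoy glycans that do not actually exist, predicting their MS/MS spectra, and matching them as "traps" against experimental spectra, the error rate of identification results can be estimated. Third, it aids glycan structure identification: by comparing experimental spectra with predicted spectra, different glycan structures with the same monosaccharide composition can be scored and ranked, which allows some glycan structures to be distinguished. The accuracy of this approach was validated on a dataset from core fucosyltransferase gene knockout mice.

Keywords:

glycoproteome, artificial intelligence, deep learning, mass spectrum prediction, glycan structure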

毛鹏志
Institute of Computing Technology, Chinese Academy of Sciences

pLink3: Unified Analysis of Large-Scale Crosslinking Proteomics Data

Abstract:

Crosslinking mass spectrometry (XLMS) is a powerful approach for elucidating the structures of large protein complexes and for systematically profiling cellular protein-protein interactions (PPIs). To meet diverse experimental needs, a wide array of crosslinkers has been developed, and with advances in instrumentation, massive amounts of XLMS data are being generated. This necessitates a search engine that simultaneously achieves high sensitivity, precision, speed, and broad compatibility in crosslinking data analysis. Here we present pLink3, which enables crosslink identification within a unified architecture. The core technical innovation of pLink3 lies in its hierarchical funnel-shaped workflow coupled with a machine learning-driven scoring system. To comprehensively assess performance, we benchmarked ten software tools (pLink3, XlinkX, MeroX, Kojak, CRIMP, Xi, MaxLynx, MS Annika, Scout, and pLink2) using over ten distinct datasets. These datasets employed various validation strategies, including synthetic peptides, 15N metabolic labeling, proteome-scale fractionation, and recombinant protein crosslinking, and involved various functional crosslinkers such as DSS, DSSO, DSBSO, ADH, and Leiker. Across all evaluation metrics, pLink3 maintained leading performance. For example, even when the entrapment database was expanded to human scale, pLink3 lost only 4% of peptide pairs in the Mechtler2-Syn-DSSO dataset, whereas Xi, the second most sensitive tool, lost 20%. In the Rappsilber-Ecoli-BS3 dataset, pLink3 identified 817 high-confidence PPIs, about twice the 406 PPIs reported by Xi. For speed, pLink3 processed each RAW file in an average of just 1.0 minute (167 files totaling 2.8 hours), while Xi took 242 hours (approximately 10 days). Additionally, pLink3 supports data acquired on a variety of mass analyzers, including Orbitrap, Astral, and timsTOF.
Leveraging pLink3's high sensitivity, rapid processing speed, and strong compatibility, along with its precision validated by comprehensive benchmarking, we re-analyzed hundreds of millions of spectra to construct an XL-based structurome database containing over 100,000 unique residue-pairs. This resource will greatly facilitate AI-driven systems-level structural biology research. pLink3 is freely available at https://github.com/pFindStudio/pLink3.

Keywords:

crosslinking mass spectrometry, pLink3, protein-protein interactions, structural proteomics, large-scale data analysis

赵群
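Dalian Institute of Chemical Physics, Chinese Academy of Sciences

New chemical cross-linking-driven technologies for conformational analysis of intracellular protein complexes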

Abstract:

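As the core executors of life processes, proteins carry out their specific biological functions in various ways, including by forming complexes. Inside cells, confinement effects, crowding effects, and organelle microenvironments play important roles in regulating the structure and function of protein complexes. Accurately resolving protein interactions inside cells is therefore essential for understanding their biological functions and the nature of life phenomena. In recent years, chemical cross-linking mass spectrometry has become an important means of analyzing protein complexes. The technique uses chemical cross-linkers to covalently link spatially proximal amino acid residues of proteins and identifies the cross-linked peptides by mass spectrometry, thereby resolving protein interaction interfaces and interaction sites. However, in situ conformational analysis of intracellular protein complexes is still in its infancy. To address this challenge, our team developed a series of new membrane-permeable, multifunctional chemical cross-linkers with high biocompatibility, enabling in situ cross-linking capture of protein complex conformations in living cells. We also developed several highly selective enrichment methods for low-abundance cross-linked peptides and high-confidence techniques for identifying cross-linked peptides, markedly improving the sensitivity, coverage, and accuracy with which in situ cross-linking information is identified. Furthermore, by establishing methods for the targeted enrichment and analysis of cross-linked protein complexes within specific subcellular organelles, we achieved precise, organelle-resolved analysis of protein interactions. On this basis, using molecular dynamics with chemical cross-linking distance restraints, we obtained dynamic conformational ensembles of protein complexes and achieved large-scale, precise analysis of the composition, interaction interfaces, and interaction sites of protein complexes in the living-cell microenvironment. This provides important technical support for the large-scale elucidation of the structural regulatory mechanisms of protein complexes in their functional states.

Keywords:

chemical cross-linking, protein complexes, conformational analysis, mass spectrometry, intracellular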

李功玉
Nankai University

Omics discovery and structure-function analysis of protein chiral modifications

Abstract:

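The big-science programs led by Chinese scientists, "Proteomics-Driven Precision Medicine (PDPM)" and the international "Proteomic Navigator of the Human Body (π-HuB)", have achieved a series of major breakthroughs in mapping the protein molecular landscapes of multiple tumors, including liver cancer, gastric cancer, and lung adenocarcinoma, and in discovering strongly associated protein molecular signatures. However, for lack of long-range protein markers that cover the full course of neurodegenerative diseases, early and precise diagnosis of and intervention in human neurodegenerative diseases still face great challenges. Protein chiral modification refers to chirality inversion at the central carbon atom of the amino acid backbone. Previous studies have found chiral modifications in a variety of age-related diseases. Because this low-abundance post-translational modification changes neither sequence nor molecular weight, and because conventional immunohistochemistry has low specificity and sensitivity for recognizing chiral isomers, large-scale identification of protein chiral modifications remains challenging. To address this, the 李功玉 research group, building on a native ion mobility mass spectrometry platform, has developed a series of cross-scale strategies for amplifying chiral differences, overcoming the technical barrier that makes protein isomeric modifications difficult to identify by conventional mass spectrometry. The group has built a multi-scenario analysis system, "High-Performance Conformation-Resolved Mass Spectrometry", discovered new protein isomer markers for neurological diseases and multiple tumors on a large scale, and elucidated new structure-function relationships for a series of protein isomer disease markers. This talk will present the group's latest progress in omics discovery, molecular landscape mapping, and structure-function analysis of protein chiral modifications.

Keywords:

protein chiral modification, omics, structure-function analysis, mass spectrometry, disease biomarkers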

伊心培
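National Facility for Protein Science in Shanghai, Shanghai Advanced Research Institute, Chinese Academy of Sciences

Methods for in-depth analysis of metaproteomes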

Abstract:

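Tandem mass spectrometry-based metaproteomics is an important technique for studying complex microbial communities. However, the protein sequence databases involved are very large, which greatly enlarges the search space and severely affects the identification rate and accuracy of microbial peptides. In recent years, deep learning has shown strong potential in proteomics and has been used for tasks such as retention time prediction and spectrum prediction, so it is expected to further improve metaproteome analysis. Yet how to combine deep learning methods with the complex characteristics of metaproteomic data has not been studied in depth. To address these problems, we developed an in-depth analysis method designed specifically for metaproteomic data. The method markedly improves the identification rate and analysis depth of metaproteomic data and provides an effective new tool for proteomic studies of complex microbial communities.

Keywords:

metaproteome, in-depth analysis, multi-step database search, deep learning, mass spectrometry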

夏俊
The Hong Kong University of Science and Technology (Guangzhou)

SpectraAI: Deciphering Proteomic Dark Matters with Foundation Models

Abstract:

Accurate identification of proteins is crucial for uncovering their complex roles in biological systems, with peptide sequencing being a key step in this process. The two primary methods for peptide sequencing are database search and de novo sequencing. Database search achieves high accuracy by matching experimental spectra with peptide sequences in a database, but it cannot identify novel, modified, or mutated peptides that are absent from the database (the "dark matter" of proteomics). De novo sequencing, on the other hand, does not rely on a pre-built database and so enables the discovery of novel protein sequences; however, its accuracy still falls short of real-world application requirements. In this talk, I will introduce a series of our works in AI-driven protein identification: 1. AdaNovo, a de novo sequencing algorithm designed for the identification of post-translational modifications (PTMs); 2. SearchNovo, a novel protein identification paradigm that combines the advantages of database search and de novo sequencing; 3. NovoBench, the first comprehensive deep learning benchmark for de novo sequencing methods; and 4. UltraProt, the first large-scale foundation model for mass spectrometry-based proteomics, which has achieved significant performance advancements in protein identification. Finally, I will share insights and future perspectives on SpectraAI for proteomics and metabolomics.

王宇
Peng Cheng Laboratory

π-HelixNovo2: making accurate online de novo peptide sequencing available to all

Abstract:

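To address the limited accuracy of de novo protein sequencing methods, we first proposed the new concept of the complementary spectrum to enhance the signal in protein MS/MS spectra and developed the de novo sequencing model π-HelixNovo. Theoretical analysis and case studies demonstrated the effectiveness of the complementary spectrum strategy in enhancing MS/MS signals. For the decoding process, we introduced a bidirectional decoder strategy to improve accuracy when decoding amino acid sequences, and combined complementary spectra with bidirectional decoding to develop π-HelixNovo2. Comparative experiments show that π-HelixNovo2 delivers stable and significant performance gains, improving the reliability of de novo sequencing models. In this work we adapted the model to a Chinese-made chip (the Huawei Ascend NPU) and connected it to the China Computing Power Network through the OpenI AI collaboration platform, offering users a greatly simplified model deployment and usage experience. With the help of the large-scale scientific facility "Peng Cheng Cloud Brain II" and distributed computing, the model can run in parallel across multiple nodes and multiple NPUs, giving it fast, high-throughput analysis capability.

Keywords:

de novo peptide sequencing, π-HelixNovo2, complementary spectrum, bidirectional decoder, high-throughput analysis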
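
The complementary-spectrum idea can be illustrated with the standard mass relation between paired b and y ions: for singly charged fragments of the same peptide, the m/z of a b ion plus the m/z of its partner y ion equals the neutral peptide mass plus two proton masses. The minimal sketch below reflects each observed peak about that sum. The function name, the assumption that all fragments are singly charged, and the example values are hypothetical; the abstract does not describe how π-HelixNovo actually builds or encodes its complementary spectra.

```python
PROTON = 1.007276  # proton mass in Da


def complementary_spectrum(peaks, precursor_mz, charge):
    """Mirror each (m/z, intensity) peak about the b/y pair sum.

    Assumes singly charged fragments, for which
        mz(b_i) + mz(y_(n-i)) = neutral peptide mass + 2 * PROTON.
    Peaks at or beyond the pair sum have no positive mirror and are dropped.
    """
    neutral_mass = (precursor_mz - PROTON) * charge
    pair_sum = neutral_mass + 2 * PROTON
    mirrored = [(pair_sum - mz, intensity)
                for mz, intensity in peaks
                if 0 < mz < pair_sum]
    return sorted(mirrored)


# Hypothetical doubly charged precursor at m/z 500, so the pair sum is 1000.
observed = [(300.0, 10.0), (750.0, 5.0)]
for mz, intensity in complementary_spectrum(observed, precursor_mz=500.0, charge=2):
    print(f"{mz:.3f}\t{intensity}")
```

If only one ion of a b/y pair was observed, its mirrored peak falls where the missing partner would have been, which is one way to see why adding the mirrored peaks can strengthen the signal available to a sequencing model.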

王海鹏
Shandong University of Technology

Multi-strategy de novo peptide sequencing with enhanced ion coverage

Abstract:

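De novo peptide sequencing infers amino acid sequences directly from tandem mass spectra and is an important means of identifying non-canonical or mutated peptides; its accuracy depends heavily on fragment ion coverage. Taking enhanced fragment ion coverage as the starting point and building on deep learning, this talk explores three strategies for improving sequencing accuracy. GCNovo uses additional fragment ion types (such as internal ions) and explicitly models the relationships among consecutive, homologous, and complementary ions with a graph convolutional network, achieving marked improvements in peptide recall and amino acid recall on datasets from nine species while keeping the parameter count small. MirrorNovo uses a mirror-protease digestion strategy (e.g., trypsin and LysargiNase) and fuses the two complementary mirror spectra in feature space, effectively compensating for the fragment ions commonly missing from single-protease spectra; on datasets from five species, average peptide recall improved by 9.3% in relative terms and average amino acid recall exceeded 95%. MultiNovo fuses spectra from multiple fragmentation methods to achieve high-accuracy de novo sequencing of single and multiple spectra within a unified framework, easing the limitation that current deep learning models handle only a single data type, and it has the potential to distinguish leucine (Leu) from isoleucine (Ile) using diagnostic ions. These strategies are expected to improve the accuracy and applicability of de novo sequencing across application scenarios and to inform the development of deep learning sequencing models.

Keywords:

de novo peptide sequencing, ion coverage, deep learning, multi-strategy, mass spectrometry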

单宝珍
Bioinformatics Solutions Inc.

Facts or knowledge? Overfitting problem in deep learning modeling training for de novo peptide sequencing

Abstract:

In recent years, the performance of AI technology based on deep neural networks has improved almost daily. It has helped solve practical problems in many areas, especially natural language understanding and computer vision. The experience gained there can be readily ported to problems in other areas, such as mass spectrometry data analysis. Since the introduction of the DeepNovo algorithm in 2017, deep learning has significantly improved the performance, and even the speed, of de novo peptide sequencing. However, artifacts of deep learning models, such as overfitting, have also been reported. In this talk, we present some observations from training LLMs on mass spectrometry data, with a view to avoiding overfitting. To our knowledge, this remains an open question that needs the community's attention.

Keywords:

de novo peptide sequencing, deep learning, overfitting, mass spectrometry, LLM

曾文锋
Westlake University

FeNNet-MHC: A foundation model for peptide-HLA representation learning and shared neoepitope discovery

Abstract:

MHC-presented peptides form the molecular interface between intracellular protein states and T cell–mediated immune recognition. Mass spectrometry-based immunopeptidomics enables direct detection of naturally presented HLA-bound peptides, facilitating neoantigen discovery. Among these, shared neoepitopes—arising from recurrent driver mutations and presented across multiple HLA alleles—offer unique opportunities for off-the-shelf immunotherapies. However, detecting shared neoepitopes remains difficult due to the massive combinatorial space of peptide–HLA pairs and limitations in current AI models, which learn allele-specific binding but do not generalize across HLA types. We developed FeNNet-MHC, a foundation model that jointly embeds peptide and HLA sequences via contrastive learning. The model was trained on ~600,000 HLA-peptide pairs from 142 HLA class I alleles. The HLA encoder was initialized from the pretrained ESM2 model, while the peptide encoder was trained using the AlphaPeptDeep framework. Joint embeddings were indexed using FAISS to enable scalable binding prediction and epitope retrieval. FeNNet-MHC enabled accurate prediction of MS-detected immunopeptides, correcting false identifications and increasing detections by up to 100% for some public data. Clustering embeddings of >15,000 known HLA variants revealed 24 dominant HLA groups with distinct, consistent peptide-binding motifs. Importantly, shared epitopes could be identified by querying the peptide embedding space across HLA variants, supporting cross-allelic epitope discovery.

Keywords:

FeNNet-MHC, peptide-HLA, representation learning, shared neoepitope, immunopeptidomics
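
The retrieval step described in the abstract above can be pictured as a nearest-neighbor search over a shared embedding space. The sketch below uses a brute-force cosine-similarity search in plain Python as a stand-in for the FAISS index the abstract mentions. The toy vectors, the placeholder labels, and the function names are all hypothetical; real embeddings would come from the trained peptide and HLA encoders, which are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query, index, k=2):
    """Return the k (label, score) pairs most similar to the query vector."""
    scored = [(label, cosine(query, vector)) for label, vector in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "peptide embeddings" with placeholder labels.
peptide_index = {
    "PEP_A": [1.0, 0.0, 0.0],
    "PEP_B": [0.7, 0.7, 0.0],
    "PEP_C": [0.0, 0.0, 1.0],
}
hla_query = [1.0, 0.1, 0.0]  # toy "HLA embedding" used as the query

hits = top_k(hla_query, peptide_index, k=2)
for label, score in hits:
    print(f"{label}\t{score:.3f}")
```

Running the same query for several HLA embeddings and intersecting the returned peptide sets is one simple way to look for candidates shared across alleles. A dedicated index becomes necessary once the number of stored vectors is too large for an exhaustive scan.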

姚建华
Tencent AI for Life Sciences Lab

Enhancing proteomics data analysis with artificial intelligence

Abstract:

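In recent years, advances in proteomics have greatly improved our understanding of the biology of complex organisms, particularly in areas such as the tumor microenvironment, normal tissue homeostasis, and disease development. However, the rapidly growing volume of diverse proteomics data calls for better data analysis methods. Artificial intelligence (AI) has recently seen major breakthroughs, with tools such as unsupervised learning, transfer learning, and large language models emerging. This talk will present several ongoing projects at the Tencent AI for Life Sciences Lab, focusing on applications of advanced AI techniques to proteomics data analysis, including cellular proteome embedding, proteome deconvolution, spatial proteome modeling, and database construction.

Keywords:

artificial intelligence, proteomics, data analysis, unsupervised learning, transfer learning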

常乘
National Center for Protein Sciences (Beijing)

AI-based systematic mining of unknown proteins

Abstract:

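Studies have shown that a large number of unknown proteins translated from non-canonical open reading frames (ncORFs) remain to be explored, and that these unknown proteins often play key regulatory roles in important physiological and pathological processes. Systematically mining these unknown proteins, the "dark proteome", will help deepen our understanding of the complexity of living systems. To this end, we developed π-SPECFormer, a pretrained proteomics model based on self-supervised learning. Trained on 291 million mass spectra, it learns latent patterns in spectra through a contrastive learning framework and has been applied to downstream tasks such as de novo sequencing, spectrum clustering, and post-translational modification identification, achieving marked performance gains in each. Furthermore, when π-SPECFormer was applied to the reanalysis of a human proteomics dataset, identifications at the spectrum, peptide, and protein levels increased by 280.0%, 128.7%, and 25.8%, respectively, and a large number of new proteins containing mutations were discovered. This shows that pretrained proteomics models have great application value and potential, and it offers a new approach to the systematic mining of unknown proteins.

Keywords:

unknown proteins, artificial intelligence, self-supervised learning, pretrained model, mass spectrometry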