dbPepVar

dbPepVar is a new proteogenomics database that combines genetic variation information from dbSNP with protein sequences from NCBI's RefSeq. We performed a pan-cancer analysis (ovarian, colorectal, breast and prostate) using public mass spectrometry datasets to identify genetic variations and genes present in the analyzed samples. As a result, we identified 2,661 variant peptides in breast cancer (BrCa), 2,411 in colorectal cancer (CrCa), 3,726 in ovarian cancer (OvCa), and 2,543 in prostate cancer (PrCa).

Compared to other approaches, our database contains a greater diversity of variants, including missense and nonsense mutations, loss of the termination codon, insertions and deletions (of any size), frameshifts, and mutations that alter the translation start site. For each protein, only the variant tryptic peptides (those derived from cleavage with trypsin) are included, subject to criteria on peptide size, allele frequency and affected protein region. In our approach, mass spectrometry (MS) data are searched against the dbPepVar variant database and the reference database separately. The two outputs are then compared and filtered by the scores obtained against each database. Using public MS data from four types of cancer, we identified mostly cancer-specific SNPs, while mutations shared between cancer types were less frequent.
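The comparison of the two search outputs can be sketched as follows. This is a minimal illustration, not the dbPepVar pipeline itself: the `Match` type, the `keep_variant_matches` function, the `min_score` parameter and the rule "keep a variant match only when it outscores the reference match for the same spectrum" are all assumptions made for the example.

```python
from typing import Dict, NamedTuple


class Match(NamedTuple):
    """One peptide-spectrum match (hypothetical structure)."""
    peptide: str
    score: float


def keep_variant_matches(
    variant: Dict[str, Match],
    reference: Dict[str, Match],
    min_score: float = 0.0,
) -> Dict[str, Match]:
    """Keep variant matches that pass min_score and beat the reference match.

    Both inputs map a spectrum identifier to its best match in the
    respective search. The decision rule here is an assumption.
    """
    kept = {}
    for spectrum_id, v in variant.items():
        if v.score < min_score:
            continue  # variant match too weak on its own
        r = reference.get(spectrum_id)
        if r is None or v.score > r.score:
            kept[spectrum_id] = v  # no reference match, or variant scores higher
    return kept


# Toy data; spectrum identifiers, peptides and scores are invented.
variant_hits = {
    "s1": Match("PEPTIDEK", 50.0),
    "s2": Match("AAAK", 20.0),
    "s3": Match("CCCR", 40.0),
}
reference_hits = {
    "s1": Match("PEPTIDER", 30.0),
    "s2": Match("AAGK", 25.0),
}
print(sorted(keep_variant_matches(variant_hits, reference_hits, min_score=30.0)))
```

Running the two searches separately and comparing afterwards keeps the reference evidence visible for every spectrum, so a variant peptide is reported only when the reference search does not explain the spectrum equally well.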

Click on a plot legend entry to show or hide the corresponding labels.
Data tables can be searched using regular expressions.

Mutated Genes of Samples by Cancer

Mutated Genes of unique SNPs identified from Peptides

Amino acid changes of Samples by Cancer

Amino acid changes of unique SNPs identified from Peptides

Properties Changes of Samples by Cancer

Mutations of Samples per Chromosome by Cancer

Citation

# If you have used dbPepVar data, please cite:

Lucas Marques da Cunha¹,², Patrick Terrematte³, Tayná da Silva Fiúza¹, Vandeclécio L. da Silva¹, José Eduardo Kroll¹, Sandro José de Souza¹,², Gustavo Antônio de Souza¹,⁵ (2022) "dbPepVar: a novel cancer proteogenomics database". IEEE Access, vol. 10, pp. 90982-90994, doi: 10.1109/ACCESS.2022.3201897.

# If you have used Proteogenomics Viewer, please cite:

José Eduardo Kroll¹, Vandeclécio L. da Silva¹, Sandro José de Souza¹,², Gustavo Antônio de Souza¹,⁵ (2017) "A tool for integrating genetic and mass spectrometry‐based peptide data: Proteogenomics Viewer - A genome browser‐like tool, which includes MS data visualization and peptide identification parameters". Bioessays 39 (7), https://doi.org/10.1002/bies.201700015

Affiliations

¹Bioinformatics Multidisciplinary Environment - BioME, Federal University of Rio Grande do Norte - UFRN, Brazil
²Federal University of Rondonia - UNIR, Brazil
³Metropolis Digital Institute, UFRN, Brazil
⁴Brain Institute, UFRN, Brazil
⁵Department of Biochemistry, UFRN, Brazil

Contact

The dbPepVar team can import user data on demand. Before new data are incorporated, we run sanity checks together with the curation, data cleaning and preparation steps required for integration.

# If you wish to incorporate new data, or send suggestions, please contact:

lucas.marques {at} unir.br.

Presentation of Proteogenomics Viewer:

Citation:

Kroll, J.E., da Silva, V.L., de Souza, S.J. and de Souza, G.A. (2017) "A tool for integrating genetic and mass spectrometry‐based peptide data: Proteogenomics Viewer - A genome browser‐like tool, which includes MS data visualization and peptide identification parameters". Bioessays 39 (7), https://doi.org/10.1002/bies.201700015 .

Fasta file:

Fasta dbPepVar

Log files:

Missense and Nonsense mutations - Minor allele frequency (MAF) < 5%

Missense and Nonsense mutations - Minor allele frequency (MAF) >= 5%

Frameshift mutations

Stop loss mutations

UTR'Var mutations

Data format description:

The dbPepVar FASTA file is built as follows:

A) First, the reference protein is mutated according to dbSNP information. The mutated peptides are then located in the resulting protein sequence.

B) A list of the mutated peptides is generated for each protein present in RefSeq.

C) The final FASTA file is generated by concatenating the mutated peptides of each protein into a new theoretical sequence.
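
Steps A to C can be sketched for a single missense variant as follows. This is an illustration under stated assumptions rather than the actual dbPepVar code: the function names, the cleavage rule (cut after K or R unless the next residue is P), the `min_len`/`max_len` size filter and the FASTA header layout are all hypothetical, and the sequences are toy data.

```python
from typing import List, Optional


def apply_missense(protein: str, position: int, ref: str, alt: str) -> str:
    """Step A: substitute one residue (1-based position) in the reference protein."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[: position - 1] + alt + protein[position:]


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R, except when followed by P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start : i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def variant_peptides(
    reference: str, mutated: str, min_len: int = 1, max_len: Optional[int] = None
) -> List[str]:
    """Step B: peptides of the mutated protein absent from the reference digest."""
    known = set(tryptic_digest(reference))
    return [
        p
        for p in tryptic_digest(mutated)
        if p not in known
        and len(p) >= min_len
        and (max_len is None or len(p) <= max_len)
    ]


def fasta_entry(protein_id: str, peptides: List[str]) -> str:
    """Step C: concatenate the variant peptides into one theoretical sequence."""
    return f">{protein_id}\n{''.join(peptides)}\n"


reference = "MAKRPLKGGR"  # toy sequence, not a real RefSeq protein
mutated = apply_missense(reference, 8, "G", "E")
print(fasta_entry("example_protein", variant_peptides(reference, mutated)))
```

Comparing the digest of the mutated protein with the digest of the reference is one simple way to locate the mutated peptides; it also picks up peptides that change only because a variant creates or removes a cleavage site.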

dbPepVar provides log files containing information about the mutated peptides. The header fields, delimited by tabs, are the protein identifier (RefSeq), the SNP identifier, and the position of the peptide in the reference protein. Each entry contains the reference sequence and the mutated peptide sequence. Each mutation type is stored in a separate file, and missense and nonsense mutations are split into files by minor allele frequency (MAF).
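
A log file of this kind could be read with a short parser such as the one below. The exact line layout is not specified here, so the sketch assumes that each entry spans three lines: a tab-delimited header (protein identifier, SNP identifier, position), then the reference sequence, then the mutated peptide. The `LogEntry` type, the `parse_log` function and the sample identifiers are hypothetical; adjust the parser if the downloaded files are laid out differently.

```python
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    protein_id: str          # RefSeq protein identifier
    snp_id: str              # dbSNP identifier
    position: str            # kept as text; the position format is not assumed
    reference_sequence: str
    mutated_peptide: str


def parse_log(lines: Iterable[str]) -> List[LogEntry]:
    """Parse entries under the assumed three-lines-per-entry layout."""
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected three lines per entry")
    entries = []
    for i in range(0, len(rows), 3):
        # Tolerate an optional FASTA-style '>' before the header fields.
        fields = rows[i].lstrip(">").split("\t")
        if len(fields) != 3:
            raise ValueError(f"malformed header: {rows[i]!r}")
        entries.append(LogEntry(*fields, rows[i + 1].strip(), rows[i + 2].strip()))
    return entries


# Invented sample; identifiers and sequences are placeholders.
sample = [
    "example_protein\texample_snp\t8-10\n",
    "GGR\n",
    "EGR\n",
]
for entry in parse_log(sample):
    print(entry.protein_id, entry.snp_id, entry.reference_sequence, "->", entry.mutated_peptide)
```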