dbPepVar is a new proteogenomics database that combines genetic variation information from dbSNP with protein sequences from NCBI's RefSeq. We performed a pan-cancer analysis (ovarian, colorectal, breast, and prostate) using public mass spectrometry datasets to identify the genetic variations and genes present in the analyzed samples. As a result, we identified 2,661 variant peptides in breast cancer (BrCa), 2,411 in colorectal cancer (CrCa), 3,726 in ovarian cancer (OvCa), and 2,543 in prostate cancer (PrCa).

Compared to other approaches, our database contains a greater diversity of variants, including missense and nonsense mutations, loss of the termination codon, insertions, deletions (of any size), frameshifts, and mutations that alter the translation start site. In addition, for each protein, only the variant peptides derived from enzymatic cleavage with trypsin are included, subject to criteria on peptide size, allele frequency, and the affected regions of the protein. In our approach, mass spectrometry (MS) data are searched against the dbPepVar variant database and the reference database separately. The outputs are then compared and filtered by the scores obtained against each database. Using public MS data from four types of cancer, we mostly identified cancer-specific SNPs; mutations shared between cancer types were also present, but in smaller numbers.
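The comparison of the two search outputs can be illustrated with a minimal sketch. This is not the dbPepVar pipeline itself: the function name `select_variant_hits`, the `min_score` cutoff, the "higher score is better" convention, and the rule that a variant hit must outscore the reference hit for the same spectrum are all assumptions made for illustration.

```python
def select_variant_hits(variant_hits, reference_hits, min_score=0.0):
    """Keep variant identifications that pass a score cutoff and are not
    matched at least as well by the reference database.

    Both inputs map a spectrum id to a (peptide, score) tuple.
    Assumption: higher scores are better; the real criteria may differ.
    """
    selected = {}
    for spectrum_id, (peptide, score) in variant_hits.items():
        if score < min_score:
            continue  # variant match too weak
        ref = reference_hits.get(spectrum_id)
        if ref is not None and ref[1] >= score:
            continue  # the reference peptide explains the spectrum as well or better
        selected[spectrum_id] = (peptide, score)
    return selected


# Toy data, not real search results.
variant_hits = {
    "s1": ("TACIAK", 50.0),
    "s2": ("GLRPEVK", 20.0),
    "s3": ("LSPEVK", 45.0),
}
reference_hits = {
    "s1": ("TAYIAK", 30.0),
    "s3": ("LSPEVR", 60.0),
}
kept = select_variant_hits(variant_hits, reference_hits, min_score=25.0)
print(kept)
```

In this toy case only spectrum `s1` is kept: `s2` falls below the cutoff and `s3` is better explained by the reference search.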


Click on legends of plots to activate or deactivate labels.
Use regex to search in datatables.

Plots:
Mutated Genes of Samples by Cancer
Mutated Genes of unique SNPs identified from Peptides
Amino acid changes of Samples by Cancer
Amino acid changes of unique SNPs identified from Peptides
Properties Changes of Samples by Cancer
Mutations of Samples per Chromosome by Cancer

Citation

# If you have used dbPepVar data, please cite:
Lucas Marques da Cunha1,2, Patrick Terrematte3, Tayná da Silva Fiúza1, Vandeclécio L. da Silva1, José Eduardo Kroll1, Sandro José de Souza1,2, Gustavo Antônio de Souza1,5 (2022) "dbPepVar: a novel cancer proteogenomics database". IEEE Access, vol. 10, pp. 90982-90994, doi: 10.1109/ACCESS.2022.3201897.

# If you have used Proteogenomics Viewer, please cite:
José Eduardo Kroll1, Vandeclécio L. da Silva1, Sandro José de Souza1,2, Gustavo Antônio de Souza1,5 (2017) "A tool for integrating genetic and mass spectrometry‐based peptide data: Proteogenomics Viewer - A genome browser‐like tool, which includes MS data visualization and peptide identification parameters". Bioessays 39 (7), https://doi.org/10.1002/bies.201700015

Affiliations
1Bioinformatics Multidisciplinary Environment - BioME, Federal University of Rio Grande do Norte - UFRN, Brazil
2Federal University of Rondonia - UNIR, Brazil
3Metropolis Digital Institute, UFRN, Brazil
4Brain Institute, UFRN, Brazil
5Department of Biochemistry, UFRN, Brazil

Contact

The dbPepVar team is available to users who want to import their data on demand. To incorporate new data, we will run the sanity checks, curation, data cleaning, and preparation steps required for integration.
# If you wish to incorporate new data, or send suggestions, please contact:
lucas.marques {at} unir.br.


Presentation of Proteogenomics Viewer:

Citation:

Kroll, J.E., da Silva, V.L., de Souza, S.J. and de Souza, G.A. (2017) "A tool for integrating genetic and mass spectrometry‐based peptide data: Proteogenomics Viewer - A genome browser‐like tool, which includes MS data visualization and peptide identification parameters". Bioessays 39 (7), https://doi.org/10.1002/bies.201700015

Fasta file:

Log files:



Data format description:

The dbPepVar fasta file construction process:

A) Initially, the reference protein is mutated according to dbSNP information. The mutated peptides are then located on the generated protein.

B) A list containing the mutated peptides for each protein present in RefSeq is generated.

C) The final FASTA file is generated by concatenating the mutated peptides of each protein into a new theoretical sequence.
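Steps A to C can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to build dbPepVar: it handles missense substitutions alone, uses the common "cleave after K or R unless followed by P" trypsin rule, ignores the size and allele-frequency filters, and works on a toy sequence with made-up variants. The function names are hypothetical.

```python
import re


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def apply_missense(protein, position, ref_aa, alt_aa):
    """Step A: substitute one residue (1-based position) in the reference."""
    if protein[position - 1] != ref_aa:
        raise ValueError(f"expected {ref_aa} at position {position}")
    return protein[: position - 1] + alt_aa + protein[position:]


def variant_peptides(reference, mutated):
    """Step B: tryptic peptides of the mutated protein absent from the reference."""
    reference_set = set(tryptic_digest(reference))
    return [p for p in tryptic_digest(mutated) if p not in reference_set]


def build_variant_sequence(reference, variants):
    """Step C: concatenate the variant peptides of one protein."""
    peptides = []
    for position, ref_aa, alt_aa in variants:
        mutated = apply_missense(reference, position, ref_aa, alt_aa)
        for peptide in variant_peptides(reference, mutated):
            if peptide not in peptides:  # avoid repeating the same peptide
                peptides.append(peptide)
    return "".join(peptides)


# Toy protein and made-up variants (position, reference residue, variant residue).
REFERENCE = "MKTAYIAKQRGLSPEVK"
VARIANTS = [(5, "Y", "C"), (13, "S", "R")]

sequence = build_variant_sequence(REFERENCE, VARIANTS)
# Hypothetical FASTA header; real dbPepVar headers may look different.
print(">toy_protein_variants")
print(sequence)
```

Note that the second toy variant introduces an R directly before a P, so no new cleavage site is created and the whole peptide `GLRPEVK` is reported as the variant peptide.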



dbPepVar provides a log file containing information about the mutated peptides. The header fields are the protein identifier (RefSeq), the SNP identifier, and the position of the peptide in the reference protein; the fields are tab-delimited. Each entry contains the reference peptide sequence and the mutated peptide sequence. Each mutation type is stored in a separate file, and the missense and nonsense mutations are available in the Minor Allele Frequency (MAF) files.
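A log file of this kind could be read with a short parser such as the one below. The exact line layout is not specified here, so the sketch assumes three lines per entry: a tab-delimited header (protein identifier, SNP identifier, position), then the reference peptide, then the mutated peptide. That layout, the `LogEntry` class, the `parse_log` function, and the placeholder identifiers are assumptions and should be checked against an actual log file before use.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    protein_id: str         # RefSeq protein identifier
    snp_id: str             # dbSNP identifier
    position: str           # kept as text, since the exact format is not specified
    reference_peptide: str
    mutated_peptide: str


def parse_log(text):
    """Parse entries under the ASSUMED three-lines-per-entry layout."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected header, reference and mutated lines per entry")
    entries = []
    for i in range(0, len(lines), 3):
        fields = lines[i].split("\t")
        if len(fields) != 3:
            raise ValueError(f"header should have 3 tab-delimited fields: {lines[i]!r}")
        protein_id, snp_id, position = fields
        entries.append(
            LogEntry(protein_id, snp_id, position, lines[i + 1].strip(), lines[i + 2].strip())
        )
    return entries


# Placeholder identifiers and toy peptides, not real dbPepVar records.
SAMPLE = "NP_000000.0\trs0000000\t3\nTAYIAK\nTACIAK\n"

entries = parse_log(SAMPLE)
for entry in entries:
    print(entry.protein_id, entry.snp_id, entry.reference_peptide, "->", entry.mutated_peptide)
```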