See https://omics.pnl.gov/software/ms-gf for more information on how to run a database search on an MS/MS dataset with MS-GF+ and how to generate an mzID file. Note that most functions in this package require data from a competitive target-decoy search.

parse_msgf_mzid(mzid_path)

Arguments

mzid_path

Location of the mzID file.

Value

A data frame containing the following 7 columns:

spec_id

Id of the spectrum from the searched dataset file.

sequence

Amino acid sequence matched to the spectrum.

protein_id

Id of the sequence from the database file.

score

Score assigned to the peptide-to-spectrum match (PSM).

database

Name of the database file used to search the spectra.

decoy

TRUE if decoy PSM, FALSE otherwise.

database_size

Number of sequences in the database file.

Details

We take the MS-GF+ SpecEValue as the PSM score for FDR calculation.
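
As a rough illustration of how a score and a decoy flag can feed a target-decoy FDR estimate, the sketch below works on a small invented data frame that has only the score and decoy columns described above. It is not the package's own FDR routine. It assumes that a smaller SpecEValue indicates a better match and uses the simple decoys/targets ratio as the estimator; the fdr and qvalue columns are hypothetical names introduced here.

```r
# Toy PSM table mimicking the 'score' and 'decoy' columns of the parser output.
# All values are invented for illustration only.
psms <- data.frame(
  score = c(1e-20, 1e-15, 1e-12, 1e-10, 1e-9, 1e-8, 1e-6, 1e-5),
  decoy = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Assumption: a smaller SpecEValue is a better match, so sort ascending.
psms <- psms[order(psms$score), ]

# Running counts of decoy and target PSMs at each score threshold.
n_decoy  <- cumsum(psms$decoy)
n_target <- cumsum(!psms$decoy)

# Simple target-decoy FDR estimate at each threshold (hypothetical column).
psms$fdr <- n_decoy / pmax(n_target, 1)

# q-value: the smallest FDR at which a PSM would still be accepted
# (hypothetical column), obtained as a running minimum from the worst score up.
psms$qvalue <- rev(cummin(rev(psms$fdr)))

print(psms)
```

Sorting first and then taking cumulative counts keeps the estimate monotone in the number of accepted PSMs once the running minimum is applied, which is why the q-value is usually preferred over the raw ratio for thresholding.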

Examples

## Location of the zipped data files
zip_file_path = system.file("extdata", "extdata.zip", package = "saas")

## Unzip and get the (temporary) location of the mzid file with the MS-GF+ search results from a
## competitive target decoy search of the complete pyrococcus proteome against a pyrococcus dataset.
mzid_file_path = unzip(zip_file_path, 'pyrococcus.mzid', exdir = tempdir())

## Parse the mzid file
parse_msgf_mzid(mzid_file_path)
#> # A tibble: 15,639 x 7
#>    spec_id sequence                       protein_id
#>      <dbl> <chr>                          <chr>
#>  1    9834 GLEVSGYNCYIYPAMALAYGTSAIGAHHK  Q8U1K3|Formaldehyde:ferredoxin
#>  2   10918 MLVDSLGDIVITNDGATILDEMDIQHPAAK Q8TZL6|Thermosome,
#>  3   12207 IADEMGMDTISLGVSIAHVMEAVER      Q8U1K3|Formaldehyde:ferredoxin
#>  4   12179 IADEMGMDTISLGVSIAHVMEAVER      Q8U1K3|Formaldehyde:ferredoxin
#>  5   11387 MLVDSLGDIVITNDGATILDEMDIQHPAAK Q8TZL6|Thermosome,
#>  6   11027 LLELMGIPIVQAPSEGEAQAAYMAAK     O93634|Flap
#>  7    7833 AVNLNQFENDANFEAHYYGTAK         Q8TZW7|Cysteine
#>  8    9573 LYDLGVQGADLIAMNTDAQHLAITK      Q8U3E3|Cell
#>  9   12557 TFTATASQGLALMHEILFIAAGMR       Q51804|Pyruvate
#> 10   11856 EYYWIDLGTPEDLFYAHQIALDQLSR     Q8U2G7|NDP-sugar
#> # ... with 15,629 more rows, and 4 more variables: score <dbl>, database <chr>,
#> #   decoy <lgl>, database_size <dbl>