See https://omics.pnl.gov/software/ms-gf for more information on how to perform a database search on an MS/MS dataset with MS-GF+ and how to generate an mzID file. Note that most functions in this package require data from a competitive target-decoy search.
parse_msgf_mzid(mzid_path)
| Argument | Description |
|---|---|
| mzid_path | Location of the mzID file. |
A data frame containing the following 7 columns:
spec_id: Id of the spectrum from the searched dataset file.
sequence: Amino acid sequence matching the spectrum.
protein_id: Id of the sequence from the database file.
score: Score assigned to the peptide-to-spectrum match (PSM).
database: Name of the database file used to search the spectra.
decoy: TRUE if decoy PSM, FALSE otherwise.
database_size: Number of sequences in the database file.
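As a quick sanity check on a table of this shape, the following sketch verifies that the seven columns are present with the types shown in the example output on this page (spec_id, sequence, protein_id, score, database, decoy, database_size). `check_psm_table` is a hypothetical helper written for illustration, not a function of this package, and the toy rows are made up.

```r
## Column names and types as they appear in the example tibble output.
expected_types <- c(
  spec_id = "numeric", sequence = "character", protein_id = "character",
  score = "numeric", database = "character", decoy = "logical",
  database_size = "numeric"
)

## Hypothetical helper: stop with an informative message if a column
## is missing or has an unexpected type, otherwise return TRUE invisibly.
check_psm_table <- function(x) {
  missing_cols <- setdiff(names(expected_types), names(x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  actual <- vapply(x[names(expected_types)],
                   function(col) class(col)[1], character(1))
  bad <- names(actual)[actual != expected_types]
  if (length(bad) > 0) {
    stop("Unexpected type in columns: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

## Made-up rows in the same shape as the parsed result.
toy <- data.frame(
  spec_id = c(1, 2),
  sequence = c("PEPTIDEK", "ELVISLIVESK"),
  protein_id = c("P1", "P2"),
  score = c(1e-12, 3e-8),
  database = c("db.fasta", "db.fasta"),
  decoy = c(FALSE, TRUE),
  database_size = c(100, 100),
  stringsAsFactors = FALSE
)

check_psm_table(toy)
```

On real data the same check could be run on the result of parse_msgf_mzid() before passing it on to other functions.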
We take the MS-GF+ SpecEValue as the PSM score for FDR calculation.
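Because the SpecEValue is an E-value-like quantity, smaller scores indicate more confident matches. The sketch below shows one common way to turn such scores and the decoy column into a target-decoy FDR estimate. It uses base R on a small made-up data frame; `estimate_fdr` is a hypothetical helper, not a function of this package, and the estimator (number of decoys divided by number of targets at each score threshold) is an assumption about the kind of calculation meant here.

```r
## Toy PSM table (made-up values) with the score and decoy columns
## in the same shape as the parse_msgf_mzid() output.
psms <- data.frame(
  score = c(1e-20, 3e-18, 5e-15, 2e-12, 7e-10, 4e-9),
  decoy = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

## Hypothetical helper: estimated FDR at each score threshold, computed as
## (# decoys) / (# targets) among all PSMs scoring at least as well.
estimate_fdr <- function(score, decoy) {
  ord <- order(score)                   # lower SpecEValue = better match
  n_decoy <- cumsum(decoy[ord])         # decoys accepted so far
  n_target <- cumsum(!decoy[ord])       # targets accepted so far
  fdr <- n_decoy / pmax(n_target, 1)    # guard against division by zero
  fdr[order(ord)]                       # back to the original row order
}

psms$fdr <- estimate_fdr(psms$score, psms$decoy)
psms
```

Ties in the score are not handled specially in this sketch, and no monotone q-value correction is applied.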
    ## Location of the zipped data files
    zip_file_path = system.file("extdata", "extdata.zip", package = "saas")

    ## Unzip and get the (temporary) location of the mzid file with the MS-GF+ search results from a
    ## competitive target decoy search of the complete pyrococcus proteome against a pyrococcus dataset.
    mzid_file_path = unzip(zip_file_path, 'pyrococcus.mzid', exdir = tempdir())

    ## Parse the mzid file
    parse_msgf_mzid(mzid_file_path)
    #> # A tibble: 15,639 x 7
    #>    spec_id sequence                       protein_id
    #>      <dbl> <chr>                          <chr>
    #>  1    9834 GLEVSGYNCYIYPAMALAYGTSAIGAHHK  Q8U1K3|Formaldehyde:ferredoxin
    #>  2   10918 MLVDSLGDIVITNDGATILDEMDIQHPAAK Q8TZL6|Thermosome,
    #>  3   12207 IADEMGMDTISLGVSIAHVMEAVER      Q8U1K3|Formaldehyde:ferredoxin
    #>  4   12179 IADEMGMDTISLGVSIAHVMEAVER      Q8U1K3|Formaldehyde:ferredoxin
    #>  5   11387 MLVDSLGDIVITNDGATILDEMDIQHPAAK Q8TZL6|Thermosome,
    #>  6   11027 LLELMGIPIVQAPSEGEAQAAYMAAK     O93634|Flap
    #>  7    7833 AVNLNQFENDANFEAHYYGTAK         Q8TZW7|Cysteine
    #>  8    9573 LYDLGVQGADLIAMNTDAQHLAITK      Q8U3E3|Cell
    #>  9   12557 TFTATASQGLALMHEILFIAAGMR       Q51804|Pyruvate
    #> 10   11856 EYYWIDLGTPEDLFYAHQIALDQLSR     Q8U2G7|NDP-sugar
    #> # ... with 15,629 more rows, and 4 more variables: score <dbl>, database <chr>,
    #> #   decoy <lgl>, database_size <dbl>