Checks whether protein ids appear in the headers of a fasta file.
id_is_present(protein_id, fastapath)
Argument | Description |
---|---|
protein_id | Vector of protein ids. |
fastapath | Location of the fasta file. |
Logical vector: TRUE if the protein id is present in the provided fasta file, FALSE otherwise.
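To make the return value concrete, here is a minimal sketch of how a check like this could work in base R. It is not the package's actual implementation: `id_is_present_sketch` is a hypothetical stand-in, and it assumes that "present" means the id occurs as a literal substring of at least one header line (a line starting with `>`). The FASTA entries are made up.

```r
# Hypothetical stand-in for id_is_present(); not the package's actual code.
id_is_present_sketch <- function(protein_id, fastapath) {
  lines <- readLines(fastapath, warn = FALSE)
  # FASTA header lines start with ">"
  headers <- lines[startsWith(lines, ">")]
  # For each id, test whether it occurs as a literal substring of any header.
  # fixed = TRUE keeps characters such as "|" from being read as regex syntax.
  vapply(
    protein_id,
    function(id) any(grepl(id, headers, fixed = TRUE)),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Toy fasta file with two made-up entries
toy_fasta <- tempfile(fileext = ".fasta")
writeLines(
  c(">sp|P00001|TOY1 made-up protein one", "MKTAYIAK",
    ">sp|P00002|TOY2 made-up protein two", "MSTNPKPQ"),
  toy_fasta
)

result <- id_is_present_sketch(c("P00001", "P99999", "P00002"), toy_fasta)
print(result)
#> [1]  TRUE FALSE  TRUE
```

Literal substring matching is the simplest choice, but it would also match an id that is a prefix of a longer id, so a stricter variant might compare against parsed header fields instead.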
## Location of the zipped data files
zip_file_path = system.file("extdata", "extdata.zip", package = "saas")

## Unzip and get the (temporary) location of the mzid file with the MS-GF+ search results from a
## competitive target decoy search of the complete pyrococcus proteome against a pyrococcus dataset.
mzid_file_path = unzip(zip_file_path, 'pyrococcus.mzid', exdir = tempdir())

## Read and parse the mzid file
dat = parse_msgf_mzid(mzid_file_path)

## Unzip and get the (temporary) location of the file with fasta headers.
## Each fasta header contains a protein_id from the protein subset of interest.
## These protein_ids match the protein_ids in the mzid result file.
fasta_file_path = unzip(zip_file_path, 'transferase_activity_[GO:0016740].fasta', exdir = tempdir())

protein_ids = unique(dat$protein_id)
head(protein_ids)
#> [1] "Q8U1K3|Formaldehyde:ferredoxin" "Q8TZL6|Thermosome,"
#> [3] "O93634|Flap"                    "Q8TZW7|Cysteine"
#> [5] "Q8U3E3|Cell"                    "Q51804|Pyruvate"

is_subset = id_is_present(protein_ids, fasta_file_path)

## Check how many of the identified proteins are subset and non-subset proteins.
table(is_subset)
#> is_subset
#> FALSE  TRUE
#>  1587    20
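The logical vector can also be used for indexing. The sketch below uses made-up ids and flags in place of `protein_ids` and `is_subset`, and a toy data frame in place of the parsed search results. Only the `protein_id` column comes from the example above; the `score` column and all values are hypothetical.

```r
# Made-up ids and flags standing in for protein_ids and is_subset
protein_ids <- c("A1", "B2", "C3", "D4")
is_subset <- c(FALSE, TRUE, FALSE, TRUE)

# Split the ids into subset and non-subset proteins
subset_ids <- protein_ids[is_subset]
non_subset_ids <- protein_ids[!is_subset]

# Toy stand-in for parsed search results; "score" is a hypothetical column
dat <- data.frame(
  protein_id = c("A1", "B2", "B2", "D4", "C3"),
  score = c(10, 20, 30, 40, 50)
)

# Flag every row whose protein belongs to the subset
dat$subset <- dat$protein_id %in% subset_ids
table(dat$subset)
```

Because `protein_ids` holds unique ids while the results may contain several rows per protein, mapping the flags back with `%in%` keeps the row-level data and the id-level check in agreement.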