If you have in-house standards that were acquired with MS2 spectra, you can construct an in-house MS2 spectral database using the metid package.

There are no specific requirements for how users acquire the LC/MS data. The purpose of in-house database construction in metid is to give users their own database (including the m/z, retention time and MS/MS spectra of metabolites, for level 1 annotation (Sumner et al., 2007)), so users only need to run the standards with the same column, LC gradient and MS settings as their real samples in the lab.

Data preparation

First, convert your raw standard MS data (positive and negative modes) to mzXML format using ProteoWizard. The parameter settings are shown in the figure below:

Data organization

Second, organize your standard information as a table and save it in csv or xlsx format. For the expected format, refer to the demo data in the demoData package.

Columns 1 to 11 are “Lab.ID”, “Compound.name”, “mz”, “RT”, “CAS.ID”, “HMDB.ID”, “KEGG.ID”, “Formula”, “mz.pos”, “mz.neg” and “Submitter”, in that order. You may include additional information for the standards; the demo data, for example, also contain “Family”, “Sub.pathway” and “Note”.

  • Lab.ID: Must be unique (no duplicates).

  • mz: Accurate mass of the compound.

  • RT: Retention time, in seconds.

  • mz.pos: Mass-to-charge ratio of the compound in positive mode, for example, M+H. You can set it to NA.

  • mz.neg: Mass-to-charge ratio of the compound in negative mode, for example, M-H. You can set it to NA.

  • Submitter: The name of the person or organization. You can set it to NA.
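
As a minimal sketch of the table layout described above, the following base R code builds a two-row standard information table and writes it to a csv file. The Lab.ID values, retention times and output file name are hypothetical and chosen only for illustration; the M+H and M-H values are derived from the accurate mass using the proton mass.

```r
# Illustrative standards only: Lab.ID values and retention times are made up.
proton <- 1.007276  # proton mass in Da, used for the M+H / M-H adducts

standard_info <- data.frame(
  Lab.ID        = c("STD_0001", "STD_0002"),
  Compound.name = c("L-Alanine", "Citric acid"),
  mz            = c(89.04768, 192.02700),  # accurate (monoisotopic) mass
  RT            = c(0.85, 1.20) * 60,      # minutes converted to seconds
  CAS.ID        = c("56-41-7", "77-92-9"),
  HMDB.ID       = c("HMDB0000161", "HMDB0000094"),
  KEGG.ID       = c("C00041", "C00158"),
  Formula       = c("C3H7NO2", "C6H8O7"),
  mz.pos        = NA_real_,                # filled in below; NA is also allowed
  mz.neg        = NA_real_,
  Submitter     = NA_character_,           # NA is allowed
  stringsAsFactors = FALSE
)

# Fill in the adduct m/z values from the accurate mass.
standard_info$mz.pos <- standard_info$mz + proton  # M+H
standard_info$mz.neg <- standard_info$mz - proton  # M-H

# Lab.ID must not contain duplicates.
stopifnot(!anyDuplicated(standard_info$Lab.ID))

# Hypothetical output file name; written to a temporary directory here.
out_file <- file.path(tempdir(), "metabolite.info_demo.csv")
write.csv(standard_info, out_file, row.names = FALSE)
```

Extra columns such as “Family” or “Note” can be appended after the eleven columns above.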

Then create a folder and put your mzXML files (positive mode in the ‘POS’ folder and negative mode in the ‘NEG’ folder) and the compound information table in it. The name of each mzXML file must contain the collision energy, in the form xxx_NCE25.mzXML, for example test_NCE25.mzXML.
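
The folder layout and naming rule above can be checked before building the database. The sketch below creates a throwaway folder with empty placeholder files (the folder and file names are hypothetical) and flags any mzXML file whose name lacks a collision energy tag. It assumes the tag always has the form _NCE followed by digits, as in the example file name.

```r
# Build a throwaway folder in tempdir() with the POS/NEG layout described above.
db_path <- file.path(tempdir(), "database_construction_demo")
dir.create(file.path(db_path, "POS"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(db_path, "NEG"), recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files standing in for real mzXML data.
# The last one deliberately omits the collision energy.
demo_files <- c("POS/test_NCE25.mzXML", "POS/test_NCE50.mzXML",
                "NEG/test_NCE25.mzXML", "NEG/test.mzXML")
invisible(file.create(file.path(db_path, demo_files)))

# List every mzXML file and flag names without "_NCE<digits>" before the extension.
mzxml_files <- list.files(db_path, pattern = "\\.mzXML$", recursive = TRUE)
has_energy  <- grepl("_NCE[0-9]+\\.mzXML$", mzxml_files)
bad_files   <- mzxml_files[!has_energy]

if (length(bad_files) > 0) {
  message("Files without a collision energy in the name: ",
          paste(bad_files, collapse = ", "))
}
```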

Run construct_database() function

Here we use the demo data from the demoData package to show how to construct a database with the construct_database() function.

We first prepare the dataset.

Download the demo data, and then put all of the files in the “database_construction” folder.

There will then be a folder named database_construction in your working directory, as the figure below shows:

Then we run the construct_database() function to get the database.

library(metid)

new.path <- file.path("./database_construction")

test.database <- construct_database(
  path = new.path,
  version = "0.0.1",
  metabolite.info.name = "metabolite.info_RPLC.csv",
  source = "Michael Snyder lab",
  link = "http://snyderlab.stanford.edu/",
  creater = "Xiaotao Shen",
  email = "shenxt1990@163.com",
  rt = TRUE,
  mz.tol = 15,
  rt.tol = 30,
  threads = 5
)
#> Reading metabolite information...
#> Reading positive MS2 data...
#> Reading MS2 data...
#> Processing...
#> OK
#> Reading negative MS2 data...
#> Reading MS2 data...
#> Processing...
#> OK
#> Matching metabolites with MS2 spectra (positive)...
#> OK
#> Matching metabolites with MS2 spectra (negative)...
#> OK
#> All done!

The arguments of construct_database() are described in its help page (?construct_database).

test.database is a databaseClass object; you can print it to see its information.
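
The mz.tol and rt.tol arguments set the tolerances used when standards are matched to MS2 spectra. As a rough illustration only, the sketch below assumes that mz.tol is expressed in ppm and rt.tol in seconds; this is an assumption that should be checked against the function documentation, and the helper ppm_error() and all m/z and RT values are hypothetical, not part of metid.

```r
# Hypothetical helper: ppm error between an observed precursor m/z
# and the expected m/z of a standard.
ppm_error <- function(observed_mz, expected_mz) {
  abs(observed_mz - expected_mz) / expected_mz * 1e6
}

# Illustrative values for one standard and one MS2 precursor.
expected_mz <- 90.05496  # e.g. the mz.pos value of a standard
expected_rt <- 51        # seconds
observed_mz <- 90.05550
observed_rt <- 60        # seconds

# Tolerances taken from the call above (units assumed: ppm and seconds).
mz_tol <- 15
rt_tol <- 30

mz_ok <- ppm_error(observed_mz, expected_mz) <= mz_tol
rt_ok <- abs(observed_rt - expected_rt) <= rt_tol

# Under these assumptions the spectrum is assigned only if both checks pass.
is_match <- mz_ok && rt_ok
```

Under these assumptions, tightening either tolerance reduces false assignments but can leave standards without a matched spectrum when the instrument drifts.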

test.database
#> -----------Base information------------
#> Version: 0.0.1 
#> Source: Michael Snyder lab 
#> Link: http://snyderlab.stanford.edu/ 
#> Creater: Xiaotao Shen ( shenxt1990@163.com )
#> With RT information
#> -----------Spectral information------------
#> There are 14 items of metabolites in database:
#> Lab.ID; Compound.name; mz; RT; CAS.ID; HMDB.ID; KEGG.ID; Formula; mz.pos; mz.neg; Submitter; Family; Sub.pathway; Note 
#> There are 170 metabolites in total
#> There are 113 metabolites in positive mode with MS2 spectra.
#> There are 112 metabolites in negative mode with MS2 spectra.
#> Collision energy in positive mode (number:):
#> Total number: 2 
#> NCE25; NCE50 
#> Collision energy in negative mode:
#> Total number: 2 
#> NCE25; NCE50 
#> 

Note: test.database is only a demo database (databaseClass object). We will not use it for the metabolite identification that follows. Please save this database in your local folder, and note that the saved file name and the database name must be the same. For example:

save(test.database, file = "test.database")

If you save test.database under a different file name, an error will occur when you use it.
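
The following sketch shows the save and reload round trip in base R. A plain list stands in for the real database object so that the example runs without the demo data, and the file is written to a temporary directory; both are assumptions made for illustration only.

```r
# Stand-in for the object returned by construct_database(); a plain list is
# used here only so that the example runs without the demo data.
test.database <- list(version = "0.0.1")

# The file name is the same as the object name.
db_file <- file.path(tempdir(), "test.database")
save(test.database, file = db_file)

# load() restores objects under the names they had when saved,
# regardless of the file name, and returns those names.
rm(test.database)
restored <- load(db_file)
restored
```

Because load() recreates the object under its original name, keeping the file name identical to the object name makes it clear which object a given file will restore.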

MS1 database

If you do not have MS2 data, you can also use the construct_database() function to construct an MS1 database.

Session information

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] massdataset_0.99.1 magrittr_2.0.1     tinytools_0.9.1    metid_1.2.1       
#> 
#> loaded via a namespace (and not attached):
#>   [1] colorspace_2.0-2      rjson_0.2.20          ellipsis_0.3.2       
#>   [4] leaflet_2.0.4.1       rprojroot_2.0.2       circlize_0.4.14      
#>   [7] GlobalOptions_0.1.2   fs_1.5.0              clue_0.3-59          
#>  [10] rstudioapi_0.13       mzR_2.26.1            listenv_0.8.0        
#>  [13] furrr_0.2.3           affyio_1.62.0         fansi_0.5.0          
#>  [16] codetools_0.2-18      ncdf4_1.17            doParallel_1.0.16    
#>  [19] cachem_1.0.5          impute_1.66.0         knitr_1.33           
#>  [22] jsonlite_1.7.2        Cairo_1.5-12.2        cluster_2.1.2        
#>  [25] vsn_3.60.0            png_0.1-7             BiocManager_1.30.16  
#>  [28] readr_2.0.0           compiler_4.1.0        httr_1.4.2           
#>  [31] rvcheck_0.1.8         assertthat_0.2.1      fastmap_1.1.0        
#>  [34] lazyeval_0.2.2        limma_3.48.1          cli_3.0.1            
#>  [37] htmltools_0.5.2       tools_4.1.0           gtable_0.3.0         
#>  [40] glue_1.4.2            affy_1.70.0           dplyr_1.0.7          
#>  [43] Rcpp_1.0.7            MALDIquant_1.20       Biobase_2.52.0       
#>  [46] cellranger_1.1.0      jquerylib_0.1.4       pkgdown_2.0.1        
#>  [49] vctrs_0.3.8           preprocessCore_1.54.0 iterators_1.0.13     
#>  [52] crosstalk_1.1.1       xfun_0.24             stringr_1.4.0        
#>  [55] globals_0.14.0        openxlsx_4.2.4        lifecycle_1.0.0      
#>  [58] XML_3.99-0.6          future_1.21.0         zlibbioc_1.38.0      
#>  [61] MASS_7.3-54           scales_1.1.1          MSnbase_2.18.0       
#>  [64] ragg_1.1.3            pcaMethods_1.84.0     hms_1.1.0            
#>  [67] ProtGenerics_1.24.0   parallel_4.1.0        RColorBrewer_1.1-2   
#>  [70] ComplexHeatmap_2.8.0  yaml_2.2.1            memoise_2.0.0        
#>  [73] pbapply_1.4-3         ggplot2_3.3.5         sass_0.4.0           
#>  [76] stringi_1.7.3         S4Vectors_0.30.0      desc_1.3.0           
#>  [79] foreach_1.5.1         BiocGenerics_0.38.0   zip_2.2.0            
#>  [82] BiocParallel_1.26.1   shape_1.4.6           rlang_0.4.11         
#>  [85] pkgconfig_2.0.3       systemfonts_1.0.2     matrixStats_0.60.0   
#>  [88] mzID_1.30.0           evaluate_0.14         lattice_0.20-44      
#>  [91] purrr_0.3.4           htmlwidgets_1.5.3     tidyselect_1.1.1     
#>  [94] ggsci_2.9             parallelly_1.27.0     plyr_1.8.6           
#>  [97] R6_2.5.0              IRanges_2.26.0        generics_0.1.0       
#> [100] DBI_1.1.1             pillar_1.6.2          MsCoreUtils_1.4.0    
#> [103] tibble_3.1.3          crayon_1.4.1          utf8_1.2.2           
#> [106] plotly_4.9.4.1        tzdb_0.1.2            rmarkdown_2.9        
#> [109] GetoptLong_1.0.5      grid_4.1.0            readxl_1.3.1         
#> [112] data.table_1.14.0     digest_0.6.27         tidyr_1.1.3          
#> [115] gridGraphics_0.5-1    textshaping_0.3.6     stats4_4.1.0         
#> [118] munsell_0.5.0         ggplotify_0.0.8       viridisLite_0.4.0    
#> [121] bslib_0.3.1