SpectraToQueries

Lifecycle: experimental

Repository to translate MS/MS spectra into skeleton-specific diagnostic ion queries.

Requirements

Here is what you minimally need:

  • A file containing MS/MS spectra with associated skeleton information (or any other relevant chemical classification) provided as metadata. This structural information, stored in the metadata field “skeleton”, allows the generation of skeleton-specific queries by extracting fragmentation patterns that recur across the spectra of a given skeleton. The MIADB file is provided as an example.
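Before running the package on your own data, it can help to confirm that every spectrum actually carries a skeleton annotation. The sketch below uses base R only. `mgf_skeletons()` is a hypothetical helper, not part of SpectraToQueries, and it assumes the annotation is written as a `SKELETON=...` line inside each `BEGIN IONS`/`END IONS` block (matched case-insensitively). The toy spectra and their values are invented for illustration and are not taken from the MIADB.

```r
# Hypothetical helper (not part of SpectraToQueries): return the skeleton
# label of each spectrum in MGF text, or NA when the field is missing.
# Assumes the label is stored as a SKELETON=... line inside each
# BEGIN IONS / END IONS block; the key is matched case-insensitively.
mgf_skeletons <- function(lines) {
  starts <- which(trimws(lines) == "BEGIN IONS")
  ends <- which(trimws(lines) == "END IONS")
  stopifnot(length(starts) == length(ends))
  vapply(seq_along(starts), function(i) {
    block <- lines[starts[i]:ends[i]]
    hit <- grep("^SKELETON=", block, ignore.case = TRUE, value = TRUE)
    # Keep everything after the first "=" as the label
    if (length(hit) == 0) NA_character_ else sub("^[^=]*=", "", hit[1])
  }, character(1))
}

# Toy MGF content (invented values): the second spectrum lacks a skeleton
toy <- c(
  "BEGIN IONS", "PEPMASS=355.1652", "SKELETON=corynanthe",
  "144.0808 100", "END IONS",
  "BEGIN IONS", "PEPMASS=339.2067", "130.0651 80", "END IONS"
)
skel <- mgf_skeletons(toy)
print(skel)
print(table(skel, useNA = "ifany"))
```

On a real file, `mgf_skeletons(readLines("yourAwesomeSpectra.mgf"))` would return one label per spectrum, with `NA` marking spectra that lack the field.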

Installation

As the package is not (yet) available on CRAN, you will need to install it with:

install.packages(
  "SpectraToQueries",
  repos = c(
    "https://spectra-to-knowledge.r-universe.dev",
    "https://bioc.r-universe.dev",
    "https://cloud.r-project.org"
  )
)

Use

To reproduce the default example, which uses the Monoterpene Indole Alkaloids Database (MIADB, .mgf) file annotated with spectral skeletons:

SpectraToQueries::spectra_to_queries()

To reproduce the “grouped” example, which uses the MIADB file with an expert-based annotation of spectral “super skeletons” (combinations of skeletons exhibiting high structural similarity):

SpectraToQueries::spectra_to_queries(
  spectra = system.file(
    "extdata",
    "spectra_grouped.rds",
    package = "SpectraToQueries"
  ),
  export = "data/interim/queries-grouped.tsv"
)
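If you want to build a grouped annotation for your own data, one simple approach is to relabel skeletons with a lookup table before running the package. This is a minimal base R sketch under assumptions: `to_super_skeleton()` and the placeholder labels are hypothetical, and the expert-based MIADB super-skeleton grouping is not reproduced here.

```r
# Hypothetical relabelling (not part of SpectraToQueries): collapse skeleton
# labels into user-defined "super skeletons". Labels absent from the map
# are returned unchanged.
to_super_skeleton <- function(skeleton, map) {
  grouped <- unname(map[skeleton])
  ifelse(is.na(grouped), skeleton, grouped)
}

# Placeholder labels; replace them with your own expert-based grouping
super_map <- c(skel_a = "super_1", skel_b = "super_1", skel_c = "super_2")
relabelled <- to_super_skeleton(c("skel_a", "skel_c", "skel_b", "skel_d"), super_map)
print(relabelled)
```

Assuming your spectra are loaded as a `Spectra` object `sp` that exposes a `skeleton` spectra variable, `sp$skeleton <- to_super_skeleton(sp$skeleton, super_map)` would apply the mapping; check how your file exposes the field first.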

To generate diagnostic ion queries from your own spectra:

SpectraToQueries::spectra_to_queries(
  spectra = "yourAwesomeSpectra.mgf",
  export = "path/yourEvenBetterResults.tsv"
)
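To process several spectral files in one go, a small wrapper can derive one export path per input. `export_path_for()` and `run_all()` are hypothetical helpers; only `spectra_to_queries()` and its arguments come from this README. The wrapper creates the output directory up front because this README does not say whether `spectra_to_queries()` does so itself.

```r
# Hypothetical helper: build "<out_dir>/queries-<file stem>.tsv" for an input
export_path_for <- function(mgf, out_dir = "data/interim") {
  stem <- tools::file_path_sans_ext(basename(mgf))
  file.path(out_dir, paste0("queries-", stem, ".tsv"))
}

# Hypothetical wrapper: run spectra_to_queries() once per input file,
# forwarding any extra arguments (e.g. ppm, dalton) unchanged
run_all <- function(files, out_dir = "data/interim", ...) {
  # Create the output directory in case the package does not
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  for (f in files) {
    SpectraToQueries::spectra_to_queries(
      spectra = f,
      export = export_path_for(f, out_dir),
      ...
    )
  }
  # Return the written paths invisibly, named by input file
  invisible(vapply(files, export_path_for, character(1), out_dir = out_dir))
}

print(export_path_for("spectra/batch_A.mgf"))
```

With the package installed, `run_all(c("a.mgf", "b.mgf"), ppm = 10.0)` would write `data/interim/queries-a.tsv` and `data/interim/queries-b.tsv`.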

Showing all parameters:

SpectraToQueries::spectra_to_queries(
  spectra = NULL,
  export = "data/interim/queries.tsv",
  beta_1 = 1.0,
  beta_2 = 0.5,
  dalton = 0.01,
  decimals = 4L,
  intensity_min = 0.0,
  ions_max = 10L,
  n_skel_min = 5L,
  n_spec_min = 3L,
  ppm = 30.0,
  fscore_min = 0.0,
  precision_min = 0.0,
  recall_min = 0.0,
  zero_val = 0.0
)
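The tolerance and scoring parameters above are listed without a description of how they interact. The sketch below shows one plausible reading and should not be taken as the package's implementation. It assumes that a fragment matches a query ion when the m/z difference is within `dalton + ppm * m/z / 1e6` (the additive convention used in MsCoreUtils-style matching), and that `beta_1` and `beta_2` are the β weights of F-β scores computed from a query's precision and recall for a target skeleton. `ion_matches()`, `fbeta()` and `score_query()` are hypothetical names.

```r
# Sketch only, under assumptions not confirmed by this README:
#  - a fragment matches a query ion if |mz - ion| <= dalton + ppm * ion / 1e6
#    (additive absolute + relative tolerance, as in MsCoreUtils-style matching);
#  - beta_1 and beta_2 are the beta weights of F-beta scores.
ion_matches <- function(mz, ion, dalton = 0.01, ppm = 30.0) {
  abs(mz - ion) <= dalton + ppm * ion / 1e6
}

# F-beta score; returns 0 when precision and recall are both 0
fbeta <- function(precision, recall, beta) {
  denom <- beta^2 * precision + recall
  ifelse(denom == 0, 0, (1 + beta^2) * precision * recall / denom)
}

# Score one candidate query against a set of spectra.
# has_ion: does each spectrum contain the query ion?
# in_skel: does each spectrum belong to the target skeleton?
score_query <- function(has_ion, in_skel, beta_1 = 1.0, beta_2 = 0.5) {
  tp <- sum(has_ion & in_skel)   # query fires on the target skeleton
  fp <- sum(has_ion & !in_skel)  # query fires on another skeleton
  fn <- sum(!has_ion & in_skel)  # query misses a target spectrum
  precision <- if (tp + fp == 0) 0 else tp / (tp + fp)
  recall <- if (tp + fn == 0) 0 else tp / (tp + fn)
  c(precision = precision, recall = recall,
    f_beta_1 = fbeta(precision, recall, beta_1),
    f_beta_2 = fbeta(precision, recall, beta_2))
}

# Toy check: 0.0002 Da apart, well within 0.01 Da + 30 ppm
print(ion_matches(144.0810, 144.0808))
# 5 toy spectra: the first three belong to the target skeleton
s <- score_query(has_ion = c(TRUE, TRUE, FALSE, TRUE, TRUE),
                 in_skel = c(TRUE, TRUE, TRUE, FALSE, FALSE))
print(s)
```

With β < 1, as with `beta_2 = 0.5` above, precision weighs more than recall, so a query that also fires on other skeletons is penalized more heavily. Under the same assumptions, `precision_min`, `recall_min` and `fscore_min` would act as lower bounds on these values.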

Main Citations

Translating community-wide spectral library into actionable chemical knowledge: a proof of concept with monoterpene indole alkaloids: https://doi.org/10.1186/s13321-025-01009-0

Additional software credits

| Package | Version | Citation |
|---|---|---|
| base | 4.5.1 | R Core Team (2025) |
| BiocGenerics | 0.54.0 | Huber et al. (2015) |
| BiocManager | 1.30.26 | Morgan and Ramos (2025) |
| BiocParallel | 1.42.1 | Morgan et al. (2025) |
| BiocVersion | 3.21.1 | Morgan (2024) |
| knitr | 1.50 | Xie (2014); Xie (2015); Xie (2025) |
| MsBackendMgf | 1.16.0 | Gatto, Rainer, and Gibb (2025) |
| pkgload | 1.4.0 | Wickham et al. (2024) |
| rmarkdown | 2.29 | Xie, Allaire, and Grolemund (2018); Xie, Dervieux, and Riederer (2020); Allaire et al. (2024) |
| Spectra | 1.18.2 | Rainer et al. (2022) |
| testthat | 3.2.3 | Wickham (2011) |
| tidytable | 0.11.2 | Fairbanks (2024) |
| tidyverse | 2.0.0 | Wickham et al. (2019) |

References

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2024. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Fairbanks, Mark. 2024. tidytable: Tidy Interface to data.table. https://markfairbanks.github.io/tidytable/.
Gatto, Laurent, Johannes Rainer, and Sebastian Gibb. 2025. MsBackendMgf: Mass Spectrometry Data Backend for Mascot Generic Format (Mgf) Files. https://doi.org/10.18129/B9.bioc.MsBackendMgf.
Huber, Wolfgang, Vincent J. Carey, Robert Gentleman, Simon Anders, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.
Morgan, Martin. 2024. BiocVersion: Set the Appropriate Version of Bioconductor Packages. https://doi.org/10.18129/B9.bioc.BiocVersion.
Morgan, Martin, and Marcel Ramos. 2025. BiocManager: Access the Bioconductor Project Package Repository. https://bioconductor.github.io/BiocManager/.
Morgan, Martin, Jiefei Wang, Valerie Obenchain, Michel Lang, Ryan Thompson, and Nitesh Turaga. 2025. BiocParallel: Bioconductor Facilities for Parallel Evaluation. https://doi.org/10.18129/B9.bioc.BiocParallel.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rainer, Johannes, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, et al. 2022. “A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R.” Metabolites 12: 173. https://doi.org/10.3390/metabo12020173.
Wickham, Hadley. 2011. “testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Jim Hester, and Lionel Henry. 2024. pkgload: Simulate Package Installation and Attach. https://github.com/r-lib/pkgload.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman & Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. https://yihui.org/knitr/.
———. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman & Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman & Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.