Data-driven selection of Measurable Compound Lists significantly improves method development
26 August 2025
The study addresses a major issue in the analysis of chemical spaces such as the exposome, the vast number of chemical species (known and unknown) to which humans are exposed. Non-targeted analysis (NTA) aims to detect as many unknown chemicals as possible, analysing all chemicals present rather than searching only for components from a pre-selected list. Liquid chromatography high-resolution mass spectrometry (LC–HRMS) is an established and widely used analytical technology for this purpose. However, during method development, when the selectivity and sensitivity of LC–HRMS are optimised, chemists have to rely on a set of pre-selected chemical compounds referred to as ‘internal standards’. These should represent the chemical space being analysed, but in an NTA of large, heterogeneous chemical spaces, this poses a problem.
Not only is the set of internal standards in itself fairly limited, it also introduces a bias. For instance, it is common to optimise LC–HRMS exposome analysis based on a set of already regulated, known chemicals of environmental concern. This narrows the measurable coverage of the exposome to compounds that are structurally or physicochemically similar to those standards. As a result, the real chemical complexity of the sample is poorly represented, which limits the chemical space that is actually measured and hampers the discovery rate of unknown chemicals.
The approach now presented in Environmental Science & Technology Letters offers a viable extension to method development for non-targeted exposome analysis, significantly broadening the coverage of the measurable chemical space. In their paper, the researchers describe a data-driven strategy that enables the unbiased selection of structures for LC–HRMS method development from a vast chemical subspace of interest, such as the CompTox database of the U.S. Environmental Protection Agency (containing over a million chemicals).
Using precomputed PubChem physicochemical properties, together with mobility and ionisation efficiency predicted from molecular fingerprints, the researchers compiled Measurable Compound Lists (MCLs) of over a hundred chemicals to be used as internal analytical standards. The paper demonstrates that this data-driven, heterogeneous selection of structures indeed significantly advances LC–HRMS method development for NTA.
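To give a feel for what a data-driven, diversity-oriented selection can look like, the sketch below picks a small, spread-out subset of compounds from a table of numeric descriptors. It uses greedy max-min (farthest-point) picking, a generic technique chosen here purely for illustration; it is not the selection procedure from the paper. The function name `maxmin_select`, the two-column descriptor layout, and all values in `toy` are hypothetical.

```python
import math


def maxmin_select(descriptors, k, start=0):
    """Greedily pick k rows so that each new pick is as far as possible
    (Euclidean distance) from the rows already picked.

    Illustrative only: a generic diversity-picking heuristic, not the
    method described in the paper.
    """
    n = len(descriptors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")

    selected = [start]
    chosen = {start}
    # Distance from each compound to its nearest already-selected compound.
    nearest = [math.dist(row, descriptors[start]) for row in descriptors]

    while len(selected) < k:
        # Among unselected compounds, take the one farthest from the current set.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Update nearest-selected distances with the newly added compound.
        nearest = [
            min(d, math.dist(row, descriptors[nxt]))
            for d, row in zip(nearest, descriptors)
        ]
    return selected


# Hypothetical, pre-scaled descriptors (two made-up columns per compound).
toy = [
    (0.0, 0.0),
    (0.1, 0.0),
    (1.0, 1.0),
    (0.9, 1.0),
    (0.0, 1.0),
]

picked = maxmin_select(toy, 3)
print(picked)  # indices of the selected compounds
```

On this toy table the near-duplicate rows are skipped in favour of compounds that lie far apart, which is the general idea behind choosing standards that span a chemical space rather than cluster in one region of it. In practice the descriptors would need to be scaled to comparable ranges before distances are meaningful.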
In the specific context of the exposome chemical subspace, as represented by the CompTox dataset, MCLs are effective tools for understanding and expanding the chemical coverage of NTA methods in identifying unknown or undetected chemicals of environmental concern. Compared to common European “watch list” contaminants, MCLs provide greater chemical coverage and broader LC–HRMS applicability, both predicted and experimental. Furthermore, MCLs can assist users in assessing the boundaries of the measurable chemical space, thereby reducing the risk of false positive detections in environmental analysis.
Renai, Lapo, Viktoriia Turkina, Tobias Hulleman, Alex Nikolopoulos, Andrea FG Gargano, Elvio Amato, Massimo Del Bubba, and Saer Samanipour. "A Novel Chemical Space Dependent Strategy for Compound Selection in Non-Target LC–HRMS Method Development Using Physicochemical and Structural Data." Environ. Sci. Technol. Lett. 2025, online publication 18 August. DOI: 10.1021/acs.estlett.5c00759