Preprocessing approaches in machine learning-based groundwater potential mapping: an application to the Koulikoro and Bamako regions, Mali
Identifiers
Permanent link (URI): http://hdl.handle.net/10017/59881
DOI: 10.5194/hess-26-221-2022
ISSN: 1027-5606
Date
2022-01-18
Academic Departments
Universidad de Alcalá. Departamento de Geología, Geografía y Medio Ambiente
Teaching unit
Unidad Docente Geología
Funders
Ministerio de Ciencia e Innovación
Ministerio de Educación, Cultura y Deporte
Ministerio de Ciencia, Innovación y Universidades
Bibliographic citation
Hydrology and Earth System Sciences, 2022, v. , n. , p. -
Description / Notes
23 p.
Project
info:eu-repo/grantAgreement/MICIN//PRE2019-090026/ES//
info:eu-repo/grantAgreement/MECD/Salvador de Madariaga/PRX18%00235/ES//
info:eu-repo/grantAgreement/MCEU//RTI2018-099394-B-I00/ES//
Document type
info:eu-repo/semantics/article
Version
info:eu-repo/semantics/publishedVersion
Rights
© Author(s) 2022
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Access rights
info:eu-repo/semantics/openAccess
Abstract
Groundwater is crucial for domestic supplies in the Sahel, where the strategic importance of aquifers will increase in the coming years due to climate change. Groundwater potential mapping is a valuable tool to underpin water management in the region and, hence, to improve drinking water access. This paper presents a machine learning method to map groundwater potential. This is illustrated through its application in two administrative regions of Mali. A set of explanatory variables for the presence of groundwater is developed first. Scaling methods (standardization, normalization, maximum absolute value and max-min scaling) are used to avoid the pitfalls associated with reclassification. Noisy, collinear and counterproductive variables are identified and excluded from the input dataset. A total of 20 machine learning classifiers are then trained and tested on a large borehole database (n = 3345) in order to find meaningful correlations between the presence or absence of groundwater and the explanatory variables. Maximum absolute value and standardization proved the most efficient scaling techniques, while tree-based algorithms (accuracy >0.85) consistently outperformed other classifiers. The borehole flow rate data were then used to calibrate the results beyond standard machine learning metrics, thereby adding robustness to the predictions. The southern part of the study area presents the better groundwater prospect, which is consistent with the geological and climatic setting. Outcomes lead to three major conclusions: (1) picking the best performers out of a large number of machine learning classifiers is recommended as a good methodological practice, (2) standard machine learning metrics should be complemented with additional hydrogeological indicators whenever possible and (3) variable scaling helps to minimize expert bias.
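The four scaling methods named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function names and the sample values are hypothetical, and "normalization" is assumed here to mean rescaling each sample (row) to unit Euclidean length, which may differ from the definition used in the paper. The column-wise functions assume the variable is not constant.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def max_abs_scale(column):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    peak = max(abs(x) for x in column)
    return [x / peak for x in column]


def min_max_scale(column):
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def normalize_row(row):
    """Assumed meaning of 'normalization': scale one sample to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


# Hypothetical values of one explanatory variable at four boreholes.
slope = [2.0, 4.0, 6.0, 8.0]

scaled = {
    "standardization": standardize(slope),
    "maximum absolute value": max_abs_scale(slope),
    "max-min": min_max_scale(slope),
}
for name, values in scaled.items():
    print(name, [round(v, 3) for v in values])
```

Each function rescales the raw values with a fixed formula, so no expert-chosen class boundaries are involved, which is the contrast with reclassification that the abstract draws.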
Files in this item
| Files | Size | Format |
|---|---|---|
| preprocessing_gomez_HESS_2022.pdf | 868.5Kb | PDF |