Water access data processing
Data on drinking water coverage by region were acquired from the WHO/UNICEF Joint Monitoring Programme for Water Supply, Sanitation and Hygiene (JMP). The JMP is the official custodian of global data on water supply, sanitation and hygiene2. It compiles administrative data, national censuses and surveys for individual countries, and it maintains a database that is accessible online. We accessed data tables for national and subnational drinking water service levels from https://washdata.org.
JMP datasets are not geographically linked to official boundary files, so we joined the tables to GIS boundaries obtained from the following open-source collections: GADM (https://gadm.org), the Spatial Data Repository of the USAID Demographic and Health Surveys Program (DHS) and the Global Data Lab of Radboud University (GDL)2,50,51,52,53. Subnational regions reported by the JMP are unstructured: they represent various administrative levels (province, state, district and others) rather than a single consistent level.
The JMP national and subnational data were joined to GIS boundaries using a custom geoprocessing tool built in Python and ArcGIS 10. The tool joins the available JMP subnational survey data to the regional boundary whose name most closely matches, searching a merged stack of GADM (admin1, admin2 and admin3), DHS and GDL boundaries worldwide. For countries with no subnational data available, the JMP national survey data are instead joined to GADM national (admin0) boundaries. Finally, the two boundary-joined datasets (national and subnational) are merged, processed and exported as a seamless global fabric of water-stressed-population data at the highest spatial resolution available in each case (Fig. 1a).
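The join logic described above can be sketched as follows. This is a minimal illustration, not the authors' ArcGIS tool: it assumes that "closest name match" can be approximated by Python's `difflib.SequenceMatcher` ratio on normalized names with a hypothetical 0.6 acceptance threshold, and it represents tables as plain dictionaries. The function names, the dictionary layout and the `"ADM0"` key for national fallback records are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and keep only letters/digits so spelling variants compare fairly."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def closest_boundary(jmp_name, boundary_names, min_score=0.6):
    """Return (best boundary name, score), or (None, score) if no name is close enough.

    The 0.6 threshold is an illustrative assumption, not a value from the study.
    """
    best_name, best_score = None, 0.0
    target = normalize(jmp_name)
    for candidate in boundary_names:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


def join_jmp_to_boundaries(subnational, national, boundaries):
    """Join JMP values to boundary names, falling back to admin0 where needed.

    subnational: {country: {jmp_region_name: value}}  (hypothetical layout)
    national:    {country: value}
    boundaries:  {country: [names from the merged GADM/DHS/GDL stack]}
    Returns {(country, boundary_name): value}; countries without subnational
    data get a single national record keyed by the hypothetical name "ADM0".
    """
    joined = {}
    for country, national_value in national.items():
        regions = subnational.get(country)
        if not regions:
            # No subnational data: use the national value on the admin0 boundary.
            joined[(country, "ADM0")] = national_value
            continue
        for region, region_value in regions.items():
            match, _ = closest_boundary(region, boundaries.get(country, []))
            if match is not None:
                joined[(country, match)] = region_value
    return joined
```

For example, with `boundaries = {"Kenya": ["Nairobi", "Mombasa", "Kisumu"]}`, a JMP region spelled `"Nairobi City"` is matched to `"Nairobi"`, while a country present only in the national table receives a single admin0 record. Unmatched regions are silently dropped here; a production tool would more likely log them for manual review.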
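The JMP does not report the breakdown between the SMDW and basic service levels within subnational regions; instead, it reports a combined category called 'at least basic' (ALB). To estimate SMDW values in subnational regions, we performed a simple cross-multiplication using the split at the national level:
$$\mathrm{SMDW}_{\mathrm{subnational}} = \mathrm{ALB}_{\mathrm{subnational}} \times \frac{\mathrm{SMDW}_{\mathrm{national}}}{\mathrm{ALB}_{\mathrm{national}}}$$
where $\mathrm{ALB}_{\mathrm{national}}$, $\mathrm{ALB}_{\mathrm{subnational}}$ and $\mathrm{SMDW}_{\mathrm{national}}$ are known values.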
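The cross-multiplication assumes that the ratio of SMDW to ALB observed nationally holds in every subnational region of that country. A minimal sketch, assuming all inputs are percentages of the population; the function name is hypothetical:

```python
def estimate_smdw_subnational(alb_national, smdw_national, alb_subnational):
    """Estimate subnational SMDW share by applying the national SMDW/ALB ratio.

    SMDW_subnational = ALB_subnational * SMDW_national / ALB_national
    All values are assumed to be percentages of the population.
    """
    if alb_national <= 0:
        raise ValueError("ALB_national must be positive")
    return alb_subnational * smdw_national / alb_national
```

For instance, a country with 80% ALB and 60% SMDW nationally gives a region with 50% ALB an estimated SMDW share of 50 × 60 / 80 = 37.5%.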
We validated the cross-estimation of the SMDW share from ALB for subnational regions against a reference dataset of nationally representative household surveys that collected data on all criteria for SMDW54 (Extended Data Fig. 2). The regression yields R² = 0.87 with a standard error of 3.67. The comparison indicates a bias towards over-reporting the SMDW share, so our study probably underestimates the number of people living without SMDW. This discrepancy arises because JMP calculations of SMDW rely on the minimum value of multiple drinking water service criteria (…….
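The validation statistics can be illustrated with an ordinary least-squares fit of survey-observed SMDW shares on cross-estimated shares. This is a sketch under stated assumptions: the regression specification behind Extended Data Fig. 2 is not given here, so we assume a simple linear fit, interpret "standard error" as the residual standard error, and add a mean signed error (estimated minus observed) as one hypothetical way to gauge the direction of bias. A positive value would indicate over-reporting of SMDW.

```python
import math


def validation_metrics(observed, estimated):
    """Fit observed = a + b * estimated by OLS.

    Returns (R^2, residual standard error, mean signed error), where the mean
    signed error is mean(estimated - observed); positive means over-reporting.
    Assumes a simple linear specification (an illustrative choice).
    """
    n = len(observed)
    if n != len(estimated) or n < 3:
        raise ValueError("need at least 3 paired values")
    mean_x = sum(estimated) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(estimated, observed)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    r_squared = 1 - ss_res / ss_tot
    std_error = math.sqrt(ss_res / (n - 2))  # n - 2 degrees of freedom
    bias = sum(e - o for e, o in zip(estimated, observed)) / n
    return r_squared, std_error, bias
```

If every estimate sits exactly 5 percentage points above the observed value, the fit is perfect (R² = 1, standard error 0), yet the mean signed error of +5 still reveals systematic over-reporting. This is why a high R² alone does not rule out bias.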