Humanitarian data analysis
Context and problem statement
Mapping activities provide geographic information about the situation in a country, especially in support of humanitarian action. Validation activities check the quality of what has been mapped. Analysed together and in the right context, the two can supply missing information that matters for crisis management and humanitarian action planning. In general, mappers contribute geolocated information for projects proposed by organisations (for example, projects related to genital mutilation). Validators play an essential role in ensuring data quality and in giving feedback to new mappers. To learn more about these activities, visit the website of the Humanitarian OpenStreetMap Team (www.hotosm.org).
Our research question is based on observations of mapping activities across the countries and organizations that participate in mapping. In particular, we want to understand what explains the variation in the mapping and validation rates of the mapped information.
Data description
We worked with two main data files and some auxiliary ones. The first dataset contains the overall monthly activity, based on sessions in the HOT Tasking Manager. The second, a GeoJSON file, contains the monthly activity sessions per project. The project location is given as the centroid of the area of interest. A project appears multiple times in this dataset if mapping happened in more than one month.
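Because a project is repeated once per active month, per-project totals need an aggregation step before projects can be compared. The following is a minimal sketch of that step using only the Python standard library. The property names (project_id, month, sessions) and the inline sample values are assumptions made for illustration and do not come from the real file; they must be adapted to the actual GeoJSON schema.

```python
import json
from collections import defaultdict

# Tiny inline sample standing in for the real GeoJSON file.
# Property names and values are hypothetical, for illustration only.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [8.0, 9.5]},
   "properties": {"project_id": 101, "month": "2021-01", "sessions": 10}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [8.0, 9.5]},
   "properties": {"project_id": 101, "month": "2021-02", "sessions": 5}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [36.8, -1.3]},
   "properties": {"project_id": 202, "month": "2021-01", "sessions": 7}}
]}
"""


def sessions_per_project(geojson_text):
    """Sum monthly sessions per project and keep each centroid once."""
    collection = json.loads(geojson_text)
    totals = defaultdict(int)       # project id -> total sessions
    active_months = defaultdict(set)  # project id -> months with activity
    centroids = {}                  # project id -> (lon, lat)
    for feature in collection["features"]:
        props = feature["properties"]
        pid = props["project_id"]
        totals[pid] += props["sessions"]
        active_months[pid].add(props["month"])
        # The centroid is the same in every monthly record, so store it once.
        centroids.setdefault(pid, tuple(feature["geometry"]["coordinates"]))
    return dict(totals), dict(active_months), centroids


totals, active_months, centroids = sessions_per_project(SAMPLE)
print(totals)
```

To run this on the downloaded file, replace SAMPLE with the text read from that file.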
The data were accessed through https://humstats.heigit.org/ and are aggregated from the Humanitarian data resources.
Figure. The number of sessions per month depends on several factors. Big events such as mapathons or disaster activations can drive many more contributions, but the number of objects mapped can vary a lot between sessions. Comparing this graph with the one on total working hours can give further insights (source: https://humstats.heigit.org/).
Data analysis
The cumulative mapping sessions in hours are shown in Figure. We start the analysis by looking at activity in different countries, which varies strongly.
The main results from the analysis of indicators and indices are:
- The distribution of sessions mapped shows clear leaders in the number of mapping and validation activities, e.g. Nigeria (see GitHub links for plots).
- The distribution across organizations (which may be located in, and work across, several countries) also shows clear leaders in mapping activities, e.g. the Red Cross.
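The rankings behind both observations can be sketched as a simple group-and-sum over the activity records. The example below is an illustration only: the record fields (country, organization, sessions_mapped) are hypothetical names, and the values are placeholders rather than results from the dataset.

```python
from collections import Counter


def top_contributors(records, group_key, value_key, n=3):
    """Return the n groups with the largest summed value, largest first."""
    totals = Counter()
    for record in records:
        totals[record[group_key]] += record[value_key]
    return totals.most_common(n)


# Placeholder records; field names and numbers are assumptions for illustration.
records = [
    {"country": "Country A", "organization": "Org X", "sessions_mapped": 120},
    {"country": "Country B", "organization": "Org X", "sessions_mapped": 40},
    {"country": "Country A", "organization": "Org Y", "sessions_mapped": 30},
    {"country": "Country C", "organization": "Org Z", "sessions_mapped": 10},
]

by_country = top_contributors(records, "country", "sessions_mapped", n=2)
by_organization = top_contributors(records, "organization", "sessions_mapped", n=2)
print(by_country)
print(by_organization)
```

The same function serves both rankings, since only the grouping key changes between the country view and the organization view.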
To make sense of the mapping activity data globally, we carry out correlation analysis and significance testing. Strongly correlated variables must be removed first; otherwise the correlation matrix is dominated by the high correlation between sessions mapped and sessions validated, which are correlated by design.
To gain further model insight into the underlying indices, we need to identify independent variables.
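One possible way to perform the removal step is a greedy filter on the Pearson correlation: a variable is kept only if its absolute correlation with every variable already kept stays below a threshold. This is a sketch under stated assumptions, not necessarily the procedure used in the project. The threshold of 0.9, the variable name new_mappers, and the toy values are all assumptions chosen for illustration. Significance testing is not covered here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (norm_x * norm_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values for illustration only; they are not taken from the dataset.
columns = {
    "sessions_mapped": [10, 20, 30, 40, 50],
    "sessions_validated": [5, 11, 14, 21, 24],  # tracks sessions_mapped closely
    "new_mappers": [3, 1, 4, 1, 5],             # hypothetical extra variable
}

kept = drop_correlated(columns)
print(kept)
```

The result depends on column order, because the first variable of a correlated pair is the one that is kept. Listing the variables of primary interest first keeps them in the reduced set.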
The geospatial analysis of the dataset was done using kepler.gl.
Figure. Visualisation of the mapped activities from the GeoJSON file downloaded from https://humstats.heigit.org/. The visualisation was made with the Kepler.gl web application.
This work builds on two projects, one of which was part of the Digital Master project (by Moussa Sidibe, 2023). The full code and repositories are listed below.
References and links:
- https://humstats.heigit.org/
- www.kepler.gl
- www.hotosm.org
- Herfort, B., Lautenbach, S., Porto de Albuquerque, J. et al. The evolution of humanitarian mapping within the OpenStreetMap community. Sci Rep 11, 3037 (2021). https://doi.org/10.1038/s41598-021-82404-z
- Herfort, Benjamin; Raifer, Martin; Reinmuth, Marcel; Stier, Jochen; Klerings, Alina (2020): Evolution of humanitarian mapping within the OpenStreetMap Community. Talk at the State of the Map conference 2020, Cape Town.
GitHub repositories: