Back in 2020 we completed our Version 1 map of Ghana’s croplands for the year 2018 (see figure below), a proof of concept for creating high-resolution, annually updateable maps of crop field boundaries over smallholder-dominated croplands at a national scale. The methods include a novel procedure for converting daily, high-resolution PlanetScope imagery into cloud-free seasonal composites, and a rigorous approach to training and validating a machine learning model; both are described, along with the results, in our working paper (which is nearly through the peer review process). The field boundary maps can be viewed on our web map, which is undergoing a redesign that will be completed in a few weeks as we transition the data to a STAC-compliant server (developed by Azavea). The map data can be downloaded here (please read the paper before using the data, to understand the accuracy of the dataset).
Since completing those Version 1 results, we have been working to improve the accuracy and precision of the field boundaries, so that we can release a more accurate Version 2 map. We have replaced the Random Forests model we used to identify cropland in the Version 1 map with a more accurate convolutional neural network, U-Net. We train this more powerful model not just to classify pixels into cropland and non-cropland classes, but also to distinguish between the edges and interiors of fields. This approach allows us to develop maps that delineate individual fields much more precisely, for about a tenth of the computational cost. The figure below shows the score maps (probabilities) for the field interior class, produced by a single model trained on 4,100 training labels collected across the entire country. Compared to the Version 1 maps, you can see that individual fields are more clearly separated, and that the predictions are more confident.
This enables individual field boundaries to be more precisely mapped, as seen in the next figure, which compares Version 1 (middle row) and Version 2 (bottom row) field boundaries.
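To illustrate why the separate edge class helps, here is a minimal sketch, using small synthetic score maps rather than our actual model outputs, of how distinguishing interiors from edges lets simple connected-component labeling split adjacent fields apart:

```python
import numpy as np
from scipy import ndimage

# Synthetic per-pixel scores for two classes (values are invented):
# two high-probability interior blobs separated by a high-probability edge row.
interior = np.zeros((8, 8))
interior[1:4, 1:7] = 0.9
interior[5:7, 1:7] = 0.9
edge = np.zeros((8, 8))
edge[4, :] = 0.9

# Keep only pixels where the interior class clearly wins,
# then label connected components to recover individual fields.
mask = (interior > 0.5) & (interior > edge)
labels, n_fields = ndimage.label(mask)
print(n_fields)  # the edge row keeps the two blobs as separate fields
```

Without the edge class, the two blobs would merge into a single component; the edge predictions are what keep neighboring smallholder fields distinct.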
We are now refining the U-Net model to improve its accuracy. Once that is completed (in the coming weeks), we will release our Version 2 field boundaries, which should be used in place of Version 1.
For more details about the evolution of our mapping approach, you can view our presentation at the Planet Explore 2021 conference this week, in the session on Global Agriculture: From Smallholder to Commercial Scale.
As of November 30, we have completed a full active learning process for each of the 16 mapping areas of interest covering Ghana. The map indicates the location of training labels within each AOI, color-coded according to which iteration they were selected in, as well as validation sites.
One hundred labels were collected for each iteration in each AOI. We are now using the final models to create the full cropland probability maps for each AOI and running the final segmentation. Stay tuned for a look at the full maps!
Africa, like most of the developing world, has a data problem. A shortage of available data ties the hands of researchers looking to perform the kind of high-quality studies that are needed to help devise useful policy. Remote sensing should provide a means to equalize some of the data disparity between the global North and South (after all, satellite trajectories don’t discriminate), but it takes more than a cache of imagery to fix the problem. These data require some fairly intense processing to become usable information, and that processing often doesn’t come cheap. This can be a problem for groups working in cash-strapped Africa.
Azavea believes that there is an opportunity for open source tools to make a difference in supporting this type of work. We were given a chance to test this idea when our VP of Research, Rob Emanuele, made the acquaintance of the principal investigator with the Agricultural Impacts Research Group at Clark University, Lyndon Estes. Lyndon was looking to expand upon a proof-of-concept project to detect smallholder agricultural fields (small farms usually supporting a single family) from WorldView-2 imagery.
Despite the promising early results of this research, scaling up to much larger inputs revealed the seams in their codebase. This is where Azavea comes in. With a generous grant from the Omidyar Network, Lyndon and his team were able to harness Azavea’s experience with big data raster processing and its open source tools like GeoTrellis to substantially clean up their prototype and get it ready for production.
The active learning mapping application
The system that our new collaborators had in mind consisted of a few components. While we would ultimately focus on data munging and machine learning over those data, the models would need an authoritative source of training data to work properly. Providing that was the primary focus of the group at Clark, and it took shape as a novel crowdsourcing platform for annotating satellite imagery with regions thought to be agricultural fields. A group of mappers, hired by SpatialCollective in Kenya, would label the agricultural fields in satellite imagery; using Bayesian averaging, the mapping application would then coalesce a set of mappers’ inputs (scored using a rigorous accuracy assessment protocol) into a high-quality training set for the machine learning process. This, spoiler alert, improved the accuracy of the final fitted models.
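As a rough illustration only (the platform itself uses Bayesian averaging under a full accuracy-assessment protocol; the function and weights below are invented), an accuracy-weighted vote captures the basic idea of trusting better-scoring mappers more when merging their labels:

```python
def consensus(labels, accuracies):
    """Combine binary mapper labels for one site, weighting each mapper
    by an accuracy score from a held-out assessment.
    Returns 1 ("field") if the weighted evidence favors it."""
    weighted = sum(a * l for l, a in zip(labels, accuracies))
    total = sum(accuracies)
    return int(weighted / total >= 0.5)

# Two reliable mappers say "field"; a low-accuracy mapper disagrees.
print(consensus([1, 1, 0], [0.9, 0.8, 0.4]))  # the reliable majority wins
```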
The other central methodological approach was to use an “active learning” process to refine the training set. Human mappers received unmapped areas that the machine learning model was the least certain about. These areas were then labeled and added to the training set.
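A sketch of that loop, using scikit-learn’s random forest as a stand-in for the project’s model and synthetic feature vectors in place of imagery (every name and number here is illustrative): at each iteration the model scores the unlabeled pool, and the sites it is least certain about are routed to human mappers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical features for a pool of unlabeled image sites,
# plus a small initial labeled set (labels are synthetic).
pool = rng.normal(size=(500, 4))
X = rng.normal(size=(20, 4))
y = (X[:, 0] > 0).astype(int)

for iteration in range(3):
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    proba = model.predict_proba(pool)[:, 1]
    # Least-certain sites: predicted probability closest to 0.5.
    uncertainty = np.abs(proba - 0.5)
    idx = np.argsort(uncertainty)[:10]       # send these 10 to human mappers
    new_y = (pool[idx, 0] > 0).astype(int)   # stand-in for the human labels
    X = np.vstack([X, pool[idx]])
    y = np.concatenate([y, new_y])
    pool = np.delete(pool, idx, axis=0)

print(len(X))  # 20 initial labels + 3 iterations of 10 = 50
```

The project’s real loop differs in scale and in where labels come from, but the structure is the same: fit, score the pool, label the most uncertain sites, repeat.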
Azavea was able to help the efforts on the mapping application by both creating a cloud free mosaic of Planet Labs’ PlanetScope data, and by using Raster Foundry to catalogue and serve out the images in true color and false color NDVI to the mapping application.
A workflow based on Cloud Optimized GeoTIFFs
For us, the task list was simple:
Create a cloud-free feature source from the PlanetScope imagery
Train and apply a machine learning model
Deliver the final classifications as imagery suitable for visualization
But this project ended up requiring us to reevaluate some of our core tools and establish a new approach to analysis. As the entire GIS community becomes more focused on applications that rely on cloud-native geospatial, there has been a push to adopt Cloud-Optimized GeoTIFFs, or COGs, as an imagery source. Rather than building specialized catalogs of image tiles—requiring an ETL process and, sometimes, custom file formats—we want to point to a COG (or a collection of COGs) and be able to query those files directly with minimal fuss. This improves portability and ease of use, without substantially affecting performance.
Azavea has been part of the push towards cloud-native geospatial and COG-oriented workflows. But it was only with the release of GeoTrellis 2.0 in August 2018 that COGs became a core building block of the library. This project provided a helpful challenge to help shape and test the requisite tooling.
A cloud-free mosaic
The classification algorithm had simple demands: two base maps of satellite imagery, one covering the growing season (December–April) and one covering the off season (July–November). The master grid, specifying the size, extent, and resolution of the tiles, was defined, and each cell would need a pair of PlanetScope images associated with it, one for each season; but those data were still safely stowed on Planet’s servers. We wrote an application to mine Planet’s catalog for imagery intersecting an area of interest, and to associate with each tile the newest image for each season that covered the tile and passed a cloud-free test furnished by our friends at Clark. The end result was a minimal catalog of COGs that we could use as input to the classifier.
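The per-tile selection logic can be sketched in a few lines. This is a toy version (the catalog entries, cloud threshold, and season ranges are invented, and the December wrap-around of the growing season is simplified away), but it shows the rule: among scenes in the season that pass the cloud test, keep the newest.

```python
from datetime import date

# Hypothetical catalog entries for one tile: (acquired, cloud_fraction, scene_id)
catalog = [
    (date(2018, 1, 5), 0.40, "scene-a"),
    (date(2018, 2, 10), 0.05, "scene-b"),
    (date(2018, 3, 20), 0.02, "scene-c"),
    (date(2018, 8, 1), 0.10, "scene-d"),
]

GROWING = range(1, 5)   # Jan-Apr (December wrap-around omitted for brevity)
OFF = range(7, 12)      # Jul-Nov

def newest_clear(entries, months, max_cloud=0.08):
    """Newest scene within the season that passes the cloud-free test."""
    candidates = [e for e in entries
                  if e[0].month in months and e[1] <= max_cloud]
    return max(candidates, key=lambda e: e[0])[2] if candidates else None

print(newest_clear(catalog, GROWING))  # newest clear growing-season scene
print(newest_clear(catalog, OFF))     # no off-season scene passes the test
```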
Finding fields with random forests
The consumer of this imagery is a random forest classifier following the prototype provided by our collaborators. But in order to build a system that scaled well and was easy to maintain, we wanted to shift to a system that naturally bridged between the machine learning apparatus provided by SparkML and the raster-processing infrastructure of GeoTrellis. Our comrades-in-arms at Astraea, fortunately, have this handled. We used their RasterFrames project to give us native access to GeoTrellis tiles in a Spark DataFrame and also to help convert the pixel data into a form usable by SparkML. This would end up being less straightforward than it sounds, due to the COG-based catalog.
We settled on using rasterio to perform a resampling windowed read of the COGs and GeoPySpark to manage the conversion to a RasterFrame and from a RasterFrame to a COG catalog on the output side. The model required us to extend RasterFrames as well, incorporating a prototype implementation of focal operations into the library—an implementation that we’ll be working to move into the upstream version over the coming year. The bulk of the application itself was straightforward to write, and performed as expected, despite the number of custom solutions needed to work around the technical requirements for the project. This will serve as a solid foundation for the project that can grow and improve as Lyndon and his team push the work forward. (It also provides a template for improvements to Geotrellis and RasterFrames to make this kind of workflow more natural. Stay tuned!)
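The focal operations mentioned above compute each output pixel from a neighborhood of input pixels rather than a single cell. As a minimal illustration of the concept (this is a NumPy/SciPy sketch, not the RasterFrames implementation), here is a 3x3 focal mean over a small tile:

```python
import numpy as np
from scipy import ndimage

tile = np.array([[1., 2., 3.],
                 [4., 5., 6.],
                 [7., 8., 9.]])

# Focal mean: each output pixel is the mean of its 3x3 neighborhood.
# Edges are handled by reflecting the tile, one common convention.
focal_mean = ndimage.uniform_filter(tile, size=3, mode="reflect")
print(focal_mean[1, 1])  # the center pixel averages all nine values
```

In a distributed raster setting the interesting work is in the edges: each tile needs a buffer of neighboring pixels so the neighborhood is complete at tile boundaries, which is what the library-level support has to provide.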
We concluded our part of this project in the fall of 2018, and the Clark crew have been at work with their team of mappers in Africa, supplying training data and refining their model. The results are coming in, and they seem promising, as can be seen in the following images:
In the future, the source for much of this work will be released to the public under the agroimpacts GitHub repo. Naturally, there are still problems to solve, but the project is now on a solid footing to start contributing real data to the study of agriculture in Africa.
This was a great opportunity to work with a talented team of researchers. We are glad the relationship was mutually beneficial. Says Lyndon: “We are very excited to continue refining and applying our platform, which we intend to use to produce high accuracy, high resolution cropland maps for all of Ghana by the end of this year. We feel confident that we can achieve this goal, now that we have the highly scalable and robust machine learning pipeline developed by Azavea. Their effort in developing this went far above and beyond what we all initially imagined would be the scope of effort, and is thus a really critical contribution to our project’s success.”
We’d like to extend special thanks to the team of researchers and project collaborators who contributed to the project, including Lei Song, Su Ye, Sitian Xiong, and Ron Eastman (Clark University); Dennis McRitchie; Ryan Avery and Kelly Caylor (UC Santa Barbara); SpatialCollective; and Meridia. The project began at Princeton, where Lyndon worked with Stephanie Debats, at the time a PhD student in Civil & Environmental Engineering, and her advisor, Professor Kelly Caylor (now at UC Santa Barbara).
In map after map after map, many African countries appear as a void, marked with a color that signifies not a percentage or a policy but merely an explanation: “no data available.”
Where numbers or cartography have left African countries behind, developers are stepping in with open-source tools that allow anyone, from academics to everyday smartphone users, to improve maps of the continent.
One such project is Missing Maps, which invites people to use satellite imagery on mapping platform OpenStreetMap to fill out roads, buildings and other features in various parts of Africa that lack these markers. Active projects on Missing Maps include everything from mapping houses in Malawi to marking roads in the Democratic Republic of Congo.
Missing Maps co-founder Ivan Gayton said humanitarian organizations could use the refined maps for development projects or to respond to future disasters or disease outbreaks.
“My intention was to get people like Doctors Without Borders and the Red Cross to rectify another injustice which is, that very basic public health infrastructure, which is a map, is not available in most of Africa,” Gayton told Quartz.
In July, Missing Maps launched MapSwipe, a smartphone app that helps whittle down the areas needed for mapping on OpenStreetMap by giving anyone with an iPhone or Android phone the ability to swipe through satellite images and indicate whether they contain features like houses, roads or paths. These are then forwarded on to Missing Maps for precise marking of those features.
“Having a mobile application means having an order of magnitude more of people that are able to do it, larger volunteers, and it seems like they go faster,” Gayton said.
Missing Maps’ approach is similar to that of Mapping Africa, a project developed at Princeton University that pays users to look at satellite images and identify croplands.
“Humans are very good at recognizing patterns in noisy images,” Princeton research scientist Lyndon Estes said of Mapping Africa, which he started.
People who sign up on Amazon’s Mechanical Turk service are given satellite images of random patches of land across Africa and asked to determine if the land is being used for farming.
That data can be used to figure out how farmers are coping with the climate around them and with droughts, Estes said.
He and his collaborators have thus far used the imagery for their own agriculture-related studies in Zambia and Kenya, but Estes said it could be used for other purposes, such as determining ownership of disputed land.
“If you can establish that agriculture is being done in an area already, that provides a valuable source of information for establishing tenure rights or giving people information that they might not be able to get,” Estes said. “The ultimate ambition is to get large areas, if not all of sub-Saharan Africa mapped.”
Missing Maps’ work is available to all on OpenStreetMap. Estes said he’s considering uses for Mapping Africa’s data beyond his own studies.
One outlet for Mapping Africa’s data could be AfricaMap, a Harvard University project where users can compile data on everything from ethnic groups to mother tongues to slave trade routes and layer it over a map of the continent.
“The basic idea is you can go in there, create a map and any of the 20,000 layers in the system, all of them would be available to load into your own map,” said project manager Ben Lewis.
Lewis helped put together AfricaMap at the request of Harvard professors who were looking for a way to get old geographic data of Africa online. The project was so successful it expanded into WorldMap for the entire globe.
AfricaMap’s 20,000 users have put the service to work mapping hospitals and clinics in Ghana, or creating a layer that shows types of indigenous music across the continent.
Lewis plans to soon launch a program that will collate different mapping data layers from across the Internet, expanding the number of layers on WorldMap into the hundreds of thousands, he said.
“The core reason, in my mind, to start with Africa was to, first of all, make the case that there is data,” Lewis said. “A lot of times, it’s just [that] people don’t know about it.”