Search by AOI

This tutorial shows how to get data from an area of interested provided in a geojson file.

Requirements for this demo:

To start, include the packages that we are going to need:

from pystac_client import Client
import geopandas as gpd
import rasterio
import requests
import json

Then, let´s create a connection to the archive API

auth_payload = json.dumps({
  "client_id": "<API_CREDENTIAL_ID>",
  "client_secret": "<API_CREDENTIAL_SECRET>",
  "audience": "",
  "grant_type": "client_credentials"

auth_headers = {
  'Content-Type': 'application/json'

auth_response = requests.request("POST", "", headers=auth_headers, data=auth_payload)

access_token = auth_response.json()["access_token"]

headers = {"authorizationToken": f"Bearer {access_token}"}

archive ="", headers=headers)

Replacing and with your actual values. Contact our Customer Succes teams for instructions on how to obtain your credentials.

Now, we are going to search by AOI so we need the AOI defined as a GeoJson polygon. This can be done in the code directly like this:

# The geojson defining the AOI
aoi = {
        "coordinates": [
        "type": "Polygon"

Or we can directly import a GeoJson file into memory as follows:

with open("ancilliary_files/AOI.geojson") as fp:
    aoi = json.loads(["geometry"]    

Using any of those methods, the variable aoi now contains the coordinates of the vertices of the poylgon of our AOI. Now it is time to query to the archive:

items =    
    intersects = aoi,

By using the item_colletion() method we get all the elements as a pystac object. If they are not needed we can use item_collection_as_dict() to get a Python dict instead.

Now, items is a FeatureCollection that contains all the matching items. Items can be accesed with the bracket operator. For example:

> '20230828_132930_SN28_RR_VISUAL_0_3_6_31N_426_4580'

We can transform the FeatureCollection into a GeoPandas dataframe for example:

gdf = gpd.GeoDataFrame.from_features(items, crs=f"epsg:{items[0].properties['proj:epsg']}" )
gdf['id'] = [ for x in items]
gdf.set_index('id', inplace=True)
gdf['capture_date'] = [gpd.pd.to_datetime(f"{x[0]}T{x[1]}") for x in gdf.index.str.split("_")]

Given that we searched on an area in a long period of time, it is possible that the we have data from multiple dates, satellites a.k.a captures. If we look into the data, we can for example group it by capture using the satl:outcome_id property and count how many tiles per capture matched our AOI.

gdf.groupby([pd.Grouper(key="capture_date", freq="D"),"satl:outcome_id"]).aggregate(tiles=("platform","count"))
capture_date satl:outcome_id                                   tiles                    
2023-08-05   a914b272-3e59-4551-a690-8c4ff0ad8147--157143      6
2023-08-08   2328a86c-892a-4f21-a95e-45b0bad8128d--157143      6
2023-08-12   9487f2f4-1154-4083-a138-2c998340a794--157143      6
2023-08-17   2f39d9a4-0953-4919-9ed3-286f3195f29f--157271      3
2023-08-19   de325a55-1b3c-44d9-9e28-1f5ab323a5ae--157143      6
2023-08-21   6e3679db-6d81-43bc-9f37-0fcafb4e6cd0--157143      6
2023-08-27   54eddfb1-93ab-41b3-83ae-9d11479a53b3--157143      6
2023-08-28   59bd62cf-9dee-4604-90f6-21c4bedd83e1--157143      6

At this stage, we can for example filter those captures with a certain cloud cover, or discard data that is not useful for the analysis. In this example we will go directly to the final stage: mosaic all the tiles for each capture together and save them as a GeoTiff file on the disk to further exploration:

Let's first defain a mosaicing function. It will need rasterio

def mosaic_from_items(items, outfile, asset_name="analytic"):

    src_files_to_mosaic = []
    for item in items:
        url = item.assets[asset_name].href                
        src =        

    mosaic, out_trans = merge(src_files_to_mosaic, indexes=[1,2,3])

    meta = {"driver": "GTiff",
    "height": mosaic.shape[1],
    "width": mosaic.shape[2],
    "transform": out_trans,
    "count": mosaic.shape[0],
    "dtype": mosaic.dtype

    with, "w",  **meta ) as dest:

    return mosaic

And then for each capture, call the mosaicing function

for (capture_date, outcome_id), group_df in gdf.groupby([pd.Grouper(key="capture_date", freq="D"),"satl:outcome_id"]):
    subItems = ItemCollection([x for x in items if in group_df.index])    
    fname = capture_date.strftime("%Y%m%d")+".tiff"   
    mosaic_from_items(subItems, fname)    

The result is a GeoTiff file on the specified folder, with the date of each capture as the filename.

Last update: 2024-04-17