Enhancing Geospatial Capabilities for Notebooks via External Runtimes

Syntasa provides unified and enhanced geospatial processing capabilities for Notebook Processes and Notebook Workspaces, enabling advanced spatial and spatio-temporal analytics at scale.

By integrating industry-standard geospatial libraries into supported external runtimes, Syntasa allows data scientists and engineers to perform complex spatial operations without manual dependency management or fragmented configurations.

Overview

Syntasa’s geospatial enhancement introduces a single runtime toggle that enables multiple geospatial engines simultaneously. This approach replaces legacy, library-specific configurations with a consistent and simplified setup.

Once enabled, geospatial functions are automatically registered within the Spark session and are immediately available for use in Spark SQL, PySpark, and Scala notebooks.

Supported Geospatial Libraries

The following libraries are pre-configured and available when geospatial support is enabled:

Apache Sedona
High-performance spatial data processing engine for large-scale geospatial workloads (formerly GeoSpark).
GeoMesa
A suite of tools for spatio-temporal indexing, querying, and analysis.
H3 Spark
Uber’s hierarchical hexagonal spatial indexing system for efficient spatial aggregation and clustering.

Enabling Geospatial Support

Geospatial support is available on AWS Kubernetes and Amazon EMR runtimes.

In newer environments, this capability may be enabled by default. However, it can be explicitly controlled through Runtime or Process-level configuration.

Configuration Key

To enable all supported geospatial libraries, add the following property:

Key	Value	Description
`syntasa.spark.geospatial.enable`	`True`	Enables Apache Sedona, GeoMesa, and H3 Spark support

Note:
This unified property replaces older, library-specific flags such as syntasa.spark.sedona.enable.

Using Geospatial Functions in Notebooks

When geospatial support is enabled:

- No manual JAR imports are required
- No custom Spark extensions need to be registered
- Functions are automatically available within the Spark session

You can use these functions directly within Spark SQL or DataFrame expressions.

Python Example (PySpark)

from pyspark.sql.functions import expr

# 1. Sedona: Convert WKT to Geometry
cities_df = cities_df.withColumn(
    "geom",
    expr("Sedona_ST_GeomFromText(wkt)")
)

# 2. GeoMesa: Spatial containment query
# (assumes cities_table is a registered table or view with a geometry column named geom)
result = spark.sql("""
    SELECT city, population
    FROM cities_table
    WHERE st_contains(
        st_makeBBOX(-10, 35, 30, 60),
        geom
    )
""")

# 3. H3: Convert latitude/longitude to H3 cell (resolution 5)
stores_h3 = stores_df.withColumn(
    "h3_index",
    expr("h3_from_latlng(lat, lon, 5)")
)

Scala Example

import org.apache.spark.sql.functions._

// Sedona: Geometry creation
val sedonaDF =
  df.withColumn("geom", expr("Sedona_ST_GeomFromText(wkt)"))

// H3: Cluster points by H3 cell
val clusters =
  df.withColumn("h3", expr("h3_from_latlng(lat, lon, 9)"))
    .groupBy("h3")
    .count()

Migration Guide

If you are upgrading from an older version of Syntasa or migrating from a manually configured Sedona or GeoMesa setup, follow these steps:

1. Remove Manual Imports
   Explicit imports of Sedona or GeoMesa classes are no longer required for standard SQL or DataFrame operations.
2. Remove Legacy Configurations
   Delete older Spark configurations such as:
   - spark.sql.extensions
   - Custom serializer or registrar settings used for manual registration
3. Update Function Usage
   Ensure notebooks use the standardized function name prefixes:
   - Sedona_ST_* for Sedona
   - st_* for GeoMesa
   - h3_* for H3 Spark
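
As a rough aid for the migration steps above, the following sketch scans notebook source text for leftovers from a manual setup. The helper name `find_legacy_usage` and the regular expressions are hypothetical illustrations, not a Syntasa tool; the patterns only cover the imports, the `spark.sql.extensions` setting, and the legacy flag mentioned in this article.

```python
import re
from typing import List, Tuple

# Patterns for leftovers from a manually configured setup.
# These regexes are illustrative assumptions, not an official checklist.
LEGACY_PATTERNS = {
    "manual Sedona import": re.compile(
        r"^\s*(from|import)\s+(sedona|org\.apache\.sedona)\b"
    ),
    "manual GeoMesa import": re.compile(
        r"^\s*(from|import)\s+(geomesa_pyspark|org\.locationtech\.geomesa)\b"
    ),
    "legacy config: spark.sql.extensions": re.compile(r"spark\.sql\.extensions"),
    "legacy flag: syntasa.spark.sedona.enable": re.compile(
        r"syntasa\.spark\.sedona\.enable"
    ),
}


def find_legacy_usage(source: str) -> List[Tuple[int, str]]:
    """Return (line number, finding) pairs for legacy setup code in `source`."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Hypothetical notebook content used only to demonstrate the scan.
notebook_source = '''
from sedona.register import SedonaRegistrator
builder = SparkSession.builder.config("spark.sql.extensions", "org.apache.sedona.sql.SedonaSqlExtensions")
df = df.withColumn("geom", expr("Sedona_ST_GeomFromText(wkt)"))
'''

for lineno, label in find_legacy_usage(notebook_source):
    print(f"line {lineno}: {label}")
```

Lines that are flagged are candidates for removal; the last line of the sample already uses the standardized Sedona_ST_* prefix and is not reported.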

Troubleshooting

“Function Not Found” Errors

If Spark reports an error such as h3_from_latlng is undefined:

- Verify that syntasa.spark.geospatial.enable is set to True
- Confirm the notebook is running on a supported external runtime (AWS Kubernetes or EMR)

Note:
Geospatial functions are not supported on local or internal notebook kernels.
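
One way to narrow down a "function not found" error is to ask the Spark session which functions it currently lists. The sketch below is a minimal example under two assumptions: the helper name `missing_geospatial_functions` is hypothetical, and the geospatial functions are assumed to appear unqualified in the output of Spark's `SHOW FUNCTIONS` statement, as session-level functions normally do.

```python
def missing_geospatial_functions(spark, expected):
    """Return the names in `expected` that the Spark session does not list.

    `spark` is the notebook's active SparkSession. Comparison is
    case-insensitive because Spark SQL resolves function names that way.
    """
    # SHOW FUNCTIONS yields one row per function, in a column named "function".
    rows = spark.sql("SHOW FUNCTIONS").collect()
    registered = {row["function"].lower() for row in rows}
    return [name for name in expected if name.lower() not in registered]


# Example call inside a notebook where `spark` already exists:
#
#   missing = missing_geospatial_functions(
#       spark, ["Sedona_ST_GeomFromText", "st_contains", "h3_from_latlng"]
#   )
#   print(missing or "all expected geospatial functions are registered")
```

If every expected name comes back as missing, the runtime flag or the runtime type is the likely cause. If only one library's functions are missing, the issue is probably specific to that library's registration.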

Performance Considerations

H3 Resolution

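H3 resolutions range from 0 (coarsest) to 15 (finest). Each step up in resolution multiplies the number of cells by roughly seven, so higher resolutions (for example, 13+) can generate a very large number of cells and significantly increase memory usage and slow down joins.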
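
To get a feel for how quickly the grid grows, the short calculation below uses the global cell count of the H3 grid, 2 + 120 × 7^r cells at resolution r. The formula is a property of the H3 grid itself rather than anything Syntasa-specific, and the function name `h3_cell_count` is only illustrative.

```python
def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells covering the globe at a given resolution."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    # 122 base cells at resolution 0; each finer level subdivides by ~7.
    return 2 + 120 * 7 ** resolution


for res in (5, 9, 13, 15):
    print(f"resolution {res:>2}: {h3_cell_count(res):,} cells")
```

The global count is an upper bound; the number of distinct cells in a dataset depends on its geographic extent, but the roughly sevenfold growth per level applies either way.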

Spatial Joins

For large-scale spatial joins and containment queries, Apache Sedona is generally recommended due to its optimized spatial partitioning and indexing strategies.
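
A containment join through Sedona might look like the sketch below. It is built on assumptions: `Sedona_ST_Contains` is inferred from the Sedona_ST_* naming convention described in this article and has not been verified against a Syntasa runtime, and the view names, the `region_id` column, and the `geom` columns are hypothetical.

```python
def containment_join_sql(polygons_view: str, points_view: str) -> str:
    """Build a point-in-polygon count query using the Sedona-prefixed function.

    Assumes both views expose a Sedona geometry column named `geom` and that
    the polygon view has a `region_id` column (hypothetical names).
    """
    return f"""
        SELECT p.region_id, COUNT(*) AS point_count
        FROM {polygons_view} p
        JOIN {points_view} s
          ON Sedona_ST_Contains(p.geom, s.geom)
        GROUP BY p.region_id
    """


query = containment_join_sql("regions", "stores")
print(query)

# In a notebook with geospatial support enabled, the query would be run as:
#   counts_df = spark.sql(query)
```

Keeping the spatial predicate in the join condition, rather than in a filter applied after a cross join, gives the engine the chance to apply its spatial partitioning and indexing.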

Summary

By enabling unified geospatial support, Syntasa allows notebooks to seamlessly leverage Apache Sedona, GeoMesa, and H3 Spark without complex setup or maintenance overhead. This approach simplifies configuration, improves consistency across environments, and enables scalable spatial analytics directly within Notebook Processes and Workspaces.
