Syntasa provides unified and enhanced geospatial processing capabilities for Notebook Processes and Notebook Workspaces, enabling advanced spatial and spatio-temporal analytics at scale.
By integrating industry-standard geospatial libraries into supported external runtimes, Syntasa allows data scientists and engineers to perform complex spatial operations without manual dependency management or fragmented configurations.
Overview
Syntasa’s geospatial enhancement introduces a single runtime toggle that enables multiple geospatial engines simultaneously. This approach replaces legacy, library-specific configurations with a consistent and simplified setup.
Once enabled, geospatial functions are automatically registered within the Spark session and are immediately available for use in Spark SQL, PySpark, and Scala notebooks.
Supported Geospatial Libraries
The following libraries are pre-configured and available when geospatial support is enabled:
- Apache Sedona
  High-performance spatial data processing engine for large-scale geospatial workloads (formerly GeoSpark).
- GeoMesa
  A suite of tools for spatio-temporal indexing, querying, and analysis.
- H3 Spark
  Uber’s hierarchical hexagonal spatial indexing system for efficient spatial aggregation and clustering.
Enabling Geospatial Support
Geospatial support is available on AWS Kubernetes and Amazon EMR runtimes.
In newer environments, this capability may be enabled by default. However, it can be explicitly controlled through Runtime or Process-level configuration.
Configuration Key
To enable all supported geospatial libraries, add the following property:
| Key | Value | Description |
|---|---|---|
| syntasa.spark.geospatial.enable | True | Enables Apache Sedona, GeoMesa, and H3 Spark support |
Note:
This unified property replaces older, library-specific flags such as syntasa.spark.sedona.enable.
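The replacement described in this note can be sketched as a small helper that rewrites a property map. This is only an illustration: it assumes runtime properties can be represented as a plain dictionary of string keys and values, and migrate_geospatial_properties is a hypothetical name, not a Syntasa API. Only the two property keys come from this article.

```python
UNIFIED_KEY = "syntasa.spark.geospatial.enable"
# Legacy flag named in this article; other library-specific flags may exist.
LEGACY_KEYS = ("syntasa.spark.sedona.enable",)


def migrate_geospatial_properties(props):
    """Return a copy of props with legacy flags replaced by the unified key."""
    # Drop every legacy flag from the copy.
    migrated = {k: v for k, v in props.items() if k not in LEGACY_KEYS}
    legacy_enabled = any(
        str(props.get(k, "")).strip().lower() == "true" for k in LEGACY_KEYS
    )
    # An explicit unified setting wins; otherwise carry over an enabled legacy flag.
    if UNIFIED_KEY not in migrated and legacy_enabled:
        migrated[UNIFIED_KEY] = "True"
    return migrated


example = {"syntasa.spark.sedona.enable": "True", "spark.executor.memory": "4g"}
print(migrate_geospatial_properties(example))
```

The helper leaves unrelated properties untouched and never overrides a unified value that is already present, so it is safe to run more than once on the same map.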
Using Geospatial Functions in Notebooks
When geospatial support is enabled:
- No manual JAR imports are required
- No custom Spark extensions need to be registered
- Functions are automatically available within the Spark session
You can use these functions directly within Spark SQL or DataFrame expressions.
Python Example (PySpark)
from pyspark.sql.functions import expr

# 1. Sedona: Convert WKT to Geometry
cities_df = cities_df.withColumn(
    "geom",
    expr("Sedona_ST_GeomFromText(wkt)")
)

# 2. GeoMesa: Spatial containment query
result = spark.sql("""
    SELECT city, population
    FROM cities_table
    WHERE st_contains(
        st_makeBBOX(-10, 35, 30, 60),
        geom
    )
""")

# 3. H3: Convert latitude/longitude to H3 cell (resolution 5)
stores_h3 = stores_df.withColumn(
    "h3_index",
    expr("h3_from_latlng(lat, lon, 5)")
)

Scala Example
import org.apache.spark.sql.functions._

// Sedona: Geometry creation
val sedonaDF =
  df.withColumn("geom", expr("Sedona_ST_GeomFromText(wkt)"))

// H3: Cluster points by H3 cell
val clusters =
  df.withColumn("h3", expr("h3_from_latlng(lat, lon, 9)"))
    .groupBy("h3")
    .count()

Migration Guide
If you are upgrading from an older version of Syntasa or migrating from a manually configured Sedona or GeoMesa setup, follow these steps:
- Remove Manual Imports
  Explicit imports of Sedona or GeoMesa classes are no longer required for standard SQL or DataFrame operations.
- Remove Legacy Configurations
  Delete older Spark configurations such as:
  - spark.sql.extensions
  - Custom serializer or registrar settings used for manual registration
- Update Function Usage
  Ensure notebooks use the standardized function naming:
  - Sedona_ST_* for Sedona
  - st_* for GeoMesa
  - h3_* for H3 Spark
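The migration steps above can be partly automated with a quick scan of notebook source for leftovers. The sketch below is a rough aid, not a Syntasa tool: find_legacy_usage and the pattern list are hypothetical, the import prefixes are assumptions about how Sedona and GeoMesa were typically imported by hand, and the sample notebook text is made up for illustration.

```python
import re

# Illustrative patterns only; extend them to match your own notebooks.
LEGACY_PATTERNS = [
    (
        "manual Sedona/GeoMesa import",
        re.compile(
            r"^\s*(?:from|import)\s+"
            r"(?:sedona|geomesa_pyspark|org\.apache\.sedona|org\.locationtech\.geomesa)\b"
        ),
    ),
    ("legacy Spark extension setting", re.compile(r"spark\.sql\.extensions")),
    ("legacy Syntasa flag", re.compile(r"syntasa\.spark\.sedona\.enable")),
]


def find_legacy_usage(source):
    """Return (line_number, label) pairs for lines that match a legacy pattern."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS:
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


# Made-up notebook content used only to exercise the scan.
notebook_source = """from sedona.register import SedonaRegistrator
spark.conf.set("spark.sql.extensions", "<extension class>")
df = spark.sql("SELECT Sedona_ST_GeomFromText(wkt) FROM t")
"""

for line_no, label in find_legacy_usage(notebook_source):
    print(f"line {line_no}: {label}")
```

A line-based regex scan will miss settings applied outside the notebook (for example, in runtime configuration), so treat an empty result as a hint rather than proof that the migration is complete.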
Troubleshooting
“Function Not Found” Errors
If Spark reports an error such as h3_from_latlng is undefined:
- Verify that syntasa.spark.geospatial.enable is set to True
- Confirm the notebook is running on a supported external runtime (AWS Kubernetes or EMR)
Note:
Geospatial functions are not supported on local or internal notebook kernels.
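Before digging into configuration, it can help to confirm which functions the session actually exposes. The helper below is a minimal sketch: missing_functions is a hypothetical name, it relies on PySpark's spark.catalog.functionExists (available in PySpark 3.3 and later), and it assumes the geospatial functions are registered in a way the catalog can see. The function names in the commented usage are the ones used in this article.

```python
def missing_functions(spark, names):
    """Return the names that are not registered in the given Spark session."""
    return [name for name in names if not spark.catalog.functionExists(name)]


# Usage inside a notebook where `spark` is the active SparkSession:
# missing = missing_functions(
#     spark, ["Sedona_ST_GeomFromText", "st_contains", "h3_from_latlng"]
# )
# if missing:
#     print("Not registered:", ", ".join(missing))
```

If every expected function is reported as missing, the runtime toggle or the runtime type is the likely cause; if only one family is missing, check the function prefix used in the notebook.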
Performance Considerations
H3 Resolution
H3 resolutions range from 0 to 15. Higher resolutions (for example, 13+) can generate a very large number of cells and significantly impact memory usage and join performance.
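To get a feel for how quickly cell counts grow, the short sketch below computes the total number of H3 cells covering the globe at a given resolution, using the published H3 count of 2 + 120 × 7^r cells at resolution r. The function name h3_cell_count is illustrative. The number of cells a particular dataset touches depends on its geographic extent, so these figures are upper bounds rather than predictions.

```python
def h3_cell_count(resolution):
    """Total number of H3 cells worldwide at the given resolution (0 to 15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    # 122 cells at resolution 0; each finer level has roughly 7x more cells.
    return 2 + 120 * 7 ** resolution


for res in (0, 5, 9, 13, 15):
    print(f"resolution {res:>2}: {h3_cell_count(res):,} cells")
```

Each step up in resolution multiplies the potential cell count by about seven, which is why a grouping or join keyed on a high-resolution index can produce far more distinct keys than one at resolution 5 or 9.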
Spatial Joins
For large-scale spatial joins and containment queries, Apache Sedona is generally recommended due to its optimized spatial partitioning and indexing strategies.
Summary
By enabling unified geospatial support, Syntasa allows notebooks to seamlessly leverage Apache Sedona, GeoMesa, and H3 Spark without complex setup or maintenance overhead. This approach simplifies configuration, improves consistency across environments, and enables scalable spatial analytics directly within Notebook Processes and Workspaces.