In modern data environments, security is not just about who can log in to an application (the Control Plane) but also about what data they can touch during computation (the Data Plane). Syntasa’s Data Plane Access Control feature bridges this gap: the permissions you define in the UI, through Role-Based Access Control (RBAC) and Sharing, are enforced at the infrastructure level in AWS and GCP.
This guide explains how Syntasa translates high-level sharing settings into “Least Privilege” security policies for Spark jobs and Notebook sessions.
The Foundation: RBAC and Sharing
Before a single line of code is executed, access is determined by the platform’s governance layer:
RBAC (Role-Based Access Control): Defines what a user can do (e.g., “Can I create a Notebook?” or “Can I launch a Spark Cluster?”).
Sharing Options: Defines which specific objects a user can see. In Syntasa, data access is primarily governed by Event Stores.
- If an Event Store is shared with you as a Viewer, you have read-only access.
- If shared as an Editor, you have read/write access.
The Rule: If an Event Store is not shared with you, it effectively does not exist for your compute sessions.
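The sharing rule above can be pictured as a simple lookup. The following is a minimal sketch, not Syntasa's implementation: the `ShareLevel` enum, the `ALLOWED_OPS` table, and the Event Store names are hypothetical and exist only to illustrate how Viewer, Editor, and "not shared" map to data operations.

```python
from enum import Enum
from typing import Dict, FrozenSet


class ShareLevel(Enum):
    VIEWER = "viewer"
    EDITOR = "editor"


# Hypothetical mapping from a sharing level to the data operations it permits.
ALLOWED_OPS: Dict[ShareLevel, FrozenSet[str]] = {
    ShareLevel.VIEWER: frozenset({"read"}),
    ShareLevel.EDITOR: frozenset({"read", "write"}),
}


def allowed_operations(shares: Dict[str, ShareLevel], event_store: str) -> FrozenSet[str]:
    """Return the operations a user may perform on an Event Store.

    An Event Store that is not shared yields an empty set, so it
    "does not exist" for the user's compute session.
    """
    level = shares.get(event_store)
    if level is None:
        return frozenset()
    return ALLOWED_OPS[level]


# Example: one store shared as Viewer, one as Editor, a third not shared at all.
shares = {"web_events": ShareLevel.VIEWER, "crm_events": ShareLevel.EDITOR}
viewer_ops = allowed_operations(shares, "web_events")
editor_ops = allowed_operations(shares, "crm_events")
unshared_ops = allowed_operations(shares, "finance_events")
```

Defaulting to an empty set for anything not explicitly shared keeps the logic deny-by-default, which matches the rule stated above.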
The Enforcement: Data Plane Access Control
When you launch a Notebook or a Spark Job, the Syntasa Auth Service performs a real-time “Security Handshake” to enforce your permissions.
How it works in AWS
Syntasa uses AWS IAM Session Policies to scope your access dynamically:
- Policy Generation: The Auth Service looks at all Event Stores shared with you. It generates a scoped JSON policy that explicitly lists only the S3 paths and Glue databases associated with those Event Stores.
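- STS AssumeRole: The Runtime or Notebook service assumes its base IAM role through the AWS Security Token Service (STS) and passes the generated policy as a session policy. The effective permissions are the intersection of the base role and the session policy, so the session can only be narrower than the base role, never broader.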
- Credential Injection: The resulting temporary, short-lived credentials are injected into your Spark session. Even if you try to manually access an S3 bucket that the base cluster role has access to, AWS will deny the request if it isn’t in your specific session policy.
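The three steps above can be sketched in Python. This is an illustration under stated assumptions, not the Auth Service's actual code: the share fields (`bucket`, `prefix`, `glue_database`, `level`), the function names, and the chosen action lists are hypothetical, and Glue access is kept read-only for brevity. The `assume_role` call and its `Policy` parameter are the standard boto3 STS API.

```python
import json
from typing import Any, Dict, List

S3_READ_ACTIONS = ["s3:GetObject"]
S3_WRITE_ACTIONS = ["s3:PutObject", "s3:DeleteObject"]
GLUE_READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartitions",
]


def build_session_policy(
    shares: List[Dict[str, str]], region: str, account_id: str
) -> Dict[str, Any]:
    """Build a session policy listing only the resources of shared Event Stores."""
    statements: List[Dict[str, Any]] = []
    glue_base = f"arn:aws:glue:{region}:{account_id}"
    for share in shares:
        bucket, prefix, db = share["bucket"], share["prefix"], share["glue_database"]
        object_actions = list(S3_READ_ACTIONS)
        if share["level"] == "editor":
            object_actions += S3_WRITE_ACTIONS
        # Object-level access, limited to the Event Store's prefix.
        statements.append({
            "Effect": "Allow",
            "Action": object_actions,
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        })
        # Listing is a bucket-level action, so it is narrowed with a prefix condition.
        statements.append({
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
            "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
        })
        # Read-only Glue catalog access for the associated database (a simplification).
        statements.append({
            "Effect": "Allow",
            "Action": GLUE_READ_ACTIONS,
            "Resource": [
                f"{glue_base}:catalog",
                f"{glue_base}:database/{db}",
                f"{glue_base}:table/{db}/*",
            ],
        })
    return {"Version": "2012-10-17", "Statement": statements}


def assume_scoped_role(role_arn: str, session_name: str, policy: Dict[str, Any]) -> Dict[str, Any]:
    """Assume the base role with the session policy attached; returns temporary credentials."""
    import boto3  # imported lazily so the policy builder runs without boto3 installed

    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        Policy=json.dumps(policy),
        DurationSeconds=3600,
    )
    return response["Credentials"]


# Example with placeholder values: a single Event Store shared as Viewer.
policy = build_session_policy(
    [{"bucket": "example-lake", "prefix": "event_stores/web/", "glue_database": "web_events", "level": "viewer"}],
    region="us-east-1",
    account_id="123456789012",
)
```

Because the session policy contains only `Allow` statements for shared resources, any request outside that list is denied even when the base role itself would permit it.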
How it works in GCP
GCP applies the same logic through Service Account Impersonation and IAM Conditions. The system ensures that the temporary OAuth2 tokens generated for your session are restricted to the BigQuery datasets and GCS buckets defined in your Syntasa workspace.
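As an illustration of what an IAM Condition can look like, the sketch below builds a conditional role binding that limits a service account to one GCS prefix. How Syntasa constructs its bindings is not described here, so the function name, the service account email, and the bucket and prefix values are all hypothetical. The binding shape (`role`, `members`, `condition` with `title`, `description`, `expression`) and the `resource.name.startsWith(...)` expression follow the standard GCP IAM policy format; conditional bindings require IAM policy version 3.

```python
from typing import Any, Dict


def conditional_bucket_binding(
    service_account_email: str, bucket: str, prefix: str, read_only: bool = True
) -> Dict[str, Any]:
    """Build an IAM binding restricted to objects under one bucket prefix."""
    role = "roles/storage.objectViewer" if read_only else "roles/storage.objectAdmin"
    # CEL expression: match only objects whose resource name starts with the prefix.
    expression = (
        f'resource.name.startsWith("projects/_/buckets/{bucket}/objects/{prefix}")'
    )
    return {
        "role": role,
        "members": [f"serviceAccount:{service_account_email}"],
        "condition": {
            "title": f"scope-{bucket}",
            "description": "Hypothetical per-session scope for one Event Store",
            "expression": expression,
        },
    }


# Example with placeholder values.
binding = conditional_bucket_binding(
    "session-runner@example-project.iam.gserviceaccount.com",
    bucket="example-lake",
    prefix="event_stores/web/",
)
```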
Credential Management & Store
To support secure data plane access, Syntasa utilizes a centralized Credential Store. This system manages the “secrets” required to interact with external data sources and cloud providers.
- Secure Storage: Credentials (API keys, Service Account JSONs, Secret Keys) are encrypted at rest.
- Scoped Retrieval: When a Spark job runs, the Credential Store provides the necessary secrets to the execution engine only if the user (or the job owner) has the appropriate RBAC permissions to use that credential.
- Lifecycle Management: Administrators can rotate credentials in the Store without breaking existing pipelines, as the jobs reference the “Credential Object” rather than hardcoded strings.
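The idea of referencing a Credential Object instead of a hardcoded string can be sketched with a small in-memory store. This is a toy model under stated assumptions: the `CredentialStore` class, its methods, and the user names are hypothetical, and the sketch holds secrets in plain memory, whereas the real store encrypts them at rest.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class CredentialObject:
    name: str
    secret: str
    allowed_users: Set[str] = field(default_factory=set)


class CredentialStore:
    """Toy credential store: jobs look secrets up by name at run time."""

    def __init__(self) -> None:
        self._items: Dict[str, CredentialObject] = {}

    def put(self, name: str, secret: str, allowed_users: Iterable[str]) -> None:
        self._items[name] = CredentialObject(name, secret, set(allowed_users))

    def rotate(self, name: str, new_secret: str) -> None:
        # Only the stored value changes; the name that jobs reference stays the same.
        self._items[name].secret = new_secret

    def resolve(self, name: str, user: str) -> str:
        """Return the secret only if the user is permitted to use this credential."""
        cred = self._items[name]
        if user not in cred.allowed_users:
            raise PermissionError(f"{user} may not use credential {name!r}")
        return cred.secret


store = CredentialStore()
store.put("warehouse_api_key", "old-key", allowed_users=["alice"])
before_rotation = store.resolve("warehouse_api_key", "alice")
store.rotate("warehouse_api_key", "new-key")
after_rotation = store.resolve("warehouse_api_key", "alice")
```

The job only ever holds the name `warehouse_api_key`, so rotation changes what `resolve` returns without any change to the pipeline definition.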
Application in Real-World Scenarios
Scenario A: Collaborative Notebooks
User A creates a Notebook and shares it with User B.
- The Access Logic: User B can see the code. However, when User B clicks “Start” on the Spark Runtime, the Data Plane Access feature checks User B’s Event Store assignments.
- The Result: If User A used data from “Event Store X” which is not shared with User B, User B’s Spark session will fail to read that data, ensuring data privacy is maintained even when code is shared.
Scenario B: Scheduled Production Jobs
Previously, scheduled jobs often ran under a “System” user with broad permissions.
- The Access Logic: Scheduled jobs now resolve the identity of the Job Owner.
- The Result: The job is restricted by the same IAM Session Policies as a manual run. If the Job Owner’s access to a specific S3 bucket is revoked, the scheduled job will automatically lose access during its next run.
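The key point in Scenario B is that the scope is resolved at run time from the Job Owner's current shares, not captured when the job was scheduled. A minimal sketch of that behavior, with hypothetical names (`resolve_job_scope`, `shares_by_user`, the job and user names):

```python
from typing import Dict, Set


def resolve_job_scope(job: Dict[str, str], shares_by_user: Dict[str, Set[str]]) -> Set[str]:
    """Return the Event Stores a run may access, based on the owner's shares right now."""
    owner = job["owner"]
    # Copy the set so each run gets a snapshot of the owner's access at launch time.
    return set(shares_by_user.get(owner, set()))


shares_by_user = {"alice": {"web_events", "crm_events"}}
job = {"name": "nightly_rollup", "owner": "alice"}

first_run_scope = resolve_job_scope(job, shares_by_user)

# An admin revokes Alice's access to one Event Store between runs.
shares_by_user["alice"].discard("crm_events")

second_run_scope = resolve_job_scope(job, shares_by_user)
```

No change to the job definition is needed for the revocation to take effect; the next run simply resolves a smaller scope.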
Summary of Benefits
- Zero Trust Architecture: No user has “blanket” access to the data lake. Access is always temporary and scoped.
- Simplified Governance: Admins manage access in one place (Syntasa UI) and the platform handles the complex cloud IAM configurations automatically.
- Auditability: Every data access event in AWS/GCP is tied to a specific Syntasa username in the cloud provider’s audit logs (AWS CloudTrail or Google Cloud Audit Logs, formerly part of Stackdriver).