For platform administrators configuring Service Accounts on the Syntasa platform.
Companion pages:
- Service Accounts for Runtimes and Notebooks: User Guide
- Service Account Isolation from Syntasa System Folders: Design Spec
When a Service Account (SA) is attached to a Notebook Workspace or Runtime Template, Syntasa uses its credentials for customer-owned data and AWS services, and for Spark event history written under the Syntasa system bucket. Other Syntasa-managed infrastructure (JAR/config staging, workspace metadata, log uploads, etc.) is handled by the cluster's own IAM role and does not require permissions on the SA.
This document defines the IAM policies you should attach to your SA's IAM User or IAM Role.
Layered access control - read this first
The SA needs IAM permission to call AWS services (S3, Glue, etc.). Without it, AWS rejects the calls before any Syntasa code runs. So the SA must have:
- S3 access to the customer's data bucket
- S3 access to syn-spark-history/ on the Syntasa system bucket (Spark drivers write event logs here)
- Glue Catalog read access for any Spark SQL (SHOW DATABASES, SELECT FROM …) to work
- Glue Catalog write access if users create or alter tables
On top of that, Syntasa Authorization controls which databases and tables each individual user can actually see and modify. Syntasa Authz is the user-level access control; the IAM policy is the identity-level access control for the SA itself.
Because user-level filtering is handled by Syntasa Authz, the recommended Glue scope is full catalog (*) — broad at the AWS layer, fine-grained at the Syntasa layer. If your security policy requires belt-and-suspenders, you can additionally scope the Glue Resource ARNs to specific databases at the AWS layer (see Scenario C).
Quick reference
| If your users will… | Include this block | Required? |
|---|---|---|
| Read or write S3 data in the customer bucket | (1) Customer Data S3 | Required |
| Run any Spark job with event logging enabled (default for all notebooks and batch runs) | (2) Spark Event History S3 | Required |
| Run Spark SQL | (3) Glue Catalog Read | Required |
| Create / alter Glue tables and partitions | (4) Glue Catalog Write | Required if users create / alter tables |
| Run Athena queries directly from notebooks | (5) Athena Query | Required if users use Athena |
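The quick-reference rules can also be written down as a small selection function. This is only a sketch: `required_blocks` and its two flags are hypothetical names, and the returned strings are the Sid values used in the identity policy template on this page.

```python
def required_blocks(creates_tables: bool = False, uses_athena: bool = False) -> list:
    """Return the Sids of the policy statements to keep for a usage profile."""
    # Blocks (1), (2) and (3) are marked Required in the quick reference.
    # Block (2) is made up of two statements in the template.
    sids = [
        "CustomerDataS3",
        "SparkEventHistoryS3List",
        "SparkEventHistoryS3Objects",
        "GlueCatalogRead",
    ]
    if creates_tables:
        sids.append("GlueCatalogWrite")  # block (4)
    if uses_athena:
        sids.append("AthenaQuery")  # block (5)
    return sids


# A read-only notebook user needs only the required blocks.
print(required_blocks())
# A user who creates tables and also queries Athena needs all six statements.
print(required_blocks(creates_tables=True, uses_athena=True))
```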
Identity policy template
Replace placeholders with your actual values:
- <CUSTOMER_DATA_BUCKET>: the S3 bucket holding customer data
- <SYNTASA_SYSTEM_BUCKET>: the Syntasa-owned system bucket (the value of KERNEL_SYN_BUCKET / NOTEBOOK_SERVICE_STORAGE_BUCKET; ask your Syntasa platform team if unsure)
- <REGION>: AWS region (e.g. us-east-1)
- <ACCOUNT_ID>: your AWS account ID
- <WORKGROUP_NAME>: Athena workgroup (only block 5)
Glue Resource ARNs default to * (full catalog). Syntasa Authorization handles which databases / tables each user can actually see. If you want IAM-level scoping in addition, replace * with specific database names (see Scenario C).
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CustomerDataS3",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:aws:s3:::<CUSTOMER_DATA_BUCKET>",
"arn:aws:s3:::<CUSTOMER_DATA_BUCKET>/*"
]
},
{
"Sid": "SparkEventHistoryS3List",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::<SYNTASA_SYSTEM_BUCKET>",
"Condition": {
"StringLike": {
"s3:prefix": [
"syn-spark-history",
"syn-spark-history/*"
]
}
}
},
{
"Sid": "SparkEventHistoryS3Objects",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": "arn:aws:s3:::<SYNTASA_SYSTEM_BUCKET>/syn-spark-history/*"
},
{
"Sid": "GlueCatalogRead",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition",
"glue:GetUserDefinedFunction",
"glue:GetUserDefinedFunctions",
"glue:SearchTables"
],
"Resource": [
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/*",
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/*/*",
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:userDefinedFunction/*/*"
]
},
{
"Sid": "GlueCatalogWrite",
"Effect": "Allow",
"Action": [
"glue:CreateDatabase",
"glue:UpdateDatabase",
"glue:DeleteDatabase",
"glue:CreateTable",
"glue:UpdateTable",
"glue:DeleteTable",
"glue:BatchDeleteTable",
"glue:CreatePartition",
"glue:UpdatePartition",
"glue:DeletePartition",
"glue:BatchCreatePartition",
"glue:BatchUpdatePartition",
"glue:BatchDeletePartition",
"glue:CreateUserDefinedFunction",
"glue:UpdateUserDefinedFunction",
"glue:DeleteUserDefinedFunction"
],
"Resource": [
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/*",
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/*/*",
"arn:aws:glue:<REGION>:<ACCOUNT_ID>:userDefinedFunction/*/*"
]
},
{
"Sid": "AthenaQuery",
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults",
"athena:StopQueryExecution",
"athena:ListQueryExecutions",
"athena:GetWorkGroup"
],
"Resource": [
"arn:aws:athena:<REGION>:<ACCOUNT_ID>:workgroup/<WORKGROUP_NAME>"
]
}
]
}
Remove GlueCatalogWrite if users only read tables. Remove AthenaQuery if users don't run Athena queries directly. The two SparkEventHistory blocks may be omitted only if Spark event logging is disabled platform-wide at the runtime template level.
Why the SA needs write access to syn-spark-history/: when a Spark job is launched from a notebook or batch runtime with an SA attached, the driver writes its event log to s3://<SYNTASA_SYSTEM_BUCKET>/syn-spark-history/ using the SA's credentials. Without this block the driver fails at SparkContext init with AccessDenied on PutObject. The path is fixed; only the bucket name varies per deployment.
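If you prefer to script the template edits rather than make them by hand, something along these lines may help. It is a minimal sketch, not a Syntasa tool: `build_policy`, `PLACEHOLDER`, the shortened `TEMPLATE`, and the bucket name `example-system-bucket` are all made up for illustration. It drops the statements you do not need, fills in the placeholders, and fails loudly if a placeholder is left without a value.

```python
import json
import re

# Matches the <UPPER_CASE> placeholders used in the policy template.
PLACEHOLDER = re.compile(r"<([A-Z_]+)>")


def build_policy(template_text, values, drop_sids=()):
    """Drop unwanted statements, then substitute placeholders (hypothetical helper)."""
    policy = json.loads(template_text)
    policy["Statement"] = [
        stmt for stmt in policy["Statement"] if stmt.get("Sid") not in drop_sids
    ]
    # Substituting after pruning means values are only needed for placeholders
    # that survive; a missing value raises KeyError instead of leaving "<...>" behind.
    # Assumes values contain no quotes or backslashes (true for bucket names,
    # regions, account IDs and workgroup names).
    rendered = PLACEHOLDER.sub(lambda m: values[m.group(1)], json.dumps(policy))
    return json.loads(rendered)


# Shortened two-statement template, for illustration only.
TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "SparkEventHistoryS3Objects", "Effect": "Allow",
     "Action": ["s3:GetObject", "s3:PutObject"],
     "Resource": "arn:aws:s3:::<SYNTASA_SYSTEM_BUCKET>/syn-spark-history/*"},
    {"Sid": "AthenaQuery", "Effect": "Allow",
     "Action": ["athena:StartQueryExecution"],
     "Resource": ["arn:aws:athena:<REGION>:<ACCOUNT_ID>:workgroup/<WORKGROUP_NAME>"]}
  ]
}
"""

policy = build_policy(
    TEMPLATE,
    {"SYNTASA_SYSTEM_BUCKET": "example-system-bucket"},  # made-up bucket name
    drop_sids={"AthenaQuery"},
)
print(json.dumps(policy, indent=2))
```

Pruning before substitution is a deliberate choice here: once AthenaQuery is dropped, no value for the workgroup placeholder has to be supplied.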
Trust policy (for IAM Role SAs only)
If your SA is an IAM Role (recommended over IAM User — see the User Guide), attach this trust policy to the role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<SYNTASA_ACCOUNT_ID>:role/<SYNTASA_INFRA_ROLE>"
},
"Action": "sts:AssumeRole"
}
]
}
Replace <SYNTASA_ACCOUNT_ID> and <SYNTASA_INFRA_ROLE> with the values your Syntasa platform team provides.
Also required on the role:
- MaxSessionDuration ≥ 43200 seconds (12 hours)
aws iam update-role --role-name <YourRoleName> --max-session-duration 43200
Optional (recommended for cross-account): require an external ID for the AssumeRole call so it cannot be triggered without Syntasa-side coordination. Ask your Syntasa platform team for the external ID value, then add a condition:
"Condition": {
"StringEquals": {
"sts:ExternalId": "<EXTERNAL_ID>"
}
}
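The Condition fragment belongs inside the trust policy's statement, next to Effect, Principal and Action. The sketch below shows that placement by building the whole trust policy as a Python dict; `trust_policy` and the example account ID, role name and external ID are hypothetical values, not ones supplied by Syntasa.

```python
import json


def trust_policy(syntasa_account_id, infra_role, external_id=None):
    """Build the role trust policy, optionally requiring an external ID."""
    statement = {
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{syntasa_account_id}:role/{infra_role}"
        },
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # The condition sits inside the statement, not at the top level.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder values for illustration; use the ones your platform team provides.
doc = trust_policy("111122223333", "example-infra-role", external_id="example-external-id")
print(json.dumps(doc, indent=2))
```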
What the SA does NOT need
The following permissions are not required on the SA:
| Permission | Why not |
|---|---|
| | The cluster's own IAM role handles all Syntasa-internal storage except Spark event history. Granting these to the SA is harmless but unnecessary. |
| | Cluster control-plane operations run as the platform's IAM role. |
| | The Syntasa infra role assumes your SA, not the other way around. |
| | Never required. |
Common scenarios
Scenario A — Read-only notebook user (recommended default)
User runs Spark SQL against existing Glue tables; does not create or alter tables.
Include: (1) Customer Data S3 + (2) Spark Event History S3 + (3) Glue Catalog Read. Drop (4) Glue Catalog Write since the user doesn't need it.
Scenario B — Read / write notebook user
User reads from one bucket, writes results to another, and creates / alters Glue tables.
Include: (1) Customer Data S3 — list all relevant bucket ARNs in Resource, (2) Spark Event History S3, (3) Glue Catalog Read, (4) Glue Catalog Write.
Scenario C — Defense in depth (database-scoped Glue)
Same as A or B above, but your security policy requires AWS-layer scoping in addition to Syntasa Authz user-level filtering.
Replace the database/* and table/*/* parts of Glue Resource ARNs with the specific database name(s) — e.g. database/marketing and table/marketing/*. Add a separate statement per database if you have several.
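With several databases, generating the scoped statements is less error-prone than editing ARNs by hand. The following is a sketch under stated assumptions: `scoped_glue_statements` is a hypothetical helper, the catalog ARN is kept in every statement as in the template, and the userDefinedFunction ARN is scoped the same way as the table ARN, which is an assumption beyond what this page specifies.

```python
def scoped_glue_statements(sid_prefix, actions, region, account_id, databases):
    """Return one Allow statement per database, scoped to that database."""
    base = f"arn:aws:glue:{region}:{account_id}"
    statements = []
    for index, database in enumerate(databases, start=1):
        statements.append({
            # Numbered Sids stay alphanumeric even if database names contain "_".
            "Sid": f"{sid_prefix}{index}",
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": [
                f"{base}:catalog",
                f"{base}:database/{database}",
                f"{base}:table/{database}/*",
                # Assumption: functions are scoped like tables.
                f"{base}:userDefinedFunction/{database}/*",
            ],
        })
    return statements


# Example with two databases and a shortened action list.
scoped = scoped_glue_statements(
    "GlueCatalogRead",
    ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
    "us-east-1",
    "111122223333",  # placeholder account ID
    ["marketing", "web_analytics"],
)
for stmt in scoped:
    print(stmt["Sid"], stmt["Resource"][1])
```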
Scenario D — Athena power user
User runs Athena queries directly (not via Spark) from notebooks.
Include: (1), (2), (3), optionally (4), plus (5) Athena Query. Athena writes results to a workgroup-configured output location — make sure (1) covers that path too.
Scenario E — Cross-account data bucket
Customer data lives in a different AWS account than the SA.
Cross-account S3 access requires permission on both sides:
- SA identity policy: the same (1) Customer Data S3 block, with the Resource ARN using the data account's bucket name. S3 bucket ARNs carry no account ID, so only the bucket name changes.
- Bucket policy on the data bucket: must explicitly Allow the SA principal. Add this statement to the data bucket's bucket policy:
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<SA_ACCOUNT_ID>:role/<SA_ROLE_NAME>"
},
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::<CUSTOMER_DATA_BUCKET>",
"arn:aws:s3:::<CUSTOMER_DATA_BUCKET>/*"
]
}
For IAM User SAs, use arn:aws:iam::<SA_ACCOUNT_ID>:user/<SA_USERNAME> as the principal.
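The role and user variants of the bucket policy statement differ only in the principal ARN, which a short helper can make explicit. This is an illustrative sketch: `bucket_policy_statement` and the example account ID, role name and bucket name are hypothetical.

```python
def bucket_policy_statement(sa_account_id, principal_name, bucket, principal_type="role"):
    """Build the data-bucket statement that allows the SA principal."""
    if principal_type not in ("role", "user"):
        raise ValueError("principal_type must be 'role' or 'user'")
    return {
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{sa_account_id}:{principal_type}/{principal_name}"
        },
        "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
        ],
        # Bucket ARN for ListBucket, object ARN for the object-level actions.
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }


# Placeholder values for illustration.
role_stmt = bucket_policy_statement("111122223333", "example-sa-role", "example-data-bucket")
user_stmt = bucket_policy_statement(
    "111122223333", "example-sa-user", "example-data-bucket", principal_type="user"
)
print(role_stmt["Principal"]["AWS"])
print(user_stmt["Principal"]["AWS"])
```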
Validation checklist
After attaching the policy, verify the SA works end-to-end:
# In a Syntasa notebook attached to this SA:
spark.sql("SHOW DATABASES").show() # Syntasa Authz returns user-visible databases
spark.read.parquet("s3://<CUSTOMER_DATA_BUCKET>/some-path/").show() # exercises CustomerDataS3
spark.sql("CREATE TABLE my_db.test AS SELECT 1 AS id").collect() # write through Syntasa Authz to user's DB
# Then check the Spark History Server UI for the run — confirms SparkEventHistoryS3 worked.
If a test fails:
| Error | Likely cause |
|---|---|
| | Missing or wrong bucket ARN in block (1) |
| | Block (2) missing: the SA can't write Spark event logs. Add the two Spark event history statements. |
| | This should not happen: the cluster identity owns the other Syntasa prefixes. Contact Syntasa support if you see this. |
| | Block (3) missing, or its Resource ARN doesn't cover the database being queried (only relevant if you scoped it down per Scenario C; widen the ARN or add the missing database) |
| | Block (4) missing; add it if users create / alter tables |
| | Set the role's maximum session duration (see the trust policy section) |
| | Trust policy missing the Syntasa infra role principal |
Note on user-level access: the IAM policy controls what the SA can call. Which databases and tables a user can actually see and modify is governed by Syntasa Authorization, which filters on top of the SA's IAM-level access. Granting the SA database/* in Glue does not expose all databases to all users.
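One way to picture the two layers is as an intersection: a user sees a database only if the SA's IAM policy covers it and Syntasa Authorization grants it to that user. The sketch below is a conceptual model only and says nothing about how Syntasa Authz is implemented; `visible_databases`, the pattern list and the grant set are all hypothetical.

```python
from fnmatch import fnmatchcase


def visible_databases(all_databases, iam_patterns, authz_grants):
    """Databases allowed by the SA's IAM scope AND granted to the user."""
    return sorted(
        db
        for db in all_databases
        if any(fnmatchcase(db, pattern) for pattern in iam_patterns)
        and db in authz_grants
    )


catalog = ["marketing", "finance", "hr"]

# Full-catalog IAM scope (*): the user-level grants decide what is visible.
print(visible_databases(catalog, ["*"], {"marketing"}))

# Database-scoped IAM (Scenario C): both layers must allow the database.
print(visible_databases(catalog, ["marketing", "finance"], {"finance", "hr"}))
```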