The `%sql` magic command allows you to execute Spark SQL queries directly within your Python or Scala notebooks. It provides a seamless way to interact with your data using SQL syntax while maintaining the ability to use the results in subsequent code cells.
Basic Usage
You can use `%sql` for single-line queries or `%%sql` for multi-line blocks.
Single-Line Query
%sql SELECT * FROM range(10)Multi-Line Query
%%sql
SELECT
category,
COUNT(*) as count
FROM
sales_data
GROUP BY
categoryWorking with Results (`_sqldf`)
Every successful execution of a `%sql` command automatically captures the result in two ways:
- Python Variable - A Spark DataFrame named `_sqldf` is created in the notebook's namespace.
- Temporary View - A Spark temporary view named `_sqldf` is registered, allowing you to query the result in subsequent SQL cells.
Example: Using results in Python
# After running a %sql query
df = _sqldf
print(df.count())Example: Using results in SQL
%sql SELECT * FROM _sqldf WHERE count > 100Variable Interpolation and Parameters
Syntasa supports two ways to pass variables from your notebook into SQL queries.
Legacy Interpolation (`$var` or `${var}`)
You can substitute scalar variables (strings, integers, floats, and booleans) directly into your SQL.
- Python - `tbl = "my_table"` $\rightarrow$ `%sql SELECT * FROM $tbl`
- Scala - `val limit = 10` $\rightarrow$ `%sql SELECT * FROM range($limit)`
Named Parameter Markers (`:name`)
For injection-safe and type-correct binding (requires Spark 3.4+), use the colon prefix.
%sql SELECT * FROM users WHERE user_id = :uidThe system will look for a variable named `uid` in your notebook and bind it to the query.
Features and Behavior
Interactive Tables
Results are rendered as interactive HTML tables.
- Display Limit - By default, the UI displays up to 10,000 rows.
- Truncation - If a result exceeds the limit, a footer will notify you that the results are truncated and suggest using `_sqldf` to access the full dataset.
Multi-Statement Support
You can run multiple SQL statements in a single `%%sql` cell by separating them with semicolons (`;`).
%%sql
USE my_database;
CREATE OR REPLACE TEMP VIEW recent_logs AS SELECT * FROM logs WHERE date > '2023-01-01';
SELECT COUNT(*) FROM recent_logs;Comments
- SQL Comments - Standard `--` (line) and `/* ... */` (block) comments are supported.
- Python-style Comments - You can use `#` for line comments; the system will strip them before execution to prevent Spark syntax errors.
Cross-Language Support
- Scala Notebooks - Scala `val` bindings are automatically bridged. You can define a value in Scala and immediately use it in a `%sql` cell as a parameter or via interpolation.
- %run Support - If you use `%run` to include another notebook, any `%sql` cells in the child notebook will execute correctly, and any temporary views created will be available in the parent notebook.
Error Handling
- Syntax Errors - If your SQL has a syntax error, the system will display the error message from Spark without interrupting the notebook kernel.
- Unresolved Variables - If you use a `$var` that isn't defined, a warning will be displayed, and the literal string `$var` will be sent to Spark.
- Missing Parameters - If a `:name` marker is used but the variable is not found in the notebook, the query will not be sent to Spark, and an error will be shown.