Superset's SQL feature is a powerful component designed to streamline data exploration and analysis. It includes the SQL Lab for interactive querying, SQL Query for crafting and executing queries, and Query History for tracking and managing past queries. These tools collectively enhance the efficiency and flexibility of working with large datasets, making Superset a preferred choice for data analysts and scientists.
Clicking on the SQL link in the upper menu will provide a drop-down of available selections – SQL Lab, Saved Queries, and Query History.
- SQL Lab – this interface allows the user to explore the existing database connections to find databases and tables to query and save as new datasets.
- Saved Queries – this section provides a list of the previously saved SQL queries in a searchable, sortable format.
- Query History – this section provides a log of all the SQL queries performed by date and time as well as the SQL query itself and the ability to launch the SQL Lab editor using that query.
Exploring the Dataset with SQL Lab
SQL Lab is the workspace for users to manage all aspects of SQL usage in the Superset environment. In SQL Lab, you can explore databases to discover available datasets and tables or create a new table or view using the SQL editor.
Key features include:
- Write custom SQL queries on any database you've connected
- Browse database metadata: tables, columns, indexes, and partitions.
- A multi-tab environment to work on multiple queries at a time.
- Support for long-running queries.
- Uses the Celery distributed queue to dispatch query handling to workers
- A search engine to find queries executed in the past
- Supports templating using the Jinja templating language which allows for using macros in your
SQL code
This powerful interface enables users to explore data without needing to download it, access the Athena interface directly via AWS, have an SQL IDE on their workstation, or install additional software. This feature is part of the Syntasa ecosystem, requiring no additional cost or seat licenses. We will walk through a couple of examples.
SQL Editor Sample
Steps to configure the initial view:
- Select the DATABASE to query – for this example, we are using the Athena connection
- Select the SCHEMA – this is the database name from Athena to query against.
- Select the table to view in the SEE TABLE SCHEMA section.
Now, using the SQL editor, data can be explored or exposed, filtered to very specific requirements, or pared down to create project-specific datasets. After that, charts or new datasets can be saved and created through the editor.
Here’s an example:
We are looking at a dataset that contains an event_type element in the data, so we will create a simple SQL query to filter the dataset just for events that contain a specific event type:
SELECT * from brian_v720_a1_prod.brian_acled_test_file
WHERE event_type = ‘Protests’
The results in the SQL Lab editor look like this:
You will notice the query appears in the RESULTS section as well as a count of the rows in the query result. From here there are a few options to proceed:
- DOWNLOAD TO CSV – Download the results to a CSV file for other use, passing them to an analyst, etc.
- COPY TO CLIPBOARD – This allows the data results to be copied and pasted into another utility like Excel, etc. (remember there is a file limit set to consider!)
- CREATE CHART – this option provides a quick way to create a new chart from the queried data without the need to change utilities or software platforms. The user is taken to the Create Chart Screen where it loads the query results into a table and allows the option to adjust, edit, or add columns, and create a dataset by saving it as a new dataset name.
Clicking the Save button will open the Save Chart pop-up with an option to save the chart & go to the selected dashboard.
Saving the Query
Superset SQL Lab allows users to save their SQL queries for later use. This is particularly useful for complex queries that may need to be run frequently or shared with team members. Clicking on the light blue ‘SAVE’ button itself allows the user to save the specific query for future use. (Refer to screenshot below)
Once you click the Save button to save the query, you will be redirected to the 'Saved queries' page where you can see all the saved queries.
Saving the Query as Dataset
You can either save the results of a query as a new dataset or update an existing dataset with the query results. This dataset can then be used to create visualizations and dashboards in Superset, enabling a seamless transition from data exploration to data visualization.
Here are the steps to save the query results as a new dataset:
- After running the query, click on the Save Dropdown icon and select the option Save Dataset. This will open the Save or Overwrite Dataset pop-up.
- Select the Option Save as new and enter the name of the new dataset.
- Click the Save & Explore button.
You can select the option Overwrite existing if you want to overwrite the existing dataset instead of creating a new one.
The newly created dataset can be accessed from the Datasets screen as shown below:
Remember, since this dataset is created from a query on the original dataset, it is considered a virtual dataset rather than a physical dataset. This distinction is indicated by the Type field, with virtual datasets shown as blue tiles and physical datasets shown as black and white tiles. A virtual dataset is a view of data created using SQL Lab, allowing you to use JOINs and other SQL operations to build your dataset.