Using Zeppelin Notebooks in Syntasa – SYNTASA™

Article Summary

Create a Zeppelin Runtime (if one does not already exist)
Launch Zeppelin
Create a Zeppelin Notebook (Default & BigQuery)

Overview

Establishing a Runtime that can run Zeppelin Notebooks is a one-time activity. Before performing the following steps, please verify that a Zeppelin-compatible Runtime does not already exist.

Using Zeppelin requires a running cluster. Zeppelin has no cluster size requirements, so best practice is to use minimal resources. We typically use a two-worker cluster with n1-highmem-2 nodes on GCP (or the equivalent on AWS), at roughly $0.15 per node per hour. In this configuration, the cost of operation is in the range of $0.40-$0.50 per hour.

We also highly recommend setting a max uptime for the cluster: if the user does not shut the cluster down, it shuts down automatically once the specified max uptime is reached.
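As a rough check on these figures, the sketch below multiplies the per-node rate by the node count and shows how a max uptime bounds the worst-case spend. It assumes the two-worker cluster also has one master node (three nodes in total) and uses the approximate $0.15 per node per hour quoted above; real GCP and AWS prices vary by region and over time. The 8-hour max uptime is a hypothetical example, not a Syntasa default.

```python
# Rough cost estimate for a minimal Zeppelin cluster.
# Assumptions (not Syntasa defaults): 1 master + 2 workers, ~$0.15 per node per hour.
NODE_RATE_USD_PER_HOUR = 0.15
MASTER_NODES = 1
WORKER_NODES = 2


def hourly_cost(rate: float, masters: int, workers: int) -> float:
    """Return the cluster cost per hour in USD: rate times total node count."""
    return rate * (masters + workers)


def worst_case_cost(rate: float, masters: int, workers: int, max_uptime_hours: float) -> float:
    """Upper bound on spend if nobody stops the cluster before max uptime shuts it down."""
    return hourly_cost(rate, masters, workers) * max_uptime_hours


per_hour = hourly_cost(NODE_RATE_USD_PER_HOUR, MASTER_NODES, WORKER_NODES)
print(f"Estimated cost per hour: ${per_hour:.2f}")

# Hypothetical max uptime of 8 hours.
worst_case = worst_case_cost(NODE_RATE_USD_PER_HOUR, MASTER_NODES, WORKER_NODES, 8)
print(f"Worst case with an 8-hour max uptime: ${worst_case:.2f}")
```

Under these assumptions the estimate is about $0.45 per hour, inside the quoted $0.40-$0.50 range, and a forgotten cluster costs at most about $3.60 before the max uptime stops it.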

Creating a Zeppelin Runtime

Follow the steps below to create a Zeppelin-compatible Runtime. If such a Runtime already exists, skip to the next section.

Create a new Runtime by copying the default Spark cluster.
Turn OFF the ‘Terminate on Completion’ button and turn ON the ‘Zeppelin’ button.
Update the cluster base name to ‘zeppelin-cluster’.
Update the master and worker instance type to a small instance type (recommended: n1-highmem-2).
Update the HDD size to 100 GB.
Update the worker instance count to 2 (the minimum number of workers required for a cluster on GCP).
Update the idle time to suit your requirements (suggested: 5 hours, i.e. 18000 seconds).
Update the Spark configurations as follows and remove all other configs:
- Executor memory: 2G
- Driver memory: 2G
- Cores: 1
Click Save to create the Runtime.
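For reference, the Spark settings above correspond to standard Apache Spark properties. The sketch below is not a Syntasa API: it assumes the Runtime form's ‘Cores’ field means executor cores (`spark.executor.cores`), renders the settings in `spark-defaults.conf` style, converts the suggested idle time from hours to seconds, and checks the GCP minimum worker count. The helper names are hypothetical.

```python
# Sketch: the Zeppelin Runtime's Spark settings expressed as standard Spark properties.
# Assumption: the Runtime form's "Cores" field maps to spark.executor.cores.
SPARK_SETTINGS = {
    "spark.executor.memory": "2g",
    "spark.driver.memory": "2g",
    "spark.executor.cores": "1",
}

MIN_GCP_WORKERS = 2  # minimum worker count for a GCP cluster, per the steps above


def idle_time_seconds(hours: float) -> int:
    """Convert an idle time in hours to seconds, the unit the Runtime form uses."""
    return int(hours * 3600)


def to_spark_defaults(settings: dict[str, str]) -> str:
    """Render settings as sorted 'key value' lines, the spark-defaults.conf format."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


def check_worker_count(workers: int) -> None:
    """Raise if the worker count is below the GCP minimum."""
    if workers < MIN_GCP_WORKERS:
        raise ValueError(f"GCP clusters need at least {MIN_GCP_WORKERS} workers, got {workers}")


check_worker_count(2)
print(idle_time_seconds(5))  # 18000
print(to_spark_defaults(SPARK_SETTINGS))
```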
The following is a sample Zeppelin Runtime screenshot:

Launch a Zeppelin Notebook

Turn on interactive mode.
Create a runtime, or modify the existing one, so that it uses the Zeppelin Runtime.
Select the Zeppelin link once the cluster has been created (this can take a few minutes).
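Because the cluster can take a few minutes to come up, it can help to confirm that Zeppelin is answering before opening it. The sketch below polls Apache Zeppelin's `GET /api/version` REST endpoint using only the standard library. It assumes the Zeppelin link is directly reachable from where the script runs; if the link sits behind a proxy or login, the check would need adjusting. The function names, attempt count, and delay are hypothetical choices.

```python
import time
import urllib.request
from typing import Callable


def zeppelin_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if Zeppelin's GET /api/version endpoint answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False


def wait_for_zeppelin(base_url: str,
                      check: Callable[[str], bool] = zeppelin_is_up,
                      attempts: int = 20,
                      delay_s: float = 15.0) -> bool:
    """Poll until Zeppelin responds or attempts run out; return whether it came up."""
    for attempt in range(attempts):
        if check(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # wait before the next try, but not after the last one
    return False
```

For example, `wait_for_zeppelin("https://<your-zeppelin-link>")` keeps trying for about five minutes with the defaults above; the `check` parameter lets the polling loop be tested without a live cluster.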

Navigate to the Zeppelin home page using the link from the previous step.
Update the GCP project details in the interpreter settings:
- Click the ‘anonymous’ user in the top-right corner of the page and select ‘Interpreter’.
- Search for the BigQuery interpreter.
- Edit the BigQuery interpreter properties, add the GCP project ID, and save the changes.
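A mistyped project ID is a common reason for BigQuery queries to fail, so it can be worth checking the value before saving it. The sketch below validates an ID against GCP's published project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen) and sets it on the properties. It models the interpreter properties as a plain name-to-value dictionary, which simplifies how Zeppelin stores them, and uses the property name `zeppelin.bigquery.project_id` from the Apache Zeppelin BigQuery interpreter documentation; verify that name against your Zeppelin version. The project ID in the example is hypothetical.

```python
import re

# GCP project ID rules: 6-30 chars, lowercase letters/digits/hyphens,
# must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

# Property name per the Apache Zeppelin BigQuery interpreter docs; verify for your version.
PROJECT_PROPERTY = "zeppelin.bigquery.project_id"


def with_project_id(properties: dict[str, str], project_id: str) -> dict[str, str]:
    """Return a copy of the interpreter properties with a validated GCP project ID set."""
    if not PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"not a valid GCP project ID: {project_id!r}")
    updated = dict(properties)  # leave the caller's dictionary unchanged
    updated[PROJECT_PROPERTY] = project_id
    return updated


# Hypothetical project ID, for illustration only.
print(with_project_id({}, "my-analytics-project"))
```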

Create a Zeppelin Notebook

Select the ‘Notebook’ option from the main menu.
Create a new notebook or select an existing one from the list.

Create a New Notebook:
- Click ‘Create new note’ to start a new notebook.
- Name your notebook, select Default (Spark) or BigQuery as the default interpreter, and click Create.

The following screen appears, where you can start writing and executing queries.

Before your query, add ‘%pyspark’ as a tag if you are using Spark, or ‘#standardSQL’ if you chose BigQuery as your default interpreter. Click the Run button or press Shift+Enter to execute the query.
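To make the tag placement concrete, the sketch below assembles the text of a paragraph: the tag goes on the first line and the query follows on the next lines. The helper and the interpreter keys `spark` and `bigquery` are hypothetical; the query strings are placeholders (in a Zeppelin `%pyspark` paragraph, `spark` refers to the SparkSession).

```python
# Tag placed on the first line of a Zeppelin paragraph, per the step above.
PARAGRAPH_TAGS = {
    "spark": "%pyspark",
    "bigquery": "#standardSQL",
}


def paragraph_text(interpreter: str, query: str) -> str:
    """Return paragraph text: the interpreter's tag, then the query on the next lines."""
    try:
        tag = PARAGRAPH_TAGS[interpreter]
    except KeyError:
        raise ValueError(
            f"unknown interpreter {interpreter!r}; expected one of {sorted(PARAGRAPH_TAGS)}"
        ) from None
    return f"{tag}\n{query.strip()}"


print(paragraph_text("spark", 'spark.sql("SELECT 1 AS x").show()'))
print(paragraph_text("bigquery", "SELECT 1 AS x"))
```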

Add new paragraphs to run more queries.
Use the Markdown interpreter to write comments by adding ‘%md’ as the first line of a paragraph. Below is a sample screenshot.
