It is possible to create a SPARK runtime with Apache Zeppelin enabled.
Follow the steps below:
- Go to Runtime and click the plus icon.
- Insert your configurations for the SPARK runtime:
- Turn on the Zeppelin switch for your runtime (turn off "Termination on completion" so that you get more time to experiment with the notebook).
- Ensure "Idle Time Deletion Interval In secs" is set, the default recommended setting is 600.
- Run any job from GCP QA with the created runtime, for example a BQ Loader or an AA Loader job.
- When a cluster is spun up with your runtime, go to STEP LOGS.
- Click on Details in the INFRASTRUCTURE CREATE CLUSTER step. A window with the Zeppelin URL should be displayed.
- Copy and paste the Zeppelin URL into your local browser.
- You now have access to all the data from the Zeppelin notebook.
Available Interpreters
- Spark (version 2+)
- BigQuery
- Python
- ...and many more
Configuring Zeppelin with BigQuery
BigQuery requires a minor setup of PROJECT_ID when creating a new Zeppelin notebook.
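As a minimal sketch: in the Interpreter settings, point the stock Zeppelin BigQuery interpreter's zeppelin.bigquery.project_id property at your GCP project ID (my-gcp-project is a placeholder for your own project). A notebook paragraph bound to that interpreter can then query BigQuery directly:

```sql
%bigquery
select 1 as sanity_check
```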
Using Zeppelin Notebook with Spark
There are no prerequisites to use Spark with Zeppelin. When you create a new notebook, you can attach a Spark interpreter to it and start writing your Spark code.
Refer to the step-by-step guide below for using the Spark runtime in your Zeppelin notebook:
Notice the URL in your browser: the last section contains a unique code (e.g. 2ECQ4APT9), which is the name of the folder where the JSON form of the notebook is stored.
All notebooks are stored at the "PROJECT-ID/syn-cluster-config/zeppelin-notebooks" path.

Check your Spark version from a notebook paragraph:

```scala
spark.version // prints the Spark version
// res4: String = 2.3.3
```

Show databases from your notebook:

```scala
spark.sql(s"""show databases""").show(50, false)
```
Show tables in your database:
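A paragraph like the following lists the tables (the database name my_db is a placeholder; substitute your own):

```scala
spark.sql(s"""show tables in my_db""").show(50, false)
```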
Look at the schema of one of the tables:
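For example (the table name events_sample is hypothetical):

```scala
spark.sql(s"""describe my_db.events_sample""").show(50, false)
```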
View records per day for this table:
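A sketch of a per-day count, assuming the table has a date column named event_date:

```scala
spark.sql(s"""
  select event_date, count(*) as records
  from my_db.events_sample
  group by event_date
  order by event_date
""").show(50, false)
```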
Load DF and create a table in the database:
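A minimal sketch of this step, assuming the sample data lives at a hypothetical GCS path:

```scala
// Read sample data into a DataFrame (the source path is a placeholder)
val df = spark.read.parquet("gs://my-bucket/samples/atb_metrics/")

// Create an empty table in the database with the DataFrame's schema
df.limit(0).write.saveAsTable("my_db.atb_metrics_sample")
```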
Now you can see your table atb_metrics_sample in your database:
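Listing the tables again (same placeholder database as above) should now include it:

```scala
spark.sql(s"""show tables in my_db""").show(50, false)
```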
Let us save the data into this table:
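For example, writing the DataFrame loaded in the earlier sketch:

```scala
// insertInto appends the DataFrame's rows to the existing table
df.write.insertInto("my_db.atb_metrics_sample")
```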
See records in this table:
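For example:

```scala
spark.sql(s"""select * from my_db.atb_metrics_sample limit 10""").show(false)
```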
Features
- Notebooks are saved to Google Cloud Storage. This allows a user to open the notebook in the future and resume their work (all notebooks are stored at the "PROJECT-ID/syn-cluster-config/zeppelin-notebooks" path).
- All the Syntasa data that is usually accessible through Hive/Spark is accessible from within the Zeppelin notebook (the time taken to run code from the notebook depends on the Spark runtime configuration you create).
- Similar to Jupyter, it allows you to run your code in blocks.
Note: This feature is currently experimental; more developments will follow.