Applicable to Syntasa platforms installed in an Amazon AWS environment, this sample notebook provides examples of installing libraries within your notebook.
When using SYNTASA, you can work in the local notebook (which does not have Apache Spark available); this is useful for running simple Python/Scala code. Additionally, you can attach a Runtime to run jobs on a larger external cluster (for Spark and distributed workloads). Click the Gear icon to the right of the notebook to open the Syntasa runtime screen.
Installing Libraries using Jupyter Magic Command
This approach installs only into the local environment (within the notebook) and is useful when writing generic Python code without any Spark dependencies.
Once you run the cell below, you should be able to import the libraries you installed. In the example below, we install "boto3", "pillow", and "s3fs".
!pip3 install pip --upgrade
!pip3 install boto3 pillow s3fs --upgrade
## Using the '!' symbol will execute the command within the local notebook environment
## You can run multiple commands in sequence
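If you want to confirm what was installed before importing anything, standard pip commands work here too; for example, 'pip3 show' prints details for a single package:
!pip3 show boto3
## 'pip3 show <package>' prints the installed version, location, and dependencies of the package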
In the code below, we import the libraries and print their installed versions:
import boto3
import PIL
import s3fs
print(f'Boto3 Version :: {boto3.__version__}')
print(f'Pillow Version :: {PIL.__version__}')
print(f'S3FS Version :: {s3fs.__version__}')
Installing Libraries on an attached Runtime (Cluster) using Spark Magic and Syntasa Helper Methods
In this next section, we will install libraries on a remote cluster which will be attached to this notebook.
After creating a notebook instance, please click the Play button to start a Syntasa Runtime.
Depending on the runtime size and AWS connection, this could take anywhere between 5 and 15 minutes. Note that after the runtime is spun up, the status in the top bar will change from [[ RUNTIME STATUS: Creating ]] to [[ RUNTIME STATUS: Running ]].
At this point, Livy should connect and allow you to run code remotely. Note that the cells after this point are annotated with "%%spark" at the top of each cell; these code blocks run on the remote cluster.
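Before installing anything, a quick sanity-check cell can confirm the remote session is live. This is a minimal sketch that assumes the Livy session exposes the usual spark (SparkSession) handle, as Spark magic sessions normally do:
%%spark
## Confirm the remote session is connected by printing the Spark version
print(f'Spark Version :: {spark.version}')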
%%spark
from syn_utils import *
synutils.lib.installPyPI('boto3 pillow s3fs')
%%spark
import boto3
import PIL
import s3fs
print(f'Boto3 Version :: {boto3.__version__}')
print(f'Pillow Version :: {PIL.__version__}')
print(f'S3FS Version :: {s3fs.__version__}')
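As an optional follow-up, you can exercise a freshly installed library on the remote cluster. The sketch below is hypothetical and assumes the cluster nodes have an instance profile (or other credentials) with permission to list S3 buckets:
%%spark
## Hypothetical example: list S3 bucket names with boto3 using the node's default credentials
import boto3
s3 = boto3.client('s3')
for bucket in s3.list_buckets().get('Buckets', []):
    print(bucket['Name'])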
Installing libraries while viewing the output logs of the install (to debug any issues)
Use the workaround below to view debug logs if you run into issues with the previous method. This approach also lets you run yum commands to install system libraries.
%%spark
from urllib3.exceptions import InsecureRequestWarning
from urllib3 import disable_warnings
from syn_utils.syn_notebook.lib import AmazonClusterPyPackageInstaller
import requests
requests.packages.urllib3.disable_warnings()
disable_warnings(InsecureRequestWarning)
TYPE_YUM='yum'
TYPE_PIP='pip'
INSTALL='install'
UNINSTALL='uninstall'
def install_package(cmd):
    # Run the given shell command on every node of the attached cluster
    amazon_installer = AmazonClusterPyPackageInstaller()
    for ip in amazon_installer.node_ip_addresses:
        amazon_installer.execute_command(cmd, ip)

def custom_pkg_install(packages_arr, run_type, inst_uninst):
    # Build the yum or pip command for the requested packages, then run it on every node
    pkgs = ' '.join(packages_arr)
    cmd_to_run = ''
    if run_type == TYPE_YUM:
        cmd_to_run = f'sudo yum {inst_uninst} {pkgs} -y'
    elif run_type == TYPE_PIP:
        if inst_uninst == INSTALL:
            cmd_to_run = f'sudo python3 -m pip {inst_uninst} {pkgs}'
        elif inst_uninst == UNINSTALL:
            cmd_to_run = f'sudo python3 -m pip {inst_uninst} {pkgs} -y'  # '-y' is needed because pip uninstall stops execution and waits for user input
    print(f'Running Command [{cmd_to_run}]')
    install_package(cmd_to_run)
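Before installing anything with the helper, you can check which cluster nodes it will target. This is a hedged sketch that assumes node_ip_addresses is a plain list of strings, as the loop in install_package above suggests:
%%spark
## Optional: print the node IPs that install/uninstall commands will be sent to
installer = AmazonClusterPyPackageInstaller()
print(installer.node_ip_addresses)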
Install a package from PIP using the above method (on an attached cluster/runtime)
%%spark
# To install a python package on the remote cluster (requests and matplotlib)
custom_pkg_install(['requests','matplotlib'], TYPE_PIP, INSTALL)
Import the above-installed packages on the remote cluster
%%spark
# Now import the above installed libs
import requests
import matplotlib
print(f'Requests Version :: {requests.__version__}')
print(f'MatPlotLib Version :: {matplotlib.__version__}')
Uninstall a package from PIP using the above method (on an attached cluster/runtime)
%%spark
# To uninstall a python package from the remote cluster (requests and matplotlib)
custom_pkg_install(['requests','matplotlib'], TYPE_PIP, UNINSTALL)
Install a package from YUM using the above method (on an attached cluster/runtime)
%%spark
# To install a yum package (python3-devel and python3-tools)
custom_pkg_install(['python3-devel','python3-tools'], TYPE_YUM, INSTALL)
Uninstall a package from YUM using the above method (on an attached cluster/runtime)
%%spark
# To uninstall a yum package (python3-devel and python3-tools)
custom_pkg_install(['python3-devel','python3-tools'], TYPE_YUM, UNINSTALL)
The attached sample notebook includes all of the above scenarios, both when using only the local notebook and when a runtime is attached to the notebook.
The best way to learn these steps is through hands-on experience. Follow the steps below to create the sample notebook in your Syntasa environment:
- Download the sample notebook .ipynb file from this article.
- Create a new notebook in your Syntasa environment using the import notebook option.