What is the recommended method to install new libraries/dependencies for Python?
There are multiple methods to install Python libraries/dependencies in a notebook. I would like to know the best method to install these in a notebook so that I can most efficiently productionize my code in a Spark Processor. Sample code demonstrating this would be very helpful.
There are multiple methods to install Python modules, depending on the scenario. If you are working with pure Python code and a Spark cluster is not required, you can use pip to install directly in your notebook, as in these examples:
!pip install pip --upgrade
!pip install wheel pandas imageai==2.1.6 boto3 numpy==1.19.3 keras==2.4.3 pillow scipy h5py opencv-python keras-resnet pixellib scikit-image koalas tensorflow==2.4.0
You can find more information about pip and its available commands and options (e.g. pip uninstall, --ignore-installed, --force-reinstall) in the official pip documentation.
Please note, we recommend pinning explicit version numbers (as in the example above) whenever you know which module version you need.
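To help with pinning, here is a small sketch for checking which version of a module is currently installed from inside a notebook (the helper name `installed_version` is my own, not part of any Syntasa API):

```python
# Look up the installed version of a package so you can pin it
# explicitly in your "pip install" command.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Example: print the pinned requirement line for pandas, if it is installed.
v = installed_version("pandas")
if v is not None:
    print(f"pandas=={v}")
```

You can then copy the printed `package==version` string straight into your `!pip install` line.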
The other method of installing Python modules is to add them as configurations in the Syntasa Spark cluster runtime. This installs the selected packages while the cluster is being instantiated. There are two options here: "online" and "offline".
For the online option, add these two configs to the runtime (see screenshot below as well):

For the offline option:
- Download the packages to a local, cloud, or network folder (e.g. tensorflow-2.4.0-cp37-cp37m-manylinux2010_x86_64.whl)
- Move or copy the file(s) to the following folder in the Syntasa project bucket: /syn-cluster-config/deps/python
- Add these two lines to the runtime
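The offline staging steps above can be sketched as shell commands. This is an illustrative sketch only: the bucket name is a placeholder for your own project bucket, and the copy command assumes a Google Cloud Storage bucket (use the equivalent `aws s3 cp` on AWS):

```shell
# Download a package (and its dependency wheels) to a local folder.
pip download tensorflow==2.4.0 -d ./deps

# Copy the wheel files into the Syntasa project bucket folder that the
# cluster reads at startup. Replace the bucket name with your own.
gsutil cp ./deps/*.whl gs://<your-syntasa-project-bucket>/syn-cluster-config/deps/python/
```

Staging the exact wheel files this way means the cluster does not need outbound internet access at instantiation time.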