What is the recommended method for installing new Python libraries/dependencies?


Comments


    Mike Z (Edited)

    There are several ways to install Python modules, depending on the scenario. If you are running pure Python code and do not need a Spark cluster, you can use “pip” to install packages directly from your notebook, as in these examples:

    !pip install pip --upgrade
    !pip install wheel pandas imageai==2.1.6 boto3 numpy==1.19.3 keras==2.4.3 pillow scipy h5py opencv-python keras-resnet pixellib scikit-image koalas tensorflow==2.4.0
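
The "!" prefix above is a notebook shell escape. As a rough sketch of the same idea in plain Python, the snippet below calls pip through the interpreter that is running the code, so the packages land in that interpreter's environment. The helper names pip_install_command and pip_install are hypothetical and are not part of Syntasa or pip.

```python
import subprocess
import sys


def pip_install_command(requirements):
    """Build a pip command bound to the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(requirements):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.check_call(pip_install_command(requirements))


# Example usage (commented out so that nothing is installed by accident):
# pip_install(["numpy==1.19.3", "pandas"])
```
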

    You can find more information about “pip” and its available commands and options (e.g., pip uninstall, --ignore-installed, --force-reinstall) in the official documentation.

    Please note that we recommend specifying version numbers (for example, numpy==1.19.3) when you know which module version you are using.
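
If you are unsure which versions are currently installed, one possible way to find out is to ask the running environment. The sketch below uses the standard library's importlib.metadata (Python 3.8 or later) to turn package names into "name==version" strings; the function names are illustrative only.

```python
from importlib import metadata


def pinned_requirement(package_name):
    """Return 'name==version' for an installed package, or None if it is missing."""
    try:
        return f"{package_name}=={metadata.version(package_name)}"
    except metadata.PackageNotFoundError:
        return None


def pinned_requirements(package_names):
    """Map each package name to its pinned requirement string (None when missing)."""
    return {name: pinned_requirement(name) for name in package_names}


# Example usage:
# for name, pin in pinned_requirements(["pandas", "numpy", "boto3"]).items():
#     print(name, "->", pin or "not installed")
```

The resulting strings can be passed to pip as they are, which keeps a later reinstall on the same versions.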

    The other method is to add the modules as configurations in the Syntasa Spark cluster runtime. The selected packages are then installed when the cluster is instantiated. There are two options here: “online” and “not online”.

    Online

    Add these two configs to the runtime (see screenshot below as well):

    syntasa.python.enable.dependencies       true
    syntasa.python.dependencies.names.online wheel,pandas,imageai==2.1.6,boto3,numpy==1.19.3,keras==2.4.3,pillow,scipy,h5py,opencv-python,keras-resnet,pixellib,scikit-image,koalas,tensorflow==2.4.0
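
Long package lists are easy to mistype, so it can help to generate the value from a Python list. The sketch below assumes the format shown in the example above, that is, requirement strings joined by commas with no spaces. The function name build_online_dependency_config is hypothetical, and the result still has to be entered into the runtime by hand.

```python
def build_online_dependency_config(requirements):
    """Return the two runtime settings for a list of pip requirement strings."""
    cleaned = [req.strip() for req in requirements if req.strip()]
    for req in cleaned:
        # Assumption: entries are comma-separated, so an entry must not
        # contain a comma or whitespace itself.
        if "," in req or any(ch.isspace() for ch in req):
            raise ValueError(f"invalid requirement entry: {req!r}")
    return {
        "syntasa.python.enable.dependencies": "true",
        "syntasa.python.dependencies.names.online": ",".join(cleaned),
    }


# Example usage:
# config = build_online_dependency_config(["wheel", "pandas", "numpy==1.19.3"])
# for key, value in config.items():
#     print(key, value)
```
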

    Not online

    1. Download the packages to a local, cloud, or network folder (e.g., tensorflow-2.4.0-cp37-cp37m-manylinux2010_x86_64.whl).
    2. Move or copy the file(s) to the following folder in the Syntasa project bucket: /syn-cluster-config/deps/python
    3. Add these two lines to the runtime:
    syntasa.python.enable.dependencies true
    syntasa.python.dependencies.names tensorflow-2.4.0-cp37-cp37m-manylinux2010_x86_64.whl
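
Before uploading a wheel, it may be worth checking that its tags match the cluster's Python version and platform. The sketch below splits a wheel file name into its standard parts (distribution, version, optional build tag, Python tag, ABI tag, platform tag) and builds the two runtime lines for a single file. Both function names are hypothetical, and the sketch handles only one file name, as in the example above.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its standard dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,  # optional build tag
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def build_offline_dependency_config(wheel_filename):
    """Return the two runtime settings for one wheel file name."""
    parse_wheel_filename(wheel_filename)  # validates the name, raises on error
    return {
        "syntasa.python.enable.dependencies": "true",
        "syntasa.python.dependencies.names": wheel_filename,
    }


# Example usage:
# info = parse_wheel_filename("tensorflow-2.4.0-cp37-cp37m-manylinux2010_x86_64.whl")
# print(info["python_tag"])
```

Here a Python tag of cp37 indicates a wheel built for CPython 3.7, so it would only install on a cluster running that interpreter version.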
