Are all default Python libraries available in Notebook also available by default in Spark Processor?
If a user does a
!pip freeze
in the Notebook, a listing of all the available libraries appears. Not all of these libraries are available by default when I use a Jupyter Notebook locally (e.g. numpy and pandas), which is great.
Are all the Python libraries available to me by default in Notebook also available in Spark Processor? If not, are there designs to sync the two?
I am always thinking from an adoption perspective: it won't make much sense to the user why there is a disconnect, if there is one.
-
We used to make some of the standard libraries available in Spark Processors. But we started running into issues, as EMR/Dataproc have their own libraries and dependencies that conflicted with the preinstalled packages we were providing. So we removed them.
But with Jupyter Notebook, as the kernel is completely managed by Syntasa, we have the flexibility of preinstalling some libraries without any issues. We will have this disconnect for the foreseeable future. But eventually we would like to start utilizing a Kubernetes Spark cluster for all data processing instead of EMR/Dataproc. Once Kubernetes Spark becomes mainstream, we can try to be in sync as much as we can.
-
As the Spark Processor is not an interactive experience for the user, it is not possible.
As the versions of the default packages may be updated by EMR/Dataproc at any time, we can't keep a static list and show it in one of the sections of the Spark Processor.
We could get the list dynamically, but that is possible only after the runtime has started. So when the user is writing the Spark code, the runtime might not even be up.
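For readers who want to see what the runtime actually provides once it is up, here is a minimal sketch of getting the list dynamically from inside the code. It assumes Python 3.8 or later and uses only the standard library; `installed_packages` is a hypothetical helper name, not part of any Syntasa API. It reports only what the interpreter running the code can see, which on a cluster is typically the driver, so executors may differ.

```python
from importlib import metadata


def installed_packages():
    """Return {name: version} for every distribution visible to this interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip malformed distributions that carry no name
            packages[name.lower()] = dist.version
    return packages


if __name__ == "__main__":
    # Print in a pip-freeze-like "name==version" form, sorted by name.
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```

Running this as the first step of a job and reading the log gives roughly the same information as `!pip freeze` in the Notebook, but only after the runtime has started.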
-
What would be suggested best practice then for the user?
Two scenarios come to mind...
- Specify whatever libraries/dependencies you need to install in the Spark Processor code?
- Run without trying to install any and let it error out?
Scenario two sounds like bad practice. I don't know all the ramifications of running scenario one, but it sounds tolerable.
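A middle ground between the two scenarios could look like the sketch below: check up front which required modules are missing, then either install them or fail early with a clear message. This is only an illustration under assumptions. `missing_modules` and `install` are hypothetical helper names, the `REQUIRED` list is an example, and whether the cluster permits `pip install` at run time (and whether that reaches the executors, not just the driver) would need to be verified for your environment.

```python
import importlib.util
import subprocess
import sys


def missing_modules(required):
    """Return the top-level module names from `required` that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def install(packages):
    """Install packages with pip into the current interpreter (assumes pip is allowed)."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


if __name__ == "__main__":
    REQUIRED = ["numpy", "pandas"]  # example list; use import names, not dotted paths
    missing = missing_modules(REQUIRED)
    if missing:
        print(f"Missing modules: {missing}")
        # Uncomment if run-time installs are permitted in your environment:
        # install(missing)
```

Note that an import name and a pip package name can differ, so a real job may need a mapping between the two.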