To successfully operationalize Jupyter notebooks in production, the .ipynb file alone is often not sufficient. Most real‑world workflows rely on additional assets such as:
- Custom Python modules (.py)
- Configuration files (.json, .yaml, .yml)
- Lookup or reference datasets (.csv, .parquet, etc.)
- Environment setup scripts (.sh, requirements.txt)
This article explains how to structure, reference, and manage these supporting files within Syntasa Notebook Workspaces so that your notebooks run reliably both in JupyterLab and as automated Notebook Process nodes in App Studio pipelines.
Overview of File Management in Syntasa Workspaces
Each Syntasa Workspace provides a persistent filesystem backed by cloud storage (such as S3 or GCS). Files stored here are available during interactive development and when the notebook is executed as part of a production pipeline.
Choosing the correct location and referencing files properly is essential for:
- Collaboration across teams
- Reproducible execution
- Smooth promotion from development to production
Placing Supporting Files in JupyterLab
Within a Workspace, supporting files can be stored in two primary locations depending on how they are intended to be used.
Personal Directory (/home/<username>)
Recommended for:
- Individual development
- Prototyping and experimentation
- Private utilities not yet ready for team use
Characteristics:
- Visible only to the owning user in JupyterLab
- Accessible to Notebook Processes that reference notebooks stored in this directory
- Not editable or visible to other team members
Note: If a production notebook depends on files stored in a personal directory, other users will not be able to view or maintain those files, which may create operational risk.
Shared Directory (/shared)
Recommended for:
- Production workflows
- Shared utility modules
- Configuration files
- Common reference datasets
Characteristics:
- Visible to all users who have access to the Workspace
- Ideal for collaboration and long‑term maintenance
Best Practice: Any notebook intended to run as a Notebook Process in a Syntasa App should store its supporting files in the /shared directory or its subfolders.
Uploading Files to JupyterLab
- Open your Workspace in JupyterLab.
- Navigate to the target folder in the left‑hand file browser.
- Click the Upload (↑) icon, or drag and drop files from your local machine.
Referencing Files Within a Notebook
All file references inside notebooks should use relative paths. This ensures compatibility between:
- Interactive execution in JupyterLab
- Automated execution in App Studio Notebook Process nodes
Importing Custom Python Modules
If utils.py is located in the same directory as your notebook:
import utils  # or: from utils import my_custom_function
If it is in a subfolder (for example, lib/utils.py):
from lib import utils
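These imports resolve because Python searches the current working directory, which for a notebook is normally the folder containing it. As a minimal sketch, the check below reports where a module would be loaded from before you import it. The helper name `find_module_path` is hypothetical, and `utils` and `lib.utils` are simply the example modules from this section.

```python
import importlib.util
from typing import Optional


def find_module_path(module_name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if not importable."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (for example "lib") cannot be found.
        return None
    if spec is None:
        return None
    return spec.origin


# Example: report where each supporting module resolves from.
for name in ("utils", "lib.utils"):
    location = find_module_path(name)
    print(f"{name}: {location or 'not found on the import path'}")
```

If a module is reported as not found, check that the file sits next to the notebook (or in the expected subfolder) before changing the import statement.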
Reading Configuration or Data Files
Example directory structure:
notebook.ipynb
config/settings.json
Example code:
import json
config_path = 'config/settings.json'
with open(config_path, 'r') as f:
    config = json.load(f)
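The example above can be wrapped in a small helper that rejects absolute paths and reports the working directory when the file is missing, which tends to make path problems easier to diagnose. This is only a sketch: `load_config` is a hypothetical function name, and the default path is the example layout shown above.

```python
import json
from pathlib import Path


def load_config(relative_path: str = "config/settings.json") -> dict:
    """Load a JSON config file relative to the current working directory."""
    path = Path(relative_path)
    if path.is_absolute():
        # Absolute paths tie the notebook to one user or environment.
        raise ValueError(f"Use a relative path, got: {relative_path}")
    if not path.is_file():
        raise FileNotFoundError(
            f"{relative_path} not found relative to {Path.cwd()}"
        )
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```

In a notebook cell this would be called as `config = load_config()`.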
Tip: Avoid absolute paths such as /home/jdoe/... or /shared/.... Relative paths provide portability and prevent environment‑specific failures.
Using Supporting Files in a Notebook Process
When a Notebook Process node is executed in App Studio, Syntasa creates an execution environment that mirrors your Workspace file structure.
How File Resolution Works
- Working Directory: Automatically set to the folder containing the notebook.
- Storage Access: The process is linked to the Workspace’s underlying cloud storage volume.
- Execution Context: Spark or Python engines see the same directory layout as in JupyterLab.
As a result:
pd.read_csv('data.csv') will work in production if it works in JupyterLab and data.csv is in the same folder as the notebook. For a file in a subfolder, include the subfolder in the relative path.
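Because the working directory is the notebook's folder, a notebook can verify its supporting files in its first cell and stop early with a clear message. The sketch below assumes nothing about Syntasa's API and uses only the standard library; `missing_files` and the `REQUIRED_FILES` list are hypothetical names for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist under the working directory."""
    base = Path.cwd()
    return [p for p in relative_paths if not (base / p).is_file()]


# Hypothetical list of supporting files for this notebook.
REQUIRED_FILES = ["data.csv", "config/settings.json"]

missing = missing_files(REQUIRED_FILES)
if missing:
    print(f"Missing under {Path.cwd()}: {missing}")
else:
    print("All supporting files found.")
```

Printing the working directory alongside the missing files also helps confirm that the process is running from the folder you expect.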
Critical Configuration: Workspace Selection
When configuring the Notebook Process node in App Studio, you must select the same Workspace that contains:
- The notebook file
- All supporting files and folders
This selection determines which storage volume is mounted during execution.
If the wrong Workspace is selected, the process will fail to locate required files even if the notebook path appears correct.
Advanced: Environment Setup Files
Some notebooks require additional dependencies or system configuration before execution.
requirements.txt
You can store a requirements.txt file in your Workspace and install dependencies during execution:
!pip install -r requirements.txt
Recommendation: For production pipelines, define these packages in the Runtime Environment configuration instead of installing them dynamically during notebook execution. This improves performance, stability, and reproducibility.
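If packages are defined in the Runtime Environment, a notebook can still check that everything listed in requirements.txt is actually installed instead of installing at run time. The following is a rough sketch using the standard library's `importlib.metadata`. The helper names are hypothetical, and the parser only handles plain lines such as `pandas==2.1.0`, not URLs or VCS references.

```python
from importlib import metadata
from pathlib import Path
import re
from typing import List


def parse_requirement_names(text: str) -> List[str]:
    """Extract distribution names from simple requirements.txt content.

    Skips comments, blank lines, and pip options (lines starting with "-").
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: List[str]) -> List[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


req_file = Path("requirements.txt")
if req_file.is_file():
    names = parse_requirement_names(req_file.read_text())
    print("Not installed:", missing_packages(names))
```

This check does not compare version constraints; it only reports packages that are absent.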
Shell Scripts (.sh)
Shell scripts can be used for advanced setup tasks such as:
- Downloading external drivers
- Configuring system libraries
- Initializing custom tools
Example usage:
import subprocess
subprocess.run(["sh", "setup_env.sh"])
These scripts should also be stored in the Workspace (preferably under /shared for production workflows).
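By default, `subprocess.run` does not raise an error when the script fails, so a notebook could continue with a partially configured environment. A slightly more defensive variant is sketched below. `run_setup_script` is a hypothetical wrapper name, and `setup_env.sh` is the example script from above.

```python
import subprocess


def run_setup_script(script_path: str = "setup_env.sh") -> str:
    """Run a shell script and return its standard output.

    Raises subprocess.CalledProcessError if the script exits non-zero,
    so the notebook stops instead of continuing with a partial setup.
    """
    result = subprocess.run(
        ["sh", script_path],
        check=True,           # fail loudly on a non-zero exit code
        capture_output=True,  # collect stdout/stderr for logging
        text=True,            # decode output as str rather than bytes
    )
    return result.stdout
```

When the script fails, the raised exception carries the captured `stderr`, which can be printed or logged to explain why the setup did not complete.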
Production Readiness Checklist
Before promoting a notebook to a Notebook Process, verify the following:
- All supporting files (.py, .csv, .json, .yaml) are in the same folder or a subfolder of the notebook.
- All file references use relative paths.
- Production notebooks store shared assets in the /shared directory.
- The Notebook Process node is configured with the correct Workspace.
- Required dependencies are defined in the Runtime Environment (preferred) or via controlled setup scripts.
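The relative-path item in this checklist can be partly automated. As an illustrative sketch, the function below scans the code cells of a notebook's JSON for quoted strings that start with /home/ or /shared/. The name `find_absolute_paths` is hypothetical, and the pattern is a simple heuristic that will not catch paths built dynamically.

```python
import json
import re
from typing import List, Tuple

# Matches quoted strings that begin with /home/ or /shared/ (the absolute
# prefixes named in this article); extend the pattern for other locations.
ABSOLUTE_PATH = re.compile(r"""["'](/(?:home|shared)/[^"']*)["']""")


def find_absolute_paths(notebook_json: str) -> List[Tuple[int, str]]:
    """Return (cell_index, path) pairs for absolute paths in code cells."""
    notebook = json.loads(notebook_json)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        hits.extend((index, m.group(1)) for m in ABSOLUTE_PATH.finditer(source))
    return hits
```

It could be run against a notebook with `find_absolute_paths(open("notebook.ipynb", encoding="utf-8").read())`; an empty list means no matching absolute paths were found.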
Conclusion
Proper management of supporting files is essential for reliable, collaborative, and maintainable notebook‑based pipelines in Syntasa. By using shared directories, relative paths, and consistent workspace configuration, you ensure that notebooks behave the same way in development and production, which reduces errors and simplifies long‑term operations.