The Notebook Process node in Syntasa App Studio lets you take a Jupyter Notebook developed in a Workspace and run it as a production-grade, scheduled component within a data pipeline. This article walks through configuring, parameterizing, executing, and monitoring a Notebook Process.
Prerequisites
Before you begin, ensure that the following are in place:
- A Notebook Workspace has been created.
- A Jupyter Notebook file (.ipynb) exists in that workspace.
- A Syntasa App has been created in App Studio.
Add the Notebook Process Node
- Open your App in App Studio.
- From the left-hand palette, locate the Notebook icon under the Processes section.
- Drag and drop the Notebook icon onto the canvas.
- Click the node to open the Configuration Sidebar.
Select and Synchronize the Notebook
This step links the process node to your notebook source code and workspace context.
Workspace Selection
- Select the Workspace that contains your notebook.
- This determines both the storage context and the runtime file system used during execution.
Notebook Name
- Browse the workspace directory and select the target Notebook.
Launch Button
- Opens the selected notebook in a new JupyterLab tab.
- Useful for quick edits or verification without leaving App Studio.
Refresh Button
- Re-scans the notebook file.
- Updates metadata and parameter definitions in the UI after any notebook changes.
Configure Parameters (Parameters Preview)
Syntasa uses Papermill-style parameterization to inject runtime values into notebooks.
In JupyterLab
- Add a cell tagged exactly as parameters (lowercase).
- Define default values, for example:
    batch_id = "default"
    process_date = "1970-01-01"
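To make the idea of Papermill-style parameterization concrete: the tagged cell supplies defaults, and at run time a cell holding the injected values is executed right after it, so every later cell sees the overrides. The sketch below simulates that ordering in plain Python. The `run_cells` helper, the cell strings, and the injected date are hypothetical illustrations, not Syntasa or Papermill internals.

```python
# Simulates the order in which a Papermill-style runner executes cells.
# Illustrative only: this is not how Syntasa is implemented.

def run_cells(cells):
    """Execute code cells in order in one shared namespace and return it."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace

# Cell tagged "parameters": default values used for interactive runs.
parameters_cell = 'batch_id = "default"\nprocess_date = "1970-01-01"'

# Cell inserted at run time with the values supplied by the pipeline.
injected_cell = 'process_date = "2024-05-01"'

# An ordinary notebook cell that consumes the parameters.
body_cell = 'message = f"Processing {batch_id} for {process_date}"'

result = run_cells([parameters_cell, injected_cell, body_cell])
print(result["message"])
```

Because the injected cell runs after the defaults, a parameter that is not overridden (here `batch_id`) keeps its default value.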
In App Studio
- Navigate to the Parameters section of the configuration sidebar.
- The system displays a Preview of all variables detected in the tagged cell.
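If you want to predict what the Preview is likely to show before you click Refresh, you can parse the tagged cell yourself. The following is a minimal sketch that assumes the preview lists simple top-level assignments; `preview_parameters` is a hypothetical helper, and the actual detection logic in App Studio may differ.

```python
import ast

def preview_parameters(cell_source):
    """Return {name: default} for simple top-level assignments in a cell."""
    detected = {}
    for node in ast.parse(cell_source).body:
        # Only handle the plain "name = value" form.
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name):
                try:
                    detected[target.id] = ast.literal_eval(node.value)
                except (ValueError, TypeError):
                    # Default is an expression rather than a literal.
                    detected[target.id] = None
    return detected

cell = 'batch_id = "default"\nprocess_date = "1970-01-01"'
print(preview_parameters(cell))
```

Keeping defaults as plain literals (strings, numbers, booleans) makes them easier to detect and override.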
Override Values
- Map notebook parameters to App-level or system variables.
- Example:
process_date = {{process_date}}
Troubleshooting
- If parameters do not appear:
- Verify the cell tag is parameters (lowercase).
- Click Refresh in the configuration panel.
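The tag check can also be done outside the UI. In the standard notebook file format, tags are stored under each cell's metadata, so a short script can confirm that a code cell carries the exact lowercase tag. This is a sketch: `find_parameter_cells` is a hypothetical helper and `sample` is a stand-in for the contents of your own .ipynb file.

```python
import json

def find_parameter_cells(notebook_json):
    """Return indexes of code cells tagged exactly 'parameters'."""
    notebook = json.loads(notebook_json)
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
        and "parameters" in cell.get("metadata", {}).get("tags", [])
    ]

# Minimal stand-in for an .ipynb file; the first tag is wrongly capitalized.
sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["Parameters"]},
         "source": ["x = 1"], "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": ["batch_id = \"default\""], "outputs": [],
         "execution_count": None},
    ],
})

matches = find_parameter_cells(sample)
print(matches if matches else "No cell tagged 'parameters' found")
```

To run it against a real notebook, read the file with `open(path, encoding="utf-8").read()` and pass the text to the helper.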
Runtime and Infrastructure Settings
Define the execution environment for the notebook.
Runtime
- Choose the appropriate Python or Spark runtime.
- Ensure it matches the libraries and framework versions required by your code.
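One way to confirm that a runtime matches your code is to print the installed library versions from a cell near the top of the notebook. The sketch below uses only the Python standard library; the package names passed in are examples, so substitute the ones your notebook actually imports.

```python
from importlib import metadata

def check_versions(required):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in required:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

# Example package names; replace with the libraries your notebook needs.
report = check_versions(["pandas", "pyspark"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Because the output is saved with the executed notebook, it also serves as a record of which versions a given run used.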
Compute Profile
- Configure CPU and memory allocation.
- For Spark workloads, ensure adequate Driver and Executor memory based on data volume.
Environment Variables
- Add key-value pairs such as:
- LOG_LEVEL
- API_ENDPOINT
- ENV
These variables are injected into the runtime environment during execution.
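Inside the notebook, these values can be read with `os.environ`. The sketch below reads the three example variables named above and falls back to defaults so the notebook still runs interactively; the default values and the `load_settings` helper are assumptions for illustration.

```python
import logging
import os

def load_settings(environ=os.environ):
    """Read the example variables, falling back to assumed defaults."""
    return {
        "log_level": environ.get("LOG_LEVEL", "INFO").upper(),
        "api_endpoint": environ.get("API_ENDPOINT", ""),
        "env": environ.get("ENV", "dev"),
    }

settings = load_settings()

# getLevelName returns an int for known level names, a string otherwise.
level = logging.getLevelName(settings["log_level"])
logging.basicConfig(level=level if isinstance(level, int) else logging.INFO)
logging.getLogger("notebook").info("Running in %s", settings["env"])
```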
Define Outputs
A Notebook Process produces two primary output types.
Data Outputs
- If your notebook writes data to a table or file path, register that location as an Output Dataset.
- This enables downstream nodes to detect data readiness and establish dependencies.
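As an illustration of a notebook cell that writes to a location you would then register as an Output Dataset, the sketch below writes a small CSV file into a date-partitioned folder. The folder layout, file name, and `write_output` helper are hypothetical, and a temporary directory stands in for the real output path.

```python
import csv
import tempfile
from pathlib import Path

def write_output(rows, output_dir, process_date):
    """Write rows to <output_dir>/process_date=<date>/part-0000.csv."""
    target_dir = Path(output_dir) / f"process_date={process_date}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-0000.csv"
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return target

# A temporary directory stands in for the registered output location.
output_dir = tempfile.mkdtemp()
path = write_output(
    [{"batch_id": "default", "row_count": 42}], output_dir, "1970-01-01"
)
print(path)
```

Deriving the path from a notebook parameter such as `process_date` keeps each run's output separate and makes reruns for a given date predictable.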
Executed Notebook Artifact
- By default, Syntasa saves a copy of the notebook after execution, including:
- Cell outputs
- Logs
- Charts and visualizations
- Configure the storage path for this artifact in the Output tab.
Scheduling and Triggering Execution
Notebook Processes participate fully in App orchestration.
Connect Dependencies
- Draw a connector from upstream nodes (for example, Crawlers or Event Stores) to the Notebook Process.
- The notebook will execute only after upstream dependencies complete successfully.
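The dependency rule above can be pictured as a small graph problem. The sketch below is a conceptual model only, not Syntasa's scheduler: the node names, the status strings, and the `ready_to_run` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key maps a node to the upstream nodes it waits on (hypothetical names).
dependencies = {
    "crawler": set(),
    "event_store": {"crawler"},
    "notebook_process": {"crawler", "event_store"},
}

def ready_to_run(node, dependencies, status):
    """A node may start only when every upstream node has succeeded."""
    return all(status.get(up) == "SUCCESS" for up in dependencies[node])

# One valid execution order that respects all dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

status = {"crawler": "SUCCESS", "event_store": "FAILED"}
print(ready_to_run("notebook_process", dependencies, status))
```

In this model a failed upstream node blocks the notebook, which is why upstream logs are a good first stop when a Notebook Process never starts.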
Configure Schedule
- Open App Settings.
- Define triggers such as:
- Hourly
- Daily
- Weekly
- Event-based
Deploy
- Click Build and Deploy to activate the pipeline.
Monitoring and Troubleshooting
App Monitor
- View real-time execution status in the Monitor tab.
Logs
- Click the Notebook Process node in Monitor view to access:
- Python tracebacks
- Spark driver and executor logs
Executed Notebook Review
- Download or open the executed notebook to inspect:
- Failed cells
- Intermediate outputs
- Runtime parameters
This is often the fastest way to identify logic or data issues.
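For long notebooks, you can locate failed cells programmatically instead of scrolling. In the standard notebook file format, a failed code cell carries an output whose type is "error", with the exception name and message alongside it. The sketch below scans for those outputs; `summarize_failures` is a hypothetical helper and `executed` is a stand-in for a downloaded executed notebook.

```python
import json

def summarize_failures(notebook_json):
    """Return (cell_index, exception_name, message) for each error output."""
    notebook = json.loads(notebook_json)
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no outputs, hence the default of [].
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append(
                    (index, output.get("ename"), output.get("evalue"))
                )
    return failures

# Minimal stand-in for an executed notebook with one failing cell.
executed = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["x = 1"],
         "outputs": [], "execution_count": 1},
        {"cell_type": "code", "metadata": {}, "source": ["row['process_date']"],
         "outputs": [{"output_type": "error", "ename": "KeyError",
                      "evalue": "'process_date'", "traceback": []}],
         "execution_count": 2},
    ],
})

for index, name, message in summarize_failures(executed):
    print(f"Cell {index} failed with {name}: {message}")
```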
Summary of Key UI Actions
| Action | Purpose |
|---|---|
| Launch | Opens JupyterLab to edit the source notebook. |
| Refresh | Synchronizes App Studio with the latest notebook changes and parameters. |
| Preview | Displays variables detected in the parameters cell. |
| Output Path | Defines where the executed notebook snapshot is stored. |
Conclusion
The Notebook Process node bridges interactive data science development and production-grade orchestration. By carefully configuring workspace linkage, parameters, runtime resources, outputs, and schedules, teams can reliably operationalize notebooks as scalable and auditable components of enterprise data pipelines.