When subscribing to Syntasa monitoring and managed services, several data checks and notifications are configured so that the team can provide fast remedies when needed. The setup of the monitoring services consists of two parts: gathering the needed system information, and analyzing that information and notifying on specified scenarios.
Information gathering
The system information to be gathered is configurable per data type. Each data type is configured in the Syntasa environment based on the desired monitoring, and the gathered data for the configured data types is sent to the Syntasa services team for analysis and monitoring.
Job execution
Each job running throughout the Syntasa platform creates operational data recording the job's start, finish, status, execution time, and so on. This information can be seen on each job's Operations - Activity Logs screen. The job execution data is used to monitor the current status and trends of scheduled job executions.
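For illustration only, a single job execution record of this kind might take a shape like the sketch below; the field names and values are hypothetical and not the platform's actual schema.

```python
# Hypothetical shape of one job execution record; all field names are illustrative.
job_execution = {
    "job_name": "daily_web_events_load",
    "status": "COMPLETED",              # e.g. RUNNING, COMPLETED, FAILED
    "started_at": "2024-01-04T02:05:00Z",
    "finished_at": "2024-01-04T02:47:00Z",
    "execution_seconds": 2520,
}
```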
Cluster information
Each job running throughout the Syntasa platform starts and utilizes a cluster on the cloud environment to execute its work; these clusters are templatized as Runtimes in the Syntasa platform. Users can also take advantage of Interactive Mode while building and testing an app, which starts a cluster that runs indefinitely. The cluster information is used to monitor the current status and trends of cluster use.
User-defined events
User-defined events created in the Syntasa platform enable users to define scenarios, for example, a certain file arriving in a specified location, that can then be used to kick off a job. The creation of user-defined events produces event log data that can be examined for anomalies or missing files.
Table metadata
Each table available in and being written to by the Syntasa platform has metadata capturing the existing and newly created partitions, the number of rows per partition, and so on; this is visible on the State tab of a table node in an app. The table metadata is used to identify missing partitions and irregularities in the number of rows within a partition of a table.
Analysis and monitoring
The data types configured for gathering are received and processed automatically on a regular schedule. Each data type can be used for various near real-time checks as well as long-term trend analysis to ensure the current and future health of the Syntasa platform.
Job health
Analysis and monitoring can be configured per job and are performed for numerous situations. The most basic check is whether a job's status has moved to Failed or it is running longer than normal, but the monitoring services go further, analyzing the trend and pattern of jobs' start and end times to ensure the jobs are running within their normal timeframes.
A job may not start within its regular timeframe for several reasons: a user may have temporarily turned off the job's schedule and forgotten to turn it back on; the job may be triggered by an event rather than a time, and that event failed to trigger in its regular timeframe; or there may have been a service outage at the Syntasa platform or cloud provider level.
Furthermore, the regularly gathered job execution data is analyzed for trends such as a gradual increase in execution time. Such a pattern would not be visible from a handful of recent executions but is revealed by analyzing executions over a period of time. The increase may not be an issue, as it could simply reflect growth in the amount of data being processed, but it can also point to a needed optimization in the app or the Syntasa platform.
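As a minimal sketch of the kind of checks described here, assuming execution durations are available as a simple list of seconds per run, the snippet below flags a run that is well outside the historical norm and estimates whether execution time is creeping upward. The function names, thresholds, and sample data are illustrative, not the actual Syntasa implementation.

```python
from statistics import mean, stdev

def is_running_long(current_seconds, history_seconds, sigma=3.0):
    """Flag a run whose duration is far outside the historical norm."""
    baseline = mean(history_seconds)
    spread = stdev(history_seconds) if len(history_seconds) > 1 else 0.0
    return current_seconds > baseline + sigma * spread

def execution_time_trend(history_seconds):
    """Estimate the average change in execution time per run (least-squares slope)."""
    n = len(history_seconds)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history_seconds)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history_seconds))
    den = sum((x - x_bar) ** 2 for x in xs) or 1
    return num / den  # seconds added per execution, on average

# Example: a job whose runtime has been slowly creeping up.
history = [620, 640, 655, 700, 730, 760, 800]
print(is_running_long(1500, history))   # True: far beyond the recent norm
print(execution_time_trend(history))    # positive slope signals a gradual increase
```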
Cluster status
The clusters started by jobs, or in an ad-hoc manner via Interactive Mode, produce status and log data capturing runtime type, uptime, and so on. The gathered cluster information is monitored for abnormalities such as a cluster running longer than normal or beyond a specified threshold. Such a situation can lead to unwanted costs from the cloud provider, so it triggers a notification to be remedied by the Syntasa services team.
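A simple threshold check of this kind might look like the following sketch; the record shape, cluster names, and eight-hour limit are assumptions for illustration, not Syntasa's actual monitoring configuration.

```python
from datetime import datetime, timedelta, timezone

MAX_UPTIME = timedelta(hours=8)  # illustrative threshold, not a Syntasa default

def clusters_over_threshold(clusters, now=None):
    """Return the names of clusters still running past the allowed uptime."""
    now = now or datetime.now(timezone.utc)
    return [
        c["name"]
        for c in clusters
        if c["state"] == "RUNNING" and now - c["started_at"] > MAX_UPTIME
    ]

# Hypothetical cluster records gathered from the cloud environment.
clusters = [
    {"name": "runtime-daily-etl", "state": "RUNNING",
     "started_at": datetime.now(timezone.utc) - timedelta(hours=12)},
    {"name": "interactive-dev", "state": "TERMINATED",
     "started_at": datetime.now(timezone.utc) - timedelta(hours=20)},
]
print(clusters_over_threshold(clusters))  # ['runtime-daily-etl']
```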
Event regularity
User-defined events may be created, for example, to listen for a regularly received data file that is, in turn, used to kick off a job, as opposed to scheduling the job at a "best guess" time. Similar to the analysis of job execution data for missing jobs, the gathered user-defined event log data is monitored and analyzed for missing or late events, for example an expected data file that has not been received within its normal timeframe.
This timeframe is dynamic, derived from the recently gathered user-defined event log data rather than a hard-coded value, which allows the notifications to be flexible and adjust automatically.
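One way to derive such a dynamic window, sketched below under the assumption that recent arrival times are available as hours past midnight, is to center the expected window on the recent average and widen it by the observed variability. The function, parameters, and sample values are illustrative only.

```python
from statistics import mean, stdev

def expected_window(recent_arrival_hours, slack_sigma=2.0):
    """Derive an expected arrival window (hours past midnight) from recent events
    instead of a hard-coded cutoff."""
    center = mean(recent_arrival_hours)
    spread = stdev(recent_arrival_hours) if len(recent_arrival_hours) > 1 else 0.5
    return center - slack_sigma * spread, center + slack_sigma * spread

# A file that normally lands between roughly 02:00 and 03:00.
recent = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3]
low, high = expected_window(recent)
print(f"alert if no event by hour {high:.1f}")
```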
Data integrity
Beyond verifying that jobs ran successfully and on time, the gathered table metadata is analyzed to ensure the jobs processed and produced the amount of data that is expected. Recent metadata entries are reviewed to discern the pattern of when new partitions are created and how much data exists per partition.
Missing partitions or irregularities in the number of rows per partition create notifications that are examined to ensure the data is complete. The analysis may reveal no issue, for example if there was simply a decrease in the amount of data received, or it may uncover a problem with the processing of the data within the app or upstream in the data delivered to the Syntasa platform.
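The sketch below shows the two checks just described against a hypothetical set of daily partitions and row counts; the table shape, dates, and 50% tolerance are assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical table metadata: partition date -> row count.
partitions = {
    date(2024, 1, 1): 1_050_000,
    date(2024, 1, 2): 990_000,
    date(2024, 1, 4): 310_000,   # Jan 3 is missing; Jan 4 is unusually small
}

def missing_partitions(partitions, start, end):
    """List daily partitions expected between start and end but not present."""
    expected = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in expected if d not in partitions]

def unusual_row_counts(partitions, tolerance=0.5):
    """Flag partitions whose row count deviates from the average by more than tolerance."""
    avg = mean(partitions.values())
    return [d for d, rows in partitions.items() if abs(rows - avg) / avg > tolerance]

print(missing_partitions(partitions, date(2024, 1, 1), date(2024, 1, 4)))  # [date(2024, 1, 3)]
print(unusual_row_counts(partitions))                                       # [date(2024, 1, 4)]
```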
System heartbeat
In addition to supporting the specific operational and business analysis and monitoring described above, the regularly gathered information serves as a heartbeat for the Syntasa platform. Similar to reviewing the metadata of tables within the monitored Syntasa environment, the metadata of the gathered monitoring information itself is reviewed to ensure the data is being received in a regular pattern. A missing heartbeat could be a simple hiccup or a symptom of a wider problem within the monitored Syntasa environment.
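A heartbeat check of this sort reduces to asking whether the monitoring feed has gone silent past its expected cadence, as in the minimal sketch below; the interval and grace period are illustrative assumptions, not Syntasa defaults.

```python
from datetime import datetime, timedelta, timezone

def heartbeat_missing(last_received, expected_interval,
                      grace=timedelta(minutes=30), now=None):
    """True when the monitoring feed is overdue past its expected cadence plus a grace period."""
    now = now or datetime.now(timezone.utc)
    return now - last_received > expected_interval + grace

# Example: the last batch of monitoring data arrived three hours ago,
# but data is expected roughly every hour.
last = datetime.now(timezone.utc) - timedelta(hours=3)
print(heartbeat_missing(last, expected_interval=timedelta(hours=1)))  # True: feed is overdue
```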