Best process for IO-bound jobs loading data into an event store
I have an IO-bound data extraction process where I would like to load a (relatively) small dataset into an event store. Can you please advise on the best process for this activity? Is it still recommended to spin up a Dataproc cluster, or would the shell processor be the better way to achieve this? If it is the latter, what function do I use to load the data into an event store?
-
The Syntasa platform supports a series of From<> processes for connecting to various external systems and bringing data into the Syntasa Event Store.
From File Process: Connects with file-based input source systems such as FTP, SFTP, GCS, S3, ADLS, etc.
From DB Process: Connects with all the major databases, such as MySQL, PostgreSQL, Teradata, MS SQL Server, Oracle, Snowflake, Elasticsearch, MongoDB, etc.
If there is a requirement to bring in data from a source not supported by the above processes, users have the flexibility to use a Spark Code Process or a Container Code Process and write custom code to implement the integration, as in the sketch below.
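As an illustration only, a Spark Code Process for an unsupported source might look roughly like the following minimal sketch: it fetches a small dataset from a hypothetical REST endpoint and lands it where a downstream process can read it. The endpoint URL, schema inference, and output path are placeholder assumptions, not Syntasa APIs; inside the platform a SparkSession is typically already provided, in which case the builder call is unnecessary.

```python
# Hypothetical Spark Code Process body: ingest from an unsupported REST
# source and land the records for the Event Store pipeline to pick up.
# URL, bucket, and path below are illustrative placeholders.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("custom-ingest").getOrCreate()

# IO-bound step: the dataset is small, so fetch it in a single request
records = requests.get("https://api.example.com/v1/events", timeout=60).json()

# Let Spark infer the schema from the list of dicts; for production use,
# define an explicit StructType instead of relying on inference.
df = spark.createDataFrame(records)

# Write to a landing location a downstream Syntasa process reads from
df.write.mode("overwrite").parquet("gs://example-bucket/landing/events/")
```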
If Spark is not needed to establish the integration and it can be done with standalone Scala or Python code, then the better choice is the Container Code Process rather than the Spark Code Process.
The Container Code Process launches a single-node Kubernetes container runtime instead of a full Spark cluster; the cluster is better suited to Spark jobs, while the lightweight container is the better fit for standalone Python code, particularly an IO-bound extraction of a small dataset like this one.
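For comparison, here is a minimal sketch of what a Container Code Process body might look like for an IO-bound extraction: it pages through a hypothetical REST API and stages the result as newline-delimited JSON on GCS, where a From File process could then pick it up. The API URL, bucket, and object path are placeholders, and the GCS staging handoff is an assumption for illustration, not a documented Syntasa function.

```python
# Hypothetical Container Code Process body: a standalone, IO-bound
# extraction that pages through a REST API and stages the rows on GCS.
# All names below are illustrative placeholders.
import json
import requests
from google.cloud import storage

API_URL = "https://api.example.com/v1/events"   # placeholder source
BUCKET = "example-landing-bucket"               # placeholder bucket
OUT_PATH = "landing/events/extract.jsonl"       # placeholder object path

def fetch_all(url):
    """Follow simple page-number pagination until the API returns no rows."""
    page, rows = 1, []
    while True:
        resp = requests.get(url, params={"page": page}, timeout=60)
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return rows
        rows.extend(batch)
        page += 1

rows = fetch_all(API_URL)
payload = "\n".join(json.dumps(r) for r in rows)

# Stage the extract as a single newline-delimited JSON object on GCS
storage.Client().bucket(BUCKET).blob(OUT_PATH).upload_from_string(
    payload, content_type="application/x-ndjson"
)
print(f"Wrote {len(rows)} rows to gs://{BUCKET}/{OUT_PATH}")
```

Because the job is IO-bound rather than compute-bound, nothing here benefits from a cluster; the single-node container keeps startup time and cost low.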