What is the recommended best practice to query Presto using Syntasa Notebook?
I'm hoping I can get some sample code that demonstrates the best method of connecting to Presto in a Syntasa Notebook.
-
Official comment
It looks like the suggested code may not work in all situations. The JDBC jar file needs to be available on all executors used by the notebook, which isn't easy to arrange. The client library below was recommended instead:
https://github.com/prestodb/presto-python-client
That said, this was an edge case we ran into over 1.5 years ago, when our notebook capability was still fairly new. Syntasa Notebooks have matured immensely since then. For probably 99% of our cases so far, we can use spark.sql queries in notebooks to analyze the tables; attaching an external cluster is not necessary for most high-level exploratory work on datasets that don't require much in the way of resources.
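For reference, a minimal sketch of the spark.sql approach in a notebook is below. The database and table names (analytics_db, events) are placeholders, not actual Syntasa objects.

# In a Syntasa Notebook a SparkSession is typically already available as `spark`;
# getOrCreate() returns the existing session if one is running.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Query a table that is already registered in the metastore
df = spark.sql(
    "SELECT event_date, COUNT(*) AS events "
    "FROM analytics_db.events "
    "GROUP BY event_date"
)

# Inspect the result
df.show(10)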
-
You can install this library and try it from the notebook without using Spark.
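Here is a minimal sketch of querying Presto with presto-python-client (the prestodb package linked above). The host, port, catalog, schema, and table names are placeholders and need to match your Presto deployment; install the package first with pip install presto-python-client.

import prestodb

# Open a DB-API connection to the Presto coordinator
conn = prestodb.dbapi.connect(
    host="presto-server.example.com",  # placeholder host
    port=8080,
    user="your_username",
    catalog="hive",    # placeholder catalog
    schema="default",  # placeholder schema
)

cur = conn.cursor()
cur.execute("SELECT * FROM your_table LIMIT 10")
rows = cur.fetchall()  # list of row tuples

for row in rows:
    print(row)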
-
Mike Z,
Eric has suggested the code below:

from pyspark.sql import SparkSession

# Create a Spark session with the Presto JDBC driver on the classpath
# (replace the version with the one appropriate for your Presto server)
spark = SparkSession.builder \
    .appName("PrestoQuery") \
    .config("spark.jars.packages", "com.facebook.presto:presto-jdbc:0.240") \
    .getOrCreate()

# Set Presto connection properties
url = "jdbc:presto://presto-server.example.com:8080/"
properties = {
    "user": "your_username"
}

# Query
query = "SELECT * FROM your_table"

# Read data using the jdbc data source
df = spark.read \
    .format("jdbc") \
    .option("url", url) \
    .option("query", query) \
    .option("user", properties["user"]) \
    .load()