What is the recommended best practice to query Presto using Syntasa Notebook?

Comments

5 comments

  • Official comment
    Mike Z

    It looks like the suggested code may not work in all situations. A jar file needs to be downloaded to all executors in a notebook, which isn't the easiest thing to do. The client below is what was recommended instead:

    https://github.com/prestodb/presto-python-client
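
    A minimal sketch of using this client from a notebook, with no Spark session involved; the host, port, user, catalog, schema, and table name below are placeholder assumptions, not Syntasa defaults:

    import prestodb

    # Connect directly to the Presto coordinator over its HTTP port.
    conn = prestodb.dbapi.connect(
        host="presto-server.example.com",  # placeholder coordinator host
        port=8080,
        user="your_username",
        catalog="hive",      # placeholder catalog
        schema="default",    # placeholder schema
    )

    # Standard DB-API cursor: execute a query and fetch the rows.
    cur = conn.cursor()
    cur.execute("SELECT * FROM your_table LIMIT 10")
    rows = cur.fetchall()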

    That said, this was an edge case we ran into over 1.5 years ago, when our notebook capability was still fairly new. Syntasa Notebooks have matured immensely since then. For probably 99% of our cases so far, we're able to use spark.sql queries in notebooks to analyze the tables (see the sketch below). Attaching an external cluster is not necessary for most high-level exploratory understanding of datasets that don't require much in the way of resources.
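
    For example, a minimal spark.sql sketch, assuming the notebook provides a Spark session named spark and that your_table is a placeholder for a table registered in the metastore:

    # Query a registered table through the notebook's built-in Spark
    # session; no external cluster or JDBC driver is needed.
    df = spark.sql("SELECT * FROM your_table LIMIT 10")
    df.show()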

  • Mahesh Shenoy

    Mike Z: Has this been addressed? If yes, please add the details so this can be referred to later.

    Sarath Botlagunta / Shahdy Ali Hassan

  • Mahesh Shenoy

    Sarath Botlagunta / Pradeepraj Chandrasekaran: Can you please provide more details on this?

  • Pradeepraj Chandrasekaran

    You can install this library and try it from the notebook without using Spark:

    https://github.com/trinodb/trino-python-client
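
    A minimal sketch with this client, assuming a Trino-compatible coordinator is reachable; the host, port, user, catalog, schema, and table name are placeholder assumptions:

    import trino

    # Connect straight to the coordinator; no Spark session required.
    conn = trino.dbapi.connect(
        host="presto-server.example.com",  # placeholder coordinator host
        port=8080,
        user="your_username",
        catalog="hive",      # placeholder catalog
        schema="default",    # placeholder schema
    )

    # DB-API cursor: run the query and pull back the result rows.
    cur = conn.cursor()
    cur.execute("SELECT * FROM your_table LIMIT 10")
    rows = cur.fetchall()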

  • Labinot Dvorani

    Mike Z, 

    Eric has suggested the code below:

    from pyspark.sql import SparkSession

    # Create a Spark session with the Presto JDBC driver on the classpath.
    # Replace the driver version with one matching your Presto server.
    spark = SparkSession.builder \
        .appName("PrestoQuery") \
        .config("spark.jars.packages", "com.facebook.presto:presto-jdbc:0.240") \
        .getOrCreate()

    # Set Presto connection properties
    url = "jdbc:presto://presto-server.example.com:8080/"
    properties = {
        "user": "your_username"
    }

    # Query to push down to Presto
    query = "SELECT * FROM your_table"

    # Read data using the jdbc data source
    df = spark.read \
        .format("jdbc") \
        .option("url", url) \
        .option("query", query) \
        .option("user", properties["user"]) \
        .load()
