Solving Spark error: "detected implicit cartesian product for FULL OUTER join between logical plans"

Error

I ran into this error when trying to outer join two DataFrames in PySpark.

joined_df = (
    df1
    # No join condition is given, which is what triggers the error below
    .join(df2, how='outer')
)

org.apache.spark.sql.AnalysisException:
detected implicit cartesian product for FULL OUTER join between logical plans

Solution

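Enabling cross joins in the SparkSession solves this problem. The join above has no join condition, so Spark would have to pair every row of df1 with every row of df2, and it refuses to do that unless the following setting is turned on: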

spark.sql.crossJoin.enabled: true

Code example

from pyspark.sql import SparkSession

# Allow joins without a join condition (cartesian products)
spark = (
    SparkSession
    .builder.appName('my_spark')
    .config("spark.sql.crossJoin.enabled", "true")
    .getOrCreate()
)
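
If a cartesian product is not what you actually want, an alternative is to give the join an explicit key so Spark has a join condition to work with. The sketch below is only an illustration: the helper name outer_join_on_key and the default column name "id" are assumptions, not part of the original code, and it assumes both DataFrames share that column.

```python
def outer_join_on_key(left_df, right_df, key="id"):
    """Full outer join of two PySpark DataFrames on an explicit key column.

    `key` defaults to "id", a hypothetical column name; replace it with a
    column (or list of columns) that exists in both DataFrames.
    """
    # Passing `on=` gives Spark a join condition, so the plan is no longer
    # an implicit cartesian product and crossJoin does not need enabling.
    return left_df.join(right_df, on=key, how="outer")
```

It would be called as joined_df = outer_join_on_key(df1, df2, key="id"). Which approach fits depends on intent: enable crossJoin when every row really should be paired with every other row, and use a key when the rows are meant to be matched.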