Error
I encountered this error while trying to outer join two DataFrames with PySpark.
joined_df = (
    df1
    .join(df2, how='outer')  # no join condition, so Spark sees an implicit cartesian product
)
org.apache.spark.sql.AnalysisException:
detected implicit cartesian product for FULL OUTER join between logical plans
Solution
Enabling cross joins in the SparkSession configuration resolves this error:
spark.sql.crossJoin.enabled: true
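If a SparkSession already exists, the same flag can also be set at runtime. A minimal sketch, assuming the session is bound to a variable named spark:

# Assumes an existing SparkSession named `spark`
spark.conf.set("spark.sql.crossJoin.enabled", "true")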
Code example
from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder.appName('my_spark')
    .config("spark.sql.crossJoin.enabled", "true")
    .getOrCreate()
)
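Continuing from the session created above, the outer join from the error section now runs. The two DataFrames below are hypothetical examples added for illustration:

# Hypothetical example DataFrames
df1 = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'val1'])
df2 = spark.createDataFrame([(10, 'x')], ['id2', 'val2'])

# With crossJoin enabled, the unconditioned full outer join falls back to a
# cartesian product instead of raising the AnalysisException
joined_df = df1.join(df2, how='outer')
joined_df.show()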