Solving Spark error: “Decompression error: Version not supported" on GCP Dataproc

My gcloud command on terminal to create cluster

sudo gcloud dataproc clusters create my-project \
    --bucket my-bucket \
    --project my-gcp-project \
    --region asia-east1 \
    --zone asia-east1-b \
    --image-version=2.0-ubuntu18 \
    --master-machine-type n1-highmem-8 \
    --master-boot-disk-size 30 \
    --worker-machine-type n1-highmem-8 \
    --worker-boot-disk-size 100 \
    --num-workers 6 \
    --metadata='PIP_PACKAGES=xxhash' \
    --optional-components=JUPYTER \
    --initialization-actions gs://goog-dataproc-initialization-actions-asia-east1/python/pip-install.sh
    --subnet=default


Error

This error occurs during specific PySpark code running.

java.io.IOException: Decompression error: Version not supported

Solution

Change image-version from 2.0-ubuntu18 to 2.1-ubuntu20 can solve this version not supported error.

--image-version=2.1-ubuntu20 \

發表留言