The gcloud command I run in the terminal to create the cluster:
sudo gcloud dataproc clusters create my-project \
--bucket my-bucket \
--project my-gcp-project \
--region asia-east1 \
--zone asia-east1-b \
--image-version=2.0-ubuntu18 \
--master-machine-type n1-highmem-8 \
--master-boot-disk-size 30 \
--worker-machine-type n1-highmem-8 \
--worker-boot-disk-size 100 \
--num-workers 6 \
--metadata='PIP_PACKAGES=xxhash' \
--optional-components=JUPYTER \
--initialization-actions gs://goog-dataproc-initialization-actions-asia-east1/python/pip-install.sh \
--subnet=default
Error
This error occurs while running certain PySpark code.
java.io.IOException: Decompression error: Version not supported
Solution
Changing the image version from 2.0-ubuntu18 to 2.1-ubuntu20 resolves this "Version not supported" error. Replace the --image-version line in the command with:
--image-version=2.1-ubuntu20 \
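
Since the image version is fixed when a cluster is created, the cluster has to be recreated for the change to take effect. To confirm which image a cluster is actually running, something like the following may help. This is only a sketch: the function name cluster_image_version is made up for illustration, the cluster name and region are the ones from the command above, and it assumes gcloud is already authenticated with a default project set.

```shell
# Hypothetical helper (name is illustrative, not part of gcloud):
# prints the image version of an existing Dataproc cluster.
#   $1 = cluster name, $2 = region
cluster_image_version() {
  gcloud dataproc clusters describe "$1" \
    --region "$2" \
    --format='value(config.softwareConfig.imageVersion)'
}

# Usage (needs an authenticated gcloud and a default project):
#   cluster_image_version my-project asia-east1
```

If the printed version still starts with 2.0, the cluster was created from the old image and needs to be deleted and created again with the new --image-version value.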