Update the System and Check the Default Python
sudo apt update
sudo apt -y upgrade
python3 -V
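If you script this setup, the version line printed by `python3 -V` (for example `Python 3.8.10`) can be turned into a tuple for comparisons. A minimal sketch; `parse_python_version` is a hypothetical helper name, not part of any library:

```python
import re
import shutil
import subprocess


def parse_python_version(text):
    """Turn a line such as 'Python 3.8.10' into the tuple (3, 8, 10)."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    if shutil.which("python3") is None:
        print("python3 not found on PATH")
    else:
        # Older interpreters printed the version to stderr, so check both streams.
        result = subprocess.run(["python3", "-V"], capture_output=True, text=True)
        print(parse_python_version(result.stdout + result.stderr))
```

Tuples compare element by element, so `parse_python_version(...) >= (3, 8, 0)` works as a simple minimum-version check.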
Install Development Tools and Python 3.10
sudo apt install -y build-essential libssl-dev libffi-dev python3-dev
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install -y python3.10
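The deadsnakes package installs python3.10 alongside the system interpreter; it does not repoint `python3`. To see which interpreters are on PATH, a small sketch using `shutil.which` (the `find_interpreters` helper and its default list of names are illustrative assumptions):

```python
import shutil


def find_interpreters(names=("python3", "python3.8", "python3.10")):
    """Map each interpreter name to its absolute path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(f"{name:12} {path or 'not found'}")
```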
Install pip
sudo apt install -y python3-pip
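`python3-pip` provides pip for the distribution's default `python3`. Whether another interpreter such as python3.10 can also run pip is worth checking rather than assuming. A sketch that runs `<interpreter> -m pip --version` (the helper name `pip_available` is hypothetical):

```python
import subprocess
import sys


def pip_available(python=sys.executable):
    """Return True if `<python> -m pip --version` exits successfully."""
    try:
        result = subprocess.run(
            [python, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The interpreter itself is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for interpreter in ("python3", "python3.10"):
        print(interpreter, "has pip" if pip_available(interpreter) else "has no usable pip")
```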
Install PySpark
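pip3 install pyspark
This installs PySpark for the distribution's default python3 (Python 3.8 on Ubuntu 20.04), not for the python3.10 added above, which is why the shell banner below reports Python 3.8.10.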
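To confirm that the package was installed without starting Spark, you can ask `importlib.metadata` (Python 3.8+) for its version. Run this with the same interpreter pip installed into; `installed_version` is a hypothetical helper:

```python
from importlib import metadata


def installed_version(package="pyspark"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pyspark:", installed_version() or "not installed")
```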
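Reload your profile so that ~/.local/bin, where pip puts the pyspark launcher for a per-user install, is on PATH (Ubuntu's default ~/.profile adds that directory only if it already exists):
source ~/.profile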
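If the `pyspark` command is still not found after sourcing the profile, check whether `~/.local/bin` is actually on PATH. A sketch, assuming a per-user pip install on Linux (`user_bin_on_path` is a hypothetical helper; `path` and `home` are parameters so it can be tested without touching the real environment):

```python
import os


def user_bin_on_path(path=None, home=None):
    """Return True if <home>/.local/bin is one of the directories listed in PATH."""
    path = os.environ.get("PATH", "") if path is None else path
    home = os.path.expanduser("~") if home is None else home
    target = os.path.normpath(os.path.join(home, ".local", "bin"))
    # normpath drops trailing slashes so "/x/.local/bin/" still matches.
    return any(
        os.path.normpath(entry) == target
        for entry in path.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    print("~/.local/bin on PATH:", user_bin_on_path())
```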
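Install Java
PySpark needs a Java runtime to start. Scala and Git are not required by the pip package but are installed here as well.
sudo apt install -y default-jdk scala git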
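`java -version` writes its banner to stderr, and the version format differs between Java 8 (`1.8.0_312`) and Java 9 and later (`11.0.13`). Spark 3.2's documentation lists Java 8 and 11 as supported; check the docs for your Spark release. A sketch that extracts the major version (`java_major_version` is a hypothetical helper, and the regex assumes the usual OpenJDK banner format):

```python
import re
import shutil
import subprocess


def java_major_version(banner):
    """Extract the major version from `java -version` output.

    Handles the legacy "1.8.0_312" scheme (major = 8) and the
    newer "11.0.13" or "17" schemes (major = 11, 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version found in java -version output")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy 1.x numbering, e.g. 1.8 -> 8
    return int(first)


if __name__ == "__main__":
    if shutil.which("java") is None:
        print("java not found on PATH")
    else:
        # `java -version` prints to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        print("Java major version:", java_major_version(result.stderr))
```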
Verify Spark Installation
pyspark
Expect a banner similar to the following (the host name and app id will differ):
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Welcome to SPARK version 3.2.1
Using Python version 3.8.10 (default, Nov 26 2021 20:14:08)
Spark context Web UI available at http://dollar.lan:4040
Spark context available as 'sc' (master = local[*], app id = local-1644524108365).
SparkSession available as 'spark'.
>>> big_list = range(1000)
>>> rdd = sc.parallelize(big_list, 2)
>>> odds = rdd.filter(lambda x: x % 2 != 0)
>>> odds.take(5)
[1, 3, 5, 7, 9]
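The test job above splits `range(1000)` into two partitions, keeps the odd numbers, and `take(5)` returns the first five matches, all from the first partition. The following is a pure-Python model of that behavior, not Spark's implementation; it assumes PySpark slices a `range` into contiguous, equal-sized chunks (`parallelize_range` and `filter_take` are hypothetical names):

```python
def parallelize_range(n, num_slices):
    """Split range(n) into contiguous slices, e.g. [0..499] and [500..999] for n=1000, 2 slices."""
    return [
        range(i * n // num_slices, (i + 1) * n // num_slices)
        for i in range(num_slices)
    ]


def filter_take(partitions, predicate, k):
    """Return the first k elements matching predicate, scanning partitions in order."""
    out = []
    for part in partitions:
        for x in part:
            if predicate(x):
                out.append(x)
                if len(out) == k:
                    return out  # stop early once k matches are found
    return out


partitions = parallelize_range(1000, 2)
print(filter_take(partitions, lambda x: x % 2 != 0, 5))  # [1, 3, 5, 7, 9]
```

In the PySpark shell, `rdd.getNumPartitions()` should return 2 for the RDD above, and `rdd.glom().map(len).collect()` shows how many elements each partition holds.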