Thursday, February 10, 2022

PySpark Installation

Install Latest Python

sudo apt update

sudo apt -y upgrade

python3 -V

Install Development Tools

sudo apt install -y build-essential libssl-dev libffi-dev python3-dev

sudo apt install software-properties-common -y

sudo add-apt-repository -y ppa:deadsnakes/ppa

sudo apt install -y python3.10

Install pip

sudo apt install -y python3-pip

Install PySpark

pip install pyspark

source ~/.profile

sudo apt install default-jdk scala git
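Before launching the shell, it can help to confirm that the pieces installed above are visible to the Python interpreter. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` and the report keys are hypothetical, chosen for illustration. It only checks that a `java` executable is on the PATH and that the `pyspark` package can be found, not that the versions are compatible.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report what this interpreter can see (hypothetical helper, not part of PySpark)."""
    return {
        # Version of the interpreter running this script
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # True if a `java` executable is found on the PATH
        "java_on_path": shutil.which("java") is not None,
        # True if the pyspark package is installed for this interpreter
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
    }


report = check_prerequisites()
for name, value in report.items():
    print(f"{name}: {value}")
```

If `pyspark_importable` is False, the package was probably installed for a different Python interpreter than the one running the script.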

Verify Spark Installation

pyspark

Python 3.8.10 (default, Nov 26 2021, 20:14:08) 

[GCC 9.3.0] on linux

Welcome to SPARK   version 3.2.1

Using Python version 3.8.10 (default, Nov 26 2021 20:14:08)

Spark context Web UI available at http://dollar.lan:4040

Spark context available as 'sc' (master = local[*], app id = local-1644524108365).

SparkSession available as 'spark'.

>>> big_list = range(1000)

>>> rdd = sc.parallelize(big_list, 2)

>>> odds = rdd.filter(lambda x: x % 2 != 0)

>>> odds.take(5)

[1, 3, 5, 7, 9]

Press Ctrl-D to exit the PySpark shell.
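To make explicit what the session above computes, here is a plain-Python sketch that mimics it without Spark. The helpers `split_into_partitions` and `take` are hypothetical names, and the even, contiguous slicing is a simplified model assumed for illustration, not Spark's actual implementation. It splits the range into 2 partitions, filters each partition for odd numbers, and reads the first 5 results in partition order.

```python
from itertools import islice


def split_into_partitions(data, num_partitions):
    """Split a sequence into contiguous, near-equal slices (simplified model)."""
    items = list(data)
    n = len(items)
    # Boundary indices: slice i covers items[bounds[i]:bounds[i + 1]]
    bounds = [i * n // num_partitions for i in range(num_partitions + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]


def take(partitions, count):
    """Return the first `count` elements, reading partitions in order."""
    flat = (x for part in partitions for x in part)
    return list(islice(flat, count))


# Mirrors sc.parallelize(big_list, 2)
partitions = split_into_partitions(range(1000), 2)

# Mirrors rdd.filter(lambda x: x % 2 != 0), applied partition by partition
odd_partitions = [[x for x in part if x % 2 != 0] for part in partitions]

# Mirrors odds.take(5)
first_five = take(odd_partitions, 5)
print(first_five)
```

In this model the first partition alone already holds enough odd numbers, so the second partition is never needed to answer `take(5)`.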