PySpark development environment setup

John Di Zhang
Nov 15, 2020

1. Sample startup repository

https://github.com/zdjohn/spark-setup-workshop

2. Set up dev dependencies

tox is more than a testing tool: it can also set up a dev environment with common development libraries. You can, of course, change the configuration to suit your own preferences.

3. Set up project dependencies

Note: make sure your dev venv is activated before running the steps below.

  • install dependencies: pip install -r requirements.txt
  • add the dev virtual environment to Jupyter Notebook as a kernel: python -m ipykernel install --user --name=pyspark-sample
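If you are not sure whether the venv is really active, a quick check like the one below can help. It is only a sketch: the function name in_virtualenv is made up for illustration, and it relies on the standard-library fact that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Running it with the same python you use for pip install shows which interpreter will receive the packages and the Jupyter kernel.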

4. Spark test run

  • jupyter notebook
  • spark-submit --master "local[*]" --deploy-mode client helloworld.py (the quotes keep shells such as zsh from treating [*] as a glob pattern)
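For reference, here is roughly what a script like helloworld.py could look like. This is a hypothetical sketch, not the file from the repository: the sample_rows helper, the column names, and the app name are all invented for illustration, and it assumes pyspark is one of the packages installed from requirements.txt.

```python
"""A hypothetical helloworld.py; the repository's real script may differ."""


def sample_rows():
    """Illustrative in-memory data, so the job needs no input files."""
    return [("hello", 1), ("world", 2)]


def main():
    try:
        # Imported here so a missing install produces a readable hint
        # instead of a bare traceback.
        from pyspark.sql import SparkSession
    except ImportError:
        print("pyspark not found: activate the dev venv and install requirements.txt")
        return

    # The master (local[*]) is supplied by spark-submit, so it is not hard-coded.
    spark = SparkSession.builder.appName("helloworld").getOrCreate()
    try:
        df = spark.createDataFrame(sample_rows(), ["word", "n"])
        df.show()
        print("row count:", df.count())
    finally:
        # Always release the local Spark resources, even if the job fails.
        spark.stop()


if __name__ == "__main__":
    main()
```

Leaving the master out of the script means the same file can run locally through the spark-submit command above or on a cluster without code changes.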

5. Package your project

  • run tox -e pack
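What the pack environment does is defined in the repository's tox configuration, which is not reproduced here. As a rough idea of what packaging a PySpark project often involves, the sketch below zips a package's Python files so they can be shipped alongside a job. The zip_package function and the jobs folder name in the usage note are assumptions for illustration only, not what the repository actually does.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_package(package_dir: str, archive_path: str) -> List[str]:
    """Zip every .py file under package_dir and return the archived names.

    The package folder itself is kept as the top-level entry, so the
    archive stays importable as a package.
    """
    root = Path(package_dir).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.py")):
            # Store paths relative to the package's parent, e.g. jobs/etl.py.
            arcname = path.relative_to(root.parent).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

For example, zip_package("jobs", "jobs.zip") would produce an archive that can be passed to spark-submit through its --py-files option.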
