PySpark Development Environment Setup
Nov 15, 2020
1. Sample starter repository
https://github.com/zdjohn/spark-setup-workshop
2. Set up dev dependencies
- prerequisites: python3 and pip3
- install tox
pip install tox
(ref: https://tox.readthedocs.io/en/latest/index.html)
tox is not just a testing tool; it can also set up a dev environment with common development libraries. Of course, you can change that library list to suit your own preferences.
- run
tox -e dev
dev dependencies are configured in the tox.ini file: https://github.com/zdjohn/spark-setup-workshop/blob/master/tox.ini
- activate the tox dev virtual environment:
source .tox/dev/bin/activate
(on Windows: .tox\dev\Scripts\activate)
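`tox -e dev` reads its packages from the `[testenv:dev]` section of tox.ini. To list them without opening the file by hand, Python's standard-library `configparser` is enough. The sketch below assumes a hypothetical tox.ini: the `SAMPLE_TOX_INI` string, its `envlist`, and its package list are illustrative and not copied from the workshop repository, and `dev_deps` is a hypothetical helper name.

```python
import configparser
from typing import List

# Hypothetical tox.ini contents, for illustration only; the real file in the
# workshop repository may define different environments and packages.
SAMPLE_TOX_INI = """
[tox]
envlist = py38

[testenv:dev]
basepython = python3
deps =
    -r{toxinidir}/requirements.txt
    ipykernel
    jupyter
    pytest
"""


def dev_deps(ini_text: str, env: str = "dev") -> List[str]:
    """Return the `deps` entries declared under [testenv:<env>], or [] if none."""
    # tox has its own {placeholder} substitution, so switch off configparser's
    # %-style interpolation and read the values verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = f"testenv:{env}"
    if not parser.has_option(section, "deps"):
        return []
    # A multi-line value comes back as one string; split it and drop blank lines.
    raw = parser.get(section, "deps")
    return [line.strip() for line in raw.splitlines() if line.strip()]


if __name__ == "__main__":
    for dep in dev_deps(SAMPLE_TOX_INI):
        print(dep)
```

Passing the real file's text instead (for example `dev_deps(open("tox.ini").read())`) would print whatever the repository actually declares.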
3. Set up project dependencies
note: please make sure your dev venv is now activated
- install dependencies:
pip install -r requirements.txt
- add the dev virtual environment to Jupyter Notebook as a kernel:
python -m ipykernel install --user --name=pyspark-sample
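`python -m ipykernel install` registers the interpreter that runs it as the kernel, so both it and `pip install -r requirements.txt` should run inside the tox dev venv. A minimal check is sketched below, assuming tox's default layout where each environment lives in `.tox/<envname>`; `active_venv` and `looks_like_tox_env` are hypothetical helper names, and the path test is only a heuristic.

```python
import os
import sys
from typing import Optional


def active_venv() -> Optional[str]:
    """Return the active virtual environment's path, or None if none is active."""
    # Inside a venv/virtualenv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


def looks_like_tox_env(path: Optional[str], env: str = "dev") -> bool:
    """Heuristic: does `path` end in .tox/<env> (tox's default env location)?"""
    if path is None:
        return False
    parts = os.path.normpath(path).split(os.sep)
    return parts[-2:] == [".tox", env]


if __name__ == "__main__":
    venv = active_venv()
    if venv is None:
        print("No virtual environment is active; activate .tox/dev first.")
    elif not looks_like_tox_env(venv):
        print(f"Active environment is {venv}, not the tox dev env.")
    else:
        print(f"OK: using {venv}")
```

Running it with the same `python` you are about to use for `ipykernel install` confirms which environment the kernel will point at.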
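4. Spark test run
- start Jupyter Notebook and pick the pyspark-sample kernel:
jupyter notebook
- or run the sample script with spark-submit on all local cores (quote local[*] so shells such as zsh do not expand it as a glob pattern):
spark-submit --master "local[*]" --deploy-mode client helloworld.py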
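The two flags in that command matter: `--master local[*]` runs Spark locally with as many worker threads as logical cores, and `--deploy-mode client` launches the driver in the submitting process, so the script's output shows up in your terminal. The sketch below wraps the same command from Python. `build_submit_cmd` and `submit` are hypothetical helpers, and it assumes `spark-submit` is on PATH, which is the case when pyspark is pip-installed into the active venv.

```python
import shutil
import subprocess
from typing import List


def build_submit_cmd(script: str, master: str = "local[*]", deploy_mode: str = "client") -> List[str]:
    """Assemble the spark-submit argument list for a local test run."""
    # local[*]: run locally with one worker thread per logical core.
    # client: the driver runs in the submitting process, so output prints here.
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, script]


def submit(script: str) -> int:
    """Run spark-submit on `script`; return its exit code, or 127 if spark-submit is missing."""
    exe = shutil.which("spark-submit")
    if exe is None:
        print("spark-submit not found on PATH; is pyspark installed in the active venv?")
        return 127
    cmd = build_submit_cmd(script)
    cmd[0] = exe  # use the resolved path (e.g. spark-submit.cmd on Windows)
    # A list argument means no shell is involved, so "local[*]" is passed literally.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call submit("helloworld.py") to actually run it.
    print(" ".join(build_submit_cmd("helloworld.py")))
```

Because the arguments go to `subprocess.run` as a list, no shell sees `local[*]`, so the quoting caveat from the shell command does not apply here.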
5. Package your project
- run the pack environment defined in tox.ini:
tox -e pack
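The post does not show what the `pack` environment actually runs. One common way to ship a PySpark project's own modules with a job is to zip the package and pass the archive to spark-submit's `--py-files` option, which puts it on the Python path. The sketch below shows only that zipping step, under that assumption; `zip_package` is a hypothetical helper, not the repository's packaging code.

```python
import os
import zipfile
from typing import List


def zip_package(package_dir: str, zip_path: str) -> List[str]:
    """Zip a Python package so `import <package>` works from the archive."""
    package_dir = os.path.abspath(package_dir)
    # Store paths relative to the package's parent, so the archive's top level
    # is the package folder itself (e.g. mypkg/__init__.py).
    parent = os.path.dirname(package_dir)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(package_dir):
            dirs[:] = [d for d in dirs if d != "__pycache__"]  # skip bytecode caches
            for name in files:
                if name.endswith(".pyc"):
                    continue
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, parent)
                zf.write(full, arcname)
                written.append(arcname.replace(os.sep, "/"))
    return sorted(written)
```

With a hypothetical package `mypkg`, `zip_package("mypkg", "dist/mypkg.zip")` produces an archive that could then be passed as `spark-submit --master "local[*]" --py-files dist/mypkg.zip helloworld.py`.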