Databricks Connect
To install DB Connect on your local machine:
- Create a new Anaconda environment
conda create --name dbconnect python=3.8
conda activate dbconnect
- Uninstall pyspark
pip uninstall pyspark
- Install databricks-connect
pip install -U "databricks-connect==8.1.*"
- Configure databricks-connect. Use the values below
databricks-connect configure
- Test connection
databricks-connect test
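If the test passes, you can also verify the connection from Python. The sketch below is a minimal, illustrative check, assuming databricks-connect has been configured as above; it builds a tiny DataFrame and runs it against the remote cluster.
# Minimal databricks-connect smoke test (illustrative sketch)
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A small job that executes on the remote Databricks cluster
df = spark.range(5).toDF("n")
print(df.count())   # should print 5
df.show()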
Dependencies
PySpark will run if you install databricks-connect as described above. However, you will see warning messages indicating that some dependencies are still missing on your machine. To get rid of the winutils warning on Windows, do the following:
Install hadoop binaries
- Download hadoop-3.3.1 to your local machine by clicking the appropriate file on the Hadoop download page. I saved the zipped file to C:/Users/myuser/Documents/lib
- Unzip hadoop-3.3.1. When it is completely unzipped, you will be able to see a folder named bin. This is the location of the hadoop binaries.
- Add the Hadoop filepath to your environment variables.
- Add a new user variable named HADOOP_HOME and set its value to the filepath of the unzipped hadoop-3.3.1 folder.
- Edit the system variable PATH and add the Hadoop filepath to the list of PATH entries.
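After setting the variables, restart your terminal or IDE so they take effect. A quick, illustrative way to sanity-check them from Python (the example path is an assumption based on my setup and will differ on your machine):
# Sanity check for the Hadoop environment variables (illustrative sketch)
import os

hadoop_home = os.environ.get("HADOOP_HOME")
print("HADOOP_HOME =", hadoop_home)   # e.g. C:/Users/myuser/Documents/lib/hadoop-3.3.1
if hadoop_home:
    print("bin folder exists:", os.path.isdir(os.path.join(hadoop_home, "bin")))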
Add winutils.exe to the hadoop binaries
- Clone the winutils repo onto your machine
- Find the winutils.exe file in the bin folder that matches your Hadoop version (here, hadoop-3.3.1).
- Copy this file to hadoop-3.3.1/bin
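To confirm winutils.exe ended up in the right place, here is a rough sketch, assuming HADOOP_HOME is set as in the previous section:
# Check that winutils.exe is where PySpark expects it (illustrative sketch)
import os, shutil

hadoop_home = os.environ.get("HADOOP_HOME", "")
print("winutils in HADOOP_HOME/bin:", os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe")))
# If the bin folder was also added to PATH, shutil.which should find it
print("winutils on PATH:", shutil.which("winutils.exe"))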