Databricks Connect

To install Databricks Connect on your local machine:

Create a new Anaconda environment

```
conda create --name dbconnect python=3.8
conda activate dbconnect
```

Uninstall pyspark (the databricks-connect package conflicts with a standalone pyspark installation)
```
pip uninstall pyspark
```

Install databricks-connect

```
pip install -U "databricks-connect==8.1.*"
```
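
Before configuring, it can help to confirm that the environment looks the way the steps above intend. The following is a minimal Python sketch, not part of databricks-connect itself. The helper names `installed_version` and `check_environment` are made up for illustration, and the sketch assumes that databricks-connect ships its own copy of the pyspark module under the `databricks-connect` distribution, so a separate `pyspark` distribution should be absent.

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(lookup=installed_version):
    """Return a list of problems found; an empty list means nothing was flagged.

    `lookup` maps a distribution name to a version string or None. It is a
    parameter only so the check can be exercised without touching pip.
    """
    problems = []
    # Assumption: a standalone pyspark distribution conflicts with databricks-connect.
    if lookup("pyspark") is not None:
        problems.append("a standalone pyspark distribution is still installed")
    dbc_version = lookup("databricks-connect")
    if dbc_version is None:
        problems.append("databricks-connect is not installed")
    elif not dbc_version.startswith("8.1."):
        problems.append(f"databricks-connect {dbc_version} does not match 8.1.*")
    return problems
```

Running `print(check_environment())` inside the activated dbconnect environment lists anything that still needs attention.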

Configure databricks-connect. When prompted, enter the host, token, cluster ID, org ID, and port for your workspace
```
databricks-connect configure
```

Test the connection
```
databricks-connect test
```
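
Once `databricks-connect test` passes, a short script can confirm that your own Python code reaches the cluster as well. This is a minimal sketch under the assumption that the configuration step above has been completed. `count_rows` and `main` are illustrative names, not part of any Databricks API.

```python
def count_rows(spark, n=100):
    """Run a trivial job: build a range of n rows and count them."""
    return spark.range(n).count()


def main():
    # Imported here so the module can be loaded even where pyspark is missing.
    # With databricks-connect installed, this pyspark module is the one it provides.
    from pyspark.sql import SparkSession

    # Picks up the connection settings saved by `databricks-connect configure`.
    spark = SparkSession.builder.getOrCreate()
    print("rows counted:", count_rows(spark))
```

Calling `main()` from the dbconnect environment should print a row count if the connection works. A connection error at this point usually points back to the values entered during configuration.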

Dependencies

PySpark will run once databricks-connect is installed as described above. On Windows, however, you will see warning messages about dependencies that are still missing from your machine. To get rid of the winutils warning, do the following:

Install hadoop binaries

Download hadoop-3.3.1 to your local machine by clicking the appropriate file on the Hadoop download page. I saved the zipped file to C:/Users/myuser/Documents/lib.
Unzip hadoop-3.3.1. Once it is fully unzipped, you will see a folder named bin. This is the location of the hadoop binaries.
Add the hadoop filepath to your environment variables.
- Add a new user variable named HADOOP_HOME and set its value to the filepath of the unzipped hadoop-3.3.1 folder
- Edit the system variable PATH and add the hadoop filepath to its list of entries
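
Environment variables only take effect in terminals opened after the change, so it is worth checking them from a fresh session. The sketch below is one way to do that from Python. `check_hadoop_env` is a hypothetical helper, and it accepts either the hadoop folder or its bin subfolder on PATH, which is an assumption about what counts as "the hadoop filepath".

```python
import os


def _norm(path):
    """Normalise a path so comparisons ignore trailing slashes and (on Windows) case."""
    return os.path.normcase(os.path.normpath(path))


def check_hadoop_env(env=None):
    """Return a list of problems with HADOOP_HOME and PATH; empty means none found."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]

    problems = []
    bin_dir = os.path.join(home, "bin")
    if not os.path.isdir(bin_dir):
        problems.append(f"{bin_dir} is not a folder; check the unzipped location")

    # PATH entries are separated by ';' on Windows and ':' elsewhere.
    entries = {_norm(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    if not entries & {_norm(home), _norm(bin_dir)}:
        problems.append("PATH contains neither HADOOP_HOME nor its bin folder")
    return problems
```

Calling `check_hadoop_env()` with no arguments inspects the current process environment. Passing a dictionary instead makes it possible to try out values before changing the system settings.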

Add winutils.exe to the hadoop binaries

Clone the winutils repo onto your machine.
Find a winutils.exe file in one of the repo's bin folders.
Copy this file to hadoop-3.3.1/bin.
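
The find-and-copy steps above can also be scripted. The following is a rough sketch using only the Python standard library. The function names are illustrative, and it assumes the cloned repo keeps each winutils.exe inside a folder named bin, as the steps describe. Which copy to use is left to you, since the source does not say which version folder to pick.

```python
import shutil
from pathlib import Path


def find_winutils(repo_dir):
    """List every winutils.exe that sits in a bin folder under repo_dir."""
    matches = Path(repo_dir).rglob("winutils.exe")
    return sorted(p for p in matches if p.parent.name == "bin")


def copy_winutils(source, hadoop_home):
    """Copy one winutils.exe into <hadoop_home>/bin and return the new path."""
    target_dir = Path(hadoop_home) / "bin"
    if not target_dir.is_dir():
        raise FileNotFoundError(f"{target_dir} does not exist; unzip Hadoop first")
    # copy2 returns the destination file path when the target is a folder.
    return Path(shutil.copy2(source, target_dir))
```

A typical use would be to print the result of `find_winutils(...)` on the cloned repo folder, choose one entry, and pass it to `copy_winutils(...)` together with the same folder that HADOOP_HOME points to.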