Databricks

Using DataHeroes within your Databricks environment requires a single step: installing dh_spark, the DataHeroes library for distributed architectures such as Databricks clusters.

dh_spark can be installed in one of the following ways:

  1. Cluster Library Tab: Upload the provided dh_spark.whl file to your cluster via the Libraries tab in the Databricks UI.
  2. Magic Command: Use the %pip magic command to install the library directly in your notebook. For example: %pip install /dbfs/path/to/your/dh_spark_library.whl. Replace /dbfs/path/to/your/dh_spark_library.whl with the actual path to the .whl file in your Databricks File System (DBFS).
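
After either install route, a quick check in a notebook cell can confirm that the library is visible to the notebook's Python process. The following is a minimal sketch, not part of the DataHeroes documentation: it assumes the wheel exposes an importable top-level module named dh_spark (the import name is an assumption, since only the wheel file is named above), and is_installed is a hypothetical helper built on the standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


# ASSUMPTION: "dh_spark" is the import name; adjust it if the wheel
# installs its module under a different name.
if is_installed("dh_spark"):
    print("dh_spark is available in this notebook session.")
else:
    print("dh_spark was not found; install it using one of the two options above.")
```

If the check fails right after a cluster-level install, the cluster may need a restart, or the notebook may need to be detached and reattached, before the library becomes visible.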

Once the library is installed, follow this example to train higher-quality machine learning models using DataHeroes' state-of-the-art data reduction and data cleaning techniques on Databricks’ powerful clusters.