DataHeroes CLI Commands

dataheroes-init

The dataheroes-init command provides an interactive CLI for initializing DataHeroes configuration. It allows you to configure your DataHeroes account and connection credentials.

Usage

dataheroes-init

This will start an interactive menu-driven interface with the following options:

  1. Activate the DataHeroes account (using email)
  2. Setup Databricks credentials
  3. Setup AWS credentials
  4. Setup GCP credentials
  5. Setup Azure credentials
  6. View current configuration
  7. Exit

Note: If you run dataheroes-init without any command-line options and no configuration file (.dataheroes.config) is found in any of the standard locations (see Configuration Storage below), the tool will skip the main menu and directly prompt you to activate your account via email. Once activated, the configuration file will be created, and subsequent runs will show the full menu.

Command-line Options

You can also use command-line options to set specific configuration values directly:

# Activate license with email
dataheroes-init --email=YOUR_EMAIL_ADDRESS

# Set Databricks credentials
dataheroes-init --databricks_api_key=YOUR_API_KEY --databricks_workspace_url=YOUR_WORKSPACE_URL

# Set multiple configurations at once
dataheroes-init --email=YOUR_EMAIL_ADDRESS --databricks_api_key=YOUR_API_KEY

Available Options

License Activation

  • --email: Email address used to activate your DataHeroes license

When using the --email option, the command will automatically attempt to activate the license with the DataHeroes licensing server.

Databricks Credentials

  • --databricks_api_key: Databricks access token.

    Can be generated from the User Settings page in your Databricks workspace: go to Settings -> User Settings -> Access Tokens and click "Generate New Token".

    Save the token securely, as it cannot be viewed again after creation.

  • --databricks_workspace_url: Format: https://<your-workspace>.cloud.databricks.com.

    This is the URL you use to access your Databricks workspace.

  • --databricks_http_path: Databricks HTTP path (SQL warehouse path or Spark cluster path).

    For SQL warehouses: /sql/1.0/warehouses/<warehouse-id>.

    For clusters: /sql/protocolv1/o/<workspace-id>/<cluster-id>.

    This path directs your queries to the right compute resource.

  • --databricks_catalog and --databricks_schema: The Unity Catalog catalog and schema where your data is located (see the combined example below).
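
All of the Databricks options can be combined in a single call. The values below are placeholders and should be replaced with your own workspace details:

# Set all Databricks connection details in one call (placeholder values)
dataheroes-init --databricks_api_key=YOUR_API_KEY \
    --databricks_workspace_url=https://<your-workspace>.cloud.databricks.com \
    --databricks_http_path=/sql/1.0/warehouses/<warehouse-id> \
    --databricks_catalog=YOUR_CATALOG \
    --databricks_schema=YOUR_SCHEMA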

AWS Credentials

  • --aws_access_key_id and --aws_secret_access_key: Access key pair, which can be generated by following AWS's instructions for creating IAM access keys.
  • --aws_region: The AWS region of the buckets that will be used.
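
As with Databricks, the AWS values can be set in one call. The region below is only an example:

# Set AWS credentials and region (example values)
dataheroes-init --aws_access_key_id=YOUR_ACCESS_KEY_ID \
    --aws_secret_access_key=YOUR_SECRET_ACCESS_KEY \
    --aws_region=us-east-1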

GCP Credentials

  • --gcp_quota_project_id: Can be found on the GCP dashboard (called Project ID, not Project Name).
  • --gcp_client_id and --gcp_client_secret: Create a client_id and client_secret pair in the Google Cloud console.

    You can also follow Google's OAuth client setup steps and take the client_id and client_secret from the generated local credential file.

  • --gcp_refresh_token: The refresh token is automatically retrieved through a browser login based on the client_id and client_secret.

    If you don't have access to a browser, you can run dataheroes-init from a different device and copy the DataHeroes configuration file to the browserless device.
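
A sketch of setting the GCP options from the command line; the refresh token itself is not passed here, since it is typically retrieved through the browser login described above (values are placeholders):

# Set GCP credentials; the refresh token is then retrieved via browser login
dataheroes-init --gcp_quota_project_id=YOUR_PROJECT_ID \
    --gcp_client_id=YOUR_CLIENT_ID \
    --gcp_client_secret=YOUR_CLIENT_SECRET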

Azure Credentials

  • --storage_connection_string: The connection string of the Azure storage account that will be used; it can be copied from your storage account's settings in the Azure portal.
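
A sketch of setting the Azure option; the connection string below is a placeholder in the standard Azure format, not a real credential:

# Set the Azure storage connection string (placeholder value)
dataheroes-init --storage_connection_string="DefaultEndpointsProtocol=https;AccountName=YOUR_ACCOUNT;AccountKey=YOUR_KEY;EndpointSuffix=core.windows.net"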

Configuration Storage

The configuration is stored in a .dataheroes.config file. The command will use the highest priority location for the configuration file:

  1. Path specified in the DATAHEROES_CONFIG_PATH environment variable (can be a full file path or a directory containing .dataheroes.config)
  2. Current working directory (./.dataheroes.config)
  3. User's home directory (~/.dataheroes.config)

If multiple configuration files exist, the one with the highest priority (lowest number in the list above) is used. When saving configuration changes (through either command-line options or the interactive menu), the highest-priority existing file is updated; if none exists, a new file is created in the user's home directory.
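
For example, to point the tool at a configuration file stored outside the default locations, set the environment variable before running the command (the path below is purely illustrative):

# Point dataheroes-init at an existing configuration file (illustrative path)
export DATAHEROES_CONFIG_PATH=/opt/myproject/.dataheroes.config
dataheroes-init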

Security

  • When displaying existing configuration values, sensitive information is masked by showing only the last 4 characters
  • For any new input that contains sensitive information, the command asks for confirmation before saving it