Application setup

Apache Hop can run as a local application or as a Docker-based web service.

Local application

The application is downloaded and unpacked in ~/apps/hop.

HOP_CONFIG_FOLDER=/home/rolf/dev/hop [1]

The default setup uses the config/ directory for both configuration and projects. A different config location can be provided via the environment variable HOP_CONFIG_FOLDER.

HOP_AUDIT_FOLDER=/home/rolf/.config/hop/audit

Logging from the Hop GUI goes into the audit folder as hopui.log.

:::: tip
::: title
Tip
:::

It is possible to work with multiple "main configuration" directories, as a way to separate workspaces.
::::
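Switching between such workspaces could be sketched as a small shell helper. The function name `hop_workspace`, the `hop-<name>` folder naming, and the separate audit folder per workspace are assumptions for illustration, not something Hop prescribes.

```shell
#!/bin/sh
# Hypothetical helper: point Hop at one config folder per workspace name.
# The folder naming scheme (~/dev/hop-<name>) is an assumption.
hop_workspace() {
  export HOP_CONFIG_FOLDER="$HOME/dev/hop-$1"
  # Assumption: keep audit data (including hopui.log) apart per workspace.
  export HOP_AUDIT_FOLDER="$HOME/.config/hop/audit-$1"
  echo "Hop workspace: $HOP_CONFIG_FOLDER"
}

hop_workspace client-a
```

After calling the helper, the GUI can be started from the same shell (for example via the `hop-gui.sh` script in `~/apps/hop`) and picks up the exported variables.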

The sample projects are copied into this config directory as well, to experiment with.

:::: {}
::: title
/home/rolf/dev/hop
:::

~/dev
└── hop
    ├── environments ①
    ├── metadata ②
    ├── projects ③
    ├── hop-config.json ④
    └── ...other-files
::::

::: callout-list
1. Environments to use (e.g. lifecycle stages)
2. Global metadata folder managed by Hop
3. Sample projects copied from the application config folder
4. Main configuration, managed by Hop
:::

Docker web service (not in use) {#docker_web_service(not_in_use)}

For the web service version, the suggested approach is to mount a local folder as the config and projects location, and then set the environment variable HOP_CONFIG_FOLDER to point to the mount location.

$ docker run -d -p 8080:8080 -v /home/rolf/dev/hop:/config -e HOP_CONFIG_FOLDER=/config --name=apache-hop apache/incubator-hop-web

:::: caution
::: title
Caution
:::

The web service container runs as root. This means files it creates in the mounted local folder will be owned by root as well.

I have not tested whether it is possible to start the container as another user. Doing so may lead to permission problems inside the container, for example when writing logs.
::::
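For anyone who wants to try running as a non-root user, Docker's general `--user` option is the usual starting point. The sketch below only builds and prints the command (a dry run). Whether the Hop web image actually works under another user is untested here, as noted above. The variable names `config_dir` and `cmd` are illustrative.

```shell
#!/bin/sh
# Dry run: build and print the command instead of starting a container.
# Assumption: the local workspace lives in HOP_CONFIG_FOLDER or ~/dev/hop.
config_dir="${HOP_CONFIG_FOLDER:-$HOME/dev/hop}"

# --user is a standard "docker run" option; its effect on this image is untested.
cmd="docker run -d -p 8080:8080 \
  --user $(id -u):$(id -g) \
  -v $config_dir:/config \
  -e HOP_CONFIG_FOLDER=/config \
  --name=apache-hop apache/incubator-hop-web"

echo "$cmd"
```

If the printed command looks right, it can be pasted into a terminal. Permission errors in the container logs would indicate that the image expects to run as root.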

Hop projects

Projects can be located anywhere, so we can put them where we would expect to find them.

:::: {}
::: title
Hop projects layout
:::

~/dev
├── dataworkbench
│   ├── etl-project ①
│   │   ├── datasets ②
│   │   ├── metadata ③
│   │   ├── tests ④
│   │   ├── project-config.json ⑤
│   │   └── ...other-files
│   └── ...other-projects
└── sandbox
    ├── etl-experiment ⑥
    └── ...other-experiments
::::

::: callout-list
1. Regular ETL or data orchestration repository
2. Project datasets as CSV files (default directory name is datasets)
3. Project metadata (default directory name is metadata)
4. Test scenarios
5. Hop project configuration
6. Experimental ETL or data orchestration work (set up as above)
:::

Concepts

The Hop config folder can be seen as a workspace on a local machine.

Metadata is managed per workspace and per project, and contains, for instance, database connections that use variables for host names.

A project can have a parent project from which it inherits all metadata objects (and the variables used in them). Ultimately, a project inherits the metadata of the Hop config folder.
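One way to picture this inheritance chain is a lookup that walks from the project to its parent and finally to the workspace. The sketch below is a conceptual model only, not how Hop implements it. The function `resolve_metadata`, the object path `rdbms/warehouse.json`, and the "most specific folder wins" rule are assumptions for illustration.

```shell
#!/bin/sh
# Conceptual model only: find a metadata object by checking the project,
# then its parent, then the workspace (Hop config folder).
resolve_metadata() {
  object="$1"; shift            # relative path of the object (hypothetical)
  for folder in "$@"; do        # metadata folders, most specific first
    if [ -f "$folder/$object" ]; then
      printf '%s\n' "$folder/$object"
      return 0
    fi
  done
  return 1                      # not defined anywhere in the chain
}

# Scratch setup: only the workspace defines the connection.
demo=$(mktemp -d)
mkdir -p "$demo/child/metadata" "$demo/parent/metadata" "$demo/workspace/metadata/rdbms"
echo '{}' > "$demo/workspace/metadata/rdbms/warehouse.json"

found=$(resolve_metadata rdbms/warehouse.json \
  "$demo/child/metadata" "$demo/parent/metadata" "$demo/workspace/metadata")
echo "$found"
```

Here the child project finds the connection in the workspace metadata folder, because neither it nor its parent defines one.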

Environments are also linked to a project, and contain values for the variables in the metadata. These are typically used for lifecycle stages (development, staging, production).

Environments consist of one or more JSON files with variables, so it is also possible to split these up.
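As an illustration of such a split, the sketch below writes two environment files into a scratch directory: one with system-level values kept in the Hop config folder, one with project-level values kept with the project. The file names, the variable names `DB_HOST` and `INPUT_DIR`, and their values are hypothetical. The JSON shape (a `variables` list with `name`, `value` and `description`) is assumed from files Hop generates and should be checked against your own.

```shell
#!/bin/sh
# Scratch directory, so the example does not touch a real workspace.
demo=$(mktemp -d)
mkdir -p "$demo/hop/environments" "$demo/etl-project"

# System-level values, kept in the Hop config folder (hypothetical content).
cat > "$demo/hop/environments/dev-system.json" <<'EOF'
{
  "variables": [
    { "name": "DB_HOST", "value": "localhost", "description": "Database host for development" }
  ]
}
EOF

# Project-level values, kept with the project (hypothetical content).
cat > "$demo/etl-project/dev-project.json" <<'EOF'
{
  "variables": [
    { "name": "INPUT_DIR", "value": "${PROJECT_HOME}/input", "description": "Where source files arrive" }
  ]
}
EOF

ls "$demo/hop/environments" "$demo/etl-project"
```

Both files would then be listed as configuration files of a single environment, for example a development environment for the project.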

folder "$HOP_CONFIG_FOLDER" {
  frame hop-config.json as hopConfigJson {
    map "Project(s)" as Project {
      name =>
      home => ProjectFolder
      configFilename => project-config.json
    }

    map "Environment(s)" as Environment {
      name =>
      project =>
      purpose =>
      configurationFiles =>
    }
  }

  Project "1" -right- "*" Environment

  folder metadata {
  }
  note bottom of metadata: Runtime configurations,\ndatabase connections, etc.

  frame EnvironmentConfigFile {
    map EnvironmentConfig {
      variables =>
    }
  }

  note bottom of EnvironmentConfigFile: Can be "anywhere", I keep system\ninformation in the Hop config folder,\nto have version control outside the project.
}

folder ProjectFolder {
  frame "project-config.json" as projectConfigJson {
    map ProjectConfig {
      parent =>
      metadataFolder =>
      unitTestFolder =>
      dataSetsCsvFolder =>
      variables =>
    }
  }

  folder metadata as projectMetadata {
  }

  metadata <|-- projectMetadata: inherits from
}

Project *-- ProjectFolder

Environment *-- EnvironmentConfigFile

  1. These variables are set in xref:shell:environment.adoc[