Application setup¶
Apache Hop can run as a local application or as a Docker-based web service.
Local application¶
The application is downloaded and unpacked in ~/apps/hop.
HOP_CONFIG_FOLDER=/home/rolf/dev/hop [1]
- The default setup uses the config/ directory for both configuration and projects. A different config location can be provided via the environment variable HOP_CONFIG_FOLDER.

HOP_AUDIT_FOLDER=/home/rolf/.config/hop/audit
- Logging from the Hop GUI goes into the audit folder as hopui.log.
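The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Hop's own code: `resolve_hop_folders` is a hypothetical helper, and the only behaviour it assumes is what this page states (config/ inside the installation unless HOP_CONFIG_FOLDER is set, and hopui.log inside the audit folder).

```python
from pathlib import Path


def resolve_hop_folders(env, install_dir):
    """Return (config_folder, gui_log_file) for a Hop installation.

    env is a mapping such as os.environ. Hypothetical helper: it only
    mirrors the behaviour described on this page.
    """
    # Without HOP_CONFIG_FOLDER, the config/ directory of the installation is used.
    config = Path(env.get("HOP_CONFIG_FOLDER", Path(install_dir) / "config"))

    # The GUI log is only resolved when an audit folder is given explicitly.
    audit = env.get("HOP_AUDIT_FOLDER")
    gui_log = Path(audit) / "hopui.log" if audit else None
    return config, gui_log
```

Calling it with `os.environ` and the unpack location (here `~/apps/hop`, expanded) shows which workspace a Hop GUI session would pick up.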
:::: tip
::: title
Tip
:::

It is possible to work with multiple "main configuration" directories, as a way to separate workspaces.
::::
The sample projects are copied into this config directory as well, to experiment with.
:::: {}
::: title
/home/rolf/dev/hop
:::
~/dev
└── hop
    ├── environments ①
    ├── metadata ②
    ├── projects ③
    ├── hop-config.json ④
    └── ...other-files
::::
::: callout-list
1. Environments to use (e.g. lifecycle stages)
2. Global metadata folder managed by Hop
3. Sample projects copied from the application config folder
4. Main configuration, managed by Hop.
:::
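To check that a folder looks like the workspace shown above, a small Python sketch can list what is missing. The entry names come from the layout on this page; `missing_entries` is a hypothetical helper and not part of Hop.

```python
from pathlib import Path

# Entries taken from the layout above; other files may exist alongside them.
EXPECTED_ENTRIES = ("environments", "metadata", "projects", "hop-config.json")


def missing_entries(config_folder):
    """Return the expected entries that are absent from the config folder."""
    root = Path(config_folder)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means all four entries are present.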
Docker web service (not in use) {#docker_web_service(not_in_use)}¶
For the web service version, the suggested approach is to mount a local
folder as the config and projects location, and then set the environment
variable HOP_CONFIG_FOLDER to point to the mount location.
$ docker run -d -p 8080:8080 -v /home/rolf/dev/hop:/config -e HOP_CONFIG_FOLDER=/config --name=apache-hop apache/incubator-hop-web
:::: caution
::: title
Caution
:::

The web service container runs as root. This means files it writes to the mounted local folder will be owned by root as well.

I have not tested whether it is possible to start the container as another user. Doing so may lead to permission problems inside the container, for example when writing logs.
::::
Hop projects¶
Projects can be located anywhere, so we can add them in expected places.
:::: {}
::: title
Hop projects layout
:::
~/dev
├── dataworkbench
│   ├── etl-project ①
│   │   ├── datasets ②
│   │   ├── metadata ③
│   │   ├── tests ④
│   │   ├── project-config.json ⑤
│   │   └── ...other-files
│   └── ...other-projects
└── sandbox
    ├── etl-experiment ⑥
    └── ...other-experiments
::::
::: callout-list
1. Regular ETL or data orchestration repository.
2. Project datasets as CSV files (default directory name is datasets)
3. Project metadata (default directory name is metadata)
4. Test scenarios
5. Hop project configuration.
6. Experimental ETL or data orchestration work (setup as above).
:::
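As a rough illustration of the layout above, the following Python sketch creates the sub-folders of a new project. `scaffold_project` is a hypothetical helper. It deliberately leaves project-config.json to Hop, since that file's exact format is not described here.

```python
from pathlib import Path

# Sub-folder names as used in the layout above.
PROJECT_FOLDERS = ("datasets", "metadata", "tests")


def scaffold_project(project_home):
    """Create the project folder and its sub-folders if they are missing.

    Returns the sorted names of the directories now present in the project.
    """
    home = Path(project_home)
    for name in PROJECT_FOLDERS:
        # parents=True also creates the project folder itself on first use.
        (home / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in home.iterdir() if p.is_dir())
```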
Concepts¶
The Hop config folder can be seen as a workspace on a local machine.
Metadata is managed per workspace and per project, and contains, for instance, database connections that use variables for host names.
A project can have a parent project from which it inherits all metadata objects (and the variables used in them). Ultimately, a project inherits the metadata of the Hop config folder.
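The inheritance chain can be pictured with a small Python sketch. It is a conceptual model only, not Hop's implementation: the dictionaries, the `resolve_metadata` function, and the rule that the nearest definition wins are all assumptions made for illustration.

```python
def resolve_metadata(name, project, projects, workspace_metadata):
    """Look up a metadata object by name for a project.

    The search walks from the project to its parent(s) and finally falls
    back to the workspace (Hop config folder) metadata. Assumption: the
    definition closest to the project wins.
    """
    seen = set()
    current = project
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental parent cycles
        entry = projects[current]
        if name in entry["metadata"]:
            return entry["metadata"][name]
        current = entry.get("parent")
    return workspace_metadata.get(name)
```

For example, with a project `etl-project` whose parent is `base`, a connection defined only in `base` is still found when resolving for `etl-project`, and anything defined in neither is looked up in the workspace metadata.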
Environments are also linked to a project, and contain values for the variables in the metadata. These are typically used for lifecycle stages (development, staging, production).
Environments consist of one or more JSON files with variables, so it is also possible to split these up.
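Splitting an environment over several files can be sketched as a simple merge. The file shape used here (a top-level `variables` list of objects with `name` and `value`) and the rule that later files override earlier ones are assumptions for illustration; check the files Hop generates before relying on either.

```python
import json
from pathlib import Path


def load_environment_variables(config_files):
    """Merge the variables of several environment JSON files into one dict.

    Assumed file shape: {"variables": [{"name": ..., "value": ...}, ...]}.
    Assumed precedence: files later in the list override earlier ones.
    """
    merged = {}
    for path in config_files:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        for variable in data.get("variables", []):
            merged[variable["name"]] = variable["value"]
    return merged
```

A typical split would keep shared values in one file and stage-specific host names in another, passed as a list in that order.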
folder "$HOP_CONFIG_FOLDER" {
  frame hop-config.json as hopConfigJson {
    map "Project(s)" as Project {
      name =>
      home => ProjectFolder
      configFilename => project-config.json
    }
    map "Environment(s)" as Environment {
      name =>
      project =>
      purpose =>
      configurationFiles =>
    }
  }
  Project "1" -right- "*" Environment
  folder metadata {
  }
  note bottom of metadata: Runtime configurations,\ndatabase connections, etc.
  frame EnvironmentConfigFile {
    map EnvironmentConfig {
      variables =>
    }
  }
  note bottom of EnvironmentConfigFile: Can be "anywhere", I keep system\ninformation in the Hop config folder,\nto have version control outside the project.
}
folder ProjectFolder {
  frame "project-config.json" as projectConfigJson {
    map ProjectConfig {
      parent =>
      metadataFolder =>
      unitTestFolder =>
      dataSetsCsvFolder =>
      variables =>
    }
  }
  folder metadata as projectMetadata {
  }
  metadata <|-- projectMetadata: inherits from
}
Project *-- ProjectFolder
Environment *-- EnvironmentConfigFile
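The project and environment entries in the diagram can also be read as a small data model. The Python sketch below mirrors the field names from the diagram. The classes and `environments_for` are hypothetical and do not reflect the actual key names in hop-config.json.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    home: str  # path of the project folder
    config_filename: str = "project-config.json"


@dataclass
class Environment:
    name: str
    project: str  # name of the project this environment belongs to
    purpose: str  # e.g. development, staging, production
    configuration_files: list = field(default_factory=list)


def environments_for(project, environments):
    """Return the environments linked to a project (the 1-to-many relation)."""
    return [env for env in environments if env.project == project.name]
```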
[1] These variables are set in environment.adoc.