REANA-DB docs

REANA-DB is a component of the REANA reusable analysis platform. It contains REANA database models and utilities.

Features

  • database persistence for REANA system

  • database models and utilities

  • database upgrades and migrations

Usage

Detailed information on how to install and use REANA can be found at docs.reana.io.

Configuration

REANA DB configuration.

reana_db.config.DB_HOST = 'reana-db.default.svc.cluster.local'

Database service host.

reana_db.config.DB_NAME = 'reana'

Database name.

reana_db.config.DB_PASSWORD = 'reana'

Database password.

reana_db.config.DB_PORT = '5432'

Database service port.

reana_db.config.DB_SECRET_KEY = 'reana'

Database encryption secret key.

reana_db.config.DB_USERNAME = 'reana'

Database user name.

reana_db.config.DEFAULT_QUOTA_LIMITS = {'cpu': 0, 'disk': 0}

Default CPU (in milliseconds) and disk (in bytes) quota limits.

reana_db.config.DEFAULT_QUOTA_RESOURCES = {'cpu': 'processing time', 'disk': 'shared storage'}

Default quota resources to fill Resource table.

reana_db.config.PERIODIC_RESOURCE_QUOTA_UPDATE_POLICY = 0

Whether to run the periodic (cronjob) resource quota updater.

reana_db.config.SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://reana:reana@reana-db.default.svc.cluster.local:5432/reana'

SQLAlchemy database location.
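
The default URI above is consistent with being assembled from the individual DB_* settings. A minimal sketch, assuming that composition (the values are copied from the defaults listed in this section; how reana_db.config actually builds the string is an assumption):

```python
# Values copied from the configuration defaults documented above.
DB_USERNAME = "reana"
DB_PASSWORD = "reana"
DB_HOST = "reana-db.default.svc.cluster.local"
DB_PORT = "5432"
DB_NAME = "reana"

# Assumed composition: dialect+driver://user:password@host:port/database
SQLALCHEMY_DATABASE_URI = (
    f"postgresql+psycopg2://{DB_USERNAME}:{DB_PASSWORD}"
    f"@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

print(SQLALCHEMY_DATABASE_URI)
```

Under this assumption, changing a single setting such as DB_HOST is enough to point the component at a different database service.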

reana_db.config.SQLALCHEMY_MAX_OVERFLOW = 2

How many new connections can temporarily exceed the pool size?

reana_db.config.SQLALCHEMY_POOL_PRE_PING = False

Do we always pre-ping for pessimistic connection handling?

reana_db.config.SQLALCHEMY_POOL_RECYCLE = 3600

How many seconds can a connection persist?

reana_db.config.SQLALCHEMY_POOL_SIZE = 5

How many permanent connections to the database to keep?

reana_db.config.SQLALCHEMY_POOL_TIMEOUT = 30

How many seconds to wait when retrieving a new connection from the pool?
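
The pool settings above can be overridden through environment variables. The sketch below shows one way such values could be collected, assuming the environment variable names match the setting names (an assumption, as is the helper pool_options_from_env). The dictionary keys are the corresponding SQLAlchemy create_engine keyword arguments.

```python
import os


def pool_options_from_env(environ=os.environ):
    """Collect pool settings, falling back to the documented defaults."""
    return {
        "pool_size": int(environ.get("SQLALCHEMY_POOL_SIZE", "5")),
        "max_overflow": int(environ.get("SQLALCHEMY_MAX_OVERFLOW", "2")),
        "pool_recycle": int(environ.get("SQLALCHEMY_POOL_RECYCLE", "3600")),
        "pool_timeout": int(environ.get("SQLALCHEMY_POOL_TIMEOUT", "30")),
        # Hypothetical parsing rule: only the string "true" enables pre-ping.
        "pool_pre_ping": environ.get("SQLALCHEMY_POOL_PRE_PING", "false").lower()
        == "true",
    }


defaults = pool_options_from_env({})
overridden = pool_options_from_env({"SQLALCHEMY_POOL_SIZE": "10"})
print(defaults)
print(overridden["pool_size"])
```

With these defaults, at most 5 permanent plus 2 overflow connections would be open at the same time.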

reana_db.config.WORKFLOW_TERMINATION_QUOTA_UPDATE_POLICY = []

Which quota types to update. If not specified, all quotas will be calculated; if empty, no quotas will be updated.
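
The three cases of this policy can be illustrated with a small sketch. It assumes that None stands for "not specified" and that the quota type names are the keys of DEFAULT_QUOTA_LIMITS; quotas_to_update is a hypothetical helper, not part of reana_db.

```python
from typing import Iterable, List, Optional

ALL_QUOTAS = ("cpu", "disk")  # keys of DEFAULT_QUOTA_LIMITS


def quotas_to_update(policy: Optional[Iterable[str]]) -> List[str]:
    """Return the quota types that would be recalculated under a policy."""
    if policy is None:
        # Policy not specified: every quota type is calculated.
        return list(ALL_QUOTAS)
    selected = set(policy)
    # An empty policy selects nothing, so no quota is updated.
    return [quota for quota in ALL_QUOTAS if quota in selected]


print(quotas_to_update(None))
print(quotas_to_update([]))
print(quotas_to_update(["disk"]))
```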

API

Database management

Database management for REANA.

reana_db.database.init_db()[source]

Initialize the DB.

Models

Models for REANA Components.

class reana_db.models.AuditLog(**kwargs)[source]

Audit log table.

created

updated

class reana_db.models.AuditLogAction(value)[source]

Enumeration of audit log actions.

class reana_db.models.CleanUpDependingOnStatusMixin[source]

Mixin to determine whether to clean up jobs for REANA status enums.

classmethod should_cleanup_job(job_status)[source]

Determine if a job/workflow should be cleaned up depending on its status.

class reana_db.models.InteractiveSession(**kwargs)[source]

Interactive Session table.

created

updated

class reana_db.models.InteractiveSessionResource(**kwargs)[source]

Interactive Session Resource table.

created

updated

class reana_db.models.InteractiveSessionType(value)[source]

Enumeration of interactive session types.

class reana_db.models.Job(**kwargs)[source]

Job table.

created

updated

class reana_db.models.JobCache(**kwargs)[source]

Job Cache table.

created

updated

class reana_db.models.JobStatus(value)[source]

Enumeration of possible job statuses.

class reana_db.models.QuotaBase[source]

Quota base functionality.

get_quota_usage()[source]

Get quota usage information.

class reana_db.models.QuotaHealth(value)[source]

Enumeration of quota health statuses.

class reana_db.models.Resource(**kwargs)[source]

Resource table.

created

static initialise_default_resources()[source]

Initialise default Resources.

updated

class reana_db.models.ResourceType(value)[source]

Enumeration of resource types.

class reana_db.models.ResourceUnit(value)[source]

Enumeration of resource usage units.

static human_readable_unit(unit, value)[source]

Convert passed value in units to human readable string.
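
As a rough sketch of this kind of conversion for byte values only: human_readable_bytes below is a hypothetical helper, not the library's implementation, and the unit names and two-decimal format are assumptions. Negative inputs keep their sign, in the spirit of the 0.9.1 fix for possibly-negative usage values.

```python
def human_readable_bytes(value: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["Bytes", "KiB", "MiB", "GiB", "TiB"]
    size = float(abs(value))
    index = 0
    # Divide by 1024 until the number fits the current unit.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    sign = "-" if value < 0 else ""
    return f"{sign}{size:.2f} {units[index]}"


print(human_readable_bytes(2048))
print(human_readable_bytes(-1536))
```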

class reana_db.models.RunStatus(value)[source]

Enumeration of possible run statuses.

class reana_db.models.User(access_token=None, **kwargs)[source]

User table.

access_token

REANA active access token value.

access_token_status

REANA most recent access token status.

active_token

REANA active access token object.

created

get_user_workspace()[source]

Build user’s workspace directory path.

Returns:

Path to the user’s workspace directory.

get_workflow_overload_priority()[source]

Get priority factor based on the number of current workflows running.

has_exceeded_quota()[source]

Get whether user has exceeded the quota of any resource.

initialize_user_quota_limits()[source]

Initialize user quota limits.

latest_access_token

REANA most recent access token.

log_action(action, details=None)[source]

Create audit log entry for the user.

Parameters:
  • action (AuditLogAction) – Type of action.

  • details – JSON field containing action details.

request_access_token()[source]

Create user token and mark it as requested.

updated

class reana_db.models.UserResource(**kwargs)[source]

User Resource table.

created

updated

class reana_db.models.UserToken(**kwargs)[source]

User tokens table.

created

updated

class reana_db.models.UserTokenStatus(value)[source]

Enumeration of possible user token statuses.

class reana_db.models.UserTokenType(value)[source]

Enumeration of possible user token types.

class reana_db.models.Workflow(id_, name, owner_id, reana_specification, type_, logs='', input_parameters={}, operational_options={}, status=RunStatus.created, complexity=[], git_ref='', git_repo=None, git_provider=None, workspace_path=None, restart=False, run_number=None, launcher_url=None)[source]

Workflow table.

activate_workspace_retention_rules()[source]

Activate workspace retention rules for the workflow.

can_transition_to(next_status)[source]

Whether the provided workflow can transition to the next status.

created

get_all_restarts()[source]

Get all the restarts of this workflow, including the original workflow.

Returns all the restarts of this workflow, that is all the workflows that have the same name and the same major run number. This includes the original workflow, as well as all the following restarts.
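
The selection rule (same name, same major run number) can be sketched on plain in-memory records. Run and all_restarts are hypothetical stand-ins for the database model and query. The sketch assumes that the original workflow has minor run number 0, and the ordering by minor run number is an addition for readability.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str
    run_number_major: int
    run_number_minor: int


def all_restarts(runs: List[Run], workflow: Run) -> List[Run]:
    """Select the runs sharing the workflow's name and major run number."""
    matching = (
        run
        for run in runs
        if run.name == workflow.name
        and run.run_number_major == workflow.run_number_major
    )
    return sorted(matching, key=lambda run: run.run_number_minor)


runs = [
    Run("analysis", 1, 0),  # original workflow (assumed minor number 0)
    Run("analysis", 1, 1),  # first restart
    Run("analysis", 2, 0),  # a separate run of the same workflow name
    Run("other", 1, 0),
]
print(all_restarts(runs, runs[1]))
```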

get_complexity_priority(total_cluster_memory)[source]

Calculate workflow priority based on its complexity.

get_full_workflow_name()[source]

Return full workflow name including run number.

get_input_parameters()[source]

Return workflow parameters.

get_new_run_number(run_number) → Tuple[int, int][source]

Return the major and minor run numbers for a new workflow.

Return a tuple where the first element is the major run number and the second element is the minor run number.

get_owner_access_token()[source]

Return workflow owner access token.

get_priority(cluster_memory)[source]

Workflow priority when scheduling it.

Takes into account both the workflow complexity and the number of workflows running at a certain time.

get_specification()[source]

Return workflow specification.

get_workspace_disk_usage(summarize=False, search=None)[source]

Retrieve disk usage information of a workspace.

inactivate_workspace_retention_rules()[source]

Inactivate workspace retention rules for all the parent workflows.

property run_number: str

Get workflow run number.

set_workspace_retention_rules(rules: List[Dict[str, str]])[source]

Set workspace retention rules for the workflow.

static update_workflow_status(db_session, workflow_uuid, status, new_logs='', message=None)[source]

Update database workflow status.

Parameters:
  • workflow_uuid – UUID which represents the workflow.

  • status – String that represents the workflow status.

  • new_logs – New logs from workflow execution.

  • message – Unused.

update_workflow_timestamp(new_status)[source]

Update workflow timestamps according to new status.

updated

workspace_has_pending_retention_rules()[source]

Check whether the workspace has retention rules that are pending.

All the restarts of the workflow are considered when checking the retention rules, as they all share the same workspace.

class reana_db.models.WorkflowResource(**kwargs)[source]

Workflow Resource table.

created

updated

class reana_db.models.WorkflowSession(**kwargs)[source]

Workflow Session table.

class reana_db.models.WorkspaceRetentionAuditLog(**kwargs)[source]

Workspace retention audit log table.

class reana_db.models.WorkspaceRetentionRule(**kwargs)[source]

Workspace retention rule table.

can_transition_to(next_status)[source]

Whether the provided retention rule can transition to the next status.

serialize()[source]

Serialize workspace retention rule object data.

class reana_db.models.WorkspaceRetentionRuleStatus(value)[source]

Enumeration of workspace retention rule statuses.

  • created: initial status of each rule.

  • active: the workflow has finished running and the rule can now be considered.

  • inactive: the rule will not be considered, even though it wasn’t applied.

  • pending: the rule is currently being handled by the cronjob.

  • applied: the rule was handled by the cronjob and the files were deleted.
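
The statuses above suggest a simple lifecycle. The sketch below models only the path implied by the descriptions (created, then active, then pending, then applied). The transition table is an assumption rather than the library's actual rule set, and transitions into inactive are left out because the text does not say from which statuses it can be reached. RetentionRuleStatus and can_transition_to are hypothetical names.

```python
from enum import Enum


class RetentionRuleStatus(Enum):
    created = "created"
    active = "active"
    inactive = "inactive"
    pending = "pending"
    applied = "applied"


# Assumed forward path only; the real set of allowed transitions may be larger.
ASSUMED_NEXT = {
    RetentionRuleStatus.created: RetentionRuleStatus.active,
    RetentionRuleStatus.active: RetentionRuleStatus.pending,
    RetentionRuleStatus.pending: RetentionRuleStatus.applied,
}


def can_transition_to(current: RetentionRuleStatus, next_status: RetentionRuleStatus) -> bool:
    """Check a transition against the assumed forward path."""
    return ASSUMED_NEXT.get(current) is next_status


print(can_transition_to(RetentionRuleStatus.created, RetentionRuleStatus.active))
print(can_transition_to(RetentionRuleStatus.created, RetentionRuleStatus.applied))
```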

reana_db.models.convention = {'ck': 'ck_%(table_name)s_%(constraint_name)s', 'fk': 'fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s', 'ix': 'ix_%(column_0_label)s', 'pk': 'pk_%(table_name)s', 'uq': 'uq_%(table_name)s_%(column_0_name)s'}

Constraint naming convention.
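
Each template in the convention is a printf-style pattern whose tokens SQLAlchemy fills in when it names a constraint or index. Plain string formatting is enough to see the resulting names. The table and column names below are purely illustrative and are not taken from the REANA schema.

```python
convention = {
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "ix": "ix_%(column_0_label)s",
    "pk": "pk_%(table_name)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
}

# Illustrative token values; column_0_label is "<table>_<column>" in SQLAlchemy.
tokens = {
    "table_name": "workflow",
    "column_0_name": "owner_id",
    "column_0_label": "workflow_owner_id",
    "referred_table_name": "user",
    "constraint_name": "example",
}

names = {kind: template % tokens for kind, template in convention.items()}
for kind, name in names.items():
    print(kind, name)
```

Deterministic names of this kind make migration scripts reproducible, since a constraint can be referred to by name when it is altered or dropped.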

reana_db.models.generate_uuid()[source]

Generate new uuid.

reana_db.models.job_status_change_listener(job, new_status, old_status, initiator)[source]

Job status change listener.

reana_db.models.workflow_status_change_listener(workflow, new_status, old_status, initiator)[source]

Workflow status change listener.

reana_db.models.workspace_retention_change_listener(mapper, connection, workspace_retention_rule)[source]

Workspace retention change listener.

Utilities

REANA-DB utils.

class reana_db.utils.Timer(name=None, total=None, periodic_delta=100)[source]

Timer to time events and log periodic progress.

count_event() → None[source]

Count a new event.

elapsed() → float[source]

Elapsed time since the creation of the Timer, in seconds.

estimated_total() → float[source]

Estimated total time, in seconds.

log_periodic_progress() → None[source]

Periodically log progress of events.

Progress is logged periodically after a given amount of events and when all the events are completed.

log_progress() → None[source]

Log progress of events.

per_event() → float[source]

Time per event, in seconds.
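
How the three timing values relate can be shown with a simplified stand-in. SketchTimer is hypothetical and omits logging. Its formulas (time per event as elapsed time divided by counted events, estimated total as time per event multiplied by the expected number of events) are assumptions about what the real Timer computes. The injectable clock exists only to make the example deterministic.

```python
import time
from typing import Callable, Optional


class SketchTimer:
    def __init__(self, total: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.total = total
        self.clock = clock
        self.start = clock()
        self.events = 0

    def count_event(self) -> None:
        self.events += 1

    def elapsed(self) -> float:
        return self.clock() - self.start

    def per_event(self) -> float:
        # Avoid dividing by zero before the first event is counted.
        return self.elapsed() / self.events if self.events else 0.0

    def estimated_total(self) -> float:
        # Without a known total, fall back to the elapsed time (assumption).
        if self.total is None:
            return self.elapsed()
        return self.per_event() * self.total


fake_now = [0.0]
timer = SketchTimer(total=10, clock=lambda: fake_now[0])
for _ in range(4):
    fake_now[0] += 0.5  # pretend each event takes half a second
    timer.count_event()

print(timer.elapsed(), timer.per_event(), timer.estimated_total())
```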

reana_db.utils.build_workspace_path(user_id, workflow_id=None, workspace_root_path=None)[source]

Build user’s workspace relative path.

Parameters:
  • user_id – Owner of the workspace.

  • workflow_id – Optional parameter; if provided, gives the path to the workflow workspace instead of just the path to the user workspace.

  • workspace_root_path – Optional parameter; if provided, changes the root path under which the workflow workspaces are stored.

Returns:

String that represents the workspace absolute path, e.g. /var/reana/users/0000/workflows/0034
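
The example path suggests a layout of root, users, user ID, workflows and workflow ID. A minimal sketch under that assumption: sketch_workspace_path is a hypothetical helper, and the default root /var/reana is taken from the example path rather than from the library, whose own default for workspace_root_path is None.

```python
import posixpath
from typing import Optional


def sketch_workspace_path(
    user_id: str,
    workflow_id: Optional[str] = None,
    workspace_root_path: str = "/var/reana",
) -> str:
    """Join the assumed path components for a user or workflow workspace."""
    path = posixpath.join(workspace_root_path, "users", str(user_id), "workflows")
    if workflow_id:
        # With a workflow ID, point at that workflow's own directory.
        path = posixpath.join(path, str(workflow_id))
    return path


print(sketch_workspace_path("0000", "0034"))
print(sketch_workspace_path("0000"))
```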

reana_db.utils.get_default_quota_resource(resource_type)[source]

Get default quota resource by given resource type.

Parameters:

resource_type (reana_db.models.ResourceType) – Resource type corresponding to default resource to get.

reana_db.utils.get_disk_usage_or_zero(workspace_path) → int[source]

Get disk usage for the workspace if it exists, zero if not.

reana_db.utils.should_skip_quota_update(resource_type) → bool[source]

Check if quota updates should be skipped based on the update policy.

Parameters:

resource_type – Resource type of the quota that needs to be updated.

reana_db.utils.split_run_number(run_number)[source]

Split run number into major and minor run numbers.
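
A sketch of such a split, assuming that run numbers are written as major.minor (for example 3.2) and that a missing minor part means 0. sketch_split_run_number is a hypothetical helper, not the library function.

```python
from typing import Tuple, Union


def sketch_split_run_number(run_number: Union[str, int]) -> Tuple[int, int]:
    """Split "major.minor" into two integers; a missing minor part becomes 0."""
    major, _, minor = str(run_number).partition(".")
    return int(major), int(minor or 0)


print(sketch_split_run_number("3.2"))
print(sketch_split_run_number("1.10"))
print(sketch_split_run_number(4))
```

Treating the two parts as separate integers is what lets 1.10 mean the tenth restart of run 1, which a single decimal number could not express.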

reana_db.utils.store_workflow_disk_quota(workflow, bytes_to_sum: int | None = None, override_policy_checks: bool = False)[source]

Update or create disk workflow resource.

Parameters:
  • workflow (reana_db.models.Workflow) – Workflow whose disk resource usage must be calculated.

  • bytes_to_sum (int) – Number of bytes to add to the workflow disk quota. If None, du will be used to recalculate it.

  • override_policy_checks – Whether to update the disk quota without checking the update policy.

reana_db.utils.update_users_cpu_quota(user=None) → None[source]

Update users’ CPU quota usage.

User CPU quotas will be calculated from workflow CPU quotas, so the latter should be updated before the former.

Parameters:

user (reana_db.models.User) – User whose CPU quota will be updated. If None, applies to all users.

reana_db.utils.update_users_disk_quota(user=None, bytes_to_sum: int | None = None, override_policy_checks: bool = False) → None[source]

Update users’ disk quota usage.

User disk quota usage will be calculated from the individual workflow disk quota usage numbers, so this function should typically be called only after update_workflows_disk_quota().

Parameters:
  • user (reana_db.models.User) – User whose disk quota will be updated. If None, applies to all users.

  • bytes_to_sum (int) – Number of bytes to add to the user disk quota. If None, du will be used to recalculate it.

  • override_policy_checks – Whether to update the disk quota without checking the update policy.

reana_db.utils.update_workflow_cpu_quota(workflow) → int[source]

Update workflow CPU quota based on started and finished/stopped times.

Returns:

Workflow running time in milliseconds if workflow has terminated, else 0.
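
The returned value can be pictured as the difference between two timestamps expressed in milliseconds. The sketch below assumes that a workflow that has not terminated is represented by a missing end time; running_time_ms is a hypothetical helper that works on plain datetime values rather than on a Workflow object.

```python
from datetime import datetime, timedelta
from typing import Optional


def running_time_ms(started: Optional[datetime], ended: Optional[datetime]) -> int:
    """Return the running time in milliseconds, or 0 if a timestamp is missing."""
    if started is None or ended is None:
        return 0
    # Integer division of timedeltas yields whole milliseconds exactly.
    return (ended - started) // timedelta(milliseconds=1)


started = datetime(2024, 1, 1, 12, 0, 0)
ended = started + timedelta(seconds=1, milliseconds=500)
print(running_time_ms(started, ended))
print(running_time_ms(started, None))
```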

reana_db.utils.update_workflows_cpu_quota() → None[source]

Update the CPU quotas of all workflows in a more efficient way.

reana_db.utils.update_workflows_disk_quota() → None[source]

Update the disk quotas of all workflows in a more efficient way.

reana_db.utils.update_workspace_retention_rules(rules, status) → None[source]

Update workspace retention rules status.

Parameters:

CLI API

reana-db

REANA database commands.

reana-db [OPTIONS] COMMAND [ARGS]...

alembic

REANA database migration commands.

Note that this command is just a light wrapper around alembic.

reana-db alembic [OPTIONS] COMMAND [ARGS]...

current

Show current database state.

reana-db alembic current [OPTIONS]

Options

-v, --verbose

Use more verbose output.

downgrade

Downgrade REANA database.

reana-db alembic downgrade [OPTIONS] [REVISION]

Options

--sql

Don’t emit SQL to database - dump to standard output/file instead. See alembic docs on offline mode.

--tag <tag>

Arbitrary ‘tag’ name - can be used by custom env.py scripts.

Arguments

REVISION

Optional argument

history

Show REANA database migration recipes history.

reana-db alembic history [OPTIONS]

Options

-r, --rev-range <rev_range>

Specify a revision range; format is [start]:[end].

-v, --verbose

Use more verbose output.

-i, --indicate-current

Indicate the current revision.

init

Populate ‘alembic_version’ table with existing revisions.

reana-db alembic init [OPTIONS]

revision

Create a REANA database alembic revision.

reana-db alembic revision [OPTIONS]

Options

-m, --message <message>

Message string to use with ‘revision’

--autogenerate, --no-autogenerate

Populate revision script with candidate migration operations, based on comparison of database to model

--sql <sql>

Don’t emit SQL to database - dump to standard output/file instead. See alembic docs on offline mode.

--head <head>

Specify head revision or <branchname>@head to base new revision on.

--splice

Allow a non-head revision as the ‘head’ to splice onto.

--branch-label <branch_label>

Specify a branch label to apply to the new revision

--version-path <version_path>

Specify specific path from config for version file

--rev-id <rev_id>

Specify a hardcoded revision id instead of generating one

--depends-on <depends_on>

Specify one or more revision identifiers which this revision should depend on.

upgrade

Upgrade REANA database.

reana-db alembic upgrade [OPTIONS] [REVISION]

Options

--sql

Don’t emit SQL to database - dump to standard output/file instead. See alembic docs on offline mode.

--tag <tag>

Arbitrary ‘tag’ name - can be used by custom env.py scripts.

Arguments

REVISION

Optional argument

init

Initialize the REANA database.

reana-db init [OPTIONS]

quota

REANA DB quota related commands.

reana-db quota [OPTIONS] COMMAND [ARGS]...

create-default-resources

Create default quota resources.

reana-db quota create-default-resources [OPTIONS]

resource-usage-update

Update users’ disk and CPU quotas.

reana-db quota resource-usage-update [OPTIONS]

Changelog

0.9.4 (2024-03-01)

Code refactoring

Code style

Continuous integration

  • commitlint: addition of commit message linter (#218) (ee0f7e5)

  • commitlint: allow release commit style (#229) (adf15d7)

  • commitlint: check for the presence of concrete PR number (#223) (3d513f6)

  • pytest: move to PostgreSQL 14.10 (#226) (4dac889)

  • release-please: initial configuration (#218) (7c616d6)

  • shellcheck: fix exit code propagation (#223) (b62ee1e)

Documentation

  • authors: complete list of contributors (#227) (3fbcf65)

0.9.3 (2023-12-01)

  • Changes the Workflow table to replace the run_number column with two new columns run_number_major and run_number_minor in order to allow for more than nine restarts of user workflows.

  • Changes the names of database table, column, index and key constraints in order to follow the SQLAlchemy upstream naming conventions everywhere.

  • Changes several database index definitions in order to improve performance of most common database queries.

0.9.2 (2023-09-26)

  • Adds progress meter to the logs of the periodic quota updater.

  • Changes CPU and disk quota calculations to improve the performance of periodic quota updater.

  • Fixes the workflow priority calculation to avoid workflows stuck in the queued status when the number of allowed concurrent workflows is set to zero.

0.9.1 (2023-01-18)

  • Changes to PostgreSQL 12.13.

  • Fixes conversion of possibly-negative resource usage values to human-readable formats.

  • Fixes disk quota updater to prevent setting negative disk quota usage values.

  • Fixes quota updater to reduce memory usage.

0.9.0 (2022-12-13)

  • Adds new launcher_url column to the Workflow table to store the remote origin of workflows submitted via the Launch-on-REANA functionality.

  • Adds the possibility to force resource quota updates irrespective of globally-configured quota update policy.

  • Adds new WorkspaceRetentionRule table to store workspace file retention rules.

  • Adds new WorkspaceRetentionAuditLog table to store the audit log of workspace file retention rule updates.

  • Changes percentage ranges used to calculate the health status of user resource quota usage.

  • Changes to PostgreSQL 12.10.

  • Fixes wrong numbering of restarted workflows by limiting the number of times a workflow can be restarted to nine.

  • Fixes Workflow.get_workspace_disk_usage to always calculate disk usage rather than relying on the quota usage values from the database, since these may not be up-to-date depending on the global quota update policy.

  • Fixes helper function that retrieves workflows by UUID to also additionally check that the provided user is the owner of the workflow.

0.8.2 (2022-02-23)

  • Adds transition for workflow from queued to failed status.

0.8.1 (2022-02-01)

  • Adds an option to periodically calculate CPU quota usage.

  • Changes CLI quota command from disk-usage-update to resource-usage-update since it can also perform CPU quota calculation.

  • Fixes quota update functions to handle exceptional situations as continuable errors.

  • Removes extra QuotaResourceType enum in favor of ResourceType.name.

0.8.0 (2021-11-22)

  • Adds new disk usage retrieval methods using canonical (bytes) and human-readable (KiB) units. (User, Workflow)

  • Adds Quota models which calculate CPU and disk usage.

  • Adds InteractiveSession model.

  • Adds new properties started_at and finished_at to the Job model, updated on status change.

  • Adds get_priority workflow method, that combines both complexity and concurrency, to pass to the scheduler.

  • Adds a possibility to configure database connection pool parameters via environment variables.

  • Adds new pending state to RunStatus table.

  • Adds workflow complexity property in Workflow table.

  • Adds environment variable to configure which quotas to update.

  • Changes WorkflowStatus table to RunStatus.

  • Changes disk quota calculation functions to allow passing raw bytes to increase the used quota.

  • Changes to PostgreSQL 12.8.

  • Removes support for Python 2.

0.7.3 (2021-03-17)

0.7.2 (2021-02-22)

  • Adds utility to status enums to decide whether to clean workflows and jobs depending on their status.

0.7.1 (2021-02-02)

  • Adds support for Python 3.9.

  • Fixes minor code warnings.

  • Changes CI system to include Python flake8 checker.

0.7.0 (2020-10-20)

  • Adds initial central workflow status transition logic handler.

  • Adds new audit table and logic to register actions. (AuditLog, AuditLogAction)

  • Adds fixtures for better testing of database models.

  • Changes user token storage to move tokens from User table to UserToken table and to encrypt them.

  • Changes Workflow table to add a new workspace_path column.

  • Changes default database service to use centrally configured one from REANA-Commons. (REANA_INFRASTRUCTURE_COMPONENTS_HOSTNAMES)

  • Changes code formatting to respect black coding style.

  • Changes documentation to single-page layout.

0.6.0 (2019-12-19)

  • Adds new method which returns full workflow name.

  • Adds more granular DB configuration.

  • Adds Git repository information to the workflow model. (Workflow.git_repo, Workflow.git_provider)

  • Adds user name information to the user model. (User.full_name, User.username)

  • Removes restart count information from the job model. (Job.restart_count, Job.max_restart_count)

  • Adds support for Python 3.8.

0.5.0 (2019-04-16)

  • Introduces new workflow statuses: deleted, stopped, queued.

  • Adds new field to store workflow stopping time. (Workflow.run_stopped_at)

  • Moves workflow input parameters to its own column to separate them from operational options. Adapts getters accordingly. (Workflow.input_parameters)

  • Adds new method to retrieve the workflow owner’s token. (Workflow.get_owner_access_token)

  • Introduces new utility function to retrieve workflows by uuid or name. (_get_workflow_with_uuid_or_name)

  • Introduces new fields for interactive sessions: interactive_session, interactive_session_name and interactive_session_type. Note that with current design only one interactive session per workflow is supported.

  • Adds a new enumeration for possible job statuses. (JobStatus)

  • Adds new field to identify jobs in the underlying compute backend. (Job.backend_job_id)

0.4.0 (2018-11-06)

  • Stores reana.yaml in database models.

  • Adds Workflow specification and parameter getters.

  • Adds support for Python 3.7.

  • Changes license to MIT.

0.3.0 (2018-08-10)

  • This package is a result of refactoring reana-commons.

  • Provides common REANA models.

  • Provides database connection logic.

Contributing

Bug reports, issues, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the REANA code, please:

  1. Search for already reported problems.

  2. Check if the issue has been fixed or is still reproducible on the latest master branch.

  3. Create an issue, ideally with a test case.

If you create a pull request fixing a bug or implementing a feature, you can run the tests to ensure that everything is operating correctly:

$ ./run-tests.sh

Each pull request should preserve or increase code coverage.

License

MIT License

Copyright (C) 2018, 2019, 2020, 2021, 2022, 2023, 2024 CERN.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.

Authors

The list of contributors in alphabetical order: