REANA DB¶
REANA-DB is a component of the REANA reusable analysis platform. It contains REANA database models and utilities.
Features:
database persistence for REANA system
database models and utilities
database upgrades and migrations
Usage¶
Detailed information on how to install and use REANA can be found at docs.reana.io.
Configuration¶
REANA DB configuration.
- reana_db.config.DB_HOST = 'reana-db.default.svc.cluster.local'¶
Database service host.
- reana_db.config.DB_NAME = 'reana'¶
Database name.
- reana_db.config.DB_PASSWORD = 'reana'¶
Database password.
- reana_db.config.DB_PORT = '5432'¶
Database service port.
- reana_db.config.DB_SECRET_KEY = 'reana'¶
Database encryption secret key.
- reana_db.config.DB_USERNAME = 'reana'¶
Database user name.
- reana_db.config.DEFAULT_QUOTA_LIMITS = {'cpu': 0, 'disk': 0}¶
Default CPU (in milliseconds) and disk (in bytes) quota limits.
- reana_db.config.DEFAULT_QUOTA_RESOURCES = {'cpu': 'processing time', 'disk': 'shared storage'}¶
Default quota resources to fill Resource table.
- reana_db.config.PERIODIC_RESOURCE_QUOTA_UPDATE_POLICY = 0¶
Whether to run the periodic (cronjob) resource quota updater.
- reana_db.config.SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://reana:reana@reana-db.default.svc.cluster.local:5432/reana'¶
SQLAlchemy database location.
- reana_db.config.SQLALCHEMY_MAX_OVERFLOW = 2¶
How many new connections can temporarily exceed the pool size?
- reana_db.config.SQLALCHEMY_POOL_PRE_PING = False¶
Do we always pre-ping for pessimistic connection handling?
- reana_db.config.SQLALCHEMY_POOL_RECYCLE = 3600¶
How many seconds can a connection persist before it is recycled?
- reana_db.config.SQLALCHEMY_POOL_SIZE = 5¶
How many permanent connections to the database to keep?
- reana_db.config.SQLALCHEMY_POOL_TIMEOUT = 30¶
How many seconds to wait when retrieving a new connection from the pool?
- reana_db.config.WORKFLOW_TERMINATION_QUOTA_UPDATE_POLICY = []¶
Which quota types to update on workflow termination. If not specified, all quotas are calculated; if empty, no quotas are updated.
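The default SQLALCHEMY_DATABASE_URI listed above is consistent with combining the individual DB_* defaults. The following is a minimal sketch of that composition using only the default values documented in this section. The helper name build_database_uri is hypothetical and is not part of reana_db; how the package itself assembles the URI is not shown in this document.

```python
from urllib.parse import quote_plus

# Default values copied from the configuration listing above.
DB_USERNAME = "reana"
DB_PASSWORD = "reana"
DB_HOST = "reana-db.default.svc.cluster.local"
DB_PORT = "5432"
DB_NAME = "reana"


def build_database_uri(username, password, host, port, name):
    """Compose a PostgreSQL/psycopg2 SQLAlchemy URI (hypothetical helper)."""
    # Percent-encode the credentials so characters such as "@" stay valid.
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), host, port, name
    )


uri = build_database_uri(DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME)
print(uri)
```

With the documented defaults, the printed value should match the SQLALCHEMY_DATABASE_URI default shown in the listing.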
API¶
Database management¶
Database management for REANA.
Models¶
Models for REANA Components.
- class reana_db.models.CleanUpDependingOnStatusMixin[source]¶
Mixin to determine whether to clean up jobs for REANA status enums.
- class reana_db.models.InteractiveSession(**kwargs)[source]¶
Interactive Session table.
- created¶
- updated¶
- class reana_db.models.InteractiveSessionResource(**kwargs)[source]¶
Interactive Session Resource table.
- created¶
- updated¶
- class reana_db.models.InteractiveSessionType(value)[source]¶
Enumeration of interactive session types.
- class reana_db.models.User(access_token=None, **kwargs)[source]¶
User table.
- access_token¶
REANA active access token value.
- access_token_status¶
REANA most recent access token status.
- active_token¶
REANA active access token object.
- created¶
- get_user_workspace()[source]¶
Build user’s workspace directory path.
- Returns
Path to the user’s workspace directory.
- get_workflow_overload_priority()[source]¶
Get priority factor based on the number of current workflows running.
- latest_access_token¶
REANA most recent access token.
- log_action(action, details=None)[source]¶
Create audit log entry for the user.
- Parameters
action (AuditLogAction) – Type of action.
details – JSON field containing action details.
- updated¶
- class reana_db.models.Workflow(id_, name, owner_id, reana_specification, type_, logs='', input_parameters={}, operational_options={}, status=RunStatus.created, complexity=[], git_ref='', git_repo=None, git_provider=None, workspace_path=None, restart=False, run_number=None)[source]¶
Workflow table.
- can_transition_to(next_status)[source]¶
Whether the provided workflow can transition to the next status.
- created¶
- get_complexity_priority(total_cluster_memory)[source]¶
Calculate workflow priority based on its complexity.
- get_priority(cluster_memory)[source]¶
Workflow priority when scheduling it.
Takes into account both the workflow complexity and the number of workflows running at a certain time.
- get_workspace_disk_usage(summarize=False, search=None)[source]¶
Retrieve disk usage information of a workspace.
- run_number¶
Property of run_number.
- static update_workflow_status(db_session, workflow_uuid, status, new_logs='', message=None)[source]¶
Update database workflow status.
- Parameters
workflow_uuid – UUID which represents the workflow.
status – String that represents the workflow status.
new_logs – New logs from workflow execution.
message – Unused.
- updated¶
- class reana_db.models.WorkflowResource(**kwargs)[source]¶
Workflow Resource table.
- created¶
- updated¶
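The idea behind Workflow.can_transition_to can be illustrated with a lookup table of allowed status changes. The sketch below is not the reana_db implementation: the DemoStatus enum, its members and the ALLOWED_TRANSITIONS table are hypothetical and chosen only to show the technique.

```python
from enum import Enum


class DemoStatus(Enum):
    """Hypothetical stand-in for a run status enumeration."""

    created = 0
    queued = 1
    pending = 2
    running = 3
    stopped = 4
    deleted = 5


# Hypothetical table: each status maps to the statuses allowed to follow it.
ALLOWED_TRANSITIONS = {
    DemoStatus.created: {DemoStatus.queued, DemoStatus.deleted},
    DemoStatus.queued: {DemoStatus.pending, DemoStatus.deleted},
    DemoStatus.pending: {DemoStatus.running, DemoStatus.stopped},
    DemoStatus.running: {DemoStatus.stopped},
    DemoStatus.stopped: {DemoStatus.deleted},
    DemoStatus.deleted: set(),
}


def can_transition_to(current_status, next_status):
    """Return True if next_status may follow current_status in the table."""
    return next_status in ALLOWED_TRANSITIONS.get(current_status, set())


print(can_transition_to(DemoStatus.created, DemoStatus.queued))
print(can_transition_to(DemoStatus.deleted, DemoStatus.running))
```

Keeping the rules in a single table means a status update routine can reject an invalid change before writing anything to the database.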
Utilities¶
REANA-DB utils.
- reana_db.utils.build_workspace_path(user_id, workflow_id=None, workspace_root_path=None)[source]¶
Build user’s workspace path.
- Parameters
user_id – Owner of the workspace.
workflow_id – Optional parameter, if provided gives the path to the workflow workspace instead of just the path to the user workspace.
workspace_root_path – Optional parameter, if provided changes the root path under which the workflow workspaces are stored.
- Returns
String that represents the workspace absolute path, e.g. /var/reana/users/0000/workflows/0034
- reana_db.utils.get_default_quota_resource(resource_type)[source]¶
Get default quota resource by given resource type.
- Parameters
resource_type (reana_db.models.ResourceType) – Resource type corresponding to default resource to get.
- reana_db.utils.get_disk_usage_or_zero(workspace_path) → int[source]¶
Get disk usage for the workspace if it exists, zero otherwise.
- reana_db.utils.store_workflow_disk_quota(workflow, bytes_to_sum: Optional[int] = None)[source]¶
Update or create disk workflow resource.
- Parameters
workflow (reana_db.models.Workflow) – Workflow whose disk resource usage must be calculated.
bytes_to_sum (int) – Number of bytes to add to the workflow disk quota. If None, du is used to recalculate it.
- reana_db.utils.update_users_cpu_quota(user=None) → None[source]¶
Update users CPU quota usage.
- Parameters
user (reana_db.models.User) – User whose CPU quota will be updated. If None, applies to all users.
- reana_db.utils.update_users_disk_quota(user=None, bytes_to_sum: Optional[int] = None) → None[source]¶
Update users disk quota usage.
- Parameters
user (reana_db.models.User) – User whose disk quota will be updated. If None, applies to all users.
bytes_to_sum (int) – Number of bytes to add to the user disk quota. If None, du is used to recalculate it.
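To make the workspace utilities more concrete, here is a rough sketch of how a workspace path and a "disk usage or zero" lookup could be written with the standard library. Both functions are hypothetical stand-ins, not the reana_db code. The users/<user_id>/workflows/<workflow_id> layout and the /var/reana root are assumptions inferred from the example path in the build_workspace_path description, and file sizes are summed in Python here instead of calling du.

```python
import os
import posixpath


def sketch_workspace_path(user_id, workflow_id=None, workspace_root_path="/var/reana"):
    """Join root, user ID and (optionally) workflow ID into a workspace path."""
    # Assumed layout: <root>/users/<user_id>/workflows[/<workflow_id>]
    path = posixpath.join(workspace_root_path, "users", str(user_id), "workflows")
    if workflow_id is not None:
        path = posixpath.join(path, str(workflow_id))
    return path


def sketch_disk_usage_or_zero(workspace_path):
    """Sum regular file sizes under workspace_path, or return 0 if it is missing."""
    if not os.path.isdir(workspace_path):
        return 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(workspace_path):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            # Skip symbolic links so that linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


print(sketch_workspace_path("0000", "0034"))
print(sketch_disk_usage_or_zero("/nonexistent/reana/workspace"))
```

Note that summing apparent file sizes can differ from what du reports, since du counts allocated disk blocks by default.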
CLI API¶
reana-db¶
REANA database commands.
reana-db [OPTIONS] COMMAND [ARGS]...
alembic¶
REANA database migration commands.
Note that this command is just a light wrapper around alembic.
reana-db alembic [OPTIONS] COMMAND [ARGS]...
current¶
Show current database state.
reana-db alembic current [OPTIONS]
Options
- -v, --verbose¶
Use more verbose output.
downgrade¶
Downgrade REANA database.
reana-db alembic downgrade [OPTIONS] [REVISION]
Options
- --sql¶
Don’t emit SQL to database - dump to standard output/file instead. See alembic docs on offline mode.
- --tag <tag>¶
Arbitrary ‘tag’ name - can be used by custom env.py scripts.
Arguments
- REVISION¶
Optional argument
history¶
Show REANA database migration recipes history.
reana-db alembic history [OPTIONS]
Options
- -r, --rev-range <rev_range>¶
Specify a revision range; format is [start]:[end].
- -v, --verbose¶
Use more verbose output.
- -i, --indicate-current¶
Indicate the current revision.
init¶
Populate ‘alembic_version’ table with existing revisions.
reana-db alembic init [OPTIONS]
revision¶
Create a REANA database alembic revision.
reana-db alembic revision [OPTIONS]
Options
- -m, --message <message>¶
Message string to use with ‘revision’
- --autogenerate, --no-autogenerate¶
Populate revision script with candidate migration operations, based on comparison of database to model
- --sql <sql>¶
Don’t emit SQL to database - dump to standard output/file instead. See alembic docs on offline mode.
- --head <head>¶
Specify head revision or <branchname>@head to base new revision on.
- --splice¶
Allow a non-head revision as the ‘head’ to splice onto.
- --branch-label <branch_label>¶
Specify a branch label to apply to the new revision
- --version-path <version_path>¶
Specify specific path from config for version file
- --rev-id <rev_id>¶
Specify a hardcoded revision id instead of generating one
- --depends-on <depends_on>¶
Specify one or more revision identifiers which this revision should depend on.
upgrade¶
Upgrade REANA database.
reana-db alembic upgrade [OPTIONS] [REVISION]
Options
- --sql¶
Don’t emit SQL to database - dump to standard output/file instead. See alembic docs on offline mode.
- --tag <tag>¶
Arbitrary ‘tag’ name - can be used by custom env.py scripts.
Arguments
- REVISION¶
Optional argument
init¶
Initialise the REANA database.
reana-db init [OPTIONS]
quota¶
REANA DB quota related commands.
reana-db quota [OPTIONS] COMMAND [ARGS]...
create-default-resources¶
Create default quota resources.
reana-db quota create-default-resources [OPTIONS]
resource-usage-update¶
Update users disk and CPU quotas.
reana-db quota resource-usage-update [OPTIONS]
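When the commands above are driven from a script, it can help to assemble the argument list in one place. The sketch below builds the arguments for reana-db alembic upgrade using only the options documented in this section (--sql, --tag and the optional REVISION argument). The helper name alembic_upgrade_argv is hypothetical, and the revision value head is Alembic's symbolic name for the latest revision.

```python
import shlex


def alembic_upgrade_argv(revision=None, sql=False, tag=None):
    """Build the argument list for `reana-db alembic upgrade` (hypothetical helper)."""
    argv = ["reana-db", "alembic", "upgrade"]
    if sql:
        # Offline mode: dump SQL instead of emitting it to the database.
        argv.append("--sql")
    if tag is not None:
        argv.extend(["--tag", tag])
    if revision is not None:
        argv.append(revision)
    return argv


argv = alembic_upgrade_argv(revision="head", sql=True)
# Show the command as it would be typed in a shell.
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a machine where the reana-db command is installed.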
Changes¶
Version 0.8.1 (2022-02-01)¶
Adds an option to periodically calculate CPU quota usage.
Changes CLI quota command from disk-usage-update to resource-usage-update since it can also perform CPU quota calculation.
Fixes quota update functions to handle exceptional situations as continuable errors.
Removes extra QuotaResourceType enum in favor of ResourceType.name.
Version 0.8.0 (2021-11-22)¶
Adds new disk usage retrieval methods using canonical (bytes) and human-readable (KiB) units. (User, Workflow)
Adds Quota models which calculate CPU and disk usage.
Adds InteractiveSession model.
Adds new properties started_at and finished_at to the Job model, updated on status change.
Adds get_priority workflow method, that combines both complexity and concurrency, to pass to the scheduler.
Adds a possibility to configure database connection pool parameters via environment variables.
Adds new pending state to RunStatus table.
Adds workflow complexity property in Workflow table.
Adds environment variable to configure which quotas to update.
Changes WorkflowStatus table to RunStatus.
Changes disk quota calculation functions to allow passing raw bytes to increase the used quota.
Changes to PostgreSQL 12.8.
Removes support for Python 2.
Version 0.7.3 (2021-03-17)¶
Fixes REANA installation by pinning SQLAlchemy version less than 1.4.0 due to https://github.com/kvesteri/sqlalchemy-utils/issues/505.
Version 0.7.2 (2021-02-22)¶
Adds utility to status enums to decide whether to clean workflows and jobs depending on their status.
Version 0.7.1 (2021-02-02)¶
Adds support for Python 3.9.
Fixes minor code warnings.
Changes CI system to include Python flake8 checker.
Version 0.7.0 (2020-10-20)¶
Adds initial central workflow status transition logic handler.
Adds new audit table and logic to register actions. (AuditLog, AuditLogAction)
Adds fixtures for better testing of database models.
Changes user token storage to move tokens from User table to UserToken table and to encrypt them.
Changes Workflow table to add a new workspace_path column.
Changes default database service to use centrally configured one from REANA-Commons. (REANA_INFRASTRUCTURE_COMPONENTS_HOSTNAMES)
Changes code formatting to respect black coding style.
Changes documentation to single-page layout.
Version 0.6.0 (2019-12-19)¶
Adds new method which returns full workflow name.
Adds more granular DB configuration.
Adds Git repository information to the workflow model. (Workflow.git_repo, Workflow.git_provider)
Adds user name information to the user model. (User.full_name, User.username)
Removes restart count information from the job model. (Job.restart_count, Job.max_restart_count)
Adds support for Python 3.8.
Version 0.5.0 (2019-04-16)¶
Introduces new workflow statuses: deleted, stopped, queued.
Adds new field to store workflow stopping time. (Workflow.run_stopped_at)
Moves workflow input parameters to its own column to separate them from operational options. Adapts getters accordingly. (Workflow.input_parameters)
Adds new method to retrieve the workflow owner’s token. (Workflow.get_owner_access_token)
Introduces new utility function to retrieve workflows by uuid or name. (_get_workflow_with_uuid_or_name)
Introduces new fields for interactive sessions: interactive_session, interactive_session_name and interactive_session_type. Note that with current design only one interactive session per workflow is supported.
Adds a new enumeration for possible job statuses. (JobStatus)
Adds new field to identify jobs in the underlying compute backend. (Job.backend_job_id)
Version 0.4.0 (2018-11-06)¶
Stores reana.yaml in database models.
Adds Workflow specification and parameter getters.
Adds support for Python 3.7.
Changes license to MIT.
Version 0.3.0 (2018-08-10)¶
This package is a result of refactoring reana-commons.
Provides common REANA models.
Provides database connection logic.
Please beware
Please note that REANA is in an early alpha stage of its development. The developer preview releases are meant for early adopters and testers. Please don’t rely on released versions for any production purposes yet.
Contributing¶
Bug reports, issues, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the REANA code, please:
Search for already reported problems.
Check if the issue has been fixed or is still reproducible on the latest master branch.
Create an issue, ideally with a test case.
If you create a pull request fixing a bug or implementing a feature, you can run the tests to ensure that everything is operating correctly:
$ ./run-tests.sh
Each pull request should preserve or increase code coverage.
License¶
MIT License
Copyright (C) 2018, 2019, 2020, 2021 CERN.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.