Scripts

A script describes how to process data. Scripts can depend on other scripts, which allows processes to be chained into pipelines.

Adding a Script to a Project

To add a script to a project, create a new script with a label that is unique within the project, then set the program command used to execute the script and the path to the script. The path can be absolute or relative. A relative path is resolved with respect to the project's scripts directory, so in that case the script needs to be copied into that directory.

An example of adding a script to a project:

script = project.new_script('import scan')
script.set_program('python')
script.set_script('import_scan.py')
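The path rule described above can be sketched as follows. This is only an illustration of the rule, not the workflow_manager implementation: resolve_script_path and the directory names are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def resolve_script_path(script_path, scripts_dir):
    """Return the location a script would be loaded from.

    Absolute paths are used as given; relative paths are taken
    relative to the project's scripts directory.
    """
    path = PurePosixPath(script_path)
    if path.is_absolute():
        return path
    return PurePosixPath(scripts_dir) / path


# Hypothetical directories, for illustration only.
scripts_dir = "/data/projects/demo/scripts"
relative = resolve_script_path("import_scan.py", scripts_dir)
absolute = resolve_script_path("/opt/tools/import_scan.py", scripts_dir)
print(relative)  # /data/projects/demo/scripts/import_scan.py
print(absolute)  # /opt/tools/import_scan.py
```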

Script Dependencies

A dependency on another script can be added using the add_dependency function with the label of the parent script. For example, this segment script depends on the pretend_import script:

script = project.new_script('segment')
script.set_program('python')
script.set_script('segment.py')
script.add_dependency('pretend_import')

In this case, on completion of any process spawned from the parent script, the segment script spawns a process that is related to the parent process. This allows processes to be pipelined and data processing to be recorded and traced automatically.
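To make the spawning behaviour concrete, here is a toy in-memory model of it. ToyProject and all of its methods are hypothetical stand-ins written for this sketch; they only mimic the behaviour described above and are not the workflow_manager API.

```python
from collections import defaultdict
from itertools import count


class ToyProject:
    """Toy in-memory stand-in; not the workflow_manager implementation."""

    def __init__(self):
        self.dependents = defaultdict(list)  # parent label -> child labels
        self.processes = {}                  # process id -> record
        self._ids = count(1)

    def add_dependency(self, child_label, parent_label):
        self.dependents[parent_label].append(child_label)

    def spawn(self, label, parent_id=None):
        process_id = next(self._ids)
        self.processes[process_id] = {"script": label, "parent": parent_id}
        return process_id

    def completed(self, process_id):
        """Spawn one child process per script that depends on this one."""
        label = self.processes[process_id]["script"]
        return [self.spawn(child, parent_id=process_id)
                for child in self.dependents[label]]

    def lineage(self, process_id):
        """Walk parent links back to the process that started the chain."""
        chain = []
        while process_id is not None:
            chain.append(self.processes[process_id]["script"])
            process_id = self.processes[process_id]["parent"]
        return chain


toy = ToyProject()
toy.add_dependency("segment", "pretend_import")
import_id = toy.spawn("pretend_import")
(segment_id,) = toy.completed(import_id)
print(toy.lineage(segment_id))  # ['segment', 'pretend_import']
```

Storing the parent id on each process record is what makes the trace possible: any process can be followed back to the one that started the chain.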

Updating Scripts from Sources

Instead of manually copying a script to the project's scripts directory, the source of the script can be defined using the add_source function, and the update_from_source function can then be called to copy the script into the project's scripts directory. For example:

script.add_source('/home/user/my_scripts/import_scan.py')
script.update_from_source()

A script can have multiple source files, for example, supporting modules. Currently, this feature only supports copy updates, but it is intended to include HTTP and versioned-repository sources. In all cases, a version can be associated with the script and with the processes spawned from it, so that any piece of data can be traced back to the exact processes, and script versions, that created it.
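A minimal sketch of a copy update with an associated version is shown below. It is an assumption-laden illustration, not the library's implementation: update_from_sources is a hypothetical helper, and deriving the version from a SHA-256 digest of the file names and contents is a choice made for this sketch that the text above does not specify.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def update_from_sources(sources, scripts_dir):
    """Copy each source file into scripts_dir and return a version string.

    The version is a short SHA-256 digest over the copied file names and
    contents (an illustrative choice, not taken from workflow_manager).
    """
    scripts_dir = Path(scripts_dir)
    scripts_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    for source in sorted(Path(s) for s in sources):
        shutil.copy2(source, scripts_dir / source.name)
        digest.update(source.name.encode("utf-8"))
        digest.update(source.read_bytes())
    return digest.hexdigest()[:12]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: a personal scripts folder and a project folder.
    source = Path(tmp) / "my_scripts" / "import_scan.py"
    source.parent.mkdir()
    source.write_text("print('v1')\n")
    scripts_dir = Path(tmp) / "project" / "scripts"

    version_1 = update_from_sources([source], scripts_dir)
    copied = (scripts_dir / "import_scan.py").read_text()

    source.write_text("print('v2')\n")
    version_2 = update_from_sources([source], scripts_dir)

print(version_1 != version_2)  # True: the version changes when the source does
```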

Running a Script

A script can be run manually, or automatically by a cron job or by the WM project when a dependency triggers it. Here is an example of how to run a script manually:

import workflow_manager

project = workflow_manager.Project(project_name)
project.run_script('import_zipped_scan', {'filepath': zippath})

First, load the project, then call the run_script function, which takes the script label ('import_zipped_scan') and the script arguments as inputs. This creates a process that executes the script in the project environment.
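How the process executes the script is not spelled out here. One plausible reading, consistent with a script that takes the project name and process id as its last two command-line arguments, is sketched below. build_command is a hypothetical helper, and the stand-in script only echoes the two trailing arguments.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_command(program, script_path, project_name, process_id):
    """Build an argument list ending with the project name and process id."""
    return [program, str(script_path), project_name, str(process_id)]


# A stand-in script that only echoes its two trailing arguments.
SCRIPT = "import sys\nprint(sys.argv[-2], sys.argv[-1])\n"

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "echo_args.py"
    script_path.write_text(SCRIPT)
    # sys.executable stands in for the 'python' program command.
    command = build_command(sys.executable, script_path, "demo_project", 42)
    result = subprocess.run(command, capture_output=True, text=True, check=True)

print(result.stdout.strip())  # demo_project 42
```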

Example Script

Here is an example of the import scan script:

import sys
import workflow_manager


def run(project_name, process_id):
    # Open project
    project = workflow_manager.Project(project_name)

    # Get process and arguments
    process = project.get_process(process_id)
    filepath = process.arguments.get('filepath')

    # Create or load workspace
    workspace = process.get_workspace('scan')

    # Extract zip file into workspace
    status, message = workspace.extract_zipfile(filepath)
    process.completed(status, message)  ### REQUIRED ###


if __name__ == "__main__":
    project_name = sys.argv[-2]
    process_id = sys.argv[-1]
    run(project_name, process_id)

When this script is run, it requires the project name and the ID of the process that is running it. Given these arguments:

  1. the project is opened

  2. the process running the script is loaded

  3. the path to the zip file (the filepath argument) is retrieved from the process

  4. a workspace called scan (required) is created, or retrieved if it already exists

  5. the zip file is extracted into the workspace

  6. the status and message of the extraction are reported to the process, which spawns any dependent scripts.

Notice that the example script has a run function that is called from an if __name__ == "__main__": block. This allows the script to be called from the command line and, in the future, to be used by importing it as a module. It is therefore good practice to write your scripts with a run function, in the form shown above.
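Because the example calls process.completed only on the last line of run, an exception raised earlier would leave the process without a completion status. One way to guard against that is sketched below. StubProcess and run_step are hypothetical names used only for this illustration, and treating the status as a boolean is an assumption; the real process object comes from workflow_manager.

```python
class StubProcess:
    """Stand-in that records what a script reports; not workflow_manager's class."""

    def __init__(self, arguments):
        self.arguments = arguments
        self.result = None

    def completed(self, status, message):
        self.result = (status, message)


def run_step(process, step):
    """Run step(arguments) and always report the outcome via process.completed."""
    try:
        status, message = step(process.arguments)
    except Exception as error:
        # Report the failure instead of leaving the process without a status.
        status, message = False, f"{type(error).__name__}: {error}"
    process.completed(status, message)
    return status


def failing_step(arguments):
    # Simulates an extraction that cannot find its input file.
    raise FileNotFoundError(arguments["filepath"])


process = StubProcess({"filepath": "missing.zip"})
run_step(process, failing_step)
print(process.result)  # (False, 'FileNotFoundError: missing.zip')
```

Keeping the work in a function that takes the process as an argument also makes it easy to exercise with a stand-in, as done here, without a real project.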