dss-visual-edit

Validating machine-generated data

Use case description

While the initial How to Use guide explains how to use the Visual Edit plugin for tactical corrections of source data, this guide focuses on validating machine-generated data.

We want business users to validate and make corrections as needed, based on their domain expertise. The human-reviewed, machine-generated data would typically be used for mass corrections or enrichment of source data, or as input to an operational process.

Machine-generated data would be stored in the output dataset of an existing data pipeline. Each row would correspond to an item to validate. Columns would include:

primary keys;
machine-generated columns, whose values would change if the pipeline or its algorithms change;
display-only columns, whose values would help the end-user figure out how to validate/edit/provide feedback.

Instead of exporting this dataset to Excel, we want end-users to access a web interface to validate and correct the data. In addition to the above columns, we would want two feedback columns: one to mark rows as valid (via checkboxes) and one to write comments.

Special behavior of the validation column

Validation columns are used to indicate that a human saw what the machine did for a given row, and had the opportunity to make corrections or to fill in missing values if needed.

The webapp’s backend implements special behavior when a cell from a column named “Validated” or “Reviewed” is edited: values of all editable columns from the same row are logged (even if they weren’t edited). This allows the editlog to include not just the information that the row is valid, but also to record the actual values that were validated. This is particularly useful when those values were generated by an algorithm, because they may change if the algorithm changes.

As a result, for rows marked as valid, the machine-generated and human-reviewed columns in the edits dataset will have no missing values.
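The behavior described above can be illustrated with a minimal sketch in plain Python. This is not the plugin's actual code: the function `log_edit`, the editlog field names (`date`, `key`, `column_name`, `value`), and the example columns are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone

# Column names that trigger the special behavior (taken from the text above).
VALIDATION_COLUMNS = {"Validated", "Reviewed"}


def log_edit(editlog, row, key, editable_columns, column_name, value):
    """Hypothetical helper: apply one cell edit and append editlog entries.

    A regular edit logs only the edited cell. An edit to a validation column
    also logs the current value of every other editable column in the row,
    even if those values were never edited.
    """
    row[column_name] = value
    columns_to_log = [column_name]
    if column_name in VALIDATION_COLUMNS:
        columns_to_log += [c for c in editable_columns if c != column_name]
    timestamp = datetime.now(timezone.utc).isoformat()
    for column in columns_to_log:
        editlog.append(
            {"date": timestamp, "key": key, "column_name": column, "value": row[column]}
        )
    return editlog


# Illustrative row: "category" stands in for a machine-generated column.
row = {"id": 1, "category": "Electronics", "Validated": None, "Comments": None}
editable = ["category", "Validated", "Comments"]

editlog = log_edit([], row, key=1, editable_columns=editable,
                   column_name="Validated", value=True)
logged_columns = [entry["column_name"] for entry in editlog]
```

In this sketch, ticking the checkbox produces one editlog entry per editable column, so the machine-generated value that the reviewer saw is recorded alongside the validation itself.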

How-to

You must be familiar with the initial How to Use guide before following the steps below.

Add feedback columns to the dataset to review: this can be done in code in the existing data pipeline, or with an additional Prepare recipe. The new columns start out with missing values and serve as placeholders in the webapp.
When creating a Visual Edit webapp: make sure to select all machine-generated columns and feedback columns as editable.
When using the webapp: review values in the generated columns (mark rows as valid, or edit values and add notes when necessary) and fill in missing values.
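The first step above, adding placeholder feedback columns, could be sketched as follows when done in code. This is a minimal example on plain Python dictionaries; the function `add_feedback_columns`, the column name "Comments", and the sample rows are assumptions, and a real pipeline would apply the same idea to its own dataset format.

```python
def add_feedback_columns(rows, validation_column="Validated", comments_column="Comments"):
    """Hypothetical helper: return copies of the rows with empty feedback columns.

    Both columns are filled with None (missing values) so that they appear
    as empty, editable placeholders in the webapp.
    """
    return [{**row, validation_column: None, comments_column: None} for row in rows]


# Illustrative pipeline output: a primary key, a display-only column,
# and a machine-generated column.
machine_generated = [
    {"id": 1, "label": "Widget A", "predicted_category": "Hardware"},
    {"id": 2, "label": "Widget B", "predicted_category": None},
]

to_review = add_feedback_columns(machine_generated)
```

Naming the validation column "Validated" (or "Reviewed") matters here, since that name is what triggers the special logging behavior described earlier.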

Build a complete application to test with end-users
Deploy to production: this will require collaboration and testing with IT, to propagate the contents of the edits dataset to other systems (columns of this dataset include primary keys, human-reviewed values, a boolean validation column, and additional human feedback columns).
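As a rough illustration of what propagation might start from, the sketch below keeps only the rows of an edits dataset that a reviewer marked as valid. The function `validated_rows`, the column names, and the sample data are hypothetical; how the result is sent to other systems depends on your environment and is not shown.

```python
def validated_rows(edits, validation_column="Validated"):
    """Hypothetical helper: keep only rows explicitly marked as valid."""
    return [row for row in edits if row.get(validation_column) is True]


# Illustrative edits dataset: primary key, human-reviewed value,
# boolean validation column, and a free-text feedback column.
edits = [
    {"id": 1, "predicted_category": "Hardware", "Validated": True, "Comments": None},
    {"id": 2, "predicted_category": "Software", "Validated": True, "Comments": "Corrected"},
    {"id": 3, "predicted_category": None, "Validated": None, "Comments": "Unsure"},
]

to_propagate = validated_rows(edits)
```

Rows with comments but no validation (such as id 3 here) are left out of this selection, which may or may not be the desired policy; that choice is an assumption of the sketch.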
