dss-visual-edit

Deploying to production

Preliminary remarks

Overview of steps

Initial deployment:

All deployments:

Demo videos:

Initializing the production editlog

Simple procedure

A simple way to initialize the editlog is with a Reset Edits scenario, as described in the getting started guide. You can then delete the scenario so that it cannot be run accidentally later, which would erase all edits.

Secure procedure

For audit purposes, we strongly encourage you to follow this procedure.

In an audit context, it is especially critical to make sure that the editlog cannot be tampered with. Three layers of security can be put in place for this:

We illustrate with SQL code for a PostgreSQL database.

CREATE TABLE "editlog" (
    "key" VARCHAR(255),
    "column_name" VARCHAR(255), -- set the length according to the longest name among the editable columns.
    "value" VARCHAR(255), -- set the length according to the largest value expected in the editable columns.
    "date" VARCHAR(255),
    "user" VARCHAR(255),
    "action" VARCHAR(255)
);

CREATE FUNCTION set_timestamp_id()
RETURNS TRIGGER AS $$
BEGIN
    -- Overwrite any client-supplied date with the server time, expressed in UTC.
    NEW."date" = to_char(CURRENT_TIMESTAMP AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS.US+00:00');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_timestamp_id_trigger
BEFORE INSERT ON editlog
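FOR EACH ROW EXECUTE PROCEDURE set_timestamp_id();

-- Restrict privileges so that your_user can only read and append rows.
REVOKE ALL ON TABLE editlog FROM public;
GRANT SELECT, INSERT ON TABLE editlog TO your_user;

-- Reject every UPDATE and DELETE so that the table is append-only.
CREATE FUNCTION prevent_updates()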
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
        RAISE EXCEPTION 'Table is append-only. Updates and deletes are not allowed.';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER prevent_updates_trigger
BEFORE UPDATE OR DELETE ON editlog
FOR EACH ROW EXECUTE PROCEDURE prevent_updates();
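
Once the table and triggers are in place, it can be worth checking that they behave as intended. The following is a minimal sketch, not part of the official procedure. The inserted values ('42', 'status', 'test_user', 'test') are hypothetical placeholders, and the whole check runs inside a transaction that is rolled back, so that no test row is left in the append-only table.

```sql
BEGIN;

-- Hypothetical test row; these values are placeholders for illustration only.
INSERT INTO editlog ("key", "column_name", "value", "user", "action")
VALUES ('42', 'status', 'approved', 'test_user', 'test');

-- "date" should have been filled in by set_timestamp_id_trigger.
SELECT "key", "column_name", "value", "date"
FROM editlog
WHERE "user" = 'test_user';

-- This statement should fail: with "permission denied" when run as your_user,
-- or with the append-only exception when run by a role that holds UPDATE rights.
UPDATE editlog SET "value" = 'rejected' WHERE "user" = 'test_user';

-- The failed statement aborts the transaction; roll back to discard the test row.
ROLLBACK;

-- List the privileges currently granted on the table.
SELECT grantee, privilege_type
FROM information_schema.role_table_grants
WHERE table_name = 'editlog';
```

A DELETE can be checked the same way by substituting it for the UPDATE statement.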

Security considerations

Protecting the editlog from interference

Restricting access to the webapp

Like any Dataiku webapp, it can either require authentication or be made accessible to visitors who are not logged into Dataiku. We do not recommend the latter option, as anyone who can reach the webapp will be able to see the data that it exposes and to make edits. If you nevertheless make the webapp accessible to unauthenticated visitors, their edits will be attributed to user “none” in the editlog.

When authentication is required:

Remember that the webapp only writes data to the editlog, not to the original dataset (which stays unchanged). The edits and the edited datasets can only be changed by running the recipes that build them.
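
To make the idea of an append-only log more concrete, the query below sketches how the most recent logged value of each edited cell could be read directly from the editlog. It is only an illustration, not the logic of the plugin's recipes. It assumes the columns of the CREATE TABLE statement shown earlier on this page, and it relies on "date" being stored as fixed-width UTC text, so that sorting the strings is equivalent to sorting chronologically. It ignores the "action" column.

```sql
-- One row per (key, column_name) pair: the latest logged value for that cell.
-- DISTINCT ON is a PostgreSQL extension; it keeps the first row of each group
-- according to the ORDER BY clause.
SELECT DISTINCT ON ("key", "column_name")
       "key", "column_name", "value", "user", "date"
FROM editlog
ORDER BY "key", "column_name", "date" DESC;
```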

Tips for production usage

When sharing the production webapp with business users, it’s a good idea to tick “auto-start backend” in the webapp settings, which helps make sure that the webapp is always available. To conserve your Dataiku instance’s resources, we recommend running the webapp in containerized execution when available.

You can share a direct link to the webapp (or a dashboard that embeds it), but it can be easier for end-users to find it again in the future if it’s added to a Workspace. Workspaces provide a friendlier way for business users to access all objects shared with them in Dataiku. If your end-users don’t already use a Workspace, we recommend creating one for them and having them use Workspaces as their Dataiku home page.

If a domain-specific issue is detected downstream of edits, review the editlog to understand its root cause. If a cell was incorrectly edited, edit it again via the webapp, which will log the correct value; you can then rebuild the downstream datasets.
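
When the editlog is stored in a SQL database, reviewing it for a given cell can be as simple as the query sketched below. It assumes the columns of the CREATE TABLE statement shown earlier on this page; the key '42' and the column name 'status' are hypothetical placeholders to replace with the row and column under investigation.

```sql
-- Full edit history of one cell, oldest first.
SELECT "date", "user", "action", "value"
FROM editlog
WHERE "key" = '42'
  AND "column_name" = 'status'
ORDER BY "date";
```

The last row returned is the most recent edit, and the "user" column shows who made each change.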