Home

This is the technical documentation for the backend of the Visual Edit plugin. Return to the main documentation.

DataEditor

This class provides CRUD methods to edit data from a Dataiku Dataset using the Event Sourcing pattern: edits are stored in a separate Dataset called the editlog. The original Dataset is never changed. Both Datasets are used to compute the edited state of the data.
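To make the Event Sourcing idea above concrete, here is a minimal sketch, without pandas, of how an edited state could be derived from an original table and an editlog. The function name `replay_editlog`, the dict-based rows and the editlog entry fields (`action`, `key`, `values`) are assumptions made for this illustration; they are not the plugin's actual schema or implementation.

```python
import copy


def replay_editlog(original_rows, editlog, primary_keys):
    """Return the edited rows without modifying `original_rows`.

    Hypothetical entry layout (an assumption for this sketch):
    {"action": "create" | "update" | "delete", "key": {...}, "values": {...}}
    """
    # Index a deep copy of the original rows by their primary key values.
    state = {
        tuple(row[k] for k in primary_keys): copy.deepcopy(row)
        for row in original_rows
    }
    # Apply the events in the order in which they were logged.
    for event in editlog:
        key = tuple(event["key"][k] for k in primary_keys)
        if event["action"] == "delete":
            state.pop(key, None)
        elif event["action"] == "create":
            state[key] = {**event["key"], **event["values"]}
        elif event["action"] == "update" and key in state:
            state[key].update(event["values"])
    return list(state.values())


original = [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]
editlog = [
    {"action": "update", "key": {"id": 1}, "values": {"name": "lion"}},
    {"action": "create", "key": {"id": 3}, "values": {"name": "bird"}},
    {"action": "delete", "key": {"id": 2}, "values": {}},
]
edited = replay_editlog(original, editlog, ["id"])
```

The original rows are left untouched; only the replayed copy reflects the edits, which is the property the pattern relies on.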

__init__(original_ds_name, primary_keys, editable_column_names=None, linked_records=None, editschema_manual=None, project_key=None, editschema=None, authorized_users=None)

Initializes Datasets (original and editlog) and properties used for data editing.

Parameters:

Name Type Description Default
original_ds_name str

The name of the original dataset.

required
primary_keys list

A list of column names that uniquely identify a row in the dataset.

required
editable_column_names list

A list of column names that can be edited. If None, all columns are editable.

None
project_key str

The key of the project where the dataset is located. If None, the current project is used.

None
authorized_users list

A list of user identifiers who are authorized to make edits. If None, all users are authorized.

None
linked_records list

(Optional) A list of LinkedRecord objects that represent linked datasets or dataframes.

None
editschema_manual list

(Optional) A list of EditSchema objects that define the primary keys and editable columns.

None
editschema list

(Optional) A list of EditSchema objects that define the primary keys and editable columns.

None
Notes
  • If they don't already exist, the editlog, edits and edited Datasets are created on the same Dataiku Connection as the original Dataset. The Recipes in between (replay and apply edits) are also created.
  • Edits made via CRUD methods add rows to the editlog immediately, but the edits and edited Datasets are not kept in sync with it: they are only updated when the Recipes are run.
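The second note can be pictured with a toy model: logging an edit only appends to the editlog, and the materialized edited data stays stale until the recipes run. The `ToyFlow` class and its method names are hypothetical stand-ins for the Datasets and Recipes; this is a sketch of the behavior described above, not of how Dataiku builds Datasets.

```python
class ToyFlow:
    """Hypothetical stand-in for the editlog -> edited Datasets chain."""

    def __init__(self, original):
        self.original = dict(original)  # key -> value, never modified
        self.editlog = []               # appended to on every edit
        self.edited = dict(original)    # materialized snapshot

    def log_update(self, key, value):
        # An edit only appends to the editlog; the snapshot is untouched.
        self.editlog.append((key, value))

    def run_recipes(self):
        # Rebuild the snapshot from the original data plus the whole editlog.
        snapshot = dict(self.original)
        for key, value in self.editlog:
            snapshot[key] = value
        self.edited = snapshot


flow = ToyFlow({"a": 1, "b": 2})
flow.log_update("a", 10)
stale = dict(flow.edited)  # still reflects the original data
flow.run_recipes()
fresh = dict(flow.edited)  # now includes the logged edit
```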

create_row(primary_keys, column_values)

Creates a new row.

Parameters:

Name Type Description Default
primary_keys dict

A dictionary containing values for all primary keys. The set of values must be unique. Example: {"id": "My new unique id"}

required
column_values dict

A dictionary containing values for all other columns. Example: {"col1": "hey", "col2": 42, "col3": True}

required

Returns:

Name Type Description
str str

A message indicating that the row was created.

Notes
  • No data validation: this method does not check that the values are allowed for the specified columns.
  • Attribution of the 'create' action in the editlog: the user identifier is only logged when this method is called in the context of a webapp served by Dataiku (which allows retrieving the identifier from the HTTP request headers sent by the user's web browser).
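The attribution note can be sketched as follows: an editlog entry carries a user identifier only when request headers are available. Everything here is hypothetical: the function `make_editlog_entry`, the entry fields, the header name `X-User-Id` and the `"unknown"` placeholder are assumptions for illustration, since this page does not describe how the plugin resolves the identifier or what it logs when none is available.

```python
from datetime import datetime, timezone


def make_editlog_entry(action, primary_keys, column_values, request_headers=None):
    """Build one editlog entry as a plain dict (hypothetical layout)."""
    # Assumption: the user identifier travels in a header named "X-User-Id".
    # Outside a webapp there are no headers, so a placeholder is used instead.
    headers = request_headers or {}
    user = headers.get("X-User-Id", "unknown")
    return {
        "date": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "key": dict(primary_keys),
        "values": dict(column_values),
    }


# Called from a webapp request: headers are available.
in_webapp = make_editlog_entry(
    "create", {"id": "My new unique id"}, {"col1": "hey", "col2": 42},
    request_headers={"X-User-Id": "alice"},
)
# Called from a notebook or script: no headers, so no attribution.
outside = make_editlog_entry("create", {"id": "Another id"}, {"col1": "hi"})
```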

delete_row(primary_keys)

Deletes a row identified by the given primary key(s).

Parameters:

Name Type Description Default
primary_keys dict

A dictionary containing the primary key value(s) that identify the row to delete.

required

Returns:

Name Type Description
str EditSuccess | EditFailure | EditUnauthorized

An object indicating whether the row was deleted, the deletion failed, or the user was not authorized.

Notes

  • Attribution of the 'delete' action in the editlog: the user identifier is only logged when this method is called in the context of a webapp served by Dataiku (which allows retrieving the identifier from the HTTP request headers sent by the user's web browser).

empty_editlog()

Writes an empty dataframe to the editlog dataset.

get_edited_cells_df()

Returns a pandas DataFrame with the edited cells.

Returns:

Type Description
DataFrame

pandas.DataFrame: A DataFrame containing only the edited rows and editable columns.
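As a rough picture of what "only the edited rows and editable columns" means, the sketch below pivots a list of editlog entries into one record per edited row, keeping the latest value of each cell. It uses plain dicts rather than a pandas DataFrame, and the function name `edited_cells` and the entry fields `key` and `values` are assumptions for this example.

```python
def edited_cells(editlog, primary_keys):
    """Collapse editlog entries into one dict per edited row (latest value wins)."""
    rows = {}
    for event in editlog:  # assumed to be in chronological order
        key = tuple(event["key"][k] for k in primary_keys)
        # Start each edited row from its primary key values only.
        row = rows.setdefault(key, dict(event["key"]))
        row.update(event["values"])
    return list(rows.values())


editlog = [
    {"key": {"id": 1}, "values": {"name": "lion"}},
    {"key": {"id": 2}, "values": {"age": 7}},
    {"key": {"id": 1}, "values": {"name": "tiger"}},
]
cells = edited_cells(editlog, ["id"])
```

Rows that were never edited do not appear, and a cell that was never edited is simply absent from its row; in a DataFrame it would show up as a missing value.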

get_edited_cells_df_indexed()

Returns a pandas DataFrame with the edited cells, indexed by the primary keys.

Returns:

Type Description
DataFrame

pandas.DataFrame: A DataFrame containing only the edited rows and editable columns, indexed by the primary keys.

get_edited_df()

Returns the edited dataframe.

Returns:

Type Description
DataFrame

pandas.DataFrame: A DataFrame with all rows and columns from the original data, and edits applied.

get_edited_df_indexed()

Returns the edited dataframe, indexed by the primary keys.

Returns:

Type Description
DataFrame

pandas.DataFrame: A DataFrame with all rows and columns from the original data, edits applied, and primary keys as index.

get_editlog_df()

Returns the contents of the editlog.

Returns:

Type Description
DataFrame

pandas.DataFrame: A DataFrame containing the editlog.

get_original_df()

Returns the original dataframe without any edits.

Returns:

Type Description
DataFrame

pandas.DataFrame: The original data.

get_row(primary_keys)

Retrieves a single edited row (one that was created, updated or deleted) from the dataset.

Parameters:

Name Type Description Default
primary_keys dict

A dictionary containing values for all primary keys defined in the initial Visual Edit setup. The set of values must be unique. Example: { "key1": "cat", "key2": "2022-12-21", }

required

Returns:

Type Description
DataFrame

pandas.DataFrame: A single-row dataframe containing the values of editable columns, indexed by the primary keys. Example:

key1        key2        editable_column1    editable_column2
"cat"       2022-12-21  "hello"             42

Notes
  • The current implementation loads all edited rows in memory, then filters the rows that match the provided primary key values.
  • This method does not read rows that were not edited, and it does not read columns which are not editable.
    • If some rows of the dataset were created, then by definition all columns are editable (including primary keys).
    • If no row was created, editable columns are those defined in the initial Visual Edit setup.
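The load-then-filter behavior described in the first note can be sketched with plain Python records. The function name `get_row_sketch` and the sample rows (including the second, non-matching row) are made up for this illustration; the real method works on a pandas DataFrame of edited rows.

```python
def get_row_sketch(edited_rows, primary_keys):
    """Return the edited rows whose primary key values all match `primary_keys`."""
    # Every edited row is already in memory; the filter scans all of them.
    return [
        row for row in edited_rows
        if all(row.get(name) == value for name, value in primary_keys.items())
    ]


edited_rows = [
    {"key1": "cat", "key2": "2022-12-21",
     "editable_column1": "hello", "editable_column2": 42},
    {"key1": "dog", "key2": "2022-12-22",
     "editable_column1": "bye", "editable_column2": 7},
]
match = get_row_sketch(edited_rows, {"key1": "cat", "key2": "2022-12-21"})
missing = get_row_sketch(edited_rows, {"key1": "fox", "key2": "2022-12-21"})
```

Because the scan covers every edited row, the cost grows with the number of edited rows, and a row that was never edited yields an empty result.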

update_row(primary_keys, column, value)

Updates a row.

Parameters:

Name Type Description Default
primary_keys dict

A dictionary containing the primary key value(s) that identify the row to update.

required
column str

The name of the column to update.

required
value str

The value to set for the cell identified by primary_keys and column.

required

Returns:

Name Type Description
list List[EditSuccess | EditFailure | EditUnauthorized]

A list of objects indicating whether inserting the edit into the editlog succeeded, failed, or was not authorized.

Notes
  • No data validation: this method does not check that the value is allowed for the specified column.
  • Attribution of the 'update' action in the editlog: the user identifier is only logged when this method is called in the context of a webapp served by Dataiku (which allows retrieving the identifier from the HTTP request headers sent by the user's web browser).
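The return type can be read as "one result object per attempted editlog insertion". The sketch below shows one way such a list could be produced. The three dataclasses are stand-ins that only share their names with the plugin's result types (their `message` field is an assumption), and the rule that a non-editable column yields an `EditFailure` is also an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class EditSuccess:
    message: str


@dataclass
class EditFailure:
    message: str


@dataclass
class EditUnauthorized:
    message: str


def update_row_sketch(editlog, primary_keys, column, value, user,
                      editable_column_names=None, authorized_users=None):
    """Append an 'update' entry to `editlog` and return a list of result objects."""
    # None means "all users are authorized" / "all columns are editable".
    if authorized_users is not None and user not in authorized_users:
        return [EditUnauthorized(f"{user} is not authorized to edit")]
    if editable_column_names is not None and column not in editable_column_names:
        return [EditFailure(f"{column} is not editable")]
    # The value itself is not validated, as stated in the note above.
    editlog.append({"action": "update", "user": user,
                    "key": dict(primary_keys), "values": {column: value}})
    return [EditSuccess(f"Updated {column}")]


log = []
ok = update_row_sketch(log, {"id": 1}, "name", "lion", user="alice",
                       editable_column_names=["name"], authorized_users=["alice"])
denied = update_row_sketch(log, {"id": 1}, "name", "puma", user="bob",
                           authorized_users=["alice"])
```

Callers would typically inspect the type of each returned object rather than parse its message.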