Home
This is the technical documentation for the backend of the Visual Edit plugin. Return to the main documentation.
DataEditor
This class provides CRUD methods to edit data from a Dataiku Dataset using the Event Sourcing pattern: edits are stored in a separate Dataset called the editlog. The original Dataset is never changed. Both Datasets are used to compute the edited state of the data.
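The Event Sourcing pattern described above can be illustrated with a minimal, self-contained sketch. It is not the plugin's implementation: the in-memory dictionaries, the event field names (`action`, `key`, `column`, `value`, `values`) and the `replay` helper are all assumptions made for illustration, standing in for the original Dataset, the editlog and the replay logic.

```python
from copy import deepcopy

# Hypothetical stand-in for the original Dataset: rows keyed by primary key tuple.
original_rows = {
    ("a",): {"id": "a", "name": "Alice", "score": 1},
    ("b",): {"id": "b", "name": "Bob", "score": 2},
}

# Hypothetical stand-in for the editlog: one entry per edit, in chronological order.
# The field names are illustrative only, not the plugin's editlog schema.
editlog = [
    {"action": "update", "key": ("a",), "column": "score", "value": 10},
    {"action": "create", "key": ("c",), "values": {"id": "c", "name": "Carol", "score": 3}},
    {"action": "delete", "key": ("b",)},
    {"action": "update", "key": ("a",), "column": "score", "value": 11},
]


def replay(original, events):
    """Return the edited state by applying events, in order, to a copy of the original."""
    edited = deepcopy(original)  # the original data is never modified
    for event in events:
        key = event["key"]
        if event["action"] == "create":
            edited[key] = dict(event["values"])
        elif event["action"] == "update" and key in edited:
            edited[key][event["column"]] = event["value"]
        elif event["action"] == "delete":
            edited.pop(key, None)
    return edited


edited_rows = replay(original_rows, editlog)
```

In this sketch the latest update wins, the deleted row disappears from the edited state, and `original_rows` is left untouched, which mirrors the idea that the edited state is always computed from the original data plus the log of edits.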
__init__(original_ds_name, primary_keys, editable_column_names=None, linked_records=None, editschema_manual=None, project_key=None, editschema=None, authorized_users=None)
Initializes Datasets (original and editlog) and properties used for data editing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
original_ds_name | str | The name of the original dataset. | required |
primary_keys | list | A list of column names that uniquely identify a row in the dataset. | required |
editable_column_names | list | A list of column names that can be edited. If None, all columns are editable. | None |
project_key | str | The key of the project where the dataset is located. If None, the current project is used. | None |
authorized_users | list | A list of user identifiers who are authorized to make edits. If None, all users are authorized. | None |
linked_records | list | (Optional) A list of LinkedRecord objects that represent linked datasets or dataframes. | None |
editschema_manual | list | (Optional) A list of EditSchema objects that define the primary keys and editable columns. | None |
editschema | list | (Optional) A list of EditSchema objects that define the primary keys and editable columns. | None |
Notes
- If they don't already exist, the editlog, edits and edited Datasets are created on the same Dataiku Connection as the original Dataset. The Recipes in between (replay and apply edits) are also created.
- Edits made via CRUD methods will instantly add rows to the editlog, but the edits and the edited Datasets won't be kept in "sync": they are only updated when the Recipes are run.
create_row(primary_keys, column_values)
Creates a new row.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary_keys | dict | A dictionary containing values for all primary keys. The set of values must be unique. Example: {"id": "My new unique id"} | required |
column_values | dict | A dictionary containing values for all other columns. Example: {"col1": "hey", "col2": 42, "col3": True} | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | A message indicating that the row was created. |
Notes
- No data validation: this method does not check that the values are allowed for the specified columns.
- Attribution of the 'create' action in the editlog: the user identifier is only logged when this method is called in the context of a webapp served by Dataiku (which allows retrieving the identifier from the HTTP request headers sent by the user's web browser).
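Because no validation is performed, a caller may want to check a new row before creating it. The following is a minimal sketch of such a check, not part of the plugin: the `check_new_row` helper, its parameters and the `allowed_values` mapping are hypothetical names introduced here for illustration.

```python
def check_new_row(primary_key_names, existing_keys, primary_keys, column_values, allowed_values=None):
    """Return a list of problems; an empty list means the row looks safe to create.

    Hypothetical caller-side helper, not part of the plugin.
    """
    problems = []
    # Every primary key column must have a value.
    missing = [name for name in primary_key_names if name not in primary_keys]
    if missing:
        problems.append(f"missing primary key(s): {missing}")
    else:
        # The combination of primary key values must not already exist.
        key = tuple(primary_keys[name] for name in primary_key_names)
        if key in existing_keys:
            problems.append(f"primary key already exists: {key}")
    # Optional per-column whitelist of allowed values.
    for column, allowed in (allowed_values or {}).items():
        if column in column_values and column_values[column] not in allowed:
            problems.append(f"value not allowed for {column!r}: {column_values[column]!r}")
    return problems


existing = {("row-1",), ("row-2",)}
ok = check_new_row(["id"], existing, {"id": "My new unique id"},
                   {"col1": "hey", "col2": 42, "col3": True})
duplicate = check_new_row(["id"], existing, {"id": "row-1"}, {"col1": "hey"})
bad_value = check_new_row(["id"], existing, {"id": "row-3"}, {"col1": "nope"},
                          allowed_values={"col1": {"hey", "ho"}})
```

A caller would only pass `primary_keys` and `column_values` on to `create_row` when the returned list is empty.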
delete_row(primary_keys)
Deletes a row identified by the given primary key(s).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary_keys | dict | A dictionary containing the primary key(s) value(s) that identify the row to delete. | required |
Returns:
Name | Type | Description |
---|---|---|
str | EditSuccess \| EditFailure \| EditUnauthorized | An object indicating whether the row was deleted, the deletion failed, or the user was not authorized. |
Notes
- Attribution of the 'delete' action in the editlog: the user identifier is only logged when this method is called in the context of a webapp served by Dataiku (which allows retrieving the identifier from the HTTP request headers sent by the user's web browser).
empty_editlog()
Writes an empty dataframe to the editlog dataset.
get_edited_cells_df()
Returns a pandas DataFrame with the edited cells.
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: A DataFrame containing only the edited rows and editable columns. |
get_edited_cells_df_indexed()
Returns a pandas DataFrame with the edited cells, indexed by the primary keys.
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: A DataFrame containing only the edited rows and editable columns, indexed by the primary keys. |
get_edited_df()
Returns the edited dataframe.
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: A DataFrame with all rows and columns from the original data, and edits applied. |
get_edited_df_indexed()
Returns the edited dataframe, indexed by the primary keys.
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: A DataFrame with all rows and columns from the original data, edits applied, and primary keys as index. |
get_editlog_df()
Returns the contents of the editlog.
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: A DataFrame containing the editlog. |
get_original_df()
Returns the original dataframe without any edits.
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: The original data. |
get_row(primary_keys)
Retrieves a single row that was created, updated or deleted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary_keys | dict | A dictionary containing values for all primary keys defined in the initial Visual Edit setup. The set of values must be unique. Example: {"key1": "cat", "key2": "2022-12-21"} | required |
Returns:
Type | Description |
---|---|
DataFrame | pandas.DataFrame: A single-row dataframe containing the values of editable columns, indexed by the primary keys. |
Notes
- The current implementation loads all edited rows in memory, then filters the rows that match the provided primary key values.
- This method does not read rows that were not edited, and it does not read columns which are not editable.
- If some rows of the dataset were created, then by definition all columns are editable (including primary keys).
- If no row was created, editable columns are those defined in the initial Visual Edit setup.
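The load-then-filter behavior described in the notes above can be sketched as follows. This is an illustration under stated assumptions rather than the plugin's code: a plain list of dictionaries stands in for the DataFrame of edited cells, and `find_edited_row` and the `comment` column are hypothetical names.

```python
# Hypothetical stand-in for the edited cells: one dict per edited row,
# holding the primary keys plus the editable columns only.
edited_cells = [
    {"key1": "cat", "key2": "2022-12-21", "comment": "checked"},
    {"key1": "dog", "key2": "2022-12-22", "comment": "to review"},
]


def find_edited_row(rows, primary_keys):
    """Scan every edited row and keep those matching all primary key values."""
    return [
        row for row in rows
        if all(row.get(name) == value for name, value in primary_keys.items())
    ]


match = find_edited_row(edited_cells, {"key1": "cat", "key2": "2022-12-21"})
no_match = find_edited_row(edited_cells, {"key1": "bird", "key2": "2022-12-21"})
```

The lookup cost in this sketch grows with the number of edited rows, and a row that was never edited yields an empty result, which matches the limitations listed in the notes.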
update_row(primary_keys, column, value)
Updates a row.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary_keys | dict | A dictionary containing primary key(s) value(s) that identify the row to update. | required |
column | str | The name of the column to update. | required |
value | str | The value to set for the cell identified by primary_keys and column. | required |
Returns:
Name | Type | Description |
---|---|---|
list | List[EditSuccess \| EditFailure \| EditUnauthorized] | A list of objects, each indicating whether inserting a row in the editlog succeeded, failed, or was not authorized. |
Notes
- No data validation: this method does not check that the value is allowed for the specified column.
- Attribution of the 'update' action in the editlog: the user identifier is only logged when this method is called in the context of a webapp served by Dataiku (which allows retrieving the identifier from the HTTP request headers sent by the user's web browser).
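Since the method returns a list of result objects rather than a single message, a caller will typically inspect each element. The sketch below shows one way to do that by type. The three classes are empty, hypothetical stand-ins that only reuse the documented type names; the plugin's real classes may carry attributes that are not shown here, and `summarize` is an illustrative helper.

```python
# Hypothetical stand-ins named after the documented return types.
# The plugin's real classes may define attributes that are not modeled here.
class EditSuccess:
    pass


class EditFailure:
    pass


class EditUnauthorized:
    pass


def summarize(results):
    """Count results per outcome so the caller can decide how to react."""
    summary = {"success": 0, "failure": 0, "unauthorized": 0}
    for result in results:
        if isinstance(result, EditSuccess):
            summary["success"] += 1
        elif isinstance(result, EditUnauthorized):
            summary["unauthorized"] += 1
        elif isinstance(result, EditFailure):
            summary["failure"] += 1
    return summary


counts = summarize([EditSuccess(), EditFailure(), EditSuccess()])
all_ok = counts["failure"] == 0 and counts["unauthorized"] == 0
```

With the real classes imported from the plugin, the same `isinstance` checks could be applied directly to the list returned by `update_row`.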