Package 'daff' reference manual

Title:	Diff, Patch and Merge for Data.frames
Description:	Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff(). The 'V8' package is used to wrap the 'daff.js' 'JavaScript' library which is included in the package.
Authors:	Paul Fitzpatrick [aut] (JavaScript original, http://paulfitz.github.io/daff/), Edwin de Jonge [aut, cre] (R wrapper, <https://orcid.org/0000-0002-6580-4718>), Gregory R. Warnes [aut]
Maintainer:	Edwin de Jonge <[email protected]>
License:	MIT + file LICENSE
Version:	1.1.1
Built:	2025-03-29 04:00:49 UTC
Source:	https://github.com/edwindj/daff

Data diff, patch and merge for R

Description

Daff calculates differences between two data.frames. This difference can be stored and later used to patch the original data. Differences can also be made visual by using render_diff showing what changed.

Details

Storing the difference between data sets allows for tracking or incorporating manual changes to data sets. Ideally changes to data should be scripted to be reproducable, but there are situations or scenario's where this is not possible or happens out of your control. daff can help track these changes.

actions

`diff_data`	Find differences in values between `data.frame`s
`patch_data`	Apply a patch generated with `diff_data` to a `data.frame`
`merge_data`	Merge two diverged `data.frame`s orginating from a same parent

daff.js

Daff wraps the daff.js library which offers more functionality.

Do a data diff

Description

Find differences with a reference data set. The diff can be used to patch_data, to store the difference for documentation purposes using write_diff or to visualize the difference using render_diff

Usage

diff_data(
  data_ref,
  data,
  always_show_header = TRUE,
  always_show_order = FALSE,
  columns_to_ignore = c(),
  count_like_a_spreadsheet = TRUE,
  ids = c(),
  ignore_whitespace = FALSE,
  never_show_order = FALSE,
  ordered = TRUE,
  padding_strategy = c("auto", "smart", "dense", "sparse"),
  show_meta = TRUE,
  show_unchanged = FALSE,
  show_unchanged_columns = FALSE,
  show_unchanged_meta = FALSE,
  unchanged_column_context = 1L,
  unchanged_context = 1L
)
diff_data(
  data_ref,
  data,
  always_show_header = TRUE,
  always_show_order = FALSE,
  columns_to_ignore = c(),
  count_like_a_spreadsheet = TRUE,
  ids = c(),
  ignore_whitespace = FALSE,
  never_show_order = FALSE,
  ordered = TRUE,
  padding_strategy = c("auto", "smart", "dense", "sparse"),
  show_meta = TRUE,
  show_unchanged = FALSE,
  show_unchanged_columns = FALSE,
  show_unchanged_meta = FALSE,
  unchanged_column_context = 1L,
  unchanged_context = 1L
)

Arguments

`data_ref`	`data.frame` reference data frame
`data`	`data.frame` to check for changes
`always_show_header`	`logical` Should we always give a table header in diffs? This defaults to TRUE, and - frankly - you should leave it at TRUE for now.
`always_show_order`	`logical` Diffs for tables where row/column order has been permuted may include an extra row/column specifying the changes in row/column numbers. If you'd like that extra row/column to always be included, turn on this flag, and turn off never_show_order.
`columns_to_ignore`	`character` List of columns to ignore in all calculations. Changes related to these columns are ignored.
`count_like_a_spreadsheet`	`logical` Should column numbers, if present, be rendered spreadsheet-style as A,B,C,...,AA,BB,CC? Defaults to TRUE.
`ids`	`character` List of columns that make up a primary key, if known. Otherwise heuristics are used to find a decent key (or a set of decent keys).
`ignore_whitespace`	`logical` Should whitespace be omitted from comparisons. Defaults to FALSE.
`never_show_order`	`logical` Diffs for tables where row/column order has been permuted may include an extra row/column specifying the changes in row/column numbers. If you'd like to be sure that that row/column is *never included, turn on this flag, and turn off always_show_order.
`ordered`	`logical` Is the order of rows and columns meaningful? Defaults to 'TRUE'.
`padding_strategy`	`logical` Strategy to use when padding columns. Valid values are "auto", "smart", "dense", and "sparse". Leave null for a sensible default.
`show_meta`	`logical` Show changes in column properties, not just data, if available. Defaults to TRUE.
`show_unchanged`	`logical` Should we show all rows in diffs? We default to showing just rows that have changes (and some context rows around them, if row order is meaningful), but you can override this here.
`show_unchanged_columns`	`logical` Should we show all columns in diffs? We default to showing just columns that have changes (and some context columns around them, if column order is meaningful), but you can override this here. Irrespective of this flag, you can rely on index/key columns needed to identify rows to be included in the diff.
`show_unchanged_meta`	`logical` Show all column properties, if available, even if unchanged. Defaults to FALSE.
`unchanged_column_context`	`integer` When showing context columns around a changed column, what is the minimum number of such columns we should show?
`unchanged_context`	`integer` When showing context rows around a changed row, what is the minimum number of such rows we should show?

Value

difference object

Examples

library(daff)
x <- iris
x[1,1] <- 10
diff_data(x, iris)

dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)
library(daff)
x <- iris
x[1,1] <- 10
diff_data(x, iris)

dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)

differs from,

Description

This is the same function as diff_data but with arguments reversed. This is more useful when using dplyr and magrittr

Usage

differs_from(data, data_ref, ...)
differs_from(data, data_ref, ...)

Arguments

`data`	`data.frame` to check for changes
`data_ref`	`data.frame` reference data frame
`...`	not further specified

Value

difference object

Merge two tables based on a parent version

Description

merge_data provides a three-way merge: suppose two versions are based on a common version, this function will merge tables a and b.

Usage

merge_data(parent, a, b)
merge_data(parent, a, b)

Arguments

`parent`	`data.frame`
`a`	`data.frame` changed version of `parent`
`b`	`data.frame` other changed version of `parent`

Details

If both a and b change the same table cell with a different value, this results in a conflict. In that case a warning will be generated with the number of conflicts. In the returned data.frame of a conflicting merge columns with conflicting values are of type character and contain all three values coded as

(parent) a /// b

Value

merged data.frame. When a merge has conflicts the columns of conflicting changes are of type character and contain all three values.

Examples

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# succesful merge
merge_data(parent, a, b)

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict

#find out which rows contain a conflict
which_conflicts(merged)
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# succesful merge
merge_data(parent, a, b)

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict

#find out which rows contain a conflict
which_conflicts(merged)

patch data

Description

Patch data with a diff generated by diff_data

Usage

patch_data(data, patch)
patch_data(data, patch)

Arguments

`data`	`data.frame` that should be patched
`patch`	generated with diff_data

Value

data.frame that has been patched.

Examples

library(daff)
x <- iris
#change a value
x[1,1] <- 1000

patch <- diff_data(iris, x)
print(patch)
# apply patch
iris_patched <- patch_data(iris, patch)

iris_patched[1,1] == 1000
library(daff)
x <- iris
#change a value
x[1,1] <- 1000

patch <- diff_data(iris, x)
print(patch)
# apply patch
iris_patched <- patch_data(iris, patch)

iris_patched[1,1] == 1000

Render a data_diff to html

Description

Converts a diff_data object to HTML code, and opens the resulting HTML code in a browser window if view==TRUE and R is running interactively.

Usage

render_diff(
  diff,
  file = tempfile(fileext = ".html"),
  view = interactive(),
  fragment = FALSE,
  pretty = TRUE,
  title,
  summary = !fragment,
  use.DataTables = !fragment
)
render_diff(
  diff,
  file = tempfile(fileext = ".html"),
  view = interactive(),
  fragment = FALSE,
  pretty = TRUE,
  title,
  summary = !fragment,
  use.DataTables = !fragment
)

Arguments

`diff`	`diff_data object` generated with `diff_data`
`file`	`character` target file (optional)
`view`	`logical` Open the generated HTML in a browser if R is being used interactively
`fragment`	`logical` If `TRUE` generate (just) an HTML table, otherwise generate a valid HTML document.
`pretty`	`logical` Use HTML arrow characters instead of '–>'.
`title`	`character` title text. Defaults to the quoted names of the data objects compared, separated by 'vs.'
`summary`	`logical` Should a summary of changes be shown above the HTML table?
`use.DataTables`	`logical` Include jQuery DataTables plugin and enable: - pagination (10,25,50,100,All) - searching - filtering - column visibility (individually enable/disable) - copy/csv/excel/pdf export buttons - column reorder (drag and drop) - row reorder (drag and drop) - row/multirow select

Value

generated html

Examples

y <- iris[1:3,]
x <- y

x <- head(x,2) # remove a row
x[1,1] <- 10 # change a value
x$hello <- "world"  # add a column
x$Species <- NULL # remove a column

patch <- diff_data(y, x)
render_diff(patch, title="compare x and y", pretty = TRUE)

#apply patch
y_patched <- patch_data(y, patch)
y <- iris[1:3,]
x <- y

x <- head(x,2) # remove a row
x[1,1] <- 10 # change a value
x$hello <- "world"  # add a column
x$Species <- NULL # remove a column

patch <- diff_data(y, x)
render_diff(patch, title="compare x and y", pretty = TRUE)

#apply patch
y_patched <- patch_data(y, patch)

return which rows of a merged `data.frame` contain conflicts

Description

return which rows of a merged data.frame contain conflicts.

Usage

which_conflicts(merged)
which_conflicts(merged)

Arguments

merged

data.frame merged data.frame with possible conflicts.

Value

integer vector with row positions containing conflicts.

Examples

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# succesful merge
merge_data(parent, a, b)

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict

#find out which rows contain a conflict
which_conflicts(merged)
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# succesful merge
merge_data(parent, a, b)

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict

#find out which rows contain a conflict
which_conflicts(merged)

Write or read a diff to or from a file

Description

The diff information is stored in the Coopy highlighter diff format: https://paulfitz.github.io/daff-doc/spec.html

Usage

write_diff(diff, file = "diff.csv")

read_diff(file)
write_diff(diff, file = "diff.csv")

read_diff(file)

Arguments

`diff`	generated with diff_data
`file`	filename or connection

Details

Note that type information of the target data.frame is lost when writing a patch to disk. Using a stored diff to patch a data.frame will use the column types of the source data.frame to determine the target column types. New introduced columns may become characters.

Names of the reference and comparison dataset are also lost when writing a data_diff object to disk.

Value

diff object that can be used in patch_data

Package 'daff'

Help Index

Data diff, patch and merge for R

Description

Details

actions

daff.js

Do a data diff

Description

Usage

Arguments

Value

See Also

Examples

differs from,

Description

Usage

Arguments

Value

See Also

Merge two tables based on a parent version

Description

Usage

Arguments

Details

Value

See Also

Examples

patch data

Description

Usage

Arguments

Value

Examples

Render a data_diff to html

Description

Usage

Arguments

Value

See Also

Examples

return which rows of a merged data.frame contain conflicts

Description

Usage

Arguments

Value

See Also

Examples

Write or read a diff to or from a file

Description

Usage

Arguments

Details

Value

return which rows of a merged `data.frame` contain conflicts