Title: | Diff, Patch and Merge for Data.frames |
---|---|
Description: | Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff(). The 'V8' package is used to wrap the 'daff.js' 'JavaScript' library which is included in the package. |
Authors: | Paul Fitzpatrick [aut] (JavaScript original, http://paulfitz.github.io/daff/), Edwin de Jonge [aut, cre] (R wrapper, <https://orcid.org/0000-0002-6580-4718>), Gregory R. Warnes [aut] |
Maintainer: | Edwin de Jonge <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.1 |
Built: | 2024-11-29 05:24:13 UTC |
Source: | https://github.com/edwindj/daff |
Daff calculates differences between two data.frames. This difference can be stored and later used to patch the original data. Differences can also be made visual by using render_diff, showing what changed.
Storing the difference between data sets allows for tracking or incorporating manual changes to data sets.
Ideally changes to data should be scripted to be reproducible, but there are situations or scenarios where this is not possible or where changes happen outside your control. daff can help track these changes.
diff_data | Find differences in values between data.frames |
patch_data | Apply a patch generated with diff_data to a data.frame |
merge_data | Merge two diverged data.frames originating from the same parent |
Daff wraps the daff.js library which offers more functionality.
Find differences with a reference data set. The diff can be applied with patch_data, stored for documentation purposes using write_diff, or visualized using render_diff.
diff_data(
  data_ref,
  data,
  always_show_header = TRUE,
  always_show_order = FALSE,
  columns_to_ignore = c(),
  count_like_a_spreadsheet = TRUE,
  ids = c(),
  ignore_whitespace = FALSE,
  never_show_order = FALSE,
  ordered = TRUE,
  padding_strategy = c("auto", "smart", "dense", "sparse"),
  show_meta = TRUE,
  show_unchanged = FALSE,
  show_unchanged_columns = FALSE,
  show_unchanged_meta = FALSE,
  unchanged_column_context = 1L,
  unchanged_context = 1L
)
data_ref |
|
data |
|
always_show_header |
|
always_show_order |
|
columns_to_ignore |
|
count_like_a_spreadsheet |
|
ids |
|
ignore_whitespace |
|
never_show_order |
|
ordered |
|
padding_strategy |
|
show_meta |
|
show_unchanged |
|
show_unchanged_columns |
|
show_unchanged_meta |
|
unchanged_column_context |
|
unchanged_context |
|
difference object
differs_from
library(daff)
x <- iris
x[1,1] <- 10
diff_data(x, iris)
dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)
This is the same function as diff_data but with arguments reversed. This is more useful when using dplyr and magrittr.
differs_from(data, data_ref, ...)
data |
|
data_ref |
|
... |
not further specified |
difference object
diff_data
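The reversed argument order can be illustrated with a small pipeline. This is only a sketch: the names data_ref and data are taken from the usage above, the changed cell is arbitrary, and the native pipe |> assumes R 4.1 or later (magrittr's %>% would be used the same way). Both diffs are written to temporary files so their contents can be compared.

```r
library(daff)

# Reference data and a modified copy (the changed cell is arbitrary).
data_ref <- iris[1:5, ]
data <- data_ref
data[2, "Sepal.Length"] <- 99

# Reference first, changed data second.
d1 <- diff_data(data_ref, data)

# Changed data first, so it can be the left-hand side of a pipe.
d2 <- data |> differs_from(data_ref)

# Write both diffs to temporary files and compare their contents.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write_diff(d1, file = f1)
write_diff(d2, file = f2)
same_diff <- identical(
  suppressWarnings(readLines(f1)),
  suppressWarnings(readLines(f2))
)
same_diff
```

If differs_from only swaps the two arguments, as described above, the two files should have the same contents.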
merge_data provides a three-way merge: given two tables a and b that are both based on a common parent version, this function merges the changes made in a and b.
merge_data(parent, a, b)
parent |
|
a |
|
b |
|
If both a and b change the same table cell with a different value, this results in a conflict. In that case a warning will be generated with the number of conflicts. In the returned data.frame of a conflicting merge, columns with conflicting values are of type character and contain all three values coded as (parent) a /// b
merged data.frame. When a merge has conflicts, the columns of conflicting changes are of type character and contain all three values.
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# successful merge
merge_data(parent, a, b)

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict

#find out which rows contain a conflict
which_conflicts(merged)
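In a script it may be useful to capture the conflict warning rather than let it print, and to check the column type afterwards. The following is a minimal sketch of that idea using base R's withCallingHandlers. The variable conflict_warnings is a hypothetical name introduced here, and the exact wording of the warning message is not assumed.

```r
library(daff)

parent <- a <- b <- iris[1:3, ]
a[1, 1] <- 10
b[1, 1] <- 11  # same cell as in a, different value: a conflict

# Collect warning messages instead of printing them (hypothetical helper variable).
conflict_warnings <- character()
merged <- withCallingHandlers(
  merge_data(parent, a, b),
  warning = function(w) {
    conflict_warnings <<- c(conflict_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Per the description above, a conflicting column is returned as character.
class(merged$Sepal.Length)

# Row positions that contain a conflict.
which_conflicts(merged)

# The captured warning text(s).
conflict_warnings
```

A caller could stop or flag the data set for manual review when conflict_warnings is non-empty, instead of silently continuing with character columns.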
Patch data with a diff generated by diff_data
patch_data(data, patch)
data |
|
patch |
generated with diff_data |
data.frame that has been patched.
library(daff)
x <- iris
#change a value
x[1,1] <- 1000
patch <- diff_data(iris, x)
print(patch)
# apply patch
iris_patched <- patch_data(iris, patch)
iris_patched[1,1] == 1000
Converts a diff_data object to HTML code, and opens the resulting HTML code in a browser window if view==TRUE and R is running interactively.
render_diff(
  diff,
  file = tempfile(fileext = ".html"),
  view = interactive(),
  fragment = FALSE,
  pretty = TRUE,
  title,
  summary = !fragment,
  use.DataTables = !fragment
)
diff |
|
file |
|
view |
|
fragment |
|
pretty |
|
title |
|
summary |
|
use.DataTables |
|
generated html
data_diff
y <- iris[1:3,]
x <- y
x <- head(x,2) # remove a row
x[1,1] <- 10 # change a value
x$hello <- "world" # add a column
x$Species <- NULL # remove a column
patch <- diff_data(y, x)
render_diff(patch, title="compare x and y", pretty = TRUE)
#apply patch
y_patched <- patch_data(y, patch)
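For non-interactive use, such as a batch script or report, the browser can be suppressed with view = FALSE and the returned HTML kept in a variable. This is a sketch only: the path in out_file is a hypothetical location, and it assumes the return value documented above ("generated html") is a character value.

```r
library(daff)

y <- iris[1:3, ]
x <- y
x[1, 1] <- 10  # change a value

patch <- diff_data(y, x)

# Hypothetical output location; any writable path would do.
out_file <- file.path(tempdir(), "iris_diff.html")

# view = FALSE avoids opening a browser window.
html <- render_diff(
  patch,
  file = out_file,
  view = FALSE,
  title = "compare x and y"
)

# Inspect the size of the generated HTML rather than printing all of it.
sum(nchar(html))
```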
Return which rows of a merged data.frame contain conflicts.
which_conflicts(merged)
merged |
|
integer vector with row positions containing conflicts.
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# successful merge
merge_data(parent, a, b)

parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict

#find out which rows contain a conflict
which_conflicts(merged)
The diff information is stored in the Coopy highlighter diff format: https://paulfitz.github.io/daff-doc/spec.html
write_diff(diff, file = "diff.csv")
read_diff(file)
diff |
generated with diff_data |
file |
filename or connection |
Note that type information of the target data.frame is lost when writing a patch to disk.
Using a stored diff to patch a data.frame will use the column types of the source data.frame to determine the target column types. Newly introduced columns may become character columns.
Names of the reference and comparison dataset are also lost when writing a data_diff object to disk.
diff object that can be used in patch_data
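A round trip through disk might look like the sketch below: a diff is written with write_diff, read back with read_diff, and applied with patch_data. The temporary file name is an assumption made for illustration; the changed cell mirrors the patch_data example.

```r
library(daff)

x <- iris
x[1, 1] <- 1000  # change a value

patch <- diff_data(iris, x)

# Store the diff in a temporary file (hypothetical location).
diff_file <- tempfile(fileext = ".csv")
write_diff(patch, file = diff_file)

# Later, possibly in another session: read the diff and apply it.
patch_from_disk <- read_diff(diff_file)
iris_patched <- patch_data(iris, patch_from_disk)

# Check that the change was applied.
iris_patched[1, 1] == 1000

# Column types are derived from the source data.frame (see the note above).
sapply(iris_patched, class)
```

Inspecting the column classes after patching is a cheap way to notice the type loss described above, in particular for columns that the diff introduces.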