Title: | Statistical Disclosure Control for Spatial Data |
---|---|
Description: | Privacy protected raster maps can be created from spatial point data. Protection methods include smoothing of dichotomous variables by de Jonge and de Wolf (2016) <doi:10.1007/978-3-319-45381-1_9>, continuous variables by de Wolf and de Jonge (2018) <doi:10.1007/978-3-319-99771-1_23>, suppressing revealing values and a generalization of the quad tree method by Suñé, Rovira, Ibáñez and Farré (2017) <doi:10.2901/EUROSTAT.C2017.001>. |
Authors: | Edwin de Jonge [aut, cre], Peter-Paul de Wolf [aut], Douwe Hut [ctb], Sapphire Han [ctb] |
Maintainer: | Edwin de Jonge <[email protected]> |
License: | GPL-2 |
Version: | 0.6.0.9000 |
Built: | 2024-10-27 04:34:52 UTC |
Source: | https://github.com/edwindj/sdcSpatial |
sdcSpatial contains functions to create spatial distribution maps, assess the risk of disclosure at a location, and suppress or adjust revealing values at certain locations.
The workhorse of sdcSpatial is the sdc_raster() object, to which the following methods can be applied:
sum: extract the sum layer from a sdc_raster object
mean: extract the mean layer from a sdc_raster object
Maintainer: Edwin de Jonge [email protected]
Authors:
Peter-Paul de Wolf
Other contributors:
Douwe Hut [contributor]
Sapphire Han [contributor]
de Jonge, E., & de Wolf, P. P. (2016, September). Spatial smoothing and statistical disclosure control. In International Conference on Privacy in Statistical Databases (pp. 107-117). Springer, Cham.
de Wolf, P. P., & de Jonge, E. (2018, September). Safely Plotting Continuous Variables on a Map. In International Conference on Privacy in Statistical Databases (pp. 347-359). Springer, Cham.
Suñé, E., Rovira, C., Ibáñez, D., & Farré, M. (2017). Statistical disclosure control on visualising geocoded population data using a structure in quadtrees. NTTS 2017.
The disclosure risk function is used by is_sensitive() to determine the risk of a raster cell. It returns a score between 0 and 1 for cells that have a finite value (otherwise NA).
disclosure_risk(x, risk_type = x$risk_type)
x |
sdc_raster object |
risk_type |
one of "external", "internal" or "discrete" (see below) |
Different risk functions include:
external (numeric variable): calculates the share of the largest value in the total sum within a cell
internal (numeric variable): calculates the share of the largest value in the sum without the second largest value
discrete (logical variable): calculates the fraction of TRUE versus FALSE values
raster::raster object with the disclosure risk.
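The three risk definitions above can be illustrated on the values of a single raster cell. The following is a minimal base R sketch, assuming the values of one cell are available as a plain vector; the helpers risk_external(), risk_internal() and risk_discrete() are hypothetical illustrations, not part of the sdcSpatial API.

```r
# Hypothetical helpers (not sdcSpatial functions) illustrating the three
# risk definitions for the values that fall within a single raster cell.

# external: share of the largest value in the total sum of the cell
risk_external <- function(v) {
  max(v) / sum(v)
}

# internal: share of the largest value in the sum without the second largest
risk_internal <- function(v) {
  s <- sort(v, decreasing = TRUE)
  second <- if (length(s) > 1) s[2] else 0
  s[1] / (sum(s) - second)
}

# discrete: fraction of TRUE values in the cell
risk_discrete <- function(v) {
  mean(v)
}

cell <- c(50, 30, 20)
risk_external(cell)                          # 50 / 100 = 0.5
risk_internal(cell)                          # 50 / (100 - 30), about 0.714
risk_discrete(c(TRUE, FALSE, FALSE, TRUE))   # 2 / 4 = 0.5
```

In both numeric measures a single dominating value pushes the risk toward 1, which is the situation a max_risk threshold is meant to catch.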
Other sensitive: is_sensitive_at(), is_sensitive(), plot_sensitive(), remove_sensitive(), sdc_raster(), sensitivity_score()
The data are generated with residence/household locations from the Dutch open data BAG register. The locations are realistic, but the associated data is simulated.
dwellings
A data.frame with 90603 rows and 4 columns:
x: integer, x coordinate of dwelling (crs 28992)
y: integer, y coordinate of dwelling (crs 28992)
consumption: numeric, simulated continuous value
unemployed: logical, simulated discrete value
Basisregistratie Adressen en Gebouwen https://www.kadaster.nl/zakelijk/registraties/basisregistraties/bag/bag-producten
# dwellings is a data.frame; the best way is to first turn it
# into an sf or sp object.

# create an sf object from our data
if (requireNamespace("sf")) {
  dwellings_sf <- sf::st_as_sf(dwellings, coords = c("x", "y"), crs = 28992)

  unemployed <- sdc_raster(dwellings_sf, "unemployed", r = 200, max_risk = 0.9)
  plot(unemployed)
  sensitivity_score(unemployed)

  unemployed_smoothed <- protect_smooth(unemployed, bw = 0.4e3)
  plot(unemployed_smoothed, main = "Unemployment rate")
  plot(unemployed_smoothed, "sum", main = "Number of unemployed")
} else {
  message("Package 'sf' was not installed.")
}

# or change a data.frame into an sp object
dwellings_sp <- dwellings
sp::coordinates(dwellings_sp) <- ~ x + y
tryCatch(
  # not working on some OS versions.
  sp::proj4string(dwellings_sp) <- "+init=epsg:28992",
  error = function(e) message("Could not set the projection: ", conditionMessage(e))
)

consumption <- sdc_raster(dwellings_sp, dwellings_sp$consumption, r = 500)
consumption
plot(consumption)

# but we can also create a raster directly from a data.frame
unemployed <- sdc_raster(dwellings[c("x", "y")], dwellings$unemployed)
enterprises is generated from the Dutch open data BAG register. The locations are realistic, but the associated data is simulated.
enterprises
An object of class SpatialPointsDataFrame with 8348 rows and 2 columns:
production: numeric, simulated production (lognormal).
fined: logical, simulated variable indicating whether an enterprise is fined or not.
Basisregistratie Adressen en Gebouwen: https://www.kadaster.nl/zakelijk/registraties/basisregistraties/bag/bag-producten
library(sdcSpatial)
library(raster)

data("enterprises")
production <- sdc_raster(enterprises, "production", min_count = 10)
print(production)

# show the average production per cell
plot(production, "mean")

production$min_count <- 2 # adjust norm for sdc
plot(production)

production_safe <- remove_sensitive(production)
plot(production_safe)
Create a binary raster with sensitive locations.
is_sensitive( x, max_risk = x$max_risk, min_count = x$min_count, risk_type = x$risk_type )
x |
sdc_raster object |
max_risk |
a risk value higher than max_risk is considered sensitive |
min_count |
a count lower than min_count is considered sensitive |
risk_type |
what kind of measure should be used (see details). |
By default the risk settings are taken from x, but they can be overridden.
Different risk functions can be used:
external (numeric variable): calculates the share of the largest value in the total sum
internal (numeric variable): calculates the share of the largest value in the sum without the second largest value
discrete (logical variable): calculates the fraction of sensitive values.
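Combining the risk with the count threshold, the sensitivity of a cell can be sketched as follows. This assumes, as the max_risk and min_count arguments suggest, that a cell is sensitive when its risk exceeds max_risk or its count is below min_count; cell_sensitive() is a hypothetical helper that works on plain vectors instead of a sdc_raster.

```r
# Made-up per-cell summaries for five cells
count <- c(12, 3, 25, 40, 8)
risk  <- c(0.40, 0.50, 0.97, 0.20, 0.10)

# Hypothetical helper (assumed rule): a cell is sensitive when its risk
# exceeds max_risk or its count is below min_count. The defaults mirror
# those of sdc_raster() (max_risk = 0.95, min_count = 10).
cell_sensitive <- function(count, risk, max_risk = 0.95, min_count = 10) {
  risk > max_risk | count < min_count
}

cell_sensitive(count, risk)
# FALSE TRUE TRUE FALSE TRUE

# the thresholds can be overridden, as with is_sensitive()
cell_sensitive(count, risk, min_count = 2)
# FALSE FALSE TRUE FALSE FALSE
```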
Other sensitive: disclosure_risk(), is_sensitive_at(), plot_sensitive(), remove_sensitive(), sdc_raster(), sensitivity_score()
dwellings_sp <- dwellings
sp::coordinates(dwellings_sp) <- ~ x + y
tryCatch(
  # does not work on some OS versions
  sp::proj4string(dwellings_sp) <- "+init=epsg:28992",
  error = function(e) message("Could not set the projection: ", conditionMessage(e))
)

# create a 1km grid
unemployed <- sdc_raster(dwellings_sp, dwellings_sp$unemployed, r = 1e3)
print(unemployed)

# retrieve the sensitive cells
is_sensitive(unemployed)
Calculate sensitivity from a sdc_raster at x,y locations.
A typical use is to calculate the sensitivity for each of the locations that x was created with (see example).
is_sensitive_at(x, xy, ...)
x |
sdc_raster object |
xy |
matrix of x and y coordinates, or a SpatialPoints or SpatialPointsDataFrame object |
... |
Arguments passed on to
|
logical vector indicating for each location in xy whether it is sensitive.
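Looking up the sensitivity at point locations amounts to finding the cell that contains each point. The sketch below assumes a hypothetical grid with its origin at (0, 0), square cells of size r, and a logical matrix of flags whose first row is the bottom row; sensitive_at() is an illustrative helper, not the package's implementation.

```r
# Hypothetical 2 x 2 grid of sensitivity flags with cell size r and its
# origin at (0, 0). Row 1 covers y in [0, r), row 2 covers y in [r, 2r);
# note that raster objects store rows from the top down instead.
r <- 1
sens <- matrix(c(FALSE, TRUE,
                 TRUE,  FALSE), nrow = 2, byrow = TRUE)

# look up, for each point, the flag of the cell that contains it
sensitive_at <- function(xy, sens, r) {
  col <- floor(xy[, 1] / r) + 1
  row <- floor(xy[, 2] / r) + 1
  sens[cbind(row, col)]
}

xy <- cbind(x = c(0.5, 1.5, 0.2), y = c(0.5, 0.5, 1.7))
sensitive_at(xy, sens, r)
# FALSE TRUE TRUE
```

Points outside the grid raise a "subscript out of bounds" error in this sketch; a real implementation also has to handle the grid extent and the top-down row order of raster objects.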
Other sensitive: disclosure_risk(), is_sensitive(), plot_sensitive(), remove_sensitive(), sdc_raster(), sensitivity_score()
production <- sdc_raster(enterprises, "production")

# add the sensitive variable to original data set.
enterprises$sensitive <- is_sensitive_at(production, enterprises)
Perturbs coordinates by rounding them to grid coordinates.
mask_grid(x, r, plot = FALSE)
x |
coordinates |
r |
grid resolution |
plot |
if TRUE, the points are plotted |
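The idea of grid masking fits in one line of base R. The rounding rule used here (moving each point to the centre of the cell of size r that contains it) is an assumption for illustration; mask_grid_sketch() is a hypothetical helper, and mask_grid() may round differently.

```r
# Hypothetical sketch: snap each coordinate to the centre of the grid cell
# (of size r) that contains it. The exact rounding rule of mask_grid() is
# an assumption here.
mask_grid_sketch <- function(x, r) {
  r * (floor(x / r) + 0.5)
}

x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)
g <- mask_grid_sketch(x, r = 1)
# x: 2.5 3.5 7.5 1.5
# y: 6.5 3.5 4.5 2.5
```

All points in one cell end up at the same location, so the masked coordinates reveal only which cell a point falls in.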
Other point perturbation: mask_random(), mask_voronoi(), mask_weighted_random()
x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)

# plotting is only useful for small datasets!

# grid masking
x_g <- mask_grid(x, r = 1, plot = TRUE)

# random perturbation
set.seed(3)
x_r <- mask_random(x, r = 1, plot = TRUE)

if (requireNamespace("FNN", quietly = TRUE)) {
  # weighted random perturbation
  x_wr <- mask_weighted_random(x, k = 2, r = 4, plot = TRUE)
}

if (requireNamespace("FNN", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # voronoi masking, plotting needs package `sf`
  x_vor <- mask_voronoi(x, r = 1, plot = TRUE)
}
Perturbs points with a uniform perturbation in a circle. Note that r can either be one distance or a distance per data point.
mask_random(x, r, plot = FALSE)
x |
coordinates, |
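r |
radius of the circle: either one distance or a distance per data point |
plot |
if TRUE, the points are plotted |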
adapted x with perturbed coordinates
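A uniform perturbation in a circle can be sketched as follows. The square root on the random distance is what makes the points uniform over the area of the circle; without it, perturbed points would cluster near their original location. mask_random_sketch() is a hypothetical helper, not the package's implementation.

```r
# Hypothetical sketch of a uniform perturbation within a circle of radius r.
# Taking the square root of a uniform draw for the distance makes the points
# uniform over the circle's area rather than clustered near its centre.
mask_random_sketch <- function(x, r) {
  n <- nrow(x)
  theta <- runif(n, min = 0, max = 2 * pi)  # random direction
  rho <- r * sqrt(runif(n))                 # r: one distance or one per point
  x + cbind(rho * cos(theta), rho * sin(theta))
}

set.seed(3)
x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)
x_r <- mask_random_sketch(x, r = 1)

# distance each point moved: never more than r
sqrt(rowSums((x_r - x)^2))
```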
Other point perturbation: mask_grid(), mask_voronoi(), mask_weighted_random()
x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)

# plotting is only useful for small datasets!

# grid masking
x_g <- mask_grid(x, r = 1, plot = TRUE)

# random perturbation
set.seed(3)
x_r <- mask_random(x, r = 1, plot = TRUE)

if (requireNamespace("FNN", quietly = TRUE)) {
  # weighted random perturbation
  x_wr <- mask_weighted_random(x, k = 2, r = 4, plot = TRUE)
}

if (requireNamespace("FNN", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # voronoi masking, plotting needs package `sf`
  x_vor <- mask_voronoi(x, r = 1, plot = TRUE)
}
Perturbs points using Voronoi masking: each point is moved to its nearest Voronoi boundary.
mask_voronoi(x, r = 0, k = 10, plot = FALSE)
x |
coordinates |
r |
tolerance, nearest voronoi should be at least r away. |
k |
number of neighbors to consider when determining nearest neighbors |
plot |
if TRUE, the points are plotted |
adapted x with perturbed coordinates
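Voronoi masking rests on a geometric fact: the point on the boundary of a point's Voronoi cell that is closest to it is the midpoint between that point and its nearest neighbour. The hypothetical mask_voronoi_sketch() below uses this, computing all pairwise distances with dist(), which is only practical for small data; it ignores the r tolerance and the k neighbour limit of mask_voronoi().

```r
# Hypothetical sketch of Voronoi masking: the closest point on the boundary
# of a point's Voronoi cell is the midpoint between that point and its
# nearest neighbour, so each point is moved to that midpoint.
mask_voronoi_sketch <- function(x) {
  d <- as.matrix(dist(x))        # pairwise Euclidean distances
  diag(d) <- Inf                 # a point is not its own neighbour
  nn <- apply(d, 1, which.min)   # index of each point's nearest neighbour
  (x + x[nn, , drop = FALSE]) / 2
}

x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)
v <- mask_voronoi_sketch(x)
# point 3 has point 2 as nearest neighbour and moves to (5.35, 4.1)
```

In this example points 1 and 2 are each other's nearest neighbour and both move to (3, 5), so this form of Voronoi masking can make points coincide.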
Other point perturbation: mask_grid(), mask_random(), mask_weighted_random()
x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)

# plotting is only useful for small datasets!

# grid masking
x_g <- mask_grid(x, r = 1, plot = TRUE)

# random perturbation
set.seed(3)
x_r <- mask_random(x, r = 1, plot = TRUE)

if (requireNamespace("FNN", quietly = TRUE)) {
  # weighted random perturbation
  x_wr <- mask_weighted_random(x, k = 2, r = 4, plot = TRUE)
}

if (requireNamespace("FNN", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # voronoi masking, plotting needs package `sf`
  x_vor <- mask_voronoi(x, r = 1, plot = TRUE)
}
This method uses, per point, the distance to the k-th neighbor as the maximum perturbation distance. Parameter r can be used to restrict the maximum distance of the k-th neighbor.
mask_weighted_random(x, k = 5, r = NULL, plot = FALSE)
x |
coordinates, |
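k |
number of neighbors; the distance to the k-th neighbor is used as the maximum perturbation distance |
r |
restricts the maximum distance of the k-th neighbor (NULL: no restriction) |
plot |
if TRUE, the points are plotted |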
adapted x with perturbed coordinates
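The weighting by the distance to the k-th neighbour can be sketched by combining a k-nearest-neighbour distance with the uniform circle perturbation. mask_weighted_random_sketch() is a hypothetical helper using base R dist(), assuming k is smaller than the number of points; it is not the package's implementation.

```r
# Hypothetical sketch: perturb each point uniformly within a circle whose
# radius is the distance to its k-th nearest neighbour, optionally capped at r.
mask_weighted_random_sketch <- function(x, k = 5, r = NULL) {
  d <- as.matrix(dist(x))
  # sorted distances start with 0 (the point itself), so the k-th
  # neighbour is at position k + 1; assumes k < nrow(x)
  radius <- apply(d, 1, function(row) sort(row)[k + 1])
  if (!is.null(r)) radius <- pmin(radius, r)  # r caps the radius
  n <- nrow(x)
  theta <- runif(n, min = 0, max = 2 * pi)
  rho <- radius * sqrt(runif(n))
  list(
    x = x + cbind(rho * cos(theta), rho * sin(theta)),
    radius = unname(radius)
  )
}

x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)
set.seed(3)
res <- mask_weighted_random_sketch(x, k = 2, r = 4)
res$radius
# approximately 4, 2.625, 4, 4
```

Points in dense areas get small perturbation radii and isolated points get large ones, which is the point of the weighting; r caps the radius for very isolated points.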
Spatial obfuscation methods for privacy protection of household-level data
Other point perturbation: mask_grid(), mask_random(), mask_voronoi()
x <- cbind(
  x = c(2.5, 3.5, 7.2, 1.5),
  y = c(6.2, 3.8, 4.4, 2.1)
)

# plotting is only useful for small datasets!

# grid masking
x_g <- mask_grid(x, r = 1, plot = TRUE)

# random perturbation
set.seed(3)
x_r <- mask_random(x, r = 1, plot = TRUE)

if (requireNamespace("FNN", quietly = TRUE)) {
  # weighted random perturbation
  x_wr <- mask_weighted_random(x, k = 2, r = 4, plot = TRUE)
}

if (requireNamespace("FNN", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # voronoi masking, plotting needs package `sf`
  x_vor <- mask_voronoi(x, r = 1, plot = TRUE)
}
Plots the sensitive cells of the sdc_raster. The sensitive cells are plotted in red and are determined using is_sensitive().
plot_sensitive(x, value = "mean", main = "sensitive", col, ...)
x |
sdc_raster object |
value |
character which value layer to be used for values, e.g. "sum", "count", "mean" (default). |
main |
character title of map. |
col |
color palette to be used, passed on to |
... |
passed on to |
Other plotting:
plot.sdc_raster()
Other sensitive: disclosure_risk(), is_sensitive_at(), is_sensitive(), remove_sensitive(), sdc_raster(), sensitivity_score()
Plot a sdc_raster object together with its sensitivity.
## S3 method for class 'sdc_raster' plot( x, value = "mean", sensitive = TRUE, ..., main = paste(substitute(x)), col )
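x |
sdc_raster object |
value |
character, which value layer to plot, e.g. "sum", "count", "mean" (default) |
sensitive |
logical, if TRUE the sensitivity is plotted next to the value |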
... |
passed on to |
main |
title of plot |
col |
color palette to be used, passed on to |
When sensitive is set to TRUE, a side-by-side plot will be made of the value and its sensitivity.
Other plotting:
plot_sensitive()
Protects a raster by summing over the neighborhood.
protect_neighborhood(x, radius = 10 * raster::res(x$value)[1], ...)
x |
sdc_raster object |
radius |
of the neighborhood to take |
... |
not used at the moment |
sdc_raster
object
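Summing over a neighbourhood can be sketched as a circular focal sum on a matrix of cell values. In the hypothetical neighborhood_sum() below, radius is expressed in cells (protect_neighborhood() takes it in map units), cells outside the grid contribute nothing, and NA cells are skipped; it is an illustration, not the package's implementation.

```r
# Hypothetical sketch: replace each cell by the sum over all cells whose
# centres lie within `radius` cells of it (a circular focal sum).
# Cells outside the grid contribute nothing and NA cells are skipped.
neighborhood_sum <- function(m, radius) {
  nr <- nrow(m)
  nc <- ncol(m)
  offs <- expand.grid(di = -radius:radius, dj = -radius:radius)
  offs <- offs[offs$di^2 + offs$dj^2 <= radius^2, ]
  out <- matrix(0, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      ii <- i + offs$di
      jj <- j + offs$dj
      inside <- ii >= 1 & ii <= nr & jj >= 1 & jj <= nc
      out[i, j] <- sum(m[cbind(ii[inside], jj[inside])], na.rm = TRUE)
    }
  }
  out
}

# a single cell with 3 observations in a 5 x 5 count grid
count <- matrix(0, 5, 5)
count[3, 3] <- 3
ns <- neighborhood_sum(count, radius = 1)
```

After summing, the occupied cell contributes to itself and its four direct neighbours, so each cell reports a total over its neighbourhood. This raises counts in sparse areas at the price of a less localized map.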
data(enterprises)

# create a sdc_raster from point data, using raster cells with
# a resolution of 200m
production <- sdc_raster(enterprises, variable = "production",
                         r = 200, min_count = 3)
print(production)

# plot the raster
zlim <- c(0, 3e4)

# show which raster cells are sensitive
plot(production, zlim = zlim)

# let's smooth to reduce the sensitivity
smoothed <- protect_smooth(production, bw = 400)
plot(smoothed)

neighborhood <- protect_neighborhood(production, radius = 1000)
plot(neighborhood)

# what is the sensitivity fraction?
sensitivity_score(neighborhood)
protect_quadtree reduces sensitivity by aggregating sensitive cells with their three neighbors, and does this recursively until no sensitive cells are left or the maximum zoom level has been reached.
protect_quadtree(x, max_zoom = Inf, ...)
x |
sdc_raster object |
max_zoom |
maximum number of zoom levels |
... |
Arguments passed on to
|
This implementation generalizes the method described by Suñé et al., in which there is no risk function and only a min_count determines sensitivity. Furthermore, the method in the article only handles count data (x$value$count), not mean or summed values.
The translation feature of the article is not (yet) implemented, because the original method does not take the disclosure_risk into account.
a sdc_raster object, in which sensitive cells have been recursively aggregated until they are no longer sensitive or max_zoom has been reached.
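The count-only quadtree method described above can be sketched as follows. The hypothetical quadtree_sketch() assumes a square count matrix whose side is a power of two, uses only min_count (as in the original article, without a risk function), and represents an aggregated block by writing the block total into each of its cells; protect_quadtree() stores its results in a sdc_raster instead.

```r
# Hypothetical sketch of count-only quadtree aggregation: at each zoom level
# the grid is split into blocks of 2^zoom x 2^zoom cells, and every block
# that still contains a sensitive cell gets the block total in all its cells.
quadtree_sketch <- function(count, min_count = 10, max_zoom = Inf) {
  n <- nrow(count)  # assumed: square grid whose side is a power of two
  out <- count
  zoom <- 0
  while (any(out < min_count) && zoom < max_zoom && 2^zoom < n) {
    zoom <- zoom + 1
    b <- 2^zoom  # block side at this zoom level
    for (i in seq(1, n, by = b)) {
      for (j in seq(1, n, by = b)) {
        rows <- i:(i + b - 1)
        cols <- j:(j + b - 1)
        # aggregate a block as soon as it contains a sensitive cell
        if (any(out[rows, cols] < min_count)) {
          out[rows, cols] <- sum(count[rows, cols])
        }
      }
    }
  }
  out
}

count <- matrix(c(20, 20, 20, 20,
                  20, 20, 20, 20,
                   1,  2, 20, 20,
                   3,  1, 20, 20), nrow = 4, byrow = TRUE)
res <- quadtree_sketch(count, min_count = 5)
# rows 3-4, columns 1-2 (1 + 2 + 3 + 1 = 7) are merged; other cells stay 20
```

With min_count = 10 the merged block (7) is still sensitive, so the next zoom level aggregates the whole grid; max_zoom stops that escalation.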
Suñé, E., Rovira, C., Ibáñez, D., & Farré, M. (2017). Statistical disclosure control on visualising geocoded population data using a structure in quadtrees. NTTS 2017.
Other protection methods: protect_smooth(), remove_sensitive()
# library(raster)
#
# fined <- sdc_raster(enterprises, enterprises$fined)
# plot(fined)
# fined_qt <- protect_quadtree(fined)
# plot(fined_qt)
#
# fined <- sdc_raster(enterprises, enterprises$fined, r=50)
# plot(fined)
# fined_qt <- protect_quadtree(fined)
# plot(fined_qt)
#
# library(sf)
# gemeente_2019 <- st_read("https://cartomap.github.io/nl/rd/gemeente_2019.geojson")
# st_crs(gemeente_2019) <- 28992
# nbl <- st_touches(gemeente_2019)
#
# coords <- st_coordinates(st_centroid(gemeente_2019))
# l <- lapply(seq_along(nbl), function(i){
#   nb <- nbl[[i]]
#   st_sfc(lapply(nb, function(j){
#     st_linestring(coords[c(i,j),])})
#   )
# })
# l2 <- do.call(c, l)
#
# edge_list <- as.data.frame(nbl)
# library(data.table)
# el <- as.data.table(edge_list)
# names(el) <- c("from", "to")
#
# edge_list$from <- gemeente_2019$id[edge_list$row.id]
# edge_list$to <- gemeente_2019$id[edge_list$col.id]
# edge_list <- subset(edge_list, row.id < col.id)
# edge_list <- edge_list[,c("from", "to")]
#
# g <- igraph::graph_from_data_frame(edge_list, directed = FALSE)
# plot(g)
# library(igraph)
# i <- match(names(V(g)), gemeente_2019$id)
#
# c2 <- igraph::layout_with_fr(g, coords[i,])
# plot(g, layout = c2)
#
# buurt_2019 <- st_read("https://cartomap.github.io/nl/rd/buurt_2019.geojson")
# st_crs(buurt_2019) <- 28992
# system.time({
#   nbl <- st_touches(buurt_2019)
# })
#
# coords <- st_coordinates(st_centroid(buurt_2019))
# l <- lapply(seq_along(nbl), function(i){
#   nb <- nbl[[i]]
#   st_sfc(lapply(nb, function(j){
#     st_linestring(coords[c(i,j),])})
#   )
# })
# l2 <- do.call(c, l)
#
# plot(l2)
protect_smooth reduces the sensitivity by applying a Gaussian smoother, making the values less localized.
protect_smooth(x, bw = raster::res(x$value), ...)
x |
raster object |
bw |
bandwidth |
... |
passed through to |
The sensitivity of a raster can be decreased by applying a kernel density smoother, as argued by de Jonge and de Wolf (2016) and de Wolf and de Jonge (2018). Smoothing spatially spreads localized values, reducing the risk of location disclosure. Note that smoothing often visually enhances the detection of spatial patterns.
The kernel applied is a Gaussian kernel with a bandwidth bw supplied by the user. The smoother acts upon x$value$count and x$value$sum, from which a new x$value$mean is derived.
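The smoothing step can be sketched in base R: build a normalised Gaussian kernel, apply it to the count and sum layers, and derive the mean as the ratio of the two smoothed layers, following the description above. The helpers gauss_kernel() and smooth_matrix() are hypothetical; truncating the kernel at three bandwidths and treating cells outside the grid as 0 are assumptions of this sketch, not documented behaviour of protect_smooth().

```r
# Hypothetical sketch: normalised Gaussian kernel with bandwidth bw on a grid
# with cell size res (same units as bw), truncated at 3 bandwidths.
gauss_kernel <- function(bw, res = 1) {
  h <- ceiling(3 * bw / res)
  offs <- (-h:h) * res
  w <- outer(offs, offs, function(dx, dy) exp(-(dx^2 + dy^2) / (2 * bw^2)))
  w / sum(w)
}

# Weighted sum of shifted copies of m; the kernel is symmetric, so this equals
# a convolution. Cells outside the grid count as 0.
smooth_matrix <- function(m, w) {
  h <- (nrow(w) - 1) / 2
  nr <- nrow(m)
  nc <- ncol(m)
  out <- matrix(0, nr, nc)
  for (di in -h:h) {
    for (dj in -h:h) {
      src_i <- seq_len(nr) + di
      src_j <- seq_len(nc) + dj
      ok_i <- src_i >= 1 & src_i <= nr
      ok_j <- src_j >= 1 & src_j <= nc
      out[ok_i, ok_j] <- out[ok_i, ok_j, drop = FALSE] +
        w[di + h + 1, dj + h + 1] * m[src_i[ok_i], src_j[ok_j], drop = FALSE]
    }
  }
  out
}

# count and sum layers for a 21 x 21 grid with one occupied cell
count <- matrix(0, 21, 21); count[11, 11] <- 4
total <- matrix(0, 21, 21); total[11, 11] <- 20

w <- gauss_kernel(bw = 1)
count_s <- smooth_matrix(count, w)
total_s <- smooth_matrix(total, w)
mean_s <- total_s / count_s  # the mean is derived, not smoothed directly
```

Because the kernel is normalised, smoothing redistributes the count and sum without changing their totals (as long as the kernel stays inside the grid), and the derived mean keeps its value of 5 wherever the smoothed count is positive. Cells beyond the kernel's reach have a zero count and get NaN as mean.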
de Jonge, E., & de Wolf, P. P. (2016, September). Spatial smoothing and statistical disclosure control. In International Conference on Privacy in Statistical Databases (pp. 107-117). Springer, Cham.
de Wolf, P. P., & de Jonge, E. (2018, September). Safely Plotting Continuous Variables on a Map. In International Conference on Privacy in Statistical Databases (pp. 347-359). Springer, Cham.
Other protection methods: protect_quadtree(), remove_sensitive()
library(sdcSpatial)
library(raster)

data(enterprises)

# create a sdc_raster from point data, using raster cells with
# a resolution of 200m
production <- sdc_raster(enterprises, variable = "production",
                         r = 200, min_count = 3)
print(production)

# plot the raster
zlim <- c(0, 3e4)

# show which raster cells are sensitive
plot(production, zlim = zlim)

# but we can also retrieve the raster directly
sensitive <- is_sensitive(production, min_count = 3)
plot(sensitive, col = c("white", "red"))

# what is the sensitivity fraction?
sensitivity_score(production)
# or equally
cellStats(sensitive, mean)

# let's smooth to reduce the sensitivity
smoothed <- protect_smooth(production, bw = 400)
plot(smoothed)

# let's smooth to reduce the sensitivity, with higher resolution
smoothed <- protect_smooth(production, bw = 400, smooth_fact = 4, keep_resolution = FALSE)
plot(smoothed)

# what is the sensitivity fraction?
sensitivity_score(smoothed)

# let's remove the sensitive data.
smoothed_safe <- remove_sensitive(smoothed, min_count = 3)
plot(smoothed_safe)

# let's communicate!
production_mean <- mean(smoothed_safe)
production_total <- sum(smoothed_safe)

# and create a contour plot
raster::filledContour(production_mean, nlevels = 6, main = "Mean production")

# generated with R >= 3.6
# col <- hcl.colors(11, rev = TRUE)
col <- c("#FDE333", "#C2DE34", "#7ED357", "#00C475", "#00B28A", "#009B95",
         "#008298", "#006791", "#274983", "#44286E", "#4B0055")
raster::filledContour(production_total, nlevels = 11,
                      col = col,
                      main = "Total production")
remove_sensitive removes sensitive cells from a sdc_raster. The sensitive cells, as found by is_sensitive(), are set to NA.
remove_sensitive(x, max_risk = x$max_risk, min_count = x$min_count, ...)
mask_sensitive(x, max_risk = x$max_risk, min_count = x$min_count, ...)
x |
sdc_raster object |
max_risk |
a risk value higher than max_risk is considered sensitive |
min_count |
a count lower than min_count is considered sensitive |
... |
passed on to |
Removing sensitive cells is a protection method that is often useful to finalize map protection after other protection methods have been applied.
mask_sensitive and remove_sensitive are synonyms, to accommodate both experienced raster users and sdc users.
sdc_raster object with sensitive cells set to NA.
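Removing sensitive cells boils down to setting them to NA in the published layers. The sketch below uses plain matrices instead of a sdc_raster and flags cells with the count rule only (min_count = 10); all values are made up for illustration.

```r
# Made-up per-cell layers for a 2 x 3 grid
count <- matrix(c(12, 3, 25,
                  40, 8, 15), nrow = 2, byrow = TRUE)
mean_value <- matrix(c(1.2, 0.4, 2.0,
                       0.9, 3.1, 1.7), nrow = 2, byrow = TRUE)

# flag cells with too few observations (only the count rule is used here)
sensitive <- count < 10

# remove them: the sensitive cells become NA in the published layer
mean_safe <- mean_value
mean_safe[sensitive] <- NA
# cells [1, 2] and [2, 2] are now NA; the other cells are unchanged
```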
Other sensitive: disclosure_risk(), is_sensitive_at(), is_sensitive(), plot_sensitive(), sdc_raster(), sensitivity_score()
Other protection methods: protect_quadtree(), protect_smooth()
library(raster)

unemployed <- sdc_raster(dwellings[1:2], dwellings$unemployed, r = 200)

# plot the normally rastered data
plot(unemployed, zlim = c(0, 1))
plot_sensitive(unemployed)

unemployed_safe <- remove_sensitive(unemployed, risk_type = "discrete")
plot_sensitive(unemployed_safe, zlim = c(0, 1))

print(unemployed)
unemployed$value
sdc_raster creates multiple raster::raster objects ("count", "mean", "sum") from supplied point data x and calculates the sensitivity to privacy disclosure for each raster location.
sdc_raster( x, variable, r = 200, max_risk = 0.95, min_count = 10, risk_type = c("external", "internal", "discrete"), ..., field = variable )
x |
sp::SpatialPointsDataFrame, sf::sf or a two column matrix or data.frame that is used to create a raster map. |
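variable |
name of data column or a numeric/logical vector with the values to be rastered |
r |
either a desired resolution or a pre-existing raster object.
In the first case, the crs of |
max_risk |
a risk value higher than max_risk is considered sensitive |
min_count |
a count lower than min_count is considered sensitive |
risk_type |
passed on to disclosure_risk() |
... |
passed through to |
field |
synonym for variable |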
A sdc_raster object is the vehicle that does the bookkeeping for calculating sensitivity. Protection methods work upon a sdc_raster and return a new sdc_raster in which the sensitivity is reduced.
The sensitivity of the map can be assessed with sensitivity_score(), plot.sdc_raster(), plot_sensitive() or print().
Reducing the sensitivity can be done with protect_smooth(), protect_quadtree() and remove_sensitive(). Raster maps for mean, sum and count data can be extracted from $value (a brick()).
Object of class "sdc_raster" with elements:
$value: raster::brick() object with different layers, e.g. count, sum, mean, scale.
$max_risk: see above.
$min_count: see above.
of protection operation protect_smooth() or protect_quadtree().
$type: data type of variable, either numeric or logical.
$risk_type: "external", "internal" or "discrete" (see disclosure_risk()).
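The count, sum and mean layers described above can be illustrated by binning points into square cells. rasterize_points() is a hypothetical base R helper that assumes cells of size r anchored at (0, 0) and identifies cells by "column,row" keys; sdc_raster() itself builds raster::raster layers and may align the grid differently.

```r
# Hypothetical sketch: bin points into square cells of size r and compute the
# per-cell count, sum and mean of a variable, keyed by "column,row" indices.
rasterize_points <- function(x, y, value, r) {
  cell <- paste(floor(x / r), floor(y / r), sep = ",")
  count <- tapply(value, cell, length)
  total <- tapply(value, cell, sum)
  data.frame(
    cell  = names(count),
    count = as.vector(count),
    sum   = as.vector(total),
    mean  = as.vector(total / count)
  )
}

pts <- data.frame(
  x = c(10, 150, 250, 260),
  y = c(10, 20, 30, 40),
  production = c(1, 2, 3, 4)
)
res <- rasterize_points(pts$x, pts$y, pts$production, r = 200)
# cell "0,0": count 2, sum 3, mean 1.5
# cell "1,0": count 2, sum 7, mean 3.5
```

For a logical variable the same code gives the number of TRUE values as sum and their fraction as mean.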
Other sensitive: disclosure_risk(), is_sensitive_at(), is_sensitive(), plot_sensitive(), remove_sensitive(), sensitivity_score()
library(raster)

prod <- sdc_raster(enterprises, field = "production", r = 500)
print(prod)

prod <- sdc_raster(enterprises, field = "production", r = 1e3)
print(prod)

# get raster with the average production per cell averaged over the enterprises
prod_mean <- mean(prod)
summary(prod_mean)

# get raster with the total production per cell
prod_total <- sum(prod)
summary(prod_total)
sensitivity_score calculates the fraction of cells (with a value) that are considered sensitive according to the disclosure_risk used.
sensitivity_score(x, max_risk = x$max_risk, min_count = x$min_count, ...)
x |
sdc_raster object |
max_risk |
a risk value higher than max_risk is considered sensitive |
min_count |
a count lower than min_count is considered sensitive |
... |
passed on to |
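The score itself is a simple fraction. The sketch below uses made-up per-cell values and assumes that a cell is sensitive when its risk exceeds max_risk or its count is below min_count; cells without a value are left out, as the description states.

```r
# Made-up per-cell summaries for six cells; the third cell has no data.
count <- c(12, 3, NA, 25, 40, 8)
risk  <- c(0.40, 0.50, NA, 0.97, 0.20, 0.10)

# assumed rule with max_risk = 0.95 and min_count = 10
sensitive <- risk > 0.95 | count < 10

# fraction of sensitive cells among the cells that have a value
score <- mean(sensitive, na.rm = TRUE)
score  # 3 / 5 = 0.6

# a higher min_count flags more cells
score_20 <- mean(risk > 0.95 | count < 20, na.rm = TRUE)
score_20  # 4 / 5 = 0.8
```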
Other sensitive: disclosure_risk(), is_sensitive_at(), is_sensitive(), plot_sensitive(), remove_sensitive(), sdc_raster()
consumption <- sdc_raster(dwellings[1:2], variable = dwellings$consumption, r = 500)
sensitivity_score(consumption)

# the score is also reported by
print(consumption)

# change the rules! A higher norm generates more sensitive cells
sensitivity_score(consumption, min_count = 20)
Create a kernel density (KDE) smoothed version of a raster.
smooth_raster( x, bw = raster::res(x), smooth_fact = 5, keep_resolution = TRUE, na.rm = TRUE, pad = TRUE, padValue = NA, threshold = NULL, type = c("Gauss", "circle", "rectangle"), ... )
x |
raster object |
bw |
bandwidth |
smooth_fact |
|
keep_resolution |
|
na.rm |
should the |
pad |
should the data be padded? |
padValue |
what should the padding value be? |
threshold |
cells with a lower (weighted) value of this threshold will be removed. |
type |
what is the type of smoothing (see |
... |
passed through to |