Title: | Statistics Netherlands (CBS) Open Data API Client |
---|---|
Description: | The data and meta data from Statistics Netherlands (<https://www.cbs.nl>) can be browsed and downloaded. The client uses the open data API of Statistics Netherlands. |
Authors: | Edwin de Jonge [aut, cre], Sara Houweling [ctb] |
Maintainer: | Edwin de Jonge <[email protected]> |
License: | GPL-2 |
Version: | 1.2.9000 |
Built: | 2024-11-10 06:01:06 UTC |
Source: | https://github.com/edwindj/cbsodataR |
cbsodataR allows you to download all official statistics of Statistics Netherlands (CBS) into R. For an introduction, see the vignette: vignette("cbsodataR", package="cbsodataR"). For an introduction to using CBS cartographic maps, see vignette("maps", package="cbsodataR").
The functions cbs_get_datasets() and cbs_get_data() should get you going. If you are interested in cartographic maps, see cbs_get_maps().
cbs_get_datasets(): returns a data.frame with the table of contents (toc), i.e. the publication meta data of the available tables; it can also include the extra tables not directly available in StatLine (dataderden, i.e. third-party data).
cbs_get_catalogs(): returns a data.frame with the available (extra) catalogs.
cbs_get_toc(): returns a data.frame with the table of contents (toc), i.e. the publication meta data of the available tables within the standard CBS catalog.
cbs_search(): returns a data.frame with the tables that contain the given search word.
cbs_get_data(): returns the data of a specific open data/StatLine table.
cbs_download_table(): saves the data (and metadata) as csv files in a directory.
cbs_get_meta(): returns the meta data objects of a specific open data/StatLine table.
cbs_add_date_column(): converts the date/period codes in a downloaded data set into Date (or numeric) values.
cbs_add_label_columns(): adds labels to the code columns in downloaded data.
cbs_get_maps(): returns a data.frame with the available CBS maps.
cbs_join_sf_with_data(): returns an sf object joined with a CBS table.
cbs_get_sf(): returns an sf object without data, e.g. "gemeente_2020".
Besides the official CBS data, there are also third-party and preview data services implementing the same protocol. The base_url parameter allows you to specify a different server. The base_url can either be specified explicitly or set globally with options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third-party services. A download URL is constructed as either:
<base_url>/<BULK>/<id>/... for data, or
<base_url>/<API>/<id>/?$format=json for metadata.
Default values for base_url, BULK and API are set in the package options, but can be changed with:
options(cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata")
which are the default values set in the package.
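The two URL patterns above can be illustrated with a small base-R sketch. The helper cbs_url_sketch() is hypothetical and not part of cbsodataR: it only reads the three documented options, falls back to the documented defaults, and leaves out the request-specific "..." part of the bulk URL, so treat it as an illustration of the pattern rather than the package's own URL builder.

```r
# Hypothetical helper (not part of cbsodataR): assembles the URL patterns
# described above from the package options, falling back to the defaults.
cbs_url_sketch <- function(id, type = c("data", "meta")) {
  type <- match.arg(type)
  base <- getOption("cbsodataR.base_url", "https://opendata.cbs.nl")
  bulk <- getOption("cbsodataR.BULK", "ODataFeed/odata")
  api  <- getOption("cbsodataR.API", "ODataAPI/odata")
  if (type == "data") {
    # <base_url>/<BULK>/<id>; the request-specific "..." part is omitted
    paste(base, bulk, id, sep = "/")
  } else {
    # <base_url>/<API>/<id>/?$format=json
    paste0(paste(base, api, id, sep = "/"), "/?$format=json")
  }
}

meta_url <- cbs_url_sketch("7196ENG", "meta")
meta_url
# [1] "https://opendata.cbs.nl/ODataAPI/odata/7196ENG/?$format=json"

# Temporarily point to a third-party server, then restore the old options
old <- options(cbsodataR.base_url = "http://example.com")
third_party_url <- cbs_url_sketch("7196ENG", "data")
options(old)
third_party_url
# [1] "http://example.com/ODataFeed/odata/7196ENG"
```

Saving the return value of options() and passing it back afterwards restores the previous setting, so a global base_url change can be kept local to one piece of code.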
The content of CBS open data is subject to the Creative Commons Attribution license (CC BY 4.0). This means that re-use of the content is permitted, provided Statistics Netherlands is cited as the source. For more information see: https://www.cbs.nl/en-gb/about-us/website/copyright
Time periods in CBS data are coded as yyyyXXww (e.g. 2018JJ00, 2018MM10, 2018KW02), consisting of the year (yyyy), the period type (XX) and the index (ww). cbs_add_date_column converts these codes into Date or numeric values. In addition, it adds a frequency column denoting the period type.
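As an illustration of the yyyyXXww scheme, the sketch below splits codes into their parts and maps the type to the Y/Q/M frequency levels. parse_cbs_period() is a hypothetical helper, not the package's code; representing each period by its first day is an assumption, and the numeric variant described under date_type is not reproduced.

```r
# Sketch (not cbsodataR's implementation): split CBS period codes of the
# form yyyyXXww into year, type and index, and derive a frequency factor.
parse_cbs_period <- function(code) {
  pattern <- "^([0-9]{4})([A-Z]{2})([0-9]{2})$"
  year  <- as.integer(sub(pattern, "\\1", code))
  type  <- sub(pattern, "\\2", code)
  index <- as.integer(sub(pattern, "\\3", code))
  # JJ = year, KW = quarter, MM = month (other types give NA)
  freq <- factor(unname(c(JJ = "Y", KW = "Q", MM = "M")[type]),
                 levels = c("Y", "Q", "M"))
  # Assumption: represent each period by the first day of the period
  first_month <- ifelse(type == "JJ", 1L,
                 ifelse(type == "KW", 3L * (index - 1L) + 1L, index))
  date <- as.Date(sprintf("%04d-%02d-01", year, first_month))
  data.frame(code, year, type, index, freq, date, stringsAsFactors = FALSE)
}

periods <- parse_cbs_period(c("2018JJ00", "2018MM10", "2018KW02"))
periods[, c("code", "freq", "date")]
```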
cbs_add_date_column(x, date_type = c("Date", "numeric"), ...)
x |
|
date_type |
Type of date column: "Date" or "numeric". Numeric creates a fractional
number that marks the "middle" of the period, e.g. 2018JJ00 -> 2018.5 and
2018KW01 -> 2018.167. This avoids ambiguity: otherwise 2018.0 could mean
2018, 2018 Q1 or 2018 Jan, and furthermore 2018.75 would be a strange value for 2018 Q4.
If all codes in the dataset have frequency "Y" the numeric output will be |
... |
future use. |
original dataset with two added columns: <period>_date and <period>_freq. This last column is a factor with levels Y, Q and M.
Other data retrieval: cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_data(), cbs_extract_table_id(), cbs_get_data(), cbs_get_data_from_link()
Other meta data: cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_meta(), cbs_get_meta()
## Not run:
x <- cbs_get_data(
  id = "7196ENG"          # table id
  , Periods = "2000MM03"  # March 2000
  , CPI = "000000"        # Category code for total
)
# add a Periods_Date column
x <- cbs_add_date_column(x)
x
# add a Periods_numeric column
x <- cbs_add_date_column(x, date_type = "numeric")
x
## End(Not run)
Adds CBS labels to a dataset retrieved using cbs_get_data().
cbs_add_label_columns(x, columns = colnames(x), ...)
x |
|
columns |
|
... |
not used. |
Code columns are translated into label columns for each of the columns supplied. By default, every code column is accompanied by a label column. The name of each label column is <code_column>_label.
the original data.frame x with extra label columns (see description).
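The labelling step can be pictured as a lookup of each code's Title in the corresponding meta data table (the $<code column> item returned by cbs_get_meta()). A minimal sketch, assuming hypothetical toy data and a hypothetical add_label_sketch() helper; the real function may handle trailing spaces or unmatched codes differently.

```r
# Made-up stand-ins for cbs_get_data() output and the meta data of its
# Perioden code column (Key = code, Title = label)
x <- data.frame(Perioden = c("2019JJ00", "2018JJ00"), Value = c(1.5, 2.0),
                stringsAsFactors = FALSE)
perioden_meta <- data.frame(Key = c("2018JJ00", "2019JJ00"),
                            Title = c("2018", "2019"),
                            stringsAsFactors = FALSE)

# Hypothetical helper: add <code_column>_label by matching codes to Keys
add_label_sketch <- function(x, column, meta) {
  x[[paste0(column, "_label")]] <- meta$Title[match(x[[column]], meta$Key)]
  x
}

labelled <- add_label_sketch(x, "Perioden", perioden_meta)
labelled$Perioden_label
# [1] "2019" "2018"
```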
Other data retrieval: cbs_add_date_column(), cbs_add_unit_column(), cbs_download_data(), cbs_extract_table_id(), cbs_get_data(), cbs_get_data_from_link()
Other meta data: cbs_add_date_column(), cbs_add_unit_column(), cbs_download_meta(), cbs_get_meta()
## Not run:
# get data for main (000000) Consumer Price Index (7196ENG) for March 2000
x <- cbs_get_data(
  id = "7196ENG"
  , Periods = "2000MM03"  # March 2000
  , CPI = "000000"        # main price index
)
cbs_add_label_columns(x)
## End(Not run)
Adds a statcode column to the dataset, so that it can be joined more easily with a map retrieved with cbs_get_sf().
cbs_add_statcode_column(x, ...)
x |
|
... |
future use. |
Regional data uses the x$RegioS dimension. The codes for each region are also used in the cartographic map boundaries of regions, as used in cbs_get_sf().
Unfortunately, the codes in x$RegioS can have trailing spaces, and the variable used in the mapping material is named statcode. This method simply adds a statcode column with trimmed codes from RegioS, making it easier to connect a dataset to a cartographic map.
original dataset with added statcode column.
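The reason for the trimming step shows up in a toy example: codes with trailing spaces do not match the map's statcode until they are trimmed. The data below are made up, cbs_add_statcode_column() itself is not called, and the final lookup mirrors the match()-based join used in the example that follows.

```r
# Made-up regional data whose RegioS codes carry trailing spaces, and a
# made-up table standing in for the statcode/statnaam columns of a map
labor <- data.frame(RegioS = c("PV20  ", "PV21  "), rate = c(4.0, 3.0),
                    stringsAsFactors = FALSE)
regions <- data.frame(statcode = c("PV21", "PV20"),
                      statnaam = c("Region B", "Region A"),
                      stringsAsFactors = FALSE)

# Without trimming, the codes do not match
match(regions$statcode, labor$RegioS)
# [1] NA NA

# The idea behind a statcode column: the trimmed RegioS code
labor$statcode <- trimws(labor$RegioS)
regions$rate <- labor$rate[match(regions$statcode, labor$statcode)]
regions$rate
# [1] 3 4
```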
Other cartographic map: cbs_get_maps(), cbs_get_sf(), cbs_join_sf_with_data()
if (interactive()){
  # retrieve maps
  cbs_maps <- cbs_get_maps()
  cbs_maps |> head(4)
  gemeente_map <- cbs_get_sf("gemeente", 2023, verbose=TRUE)
  # sf object
  gemeente_map
  # plot the statcodes (included in the map)
  plot(gemeente_map, max.plot = 1)
  # now connect with some data
  labor <- cbs_get_data("85268NED"
                        , Perioden = "2022JJ00"          # only 2022
                        , RegioS = has_substring("PV")  # only province
                        , verbose = TRUE
                        )
  # most conveniently
  provincie_2022_with_data <- cbs_join_sf_with_data("provincie", 2022, labor)
  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  provincie_2022_with_data |>
    subset(select = Werkloosheidspercentage_13) |>
    plot( border ="#FFFFFF99", main="unemployment rate")
  ## but of course this can also be done by hand:
  labor <- labor |>
    cbs_add_statcode_column() # add column to connect with map
  provincie_2022 <- cbs_get_sf("provincie", 2022)
  # this is a left_join(provincie_2022, labor, by = "statcode")
  provincie_2022_data <- within(provincie_2022, {
    unemployment_rate <- labor$Werkloosheidspercentage_13[match(statcode, labor$statcode)]
  })
  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  plot( provincie_2022_data[,c("unemployment_rate")]
      , border ="#FFFFFF99"
      , nbreaks = 12
      )
}
Adds extra unit columns to the dataset that was retrieved using cbs_get_data().
cbs_add_unit_column(x, columns = colnames(x), ...)
x |
|
columns |
|
... |
not used. |
By default, every topic column is accompanied by a unit column. The unit columns are character columns, each named <topic_column>_unit.
the original data.frame x with extra unit columns (see description).
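A sketch of what adding unit columns amounts to, assuming the units come from a DataProperties-like table with Key and Unit columns (the Key column name is an assumption). The data, the unit string and the add_unit_sketch() helper are hypothetical.

```r
# Made-up stand-ins: downloaded data and a DataProperties-like table with
# (assumed) Key and Unit columns describing the topic columns
x <- data.frame(Perioden = "2000MM03", Index_1 = 100.5, stringsAsFactors = FALSE)
data_properties <- data.frame(Key = "Index_1", Unit = "2015=100",
                              stringsAsFactors = FALSE)

# Hypothetical helper: add <topic_column>_unit for every column that is a topic
add_unit_sketch <- function(x, props) {
  for (col in intersect(names(x), props$Key)) {
    x[[paste0(col, "_unit")]] <- props$Unit[match(col, props$Key)]
  }
  x
}

with_units <- add_unit_sketch(x, data_properties)
names(with_units)
```

Code columns such as Perioden have no entry in the toy table, so only the topic column Index_1 gets a companion unit column.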
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_download_data(), cbs_extract_table_id(), cbs_get_data(), cbs_get_data_from_link()
Other meta data: cbs_add_date_column(), cbs_add_label_columns(), cbs_download_meta(), cbs_get_meta()
if (interactive()) {
  x <- cbs_get_data(
    id = "7196ENG"          # table id
    , Periods = "2000MM03"  # March 2000
    , CPI = "000000"        # Category code for total
    , verbose = TRUE        # show the url that is used
  )
  # adds two extra columns
  x_with_units <- x |> cbs_add_unit_column()
  x_with_units[,1:4]
}
Extracts the default selection from a cbsodataR meta object.
cbs_default_selection(x, ...)
x |
meta object |
... |
for future use |
Gets all data via bulk download. cbs_download_data dumps the data in (international) csv format.
cbs_download_data( id, path = file.path(id, "data.csv"), catalog = "CBS", ..., select = NULL, typed = TRUE, verbose = FALSE, show_progress = interactive() && !verbose, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
Identifier of the CBS open data table. |
path |
Path of the data file; defaults to "id/data.csv". |
catalog |
catalog id, can be retrieved with |
... |
optional filter statements to select rows of the data, |
select |
optional names of columns to be returned. |
typed |
Should the data automatically be converted into integer and numeric? |
verbose |
show the underlying downloading of the data |
show_progress |
show a progress bar while downloading. |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. See details. |
Besides the official CBS data, there are also third-party and preview data services implementing the same protocol. The base_url parameter allows you to specify a different server. The base_url can either be specified explicitly or set globally with options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third-party services. A download URL is constructed as either:
<base_url>/<BULK>/<id>/... for data, or
<base_url>/<API>/<id>/?$format=json for metadata.
Default values for base_url, BULK and API are set in the package options, but can be changed with:
options(cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata")
which are the default values set in the package.
Other download: cbs_download_meta(), cbs_download_table()
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_extract_table_id(), cbs_get_data(), cbs_get_data_from_link()
Dumps the meta data into a directory
cbs_download_meta( id, dir = id, catalog = "CBS", ..., verbose = FALSE, cache = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
Id of CBS open data table (see |
dir |
Directory in which the data should be stored. By default, it creates a subdirectory named after the id. |
catalog |
catalog id, can be retrieved with |
... |
not used |
verbose |
Print extra messages about what is happening. |
cache |
Should meta data be cached? |
base_url |
Optionally specify a different server. Useful for third-party data services implementing the same protocol; see details. |
meta data object
Besides the official CBS data, there are also third-party and preview data services implementing the same protocol. The base_url parameter allows you to specify a different server. The base_url can either be specified explicitly or set globally with options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third-party services. A download URL is constructed as either:
<base_url>/<BULK>/<id>/... for data, or
<base_url>/<API>/<id>/?$format=json for metadata.
Default values for base_url, BULK and API are set in the package options, but can be changed with:
options(cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata")
which are the default values set in the package.
Other meta data: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_get_meta()
Other download: cbs_download_data(), cbs_download_table()
cbs_download_table downloads the data and metadata of a table from Statistics Netherlands and stores them in csv format.
cbs_download_table( id, catalog = "CBS", ..., dir = id, cache = FALSE, verbose = TRUE, typed = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
Identifier of CBS table (can be retrieved from |
catalog |
catalog id, can be retrieved with |
... |
Parameters passed on to |
dir |
Directory where table should be downloaded |
cache |
If metadata is cached use that, otherwise download meta data |
verbose |
Print extra messages about what is happening. |
typed |
Should the data automatically be converted into integer and numeric? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
cbs_download_table retrieves all raw meta data and data and stores these as csv files in the directory specified by dir. It is possible to add a filter. A filter is specified with <column_name> = <values>, in which <values> is a character vector. Rows with values that are not part of the character vector are not returned.
meta data object of id, see cbs_get_meta().
Other download: cbs_download_data(), cbs_download_meta()
## Not run:
# download meta data and data from inflation/Consumer Price Indices
cbs_download_table(id = "7196ENG")
## End(Not run)
Extracts the id of a CBS table from a StatLine URL.
cbs_extract_table_id(url, ...)
url |
|
... |
future use. |
character with the id; NA if not found.
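A rough sketch of the extraction, assuming StatLine links contain a ".../dataset/<id>/..." segment. That URL layout and the extract_id_sketch() helper are assumptions for illustration; cbs_extract_table_id() may recognise other link formats.

```r
# Hypothetical helper; the "/dataset/<id>" URL layout is an assumption
extract_id_sketch <- function(url) {
  m <- regmatches(url, regexec("/dataset/([[:alnum:]]+)", url))
  # each element is c(full match, id) when found, character(0) otherwise
  vapply(m, function(g) if (length(g) == 2) g[2] else NA_character_, character(1))
}

ids <- extract_id_sketch(c(
  "https://opendata.cbs.nl/#/CBS/nl/dataset/85268NED/table",
  "https://www.cbs.nl/"
))
ids
# first element is "85268NED", second is NA
```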
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_data(), cbs_get_data(), cbs_get_data_from_link()
Retrieves the possible catalog values that can be used for retrieving data
cbs_get_catalogs(..., base_url = BASE_URL)
... |
filter statement to select rows, e.g. Language="nl" |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
if (interactive()){
  catalogs <- cbs_get_catalogs()
  # Identifier of catalog can be used to query
  print(catalogs$Identifier)
  ds_rivm <- cbs_get_datasets(catalog = "RIVM")
  ds_rivm[1:5, c("Identifier","ShortTitle")]
}
Retrieves data from a table of Statistics Netherlands. A list of available tables can be retrieved with cbs_get_datasets(). Use the Identifier column of cbs_get_datasets as id in cbs_get_data and cbs_get_meta.
cbs_get_data( id, ..., catalog = "CBS", select = NULL, typed = TRUE, add_column_labels = TRUE, dir = tempdir(), verbose = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL), include_ID = FALSE )
id |
Identifier of table, can be found in |
... |
optional filter statements, see details. |
catalog |
catalog id, can be retrieved with |
select |
|
typed |
Should the data automatically be converted into integer and numeric? |
add_column_labels |
Should column titles be added as a label (TRUE) which are visible in |
dir |
Directory where the table should be downloaded. Defaults to a temporary directory. |
verbose |
Print extra messages about what is happening. |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol, see details. |
include_ID |
Should the data include the ID column for the rows? |
To reduce the download time, the data can optionally be filtered on category values; for large tables (> 100k records) this is a wise thing to do. A filter is specified in one of the following ways (see examples below):
<column_name> = <values>, in which <values> is a character vector. Rows with values that are not part of the character vector are not returned. Note that the values have to be values from the $Key column of the corresponding meta data, which may contain trailing spaces.
<column_name> = has_substring(x), in which x is a character vector. Rows with values that do not contain a substring in x are not returned. Useful substrings are "JJ", "KW", "MM" for Periods (years, quarters, months) and "PV", "CR" and "GM" for Regions (provinces, COROP regions, municipalities).
<column_name> = eq(<values>) | has_substring(x), which combines the two statements above.
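The three filter forms can be mimicked on an in-memory vector to see which rows survive. This is only a sketch: keep_eq() and keep_substring() are hypothetical stand-ins, not cbsodataR's eq() and has_substring(), and cbsodataR presumably turns filters into a query for the server rather than filtering locally.

```r
# Hypothetical stand-ins for the filter semantics (not cbsodataR functions)
keep_eq <- function(values, keys) values %in% keys
keep_substring <- function(values, x) {
  # TRUE where a value contains at least one of the substrings in x
  Reduce(`|`, lapply(x, function(s) grepl(s, values, fixed = TRUE)))
}

periods <- c("2000MM01", "2000MM02", "2000JJ00", "2001JJ00")

# <column_name> = c(...): exact keys only
periods[keep_eq(periods, c("2000MM01", "2000MM02"))]
# <column_name> = has_substring("JJ"): all years
periods[keep_substring(periods, "JJ")]
# <column_name> = eq("2000MM01") | has_substring("JJ"): either condition
selected <- periods[keep_eq(periods, "2000MM01") | keep_substring(periods, "JJ")]
selected
# keeps "2000MM01", "2000JJ00" and "2001JJ00"
```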
By default, the columns are converted to their type (typed=TRUE). CBS uses multiple types of missing values (unknown, suppressed, not measured, missing): users wanting all these nuances can use typed=FALSE, which results in character columns.
data.frame with the requested data. Note that a csv copy of the data is stored in dir.
Besides the official CBS data, there are also third-party and preview data services implementing the same protocol. The base_url parameter allows you to specify a different server. The base_url can either be specified explicitly or set globally with options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third-party services. A download URL is constructed as either:
<base_url>/<BULK>/<id>/... for data, or
<base_url>/<API>/<id>/?$format=json for metadata.
Default values for base_url, BULK and API are set in the package options, but can be changed with:
options(cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata")
which are the default values set in the package.
The content of CBS open data is subject to the Creative Commons Attribution license (CC BY 4.0). This means that re-use of the content is permitted, provided Statistics Netherlands is cited as the source. For more information see: https://www.cbs.nl/en-gb/about-us/website/copyright
All data are downloaded using cbs_download_table().
See also: cbs_get_meta(), cbs_download_data()
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_data(), cbs_extract_table_id(), cbs_get_data_from_link()
Other query: eq(), has_substring()
## Not run:
cbs_get_data(
  id = "7196ENG"          # table id
  , Periods = "2000MM03"  # March 2000
  , CPI = "000000"        # Category code for total
)
# useful substrings:
## Periods: "JJ": years, "KW": quarters, "MM": months
## Regions: "NL", "PV": provinces, "GM": municipalities
cbs_get_data(
  id = "7196ENG"                    # table id
  , Periods = has_substring("JJ")   # all years
  , CPI = "000000"                  # Category code for total
)
cbs_get_data(
  id = "7196ENG"                        # table id
  , Periods = c("2000MM03","2001MM12")  # March 2000 and Dec 2001
  , CPI = "000000"                      # Category code for total
)
# combine either this
cbs_get_data(
  id = "7196ENG"                                # table id
  , Periods = has_substring("JJ") | "2000MM01"  # all years and Jan 2000
  , CPI = "000000"                              # Category code for total
)
# or this: note the "eq" function
cbs_get_data(
  id = "7196ENG"                                     # table id
  , Periods = eq("2000MM01") | has_substring("JJ")   # Jan 2000 and all years
  , CPI = "000000"                                   # Category code for total
)
## End(Not run)
Retrieve data from a link created from the StatLine app.
cbs_get_data_from_link( link, message = TRUE, ..., base_url = getOption("cbsodataR.base_url", BASE_URL) )
link |
url/hyperlink to opendata table made with the StatLine App |
message |
|
... |
passed on to |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
Same as cbs_get_data
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_data(), cbs_extract_table_id(), cbs_get_data()
By default, cbs_get_datasets retrieves a list of all tables and all columns. You can restrict the query by supplying multiple filter statements or by specifying the columns that should be returned.
cbs_get_datasets( catalog = "CBS", convert_dates = TRUE, select = NULL, verbose = FALSE, cache = TRUE, base_url = getOption("cbsodataR.base_url", BASE_URL), ... )
catalog |
which set of tables should be returned? |
convert_dates |
convert the columns with date-time information into DateTime (default |
select |
|
verbose |
|
cache |
|
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
... |
filter statement to select rows, e.g. Language="nl" |
Note that setting catalog to NULL results in a datasets list with all tables, including the extra catalogs.
if (interactive()){
  # retrieve the datasets in the "CBS" catalog
  ds <- cbs_get_datasets()
  ds[1:5, c("Identifier", "ShortTitle")]
  # retrieve the datasets in the "AZW" catalog
  ds_azw <- cbs_get_datasets(catalog = "AZW")
  # to retrieve all datasets of all catalogs, supply NULL
  ds_all <- cbs_get_datasets(catalog = NULL)
}
Returns a list of (simplified) maps, that can be used with CBS data.
cbs_get_maps(verbose = FALSE, cache = TRUE)
verbose |
if |
cache |
if |
data.frame with region, year and links to geojson
Other cartographic map: cbs_add_statcode_column(), cbs_get_sf(), cbs_join_sf_with_data()
if (interactive()){
  # retrieve maps
  cbs_maps <- cbs_get_maps()
  cbs_maps |> head(4)
  gemeente_map <- cbs_get_sf("gemeente", 2023, verbose=TRUE)
  # sf object
  gemeente_map
  # plot the statcodes (included in the map)
  plot(gemeente_map, max.plot = 1)
  # now connect with some data
  labor <- cbs_get_data("85268NED"
                        , Perioden = "2022JJ00"          # only 2022
                        , RegioS = has_substring("PV")  # only province
                        , verbose = TRUE
                        )
  # most conveniently
  provincie_2022_with_data <- cbs_join_sf_with_data("provincie", 2022, labor)
  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  provincie_2022_with_data |>
    subset(select = Werkloosheidspercentage_13) |>
    plot( border ="#FFFFFF99", main="unemployment rate")
  ## but of course this can also be done by hand:
  labor <- labor |>
    cbs_add_statcode_column() # add column to connect with map
  provincie_2022 <- cbs_get_sf("provincie", 2022)
  # this is a left_join(provincie_2022, labor, by = "statcode")
  provincie_2022_data <- within(provincie_2022, {
    unemployment_rate <- labor$Werkloosheidspercentage_13[match(statcode, labor$statcode)]
  })
  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  plot( provincie_2022_data[,c("unemployment_rate")]
      , border ="#FFFFFF99"
      , nbreaks = 12
      )
}
Retrieve the meta data of a CBS open data table. Caching (cache=TRUE) improves the performance considerably.
cbs_get_meta( id, catalog = "CBS", verbose = FALSE, cache = TRUE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
internal id of CBS table, can be retrieved with |
catalog |
catalog id, can be retrieved with |
verbose |
Print extra messages about what is happening. |
cache |
should the result be cached? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
The meta data of a CBS table is determined by the web API of Statistics Netherlands; cbsodataR stays close to this API. Each cbsodataR meta object has the following items, which are all data.frames:
$TableInfos: data.frame with the descriptive publication metadata of the table, such as Title, Description, Summary etc.
$DataProperties: data.frame with the Title, Description, Unit etc. of each column in the dataset that is downloaded with cbs_get_data().
$CategoryGroups: hierarchical groupings of the code columns.
$<code column>: for each code column, a data.frame with the Title, Key, Description etc. of each code/category in that column, e.g. Perioden for time codes c("2019JJ00","2018JJ00").
cbs_table object containing several data.frames with meta data (see details)
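To make the list above concrete, the sketch below builds a small, made-up stand-in for a meta object as a named list of data.frames and navigates it. The values and the selection of columns are hypothetical; a real object returned by cbs_get_meta() has more items and columns.

```r
# Hand-built, made-up stand-in for a meta object: a named list of data.frames
# using the item names listed above (columns shown are a small subset)
meta <- list(
  TableInfos     = data.frame(Title = "Example table", Summary = "Example summary",
                              stringsAsFactors = FALSE),
  DataProperties = data.frame(Title = c("Periods", "Value"), Unit = c(NA, "x 1 000"),
                              stringsAsFactors = FALSE),
  CategoryGroups = data.frame(),
  Perioden       = data.frame(Key = c("2019JJ00", "2018JJ00"), Title = c("2019", "2018"),
                              stringsAsFactors = FALSE)
)

# Every item is a data.frame
all(vapply(meta, is.data.frame, logical(1)))

# Items other than the fixed ones describe code columns
code_columns <- setdiff(names(meta), c("TableInfos", "DataProperties", "CategoryGroups"))
code_columns

# Look up the title of a time code
meta$Perioden$Title[meta$Perioden$Key == "2018JJ00"]
```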
Other meta data: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_meta()
Load meta data from a downloaded table
cbs_get_meta_from_dir(dir)
dir |
Directory where data was downloaded |
cbs_table object with meta data
Retrieve a polygon sf object that can be used for plotting. This function only provides the region boundaries.
cbs_get_sf( region, year, keep_columns = c("statcode", "statnaam"), verbose = FALSE )
region |
|
year |
|
keep_columns |
|
verbose |
if |
To use the map for plotting, add data columns to the sf data.frame returned by cbs_get_sf (e.g. using dplyr::left_join or otherwise), then use ggplot2, tmap, leaflet or any other library suited to plotting spatial data.
sf::st_sf() object with the polygons of the regions specified.
Other cartographic map: cbs_add_statcode_column(), cbs_get_maps(), cbs_join_sf_with_data()
if (interactive()){
  # retrieve maps
  cbs_maps <- cbs_get_maps()
  cbs_maps |> head(4)
  gemeente_map <- cbs_get_sf("gemeente", 2023, verbose=TRUE)
  # sf object
  gemeente_map
  # plot the statcodes (included in the map)
  plot(gemeente_map, max.plot = 1)
  # now connect with some data
  labor <- cbs_get_data("85268NED"
                        , Perioden = "2022JJ00"          # only 2022
                        , RegioS = has_substring("PV")  # only province
                        , verbose = TRUE
                        )
  # most conveniently
  provincie_2022_with_data <- cbs_join_sf_with_data("provincie", 2022, labor)
  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  provincie_2022_with_data |>
    subset(select = Werkloosheidspercentage_13) |>
    plot( border ="#FFFFFF99", main="unemployment rate")
  ## but of course this can also be done by hand:
  labor <- labor |>
    cbs_add_statcode_column() # add column to connect with map
  provincie_2022 <- cbs_get_sf("provincie", 2022)
  # this is a left_join(provincie_2022, labor, by = "statcode")
  provincie_2022_data <- within(provincie_2022, {
    unemployment_rate <- labor$Werkloosheidspercentage_13[match(statcode, labor$statcode)]
  })
  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  plot( provincie_2022_data[,c("unemployment_rate")]
      , border ="#FFFFFF99"
      , nbreaks = 12
      )
}
Get the list of tables connected to themes
cbs_get_tables_themes( ..., select = NULL, verbose = FALSE, cache = TRUE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
... |
Use this to add a filter to the query e.g. |
select |
optional names of columns to be returned. |
verbose |
Print extra messages about what is happening. |
cache |
Should the result be cached? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
A data.frame with various properties of SN/CBS themes.
Returns a list of all CBS themes.
cbs_get_themes( ..., select = NULL, verbose = TRUE, cache = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
... |
Use this to add a filter to the query e.g. |
select |
optional names of columns to be returned. |
verbose |
Print extra messages about what is happening. |
cache |
Should the result be cached? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
A data.frame with various properties of SN/CBS themes.
The filter is specified with <column_name> = <values>, in which <values> is a character vector.
Rows with values that are not part of the character vector are not returned.
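The filter itself is evaluated by the CBS server. To make the selection rule concrete, this sketch applies the same rule locally with %in% on a made-up table of contents; the toc data.frame and its values are hypothetical.

```r
# Made-up table of contents
toc <- data.frame(
  Identifier = c("T1", "T2", "T3"),
  Language   = c("nl", "en", "nl"),
  stringsAsFactors = FALSE
)

# Local analogue of the filter Language = "nl": rows whose value is not
# part of the character vector are dropped
keep_values <- c("nl")
toc_nl <- toc[toc$Language %in% keep_values, , drop = FALSE]
print(toc_nl)

# A longer vector keeps rows matching any of its values
toc_both <- toc[toc$Language %in% c("nl", "en"), , drop = FALSE]
```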
## Not run: 
# get list of all themes
cbs_get_themes()

# get list of all Dutch themes from the Catalog "CBS"
cbs_get_themes(Language="nl", Catalog="CBS")

## End(Not run)
cbs_get_toc retrieves, by default, a list of all tables and all columns.
You can restrict the query by supplying multiple filter statements or by specifying the
columns that should be returned.
cbs_get_toc( ..., convert_dates = TRUE, select = NULL, verbose = FALSE, cache = TRUE, base_url = getOption("cbsodataR.base_url", BASE_URL), include_ID = FALSE )
... |
filter statement to select rows, e.g. Language="nl" |
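convert_dates |
convert the columns with date-time information into DateTime (default TRUE). |
select |
optional names of columns to be returned. |
verbose |
Print extra messages about what is happening. |
cache |
Should the result be cached? |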
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
include_ID |
|
data.frame with identifiers, titles and descriptions of tables
By default cbs_get_toc caches its results, so subsequent calls are much faster.
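The caching mentioned above means that a repeated request is answered from memory instead of the network. The sketch below shows the general memoization idea with an R environment; it is an assumption-laden illustration, not how cbsodataR actually stores its cache, and slow_fetch() and cached_fetch() are hypothetical stand-ins.

```r
# Environment used as a simple in-memory cache (illustrative only;
# this is not how cbsodataR stores its cache)
.toc_cache <- new.env()
fetch_count <- 0

slow_fetch <- function(language) {
  # Stand-in for a network request; counts how often it really runs
  fetch_count <<- fetch_count + 1
  data.frame(Identifier = c("T1", "T2"), Language = language)
}

cached_fetch <- function(language) {
  key <- paste0("toc_", language)
  if (!exists(key, envir = .toc_cache, inherits = FALSE)) {
    assign(key, slow_fetch(language), envir = .toc_cache)
  }
  get(key, envir = .toc_cache, inherits = FALSE)
}

first  <- cached_fetch("nl")  # performs the "request"
second <- cached_fetch("nl")  # served from the cache
```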
## Not run: 
# get list of English tables
tables_en <- cbs_get_toc(Language="en")

# get list of Dutch tables
tables_nl <- cbs_get_toc(Language="nl")
View(tables_nl)

## End(Not run)
Utility function to create an sf map object with data from cbsodataR.
cbs_join_sf_with_data(region, year, x, verbose = FALSE)
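region |
character, the region type, e.g. "gemeente" or "provincie" (see cbs_get_maps()) |
year |
year of the map, e.g. 2022 |
x |
data retrieved with cbs_get_data() |
verbose |
if TRUE, print extra messages about what is happening |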
The function is a simple wrapper around cbs_add_statcode_column() and cbs_get_sf().
Please note that the resulting sf::st_sf() dataset has the same number of
rows as the map object requested with cbs_get_sf(), i.e. not the same rows as x.
It is the user's responsibility to match the correct map to the selection of the data.
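Because the rows of the result follow the map rather than x, it can help to compare the statcodes on both sides before joining. check_map_coverage() below is a hypothetical helper (not part of cbsodataR) that assumes left-join semantics: data rows whose statcode is absent from the map do not appear in the result, and map rows without data get missing values. The codes are made up; trimws() guards against trailing spaces in keys.

```r
# Hypothetical helper: report statcodes that will not line up
check_map_coverage <- function(map_codes, data_codes) {
  data_codes <- trimws(data_codes)  # keys may carry trailing spaces
  list(
    # in the data but not in the map: absent from the joined result
    not_in_map = setdiff(data_codes, map_codes),
    # in the map but not in the data: rows with missing data values
    no_data    = setdiff(map_codes, data_codes)
  )
}

map_codes  <- c("GM0001", "GM0002", "GM0003")   # made-up map statcodes
data_codes <- c("GM0002 ", "GM0003", "GM0004")  # made-up data statcodes

coverage <- check_map_coverage(map_codes, data_codes)
str(coverage)
```

An empty not_in_map is a quick sign that the map year fits the selection of the data.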
Other cartographic map: cbs_add_statcode_column(), cbs_get_maps(), cbs_get_sf()
if (interactive()){
  # retrieve maps
  cbs_maps <- cbs_get_maps()
  cbs_maps |> head(4)

  gemeente_map <- cbs_get_sf("gemeente", 2023, verbose=TRUE)
  # sf object
  gemeente_map

  # plot the statcodes (included in the map)
  plot(gemeente_map, max.plot = 1)

  # now connect with some data
  labor <- cbs_get_data("85268NED"
                       , Perioden = "2022JJ00"          # only 2022
                       , RegioS = has_substring("PV")   # only provinces
                       , verbose = TRUE
                       )

  # most conveniently
  provincie_2022_with_data <- cbs_join_sf_with_data("provincie", 2022, labor)

  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  provincie_2022_with_data |>
    subset(select = Werkloosheidspercentage_13) |>
    plot( border ="#FFFFFF99", main="unemployment rate")

  ## but of course this can also be done by hand:
  labor <- labor |>
    cbs_add_statcode_column() # add column to connect with map

  provincie_2022 <- cbs_get_sf("provincie", 2022)

  # this is equivalent to left_join(provincie_2022, labor, by = "statcode")
  provincie_2022_data <- within(provincie_2022, {
    unemployment_rate <- labor$Werkloosheidspercentage_13[match(statcode, labor$statcode)]
  })

  # better plotting options are ggplot2 or tmap,
  # but keeping dependencies low...
  plot( provincie_2022_data[,c("unemployment_rate")]
      , border ="#FFFFFF99"
      , nbreaks = 12
      )
}
Find tables containing search words.
cbs_search( query, catalog = "CBS", language = "nl", format = c("datasets", "docs", "raw"), verbose = FALSE, ... )
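query |
the search word(s), e.g. "Birth" or c("geboorte") |
catalog |
the subset in which the table is to be found, see cbs_get_catalogs(); NULL searches all catalogs |
language |
language of the tables to be searched, e.g. "nl" (Dutch, the default) or "en" (English) |
format |
format in which the result should be returned, see details |
verbose |
Print extra messages about what is happening. |
... |
not used |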
The format can be either:
datasets: the same format as cbs_get_datasets(), with an extra score column.
docs: the table results from the Solr query.
raw: the complete results from the Solr query.
if (interactive()){
  # search for tables containing the word birth
  ds_en <- cbs_search("Birth", language="en")
  ds_en[1:3, c("Identifier", "ShortTitle")]

  # or in Dutch
  ds_nl <- cbs_search(c("geboorte"), language="nl")
  ds_nl[1:3, c("Identifier", "ShortTitle")]

  # search in another catalog
  ds_rivm <- cbs_search(c("geboorte"), catalog = "RIVM", language="nl")
  ds_rivm[1:3, c("Identifier", "ShortTitle")]

  # search in all catalogs
  ds_all <- cbs_search(c("geboorte"), catalog = NULL, language="nl")

  # docs
  docs <- cbs_search(c("geboorte,sterfte"), language="nl", format="docs")
  names(docs)
  docs[1:2,]

  # raw
  raw_res <- cbs_search(c("geboorte,sterfte"), language="nl", format="raw")
  raw_res
}
This method is deprecated in favor of cbs_download_data().
download_data( id, path = file.path(id, "data.csv"), ..., select = NULL, typed = FALSE, verbose = TRUE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
of cbs open data table |
path |
of data file, defaults to "id/data.csv" |
... |
optional filter statements to select rows of the data, |
select |
optional names of columns to be returned. |
typed |
Should the data automatically be converted into integer and numeric? |
verbose |
show the underlying downloading of the data |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. See details. |
Besides the official CBS data, there are also third party and preview data services
implementing the same protocol. The base_url parameter allows you to specify a different server.
The base_url can either be specified explicitly or set globally with
options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third party services. A download URL
is constructed as follows:
<base_url>/<BULK>/<id>/... for data
<base_url>/<API>/<id>/?$format=json for metadata
Default values for base_url, BULK and API are set in the package options,
but can be changed with:
options( cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata" )
which are the default values set in the package.
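The URL patterns above can be reproduced with a few paste() calls. cbs_urls() is a hypothetical helper, not a cbsodataR function; it reads the three options with the documented defaults as fallback. The trailing "..." of the data URL is not specified here, so only its prefix is built.

```r
# Hypothetical helper: rebuild the documented URL patterns from the
# package options, falling back to the default values listed above
cbs_urls <- function(id) {
  base_url <- getOption("cbsodataR.base_url", "https://opendata.cbs.nl")
  bulk     <- getOption("cbsodataR.BULK", "ODataFeed/odata")
  api      <- getOption("cbsodataR.API", "ODataAPI/odata")
  list(
    # <base_url>/<BULK>/<id>: prefix only, the "..." part is left out
    data_prefix = paste(base_url, bulk, id, sep = "/"),
    # <base_url>/<API>/<id>/?$format=json
    meta        = paste0(paste(base_url, api, id, sep = "/"), "/?$format=json")
  )
}

urls <- cbs_urls("7196ENG")
print(urls)

# A third party server set globally, then the previous options restored
old <- options(cbsodataR.base_url = "http://example.com")
alt <- cbs_urls("7196ENG")
options(old)
print(alt$meta)
```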
Other download: cbs_download_meta(), cbs_download_table()
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_extract_table_id(), cbs_get_data(), cbs_get_data_from_link()
This method is deprecated in favor of cbs_download_meta().
download_meta( id, dir = id, ..., verbose = FALSE, cache = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
Id of CBS open data table (see cbs_get_toc()) |
dir |
Directory in which data should be stored. By default it creates a subdirectory named after the id |
... |
not used |
verbose |
Print extra messages about what is happening. |
cache |
Should meta data be cached? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol, see details. |
meta data object
Besides the official CBS data, there are also third party and preview data services
implementing the same protocol. The base_url parameter allows you to specify a different server.
The base_url can either be specified explicitly or set globally with
options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third party services. A download URL
is constructed as follows:
<base_url>/<BULK>/<id>/... for data
<base_url>/<API>/<id>/?$format=json for metadata
Default values for base_url, BULK and API are set in the package options,
but can be changed with:
options( cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata" )
which are the default values set in the package.
Other meta data: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_get_meta()
Other download: cbs_download_data(), cbs_download_table()
This method is deprecated in favor of cbs_download_table().
download_table( id, ..., dir = id, cache = FALSE, verbose = TRUE, typed = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
Identifier of CBS table (can be retrieved from cbs_get_toc()) |
... |
Parameters passed on to |
dir |
Directory where the table should be downloaded |
cache |
If metadata is cached use that, otherwise download meta data |
verbose |
Print extra messages about what is happening. |
typed |
Should the data automatically be converted into integer and numeric? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
cbs_download_table retrieves all raw meta data and data and stores these as csv
files in the directory specified by dir. It is possible to add a filter.
A filter is specified with <column_name> = <values>, in which <values> is a character vector.
Rows with values that are not part of the character vector are not returned.
meta data object of id
cbs_get_meta()
.
Other download: cbs_download_data(), cbs_download_meta()
## Not run: 
# download meta data and data from inflation/Consumer Price Indices
download_table(id="7196ENG")

## End(Not run)
Detects codes in a column. eq filters the data set at CBS: rows that have
a code that is not in x are filtered out.
eq(x, column = NULL, allowed = NULL)
x |
exact code(s) to be matched in column |
column |
name of column. |
allowed |
|
query object
Other query: cbs_get_data(), has_substring()
## Not run: 
cbs_get_data( id      = "7196ENG"   # table id
            , Periods = "2000MM03"  # March 2000
            , CPI     = "000000"    # Category code for total
            )

# useful substrings:
## Periods: "JJ": years, "KW": quarters, "MM": months
## Regions: "NL", "PV": provinces, "GM": municipalities

cbs_get_data( id      = "7196ENG"           # table id
            , Periods = has_substring("JJ") # all years
            , CPI     = "000000"            # Category code for total
            )

cbs_get_data( id      = "7196ENG"                 # table id
            , Periods = c("2000MM03","2001MM12")  # March 2000 and Dec 2001
            , CPI     = "000000"                  # Category code for total
            )

# combine either this
cbs_get_data( id      = "7196ENG"                        # table id
            , Periods = has_substring("JJ") | "2000MM01" # all years and Jan 2000
            , CPI     = "000000"                         # Category code for total
            )

# or this: note the "eq" function
cbs_get_data( id      = "7196ENG"                             # table id
            , Periods = eq("2000MM01") | has_substring("JJ")  # Jan 2000 and all years
            , CPI     = "000000"                              # Category code for total
            )
## End(Not run)
This method is deprecated in favor of cbs_get_data()
get_data( id, ..., recode = TRUE, use_column_title = recode, dir = tempdir(), base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
Identifier of table, can be found in cbs_get_toc() |
... |
optional filter statements, see details. |
recode |
recodes all codes in the code columns with their |
use_column_title |
not used. |
dir |
Directory where the table should be downloaded. Defaults to a temporary directory |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol, see details. |
To reduce the download time, the data can optionally be filtered on category values: for large tables (> 100k records) this is a wise thing to do.
The filter is specified with (see examples below):
<column_name> = <values>, in which <values> is a character vector. Rows with values that are not part of the character vector are not returned. Note that the values have to be values from the $Key column of the corresponding meta data. These may contain trailing spaces...
<column_name> = has_substring(x), in which x is a character vector. Rows with values that do not have a substring that is in x are not returned. Useful substrings are "JJ", "KW", "MM" for Periods (years, quarters, months) and "PV", "CR" and "GM" for Regions (provinces, COROP regions, municipalities).
<column_name> = eq(<values>) | has_substring(x), which combines the two statements above.
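To see how the three filter forms select rows, the sketch below applies equivalent logic locally to a made-up vector of period codes: %in% for exact values, grepl(fixed = TRUE) for substrings, and | for their union. This only mirrors the documented semantics; the actual filtering happens on the CBS server, and whether the server trims trailing spaces is not stated here, so the sketch trims them explicitly. has_sub() is a hypothetical helper.

```r
# Made-up period codes; one key carries a trailing space
periods <- c("2000JJ00", "2000MM01", "2000MM02 ", "2001KW01", "2001JJ00")
keys <- trimws(periods)

# <column_name> = <values>: exact match on the listed values
is_eq <- keys %in% c("2000MM01")

# <column_name> = has_substring(x): TRUE if a key contains any element of x
has_sub <- function(codes, x) {
  Reduce(`|`, lapply(x, function(s) grepl(s, codes, fixed = TRUE)))
}
is_year <- has_sub(keys, "JJ")

# eq(<values>) | has_substring(x): union of both selections
selected <- keys[is_eq | is_year]
print(selected)
```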
By default the columns will be converted to their type (typed=TRUE).
CBS uses multiple types of missing values (unknown, suppressed, not measured, missing): users
wanting all these nuances can use typed=FALSE, which results in character columns.
data.frame with the requested data. Note that a csv copy of the data is stored in dir.
Besides the official CBS data, there are also third party and preview data services
implementing the same protocol. The base_url parameter allows you to specify a different server.
The base_url can either be specified explicitly or set globally with
options(cbsodataR.base_url = "http://example.com").
Some further tweaking may be necessary for third party services. A download URL
is constructed as follows:
<base_url>/<BULK>/<id>/... for data
<base_url>/<API>/<id>/?$format=json for metadata
Default values for base_url, BULK and API are set in the package options,
but can be changed with:
options( cbsodataR.base_url = "https://opendata.cbs.nl", cbsodataR.BULK = "ODataFeed/odata", cbsodataR.API = "ODataAPI/odata" )
which are the default values set in the package.
The content of CBS opendata is subject to Creative Commons Attribution (CC BY 4.0). This means that the re-use of the content is permitted, provided Statistics Netherlands is cited as the source. For more information see: https://www.cbs.nl/en-gb/about-us/website/copyright
All data are downloaded using cbs_download_table().
cbs_get_meta(), cbs_download_data()
Other data retrieval: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_data(), cbs_extract_table_id(), cbs_get_data_from_link()
Other query: eq(), has_substring()
## Not run: 
cbs_get_data( id      = "7196ENG"   # table id
            , Periods = "2000MM03"  # March 2000
            , CPI     = "000000"    # Category code for total
            )

# useful substrings:
## Periods: "JJ": years, "KW": quarters, "MM": months
## Regions: "NL", "PV": provinces, "GM": municipalities

cbs_get_data( id      = "7196ENG"           # table id
            , Periods = has_substring("JJ") # all years
            , CPI     = "000000"            # Category code for total
            )

cbs_get_data( id      = "7196ENG"                 # table id
            , Periods = c("2000MM03","2001MM12")  # March 2000 and Dec 2001
            , CPI     = "000000"                  # Category code for total
            )

# combine either this
cbs_get_data( id      = "7196ENG"                        # table id
            , Periods = has_substring("JJ") | "2000MM01" # all years and Jan 2000
            , CPI     = "000000"                         # Category code for total
            )

# or this: note the "eq" function
cbs_get_data( id      = "7196ENG"                             # table id
            , Periods = eq("2000MM01") | has_substring("JJ")  # Jan 2000 and all years
            , CPI     = "000000"                              # Category code for total
            )
## End(Not run)
Load meta data from a downloaded table
get_meta_from_dir(dir)
dir |
Directory where data was downloaded |
cbs_table object with meta data
This method is deprecated in favor of cbs_get_meta()
get_meta( id, verbose = TRUE, cache = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
id |
internal id of CBS table, can be retrieved with cbs_get_toc() |
verbose |
Print extra messages about what is happening. |
cache |
should the result be cached? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
The meta data of a CBS table is determined by the web API of Statistics
Netherlands. cbsodataR stays close to this API.
Each cbsodataR object has the following metadata items, which are all data.frames:
$TableInfos: data.frame with the descriptive publication metadata of the table, such as Title, Description, Summary etc.
$DataProperties: data.frame with the Title, Description, Unit etc. of each column in the dataset that is downloaded with cbs_get_data().
$CategoryGroups: hierarchical groupings of the code columns.
$<code column>: for each code column a data.frame with the Title, Key, Description etc. of each code / category in that column, e.g. Perioden for time codes c("2019JJ00","2018JJ00").
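A hand-built stand-in for such a meta data object can make the structure concrete. The sketch below assumes only the items and columns named above (with invented values) and shows how the Key/Title table of a code column can be used to label codes, similar in spirit to what cbs_add_label_columns() is described as doing; the Perioden_label column name is an assumption for illustration.

```r
# Hand-made stand-in for a cbs_table meta data object: a named list of
# data.frames, limited to items described above (values are invented)
meta <- list(
  TableInfos = data.frame(Title = "Example table", Summary = "Illustration only"),
  DataProperties = data.frame(Title = c("Periods", "Example value"),
                              Unit  = c(NA, "x 1 000")),
  Perioden = data.frame(Key   = c("2018JJ00", "2019JJ00"),
                        Title = c("2018", "2019"),
                        stringsAsFactors = FALSE)
)

# Made-up downloaded data that only contains codes
tbl <- data.frame(Perioden = c("2019JJ00", "2018JJ00"), Value = c(10, 12),
                  stringsAsFactors = FALSE)

# Look up each code's Title in the code column's meta data
# (the column name "Perioden_label" is an assumption)
tbl$Perioden_label <- meta$Perioden$Title[match(tbl$Perioden, meta$Perioden$Key)]
print(tbl)
```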
cbs_table object containing several data.frames with meta data (see details)
Other meta data: cbs_add_date_column(), cbs_add_label_columns(), cbs_add_unit_column(), cbs_download_meta()
This method is deprecated in favor of cbs_get_toc().
get_table_list( ..., select = NULL, base_url = getOption("cbsodataR.base_url", BASE_URL) )
... |
filter statement to select rows, e.g. Language="nl" |
select |
optional names of columns to be returned. |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
data.frame with identifiers, titles and descriptions of tables
## Not run: 
# get list of English tables
tables_en <- get_table_list(Language="en")

# get list of Dutch tables
tables_nl <- get_table_list(Language="nl")
View(tables_nl)

## End(Not run)
Get the list of tables connected to themes
get_tables_themes( ..., select = NULL, base_url = getOption("cbsodataR.base_url", BASE_URL) )
... |
Use this to add a filter to the query e.g. |
select |
optional names of columns to be returned. |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
A data.frame with various properties of SN/CBS themes.
Returns a list of all CBS themes.
get_themes( ..., select = NULL, verbose = TRUE, cache = FALSE, base_url = getOption("cbsodataR.base_url", BASE_URL) )
... |
Use this to add a filter to the query e.g. |
select |
optional names of columns to be returned. |
verbose |
Print extra messages about what is happening. |
cache |
Should the result be cached? |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
A data.frame with various properties of SN/CBS themes.
The filter is specified with <column_name> = <values>, in which <values> is a character vector.
Rows with values that are not part of the character vector are not returned.
## Not run: 
# get list of all themes
get_themes()

# get list of all Dutch themes from the Catalog "CBS"
get_themes(Language="nl", Catalog="CBS")

## End(Not run)
Detects a substring in a column. has_substring filters the dataset at CBS:
rows that have a code that does not contain (one of) x are filtered out.
has_substring(x, column = NULL, allowed = NULL)
x |
substring to be detected in column |
column |
column name |
allowed |
|
Other query: cbs_get_data(), eq()
## Not run: 
cbs_get_data( id      = "7196ENG"   # table id
            , Periods = "2000MM03"  # March 2000
            , CPI     = "000000"    # Category code for total
            )

# useful substrings:
## Periods: "JJ": years, "KW": quarters, "MM": months
## Regions: "NL", "PV": provinces, "GM": municipalities

cbs_get_data( id      = "7196ENG"           # table id
            , Periods = has_substring("JJ") # all years
            , CPI     = "000000"            # Category code for total
            )

cbs_get_data( id      = "7196ENG"                 # table id
            , Periods = c("2000MM03","2001MM12")  # March 2000 and Dec 2001
            , CPI     = "000000"                  # Category code for total
            )

# combine either this
cbs_get_data( id      = "7196ENG"                        # table id
            , Periods = has_substring("JJ") | "2000MM01" # all years and Jan 2000
            , CPI     = "000000"                         # Category code for total
            )

# or this: note the "eq" function
cbs_get_data( id      = "7196ENG"                             # table id
            , Periods = eq("2000MM01") | has_substring("JJ")  # Jan 2000 and all years
            , CPI     = "000000"                              # Category code for total
            )
## End(Not run)
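The period substrings used with has_substring() ("JJ" years, "KW" quarters, "MM" months) can also be decoded locally. period_type() and period_start() are hypothetical helpers, not the implementation behind cbs_add_date_column(); they assume codes shaped like "2022JJ00", "2001KW02" and "2000MM03", where the last two digits give the quarter or month number.

```r
# Hypothetical helpers for CBS period codes (illustration only)
period_type <- function(codes) {
  type <- rep(NA_character_, length(codes))
  type[grepl("JJ", codes, fixed = TRUE)] <- "year"
  type[grepl("KW", codes, fixed = TRUE)] <- "quarter"
  type[grepl("MM", codes, fixed = TRUE)] <- "month"
  type
}

# First day of each period, assuming characters 1-4 hold the year and
# characters 7-8 hold the quarter or month number
period_start <- function(codes) {
  year  <- as.integer(substr(codes, 1, 4))
  num   <- as.integer(substr(codes, 7, 8))
  type  <- period_type(codes)
  month <- ifelse(type == "month", num,
           ifelse(type == "quarter", 3L * (num - 1L) + 1L, 1L))
  as.Date(sprintf("%04d-%02d-01", year, as.integer(month)))
}

codes  <- c("2022JJ00", "2001KW02", "2000MM03")
types  <- period_type(codes)
starts <- period_start(codes)
print(data.frame(codes, types, starts))
```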
Resolve a deeplink created in the open data portal
resolve_deeplink( deeplink, ..., base_url = getOption("cbsodataR.base_url", BASE_URL) )
deeplink |
URL of the deeplink in the open data portal |
... |
used in the query |
base_url |
optionally specify a different server. Useful for third party data services implementing the same protocol. |
information object with table id, select, filter and query statement.