This function performs a search on the school directory at uiv.cz and returns
the resulting export: either the XLS file, the data, or both.
The school directory is a version of the school register: unlike the core
register, it contains contact information but lacks some other information
(such as unique address identification). Use vz_get_register() for the core
register.
Arguments
- tables
a character vector of tables to retrieve. See Tables below.
- ...
key-value pairs of search fields. Use vz_get_search_fields() to see a list of
fields and their potential values.
- return_tibbles
Whether to return the data (if TRUE) or only download the files (if FALSE).
- write_files
Whether to write the XLS files locally.
- dest_dir
Directory in which to write XLS files. Defaults to working directory.
Value
If return_tibbles is TRUE, a named list of tibbles, with one tibble for each
table in tables under the corresponding name, unless the function was called
with a tables argument of length one, in which case the result is a single
tibble. If return_tibbles is FALSE, a character vector of paths to the
downloaded *.xls files.
Note that the downloaded XLS files are in fact HTML files, so you are best
off loading and tidying them with vz_load_directory(), though they can be
opened in Excel too.
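Because the return shape depends on the length of tables, downstream code that loops over tables may want a uniform shape. The following is a minimal sketch, not part of the package: as_table_list() is a hypothetical helper that wraps a bare tibble into a one-element named list and leaves a list of tibbles untouched. It assumes only what the Value section states.

```r
# Hypothetical helper, not part of the package: always return a named list
# of tables, whatever the length of `tables` was in the original call.
as_table_list <- function(result, tables) {
  # A tibble is a data frame; a call with a single table name returns it bare.
  if (is.data.frame(result)) {
    stopifnot(length(tables) == 1)
    result <- stats::setNames(list(result), tables)
  }
  result
}
```

With this in place, code such as lapply(as_table_list(res, tables), nrow) works the same for one table or several.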
Tables
Tables can include "addresses", "schools", "locations", "specialisations". If you need more tables based on the same query (fields), pass them into a single function call in order to avoid burdening the data provider's server (the server needs to perform a search for each function call; there is no caching and no data dumps are made available).
What this does
The function
- performs a search on the school directory at uiv.cz; by default the search is for all schools, unless ... params are set to narrow down the search
- traverses the results to the export links
- downloads the XLS files
- loads them into tibbles if return_tibbles is TRUE
This is the only way to get to the data - there are no static dumps available. At the same time, no intense web scraping takes place - only individual export files (max 4 per call) are downloaded the same way as it would be done manually.
Note
To avoid blitzing the data provider's server with many heavy requests:
- If you need more tables based on the same search, request them in one call, using the tables argument. This means that only one initial search is performed.
- Only ask for the tables you need.
- If you need a subset of the data, use the fields (...) argument.
- If you need multiple subsets of the data, try to do that via the fields (...) argument too, though that may not always be possible.
- If you are downloading a large dump and reusing it in a pipeline, keep the downloaded XLS files (or your own export) locally (setting write_files to TRUE), use caching and avoid calling this function repeatedly (ideally make any reruns conditional on the age of the stored export or use a pipeline management framework such as targets).
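One way to make reruns conditional on the age of the stored export is sketched below in base R. The helper refresh_if_stale(), its arguments and the 30-day default are assumptions made for illustration and are not part of the package. It takes the download step as a function, so the server is contacted only when the cache directory holds no *.xls file or the stored files are too old.

```r
# Hypothetical helper, not part of the package: call `download(cache_dir)`
# only if `cache_dir` has no *.xls files or any of them is older than
# `max_age_days`. Returns the paths of the cached files.
refresh_if_stale <- function(cache_dir, download, max_age_days = 30) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(cache_dir, pattern = "\\.xls$", full.names = TRUE)

  if (length(files) > 0) {
    age_days <- as.numeric(
      difftime(Sys.time(), file.mtime(files), units = "days")
    )
    # Everything on disk is recent enough: reuse it, no request is made.
    if (all(age_days <= max_age_days)) {
      return(files)
    }
  }

  # Cache is empty or stale: run the download once, then list what it wrote.
  download(cache_dir)
  list.files(cache_dir, pattern = "\\.xls$", full.names = TRUE)
}

# Usage (not run here; needs network access and the package that provides
# vz_get_directory()). Both tables come from a single search:
# paths <- refresh_if_stale(
#   "directory_export",
#   function(dir) vz_get_directory(c("addresses", "schools"), uzemi = "CZ010",
#                                  return_tibbles = FALSE, write_files = TRUE,
#                                  dest_dir = dir)
# )
```

A pipeline framework such as targets covers the same need with dependency tracking; the sketch is only meant for ad hoc scripts.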
Examples
vz_get_directory("addresses", uzemi = "CZ010", return_tibbles = TRUE, write_files = TRUE)
#> ℹ Downloaded 637.81 kB
#> # A tibble: 992 × 31
#> red_izo ico zrizovatel uzemi kraj okres spravni_urad orp nazev_orp
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 600000206 49625918 6 CZ0101 Hlav… Prah… B11000 1101 Praha 1
#> 2 600000222 61379310 6 CZ0104 Hlav… Prah… B11000 1112 Praha 12
#> 3 600000231 25765710 5 CZ0104 Hlav… Prah… B11000 1112 Praha 12
#> 4 600000249 60437171 6 CZ0105 Hlav… Prah… B11000 1113 Praha 13
#> 5 600000257 25143701 5 CZ0105 Hlav… Prah… B11000 1113 Praha 13
#> 6 600000265 25637941 5 CZ0105 Hlav… Prah… B11000 1116 Praha 16
#> 7 600000273 25642863 5 CZ0105 Hlav… Prah… B11000 1113 Praha 13
#> 8 600000290 49625063 6 CZ0107 Hlav… Prah… B11000 1107 Praha 7
#> 9 600000303 60447338 6 CZ0108 Hlav… Prah… B11000 1108 Praha 8
#> 10 600000311 61507962 5 CZ0109 Hlav… Prah… B11000 1109 Praha 9
#> # ℹ 982 more rows
#> # ℹ 22 more variables: plny_nazev <chr>, zkraceny_nazev <chr>, ulice <chr>,
#> # c_p <chr>, c_or <chr>, c_obce <chr>, psc <chr>, misto <chr>, telefon <chr>,
#> # fax <chr>, email_1 <chr>, email_2 <chr>, www <chr>,
#> # id_dat_schranky_subjektu <chr>, reditel <chr>, x <chr>, je_ovm <chr>,
#> # zuj <chr>, email_zrizovatele <chr>, id_dat_schranky_zrizovatele <chr>,
#> # pravni_forma_reditelstvi <chr>, datum_zapisu <chr>