Get school directory — vz_get_directory • vsezved

This function performs a search on the school directory at uiv.cz and returns the resulting export: the XLS files, the data, or both. The school directory is a version of the school register: unlike the core register, it contains contact information but lacks some other information (such as unique address identification). Use vz_get_register() for the core register.

Usage

vz_get_directory(
  tables = c("addresses", "schools", "locations", "specialisations"),
  ...,
  return_tibbles = FALSE,
  write_files = TRUE,
  dest_dir = getwd()
)

Arguments

tables: a character vector of tables to retrieve. See Tables below.
...: key-value pairs of search fields. Use vz_get_search_fields() to see a list of fields and their potential values.
return_tibbles: Whether to return the data (if TRUE) or only download the files (if FALSE).
write_files: Whether to write the XLS files locally.
dest_dir: Directory in which to write XLS files. Defaults to working directory.

Value

A list of tibbles if return_tibbles = TRUE (a single tibble if only one table name is passed to tables), otherwise a character vector of paths to the downloaded *.xls files.

If return_tibbles is TRUE, the result is a named list of tibbles, one for each table in tables and named accordingly, unless the function was called with a tables parameter of length one, in which case the result is a single tibble. If return_tibbles is FALSE, the result is a character vector of file paths. Note that the downloaded XLS files are in fact HTML files, so you are best off loading and tidying them with vz_load_directory(), though they can be opened in Excel too.
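Because the return type depends on the length of tables, downstream code may want to normalise it. The following is a minimal sketch, not part of vsezved: as_table_list() is a hypothetical helper that always yields a named list of data frames. The live call is guarded so the sketch can be sourced without the package or network access.

```r
# Hypothetical helper (not part of vsezved): wrap a single tibble in a
# named list so the result has the same shape for any number of tables.
as_table_list <- function(result, tables) {
  if (is.data.frame(result)) {
    # A bare data frame is only returned when one table was requested
    stopifnot(length(tables) == 1)
    result <- stats::setNames(list(result), tables)
  }
  result
}

# Usage: only runs interactively and if vsezved is installed
if (interactive() && requireNamespace("vsezved", quietly = TRUE)) {
  tables <- "addresses"
  res <- vsezved::vz_get_directory(tables, uzemi = "CZ010",
                                   return_tibbles = TRUE)
  res <- as_table_list(res, tables)
  names(res)
}
```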

Tables

Tables can include "addresses", "schools", "locations", "specialisations". If you need several tables based on the same query (fields), request them in a single function call to avoid burdening the data provider's server (the server needs to perform a search for each function call; there is no caching and no data dumps are made available).
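Since every call triggers a search on the server, it can be worth checking table names locally before calling. A minimal sketch, assuming only the four table names listed above: check_tables() is a hypothetical helper, and whether vz_get_directory() performs its own validation is not stated here.

```r
# Table names as listed in this section
valid_tables <- c("addresses", "schools", "locations", "specialisations")

# Hypothetical helper (not part of vsezved): fail early on misspelled
# table names instead of sending a request to the server.
check_tables <- function(tables) {
  unknown <- setdiff(tables, valid_tables)
  if (length(unknown) > 0) {
    stop("Unknown table name(s): ", paste(unknown, collapse = ", "))
  }
  unique(tables)
}

# Usage: one call for two tables; only runs interactively and if
# vsezved is installed
if (interactive() && requireNamespace("vsezved", quietly = TRUE)) {
  tables <- check_tables(c("addresses", "schools"))
  res <- vsezved::vz_get_directory(tables, uzemi = "CZ010",
                                   return_tibbles = TRUE)
}
```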

What this does

The function

performs a search on the school directory at uiv.cz
by default the search covers all schools, unless search fields are passed via ... to narrow it down
traverses the results to the export links
downloads the XLS files
loads them into tibbles if return_tibbles is TRUE

This is the only way to get to the data - there are no static dumps available. At the same time, no intense web scraping takes place - only individual export files (max 4 per call) are downloaded the same way as it would be done manually.

Note

To avoid blitzing the data provider's server with many heavy requests:

If you need several tables based on the same search, request them in one call, using the tables argument. This means that only one initial search is performed.
Only ask for the tables you need.
If you need a subset of the data, use the fields (...) argument.
If you need multiple subsets of the data, try to do that via the fields (...) argument too, though that may not always be possible.
If you are downloading a large dump and reusing it in a pipeline, keep the downloaded XLS files (or your own export) locally (setting write_files to TRUE), use caching and avoid calling this function repeatedly (ideally make any reruns conditional on the age of the stored export or use a pipeline management framework such as targets).
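The last point can be sketched as follows. This is one possible approach, not something the package provides: is_stale() and get_directory_cached() are hypothetical helpers, the 30-day threshold is an arbitrary assumption, and the export is stored as an RDS file of your own.

```r
# Hypothetical helper: TRUE if the file is missing or older than
# max_age_days (based on its modification time).
is_stale <- function(path, max_age_days = 30) {
  if (!file.exists(path)) return(TRUE)
  age <- difftime(Sys.time(), file.mtime(path), units = "days")
  as.numeric(age) > max_age_days
}

# Hypothetical wrapper: hit the server only when the stored export is
# stale; otherwise read the local copy. Arguments in ... are passed on
# to vz_get_directory() (tables and search fields). With the default
# write_files = TRUE, the XLS files are also written to dest_dir.
get_directory_cached <- function(cache_file, max_age_days = 30, ...) {
  if (is_stale(cache_file, max_age_days)) {
    data <- vsezved::vz_get_directory(..., return_tibbles = TRUE)
    saveRDS(data, cache_file)
  }
  readRDS(cache_file)
}

# Usage: only runs interactively and if vsezved is installed
if (interactive() && requireNamespace("vsezved", quietly = TRUE)) {
  addr <- get_directory_cached("addresses_prague.rds",
                               tables = "addresses", uzemi = "CZ010")
}
```

Rerunning the script within the chosen period reads the RDS file and sends no request to the server.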

Examples

vz_get_directory("addresses", uzemi = "CZ010", return_tibbles = TRUE, write_files = TRUE)
#> ℹ Downloaded 637.81 kB
#> # A tibble: 992 × 31
#>    red_izo   ico      zrizovatel uzemi  kraj  okres spravni_urad orp   nazev_orp
#>    <chr>     <chr>    <chr>      <chr>  <chr> <chr> <chr>        <chr> <chr>    
#>  1 600000206 49625918 6          CZ0101 Hlav… Prah… B11000       1101  Praha 1  
#>  2 600000222 61379310 6          CZ0104 Hlav… Prah… B11000       1112  Praha 12 
#>  3 600000231 25765710 5          CZ0104 Hlav… Prah… B11000       1112  Praha 12 
#>  4 600000249 60437171 6          CZ0105 Hlav… Prah… B11000       1113  Praha 13 
#>  5 600000257 25143701 5          CZ0105 Hlav… Prah… B11000       1113  Praha 13 
#>  6 600000265 25637941 5          CZ0105 Hlav… Prah… B11000       1116  Praha 16 
#>  7 600000273 25642863 5          CZ0105 Hlav… Prah… B11000       1113  Praha 13 
#>  8 600000290 49625063 6          CZ0107 Hlav… Prah… B11000       1107  Praha 7  
#>  9 600000303 60447338 6          CZ0108 Hlav… Prah… B11000       1108  Praha 8  
#> 10 600000311 61507962 5          CZ0109 Hlav… Prah… B11000       1109  Praha 9  
#> # ℹ 982 more rows
#> # ℹ 22 more variables: plny_nazev <chr>, zkraceny_nazev <chr>, ulice <chr>,
#> #   c_p <chr>, c_or <chr>, c_obce <chr>, psc <chr>, misto <chr>, telefon <chr>,
#> #   fax <chr>, email_1 <chr>, email_2 <chr>, www <chr>,
#> #   id_dat_schranky_subjektu <chr>, reditel <chr>, x <chr>, je_ovm <chr>,
#> #   zuj <chr>, email_zrizovatele <chr>, id_dat_schranky_zrizovatele <chr>,
#> #   pravni_forma_reditelstvi <chr>, datum_zapisu <chr>