Skip to contents

pointblankops extends the functionality of the {pointblank} to enable efficient row-level failure reporting.

It adds the concept of operatives as lightweight alternatives to pointblank’s agents, designed for streamlined data validation and reporting. Operatives are memory-efficient and can handle large datasets by processing them in chunks. You get information from them by debrief()ing them, which returns a tibble of failures with specified row identifiers.

Installation

You can install the development version of pointblankops from GitHub with:

# install.packages("devtools")
remotes::install_github("petrbouchal/pointblankops")

Example

This is a basic example which shows you how to use operatives for data validation:

library(pointblankops)
library(dplyr)

# Create some test data
test_data <- data.frame(
  batch = c("A", "A", "B", "B", "C"),
  id = c(1, 2, 3, 4, 5),
  value = c(10, NA, 15, 8, 12),
  category = c("X", "Y", "X", "Z", "Y")
)

# Create an operative and add validation steps
operative <- create_operative(test_data) %>%
  pointblank::col_vals_not_null(columns = dplyr::vars(value)) %>%
  pointblank::col_vals_between(columns = dplyr::vars(value), left = 5, right = 20)

# Debrief the operative to get only the failures
failures <- debrief(operative, row_id_col = c("batch", "id"))
failures
#> # A tibble: 1 × 6
#>   batch id    test_name test_type         column_name failure_details           
#>   <chr> <chr> <chr>     <chr>             <chr>       <chr>                     
#> 1 A     2     step_1    col_vals_not_null value       Failed col_vals_not_null …

For database operations, install DBI package:

For parquet file operations, install arrow package:

Key Features

  • Lightweight operatives: Streamlined alternatives to pointblank agents
  • Memory-efficient processing: Chunked processing for large datasets
  • Multiple output formats: Return tibbles, save to parquet files, or write to databases
  • Database compatibility: Works with local data frames and database tables (DuckDB, SQLite)
  • Flexible ID columns: Support for multiple row identifier columns
  • Consistent naming: Follows pointblank’s playful terminology (agents → operatives, interrogate → debrief)