| Title: | Database Queries Using 'data.table' Syntax |
|---|---|
| Description: | Query database tables over a 'DBI' connection using 'data.table' syntax. Attach database schemas to the search path. Automatically merge using foreign key constraints. |
| Authors: | Kjell P. Konis [aut, cre], Luis Rocha [ctb] (Chinook Database - see example_files/LICENSE) |
| Maintainer: | Kjell P. Konis <[email protected]> |
| License: | MPL-2.0 |
| Version: | 1.0.8 |
| Built: | 2026-06-09 03:51:38 UTC |
| Source: | https://github.com/kjellpk/dbi.table |
A dbi.table is a data structure that describes a SQL query (called the
dbi.table's underlying SQL query). This query can be manipulated
using data.table's [i, j, by] syntax.
dbi.table(conn, id, check.names = FALSE, key = NULL, stringsAsFactors = FALSE) ## S3 method for class 'dbi.table' x[i, j, by, keyby, nomatch = NA, on = NULL]dbi.table(conn, id, check.names = FALSE, key = NULL, stringsAsFactors = FALSE) ## S3 method for class 'dbi.table' x[i, j, by, keyby, nomatch = NA, on = NULL]
conn |
A |
id |
An |
check.names |
Just as |
key |
A character vector of one or more column names to set as the resulting
|
stringsAsFactors |
A logical value (default is |
x |
A |
i |
A logical expression of the columns of When When When |
j |
A list of expressions, a literal character vector of column names of
|
by |
A list of expressions, a literal character vector of column names of
|
keyby |
Same as |
nomatch |
Either |
on |
|
A dbi.table.
A key marks a dbi.table as sorted with an attribute "sorted".
The sorted columns are the key. The key can be any number of columns.
Unlike data.table, the underlying data are not physically sorted, so
there is no performance improvement. However, there remain benefits to
using keys:
The key provides a default order for window queries so that
functions like shift and
cumsum give reproducible output.
dbi.table's merge method uses a dbi.table's
key to determin the default columns to merge on in the same way
that data.table's merge method does. Note: if a
dbi.table has a foreign key relationship, that will be used
to determin the default columns to merge on before the
dbi.table's key is considered.
A table's primary key is used as the default key when it can be
determined.
Differences vs. data.table Keys
There are a few key differences between dbi.table keys and
data.table keys.
In data.table, NAs are always first. Some databases
(e.g., PostgreSQL) sort NULLs last by default and some
databases (e.g., SQLite) sort them first. as.data.frame does
not change the order of the result set returned by the database.
Note that as.data.table uses the dbi.table's key so
that the resulting data.table is sorted in the usual
data.table way.
The sort is not stable: the order of ties may change on
subsequent evaluations of the dbi.table's underlying SQL
query.
Strict Processing of Keys
By default, when previewing data (dbi.table's print
method), the key is not included in the underlying SQL query's ORDER BY
clause. However, the result set is sorted locally to resepct the key. This
behavior is referred to as a non-strict evaluation of the key and
the printed output labels the key (non-strict). To override the
default behavior for a single preview, call print explicitly and
provide the optional argument strict = TRUE. To change the default
behavior, set the option dbitable.print.strict to TRUE.
Non-strict evaluation of keys reduces the time taken to retrieve the preview.
as.data.frame to retrieve the
results set as a data.frame,
csql to see the underlying SQL query.
# open a connection to the Chinook example database using duckdb duck <- chinook.duckdb() # create a dbi.table corresponding to the Album table on duck Album <- dbi.table(duck, DBI::Id(table_name = "Album")) # the print method displays a 5 row preview # print(Album) Album # 'id' can also be 'SQL'; use the same DBI connection as Album Genre <- dbi.table(Album, DBI::SQL("chinook_duckdb.main.Genre")) # use the extract ([...]) method to subset the dbi.table Album[AlbumId < 5, .(Title, nchar = paste(nchar(Title), "characters"))] # use csql to see the underlying SQL query csql(Album[AlbumId < 5, #WHERE .(Title, #SELECT nchar = paste(nchar(Title), "characters"))])# open a connection to the Chinook example database using duckdb duck <- chinook.duckdb() # create a dbi.table corresponding to the Album table on duck Album <- dbi.table(duck, DBI::Id(table_name = "Album")) # the print method displays a 5 row preview # print(Album) Album # 'id' can also be 'SQL'; use the same DBI connection as Album Genre <- dbi.table(Album, DBI::SQL("chinook_duckdb.main.Genre")) # use the extract ([...]) method to subset the dbi.table Album[AlbumId < 5, .(Title, nchar = paste(nchar(Title), "characters"))] # use csql to see the underlying SQL query csql(Album[AlbumId < 5, #WHERE .(Title, #SELECT nchar = paste(nchar(Title), "characters"))])
Execute a dbi.table's underlying SQL query and return the
result set as a data.frame. By default, the
result set is limited to 10,000 rows. See Details.
## S3 method for class 'dbi.table' as.data.frame( x, row.names = NULL, optional = FALSE, ..., n = getOption("dbitable.max.fetch", 10000L) )## S3 method for class 'dbi.table' as.data.frame( x, row.names = NULL, optional = FALSE, ..., n = getOption("dbitable.max.fetch", 10000L) )
x |
a |
row.names |
a logical value. This argument is not used. |
optional |
a logical value. This argument is not used. |
... |
additional arguments are ignored. |
n |
an integer value. When nonnegative, the underlying SQL query includes a
'LIMIT |
By default, as.data.frame returns up to 10,000 rows (see the
n argument). To override this limit, either call
as.data.frame and provide the n argument (e.g., n = -1
to return the entire result set), or set the option
dbitable.max.fetch to the desired default value of n.
a data.frame.
as.data.frame (the generic method in the
base package).
duck <- chinook.duckdb() Artist <- dbi.table(duck, DBI::Id("Artist")) as.data.frame(Artist, n = 7)[]duck <- chinook.duckdb() Artist <- dbi.table(duck, DBI::Id("Artist")) as.data.frame(Artist, n = 7)[]
Test whether an object is a dbi.table, or coerce it if possible.
is.dbi.table(x) as.dbi.table(conn, x, type = c("auto", "query", "temporary"))is.dbi.table(x) as.dbi.table(conn, x, type = c("auto", "query", "temporary"))
x |
any R object. |
conn |
a connection handle returned by |
type |
a character string. Possible choices are |
Two types of tables are provided: Temporary (when
type == "temporary") and In Query
(when type == "query"). For Temporary, the data are
written to a SQL temporary table and the associated
dbi.table is returned. For In Query, the data are
written into a CTE as part of the query itself - useful when the
connection does not permit creating temporary tables.
a dbi.table.
The temporary tables created by this function are dropped
(by calling dbRemoveTable) during garbage
collection when they are no longer referenced.
duck <- dbi.catalog(chinook.duckdb) csql(as.dbi.table(duck, iris[1:4, 1:3], type = "query"))duck <- dbi.catalog(chinook.duckdb) csql(as.dbi.table(duck, iris[1:4, 1:3], type = "query"))
View a dbi.table's underlying SQL query.
csql(x, n = getOption("dbitable.max.fetch", 10000L), strict = FALSE)csql(x, n = getOption("dbitable.max.fetch", 10000L), strict = FALSE)
x |
a |
n |
a single integer value. When nonnegative, limits the number of rows
returned by the query to |
strict |
a logical value. If |
none (invisible NULL).
The database schema is attached to the R search path. This means that the
schema is searched by R when evaluating a variable, so that
dbi.tables in the schema can be accessed by simply giving
their names.
dbi.attach( what, pos = 2L, name = NULL, warn.conflicts = FALSE, schema = NULL, graphics = TRUE )dbi.attach( what, pos = 2L, name = NULL, warn.conflicts = FALSE, schema = NULL, graphics = TRUE )
what |
a connection handle returned by |
pos |
an integer specifying position in |
name |
a character string specifying the name to use for the attached database. |
warn.conflicts |
a logical value. If |
schema |
a character string specifying the name of the schema to attach. |
graphics |
a logical value; passed to |
an environment, the attached schema is invisibly returned.
dbi.catalog
A dbi.catalog represents a database catalog.
dbi.catalog(conn, schemas)dbi.catalog(conn, schemas)
conn |
a connection handle returned by |
schemas |
a character vector of distinct schema names. These schemas will be loaded
into the |
a dbi.catalog.
# chinook.duckdb is a zero-argument function that returns a DBI handle (db <- dbi.catalog(chinook.duckdb)) # list schemas ls(db) # list the tables in the schema 'main' ls(db$main)# chinook.duckdb is a zero-argument function that returns a DBI handle (db <- dbi.catalog(chinook.duckdb)) # list schemas ls(db) # list the tables in the schema 'main' ls(db$main)
dbi.tablesCall DBI methods using the underlying DBI connection.
## S4 method for signature 'dbi.catalog' dbAppendTable(conn, name, value, ..., row.names = NULL) ## S4 method for signature 'dbi.schema' dbAppendTable(conn, name, value, ..., row.names = NULL) ## S4 method for signature 'dbi.table' dbAppendTable(conn, name, value, ..., row.names = NULL) ## S4 method for signature 'dbi.catalog' dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) ## S4 method for signature 'dbi.schema' dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) ## S4 method for signature 'dbi.table' dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) ## S4 method for signature 'dbi.catalog,ANY' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.schema,ANY' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.table,ANY' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.catalog' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.schema' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.table' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.table,missing' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.catalog,ANY' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.schema,ANY' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.table,ANY' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.catalog,ANY' dbListFields(conn, name, ...) ## S4 method for signature 'dbi.schema,ANY' dbListFields(conn, name, ...) ## S4 method for signature 'dbi.table,ANY' dbListFields(conn, name, ...) ## S4 method for signature 'dbi.catalog' dbListObjects(conn, prefix = NULL, ...) ## S4 method for signature 'dbi.schema' dbListObjects(conn, prefix = NULL, ...) ## S4 method for signature 'dbi.table' dbListObjects(conn, prefix = NULL, ...) ## S4 method for signature 'dbi.catalog,ANY' dbQuoteIdentifier(conn, x, ...) ## S4 method for signature 'dbi.schema,ANY' dbQuoteIdentifier(conn, x, ...) ## S4 method for signature 'dbi.table,ANY' dbQuoteIdentifier(conn, x, ...) ## S4 method for signature 'dbi.catalog' dbQuoteLiteral(conn, x, ...) ## S4 method for signature 'dbi.schema' dbQuoteLiteral(conn, x, ...) ## S4 method for signature 'dbi.table' dbQuoteLiteral(conn, x, ...) ## S4 method for signature 'dbi.catalog,ANY' dbQuoteString(conn, x, ...) ## S4 method for signature 'dbi.schema,ANY' dbQuoteString(conn, x, ...) ## S4 method for signature 'dbi.table,ANY' dbQuoteString(conn, x, ...) ## S4 method for signature 'dbi.catalog,ANY' dbReadTable(conn, name, ...) ## S4 method for signature 'dbi.schema,ANY' dbReadTable(conn, name, ...) ## S4 method for signature 'dbi.table,ANY' dbReadTable(conn, name, ...) ## S4 method for signature 'dbi.catalog,ANY' dbRemoveTable(conn, name, ...) ## S4 method for signature 'dbi.schema,ANY' dbRemoveTable(conn, name, ...) ## S4 method for signature 'dbi.table,ANY' dbRemoveTable(conn, name, ...) ## S4 method for signature 'dbi.table,missing' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.catalog,ANY' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.schema,ANY' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.table,ANY' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.catalog' dbWithTransaction(conn, code, ...) ## S4 method for signature 'dbi.schema' dbWithTransaction(conn, code, ...) ## S4 method for signature 'dbi.table' dbWithTransaction(conn, code, ...) ## S4 method for signature 'dbi.catalog,ANY' dbWriteTable(conn, name, value, ...) ## S4 method for signature 'dbi.schema,ANY' dbWriteTable(conn, name, value, ...) ## S4 method for signature 'dbi.table,ANY' dbWriteTable(conn, name, value, ...)## S4 method for signature 'dbi.catalog' dbAppendTable(conn, name, value, ..., row.names = NULL) ## S4 method for signature 'dbi.schema' dbAppendTable(conn, name, value, ..., row.names = NULL) ## S4 method for signature 'dbi.table' dbAppendTable(conn, name, value, ..., row.names = NULL) ## S4 method for signature 'dbi.catalog' dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) ## S4 method for signature 'dbi.schema' dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) ## S4 method for signature 'dbi.table' dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) ## S4 method for signature 'dbi.catalog,ANY' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.schema,ANY' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.table,ANY' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.catalog' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.schema' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.table' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.table,missing' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.catalog,ANY' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.schema,ANY' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.table,ANY' dbGetQuery(conn, statement, ...) ## S4 method for signature 'dbi.catalog,ANY' dbListFields(conn, name, ...) ## S4 method for signature 'dbi.schema,ANY' dbListFields(conn, name, ...) ## S4 method for signature 'dbi.table,ANY' dbListFields(conn, name, ...) ## S4 method for signature 'dbi.catalog' dbListObjects(conn, prefix = NULL, ...) ## S4 method for signature 'dbi.schema' dbListObjects(conn, prefix = NULL, ...) ## S4 method for signature 'dbi.table' dbListObjects(conn, prefix = NULL, ...) ## S4 method for signature 'dbi.catalog,ANY' dbQuoteIdentifier(conn, x, ...) ## S4 method for signature 'dbi.schema,ANY' dbQuoteIdentifier(conn, x, ...) ## S4 method for signature 'dbi.table,ANY' dbQuoteIdentifier(conn, x, ...) ## S4 method for signature 'dbi.catalog' dbQuoteLiteral(conn, x, ...) ## S4 method for signature 'dbi.schema' dbQuoteLiteral(conn, x, ...) ## S4 method for signature 'dbi.table' dbQuoteLiteral(conn, x, ...) ## S4 method for signature 'dbi.catalog,ANY' dbQuoteString(conn, x, ...) ## S4 method for signature 'dbi.schema,ANY' dbQuoteString(conn, x, ...) ## S4 method for signature 'dbi.table,ANY' dbQuoteString(conn, x, ...) ## S4 method for signature 'dbi.catalog,ANY' dbReadTable(conn, name, ...) ## S4 method for signature 'dbi.schema,ANY' dbReadTable(conn, name, ...) ## S4 method for signature 'dbi.table,ANY' dbReadTable(conn, name, ...) ## S4 method for signature 'dbi.catalog,ANY' dbRemoveTable(conn, name, ...) ## S4 method for signature 'dbi.schema,ANY' dbRemoveTable(conn, name, ...) ## S4 method for signature 'dbi.table,ANY' dbRemoveTable(conn, name, ...) ## S4 method for signature 'dbi.table,missing' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.catalog,ANY' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.schema,ANY' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.table,ANY' dbSendStatement(conn, statement, ...) ## S4 method for signature 'dbi.catalog' dbWithTransaction(conn, code, ...) ## S4 method for signature 'dbi.schema' dbWithTransaction(conn, code, ...) ## S4 method for signature 'dbi.table' dbWithTransaction(conn, code, ...) ## S4 method for signature 'dbi.catalog,ANY' dbWriteTable(conn, name, value, ...) ## S4 method for signature 'dbi.schema,ANY' dbWriteTable(conn, name, value, ...) ## S4 method for signature 'dbi.table,ANY' dbWriteTable(conn, name, value, ...)
conn |
A |
name |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
value |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
... |
Additional parameters to pass to methods. |
row.names |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
fields |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
temporary |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
statement |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
dbObj |
A |
prefix |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
x |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
code |
Please refer to the documentation for the generic function (links can be found in the 'See Also' section). |
dbAppendTable,
dbCreateTable,
dbExecute,
dbGetInfo,
dbGetQuery,
dbListObjects,
dbReadTable,
dbQuoteIdentifier,
dbQuoteLiteral,
dbQuoteString,
dbRemoveTable,
dbSendStatement,
dbWithTransaction
These zero-argument functions return connections to the example databases included in the dbi.table package.
chinook.sqlite() chinook.duckdb()chinook.sqlite() chinook.duckdb()
a DBIConnection object, as
returned by dbConnect.
Merge two dbi.tables. By default, the columns to merge on are
determined by the first of the following cases to apply.
If x and y are each unmodified dbi.tables in
the same dbi.catalog and if there is a single foreign key
relating x and y (either x referencing
y, or y referencing x), then it is used to set
by.x and by.y.
If x and y have shared key columns, then they are
used to set by (that is,
by = intersect(key(x), key(y)) when
intersect(key(x), key(y)) has length greater than zero).
If x has a key, then it is used to set by (that is,
by = key(x) when key(x) has length greater than
zero).
If x and y have columns in common, then they are used
to set
by (that is, by = intersect(names(x), names(y)) when
intersect(names(x), names(y)) has length greater than zero).
Use the by, by.x, and by.y arguments explicitly to
override this default.
## S3 method for class 'dbi.table' merge( x, y, by = NULL, by.x = NULL, by.y = NULL, all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE, recursive = FALSE, ... )## S3 method for class 'dbi.table' merge( x, y, by = NULL, by.x = NULL, by.y = NULL, all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE, recursive = FALSE, ... )
x, y
|
|
by |
a character vector of shared column names in |
by.x, by.y
|
character vectors of column names in |
all |
a logical value. |
all.x |
a logical value. When |
all.y |
a logical value. Analogous to |
sort |
a logical value. When TRUE (default), the key of the merged
|
suffixes |
a length-2 character vector. The suffixes to be used for making
non- |
no.dups |
a logical value. When |
recursive |
a logical value. Only used when |
... |
additional arguments are passed to |
merge.dbi.table uses sql.join to join x and
y then formats the result set to match the typical merge
output.
a dbi.table.
merge.data.table,
merge.data.frame
chinook <- dbi.catalog(chinook.duckdb) #The Album table has a foreign key constriant that references Artist merge(chinook$main$Album, chinook$main$Artist) #When y is omitted, x's foreign key relationship is used to determine y merge(chinook$main$Album) #Track has 3 foreign keys: merge with Album, Genre, and MediaType merge(chinook$main$Track) #Track references Album but not Artist, Album references Artist #This dbi.table includes the artist name merge(chinook$main$Track, recursive = TRUE)chinook <- dbi.catalog(chinook.duckdb) #The Album table has a foreign key constriant that references Artist merge(chinook$main$Album, chinook$main$Artist) #When y is omitted, x's foreign key relationship is used to determine y merge(chinook$main$Album) #Track has 3 foreign keys: merge with Album, Genre, and MediaType merge(chinook$main$Track) #Track references Album but not Artist, Album references Artist #This dbi.table includes the artist name merge(chinook$main$Track, recursive = TRUE)
dbi.table vs. Reference ImplementationEvaluate an expression including at least one dbi.table and compare
the result with the Reference Implementation. This function is
primarily for testing and is potentially very slow for large tables.
reference.test( expr, envir = parent.frame(), ignore.row.order = TRUE, verbose = TRUE )reference.test( expr, envir = parent.frame(), ignore.row.order = TRUE, verbose = TRUE )
expr |
an expression involving at least one |
envir |
an environment. Where to evaluate |
ignore.row.order |
a logical value. This argument is passed to |
verbose |
a logical value. When |
a logical value.
Suppose that id1 identifies a table in a SQL database and that
[i, j, by] describes a subset/select/summarize operation using
data.table syntax. The Reference Implementation for this
operation is:
setDT(dbReadTable(conn, id1))[i, j, by]
More generally, for an expression involving multiple SQL database objects
and using data.table syntax, the Reference Implementation
would be to download each of these objects in their entirety, convert them
to data.tables, then evaluate the expression.
The goal of the dbi.table is to generate an SQL query that produces the same results set as the Reference Implementation up to row ordering.
library(data.table) duck <- dbi.catalog(chinook.duckdb) Album <- duck$main$Album Artist <- duck$main$Artist reference.test(merge(Album, Artist, by = "ArtistId"))library(data.table) duck <- dbi.catalog(chinook.duckdb) Album <- duck$main$Album Artist <- duck$main$Artist reference.test(merge(Album, Artist, by = "ArtistId"))
dbi.tablesA SQL-like join of two dbi.tables that share the
same DBI connection. All columns from
both dbi.tables are returned.
sql.join(x, y, type = "inner", on = NULL, prefixes = c("x.", "y."))sql.join(x, y, type = "inner", on = NULL, prefixes = c("x.", "y."))
x, y
|
|
type |
a character string specifying the join type. Valid choices are
|
on |
a |
prefixes |
a 2-element character vector of distinct values. When |
a dbi.table.
chinook <- dbi.catalog(chinook.duckdb) Album <- chinook$main$Album Artist <- chinook$main$Artist sql.join(Album, Artist, type = "inner", on = Album.ArtistId == Artist.ArtistId, prefixes = c("Album.", "Artist."))chinook <- dbi.catalog(chinook.duckdb) Album <- chinook$main$Album Artist <- chinook$main$Artist sql.join(Album, Artist, type = "inner", on = Album.ArtistId == Artist.ArtistId, prefixes = c("Album.", "Artist."))