Within the frame of a data.table, columns can be referred to as if they are variables.
We can use “-” on a character column within the frame of a data.table to sort in decreasing order.
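As a small illustration of the decreasing sort, the sketch below uses a made-up table instead of the flights data; the column values are assumptions invented for the example.

```r
library(data.table)

# Toy stand-in for the flights data (values invented for illustration)
dt <- data.table(
  origin    = c("JFK", "EWR", "LGA", "EWR"),
  dep_delay = c(4L, -2L, 10L, 7L)
)

# "-" on a character column inside the frame sorts it in decreasing order
ans <- dt[order(-origin)]
print(ans)
```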
We wrap the variables (column names) within list(), which ensures that a data.table is returned. In case of a single column name, not wrapping with list() returns a vector instead.
data.table also allows wrapping columns with .(). It is an alias for list(); the two mean the same thing. Feel free to use whichever you prefer.
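The difference between a bare column name and a wrapped one can be seen in a short sketch. The table here is a toy example, not the flights data.

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(arr_delay = c(13L, -5L, 20L), dep_delay = c(4L, -2L, 10L))

v  <- dt[, arr_delay]        # bare column name: a plain integer vector
d1 <- dt[, list(arr_delay)]  # wrapped in list(): a one-column data.table
d2 <- dt[, .(arr_delay)]     # .() behaves the same as list()

print(class(v))
print(class(d1))
```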
Since .() is just an alias for list(), we can name columns as we would while creating a list.
For example,
ans <- flights[, .(delay_arr = arr_delay, delay_dep = dep_delay)]
The special symbol .N is a built-in variable that holds the number of observations in the current group.
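A minimal sketch of .N with and without grouping, again on an invented table:

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(origin = c("JFK", "EWR", "JFK", "LGA", "JFK"))

# With by, .N is the number of rows in each group
counts <- dt[, .N, by = origin]

# Without by, the whole table is one group, so .N is the total row count
total <- dt[, .N]

print(counts)
print(total)
```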
We can also deselect columns using - or !.
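Both forms of deselection are sketched here on a toy table; the column names mirror the flights data but the values are made up.

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(year = c(2014L, 2014L),
                 arr_delay = c(13L, -5L),
                 dep_delay = c(4L, -2L))

# Drop the two delay columns; ! and - give the same result here
a <- dt[, !c("arr_delay", "dep_delay")]
b <- dt[, -c("arr_delay", "dep_delay")]

print(names(a))
```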
By default, .SD contains all the columns except the grouping variables.
To restrict the columns in .SD, use the .SDcols argument. It accepts either column names or column indices. For example, .SDcols = c("arr_delay", "dep_delay") ensures that .SD contains only these two columns for each group.
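The effect of .SDcols can be sketched by computing group means with and without it. The table and the extra distance column are assumptions made for the example.

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(
  origin    = c("JFK", "JFK", "LGA", "LGA"),
  arr_delay = c(10, 20, 0, 4),
  dep_delay = c(2, 4, 6, 8),
  distance  = c(100, 200, 300, 400)
)

# Default: .SD holds every column except the grouping column
all_cols <- dt[, lapply(.SD, mean), by = origin]

# .SDcols limits .SD to the named columns
two_cols <- dt[, lapply(.SD, mean), by = origin,
               .SDcols = c("arr_delay", "dep_delay")]

print(all_cols)
print(two_cols)
```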
######################################
Cited from Keys and fast binary search based subset
We can set keys on multiple columns, and the columns can be of different types. Uniqueness is not enforced.
Setting a key does two things:
- reorders the rows of the data.table by the column(s) provided, by reference and always in increasing order.
- marks those columns as key columns by setting an attribute called sorted on the data.table.
setkey() and setkeyv() modify the input data.table by reference. They return the result invisibly.
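The points above can be sketched on a toy table with a character and an integer column; the values are invented, and the duplicated "JFK" shows that key values need not be unique.

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(origin = c("LGA", "JFK", "EWR", "JFK"),
                 hour   = c(9L, 7L, 8L, 6L))

# Key on two columns of different types; dt is reordered in place
setkey(dt, origin, hour)

# setkeyv() does the same with a character vector of column names
setkeyv(dt, c("origin", "hour"))

print(key(dt))             # the key columns
print(attr(dt, "sorted"))  # the same information, stored as an attribute
print(dt)
```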
In data.table, the := operator and all the set* functions (e.g., setkey, setorder, setnames, etc.) are the only ones that modify the input object by reference.
In addition to ordering the result, keyby also sets the grouping column(s) as the key.
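A short sketch contrasting by and keyby on an invented table:

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(origin    = c("LGA", "JFK", "EWR", "JFK"),
                 dep_delay = c(1, 2, 3, 4))

# by: groups appear in order of first appearance, and no key is set
by_res <- dt[, .(mean_delay = mean(dep_delay)), by = origin]

# keyby: groups are sorted, and the result is keyed on origin
keyby_res <- dt[, .(mean_delay = mean(dep_delay)), keyby = origin]

print(key(by_res))
print(key(keyby_res))
```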
######################################
Cited from Reference semantics
:= returns the result invisibly. Sometimes it might be necessary to see the result after the assignment. We can accomplish that by adding an empty [] at the end of the query, like flights[hour == 24L, hour := 0L][].
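The same pattern on a three-row toy table, as a minimal sketch (the values are made up):

```r
library(data.table)

# Toy table (values invented for illustration)
dt <- data.table(hour = c(0L, 12L, 24L))

# Updates dt in place; nothing is printed at the console
dt[hour == 24L, hour := 0L]

# The trailing [] prints the updated table
dt[hour == 12L, hour := 13L][]
```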
The copy() function deep copies the input object and therefore any subsequent update by reference operations performed on the copied object will not affect the original object.
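To see why copy() matters, the sketch below compares a plain assignment with a deep copy. The object names alias and dup are hypothetical, chosen only for this example.

```r
library(data.table)

dt <- data.table(x = 1:3)

alias <- dt        # plain assignment: both names refer to the same object
dup   <- copy(dt)  # deep copy: an independent object

dup[, y := x * 2L]  # changes dup only
alias[, z := 0L]    # also shows up in dt

print(names(dt))
print(names(dup))
```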
######################################
Cited from Efficient reshaping using data.tables
By default, the variable column returned by melt() is of type factor. Set the variable.factor argument to FALSE if you’d like a character vector instead.
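A minimal sketch of both settings, using an invented wide table with two measure columns:

```r
library(data.table)

# Toy wide table (values invented for illustration)
wide <- data.table(id = 1:2, a = c(10, 20), b = c(30, 40))

# Default: the variable column is a factor
long_f <- melt(wide, id.vars = "id", measure.vars = c("a", "b"))

# variable.factor = FALSE: the variable column is a character vector
long_c <- melt(wide, id.vars = "id", measure.vars = c("a", "b"),
               variable.factor = FALSE)

print(class(long_f$variable))
print(class(long_c$variable))
```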