Thursday 13 July 2017

Reading Notes of R for Data Science

Quoted from the book "R for Data Science" by Hadley Wickham and Garrett Grolemund.

To set an aesthetic manually, set the aesthetic by name as an argument of your geom function; i.e. it goes outside of aes().
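Here is a minimal sketch. It assumes ggplot2 is installed and uses its bundled `mpg` dataset. The colour is passed to `geom_point()` as a plain argument, so every point is drawn blue rather than being mapped from a variable.

```r
library(ggplot2)

# colour = "blue" sits outside aes(), so it is a fixed setting, not a mapping
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), colour = "blue")

# Writing aes(colour = "blue") instead would map the constant string "blue"
# as if it were data, and the default palette would choose the colour.
p
```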

To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete.
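The sketch below, again using ggplot2's `mpg` data, facets on the discrete variable `class`. This gives one panel per car class. The `nrow = 2` argument is an optional choice that wraps the panels into two rows.

```r
library(ggplot2)

# ~ class is a one-sided formula: one panel per value of class
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
p
```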

To facet your plot on the combination of two variables, add facet_grid() to your plot call. The first argument of facet_grid() is also a formula. This time the formula should contain two variable names separated by a ~.
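This sketch facets `mpg` on drive type (rows) and number of cylinders (columns). facet_grid() lays out every combination of the two variables, including combinations that have no data.

```r
library(ggplot2)

# rows ~ columns: drv values down the side, cyl values across the top
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
p
```

To facet on only one dimension, the book uses a `.` in place of the other variable, as in facet_grid(. ~ cyl).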

The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation.

To find the variables computed by the stat, look for the help section titled “computed variables”.
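As an illustration, geom_bar() uses stat_count() by default. Its help page lists the computed variables `count` and `prop`. The sketch below maps `prop` to y and inspects the data produced by the stat. It assumes ggplot2 3.3.0 or later for `after_stat()`; the book itself wrote this as `..prop..`.

```r
library(ggplot2)

# y = after_stat(prop) uses a variable computed by stat_count();
# group = 1 makes the proportions relative to the whole dataset
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# layer_data() returns the layer's data after the stat has run
computed <- layer_data(p)
computed[, c("x", "count", "prop")]
```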

Putting these pieces together, the layered grammar of graphics gives the following template:

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>,
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
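One possible way to fill in every slot of the template is shown below, using `mpg`. The specific choices are illustrative assumptions, not recommendations. `stat = "identity"` and `coord_cartesian()` are written out even though they are already the defaults for geom_point(), and `position = "jitter"` adds random noise, so the plot differs slightly on each run.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy, colour = drv),
    stat = "identity",     # plot the raw values unchanged
    position = "jitter"    # nudge overlapping points apart
  ) +
  coord_cartesian() +      # the default coordinate system, made explicit
  facet_wrap(~ year)       # one panel per model year
p
```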

To work with relational data, you need verbs that work with pairs of tables. There are three families of verbs:
  • Mutating joins, which add new variables to one data frame from matching observations in another.
  • Filtering joins, which filter observations from one data frame based on whether or not they match an observation in the other table.
  • Set operations, which treat observations as if they were set elements.
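The following is a small sketch of one verb from each family, assuming dplyr is installed. The tibbles `x`, `y`, `df1` and `df2` are hypothetical toy data modelled on the book's examples.

```r
library(dplyr)

# Hypothetical toy tables sharing the column `key`
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

# Mutating join: adds the column val_y from y to x
mutated <- left_join(x, y, by = "key")

# Filtering join: keeps rows of x that have a match in y; no new columns
filtered <- semi_join(x, y, by = "key")

# Set operations compare whole rows of tables with the same columns
df1 <- tibble(a = c(1, 2), b = c(1, 1))
df2 <- tibble(a = c(1, 1), b = c(1, 2))
both   <- intersect(df1, df2)  # rows present in both tables
either <- union(df1, df2)      # unique rows present in either table
only1  <- setdiff(df1, df2)    # rows in df1 but not in df2
```

Note that `filtered` has exactly the columns of `x`. This is the practical difference between a filtering join and a mutating join.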
The variables used to connect each pair of tables are called keys.
  • A primary key uniquely identifies an observation in its own table.
  • A foreign key uniquely identifies an observation in another table.
A variable can be both a primary key and a foreign key.
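To check that a variable really is a primary key, you can count its values and look for duplicates. The tables below are hypothetical miniatures loosely inspired by the book's `planes` and `flights`. In `planes`, `tailnum` is a primary key. In `flights`, the same variable is a foreign key: it repeats, but each value identifies exactly one plane.

```r
library(dplyr)

# Hypothetical toy tables (not the real nycflights13 data)
planes  <- tibble(tailnum = c("N101", "N102", "N103"), seats = c(150, 180, 200))
flights <- tibble(flight = 1:4, tailnum = c("N101", "N101", "N103", "N102"))

# Primary key check: no tailnum appears more than once in planes
dup_planes <- planes %>% count(tailnum) %>% filter(n > 1)

# In flights, tailnum repeats, so it is not a primary key there
dup_flights <- flights %>% count(tailnum) %>% filter(n > 1)

# Foreign key check: every tailnum in flights matches a row in planes
unmatched <- anti_join(flights, planes, by = "tailnum")
```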

An inner join matches pairs of observations whenever their keys are equal.

The most important property of an inner join is that unmatched rows are not included in the result.
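This sketch uses the same hypothetical toy tables as the book's example. Key 3 appears only in `x` and key 4 appears only in `y`, so dplyr's inner_join() drops both rows.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

# Only keys present in both tables (1 and 2) survive
inner <- inner_join(x, y, by = "key")
inner
```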

An inner join keeps observations that appear in both tables. An outer join keeps observations that appear in at least one of the tables. There are three types of outer joins:
  • A left join keeps all observations in x.
  • A right join keeps all observations in y.
  • A full join keeps all observations in x and y.
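Using the same hypothetical `x` and `y`, the three outer joins differ only in which unmatched rows they keep. The missing values are filled with `NA`.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

left_res  <- left_join(x, y, by = "key")   # keys 1, 2, 3; val_y is NA for 3
right_res <- right_join(x, y, by = "key")  # keys 1, 2, 4; val_x is NA for 4
full_res  <- full_join(x, y, by = "key")   # keys 1, 2, 3, 4
```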

Beware that the printed representation of a string is not the same as the string itself, because the printed representation shows the escapes. To see the raw contents of the string, use writeLines().
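The following base R sketch needs no packages. It builds a string containing an escaped quote, a tab and a backslash. print() shows the escape sequences, writeLines() shows the actual characters, and nchar() confirms that each escape counts as a single character.

```r
# A quote, a tab and a backslash, each written as an escape sequence
x <- "He said \"hi\"\tand left\\"

print(x)       # shows the escapes: [1] "He said \"hi\"\tand left\\"
writeLines(x)  # shows the raw contents, with a real tab character
nchar(x)       # 22: every escape is one character in the string
```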
