Thursday 28 September 2017

R Pipe

Excerpted from the book "Beginning Data Science in R"

library(magrittr)

The %>% operator takes whatever is computed on the left side of it and inserts it as the first
argument to the function given on the right side, and it does this left to right.
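
As a small illustration of the left-to-right reading, here is a nested call next to its piped form. The vector is made up for the example and is not from the book; the two forms should compute the same value.

```r
library(magrittr)

# Hypothetical input, chosen only for illustration
v <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(v))

# Piped form: read left to right. v becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- v %>% sqrt() %>% sum()
```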


If you are already providing parameters to a function in the pipeline, the left side of %>% is just inserted before those parameters, so x %>% f(y) is equivalent to f(x, y).
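
A quick sketch of this, using base R's round() and made-up numbers: the piped value is placed first, and the argument you wrote yourself follows it.

```r
library(magrittr)

# Hypothetical values, for illustration only
v <- c(3.14159, 2.71828)

# The left side is inserted before digits = 2 ...
piped <- v %>% round(digits = 2)

# ... so the call above is the same as this one
direct <- round(v, digits = 2)
```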

Now, you cannot always be so lucky that all the functions you want to call in a pipeline take the left side of the %>% as their first parameter. When a function does not, you can still use it, because "magrittr" interprets . in a special way. If you use . in a function call in a pipeline, then that is where the left side of the %>% operation goes, instead of the default first parameter of the right side. So if you need the data to go in as the second parameter, you put a . there, since x %>% f(y, .) is equivalent to f(y, x).
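
The two calls below sketch the placeholder in use. The data are invented for the example; rep() and lm() are ordinary base R and stats functions, picked only because they do not take the piped value first.

```r
library(magrittr)

# x %>% f(y, .) is f(y, x): the piped value lands in the second position
times <- 3
repeated <- times %>% rep("ab", .)    # same as rep("ab", 3)

# A common case: modelling functions take the formula first and the data later
d <- data.frame(x = 1:10, y = 2 * (1:10) + 1)
fit <- d %>% lm(y ~ x, data = .)      # same as lm(y ~ x, data = d)
```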

The magrittr package does more with . than just changing the order of parameters. You can use . more
than once when calling a function and you can use it in expressions or in function calls.


rnorm(4) %>% data.frame(x = ., is_negative = . < 0)

rnorm(4) %>% data.frame(x = ., y = abs(.))

There is one caveat: if . appears only inside nested function calls, and never as a direct argument, the left side will still be given as the first argument to the function on the right side of %>%.

So by default, f(g(.),h(.)) gets translated into f(.,g(.),h(.)). If you want to avoid this behavior, you can put curly brackets around the function call, since {f(g(.),h(.))} is equivalent to f(g(.),h(.)).
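
To see the difference, the sketch below pipes the same made-up vector into data.frame() with and without curly brackets and compares the number of columns.

```r
library(magrittr)

# Hypothetical input, for illustration only
v <- c(0, 1, 2)

# . appears only inside nested calls, so v is still inserted as the first
# argument: this is data.frame(v, x = sin(v), y = cos(v)), three columns
with_first <- v %>% data.frame(x = sin(.), y = cos(.))

# Curly brackets suppress the automatic first argument:
# this is data.frame(x = sin(v), y = cos(v)), two columns
without_first <- v %>% { data.frame(x = sin(.), y = cos(.)) }

ncol(with_first)
ncol(without_first)
```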

While . is mainly used for providing parameters to functions in a pipeline, it can also be used as a short-hand for defining new functions.

. %>% f is equivalent to writing function(.) f(.)

f <- . %>% cos %>% sin is equivalent to f <- function(.) sin(cos(.))
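
A minimal check of that equivalence, with an explicitly written function g added here purely for comparison:

```r
library(magrittr)

# A pipeline that starts with . is not evaluated right away;
# it yields a function of one argument
f <- . %>% cos %>% sin

# The explicit equivalent
g <- function(.) sin(cos(.))

# Hypothetical input; both calls should give the same numbers
v <- c(0, 0.5, 1)
f(v)
g(v)
```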

"magrittr" has lambda expressions. This is a computer science term for anonymous functions, that is, functions that you do not give a name.

# x and y are assumed to be numeric vectors defined earlier
data.frame(x, y) %>% (function(d) {
  plot(y ~ x, data = d)
  abline(lm(y ~ x, data = d))
})


Using . and curly brackets, you can improve the readability (slightly) by writing just the body of the function and referring to its input (what was called d above) as '.'.

data.frame(x, y) %>% {
  plot(y ~ x, data = .)
  abline(lm(y ~ x, data = .))
}


When the left side is a data frame, its columns normally have to be reached through ., as in .$x. If you use the operator %$% instead of %>%, you can get to the variables just by naming them.

d <- data.frame(x = rnorm(10), y = 4 + rnorm(10))
d %>% {data.frame(mean_x = mean(.$x),
  mean_y = mean(.$y))}

or

d %$% data.frame(mean_x = mean(x), mean_y = mean(y))

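The %T>% (tee) operator works like the %>% operator, but where %>% passes on the result of the right side of the expression, %T>% passes on the result of the left side. The right side is computed but its value is not passed on, which is useful for steps you only want for their side effects, such as plotting or printing.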

d <- data.frame(x = rnorm(10), y = rnorm(10))
d %T>% plot(y ~ x, data = .) %>% lm(y ~ x, data = .)
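
The difference is easiest to see with a side-effect function whose return value is useless. The sketch below uses base R's cat(), which prints its arguments and returns NULL; the vector is made up for the example.

```r
library(magrittr)

# Hypothetical input, for illustration only
v <- c(1, 2, 3)

# With %T>%, cat() prints v, then v itself is forwarded to sum()
total_tee <- v %T>% cat("\n") %>% sum()

# With %>%, sum() receives cat()'s return value, which is NULL
total_pipe <- v %>% cat("\n") %>% sum()

total_tee
total_pipe
```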

The %<>% operator pipes the variable on its left through the pipeline and assigns the result back to that variable.

d <- read_my_data("/path/to/data")
d %<>% clean_data

This is equivalent to

d <- read_my_data("/path/to/data") %>% clean_data
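
Since read_my_data and clean_data above are placeholders, here is a runnable sketch of the same idea with a made-up vector and base R's sqrt():

```r
library(magrittr)

# Hypothetical input, for illustration only
v <- c(9, 16, 25)

# Pipe v through sqrt() and store the result back in v;
# same as v <- v %>% sqrt()
v %<>% sqrt()

v
```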