Introducing rsql: programming SQL in R
SQL is great, but it’s not R
SQL is a powerful language for manipulating data. R is a powerful language for manipulating data. Frequently, data to be analyzed in R actually comes from a database using a SQL query. Fortunately, there are a variety of great R packages for interacting with databases and everything goes smoothly.
Eventually, you find yourself wanting to do a little more of the SQL in R, since R functions tend to be easier to document, generalize, and reuse than SQL scripts. Now everything is great! You’ve got R functions that get or manipulate data in database tables before bringing the data into R. Everything works, unless you make a mistake. Unfortunately, the arguments to those functions are just strings, so R cannot check them for you, and it’s rather cumbersome to combine them programmatically. Seemingly small generalizations become more and more difficult because, ultimately, R is not SQL.
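To make the string problem concrete, here is a minimal sketch in base R. The helper build_query is hypothetical (it is not part of rsql or any database package), and the table and column names are made up for illustration.

```r
# Hypothetical helper: builds a SELECT statement from string arguments.
build_query <- function(table, where = NULL) {
  sql <- paste("SELECT * FROM", table)
  if (!is.null(where)) {
    # Each condition is an opaque string; R never looks inside it.
    sql <- paste(sql, "WHERE", paste(where, collapse = " AND "))
  }
  sql
}

# A single fixed condition is easy enough.
q1 <- build_query("x", where = "y > 0")

# Generalizing means pasting more strings together by hand, and a typo
# inside any of them goes unnoticed until the database rejects the query.
threshold <- 0
q2 <- build_query("x", where = c(paste("y >", threshold), "z = 'a'"))
```

Every new option (another column, another operator, a quoted value) adds one more round of paste calls, which is the kind of small generalization that quickly gets awkward.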
Why rsql is better
When you operate on a data frame, you write this:
x.sub = subset(x, subset=(y > 0))
Not this:
x.sub = subset(x, subset=("y > 0"))
So why would you want to write the string version just because the data is in a database?
Caveats
Unfortunately, things are just barely less awesome, because the condition has to be wrapped in .():
x.sub = subset(x, subset=.(y > 0))
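For readers wondering what a wrapper like .() buys you, here is a rough sketch of the general idea in base R. This is not rsql's actual code: both the definition of . and the helper to_where below are assumptions made purely for illustration.

```r
# Hypothetical stand-in for a quoting function; NOT rsql's implementation.
# substitute() returns the expression the caller typed, without evaluating it.
. <- function(expr) substitute(expr)

# Hypothetical helper: turn a captured R expression into a WHERE clause.
# deparse() converts the unevaluated call back into text.
to_where <- function(cond) paste("WHERE", deparse(cond))

cond <- .(y > 0)          # y is never looked up, so it need not exist in R
where_sql <- to_where(cond)
```

Because the condition arrives as an R expression rather than a string, R parses it first, so a syntax error is caught in R instead of in the database. A real translator would also have to map R operators onto their SQL equivalents, which this sketch does not attempt.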
Check it out on GitHub.