Sunita Kenner

The objective of this post is to:

  1. Supplement and document the process of connecting to an SQL database from RStudio via ODBC.
  2. Show how SQL queries can be executed directly from RStudio

The best resource to start: Follow instructions at rstudio Connect to database.

1. brew install freetds:  "--with-unixodbc has depreciated, not needed"
2. brew install psqlodbc
3. brew install mysql
4. brew install sqliteodbc
5. Find odbcinst.ini
   sudo find / -name odbcinst.ini (should be in usr/local/etc/)
   nano /usr/local/etc/odbcinst.ini

6. Add the following & save:

[PostgreSQL Driver]
Driver          = /usr/local/lib/psqlodbcw.so

[SQLite Driver]
Driver          = /usr/local/lib/libsqlite3odbc.dylib

[ODBC for FreeTDS]
Driver          = /usr/local/lib/libtdsodbc.so

7. Find odbc.ini
    sudo find / -name odbc.ini (should be in usr/local/etc/)
    nano /usr/local/etc/odbc.ini

8. Add the following & save

[PostgreSQL]
Driver              = PostgreSQL Driver
Database            = test_db
Servername          = localhost
UserName            = postgres
Password            = password
Port                = 5432

[SQLite]
Driver          = SQLite Driver
Database=/tmp/testing

[ODBC for FreeTDS]
Driver              = ODBC for FreeTDS
Database            = nameofthedatabase
Servername          = nameofserver.com
UserName            = username
Password            = passwd
Port                = Portnumber

9. In R, run the console command to check if drivers are installed. . .
odbcListDrivers()

10. run the console command 
install.packages("odbc")

Test it. Install the required libraries; details @Yihui’s book on rmarkdown: chapter Other language engines. First establish a connection to a database using DBI::dbConnect() function.

#Load required libraries
library(odbc)
library(DBI)
library(tidyverse)

#Define Connection
myconnection <- DBI::dbConnect(odbc::odbc(),
                 driver = "libtdsodbc.so",
                 server = "Server Name",
                 database = "Database name",
                 UID = "userid",
                 Pwd = "passwd",
                 port = 1433)

The established/defined connection can be accessed via the Connections pane in Rstudio.

connections_pane

connections_pane

Define an sql chunk and run SQL queries in R-markdown using: {sql, connection = myconnection, output.var = “nameoftheoutputfile”}

SELECT *, 
      convert(INT, DATENAME(YEAR, DATEADD(month, 6, First_Pur_Date))) FY_FirstPurchase,
      convert(INT, DATENAME(YEAR, DATEADD(month, 6, MaxMED))) FY_EMaintD,
      convert(INT, MONTH(First_Pur_Date)) Month_FirstPurch
FROM "databasename"."tablename"."dataset"

Use SQL query similar to case_when in R

SELECT *, 
case 
    when Churn = 'No' AND First_Purchase > 1825 AND First_Purchase < f.fpur then '5+ Years Active'
    when Churn = 'No' AND First_Purchase > 1460 AND First_Purchase < 1824 then '4+ Years Active'
    when Churn = 'No' AND First_Purchase > 1095 AND First_Purchase < 1459 then '3+ Years Active'
    when Churn = 'No' AND First_Purchase >730 AND First_Purchase <1094 then '2+ Years Active'
    when Churn = 'No' AND First_Purchase >365 AND First_Purchase <729 then '1+ Years Active'
    when Churn = 'No' AND First_Purchase <365 then 'New Active'
    else 'Inactive' 
    end segment

FROM table.Prod_Churn c inner join (select SystemId, 
                                    max(First_Pur_Date) fpur from table.Prod_Churn 
                                    GROUP by SystemID) f on f.SystemID=c.SystemID
WHERE Product = 'name_of_product'

Thanks to Mike Griffith, our IT ninja guy, and Greg Lee (SQL Ninja) who helped me through this process.

comments powered by Disqus