How to use the odbc connector with parallel processing in R?

2 years ago

#72287

John Thomas

I am using foreach in R to read data from a Snowflake database in parallel.

Here is the error I get:

Error in { : 
  task 1 failed - "nanodbc/nanodbc.cpp:1655: 22018: Numeric value '3175519|ddf' is not recognized 
<SQL> 'SELECT * FROM MYTABLE WHERE ID = 1'"

So my question is: how do I adjust my code to divide about 7 million rows into 10 equal parts and process them in parallel?

My code is as follows:

library(foreach)
library(doParallel)

#Creating the cluster
cl <- makeCluster(detectCores() -1)

# Defining packages and variables for cluster
clusterEvalQ(cl, {
  library(odbc)
  library(DBI)
  
  conn <- DBI::dbConnect(odbc::odbc(), "DSN_NAME", uid="id_name")
  NULL
})

#Register the Cluster
registerDoParallel(cl)

#Start the Parallel Loop
results <- foreach(i= 1:10, .combine = "rbind", .inorder = FALSE) %dopar% {
  sql_text <- "SELECT * FROM MYTABLE WHERE ID = ?id"
  sql_query <- DBI::sqlInterpolate(conn, sql_text, id = i)
  q <- DBI::dbGetQuery(conn, sql_query)
}

#Closing connection in all clusters
clusterEvalQ(cl, {
  dbDisconnect(conn)
})

# Stopping cluster
stopCluster(cl)
stopImplicitCluster()
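
For the splitting itself, here is a minimal base R sketch of one way to cut a vector of IDs into 10 near-equal chunks. The `ids` vector is made-up sample data and `split_into_chunks` is a hypothetical helper name, not something from odbc or foreach. In practice the IDs would presumably come from a query against MYTABLE.

```r
# Made-up sample IDs standing in for the real ID column (assumption).
ids <- sprintf("%07d|ddf", 1:25)

# Hypothetical helper: split x into n_chunks contiguous pieces whose
# sizes differ by at most one element.
split_into_chunks <- function(x, n_chunks) {
  # Never ask for more chunks than there are elements.
  n_chunks <- min(n_chunks, length(x))
  # Map position i to a chunk number in 1..n_chunks.
  group <- ceiling(seq_along(x) * n_chunks / length(x))
  unname(split(x, group))
}

chunks <- split_into_chunks(ids, 10)

# Number of chunks and the size of each one.
print(length(chunks))
print(lengths(chunks))
```

Each element of `chunks` could then be handed to one iteration of the loop, for example `foreach(chunk = chunks, ...)`, with the query filtering on that chunk's IDs. With about 7 million rows, filtering on a range or a bucket column may be more practical than sending long ID lists, but that depends on the table.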

The unique ID in my table is a character string, so I believe the following line needs to change, but I am unsure what to change it to:

sql_text <- "SELECT * FROM MYTABLE WHERE ID = ?id"
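
A possible direction, as a sketch only: the template itself may be fine, and the type of the value bound to `?id` may be what matters. `DBI::sqlInterpolate` writes a numeric value bare (hence `ID = 1` in the error) and quotes a character value as a string literal. The example uses `DBI::ANSI()` as a dummy connection so it runs without a database. The Snowflake ODBC connection is assumed to quote strings the same way, and the ID value is the one from the error message.

```r
library(DBI)

sql_text <- "SELECT * FROM MYTABLE WHERE ID = ?id"

# Dummy ANSI-compliant connection, used only so this runs without a database.
dummy_conn <- DBI::ANSI()

# A numeric value is interpolated without quotes, as in the failing query.
numeric_query <- DBI::sqlInterpolate(dummy_conn, sql_text, id = 1)

# A character value is interpolated as a quoted string literal.
character_query <- DBI::sqlInterpolate(dummy_conn, sql_text, id = "3175519|ddf")

print(numeric_query)
print(character_query)
```

Under that assumption, the loop would iterate over the actual character IDs (or pass `as.character(i)` if the IDs really are "1" to "10") instead of the integers `1:10`.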

foreach

parallel-processing

odbc

snowflake-cloud-data-platform

0 Answers
