1 year ago
#72287
John Thomas
How to use odbc connector using paralell processing in R?
So I am using foreach to access data from a snowflake database using R as well as foreach.
Here is the error I get:
Error in { :
task 1 failed - "nanodbc/nanodbc.cpp:1655: 22018: Numeric value '3175519|ddf' is not recognized
<SQL> 'SELECT * FROM MYTABLE WHERE ID = 1'"
So my question here is how do I adjust my code to divide up 7mil rows into the 10 equal parts to process in parallel.
My code is as follows:
library(foreach)
library(doParallel)
#Creating the cluster
cl <- makeCluster(detectCores() -1)
# Defining packages and variables for cluster
clusterEvalQ(cl, {
library(odbc)
library(DBI)
conn <- DBI::dbConnect(odbc::odbc(), "DSN_NAME", uid="id_name")
NULL
})
#Register the Cluster
registerDoParallel(cl)
#Start the Parallel Loop
results <- foreach(i= 1:10, .combine = "rbind", .inorder = FALSE) %dopar% {
sql_text <- "SELECT * FROM MYTABLE WHERE ID = ?id"
sql_query <- DBI::sqlInterpolate(conn, sql_text, id = i)
q <- DBI::dbGetQuery(conn, sql_query)
}
#Closing connection in all clusters
clusterEvalQ(cl, {
dbDisconnect(conn)
})
# Stopping cluster
stopCluster(cl)
stopImplicitCluster()
The unique ID in my table is a character string, so the following line needs to be changed but I am unsure to what:
sql_text <- "SELECT * FROM MYTABLE WHERE ID = ?id"
r
foreach
parallel-processing
odbc
snowflake-cloud-data-platform
0 Answers
Your Answer