Q: RODBC package: select rows in a database table that match values in an R vector

I have a list of about 150,000 ID numbers in an R vector (ids) and a big database table (dbo.datafromhell) with the columns "IDnr", "V1" and "V2" and about 1.6 million rows.

I'd like to select the rows of dbo.datafromhell whose ID number (IDnr) matches one of the values in ids:

ids <- c(1, 2, 3, 4)  # in reality: about 150,000 ID numbers

I tried the following query with a WHERE ... IN (...) list, but it used too many resources and terminated with an error:

df <- sqlQuery(mycon, paste("SELECT * FROM dbo.datafromhell WHERE IDnr IN (",paste(ids,sep="",collapse=","),")"))

I suppose my value list (ids) is too big for a WHERE ... IN (...) statement.
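To get a feel for how large that statement is, here is a minimal sketch that builds the same query string in plain R, without touching the database. The IDs 1:150000 are a hypothetical stand-in for the real values.

```r
# Hypothetical stand-in for the real ID vector (about 150,000 values)
ids <- 1:150000

# Same construction as in the question: one literal IN (...) list
sql <- paste("SELECT * FROM dbo.datafromhell WHERE IDnr IN (",
             paste(ids, collapse = ","), ")")

# Number of characters of SQL the server has to parse and plan:
# close to a million for these IDs
nchar(sql)
```

A single statement of that size is consistent with the resource error; the exact point at which a server gives up depends on the DBMS and its configuration.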

In the end I'd like a data frame that contains only the rows whose IDs appear both in the value list and in the database table:

 IDnr V1    V2
    1  A  TRUE
    2  B FALSE
    3  C  TRUE
    4  D FALSE

# in reality: about 150,000 rows (distinct IDnr values)

I also tried to write an sqlQuery that joins the database table directly to my value list, but I couldn't figure out how.

Any help is highly appreciated!

Answer 1:

The fastest option is to save the 150,000 IDs to a staging table in the database and then query both tables with a join.
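A minimal sketch of the staging-table approach with RODBC, assuming the user is allowed to create and drop tables in the database. The table name ids_staging and the helper names join_sql and fetch_via_staging are hypothetical; sqlSave, sqlQuery and sqlDrop are RODBC functions.

```r
# Builds the join between the big table and a staging table of IDs.
# "ids_staging" is a hypothetical table name.
join_sql <- function(staging = "ids_staging") {
  paste0("SELECT d.IDnr, d.V1, d.V2 ",
         "FROM dbo.datafromhell AS d ",
         "INNER JOIN ", staging, " AS s ON d.IDnr = s.IDnr")
}

# Uploads the IDs with RODBC::sqlSave, runs the join on the server, and
# drops the staging table again when the function exits.
# Assumes 'con' is an open RODBC connection with CREATE/DROP rights.
fetch_via_staging <- function(con, ids, staging = "ids_staging") {
  # unique() so that duplicated IDs do not duplicate rows in the join
  RODBC::sqlSave(con, data.frame(IDnr = unique(ids)), tablename = staging,
                 rownames = FALSE, append = FALSE)
  on.exit(RODBC::sqlDrop(con, staging), add = TRUE)
  RODBC::sqlQuery(con, join_sql(staging))
}
```

With an open connection, df <- fetch_via_staging(mycon, ids) would return only the matching rows. The matching happens inside the database, so only the roughly 150,000 result rows travel back to R.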

Answer 2:

A (slow) workaround is to split your ids into chunks of manageable size, run a query for each chunk, and rbind all the results. Something like this:

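library(RODBC)
# define how big a chunk is manageable
manSize <- 1000
# (seq_along(ids) - 1) %/% manSize puts exactly manSize ids in each chunk
df <- do.call(rbind,
              lapply(split(ids, (seq_along(ids) - 1) %/% manSize),
                     function(x) sqlQuery(mycon, paste0("SELECT * FROM dbo.datafromhell WHERE IDnr IN (",
                                                        paste(x, collapse = ","), ")"))))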
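The same chunking can be split into a query builder that is testable without a database and a fetch step that stops on the first failed query. This is a sketch; chunk_queries and fetch_in_chunks are hypothetical helper names. It relies on sqlQuery returning a character vector of error messages (rather than a data frame) when a query fails with its default errors = TRUE, which rbind would otherwise merge silently into the result.

```r
# Hypothetical helper: builds one "IN (...)" query per chunk of IDs.
chunk_queries <- function(ids, chunk_size = 1000) {
  # ceiling(i / chunk_size) numbers the chunks 1, 2, 3, ... with exactly
  # chunk_size IDs each; only the last chunk may be smaller
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  vapply(chunks, function(x) {
    # format(..., scientific = FALSE) keeps a double like 1e5 as "100000"
    # instead of the "1e+05" that paste()/as.character() would produce
    id_list <- paste(format(x, scientific = FALSE, trim = TRUE), collapse = ",")
    paste0("SELECT * FROM dbo.datafromhell WHERE IDnr IN (", id_list, ")")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: runs each chunk query through RODBC and binds the
# results, stopping instead of rbind-ing an error message.
fetch_in_chunks <- function(con, ids, chunk_size = 1000) {
  results <- lapply(chunk_queries(ids, chunk_size), function(q) {
    res <- RODBC::sqlQuery(con, q)
    if (!is.data.frame(res)) stop(paste(res, collapse = "\n"))
    res
  })
  do.call(rbind, results)
}
```

Assuming mycon is an open RODBC connection, fetch_in_chunks(mycon, ids) returns the same rows as the loop above; 150,000 IDs in chunks of 1000 still means 150 round trips, which is why this is the slower option.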

Tags: r, join, where, rodbc