Q:Data table assignment not working

Ok so I am cleaning a large dataset and trying to speed things up by changing data frame code to data table. I'm having trouble with conditional assignment of missing value codes. Toy example:

    X = data.table(grp=c("a","a","b","b","b","c","c","d","d","d","d"), 
    foo=c(1:4,NA,6:7,NA,8:10))
    setkey(X,grp)
    err.code <-"1111"
    row.select <- row.names(X)[X$grp=="b" & is.na(X$foo)]

    # Replace missing value for group b with err.code
    X[row.select, foo:=err.code]

So I want to put the err.code into specific cells meeting criteria. Yet the above has not assigned anything. e.g.

    > X
        grp foo
     1:   a   1
     2:   a   2
     3:   b   3
     4:   b   4
     5:   b  NA
     6:   c   6
     7:   c   7
     8:   d  NA
     9:   d   8
    10:   d   9
    11:   d  10

What am I missing here?

answer1:

Two problems I see:

  1. You're trying to replace a value in a numeric column with a character. data.table doesn't like that unless you explicitly convert the column types to match each other.
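  2. You're trying to index the row by the character value "5" rather than the numeric value 5. On a keyed data.table, a character value in i is matched against the key column (here grp), not treated as a row number.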
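
To make the second point concrete, here is a minimal sketch on a cut-down copy of the toy table. The names by_key, no_hit and by_pos are only illustrative, and nomatch = 0L is used so that the non-matching lookup returns zero rows instead of an NA row:

```r
library(data.table)

# Small keyed table mirroring the question's setup
X <- data.table(grp = c("a", "a", "b", "b", "b"), foo = c(1:4, NA))
setkey(X, grp)

by_key <- X["b"]                 # character i: join on the key, rows where grp == "b"
no_hit <- X["5", nomatch = 0L]   # no group is called "5", so nothing matches
by_pos <- X[5]                   # numeric i: the fifth row

print(nrow(by_key))
print(nrow(no_hit))
print(by_pos)
```

Since "5" matches no value of grp, an assignment using that character index has no rows to update.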

Thus, the following should work:

    err.code <- 1111
    row.select <- as.numeric(row.names(X)[X$grp=="b" & is.na(X$foo)])
    X[row.select, foo := err.code][]
    #     grp  foo
    #  1:   a    1
    #  2:   a    2
    #  3:   b    3
    #  4:   b    4
    #  5:   b 1111
    #  6:   c    6
    #  7:   c    7
    #  8:   d   NA
    #  9:   d    8
    # 10:   d    9
    # 11:   d   10

Alternatively, without creating those extra variables:

    X[grp == "b" & is.na(foo), foo := 1111]
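
If you do want to keep a separate row selector, a sketch along these lines avoids row.names() altogether by computing integer positions with which(). The variable name idx is just for illustration, and the code is written as 1111L on the assumption that foo should stay an integer column, as it is in the toy data:

```r
library(data.table)

# Rebuild the toy table from the question
X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

err.code <- 1111L   # integer, to match the integer column foo

# Integer row positions of the missing values in group b
idx <- X[, which(grp == "b" & is.na(foo))]

# Assign by reference to just those rows
X[idx, foo := err.code]
print(X)
```

Only the missing value in group b is replaced; the NA in group d is left alone.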

If you do need the code to be a character value, so that the column types differ, you'll need to convert the column explicitly first:

    err.code <- "1111"
    row.select <- as.numeric(row.names(X)[X$grp=="b" & is.na(X$foo)])
    X[, foo := as.character(foo)][row.select, foo := err.code][]
    #     grp  foo
    #  1:   a    1
    #  2:   a    2
    #  3:   b    3
    #  4:   b    4
    #  5:   b 1111
    #  6:   c    6
    #  7:   c    7
    #  8:   d   NA
    #  9:   d    8
    # 10:   d    9
    # 11:   d   10
    str(.Last.value)
    # Classes ‘data.table’ and 'data.frame':    11 obs. of  2 variables:
    # $ grp: chr  "a" "a" "b" "b" ...
    # $ foo: chr  "1" "2" "3" "4" ...
    # - attr(*, ".internal.selfref")=<externalptr> 
    # - attr(*, "sorted")= chr "grp"

r  data.table  variable-assignment