Q: How do I get the index of each item in a groupby object in Pandas?

I call groupby on a dataframe using the columns I want, and then I need the position of each item within its group. By position I mean that if a group has 10 items, the values run from 0 to 9; I do not mean the dataframe index.

My code for doing this is below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(0, 11, 10 ** 3), 'B': np.random.randint(0, 11, 10 ** 3), 
                   'C': np.random.randint(0, 11, 10 ** 3), 'D': np.random.randint(0, 2, 10 ** 3)})

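# Map each group key to the list of dataframe index labels in that group.
grouped_by = df.groupby(["A", "B", "C"])
groups = dict(list(grouped_by))
index_dict = {k: v.index.tolist() for k, v in groups.items()}
# For each row, look up where its label sits in its group's list.
df["POS"] = df.apply(lambda x: index_dict[(x["A"], x["B"], x["C"])].index(x.name), axis=1)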

The dataframe here is just an example.

Is there a way to use grouped_by to achieve this?

Answer 1:

Here's a solution using cumcount() on a dummy column to generate an item index for each group. It should also be significantly faster than the row-wise apply.

In [122]: df['dummy'] = 0
     ...: df["POS"] = df.groupby(['A','B','C'])['dummy'].cumcount()
     ...: df = df.drop('dummy', axis=1)

As @unutbu noted, it is even cleaner to skip the dummy column and call cumcount() directly on the groupby object:

df["POS"] = df.groupby(['A','B','C']).cumcount()
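
To make the result easy to check by eye, here is a minimal sketch on a tiny made-up frame. The column values and the two-column grouping are assumptions for illustration only, not the asker's data. Each row should receive its 0-based position within its (A, B) group, in original row order:

```python
import pandas as pd

# Small hypothetical frame (not the data from the question).
df = pd.DataFrame({
    "A": [1, 1, 2, 1, 2, 1],
    "B": [0, 0, 5, 0, 5, 3],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... in their original order.
df["POS"] = df.groupby(["A", "B"]).cumcount()

print(df["POS"].tolist())  # [0, 1, 0, 2, 1, 0]
```

Rows 0, 1 and 3 share the key (1, 0) and get 0, 1, 2; rows 2 and 4 share (2, 5) and get 0, 1; row 5 is alone in (1, 3) and gets 0.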

python  pandas