Pandas group in series

Time:09-02

Given

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value':['a','b','c','d','e']})

I need to treat a and b as one group, c as a second group, and d and e as a third group. How do I get the first element from every group? Expected output:

pd.DataFrame({'group': [1, 2, 1], 'value': ['a', 'c', 'd']})

CodePudding user response:

Try this:

df1 = df[df['group'].ne(df['group'].shift())]

Check this answer for more details
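
To spell out why the line above works, here is a minimal sketch, assuming that "group" means a run of consecutive equal values in the group column. The names starts, run_id, first_rows and first_by_run are only illustrative. Comparing each row with the previous one marks where a run starts, and the cumulative sum of those marks gives every run its own label, which can also be fed to groupby:

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})

# True wherever the value differs from the previous row, i.e. a new run starts.
# The first row is compared with NaN, so it always counts as a start.
starts = df['group'].ne(df['group'].shift())

# Cumulative sum of the run starts gives each consecutive run its own label.
# Renamed so the label does not share a name with the 'group' column.
run_id = starts.cumsum().rename('run')

# Option 1: keep only the rows where a run starts
first_rows = df[starts]

# Option 2: group by the run label and take the first row of each run
first_by_run = df.groupby(run_id).first()

print(first_rows)
print(first_by_run)
```

Both options pick rows a, c and d here. Option 1 keeps the original index, while option 2 is indexed by the run label and can be swapped for other aggregations such as last() or size().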

CodePudding user response:

You haven't specified whether the group column determines which values belong to the same group. So I'm assuming it has no connection, and that you specify your groups in the groups list:

import numpy as np
import pandas as pd

groups = [['a', 'b'], ['c'], ['d', 'e']]

# One boolean mask per group, and one integer label per group
condlist = [df['value'].isin(group) for group in groups]
choicelist = list(range(len(groups)))
group_idx = np.select(condlist, choicelist)

df.groupby(group_idx).first()

Result:

   group value
0      1     a
1      2     c
2      1     d
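
One caveat with this approach: np.select fills rows that match no condition with its default, which is 0, so a value missing from the groups list would silently land in the first group. A possible guard, sketched below, is to pass default=-1 and filter those rows out. The extra row 'z' is made up for illustration and is not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's frame plus one value ('z') in no group
df = pd.DataFrame({'group': [1, 1, 2, 1, 1, 3],
                   'value': ['a', 'b', 'c', 'd', 'e', 'z']})
groups = [['a', 'b'], ['c'], ['d', 'e']]

condlist = [df['value'].isin(group) for group in groups]
choicelist = list(range(len(groups)))

# default=-1 marks values that belong to no listed group
group_idx = np.select(condlist, choicelist, default=-1)

# Keep only rows that were assigned to a real group before grouping
known = group_idx >= 0
result = df[known].groupby(group_idx[known]).first()

print(result)
```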

CodePudding user response:

You can create your groups and map them to a reduced output:

import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})
groups = [['a', 'b'], ['c'], ['d', 'e']]
# Map every value to the index of the group it belongs to
mappings = {k: i for i, gr in enumerate(groups) for k in gr}

print(
    df.groupby(df['value'].map(mappings)).first()
)

Result:

       group value
value             
0          1     a
1          2     c
2          1     d