Home > database >  How to find the next row that have a value in column in a dataframe pandas?
How to find the next row that have a value in column in a dataframe pandas?

Time:07-16

I have a dataframe such as:

id     info    date     group    label
1       aa     02/05        1        7
2       ba     02/05        1        8
3       cp     09/05        2        7
4       dd     09/05        2        8
5       ii     09/05        2        9

Every group should have the numbers 7, 8 and 9. In the example above, the group 1 does not have the three numbers, the number 9 is missing. In that case, I would like to find the closest row with a 9 in the label, and add it to the dataframe, also changing the date to the group's date.

So the desired result would be:

id     info    date     group    label
1       aa     02/05        1        7
2       ba     02/05        1        8
6       ii     02/05        1        9
3       cp     09/05        2        7
4       dd     09/05        2        8
5       ii     09/05        2        9

CodePudding user response:

Welcome to SO. Its good if you include what you have tried so far so keep that in mind. Anyhow for this question, breakdown your thought process into pandas syntax. Like first step would be to check what group do not have which label from [8,9]:

dfs = df.groupby(['group', 'date']).agg({'label':set}).reset_index().sort_values('group')
dfs['label'] = dfs['label'].apply(lambda x: {8, 9}.difference(x)).explode() # This is the missing label
dfs

Which will give you:

group date label
1 02/05 9
2 09/05 nan

Now merge it with original on label and have info filled in:

final_df = pd.concat([df, dfs.merge(df[['label', 'info']], on='label', suffixes=['','_grouped'])])
final_df
id info date group label
1 aa 02/05 1 7
2 ba 02/05 1 8
3 cp 09/05 2 7
4 dd 09/05 2 8
5 ii 09/05 2 9
nan ii 02/05 1 9

And prettify:

final_df.reset_index(drop=True).reset_index().assign(id=lambda x:x['index'] 1).drop(columns=['index']).sort_values(['group', 'id'])
id info date group label
1 aa 02/05 1 7
2 ba 02/05 1 8
6 ii 02/05 1 9
3 cp 09/05 2 7
4 dd 09/05 2 8
5 ii 09/05 2 9
  • Related