I have a dataframe containing three columns: 'Bin No', 'Intervals' and 'Other', where the first two contain numerical values and the last contains values of object type. I'm trying to collapse the rows if a value in 'Bin No' exists more than once, keeping values from the other two columns. My current DataFrame looks like the following:
Intervals Bin No Other
0 1.5 1
1 2.5 2
4 3.5 3
6 4.5 4
8 5.5 5
9 8.5 6
11 12.5 7
2 16.0 8
3 22.0 9
5 37.0 10
12 10 [-99999997, -3]
7 61.0 11
10 87.0 12
13 14 -2
14 15 -1
In this case 10 appears twice in 'Bin No' so the dataframe should look like:
Bin No Intervals Other
0 1 1.5
7 2 2.5
8 3 3.5
9 4 4.5
10 5 5.5
11 6 8.5
12 7 12.5
13 8 16.0
14 9 22.0
2 10 37.0 [-99999997, -3]
3 11 61.0
4 12 87.0
5 14 -2
6 15 -1
I've managed to find a solution using the following code, however I'm concerned this may create issues later on with df.drop_duplicates
:
df_one = df.groupby('Bin No', as_index=False).agg(''.join)
df_two = pd.merge(df_two, df.drop('Intervals', axis=1), how='left', on='Bin No')
df_three = df_two.drop_duplicates('Bin No', keep='last')
Is there a more effective and less ambiguous solution?
CodePudding user response:
You could replace the empty string with NaN values; then use groupby
first
(take advantage of the fact that first
skips NaNs). You could replace NaN back to empty strings but I guess that's kind of redundant:
out = df.replace('', np.nan).groupby('Bin No', as_index=False)[['Intervals', 'Other']].first()
out['Other'] = out['Other'].fillna('')
Output:
Bin_No Intervals Other
0 1 1.5
1 2 2.5
2 3 3.5
3 4 4.5
4 5 5.5
5 6 8.5
6 7 12.5
7 8 16.0
8 9 22.0
9 10 37.0 [-99999997, -3]
10 11 61.0
11 12 87.0
12 14 NaN -2
13 15 NaN -1