I have the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan], [1005, 160, np.nan],
     [1006, np.nan, 8], [1010, 160, np.nan], [1010, np.nan, 4]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)
| | CustomerNr | Period1 | Period2 |
|---|---|---|---|
| 0 | 1001 | 120.0 | NaN |
| 1 | 1001 | NaN | 30.0 |
| 2 | 1004 | 160.0 | NaN |
| 3 | 1005 | 160.0 | NaN |
| 4 | 1006 | NaN | 8.0 |
| 5 | 1010 | 160.0 | NaN |
| 6 | 1010 | NaN | 4.0 |
I need to generate the following, where duplicated CustomerNr rows are collapsed into a single row but the values of Period1 and Period2 are kept:
| | CustomerNr | Period1 | Period2 |
|---|---|---|---|
| 0 | 1001 | 120.0 | 30.0 |
| 1 | 1004 | 160.0 | NaN |
| 2 | 1005 | 160.0 | NaN |
| 3 | 1006 | NaN | 8.0 |
| 4 | 1010 | 160.0 | 4.0 |
CodePudding user response:
df.groupby('CustomerNr').agg('min')
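This works for the data in the question because each customer has at most one non-NaN value per period. As a rough sketch of where 'min' could differ from taking the first value, here is a made-up frame (`demo` is hypothetical, not from the question) in which one customer has two different Period1 values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: customer 1001 has two different Period1 values.
demo = pd.DataFrame(
    [[1001, 120, np.nan], [1001, 90, 30]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)

# 'min' keeps the smallest non-NaN value per column within each group.
by_min = demo.groupby('CustomerNr').agg('min')

# first() keeps the first non-NaN value per column within each group.
by_first = demo.groupby('CustomerNr').first()

print(by_min)
print(by_first)
```

If a customer can have several values in the same period, you would need to decide which one to keep before picking an aggregation.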
CodePudding user response:
You can use groupby and take the first item per group. GroupBy.first returns the first non-null value in each column, so the NaNs are skipped by default:
df.groupby('CustomerNr').first()
output:
Period1 Period2
CustomerNr
1001 120.0000 30.0000
1004 160.0000 NaN
1005 160.0000 NaN
1006 NaN 8.0000
1010 160.0000 4.0000
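The output above has CustomerNr as the index, whereas the desired result in the question shows it as a regular column with a 0..4 index. A minimal sketch of one way to get that shape, assuming the same `df` as in the question (the name `result` is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan], [1005, 160, np.nan],
     [1006, np.nan, 8], [1010, 160, np.nan], [1010, np.nan, 4]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)

# Take the first non-null value per column in each group, then move
# CustomerNr from the index back into a regular column.
result = df.groupby('CustomerNr').first().reset_index()

print(result)
```

Passing `as_index=False` to groupby should give the same layout without the extra `reset_index()` call.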