I'm doing a university job with a pandas dataframe, as below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
'col3': [np.NaN, 2, 30, 44, np.NaN]
})
print(df)
col1 sensor_name col2 col3
NaN water NaN NaN
11.2 strain NaN 2.0
10.0 fog 30.0 30.0
NaN water 40.0 44.0
1000.0 fog NaN NaN
I would like to join columns 1, 2 and 3, to avoid the NaN values. The priority would be the value in col1, if any. Then the priority would be 'col2' and finally 'col3'.
I tried to do the following code:
df['new_column'] = df.ffill(axis=1)['col3']
The output is:
col1 sensor_name col2 col3 new_column
NaN water NaN NaN water
11.2 strain NaN 2.0 2.0
10.0 fog 30.0 30.0 30.0
NaN water 40.0 44.0 44.0
NaN fog NaN NaN fog
However, the desired output is:
col1 sensor_name col2 col3 new_column
NaN water NaN NaN NaN
11.2 strain NaN 2.0 11.2
10.0 fog 30.0 30.0 10.0
NaN water 40.0 44.0 40.0
1000.0 fog NaN NaN 1000.0
CodePudding user response:
One of the many ways you can achieve is using pandas.apply
function.
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
'col3': [np.NaN, 2, 30, 44, np.NaN]
})
print(df)
def apply_func(row):
if not pd.isna(row['col1']):
return row['col1']
elif not pd.isna(row['col2']):
return row['col2']
return row["col3"]
df["new_cols"]=df.apply(apply_func,axis=1)
print(df)
Output:
col1 sensor_name col2 col3 new_cols
0 NaN water NaN NaN NaN
1 11.2 strain NaN 2.0 11.2
2 10.0 fog 30.0 30.0 10.0
3 NaN water 40.0 44.0 40.0
4 1000.0 fog NaN NaN 1000.0
CodePudding user response:
try this:
f = df.filter(regex='col\d')
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))
print(res)
>>>
col1 sensor_name col2 col3 new_column
0 NaN water NaN NaN NaN
1 11.2 strain NaN 2.0 11.2
2 10.0 fog 30.0 30.0 10.0
3 NaN water 40.0 44.0 40.0
4 1000.0 fog NaN NaN 1000.0
CodePudding user response:
df['new_column'] = df['col1']
index = df[df['new_column'].isna()].index
df.iloc[index,4]= df.iloc[index,2]
index = df[df['new_column'].isna()].index
df.iloc[index,4]= df.iloc[index,3]
You could create a for loop if you had more columns to consider and/or wanted it to be more automated.
CodePudding user response:
Filter the column
df['new'] = df.filter(like='col').bfill(1)['col1']
df
Out[324]:
col1 sensor_name col2 col3 new
0 NaN water NaN NaN NaN
1 11.2 strain NaN 2.0 11.2
2 10.0 fog 30.0 30.0 10.0
3 NaN water 40.0 44.0 40.0
4 1000.0 fog NaN NaN 1000.0