This is the sample data provided and I need to combine the columns to single column which consists of all the fruit names without any Nan Values.
|Fruit1 | Fruit2| Fruit3 |
|:------ |:-----:| ------:|
|NaN | Apple | NaN |
|Pear | NaN | NaN |
|NaN | NaN | Orange |
|Mango | NaN | NaN |
|NaN | banana| NaN |
This should be the output:
|Fruits |
|------- |
|Apple |
|Pear |
|Orange |
|Mango |
|banana |
CodePudding user response:
I would use bfill()
:
df = pd.DataFrame({
'fruit_1': [None, 'Pear', None, None],
'fruit_2': ['Apple', None, None, None],
'fruit_3': [None, None, 'Orange', None]})
df.bfill(axis=1).iloc[:,0].rename('fruits') # returns
0 Apple 1 Pear 2 Orange 3 None Name: fruits, dtype: object
(or ffill()
and use the last column)
It also works for rows containing None
only.
CodePudding user response:
Assuming you have only one non-NaN per row, you can stack
:
df.stack().droplevel(1).to_frame(name='Fruits')
Output:
Fruits
0 Apple
1 Pear
2 Orange
3 Mango
4 banana
handling rows with only NaNs:
df.stack().droplevel(1).to_frame(name='Fruits').reindex(df.index)
output assuming banana is a NaN:
Fruits
0 Apple
1 Pear
2 Orange
3 Mango
4 NaN
CodePudding user response:
I think this should give the desired output -
df['Fruit1'].fillna(df['Fruit2'])
CodePudding user response:
We can use combine_first
here:
df["Fruits"] = df["Fruit1"].combine_first(df["Fruit2"])
We can also use np.where
:
df["Fruits"] = np.where(df["Fruit1"].isnull(), df["Fruit2"], df["Fruit1"])
CodePudding user response:
We can use a modification of an approach from Transform Multiple Columns Into One With Pandas to combine columns:
df['new'] = df.fillna('').sum(1)
Explanation
- replace all nan values with an empty string
- sum(1), is summing the df row by row. Since values in row are strings, it will join them together
Example Usage
from io import StringIO
# Create DataFrame from OP data
s = '''Fruit1,Fruit2,Fruit3
NaN,Apple,NaN
Pear,NaN,NaN
NaN,NaN,Orange
Mango,NaN,NaN
NaN,banana,NaN'''
df = pd.read_csv(StringIO(s))
print(df)
Initial DataFrame
Fruit1 Fruit2 Fruit3
0 NaN Apple NaN
1 Pear NaN NaN
2 NaN NaN Orange
3 Mango NaN NaN
4 NaN banana NaN
df['New']=df.fillna('').sum(1)
print(df)
Updated DataFrame
Fruit1 Fruit2 Fruit3 new
0 NaN Apple NaN Apple
1 Pear NaN NaN Pear
2 NaN NaN Orange Orange
3 Mango NaN NaN Mango
4 NaN banana NaN banana