I have data in a DataFrame like the one below: each column holds values, with NaN appearing in between. If I use df.dropna(), it leaves behind empty cells in a column, which I don't want. Instead, I want to remove the NaNs and strip the blanks so that only the host* values remain, with the rest moved up to close the gaps.
The actual post related to this is here.
This is my dataframe:
import pandas as pd

df = pd.read_csv("server.csv", usecols=['name', 'managed_by'])
df = df.pivot(columns='managed_by', values='name')
The code above produces:
Sam Peter Jesse Patrick Banu
host1 host5 host7 host9 host10
host2 host6 host8 host11
host3 Nan Nan Nan
host4 Nan Nan Nan
Nan host22 Nan Nan
host24 Nan Nan Nan
host23 Nan Nan Nan
I want the following:
Sam Peter Jesse Patrick Banu
host1 host5 host7 host9 host10
host2 host6 host8 host11
host3 host22
host4
host23
Any help would be much appreciated.
Answer:
If you have real NaNs, use apply with dropna and reset_index:
df.apply(lambda c: c.dropna().reset_index(drop=True))
Or, with concat:
pd.concat([df[c].dropna().reset_index(drop=True) for c in df], axis=1)
output:
Sam Peter Jesse Patrick Banu
0 host1 host5 host7 host9 host10
1 host2 host6 host8 NaN host11
2 host3 host22 NaN NaN NaN
3 host4 NaN NaN NaN NaN
4 host24 NaN NaN NaN NaN
5 host23 NaN NaN NaN NaN
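A minimal, self-contained sketch of the two variants above, run on a small hypothetical frame with real NaN gaps (the frame is made up for illustration). Each column is compacted independently and renumbered from 0, so the shorter columns are padded with NaN at the bottom; both variants should give the same frame:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame with real NaN gaps, similar in shape to the pivot output
df = pd.DataFrame({
    'Sam':   ['host1', 'host2', np.nan, 'host3'],
    'Peter': ['host5', np.nan, 'host6', np.nan],
})

# Variant 1: drop NaNs column by column and renumber each column from 0
via_apply = df.apply(lambda c: c.dropna().reset_index(drop=True))

# Variant 2: the same per-column operation, reassembled side by side with concat
via_concat = pd.concat([df[c].dropna().reset_index(drop=True) for c in df], axis=1)

print(via_apply)
```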
To show blank cells instead of NaN, fill them with an empty string:
df.apply(lambda c: c.dropna().reset_index(drop=True)).fillna('')
output:
Sam Peter Jesse Patrick Banu
0 host1 host5 host7 host9 host10
1 host2 host6 host8 host11
2 host3 host22
3 host4
4 host24
5 host23
NB. If the cells contain the string 'Nan' rather than real NaNs, first convert them with df.replace('Nan', float('nan')) or df.mask(df.eq('Nan')).
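A short sketch of that case, assuming (as the question's printout suggests) that the gaps hold the literal string 'Nan'; the frame below is hypothetical. The strings are turned into real missing values first, and only then compacted and blanked:

```python
import pandas as pd

# Hypothetical frame where missing cells hold the literal string 'Nan'
df = pd.DataFrame({
    'Sam':   ['host1', 'host2', 'Nan', 'host3'],
    'Peter': ['host5', 'Nan', 'host6', 'Nan'],
})

# mask() replaces every cell where the condition is True with a real NaN
df = df.mask(df.eq('Nan'))

# Compact each column, then display the leftover cells as empty strings
out = df.apply(lambda c: c.dropna().reset_index(drop=True)).fillna('')
print(out)
```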
Answer:
Update
From this dataframe:
>>> df
managed_by name
0 host1 Sam
1 host2 Sam
2 host3 Sam
3 host4 Sam
4 host5 Peter
5 host6 Peter
6 host7 Jesse
7 host8 Jesse
8 host9 Patrick
9 host10 Banu
10 host11 Banu
Use this (a slight variation of my old answer):
out = (
df.assign(index=lambda x: x.groupby('name').cumcount())
.pivot_table('managed_by', 'index', 'name', aggfunc='first', fill_value='')
[df['name'].unique()].rename_axis(index=None, columns=None)
)
Output:
>>> out
Sam Peter Jesse Patrick Banu
0 host1 host5 host7 host9 host10
1 host2 host6 host8 host11
2 host3
3 host4
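A self-contained version of the updated approach, on a hypothetical subset of the long-form data above (same layout: 'managed_by' holds the host, 'name' holds the person). The key step is cumcount, which numbers each person's rows 0, 1, 2, ... so that pivot_table can line the hosts up from the top instead of keeping their original row positions:

```python
import pandas as pd

# Hypothetical long-form data in the same layout as the example above
df = pd.DataFrame({
    'managed_by': ['host1', 'host2', 'host3', 'host5', 'host10', 'host11'],
    'name':       ['Sam',   'Sam',   'Sam',   'Peter', 'Banu',   'Banu'],
})

out = (
    # cumcount gives each person's rows the positions 0, 1, 2, ...
    df.assign(index=lambda x: x.groupby('name').cumcount())
    # one row per position, one column per person; missing cells become ''
    .pivot_table('managed_by', 'index', 'name', aggfunc='first', fill_value='')
    # restore the order of first appearance and drop the axis labels
    [df['name'].unique()].rename_axis(index=None, columns=None)
)
print(out)
```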
Old answer
You can use melt to flatten your dataframe and pivot_table to reshape it:
out = (
df.melt().dropna().assign(index=lambda x: x.groupby('variable').cumcount())
.pivot_table('value', 'index', 'variable', aggfunc='first', fill_value='')
[df.columns].rename_axis(index=None, columns=None)
)
Output:
>>> out
Sam Peter Jesse Patrick Banu
0 host1 host5 host7 host9 host10
1 host2 host6 host8 host11
2 host3 host22
3 host4
4 host24
5 host23
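A minimal sketch of the melt step on a hypothetical wide frame with real NaNs. melt turns every cell into a (variable, value) row, dropna discards the gaps, and cumcount renumbers the surviving values per original column before pivoting back:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame with real NaN gaps
df = pd.DataFrame({
    'Sam':   ['host1', np.nan, 'host3'],
    'Peter': [np.nan, 'host5', np.nan],
})

out = (
    # one (variable, value) row per cell; dropna removes the gaps
    df.melt().dropna()
    # number the remaining values within each original column: 0, 1, 2, ...
    .assign(index=lambda x: x.groupby('variable').cumcount())
    .pivot_table('value', 'index', 'variable', aggfunc='first', fill_value='')
    # put the columns back in their original order
    [df.columns].rename_axis(index=None, columns=None)
)
print(out)
```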
Answer:
import numpy as np

df = df.replace('', np.nan)  # turn empty strings into real NaNs
s = df.bfill().dropna(thresh=2)  # back-fill the NaNs, then drop rows with fewer than 2 non-null values
s.mask(s.apply(lambda x: x.duplicated())).fillna('')  # blank out values repeated by the back-fill, then show NaNs as empty strings
Outcome
Sam Peter Jesse Patrick Banu
0 host1 host5 host7 host9 host10
1 host2 host6 host8 host11
2 host3 host22
3 host4
4 host24
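A small reproduction of the back-fill approach on a hypothetical frame with real NaNs (so the initial replace('', np.nan) step is skipped). It also appears to explain why host23 is absent from the outcome above: after the back-fill, a row whose only value sits in one column keeps just one non-null cell, falls below thresh=2, and is dropped. In this toy frame, host4 is lost the same way:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: host4 is the only value in its row
df = pd.DataFrame({
    'Sam':   ['host1', np.nan, 'host3', 'host4'],
    'Peter': ['host5', 'host6', np.nan, np.nan],
})

# Back-fill, then keep only rows with at least 2 non-null values
s = df.bfill().dropna(thresh=2)
# Blank out values repeated by the back-fill, then show NaNs as empty strings
out = s.mask(s.apply(lambda x: x.duplicated())).fillna('')
print(out)
```

By comparison, the per-column dropna/reset_index approach from the first answer keeps host4.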