I have two separate data frames, and I want to do the same thing to both. I want to pair the columns according to the first substring before the underscore (a, b, x, y), and then if a value in the first column contains a word, but the corresponding row in the totals column is null, i want to update the total to a zero. I want to update the data frames and then output them both separately.
import pandas as pd
import numpy as np
d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
'a_total': [5,12,4,np.nan,0],
'b': ['blue','orange','pink', np.nan, np.nan],
'b_total': [12,6,0,0, np.nan]})
d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
'y_total': [182,32,13, np.nan,8],
'z': ['car','bike','walk', np.nan, np.nan],
'z_total': [12,6,np.nan,np.nan, np.nan]})
then i want to do something to both data frames, and then output the updated versions separately. My current code copied below is not outputting properly. I am trying to output a dicitonary of dataframes, but if I can just output the two data frames (d1 and d2) somehow that would also be good.
out = {}
for i, df in enumerate([d1, d2]):
key_id = [*df.loc[:,~df.columns.str.endswith('total')].columns]
totals = [*df.loc[:,df.columns.str.endswith('total')].columns]
for col in key_id:
pairs = df.loc[:, df.columns.str.startswith(col)]
pairs[col '_total'].loc[(pairs[col].notnull()) & (pairs[col '_total'].isnull())] = 0
out[i] = pd.concat([pairs], axis=1)
thank you for looking
CodePudding user response:
Not sure I exactly understand what your need in your output, but maybe this works?
import pandas as pd
import numpy as np
d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
'a_total': [5,12,4,np.nan,0],
'b': ['blue','orange','pink', np.nan, np.nan],
'b_total': [12,6,0,0, np.nan]})
d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
'y_total': [182,32,13, np.nan,8],
'z': ['car','bike','walk', np.nan, np.nan],
'z_total': [12,6,np.nan,np.nan, np.nan]})
#show d1 before making changes
print(d1)
#make the changes directly to d1 and d2
for i, df in enumerate([d1, d2]):
cols = [c for c in df.columns if not c.endswith('total')]
for col in cols:
tot_col = col '_total'
df.loc[df[col].notnull() & df[tot_col].isnull(), tot_col] = 0
#show d1 after making changes
print(d1)
d1 before changes:
a a_total b b_total
0 yes 5.0 blue 12.0
1 no 12.0 orange 6.0
2 maybe 4.0 pink 0.0
3 sometimes NaN NaN 0.0
4 NaN 0.0 NaN NaN
d1 after changes:
a a_total b b_total
0 yes 5.0 blue 12.0
1 no 12.0 orange 6.0
2 maybe 4.0 pink 0.0
3 sometimes 0.0 NaN 0.0
4 NaN 0.0 NaN NaN