I have a df:
import pandas as pd
dict1 = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
dict2 = {'A': 10, 'B': 20, 'C': 30, 'D': 40}
dict3 = {'A': 100, 'B': 200, 'C': 300, 'D': 400}
df = pd.DataFrame([dict1, dict2, dict3])
(I'm working from home and can't copy and paste the output here, sorry.)
Now, I would like to 'enlarge' df, then assign calculated values to the new columns.
df[['new_col1', 'new_col2']] = None
for idx, row in df.iterrows():
    # insert the calculated values for `new_col1` and `new_col2` here
    pass
I think I do need to iterate over the rows, as the calculation is based on the values in each row. I could of course insert the values cell by cell using .at, but I have hundreds of thousands of rows and ~20 calculated values to fill in. How can I do this?
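If the per-row calculation really has to run in Python, one common pattern (a sketch, not the only option) is to compute all new values for a row at once, collect them in a list, and build the new columns with a single DataFrame constructor call instead of writing each cell with .at. The function calc_row and its formulas are hypothetical placeholders for the real calculation.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 1, 'B': 2, 'C': 3, 'D': 4},
    {'A': 10, 'B': 20, 'C': 30, 'D': 40},
    {'A': 100, 'B': 200, 'C': 300, 'D': 400},
])


def calc_row(row):
    # Hypothetical placeholder for the real per-row calculation:
    # return one dict entry per new column (extend to ~20 keys as needed).
    return {'new_col1': row.A + row.B, 'new_col2': row.C * row.D}


# Collect one dict per row, then build all new columns in one step
# rather than assigning each cell individually with .at.
records = [calc_row(row) for row in df.itertuples(index=False)]
new_cols = pd.DataFrame(records, index=df.index)
df = df.join(new_cols)
print(df)
```

Building the new block once and joining it on the shared index avoids hundreds of thousands of individual cell writes.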
I tried:
dictt = {'new_col1': 1, 'new_col2': 2}
df.iloc[0] = df.iloc[0].map(dictt)
But when I check df.iloc[0], it's a row of NaN. I also tried:
df.iloc[0] = df.iloc[0].replace(dictt)
But that didn't do anything. Also, if there is a better or more appropriate way to do operations like this, I'm all ears.
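Both attempts fail for a related reason: Series.map and Series.replace work on the row's values, not its index labels, so the dict keys 'new_col1' and 'new_col2' are compared against the numbers stored in the row. A minimal sketch of that behaviour, plus one way to write the dict into a row by column name with .loc (assuming the new columns are created first, here via reindex):

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 1, 'B': 2, 'C': 3, 'D': 4},
    {'A': 10, 'B': 20, 'C': 30, 'D': 40},
])
dictt = {'new_col1': 1, 'new_col2': 2}

# Series.map looks up each *value* in the row (1, 2, 3, 4) among the dict's
# keys ('new_col1', 'new_col2'); nothing matches, so every entry becomes NaN.
mapped = df.iloc[0].map(dictt)
print(mapped)

# Series.replace also matches values against the keys, so nothing is
# replaced and the row comes back unchanged.
replaced = df.iloc[0].replace(dictt)
print(replaced)

# Create the new columns (filled with NaN), then write the dict into row 0
# by column label with .loc.
df = df.reindex(columns=[*df.columns, *dictt])
df.loc[0, list(dictt)] = list(dictt.values())
print(df)
```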
Answer:
If you have a heavy, complicated function, the main bottleneck is that function, not pandas. Here is a solution that iterates with DataFrame.apply:
def f(a, b):
    return pd.Series({'new_col1': 1 + a, 'new_col2': 2 + b})
df = df.join(df.apply(lambda x: f(x.A, x.B), axis=1))
print (df)
A B C D new_col1 new_col2
0 1 2 3 4 2 4
1 10 20 30 40 11 22
2 100 200 300 400 101 202
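For a function as simple as f above, apply is not strictly needed: apply with axis=1 still calls f once per row in Python. Assuming the real calculation can be written as whole-column arithmetic (not guaranteed for every use case), the same two columns can be computed directly:

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 1, 'B': 2, 'C': 3, 'D': 4},
    {'A': 10, 'B': 20, 'C': 30, 'D': 40},
    {'A': 100, 'B': 200, 'C': 300, 'D': 400},
])

# Whole-column arithmetic: each new column is computed in one vectorized
# operation instead of a Python function call per row.
df['new_col1'] = 1 + df['A']
df['new_col2'] = 2 + df['B']
print(df)
```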
Another idea:
def f(a, b):
    return (1 + a, 2 + b)
df[['col1','col2']] = df.apply(lambda x: f(x.A, x.B), axis=1, result_type='expand')
print (df)
A B C D col1 col2
0 1 2 3 4 2 4
1 10 20 30 40 11 22
2 100 200 300 400 101 202
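With ~20 outputs, it may help to keep the new column names in one list and have the function return a tuple of the same length, so the two stay in sync. This is a sketch along the lines of the second idea; the list new_names, the third column and its formula are hypothetical, and it assumes a recent pandas, where assigning a DataFrame to a list of column names pairs the columns by position.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 1, 'B': 2, 'C': 3, 'D': 4},
    {'A': 10, 'B': 20, 'C': 30, 'D': 40},
    {'A': 100, 'B': 200, 'C': 300, 'D': 400},
])

# Hypothetical list of output names; extend it to ~20 entries as needed.
new_names = ['col1', 'col2', 'col3']


def f(row):
    # Hypothetical calculation: must return one value per name in
    # new_names, in the same order.
    return (1 + row.A, 2 + row.B, row.C * row.D)


# result_type='expand' turns each returned tuple into a row of a new
# DataFrame (columns 0, 1, 2), which is then assigned to new_names.
df[new_names] = df.apply(f, axis=1, result_type='expand')
print(df)
```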