My problem is the following:
I have a dataframe with appid as the index:
appid name
1648300 Costume Party
1648310 Pillars Of Protection
1648340 Push Me
1648350 Fret Smasher Playtest
1648360 Luminary
... ...
and I iteratively retrieve information per appid/index, which results in a new dataframe (or optionally a Series):
num_reviews review_score review_score_desc total_positive ...
0 0 0 No user reviews 0 ...
I would now like to append the new data in every iteration, so that in the first iteration the columns of the new dataframe are created in the original df. It should look like this:
appid name num_reviews review_score review_score_desc total_positive ...
1648300 Costume Party 0 0 0 0
1648310 Pillars Of Protection 1 2 3 4
1648340 Push Me ...
1648350 Fret Smasher Playtest
1648360 Luminary
... ...
I do not want to create a new frame or add new columns, but update the existing one.
I tried
df.loc[appid] = df.loc[appid].append(pd.DataFrame(new_data))
and
df.loc[appid] = pd.concat([df.loc[appid], pd.Series(new_data)])
Neither of these works.
Simply inserting the values does not work either, since the columns do not exist yet in the first iteration.
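A minimal sketch of why the `pd.concat` attempt has no visible effect, assuming `new_data` is a plain dict mapping column names to values (the question does not show its exact form). `df.loc[appid]` is a row Series indexed by the existing columns; when a longer Series is assigned back to it, pandas aligns the value on the existing column labels, so the extra labels are discarded instead of becoming new columns. The `.append` attempt fails for a different reason: `Series.append` and `DataFrame.append` were deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Costume Party", "Push Me"]}, index=[1648300, 1648340])

# hypothetical result of one lookup (assumed to be a dict)
new_data = {"num_reviews": 0, "review_score": 0}

appid = 1648300
# a Series with three labels: name, num_reviews, review_score
row = pd.concat([df.loc[appid], pd.Series(new_data)])

# .loc aligns the assigned Series on df's existing columns,
# so only "name" is written and no new columns appear
df.loc[appid] = row
print(df.columns.tolist())
```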
Does anyone know how to do this? I searched quite a lot and was unable to find anything useful.
Thanks in advance!
User response:
Example
I made a simple, minimal example for this answer:
import pandas as pd
df = pd.DataFrame(['name1', 'name2'], index=['A', 'B'], columns=['col1'])
iter1 = pd.DataFrame([[0, 1]], index=['A'], columns=['col2', 'col3'])
iter2 = pd.DataFrame([[2, 3]], index=['B'], columns=['col2', 'col3'])
df
col1
A name1
B name2
iter1
col2 col3
A 0 1
iter2
col2 col3
B 2 3
Code
First, concatenate the iteratively retrieved dataframes (iter1, iter2) row-wise, then concatenate the result with df column-wise:
pd.concat([df, pd.concat([iter1, iter2])], axis=1)
col1 col2 col3
A name1 0 1
B name2 2 3
In my code, iter1 and iter2 are concatenated directly, but you can collect and combine them in your for loop.
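The for-loop variant of this answer might look like the sketch below. `fetch` is a hypothetical stand-in for whatever per-index lookup the question performs; each call returns a one-row dataframe, the rows are collected in a list, and everything is concatenated once after the loop, which is generally cheaper than concatenating inside the loop. Note that `pd.concat` returns a new dataframe, so `df` is reassigned rather than modified in place.

```python
import pandas as pd

df = pd.DataFrame(['name1', 'name2'], index=['A', 'B'], columns=['col1'])


def fetch(idx):
    """Hypothetical per-index lookup; returns a one-row dataframe."""
    values = {'A': [0, 1], 'B': [2, 3]}[idx]
    return pd.DataFrame([values], index=[idx], columns=['col2', 'col3'])


parts = []
for idx in df.index:
    parts.append(fetch(idx))  # collect one row per iteration

# stack the collected rows, then attach them to df as new columns
df = pd.concat([df, pd.concat(parts)], axis=1)
print(df)
```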
User response:
If I understand the question correctly, you start with a dataframe containing appid and name (df1). While iterating, we receive some new data for a given appid (df2). If df2 contains new columns, we add them to df1. Finally, we write all values from df2 into the corresponding fields in df1.
import pandas as pd
import numpy as np

# initial dataframe containing appid and name
df1 = pd.DataFrame([['Costume Party'], ['Pillars Of Protection'], ['Push Me']],
                   index=[1648300, 1648310, 1648340],
                   columns=['name'])

# some new data for a given appid
appid = 1648340
df2 = pd.DataFrame([[1, 5, 'positive']],
                   index=[appid],
                   columns=['num_reviews', 'review_score', 'review_desc'])

# append columns from df2 if they are not already in df1
cols_to_add = df2.columns
for col in cols_to_add:
    if col not in df1.columns:
        df1[col] = np.nan

# write values for appid from df2 into df1
cols_to_write = df2.columns
df1.loc[appid, cols_to_write] = df2.loc[appid]
df1
Output:
| appid | name | num_reviews | review_score | review_desc |
|:--------|:-----|:-------------|:------------|:------------|
| 1648300 | Costume Party | NaN | NaN | NaN |
| 1648310 | Pillars Of Protection | NaN | NaN | NaN |
| 1648340 | Push Me | 1.0 | 5.0 | positive |
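If the data for each appid arrives one lookup at a time and the dataframe must be updated in place, the same idea can be wrapped in a small helper. This is a sketch, not the answer author's code: `write_row` is a hypothetical name, `new_data` is assumed to be a dict mapping column names to values, and the sample values below are made up. Missing columns are created as all-NaN columns of `object` dtype, so numbers and strings can both be written later without dtype conflicts; each value is then written with a scalar `.loc[row, column]` assignment.

```python
import numpy as np
import pandas as pd


def write_row(df, appid, new_data):
    """Hypothetical helper: write new_data (a dict) into row `appid` of df in place."""
    for col, value in new_data.items():
        if col not in df.columns:
            # all-NaN object column: can later hold both numbers and strings
            df[col] = pd.Series(np.nan, index=df.index, dtype=object)
        df.loc[appid, col] = value


df = pd.DataFrame({"name": ["Costume Party", "Pillars Of Protection", "Push Me"]},
                  index=[1648300, 1648310, 1648340])

# made-up per-appid results standing in for the real lookups
results = {
    1648300: {"num_reviews": 0, "review_score": 0, "review_score_desc": "No user reviews"},
    1648340: {"num_reviews": 1, "review_score": 5, "review_score_desc": "positive"},
}
for appid, new_data in results.items():
    write_row(df, appid, new_data)
print(df)
```

The new columns keep `object` dtype; `df.infer_objects()` returns a copy with more specific dtypes inferred where possible, if that matters downstream.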