Updating an existing row with new columns and values in pandas

My problem is the following:

I have a dataframe with appid as the index:

appid                     name                         
1648300          Costume Party
1648310  Pillars Of Protection
1648340                Push Me
1648350  Fret Smasher Playtest
1648360               Luminary
...                        ...

I iteratively retrieve information per appid/index, which results in a new dataframe (or, optionally, a Series):

 num_reviews  review_score review_score_desc  total_positive  ...
0            0             0   No user reviews               0 ...

I would now like to append the new values in every iteration, such that in the first iteration the new columns are created in the original df from the columns of the new one. The result should look like this:

appid                     name  num_reviews  review_score review_score_desc  total_positive  ...
1648300          Costume Party  0            0.           0.                 0
1648310  Pillars Of Protection  1.           2.           3.                 4.
1648340                Push Me  ...
1648350  Fret Smasher Playtest
1648360               Luminary
...                        ...

I do not want to create a new frame or add new columns, but update the existing one.

I tried

df.loc[appid] = df.loc[appid].append(pd.DataFrame(new_data))

and

df.loc[appid] = pd.concat([df.loc[appid], pd.Series(new_data)])

Neither of these works.

Also just inserting the values does not work, since the columns are not generated in the first iteration.

Does anyone know an answer to this? I have searched quite a lot and was unable to find anything useful.

Thanks in advance!

CodePudding user response：

Example

Here is a simple, minimal example for the answer:

import pandas as pd

# original frame, plus one small frame per iteration
df = pd.DataFrame(['name1', 'name2'], index=['A', 'B'], columns=['col1'])
iter1 = pd.DataFrame([[0, 1]], index=['A'], columns=['col2', 'col3'])
iter2 = pd.DataFrame([[2, 3]], index=['B'], columns=['col2', 'col3'])

df

    col1
A   name1
B   name2

iter1

    col2    col3
A   0       1

iter2

    col2    col3
B   2       3

Code

First concatenate the dataframes retrieved in each iteration (iter1, iter2), then concatenate the result with df along the columns:

pd.concat([df, pd.concat([iter1, iter2])], axis=1)


    col1    col2    col3
A   name1   0       1
B   name2   2       3

In my code, iter1 and iter2 are concatenated directly, but you can collect and combine them in your for loop instead.
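Collecting the pieces in a loop could look roughly like the sketch below. Note that fetch_info is a hypothetical stand-in for whatever produces the per-appid data (it is not from the question), and that pd.concat returns a new frame, so the result has to be assigned back to df.

```python
import pandas as pd

df = pd.DataFrame(['name1', 'name2'], index=['A', 'B'], columns=['col1'])

def fetch_info(key):
    # Hypothetical placeholder for the real per-appid lookup.
    data = {'A': [0, 1], 'B': [2, 3]}
    return pd.DataFrame([data[key]], index=[key], columns=['col2', 'col3'])

# Collect one small frame per index label, then concatenate only once.
pieces = [fetch_info(key) for key in df.index]
new_cols = pd.concat(pieces)

# axis=1 aligns on the index and puts the new columns next to col1.
df = pd.concat([df, new_cols], axis=1)
print(df)
```

Concatenating once after the loop is usually cheaper than growing the frame inside it, because every concat call copies the data.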

CodePudding user response：

If I understand the question correctly, you start with a dataframe containing appid and name (df1). While iterating, you receive some new data for a given appid (df2). If df2 contains columns that df1 does not have yet, add those columns to df1. Finally, write all values from df2 into the corresponding fields in df1.

import pandas as pd
import numpy as np

# initial dataframe containing appid and name
df1 = pd.DataFrame([['Costume Party'],['Pillars Of Protection'],['Push Me']],
                   index=[1648300, 1648310, 1648340],
                   columns=['name'])

# some new data for a given appid
appid = 1648340
df2 = pd.DataFrame([[1,5,'positive']],
                   index=[appid],
                   columns=['num_reviews','review_score','review_desc'])

# append columns from df2 if they are not already in df1
cols_to_add = df2.columns
for col in cols_to_add:
    if col not in df1.columns:
        df1[col] = np.nan


# write values for appid from df2 into df1
cols_to_write = df2.columns
df1.loc[appid, cols_to_write] = df2.loc[appid]

df1

Output:

| appid   | name | num_reviews | review_score | review_desc |
|:--------|:-----|:------------|:-------------|:------------|
| 1648300 | Costume Party         | NaN | NaN | NaN      |
| 1648310 | Pillars Of Protection | NaN | NaN | NaN      |
| 1648340 | Push Me               | 1.0 | 5.0 | positive |
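As a possible shortcut, assigning through .loc with a column label that does not exist yet should create that column and leave the other rows as NaN, so the explicit "add missing columns" step may not be needed. The sketch below assumes the new data arrives as a plain dict; update_row is a hypothetical helper name, not something from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Costume Party', 'Pillars Of Protection', 'Push Me']},
                   index=[1648300, 1648310, 1648340])

def update_row(frame, appid, new_data):
    """Write new_data (dict of column -> value) into the row for appid, in place."""
    for col, value in new_data.items():
        # .loc creates the column if it is missing; other rows become NaN.
        frame.loc[appid, col] = value

# First call creates the three new columns, second call only fills values.
update_row(df1, 1648340, {'num_reviews': 1, 'review_score': 5, 'review_desc': 'positive'})
update_row(df1, 1648300, {'num_reviews': 0, 'review_score': 0, 'review_desc': 'No user reviews'})
print(df1)
```

Writing one cell at a time is slow for large frames, so for many appids it is probably better to collect the results and join them once at the end.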