Pandas: How to replace column values in panel dataset based on ID and condition

Time:09-02

So I have a panel df that looks like this:

ID year value
1 2002 8
1 2003 9
1 2004 10
2 2002 11
2 2003 11
2 2004 12

For every ID, I want to set the value in all years to that ID's value in 2004. How do I do this?

The df should then look like this:

ID year value
1 2002 10
1 2003 10
1 2004 10
2 2002 12
2 2003 12
2 2004 12

I could not find anything online. So far I have tried getting the 2004 value for every ID, creating a new df from that, and merging it back in, but that is super slow.

CodePudding user response:

We can use Series.map for this. First we select the 2004 values and build a mapping from ID to value, then we map each row's ID through it:

mapping = df[df["year"].eq(2004)].set_index("ID")["value"]
df["value"] = df["ID"].map(mapping)
print(df)

   ID  year  value
0   1  2002     10
1   1  2003     10
2   1  2004     10
3   2  2002     12
4   2  2003     12
5   2  2004     12
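
One thing to watch with the mapping approach: an ID without a 2004 row is missing from the mapping, so map returns NaN for it. Here is a minimal sketch of one way to handle that, assuming you want such IDs to keep their original values. ID 3 is a hypothetical extra ID added for illustration, it is not in the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "year": [2002, 2003, 2004, 2002, 2003, 2004, 2002, 2003],
    "value": [8, 9, 10, 11, 11, 12, 5, 6],
})

# One 2004 value per ID; ID 3 (hypothetical) has no 2004 row, so it is absent here.
mapping = df.loc[df["year"].eq(2004)].set_index("ID")["value"]

# map() gives NaN for IDs missing from the mapping;
# fillna() falls back to the original value for those rows.
df["value"] = df["ID"].map(mapping).fillna(df["value"]).astype(int)
print(df)
```

Note that the mapping needs at most one 2004 row per ID, since map cannot look values up in a Series with a duplicated index.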

CodePudding user response:

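Let's set the value to NaN wherever the year is not 2004, then take the max value per ID. Only the 2004 value survives in each group, so the max is exactly that value (as a float, because the column held NaN in between).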

df['value'] = (df.assign(value=df['value'].mask(df['year'].ne(2004)))
               .groupby('ID')['value'].transform('max'))
print(df)

   ID  year  value
0   1  2002   10.0
1   1  2003   10.0
2   1  2004   10.0
3   2  2002   12.0
4   2  2003   12.0
5   2  2004   12.0
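
If the float output bothers you, the result can be cast back to the column's original dtype. A small sketch of the same idea, assuming every ID has a 2004 row (otherwise a NaN is left over and the cast to int fails):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2002, 2003, 2004, 2002, 2003, 2004],
    "value": [8, 9, 10, 11, 11, 12],
})

# Keep the value only where year == 2004, NaN everywhere else (this makes it float).
masked = df["value"].where(df["year"].eq(2004))

# Broadcast the single surviving value to every row of its ID,
# then cast back to the original integer dtype.
df["value"] = masked.groupby(df["ID"]).transform("max").astype(df["value"].dtype)
print(df)
```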

CodePudding user response:

Another method, for some variety.

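# Set every value that isn't from 2004 to NaN
# (where() upcasts to float itself, so no numpy import is needed)
df['value'] = df['value'].where(df['year'].eq(2004))
# Back-fill within each ID; this works here because 2004 is the last year of every ID
df['value'] = df.groupby('ID')['value'].bfill()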

Output:

   ID  year  value
0   1  2002   10.0
1   1  2003   10.0
2   1  2004   10.0
3   2  2002   12.0
4   2  2003   12.0
5   2  2004   12.0
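
Back-filling alone only reaches rows that come before the 2004 row within each ID. If your panel has later years too, a back-fill followed by a forward-fill should cover both directions. A rough sketch on hypothetical data (the 2005 rows are made up to show the case, they are not in the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2003, 2004, 2005, 2003, 2004, 2005],
    "value": [8, 10, 9, 11, 12, 13],
})

# Keep only the 2004 values, NaN elsewhere.
df["value"] = df["value"].where(df["year"].eq(2004))

# Fill the earlier years (bfill), then the later years (ffill), within each ID.
filled = df.groupby("ID")["value"].bfill()
df["value"] = filled.groupby(df["ID"]).ffill()
print(df)
```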

CodePudding user response:

Yet another method, a bit longer but hopefully quite intuitive: build an ID -> value lookup table from the 2004 rows, then perform the lookup with pandas.merge.

import pandas as pd

# Original dataframe
df_orig = pd.DataFrame([(1, 2002, 8), (1, 2003, 9), (1, 2004, 10), (2, 2002, 11), (2, 2003, 11), (2, 2004, 12)])
df_orig.columns = ['ID', 'year', 'value']

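# Lookup table with the 2004 value of each ID
# (.loc with a column list avoids the SettingWithCopyWarning that an in-place drop on a filtered slice triggers)
df_2004 = df_orig.loc[df_orig['year'] == 2004, ['ID', 'value']]
print(df_2004)

# Drop the old values and look up the 2004 value for every row by ID
# (a left merge keeps every row of df_orig, in its original order)
df_final = df_orig.drop(columns=['value']).merge(df_2004, on='ID', how='left')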
print(df_final)

df_2004:

   ID  value
2   1     10
5   2     12

df_final:

   ID  year  value
0   1  2002     10
1   1  2003     10
2   1  2004     10
3   2  2002     12
4   2  2003     12
5   2  2004     12
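
A caveat for the merge route: if some ID had more than one 2004 row, the merge would quietly duplicate that ID's rows. As a sketch, merge's validate argument can guard against this. The name lookup is just an illustrative choice here.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2002, 2003, 2004, 2002, 2003, 2004],
    "value": [8, 9, 10, 11, 11, 12],
})

# Lookup table: the 2004 value of each ID.
lookup = df.loc[df["year"] == 2004, ["ID", "value"]]

# validate="many_to_one" raises pandas.errors.MergeError if an ID appears
# more than once in the lookup table, instead of silently duplicating rows.
result = df.drop(columns="value").merge(
    lookup, on="ID", how="left", validate="many_to_one"
)
print(result)
```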