Pandas: How to replace column values in panel dataset based on ID and condition

So I have a panel df that looks like this:

ID	year	value
1	2002	8
1	2003	9
1	2004	10
2	2002	11
2	2003	11
2	2004	12

For every ID, I want to set the value in all years to that ID's value in 2004. How do I do this?

The df should then look like this:

ID	year	value
1	2002	10
1	2003	10
1	2004	10
2	2002	12
2	2003	12
2	2004	12

I could not find anything online. So far I have extracted the 2004 value for every ID, created a new df from that, and merged it back in, but that is very slow.

CodePudding user response：

We can use Series.map for this. First we select the 2004 rows and build a mapping from ID to value, then we map every row's ID through it:

mapping = df[df["year"].eq(2004)].set_index("ID")["value"]
df["value"] = df["ID"].map(mapping)

   ID  year  value
0   1  2002     10
1   1  2003     10
2   1  2004     10
3   2  2002     12
4   2  2003     12
5   2  2004     12
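
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same Series.map idea. The sample frame is rebuilt from the question; the only assumption is that each ID has exactly one 2004 row, so the mapping index is unique.

```python
import pandas as pd

# Sample panel from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2002, 2003, 2004, 2002, 2003, 2004],
    "value": [8, 9, 10, 11, 11, 12],
})

# One 2004 value per ID, indexed by ID, so the Series works as a lookup table
mapping = df.loc[df["year"].eq(2004)].set_index("ID")["value"]

# Look up each row's ID in the mapping; an ID without a 2004 row would come back as NaN
df["value"] = df["ID"].map(mapping)

print(df)
```

If some IDs might lack a 2004 row, checking df["value"].isna() afterwards is a cheap way to spot them.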

CodePudding user response：

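Let's convert the value to NaN wherever the corresponding year is not 2004, then take the max value per ID. The max skips NaN, so each ID ends up with its 2004 value; because NaN was introduced along the way, the result is float.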

df['value'] = (df.assign(value=df['value'].mask(df['year'].ne(2004)))
               .groupby('ID')['value'].transform('max'))

print(df)

   ID  year  value
0   1  2002   10.0
1   1  2003   10.0
2   1  2004   10.0
3   2  2002   12.0
4   2  2003   12.0
5   2  2004   12.0
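
The values come back as floats because the intermediate column holds NaN. If the original integer dtype matters, one option is to cast back at the end. A rough sketch, assuming every ID has a 2004 row (otherwise NaN remains and the cast fails); the names only_2004 and filled are just illustrative:

```python
import pandas as pd

# Sample panel from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2002, 2003, 2004, 2002, 2003, 2004],
    "value": [8, 9, 10, 11, 11, 12],
})
original_dtype = df["value"].dtype

# Keep only the 2004 values; every other row becomes NaN, so this Series is float
only_2004 = df["value"].where(df["year"].eq(2004))

# Broadcast the single non-null value of each ID to all of that ID's rows
filled = only_2004.groupby(df["ID"]).transform("max")

# Cast back to the original dtype (assumes no NaN is left after the broadcast)
df["value"] = filled.astype(original_dtype)

print(df)
```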

CodePudding user response：

Another method, for some variety.

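import numpy as np

# Cast to float first so the column can hold NaN
df['value'] = df['value'].astype(float)
# Set every value that isn't from 2004 to NaN
df.loc[df.year.ne(2004), 'value'] = np.nan
# Back-fill within each ID (this relies on 2004 being the last year of each ID)
df['value'] = df.groupby('ID')['value'].bfill()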

Output:

   ID  year  value
0   1  2002   10.0
1   1  2003   10.0
2   1  2004   10.0
3   2  2002   12.0
4   2  2003   12.0
5   2  2004   12.0
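
One caveat: a back-fill only reaches rows that come before the 2004 row within each ID. In the question's data 2004 is the last year, so that is enough. If the panel also had later years, filling in both directions would be needed. A small sketch with a made-up panel (the 2005 rows are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical panel where 2004 is NOT the last year of each ID
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2003, 2004, 2005, 2003, 2004, 2005],
    "value": [9.0, 10.0, 11.0, 11.0, 12.0, 13.0],
})

# Keep only the 2004 values; all other rows become NaN
only_2004 = df["value"].where(df["year"].eq(2004))

# Within each ID, fill backwards (earlier years) and then forwards (later years)
df["value"] = only_2004.groupby(df["ID"]).transform(lambda s: s.bfill().ffill())

print(df)
```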

CodePudding user response：

Yet another method, a bit longer but it should be quite intuitive: build a lookup table of ID -> value, then perform the lookup using pandas.merge.

import pandas as pd

# Original dataframe
df_orig = pd.DataFrame([(1, 2002, 8), (1, 2003, 9), (1, 2004, 10), (2, 2002, 11), (2, 2003, 11), (2, 2004, 12)])
df_orig.columns = ['ID', 'year', 'value']

# Lookup table: one row per ID with its 2004 value
# (selecting the columns with .loc avoids an in-place drop on a slice of df_orig)
df_2004 = df_orig.loc[df_orig['year'] == 2004, ['ID', 'value']]
print(df_2004)

# Drop the old values and bring in the 2004 ones; a left join keeps every row of df_orig
df_final = pd.merge(df_orig.drop(columns=['value']), df_2004, on='ID', how='left')
print(df_final)

df_2004:

   ID  value
2   1     10
5   2     12

df_final:

   ID  year  value
0   1  2002     10
1   1  2003     10
2   1  2004     10
3   2  2002     12
4   2  2003     12
5   2  2004     12
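
A merge like this silently multiplies rows if an ID has more than one 2004 row. As a possible safeguard, pandas.merge accepts a validate argument that raises an error when the lookup side is not unique per key. A compact sketch on the question's data; the variable names lookup and result are only illustrative:

```python
import pandas as pd

# Sample panel from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "year": [2002, 2003, 2004, 2002, 2003, 2004],
    "value": [8, 9, 10, 11, 11, 12],
})

# Lookup table with one 2004 value per ID
lookup = df.loc[df["year"] == 2004, ["ID", "value"]]

# Left join keeps every original row; validate="many_to_one" raises
# a MergeError if any ID appears more than once in the lookup table
result = df.drop(columns="value").merge(
    lookup, on="ID", how="left", validate="many_to_one"
)

print(result)
```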