I have a dataframe like so called df_2021:
IP isTrue
101 1
101 0
102 1
103 0
I also have another dataframe like so called df_2022:
IP
101
102
103
104
I want to create a new column in df_2022 isTrue2021
which checks if the IP is in 2021 and is True in any row.
How can I do this in pandas?
Expected df:
IP isTrue2021
101 1
102 1
103 0
104 0
CodePudding user response:
Let's try groupby.any
on df2021 then map it to df2022 isTrue2021
column
df2022['isTrue2021'] = (df2022['IP'].map(df2021.groupby('IP')['isTrue'].any())
.fillna(False).astype(int))
print(df2022)
IP isTrue2021
0 101 1
1 102 1
2 103 0
3 104 0
CodePudding user response:
You could also use a sql left join here then fill the missing values with 0 as such :
df1 = pd.DataFrame({'IP': [101, 101, 102, 103], 'isTrue': [1, 0, 1, 0]})
df2 = pd.DataFrame({'IP': [101, 102, 103, 104]})
df1_tidy = df1.groupby('IP').agg(max)
df1_tidy
# isTrue
# IP
# 101 1
# 102 1
# 103 0
df3 = pd.merge(df2, df1_tidy, on='IP', how="left")
df3
# IP isTrue
# 0 101 1.0
# 1 102 1.0
# 2 103 0.0
# 3 104 NaN
df3.fillna(0).astype(int)
# IP isTrue
# 0 101 1
# 1 102 1
# 2 103 0
# 3 104 0
```
CodePudding user response:
You can filter df_2021 and merge dataframes:
df_2022 = df_2022.merge(df_2021[df_2021.isTrue==1],on='IP', how='left').fillna(0).rename(columns={'isTrue': 'isTrue2021'})