I have a dataframe with the main fixed location data:
id name
1 BEL
2 BEL
3 BEL
4 NYC
5 NYC
6 NYC
7 BER
8 BER
I also have a second dataframe with values for each id and city, like this (note that this dataframe is longer than the main one):
id name value
1 BEL 9
2 BEL 7
3 BEL 3
4 NYC 76
5 NYC 76
6 NYC 23
7 BER 76
8 BER 2
3 BEL 7
4 NYC 5
5 NYC 4
6 NYC 2
My goal is to check, in the second dataframe, whether the values are greater than 10. If a value is greater than 10, I want to add a column ['not_ok'] to the first dataframe, with 1 meaning not ok. How can I do this?
I can flag the second dataframe with dff['not_ok'] = np.where(dff['value'] > 10, '1', '0'),
but since dff is much longer, I don't know how to get that information into the first dataframe.
My goal looks something like this:
id name is_ok
1 BEL 1
2 BEL 1
3 BEL 1
4 NYC 0
5 NYC 0
6 NYC 0
7 BER 0
8 BER 1
CodePudding user response:
Suppose your first (shorter) dataframe is called 'df_v1' and the second (longer) one is called 'df_v2'.
On 'df_v2', prepare the column like this:
df_v2["not_ok"] = df_v2["value"].apply(lambda x: x > 10)
Then, do a join on 'id' and 'name', like this:
df_v1.merge(df_v2[["id", "name", "not_ok"]], on=["id", "name"], how="left")
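Note that a plain left join duplicates rows of df_v1 wherever an id/name pair repeats in df_v2 (as it does in the question's data). One way around that, sketched here with made-up data mirroring the question, is to collapse df_v2 to one flag per key before merging:

```python
import pandas as pd

# Hypothetical data following the answer's naming (df_v1 / df_v2);
# id 3 appears twice in df_v2, as in the question
df_v1 = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["BEL", "BEL", "BEL", "NYC"]})
df_v2 = pd.DataFrame({"id": [3, 3, 4],
                      "name": ["BEL", "BEL", "NYC"],
                      "value": [3, 7, 76]})

df_v2["not_ok"] = df_v2["value"] > 10

# Collapse duplicates first: a key is "not ok" if ANY of its values exceeds 10
flags = df_v2.groupby(["id", "name"], as_index=False)["not_ok"].max()

# Now the left join adds at most one flag per row of df_v1
result = df_v1.merge(flags, on=["id", "name"], how="left")
print(result)
```

Keys absent from df_v2 (ids 1 and 2 here) come back as NaN and can be filled with `fillna(False)` if needed.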
CodePudding user response:
You can use the .lt(10) method to flag the values less than 10 (labeling values < 10 as 1 and values >= 10 as 0). Then you group by id using the min() function to keep the minimum value (0 here) in case of duplicate ids in the second DataFrame. Here is the code:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
'name': ['BEL', 'BEL', 'BEL', 'NYC', 'NYC', 'NYC', 'BER', 'BER']})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6],
'name': ['BEL', 'BEL', 'BEL', 'NYC', 'NYC', 'NYC', 'BER', 'BER', 'BEL', 'NYC', 'NYC', 'NYC'],
'value': [9, 7, 3, 76, 76, 23, 76, 2, 7, 5, 4, 2]})
df2['is_ok'] = df2['value'].lt(10).astype(int)
df3 = df2[['id', 'name', 'is_ok']].groupby('id').min().reset_index()
print(df3)
# If you want to merge it with the first DataFrame
# df1 = df1.merge(df3[["id", "is_ok"]], on=["id"])
# print(df1)
Output:
id name is_ok
0 1 BEL 1
1 2 BEL 1
2 3 BEL 1
3 4 NYC 0
4 5 NYC 0
5 6 NYC 0
6 7 BER 0
7 8 BER 1
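One caveat with `groupby('id').min()` above: it also takes the alphabetical minimum of the 'name' column, which happens to be harmless here because each id maps to a single name. A variant (a sketch on a reduced version of the same data) that groups on both keys avoids relying on that:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'name': ['BEL', 'BEL', 'BEL', 'NYC']})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 3, 4],
                    'name': ['BEL', 'BEL', 'BEL', 'NYC', 'BEL', 'NYC'],
                    'value': [9, 7, 3, 76, 7, 5]})

df2['is_ok'] = df2['value'].lt(10).astype(int)
# Grouping on both keys carries 'name' through as part of the key
# instead of aggregating it
df3 = df2.groupby(['id', 'name'], as_index=False)['is_ok'].min()
print(df3)
```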
CodePudding user response:
To reach the desired output you could try as follows:
import pandas as pd
data = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
'name': {0: 'BEL', 1: 'BEL', 2: 'BEL', 3: 'NYC', 4: 'NYC',
5: 'NYC', 6: 'BER', 7: 'BER'}
}
df = pd.DataFrame(data)
data2 = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7,
7: 8, 8: 3, 9: 4, 10: 5, 11: 6},
'name': {0: 'BEL', 1: 'BEL', 2: 'BEL', 3: 'NYC', 4: 'NYC',
5: 'NYC', 6: 'BER', 7: 'BER', 8: 'BEL', 9: 'NYC',
10: 'NYC', 11: 'NYC'},
'value': {0: 9, 1: 7, 2: 3, 3: 76, 4: 76, 5: 23, 6: 76,
7: 2, 8: 7, 9: 5, 10: 4, 11: 2}
}
df2 = pd.DataFrame(data2)
df = df.merge(df2[df2['value'].gt(10)], on=['id', 'name'], how='left')\
.rename(columns={'value':'is_ok'})
df['is_ok'] = df['is_ok'].isna().astype(int)
print(df)
id name is_ok
0 1 BEL 1
1 2 BEL 1
2 3 BEL 1
3 4 NYC 0
4 5 NYC 0
5 6 NYC 0
6 7 BER 0
7 8 BER 1
Explanation:
- Use Series.gt to get a boolean pd.Series, which we use to select from df2 only the rows that meet the condition value > 10.
- Use df.merge to merge this slice from df2 with df, and rename column value to is_ok (df.rename).
- We now have a column with NaN values where there is no match on id, name, and values > 10 where there is. Use Series.isna to turn this column into booleans.
- Finally, we can chain .astype(int) to change True | False into 1 | 0.
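The same result can also be reached without any merge; a compact sketch (on a reduced version of the same toy data) that uses Series.isin on the offending ids:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'name': ['BEL', 'BEL', 'BEL', 'NYC']})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 3, 4],
                    'name': ['BEL', 'BEL', 'BEL', 'NYC', 'BEL', 'NYC'],
                    'value': [9, 7, 3, 76, 7, 5]})

# ids that have at least one value > 10 anywhere in df2
bad_ids = df2.loc[df2['value'].gt(10), 'id'].unique()

# a row is ok (1) only if its id never exceeded 10
df['is_ok'] = (~df['id'].isin(bad_ids)).astype(int)
print(df)
```

This assumes, as in the question's data, that an id always belongs to the same name; otherwise match on both columns as in the merge-based answers.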