I want to filter rows that are in df_1
that match the following two condition
Row from
df_1
should not be indf_2
OR
If row from
df_1
is indf_2
then it must have a valueYes
indf_2
Code I tried, that does not work
import pandas as pd
df_1 = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df_2 = pd.DataFrame({'a': [2, 4, 6, 8, 10], 'b': ['Yes', 'No', 'Yes', 'No', 'No']})
df = df_1[(~df_1.a.isin(df_2.a)) | (df_2.b=='Yes')]
Output
a
0 1
2 3
4 5
6 7
8 9
Expected Output
a
0 1
1 2
2 3
3 6
4 5
6 7
8 9
Explanation
Rows 1, 3, 5, 7, 9 are not in df_2
so they are part of the output
Rows 2 and 6 are in df_2
but the have column b as Yes
so they are part of the output
CodePudding user response:
I think you can create two frames and concat it as below.
import pandas as pd
df_1 = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df_2 = pd.DataFrame({'a': [2, 4, 6, 8, 10], 'b': ['Yes', 'No', 'Yes', 'No', 'No']})
df = [df_1[(~df_1.a.isin(df_2.a))]['a'],df_2[df_2.b=='Yes']['a']]
result = pd.concat(df).sort_values()
print(result)
output
CodePudding user response:
The problem in your calculation stems from the fact that df_1
and df_2
are not the same size and logical operators between their
columns does not make sense. If you map your values in df_1
to Yes
, No
or NaN
using df_2
then you will have equal length columns that can be compared
df_1[(~df_1["a"].isin(df_2["a"])) | (df_1["a"].map(df_2.set_index("a")["b"]) == "Yes")]
CodePudding user response:
You can use:
df = df_1.merge(df_2, how='left', on='a')
print(df[df.b.isin(['Yes', np.nan])][['a']])
OUTPUT
a
0 1
1 2
2 3
4 5
5 6
6 7
8 9
CodePudding user response:
This won't be the fastest solution but the steps should be easy enough for you to follow. When the two dataframes are merged using a union, NaN
values are added in place of missing data. Afterwards you want to retain those rows so you only need to remove those which have the value No
.
import pandas as pd
df_1 = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df_2 = pd.DataFrame({'a': [2, 4, 6, 8, 10], 'b': ['Yes', 'No', 'Yes', 'No', 'No']})
# Merge the dataframes using a union
df = df_1.merge(df_2, how='outer', on='a')
# Drop the rows where 'b' == 'No'
df.drop(df[df['b']=='No'].index, inplace=True)
# Drop column 'b'
df.drop('b', axis=1, inplace=True)