I have two dataframes and I have to perform MINUS Query to get only unique values of dataframe_1. How can I perform it? Can I perform it using merge as I do not know common column to give inside 'on' parameter. Also I tried it using Pandasql, but got an error.
q1 = """SELECT * FROM df_1
MINUS
SELECT * FROM df_2;
"""
print(pandasql.sqldf(q1, locals()))
My DATAFRAMES:
Last First Role Salary
0 Thompson Aaron Developer 72000
1 Miller James Developer 68000
2 Bryant Samuel Tester 54000
3 Jones Richard Tester 56500
Last First Role Salary
0 Thompson Aaron Developer 72000
1 Miller James Developer 68000
2 Bryant Samuel Tester 54000
3 Nirmal Ojjaswi Analyst 0
CodePudding user response:
Given:
df1:
Last First Role Salary
0 Thompson Aaron Developer 72000
1 Miller James Developer 68000
2 Bryant Samuel Tester 54000
3 Jones Richard Tester 56500
df2:
Last First Role Salary
0 Thompson Aaron Developer 72000
1 Miller James Developer 68000
2 Bryant Samuel Tester 54000
3 Nirmal Ojjaswi Analyst 0
Doing:
df = df1[df1.ne(df2).all(axis=1)]
print(df)
Output:
Last First Role Salary
3 Jones Richard Tester 56500
Given:
df1:
col1 col2
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
df2:
col1 col2
0 1 10
1 2 11
2 3 12
Doing/Output:
df = df1[df1.ne(df2).all(axis=1)]
print(df)
# Output:
col1 col2
3 4 13
4 5 14
5 3 10
CodePudding user response:
Please check the code snippet below, not very optimized but will solve your purpose:
df1 = pd.DataFrame(data = {'last' : ['Thompson', 'Miller', 'Bryant', 'Jones'],
'first' : ['Aaron', 'James', 'Samuel', 'Richard'],
'role' : ['Developer', 'Developer', 'Tester', 'Tester'],
'salary' : [72000, 68000, 54000, 56500]})
df2 = pd.DataFrame(data = {'last' : ['Thompson', 'Miller', 'Bryant', 'Nirmal'],
'first' : ['Aaron', 'James', 'Samuel', 'Ojjaswi'],
'role' : ['Developer', 'Developer', 'Tester', 'Analyst'],
'salary' : [72000, 68000, 54000, 0]})
Sample Input:
df1.head()
last first role salary
0 Thompson Aaron Developer 72000
1 Miller James Developer 68000
2 Bryant Samuel Tester 54000
3 Jones Richard Tester 56500
df2.head()
last first role salary
0 Thompson Aaron Developer 72000
1 Miller James Developer 68000
2 Bryant Samuel Tester 54000
3 Nirmal Ojjaswi Analyst 0
df1[~df1.fillna('').astype(str).apply(''.join, axis=1).isin(df2.fillna('').astype(str).apply(''.join, axis=1))]
Sample Output:
last first role salary
3 Jones Richard Tester 56500
df2[~df2.fillna('').astype(str).apply(''.join, axis=1).isin(df1.fillna('').astype(str).apply(''.join, axis=1))]
Sample Output:
last first role salary
3 Nirmal Ojjaswi Analyst 0