Here's my code:
import pathlib
from pathlib import Path
import os
import pandas as pd
df=pd.DataFrame(
{
'col1': ['red','blue','red','red','green','red','yellow', 'red', 'cyan', 'purple', 'pink', 'black', 'orange'],
'col2': ['red', 'green','orange'],
}
)
The desired output would be to eliminate every value that appears on col1
and also appears on col2
, then drop col2
and then export the document to an excel spread sheet.
The desired df would look like this:
df=pd.DataFrame(
{
'col1': ['blue', 'yellow', 'cyan', 'purple', 'pink', 'black',]
}
)
As of now my code looks this this, but it's not giving me the intended result as it does not drop the duplicate values.
df = df.drop(df[(df['col1'] == df['col2'])].index)
df = df.drop(['col2'], inplace=True)
os.startfile(File_path, index=False)
Here is the updated code:
import pathlib
from pathlib import Path
import os
import pandas as pd
df_output = Path.home().joinpath("Desktop", "Remaining colors.xlsx")
df=pd.DataFrame(
{
'col1': ['red','blue','red','red','green','red','yellow', 'red', 'cyan', 'purple', 'pink', 'black', 'orange'],
'col2': ['red', 'green','orange'],
}
)
(df.loc[~df['col1'].isin(df["col2"]), ['col1']])
df = df.drop(columns='col2')
df.to_excel(df_output, index=False)
os.startfile(df_output)
CodePudding user response:
You need to use isin
and boolean indexing:
(df.loc[~df['col1'].isin(df['col2']), ['col1']]
.to_excel('filename.xlsx', index=False)
)
Output:
col1
1 blue
6 yellow
8 cyan
9 purple
10 pink
11 black
Used input:
col1 col2
0 red red
1 blue green
2 red orange
3 red NaN
4 green NaN
5 red NaN
6 yellow NaN
7 red NaN
8 cyan NaN
9 purple NaN
10 pink NaN
11 black NaN
12 orange NaN