I have a large CSV file that contains repeated rows. I want to delete all of these repeated rows, which contain the word "Names".
1 Names Dates Picture
2 Alex 6-12 4364.jpg
3 Names Dates Picture
4 Jade 8-11 7435.jpg
5 Names Dates Picture
6 Dread 1-5 8635.jpg
The CSV file looks like this. I want to delete all the rows that repeat the header values "Names", "Dates", "Picture".
I have tried several methods I found online, but none of them worked.
I'm using pandas to import the CSV file: df = pd.read_csv('file2022.csv')
CodePudding user response:
You can use drop_duplicates here:
df = pd.read_csv('test2.csv', sep=r'\s+', engine='python', header=None, index_col=0)  # r'\s+' splits on runs of whitespace; ' *' can match the empty string
df.drop_duplicates(keep=False, inplace=True)
df.reset_index(inplace=True, drop=True)
print(df)
Output:
1 2 3
0 Alex 6-12 4364.jpg
1 Jade 8-11 7435.jpg
2 Dread 1-5 8635.jpg
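Note that keep=False removes every row that occurs more than once, so genuine data rows that happen to be duplicated would be dropped too. A minimal sketch that targets only the repeated header rows, assuming the file is whitespace-separated as shown in the question (the inline sample text and the names raw, header, is_header and cleaned are made up for illustration):

```python
import io

import pandas as pd

# Sample data copied from the question; a real script would read the file instead.
raw = """1 Names Dates Picture
2 Alex 6-12 4364.jpg
3 Names Dates Picture
4 Jade 8-11 7435.jpg
5 Names Dates Picture
6 Dread 1-5 8635.jpg
"""

# Read without a header so the first line is kept as an ordinary row.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, index_col=0)

# Treat the first row as the header and flag every row identical to it.
header = df.iloc[0]
is_header = df.eq(header, axis=1).all(axis=1)

# Keep only the data rows and reuse the header values as column names.
cleaned = df[~is_header].reset_index(drop=True)
cleaned.columns = header.tolist()
print(cleaned)
```

Because only rows equal to the first row are removed, duplicated data rows survive this filter.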
CodePudding user response:
df = df[df["Names"] != "Names"]
This should drop the rows whose "Names" column contains the value "Names", assuming the first line of the file was read as the header.
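A self-contained sketch of this filter, assuming a comma-separated file whose first line is the header and treating the leading numbers in the question as row numbers rather than data (the inline sample text stands in for file2022.csv):

```python
import io

import pandas as pd

# Hypothetical comma-separated version of the data from the question.
raw = """Names,Dates,Picture
Alex,6-12,4364.jpg
Names,Dates,Picture
Jade,8-11,7435.jpg
Names,Dates,Picture
Dread,1-5,8635.jpg
"""

# The first line becomes the column names; later copies are read as data rows.
df = pd.read_csv(io.StringIO(raw))

# Keep only rows whose "Names" value is not the literal header text.
df = df[df["Names"] != "Names"].reset_index(drop=True)
print(df)
```

reset_index(drop=True) is optional; it only renumbers the remaining rows from 0.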