I want to delete specific rows in my CSV with Python. The CSV has multiple rows and columns.
import numpy as np
np.df2 = pd.read_csv('C:/Users/.../Data.csv', delimiter='\t')
np.df_2=np.df2[['Colum.X', 'Colum.Y']]
Python should open Data.csv and then delete every complete row where the value of Colum.X > 5 or the value of Colum.Y > 20.
CodePudding user response:
You can accomplish this with pandas; there is no need for NumPy. I assume the columns in your CSV are actually named 'Colum.X' and 'Colum.Y'.
import pandas as pd
df = pd.read_csv('C:/Users/.../Data.csv', delimiter='\t')
df = df.loc[df['Colum.X'] <= 5] # Take only the rows where Colum.X <= 5
df = df.loc[df['Colum.Y'] <= 20] # Take only the rows where Colum.Y <= 20
df.to_csv('C:/Users/.../Data.csv', index=False) # Write back to CSV (comma-separated by default, without the index)
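Note that the file is read with a tab delimiter but written back with commas. If you want to keep it tab-separated, the two filters can be combined into one mask and `sep='\t'` passed to `to_csv`. A minimal sketch, assuming a small made-up sample (including the hypothetical `Other` column) in place of Data.csv, read from memory rather than from disk:

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for Data.csv
sample = "Colum.X\tColum.Y\tOther\n1\t10\ta\n6\t10\tb\n3\t25\tc\n5\t20\td\n"

df = pd.read_csv(io.StringIO(sample), sep="\t")

# Deleting rows where X > 5 OR Y > 20 is the same as
# keeping rows where X <= 5 AND Y <= 20.
keep = (df["Colum.X"] <= 5) & (df["Colum.Y"] <= 20)
filtered = df.loc[keep]

# Write back tab-separated so the output can be re-read with the same delimiter.
out = io.StringIO()
filtered.to_csv(out, sep="\t", index=False)
result_text = out.getvalue()
```

With a real file, replace the `io.StringIO` objects with the path to your CSV.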
CodePudding user response:
Not entirely sure what you're doing with np.df2, but the following will work:
import pandas as pd
df = pd.read_csv('C:/Users/.../Data.csv', delimiter='\t')
df2 = df[(df['X'] <= 5) & (df['Y'] <= 20)]
You might have to add names=['X', 'Y'] to the read_csv call if your file has no header row, depending on what your CSV data looks like.
You can then overwrite the original file with:
df2.to_csv('C:/Users/.../Data.csv', index=False) # index=False keeps the row index out of the file
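One detail to watch if the columns can contain empty cells: keeping rows with `<=` is not quite the same as deleting rows with `>`, because any comparison with NaN is False. A small sketch with made-up data (the values below are assumptions, not taken from your file) shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN stands for an empty cell in the CSV
df = pd.DataFrame({"X": [1.0, 6.0, np.nan, 5.0], "Y": [10, 10, 10, 25]})

# Rows to delete, written exactly as in the question
drop = (df["X"] > 5) | (df["Y"] > 20)

# Negating the delete condition keeps the row with the missing X value
kept_by_negation = df[~drop]

# The <= form drops that row, since NaN <= 5 evaluates to False
kept_by_le = df[(df["X"] <= 5) & (df["Y"] <= 20)]
```

Which behavior you want depends on whether rows with missing values should survive the cleanup.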