Home > other >  How to delete column names from the middle of dataset
How to delete column names from the middle of dataset

Time:12-26

Hi i have a dataset in csv format. the issue with that data set is that it combine different csv files. and from the other csv files it also copied the column names. now i want to remove all column name which is in the middle of dataset

current csv file is like

col1  col2  col3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
col1  col2  col3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3
 1      2     3
col1  col2  col3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
col1  col2  col3
col1  col2  col3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3

want to change it to this column name only on the top

col1  col2  col3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3
 1      2     3 
 1      2     3

CodePudding user response:

you can use this line of codes:

df = pd.read_csv(df_path)
# removing repeating headers
df = df[df.ne(df.columns).any(1)]

This solution compares each row with actual columns and works regardless of non-column rows being number or not

CodePudding user response:

Assuming that your col1 suppose to have integers, you can use:

df = df[df.apply(lambda r: r['col1'].isdigit(), axis=1)]

Or use your own test in lambda for each row.

Also, this will leave the column types as object, so you might want to use .astype(int) (or something more specific for your case).

CodePudding user response:

First read the csv file into a dataframe using pd.read_csv(). Then drop duplicate rows by df = df[df.col1 != "col1"]. It will find all the rows which have col1 in them and drop these rows.

  • Related