I have a CSV file like this,
which includes duplicated column names. The point is that I am only interested in speed and status,
so I defined the following:
used_columns=['time.2','speed','time.4','status']
df = pd.read_csv(path, usecols=used_columns)
The tricky part is that if one of the columns is missing, read_csv won't be able to find it and will raise an error (I have a lot of CSV files and they might differ).
One option is to read the whole CSV, save the column names, and take the ones preceding 'speed' and 'status',
but I'm trying to avoid reading the whole CSV since each file is huge and I have a lot of them.
CodePudding user response:
I think you can use the nrows=
parameter when reading the .csv
file. That way you read only a few rows (not the whole file), save the column names, and then read only the columns that are actually present in that file:
df = pd.read_csv(path, nrows=2)
# columns are in df.columns:
print(df.columns)
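Putting the two steps together, here is a minimal sketch of the approach. The file contents and the `read_wanted` helper are made up for illustration; the desired column names are taken from the question. Using `nrows=0` reads only the header row, which avoids loading any data rows at all:

```python
import io
import pandas as pd

# Hypothetical CSV contents: one file has all columns,
# the other is missing 'time.4' and 'status'.
csv_full = "time.1,pos,time.2,speed,time.4,status\n0,1,0.0,10,0.1,ok\n1,2,0.5,12,0.6,ok\n"
csv_partial = "time.1,pos,time.2,speed\n0,1,0.0,10\n1,2,0.5,12\n"

wanted = ['time.2', 'speed', 'time.4', 'status']

def read_wanted(buf):
    # Step 1: read only the header row to learn the column names.
    header = pd.read_csv(buf, nrows=0)
    buf.seek(0)  # rewind so the file can be read a second time
    # Step 2: keep only the wanted columns that actually exist here.
    present = [c for c in wanted if c in header.columns]
    return pd.read_csv(buf, usecols=present)

df_full = read_wanted(io.StringIO(csv_full))
df_partial = read_wanted(io.StringIO(csv_partial))
print(list(df_full.columns))     # ['time.2', 'speed', 'time.4', 'status']
print(list(df_partial.columns))  # ['time.2', 'speed']
```

As an alternative, `usecols` also accepts a callable, e.g. `pd.read_csv(path, usecols=lambda c: c in wanted)`, which silently skips missing columns and avoids the separate header read entirely.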