I understand that to drop a column you use df.drop('column name', axis=1, inplace=True).
The file is in .csv format.
I want to use the syntax above on large data sets, in a more robust way.
Suppose I have 500 columns and want to keep columns 100 to 140, selected by column name rather than by index, and drop the rest. How would I write that? Also, within columns 100 to 140, I want to drop columns 105, 108 and 110 by column name.
CodePudding user response:
df = df.loc[:, 'col_100_name' : 'col_140_name']
.loc always includes both endpoints when slicing by label. Here I am selecting all rows, and only the columns that you want to keep (by name).
After this (or before - it doesn't matter) you can drop the other columns by names as usual:
df.drop(['col_105_name', 'col_108_name', 'col_110_name'], axis=1, inplace=True)
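As a minimal sketch of the two steps above, here is the same idea on a small frame. The 10-column DataFrame and the col_0 ... col_9 names are made up for illustration and stand in for the 500-column CSV; the drop here returns a new frame instead of using inplace=True.

```python
import pandas as pd

# Hypothetical 10-column frame standing in for the real 500-column CSV.
df = pd.DataFrame([range(10)], columns=[f"col_{i}" for i in range(10)])

# Label-based slice: both 'col_3' and 'col_7' are included.
kept = df.loc[:, "col_3":"col_7"]

# Drop unwanted columns by name; this returns a new frame.
kept = kept.drop(columns=["col_4", "col_6"])

print(list(kept.columns))
```

Note that the label slice depends on the column order in the file, so 'col_3' has to appear before 'col_7' for the slice to be non-empty.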
If you wish to select columns using a combination of slices and explicit column names:
cols_in_the_slice = df.loc[:, 'col_100_name' : 'col_140_name'].columns
other_cols = pd.Index(['col_02_name', 'col_04_name'])
all_cols = other_cols.union(cols_in_the_slice , sort=False)
df = df[all_cols]
Union appends the new (not yet encountered) elements of cols_in_the_slice to the end of other_cols. It sorts by default, so I pass sort=False to keep that order. Then we select all of these columns.
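A small sketch of how the union behaves, assuming a made-up 10-column frame with hypothetical names col_0 ... col_9. One of the explicit names ('col_4') deliberately falls inside the slice to show that it is not duplicated.

```python
import pandas as pd

# Hypothetical frame; the column names are placeholders.
df = pd.DataFrame([range(10)], columns=[f"col_{i}" for i in range(10)])

cols_in_the_slice = df.loc[:, "col_3":"col_5"].columns
other_cols = pd.Index(["col_8", "col_4"])  # 'col_4' is also inside the slice

# sort=False keeps other_cols first, then appends the names not seen yet.
all_cols = other_cols.union(cols_in_the_slice, sort=False)
result = df[all_cols]

print(list(result.columns))
```

The explicit names come first, in the order given, followed by the slice columns that were not already listed.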
By the way, at this point you can also remove column names that you don't wish to keep. You can use .drop if you know the column names, or .delete if you know their positions in this index:
cols_in_the_slice = cols_in_the_slice.drop(['col_105_name', 'col_108_name', 'col_110_name'])
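To illustrate the difference between the two Index methods, here is a short sketch on a hypothetical five-element index (the names 'a' to 'e' are placeholders). Index.drop takes labels, Index.delete takes integer positions, and both return a new Index.

```python
import pandas as pd

cols = pd.Index(["a", "b", "c", "d", "e"])

# Remove by label.
by_name = cols.drop(["b", "d"])

# Remove by integer position (0-based).
by_position = cols.delete([1, 3])

print(list(by_name))
print(list(by_position))
```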
I also recommend taking a look at Pandas User Guide on Indexing and selecting data.
CodePudding user response:
Instead of passing a single string for the column name, pass a list of strings naming the columns you want to delete.
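A minimal sketch of dropping several columns in one call, using a hypothetical five-column frame. The errors="ignore" argument is an optional addition, not part of the original suggestion; it makes pandas skip names that are not present instead of raising a KeyError.

```python
import pandas as pd

# Hypothetical frame; the column names are placeholders.
df = pd.DataFrame([range(5)], columns=["a", "b", "c", "d", "e"])

# A list of names; 'not_a_column' does not exist in df.
to_drop = ["b", "d", "not_a_column"]

# errors="ignore" skips missing names rather than raising KeyError.
trimmed = df.drop(to_drop, axis=1, errors="ignore")

print(list(trimmed.columns))
```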