Selecting specific columns from a Data Frame

Time:07-08

Hi everybody! I have a question. Imagine a DataFrame with columns [a, b, c, d, e, f, g, h, i, j]. I want to create a second DataFrame containing only columns a and c through g. How can I do this in a single command, without building a list of all the columns? For example, this is what I'm writing now:

columns = ['a', 'c', 'd', 'e', 'f', 'g']
df2 = df.loc[:, df.columns.isin(columns)]

I would like to know if there's something more like:

df2 = df.loc[:,'a': 'g']

But excluding the 'b' column.

With this second approach I need two commands: one to select a through g, and a second to drop b.

I would like to know if I can select a through g and drop b at the same time.

CodePudding user response:

The easiest way is to use slice notation with .loc, as you demonstrated, followed by a call to .drop to remove any specific unwanted columns:

Create data

>>> import pandas as pd
>>> df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])
>>> df
   a  b  c  d  e  f  g  h  i  j
0  0  1  2  3  4  5  6  7  8  9
1  0  1  2  3  4  5  6  7  8  9
2  0  1  2  3  4  5  6  7  8  9
3  0  1  2  3  4  5  6  7  8  9
4  0  1  2  3  4  5  6  7  8  9

.loc and dropping

Fairly straightforward: use .loc to perform your slicing, then drop anything you don't want from the result. Note that label slices with .loc include both endpoints, so 'a':'g' includes column g.

>>> df.loc[:, 'a':'g'].drop(columns='b')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
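
If this pattern comes up often, the slice-then-drop step can be wrapped in a small helper. The following is only a minimal sketch: the function name select_range_except is hypothetical (it is not part of pandas), and it assumes the start and stop labels both exist in the columns.

```python
import pandas as pd

def select_range_except(df, start, stop, exclude):
    """Return the columns from start to stop (both inclusive, by label),
    minus the labels listed in exclude. Hypothetical helper, not a pandas API."""
    return df.loc[:, start:stop].drop(columns=exclude)

df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])
df2 = select_range_except(df, 'a', 'g', ['b'])
print(list(df2.columns))
```

Because .drop returns a new object, the original df is left unchanged.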

Working With the Index

If you want to work as efficiently as possible with the index, you can use Index.slice_indexer along with Index.drop to build the column labels first, so that you don't create a temporary subset of your data (as the approach above does):

>>> columns = df.columns[df.columns.slice_indexer('a', 'g')].drop('b')
>>> df[columns]
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
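
To make the intermediate steps visible, here is the same idea broken into pieces. This is a sketch that assumes the example frame from above; the variable names indexer and selected are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])

# slice_indexer turns the label range into a positional slice object;
# as with .loc, the stop label 'g' is included in the range.
indexer = df.columns.slice_indexer('a', 'g')

# Slicing the Index only touches the column labels, not the data.
selected = df.columns[indexer]

# Index.drop returns a new Index without the given label.
columns = selected.drop('b')

# The data is subset once, at the very end.
df2 = df[columns]
print(indexer)
print(list(columns))
```

The only DataFrame created here is the final one; everything before it operates on the (much smaller) column Index.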

CodePudding user response:

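You can list the columns explicitly (column labels must be quoted strings):

df2 = df[['a', 'c', 'd', 'e', 'f', 'g']].copy()

or copy the frame and delete the column you don't want (note that this keeps every other column, including h, i and j):

df2 = df.copy()
del df2['b']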

CodePudding user response:

There are a couple of ways you could solve this if you don't want to write the columns into a list by hand.

import numpy as np

# First, if you only want columns that are sequential, you can use np.arange() to generate their integer positions.
# Positions 2, 3, 4 correspond to columns c, d, e here.
df.iloc[:, np.arange(2, 5).tolist()]

# Second, if you want sequential columns but need to remove one in the middle, you can pop from the list of integer positions.
column_list = np.arange(2, 5).tolist()
# pop(1) removes the element at index 1 of the list (position 3, i.e. column d)
column_list.pop(1)
df.iloc[:, column_list]
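
Applied to the original question (column a plus columns c through g), the positional approach can also be written in one expression with np.r_, which concatenates scalars and slices into a single integer array. This is a sketch assuming the ten-column example frame with columns a through j; the positions are hard-coded for that layout.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])

# Position 0 is column a; positions 2..6 are columns c..g.
# The slice stop (7) is exclusive, as usual for positional slicing.
positions = np.r_[0, 2:7]
df2 = df.iloc[:, positions]
print(list(df2.columns))
```

Unlike .loc, positional slices exclude the stop value, and the code depends on column order rather than column names.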