Hi everybody! I have a question. Imagine a DataFrame with columns [a, b, c, e, f, g, h, i, j]. I want to create a second DataFrame containing only columns a and c through g. How can I do this in a single command, without building a list of all the columns? For example, this is what I'm writing now:
columns = ['a', 'c', 'e', 'f', 'g']
df2 = df.loc[:, df.columns.isin(columns)]
I would like to know if there's something more like:
df2 = df.loc[:,'a': 'g']
but excluding the 'b' column.
Done this way, I need two commands: one to select a through g, and a second to drop b.
I would like to know if I can select a through g and drop b at the same time.
CodePudding user response:
The easiest way is to use .loc slice notation, as you demonstrated, along with a call to .drop to remove any specific unwanted columns:
Create data
>>> import pandas as pd
>>> df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])
>>> df
   a  b  c  d  e  f  g  h  i  j
0  0  1  2  3  4  5  6  7  8  9
1  0  1  2  3  4  5  6  7  8  9
2  0  1  2  3  4  5  6  7  8  9
3  0  1  2  3  4  5  6  7  8  9
4  0  1  2  3  4  5  6  7  8  9
.loc and dropping
Fairly straightforward: use .loc to perform your slicing, then drop anything you don't want from there.
>>> df.loc[:, 'a':'g'].drop(columns='b')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
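As a minimal sketch of the same idea in script form: label slices with .loc include both endpoints, so 'a':'g' keeps g, and .drop(columns=...) also accepts a list when more than one column has to go. The helper name select_range_except is hypothetical, not a pandas function.

```python
import pandas as pd

# Same sample frame as above: 5 identical rows, columns a..j.
df = pd.DataFrame([list(range(10))] * 5, columns=list("abcdefghij"))


def select_range_except(frame, start, stop, exclude):
    """Hypothetical helper: label-slice start..stop (inclusive), then drop `exclude`."""
    return frame.loc[:, start:stop].drop(columns=exclude)


df2 = select_range_except(df, "a", "g", ["b"])
print(df2.columns.tolist())
```

Passing the exclusions as a list keeps the call the same whether one column or several are removed.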
Working With the Index
If you want to work as efficiently as possible with the index, you can use Index.slice_indexer along with .drop so that you don't create temporary subsets of your data (like we did above):
>>> columns = df.columns[df.columns.slice_indexer('a', 'g')].drop('b')
>>> df[columns]
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
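For illustration, here is a small sketch of the intermediate objects, assuming the same ten-column frame. slice_indexer returns a plain Python slice of positions, and the .drop in this chain is Index.drop, so only the column labels are manipulated before the single selection at the end. The variable names positions and wanted are just for this example.

```python
import pandas as pd

df = pd.DataFrame([list(range(10))] * 5, columns=list("abcdefghij"))

# Positions covering labels 'a' through 'g' (both ends included).
positions = df.columns.slice_indexer("a", "g")

# Work on the labels only: take the range, then remove 'b'.
wanted = df.columns[positions].drop("b")

# One selection on the data itself.
df2 = df[wanted]
print(type(positions).__name__)
print(wanted.tolist())
```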
CodePudding user response:
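You can use
df2 = df[['a', 'c', 'd', 'e', 'f', 'g']].copy()
or
df2 = df.copy()
del df2['b']
Note that the second variant only removes b; any other columns in df (h, i, j) stay in df2.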
CodePudding user response:
There are a couple of ways to solve this if you don't want to write the columns into a list by hand.
import numpy as np
# First, if you simply want a run of consecutive columns, you can use np.arange() to generate the column positions (here positions 2, 3 and 4)
df.iloc[:,np.arange(2, 5).tolist()]
# Second, if you want a run of consecutive columns but with one in the middle removed, you can pop that position from the list of ints
column_list = np.arange(2, 5).tolist()
# This pop removes the element at index 1 of the list created above (the value 3)
column_list.pop(1)
df.iloc[:,column_list]
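A different technique from the pop approach, offered as a sketch: NumPy's np.r_ concatenates scalars and slice ranges into one array of positions, so the whole selection fits in one expression. It assumes the ten-column frame a to j from the first answer, where a is position 0 and c to g are positions 2 to 6.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([list(range(10))] * 5, columns=list("abcdefghij"))

# np.r_[0, 2:7] builds the positions 0, 2, 3, 4, 5, 6; the stop value 7 is exclusive.
positions = np.r_[0, 2:7]

df2 = df.iloc[:, positions]
print(df2.columns.tolist())
```

Because this is positional, it depends on the column order; if the order can change, the label-based .loc slice is the safer choice.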