I am trying to remove ranges of columns in my pandas df. I would prefer to do it in one line but the only method I know is iloc, which doesn't seem to allow multiple references. When I wrote it in separate lines, the columns I don't want remain. Can someone help me with a better way of doing this? Thanks!
import pandas as pd
df = pd.DataFrame({'id': [100,200,300], 'user': ['Bob', 'Jane', 'Alice'], 'income': [50000, 60000, 70000], 'color':['red', 'green', 'blue'], 'state':['GA', 'PA', 'NY'], 'day':['M', 'W', 'Th'], 'color2':['red', 'green', 'blue'], 'state2':['GA', 'PA', 'NY'], 'id2': [100,200,300]})
df.drop(df.iloc[:, 0:2], inplace=True, axis=1)
df.drop(df.iloc[:, 4:5], inplace=True, axis=1)
df.drop(df.iloc[:, 7:9], inplace=True, axis=1)
I'd like the output from the code above to contain columns 'color' and 'color2'
CodePudding user response:
Try np.r_, which combines several slices into a single array of positions.
Answer based on your question, before you edited it:
import pandas as pd
import numpy as np
idx = np.r_[0:12, 63:70, 73:78, 92:108]
# take the labels from the same DataFrame you are dropping from
policies.drop(policies.columns[idx], axis=1, inplace=True)
Answer based on the example you gave:
import pandas as pd
import numpy as np
idx = np.r_[0:2, 4:5, 7:9]
df.drop(df.columns[idx], axis = 1, inplace = True)
PS: the slices passed to np.r_ are end-exclusive, meaning that with [0:3] the column at position 3 will not be dropped.
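To see the end-exclusive behaviour concretely, here is a small sketch on the example frame from the question. The names positions and remaining are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [100, 200, 300], 'user': ['Bob', 'Jane', 'Alice'],
    'income': [50000, 60000, 70000], 'color': ['red', 'green', 'blue'],
    'state': ['GA', 'PA', 'NY'], 'day': ['M', 'W', 'Th'],
    'color2': ['red', 'green', 'blue'], 'state2': ['GA', 'PA', 'NY'],
    'id2': [100, 200, 300],
})

# np.r_ concatenates the slices into one integer array; each stop is excluded.
positions = np.r_[0:2, 4:5, 7:9]
print(positions.tolist())  # [0, 1, 4, 7, 8]

# Returns a new frame, so df itself is left untouched here.
remaining = df.drop(columns=df.columns[positions])
print(list(remaining.columns))  # ['income', 'color', 'day', 'color2']
```

So 4:5 drops only position 4 ('state'), and 7:9 drops positions 7 and 8.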
Hope this helps.
CodePudding user response:
You could do:
df = df.drop(df.columns[[*range(0,2), *range(4,5), *range(7,9)]], axis=1)
Output:
   income  color  day  color2
0   50000    red    M     red
1   60000  green    W   green
2   70000   blue   Th    blue
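As for why the three separate lines in the question left unwanted columns behind: as far as I can tell, each drop shifts the positions of the remaining columns, so the later slices no longer point where you expect. A rough sketch of that effect, using df.columns[...] in place of df.iloc[...] (which I believe selects the same labels) and a one-row dummy frame with the same column names:

```python
import pandas as pd

cols = ['id', 'user', 'income', 'color', 'state', 'day', 'color2', 'state2', 'id2']
df = pd.DataFrame([[0] * len(cols)], columns=cols)  # dummy data, same layout

# Step by step, as in the question (without inplace, so df stays intact).
step1 = df.drop(columns=df.columns[0:2])        # drops 'id' and 'user'
step2 = step1.drop(columns=step1.columns[4:5])  # position 4 is now 'color2'
step3 = step2.drop(columns=step2.columns[7:9])  # only 6 columns left, nothing to drop
print(list(step3.columns))
# ['income', 'color', 'state', 'day', 'state2', 'id2']

# Resolving every slice against the original frame first avoids the shift.
to_drop = [*df.columns[0:2], *df.columns[4:5], *df.columns[7:9]]
fixed = df.drop(columns=to_drop)
print(list(fixed.columns))
# ['income', 'color', 'day', 'color2']
```

Both one-liners above work for the same reason: all positions are looked up on the original column index before anything is dropped.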