How to replace df.loc with df.reindex without KeyError

I have a huge DataFrame that I load from a .csv file. After defining the columns, I only want to keep the ones I need. With Python 3.8.1 this worked fine, although it raised this FutureWarning:

"Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative."

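If I try to do the same in Python 3.10.x, I now get a KeyError: "['empty'] not in index". (The change comes from pandas rather than from Python itself: pandas 1.0 turned this deprecation into an error, and the newer environment presumably has a newer pandas installed.)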

To select only the columns I need and drop the rest, I use .loc like this:

df = df.loc[:, ['laenge','Timestamp', 'Nick']]
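For reference, a minimal reproduction of the error, assuming pandas 1.0 or newer. The frame below is made-up data, with 'Timestamp' left out on purpose:

```python
import pandas as pd

# Hypothetical data: 'Timestamp' is deliberately absent from the frame
df = pd.DataFrame({'laenge': [0, 5], 'Nick': [2, 8]})

error_message = None
try:
    df.loc[:, ['laenge', 'Timestamp', 'Nick']]
except KeyError as exc:
    # With pandas >= 1.0 a missing label in the list raises KeyError
    # instead of silently producing a NaN column
    error_message = str(exc)
print(error_message)
```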

How can I get the same result with .reindex (or any other method) without getting the KeyError?

Thanks

Answer:

If you only need the columns that actually exist in the DataFrame, use numpy.intersect1d:

df = df[np.intersect1d(['laenge','Timestamp', 'Nick'], df.columns)]
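np.intersect1d returns the matching names sorted, which is why 'Nick' comes before 'laenge' in the sample output further down. If you want to keep the order of your own list instead, a plain list comprehension is one way to do it. The name `wanted` is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'laenge': [0, 5], 'col': [1, 7], 'Nick': [2, 8]})
wanted = ['laenge', 'Timestamp', 'Nick']  # hypothetical list of desired columns

# Keep only the names that exist in df, in the order given by `wanted`
present = [c for c in wanted if c in df.columns]
df = df[present]
print(df.columns.tolist())  # ['laenge', 'Nick']
```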

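The same set of columns as with intersect1d can also be obtained with DataFrame.reindex, which adds any missing column filled with NaN, followed by dropna to remove the columns that contain only missing values: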

df = df.reindex(['laenge','Timestamp', 'Nick'], axis=1).dropna(how='all', axis=1)
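One caveat with the reindex/dropna combination: dropna cannot tell a placeholder column added by reindex from a real column that happens to be entirely NaN, so it removes both. A small sketch with made-up data showing where it differs from the intersection approach:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Timestamp' exists but holds only NaN; 'Nick' is missing
df = pd.DataFrame({'laenge': [0, 5], 'Timestamp': [np.nan, np.nan]})
wanted = ['laenge', 'Timestamp', 'Nick']

# reindex adds 'Nick' as NaN, then dropna removes it AND the real 'Timestamp'
via_reindex = df.reindex(wanted, axis=1).dropna(how='all', axis=1)

# intersect1d keeps every existing column (sorted by name)
via_intersect = df[np.intersect1d(wanted, df.columns)]

print(via_reindex.columns.tolist())    # ['laenge']
print(via_intersect.columns.tolist())  # ['Timestamp', 'laenge']
```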

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'laenge': [0,5], 'col': [1,7], 'Nick': [2,8]})

print(df)
   laenge  col  Nick
0       0    1     2
1       5    7     8

df = df[np.intersect1d(['laenge','Timestamp', 'Nick'], df.columns)]
print(df)
   Nick  laenge
0     2       0
1     8       5

Answer:

Use reindex; columns that do not exist are added and filled with NaN instead of raising a KeyError:

import pandas as pd

df = pd.DataFrame({'A': [0], 'B': [1], 'C': [2]})
#    A  B  C
# 0  0  1  2


df.reindex(['A', 'C', 'D'], axis=1)

output:

   A  C   D
0  0  2 NaN
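reindex also accepts a `columns=` keyword as an equivalent of passing `axis=1`, and a `fill_value` argument if NaN is not the placeholder you want. A short sketch; the value 0 is just an example:

```python
import pandas as pd

df = pd.DataFrame({'A': [0], 'B': [1], 'C': [2]})

# Same as df.reindex(['A', 'C', 'D'], axis=1), but 'D' is filled with 0
out = df.reindex(columns=['A', 'C', 'D'], fill_value=0)
print(out)
#    A  C  D
# 0  0  2  0
```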

If you need to get only the common columns, you can use Index.intersection:

cols = ['A', 'C', 'E']
df[df.columns.intersection(cols)]

output:

   A  C
0  0  2
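DataFrame.filter with the `items` argument is another built-in way to keep only the labels that exist while silently skipping the rest. A minimal sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [0], 'B': [1], 'C': [2]})
cols = ['A', 'C', 'E']

# filter(items=...) keeps the listed columns that are present; 'E' is ignored
out = df.filter(items=cols)
print(out.columns.tolist())  # ['A', 'C']
```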