how to filter several columns with LIKE and follow the sequence?-CodePudding

I am trying to build a pandas dataframe by filtering the name of the columns. The dataframe is styled like:

NAME_1, EMAIL_1, PHONO_1, POSTALCODE_1, NAME_2, EMAIL_2, PHONO_2, POSTALCODE_2, etc.

I understand that to keep only the "email" columns, I just have to put:

df_temp = df.filter(like='EMAIL')

Where the result looks like:

EMAIL_1, EMAIL_2, EMAIL_n

So far so good, but I also want to add other columns to the filter, and also follow the sequence that links the data. I am looking for something like:

NAME_1, EMAIL_1, NAME_2, EMAIL_2

df_temp = df[df.filter(like='NAME') & df.filter(like='EMAIL')]

But it doesn't work.

The following does not return the data as I need it:

df_temp_1 = df.filter(like='NAME')
df_temp_2 = df.filter(like='EMAIL')
df_temp_3 = pd.concat([df_temp_1, df_temp_2], axis=1)

NAME_1, NAME_2, EMAIL_1, EMAIL_2, etc.

I want to see it as:

NAME_1, EMAIL_1, NAME_2, EMAIL_2

I would even like to be able to separate them into different dataframes.

do you recommend a method please?

From already thank you very much.

Regards

CodePudding user response：

Use regex instead of like:

out = df.filter(regex=('^(NAME|EMAIL)'))
print(out)

# Output (sample)
   NAME_1  EMAIL_1  NAME_2  EMAIL_2
0       5        9       5        9
1       8        2       3        9
2       8        8       1        5
3       6        7       9        5
4       6        6       4        3

CodePudding user response：

You can try

out = df.filter(regex='(NAME|EMAIL)_[12]')

If suffix is different

out = df.filter(regex='(NAME_[12]|EMAIL_[3-5])')

CodePudding user response：

Check with startswith

out = df.loc[:,df.columns.str.startswith(('NAME','EMAIL'))]

CodePudding user response：

Using a regular expression is the best solution, but you can also extract the columns from the two filters calls and join them together with union:

cols = df.filter(like='NAME').columns.union(df.filter(like='EMAIL').columns)

Output:

>>> cols
Index(['EMAIL_1', 'EMAIL_2', 'NAME_1', 'NAME_2'], dtype='object'