I am trying to build a pandas dataframe by filtering the name of the columns. The dataframe is styled like:
NAME_1, EMAIL_1, PHONO_1, POSTALCODE_1, NAME_2, EMAIL_2, PHONO_2, POSTALCODE_2, etc.
I understand that to keep only the "email" columns, I just have to put:
df_temp = df.filter(like='EMAIL')
Where the result looks like:
EMAIL_1, EMAIL_2, EMAIL_n
So far so good, but I also want to add other columns to the filter, and also follow the sequence that links the data. I am looking for something like:
NAME_1, EMAIL_1, NAME_2, EMAIL_2
df_temp = df[df.filter(like='NAME') & df.filter(like='EMAIL')]
But it doesn't work.
The following does not return the data as I need it:
df_temp_1 = df.filter(like='NAME')
df_temp_2 = df.filter(like='EMAIL')
df_temp_3 = pd.concat([df_temp_1, df_temp_2], axis=1)
NAME_1, NAME_2, EMAIL_1, EMAIL_2, etc.
I want to see it as:
NAME_1, EMAIL_1, NAME_2, EMAIL_2
I would even like to be able to separate them into different dataframes.
do you recommend a method please?
From already thank you very much.
Regards
CodePudding user response:
Use regex
instead of like
:
out = df.filter(regex=('^(NAME|EMAIL)'))
print(out)
# Output (sample)
NAME_1 EMAIL_1 NAME_2 EMAIL_2
0 5 9 5 9
1 8 2 3 9
2 8 8 1 5
3 6 7 9 5
4 6 6 4 3
CodePudding user response:
You can try
out = df.filter(regex='(NAME|EMAIL)_[12]')
If suffix is different
out = df.filter(regex='(NAME_[12]|EMAIL_[3-5])')
CodePudding user response:
Check with startswith
out = df.loc[:,df.columns.str.startswith(('NAME','EMAIL'))]
CodePudding user response:
Using a regular expression is the best solution, but you can also extract the columns from the two filters
calls and join them together with union
:
cols = df.filter(like='NAME').columns.union(df.filter(like='EMAIL').columns)
Output:
>>> cols
Index(['EMAIL_1', 'EMAIL_2', 'NAME_1', 'NAME_2'], dtype='object'