Say I have a table that looks something like:
---------------------------------- ------------------------------------- ----------------------------------
| ExperienceModifier|ApplicationId | ExperienceModifier|RatingModifierId | ExperienceModifier|ActionResults |
---------------------------------- ------------------------------------- ----------------------------------
| | | |
---------------------------------- ------------------------------------- ----------------------------------
I would like to grab all of the columns that lead with 'ExperienceModifier' and stuff the results of that into its own dataframe. How would I accomplish this with pandas?
CodePudding user response:
You can try pandas.DataFrame.filter
df.filter(like='ExperienceModifier')
If you want to get columns only contains ExperienceModifier
at the beginning.
df.filter(regex='^ExperienceModifier')
CodePudding user response:
Ynjxsjmh's answer will get all columns that contain "ExperienceModifier". If you literally want columns that start with that string, rather merely contain it, you can do new_df = df[[col for col in df.columns if col[:18] == 'ExperienceModifier']]
. If all of the desired columns have |
after "ExperienceModifier", you could also do new_df = df[[col for col in df.columns if col.split('|')[0] == 'ExperienceModifier']]
. All of these will create a view of the dataframe. If you want a completely separate dataframe, you should copy it, like this: new_df = df[[col for col in df.columns if col.split('|')[0] == 'ExperienceModifier']].copy()
. You also might want to create a multi-index by splitting the column names on |
rather than creating a separate dataframe.
CodePudding user response:
The accepted answer does easly the job but I still attach my "hand made version" that works:
import pandas as pd
import numpy as np
import re
lst = [[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4]]
column_names = [['ExperienceModifier|ApplicationId', 'ExperienceModifier|RatingModifierId', 'ExperienceModifier|ActionResults','OtherName|ActionResults']]
data = pd.DataFrame(lst, columns = column_names)
data
old_and_dropped_dataframes = []
new_dataframes=[]
for i in np.arange(0,len(column_names[0])):
column_names[0][i].split("|")
splits=re.findall(r"[\w'] ", column_names[0][i])
if "ExperienceModifier" in splits:
new_dataframe = data.iloc[:,[i]]
new_dataframes.append(new_dataframe)
else:
old_and_dropped_dataframe = data.iloc[:,[i]]
old_and_dropped_dataframes.append(old_and_dropped_dataframe)
ExperienceModifier_dataframe = pd.concat(new_dataframes,axis=1)
ExperienceModifier_dataframe
OtherNames_dataframe = pd.concat(old_and_dropped_dataframes,axis=1)
OtherNames_dataframe
This script creates two new dataframes starting from the initial dataframe: one that contains the columns whose names start with ExperienceModifier and an other one that contains the columns that do not start with ExperienceModifier.