I have a Dataframe with hundreds of columns and very long column-names. I want to remove any text in the column-names after the ":" or "." This is basically splitting on multiple delimiters.
I tried to extract column-names in a list and use the split method at ":" and "." and then keep only the portion of the text before ":" or "." but the split did not work as I wanted. I do not know why. any idea how to fix it and achieve my goal.
data = {'Name of the injured. Bla bla bla': ['Bill', 'John'],
'Age of the injured: bla bla': [50,40],
}
df_data = pd.DataFrame.from_dict(data)
print(df_data)
cols = df_data.columns.values
new_cols = [( x.split(':') or x.split('.') ) for x in cols]
print(new_cols)
This is the outcome that I need:
Thanks, GR
CodePudding user response:
You could use regular expression re
in order to split the column name strings, and then reassing them to the dataframe :
import pandas as pd
import re
data = {'Name of the injured. Bla bla bla': ['Bill', 'John'],
'Age of the injured: bla bla': [50,40],
}
df_data = pd.DataFrame.from_dict(data)
then, following your intuition, you could do:
cols = df_data.columns.values
new_cols= [re.split(r'[.:]',item)[0] for item in cols]
df_data.columns = new_cols
If you have some other charcaters to split your text, just add it between the brackets, and here we choose to keep only the first part, hence the [0]
.
The result for df_data
is as expected:
Name of the injured Age of the injured
0 Bill 50
1 John 40
CodePudding user response:
import re
# using re.split, split the col on multiple delimiters
# list comprehension to form a list of columns
# and assign back to the DF
df.columns=[re.split(r'[:|.]', col)[0] for col in df.columns]
df
Name of the injured Age of the injured
0 Bill 50
1 John 40