I downloaded data from an API and it came back pretty messy. I have this DataFrame and I am trying to apply a function that keeps only the text after the last ":" in each cell. The catch is that some cells contain more than one ":". Here is a sample of the data:
import pandas as pd

matrix = [('points:31', 'points:25'),
          ('name:Deportes wigan', 'name:Deportes loco'),
          ('team:id:1059', 'team:id:1057'),
          ('league:id:1000', 'league:id:1000')]

# Create a DataFrame object
df = pd.DataFrame(matrix, columns=list('ab'))
df
I want the following df:
31, 25
Deportes wigan, Deportes loco
1059, 1057
1000, 1000
I tried:
new_df = df.apply(lambda x: x.split(":")[-1])
new_df
CodePudding user response:
You can use DataFrame.applymap (renamed to DataFrame.map in pandas 2.1, where applymap is deprecated):
new_df = df.applymap(lambda x: x.split(":")[-1])
print(new_df)
                a              b
0              31             25
1  Deportes wigan  Deportes loco
2            1059           1057
3            1000           1000
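For reference, here is a self-contained sketch of the element-wise approach that should run on both older and newer pandas. It assumes DataFrame.map is available from pandas 2.1 onward and falls back to applymap otherwise; the helper name last_field is made up for this example.

```python
import pandas as pd

matrix = [
    ("points:31", "points:25"),
    ("name:Deportes wigan", "name:Deportes loco"),
    ("team:id:1059", "team:id:1057"),
    ("league:id:1000", "league:id:1000"),
]
df = pd.DataFrame(matrix, columns=list("ab"))


def last_field(cell):
    """Return the text after the last ':' (hypothetical helper name)."""
    return cell.split(":")[-1]


# DataFrame.map is the newer name for the element-wise operation;
# older pandas versions only provide DataFrame.applymap.
if hasattr(pd.DataFrame, "map"):
    new_df = df.map(last_field)
else:
    new_df = df.applymap(last_field)

print(new_df)
```

Both branches call the function once per cell, which is why a plain str.split works here.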
Or with Series.str.split:
new_df = df.apply(lambda col: col.str.split(":").str[-1])
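The attempt in the question fails because DataFrame.apply passes each whole column (a Series) to the function, and a Series has no .split method of its own; the string methods live under the .str accessor. A small sketch on a two-row subset of the sample data, which also shows an rsplit variant (my own addition, not from the answer above) that splits only once at the last colon:

```python
import pandas as pd

df = pd.DataFrame(
    [("points:31", "points:25"), ("team:id:1059", "team:id:1057")],
    columns=list("ab"),
)

# apply hands each column to the function as a Series, so the string
# methods have to go through the .str accessor.
by_split = df.apply(lambda col: col.str.split(":").str[-1])

# Variant: rsplit with n=1 splits only once, starting from the right,
# so the last piece is everything after the final colon.
by_rsplit = df.apply(lambda col: col.str.rsplit(":", n=1).str[-1])

print(by_split)
print(by_split.equals(by_rsplit))
```

Either form should give the same result here; rsplit just avoids building the full list of pieces for cells with many colons.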
CodePudding user response:
You can post-process the data with str.extract:
df = (pd.DataFrame(matrix, columns=list('ab'))
        .apply(lambda s: s.str.extract(r'(?<=:)([^:]*$)', expand=False))
     )
Output:
                a              b
0              31             25
1  Deportes wigan  Deportes loco
2            1059           1057
3            1000           1000
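One difference between the regex and the split approaches is worth knowing if the real API data contains cells without any ":". The sketch below uses a made-up second value ("no colon here") that is not part of the question's data, purely to illustrate the behaviour.

```python
import pandas as pd

# The second value is a hypothetical cell with no colon at all.
s = pd.Series(["team:id:1059", "no colon here"])

# The lookbehind requires a ':' right before the captured text,
# so a cell without a colon produces no match (a missing value).
extracted = s.str.extract(r"(?<=:)([^:]*$)", expand=False)

# split returns the whole string as the only piece when there is no colon.
split_last = s.str.split(":").str[-1]

print(extracted.tolist())
print(split_last.tolist())
```

So str.extract flags colon-free cells as missing, while split keeps them unchanged; which one is preferable depends on how the messy cells should be treated.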