I have a dataframe in pandas as mentioned below where elements in column info
is same as unique file in column id
:
id text info
1 great boy,police
1 excellent boy,police
2 nice girl,mother,teacher
2 good girl,mother,teacher
2 bad girl,mother,teacher
3 awesome grandmother
4 superb grandson
All I want to get list elements as row for each file, like:
id text info
1 great boy
1 excellent police
2 nice girl
2 good mother
2 bad teacher
3 awesome grandmother
4 superb grandson
CodePudding user response:
One way using pandas.DataFrame.groupby.transform
.
Note that this assumes:
- elements in
info
have same length as the number of members for eachid
after split by','
- elements in
info
are identical among the same id.
df["info"] = df.groupby("id")["info"].transform(lambda x: x.str.split(",").iloc[0])
print(df)
Output:
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
CodePudding user response:
create temp variable counting the number of rows for each info
group:
temp = df.groupby('info').cumcount()
Do a list comprehension to index per text in info
:
df['info'] = [ent.split(',')[pos] for ent, pos in zip(df['info'], temp)]
df
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
CodePudding user response:
Or try apply
:
df['info'] = pd.DataFrame({'info': df['info'].str.split(','), 'n': df.groupby('id').cumcount()}).apply(lambda x: x['info'][x['n']], axis=1)
Output:
>>> df
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
>>>
CodePudding user response:
Let us try
df['new'] = df.loc[~df.id.duplicated(),'info'].str.split(',').explode().values
df
id text info new
0 1 great boy,police boy
1 1 excellent boy,police police
2 2 nice girl,mother,teacher girl
3 2 good girl,mother,teacher mother
4 2 bad girl,mother,teacher teacher
5 3 awesome grandmother grandmother
6 4 superb grandson grandson
CodePudding user response:
Take advantage of the fact that 'info'
is duplicated.
df['info'] = df['info'].drop_duplicates().str.split(',').explode().to_numpy()
Output:
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson