Home > front end >  comma seperated values in columns as rows in pandas
comma seperated values in columns as rows in pandas

Time:12-15

I have a dataframe in pandas as mentioned below where elements in column info is same as unique file in column id:

id   text         info
1    great        boy,police
1    excellent    boy,police
2    nice         girl,mother,teacher
2    good         girl,mother,teacher
2    bad          girl,mother,teacher
3    awesome      grandmother
4    superb       grandson

All I want to get list elements as row for each file, like:

id   text         info
1    great        boy
1    excellent    police
2    nice         girl
2    good         mother
2    bad          teacher
3    awesome      grandmother
4    superb       grandson

CodePudding user response:

One way using pandas.DataFrame.groupby.transform.

Note that this assumes:

  1. elements in info have same length as the number of members for each id after split by ','
  2. elements in info are identical among the same id.

df["info"] = df.groupby("id")["info"].transform(lambda x: x.str.split(",").iloc[0])
print(df)

Output:

   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson

CodePudding user response:

create temp variable counting the number of rows for each info group:

temp = df.groupby('info').cumcount()

Do a list comprehension to index per text in info:

df['info'] = [ent.split(',')[pos] for ent, pos in zip(df['info'], temp)]

df

   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson

CodePudding user response:

Or try apply:

df['info'] = pd.DataFrame({'info': df['info'].str.split(','), 'n': df.groupby('id').cumcount()}).apply(lambda x: x['info'][x['n']], axis=1)

Output:

>>> df
   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson
>>> 

CodePudding user response:

Let us try

df['new'] = df.loc[~df.id.duplicated(),'info'].str.split(',').explode().values
df
   id       text                 info          new
0   1      great           boy,police          boy
1   1  excellent           boy,police       police
2   2       nice  girl,mother,teacher         girl
3   2       good  girl,mother,teacher       mother
4   2        bad  girl,mother,teacher      teacher
5   3    awesome          grandmother  grandmother
6   4     superb             grandson     grandson

CodePudding user response:

Take advantage of the fact that 'info' is duplicated.

df['info'] = df['info'].drop_duplicates().str.split(',').explode().to_numpy()

Output:

   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson
  • Related