Home > front end >  Split string value in Pandas & add to a new column the split values one after the other
Split string value in Pandas & add to a new column the split values one after the other

Time:01-05

I have a Pandas dataframe like this :

id  A   B
0   1   toto tata
1   1   toto tata
2   2   titi tutu
3   2   titi tutu
4   3   toto tata titi
5   3   toto tata titi
6   3   toto tata titi

Thanks to the split function, I can split these string values :

for i in range(len(df)):
    var = output.iloc[i, 1].split(" ")
    print(var)
['toto', 'tata']
['toto', 'tata']
['titi', 'tutu']
['titi', 'tutu']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']

Now, I want to put these values in a new column, one value after the other :

id  A   B              C
0   1   toto tata      toto
1   1   toto tata      tata
2   2   titi tutu      titi
3   2   titi tutu      tutu
4   3   toto tata titi toto
5   3   toto tata titi tata
6   3   toto tata titi titi

I tried to do this :

for i in range(len(df)):
    var = df.iloc[i, 1].split(" ")
    print(var)
    for y in range(len(var)):
        df.at[i, 'C'] = var[y]

But it always returns the last value of the split :

id  A   B              C
0   1   toto tata      tata
1   1   toto tata      tata
2   2   titi tutu      tutu
3   2   titi tutu      tutu
4   3   toto tata titi titi
5   3   toto tata titi titi
6   3   toto tata titi titi

I'm missing this little detail for my algo to work but I can't find it.

CodePudding user response:

Assuming you always have as many rows in a group that there are ' ' separated items, that the groups are consecutive, and that the strings are identical per group.

A simple way is to remove the duplicates, str.split, and explode:

df['C'] = df.groupby('A')['B'].first().str.split(' ').explode().values

output:

   id  A               B     C
0   0  1       toto tata  toto
1   1  1       toto tata  tata
2   2  2       titi tutu  titi
3   3  2       titi tutu  tutu
4   4  3  toto tata titi  toto
5   5  3  toto tata titi  tata
6   6  3  toto tata titi  titi

If the rows are not grouped per consecutive "group", apply the same logic per group with groupby transform:

Example:

# shuffle the rows to generate an example
df2 = df.sample(frac=1)

# extract the chunks
df2['C'] = df2.groupby('A')['B'].transform(lambda x: x.head(1).str.split(' ').explode().values)

output:

   id  A               B     C
4   4  3  toto tata titi  toto
1   1  1       toto tata  toto
0   0  1       toto tata  tata
3   3  2       titi tutu  titi
6   6  3  toto tata titi  tata
5   5  3  toto tata titi  titi
2   2  2       titi tutu  tutu
  •  Tags:  
  • Related