I have a Pandas dataframe like this :
id A B
0 1 toto tata
1 1 toto tata
2 2 titi tutu
3 2 titi tutu
4 3 toto tata titi
5 3 toto tata titi
6 3 toto tata titi
Thanks to the split
function, I can split these string values :
for i in range(len(df)):
var = output.iloc[i, 1].split(" ")
print(var)
['toto', 'tata']
['toto', 'tata']
['titi', 'tutu']
['titi', 'tutu']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
Now, I want to put these values in a new column, one value after the other :
id A B C
0 1 toto tata toto
1 1 toto tata tata
2 2 titi tutu titi
3 2 titi tutu tutu
4 3 toto tata titi toto
5 3 toto tata titi tata
6 3 toto tata titi titi
I tried to do this :
for i in range(len(df)):
var = df.iloc[i, 1].split(" ")
print(var)
for y in range(len(var)):
df.at[i, 'C'] = var[y]
But it always returns the last value of the split :
id A B C
0 1 toto tata tata
1 1 toto tata tata
2 2 titi tutu tutu
3 2 titi tutu tutu
4 3 toto tata titi titi
5 3 toto tata titi titi
6 3 toto tata titi titi
I'm missing this little detail for my algo to work but I can't find it.
CodePudding user response:
Assuming you always have as many rows in a group that there are ' ' separated items, that the groups are consecutive, and that the strings are identical per group.
A simple way is to remove the duplicates, str.split
, and explode
:
df['C'] = df.groupby('A')['B'].first().str.split(' ').explode().values
output:
id A B C
0 0 1 toto tata toto
1 1 1 toto tata tata
2 2 2 titi tutu titi
3 3 2 titi tutu tutu
4 4 3 toto tata titi toto
5 5 3 toto tata titi tata
6 6 3 toto tata titi titi
If the rows are not grouped per consecutive "group", apply the same logic per group with groupby
transform
:
Example:
# shuffle the rows to generate an example
df2 = df.sample(frac=1)
# extract the chunks
df2['C'] = df2.groupby('A')['B'].transform(lambda x: x.head(1).str.split(' ').explode().values)
output:
id A B C
4 4 3 toto tata titi toto
1 1 1 toto tata toto
0 0 1 toto tata tata
3 3 2 titi tutu titi
6 6 3 toto tata titi tata
5 5 3 toto tata titi titi
2 2 2 titi tutu tutu