I have this dataset:
Column A |
---|
pt abcdefg |
cv fghikl |
abcdg pt |
opqrs cv |
bp ststst |
qwert bp |
I want to move the words 'pt', 'cv', and 'bp' to the end of each string, so this is the output that I want:
Column A |
---|
abcdefg pt |
fghikl cv |
abcdg pt |
opqrs cv |
ststst bp |
qwert bp |
I haven't written any code of my own, but I found the function below. I'm stuck on modifying it because I want to apply it to the whole DataFrame.
def order_word(s, word, delta):
    words = s.split()
    oldpos = words.index(word)
    words.insert(oldpos + delta, words.pop(oldpos))
    return ' '.join(words)
Can anyone help me to build the code? Thanks in advance.
CodePudding user response:
Here is a proposition using pandas.Series.str.split with sorted:
df["Column A"] = (
    df["Column A"]
    .str.split()
    .apply(lambda x: " ".join(sorted(x, key=len, reverse=True)))
)
# Output :
print(df)
     Column A
0  abcdefg pt
1   fghikl cv
2    abcdg pt
3    opqrs cv
4   ststst bp
5    qwert bp
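The length-based sort above works on this sample only because each keyword is shorter than the other word in its row. If that may not hold for your data, here is a minimal sketch that tests membership in a keyword set instead. The names `KEYWORDS` and `move_keywords_last` are hypothetical, and the DataFrame is rebuilt from the sample in the question:

```python
import pandas as pd

# Assumed keyword set, taken from the question; adjust as needed.
KEYWORDS = {"pt", "cv", "bp"}

def move_keywords_last(text: str) -> str:
    """Move keyword tokens to the end, keeping the other words in order."""
    words = text.split()
    # sorted() is stable: False (non-keyword) sorts before True (keyword),
    # and words within each group keep their original relative order.
    return " ".join(sorted(words, key=lambda w: w in KEYWORDS))

# Sample data reconstructed from the question.
df = pd.DataFrame({"Column A": ["pt abcdefg", "cv fghikl", "abcdg pt",
                                "opqrs cv", "bp ststst", "qwert bp"]})
df["Column A"] = df["Column A"].apply(move_keywords_last)
print(df)
```

Since the key is a boolean rather than a length, this should also cope with rows that have more than two words or words shorter than the keywords.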
CodePudding user response:
You can use a regex with str.replace:
df['Column A'] = df['Column A'].str.replace(r'\s*\b(cv|pt|bp)\b\s*(.*$)',
                                            r'\2 \1', regex=True)
Output (as new column for clarity):
     Column A    Column B
0  pt abcdefg  abcdefg pt
1   cv fghikl   fghikl cv
2    abcdg pt    abcdg pt
3    opqrs cv    opqrs cv
4   bp ststst   ststst bp
5    qwert bp    qwert bp
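For anyone who wants to run this end to end, here is a self-contained sketch of the same regex approach. The DataFrame construction is an assumption based on the sample in the question, the name `PATTERN` is only for illustration, and the result goes into a new Column B so both versions can be compared:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df is built).
df = pd.DataFrame({"Column A": ["pt abcdefg", "cv fghikl", "abcdg pt",
                                "opqrs cv", "bp ststst", "qwert bp"]})

# Group 1 captures the keyword, group 2 captures everything after it.
PATTERN = r"\s*\b(cv|pt|bp)\b\s*(.*$)"

# Swap the two groups so the keyword lands after the remaining text.
df["Column B"] = df["Column A"].str.replace(PATTERN, r"\2 \1", regex=True)
print(df)
```

Rows that contain none of the keywords are left unchanged. The pattern assumes the keyword is the first or last word of the string; if a keyword sits in the middle, the leading `\s*` consumes the space before it and the neighbouring words get joined.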