How do I add a full stop to a text, please? I am not able to get the desired combined text with the code below.
# Import libraries
import pandas as pd
import numpy as np
# Initialize list of lists
data = [['text with a period.', '111A.'],
['text without a period', '222B'],
['text with many periods...', '333C'],
[np.nan, '333C'],
[np.nan, np.nan]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['text1', 'text2'])
combined_df=df.copy()
combined_df["combined_text"] = df["text1"].fillna("") + ". " + df["text2"].fillna("") + '.'
combined_df
CodePudding user response:
df['combined_text'] = df.text1.where(df.text1.str.endswith('.'), df.text1 + '.').str.cat(
    df.text2.where(df.text2.str.endswith('.'), df.text2 + '.'),
    sep=' ',
    na_rep=''
).str.strip().replace('', np.nan)
Result:
                       text1  text2                    combined_text
0        text with a period.  111A.        text with a period. 111A.
1      text without a period   222B     text without a period. 222B.
2  text with many periods...   333C  text with many periods... 333C.
3                        NaN   333C                            333C.
4                        NaN    NaN                              NaN
(This also works for the case when text1 is given and text2 is NaN.)
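The same idea can be wrapped in two small helpers so the period check is written only once. This is a minimal sketch, not part of the answer above: the names `ensure_period` and `combine_sentences` are hypothetical, and passing `na=False` to `str.endswith` is an added assumption that keeps the mask strictly boolean when values are missing.

```python
import numpy as np
import pandas as pd


def ensure_period(s: pd.Series) -> pd.Series:
    """Append '.' to each non-missing string that does not already end with one."""
    # na=False keeps the mask boolean; missing values then take the
    # `other` branch, where NaN + '.' is still NaN.
    has_period = s.str.endswith('.', na=False)
    return s.where(has_period, s + '.')


def combine_sentences(first: pd.Series, second: pd.Series) -> pd.Series:
    """Join two text columns with a space, skipping missing values."""
    combined = ensure_period(first).str.cat(
        ensure_period(second), sep=' ', na_rep=''
    ).str.strip()
    # Rows where both inputs were missing collapse to '', so map them back to NaN.
    return combined.replace('', np.nan)


# Example usage with the data from the question
df = pd.DataFrame({
    'text1': ['text with a period.', 'text without a period',
              'text with many periods...', np.nan, np.nan],
    'text2': ['111A.', '222B', '333C', '333C', np.nan],
})
df['combined_text'] = combine_sentences(df['text1'], df['text2'])
print(df)
```

Keeping the period check in one function makes it easier to reuse on other text columns.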
CodePudding user response:
Hope this helps:
data = [['this is the first text with a period.', '111A.'],
['this is the second text without a period', '222B'],
['this is the third text with many periods...', '333C'],
[np.nan, '333C'],
[np.nan, np.nan]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['text1', 'text2'])
combined_df=df.copy()
combined_df["combined_text"] = df.text1.str.split('.').str[0] + '. ' + df.text2.str.split('.').str[0]
print(combined_df)
                                         text1  text2                                   combined_text
0        this is the first text with a period.  111A.      this is the first text with a period. 111A
1     this is the second text without a period   222B  this is the second text without a period. 222B
2  this is the third text with many periods...   333C  this is the third text with many periods. 333C
3                                          NaN   333C                                             NaN
4                                          NaN    NaN                                             NaN
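Note that `str.split('.').str[0]` keeps only the text before the first period, and a row with a missing text1 comes out as NaN. If that is not wanted, one possible variation is sketched below. It is not the code from this answer: the helper name `normalize` is hypothetical, it strips only trailing periods with `str.rstrip('.')`, and it assumes that collapsing "..." to a single "." is acceptable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text1': ['first text with a period.', 'second text without a period',
              'third text with many periods...', np.nan, np.nan],
    'text2': ['111A.', '222B', '333C', '333C', np.nan],
})


def normalize(s):
    # Drop trailing periods only, then add exactly one back.
    stripped = s.str.rstrip('.')
    # Missing values stay NaN through rstrip and +, so fill them afterwards.
    return (stripped + '.').fillna('')


# Join the two normalized columns and trim the space left by a missing side.
combined = (normalize(df['text1']) + ' ' + normalize(df['text2'])).str.strip()
# Rows where both columns were missing end up empty; map them back to NaN.
df['combined_text'] = combined.replace('', np.nan)
print(df)
```

With this variation every non-missing part ends with exactly one period, and a row with only text2 keeps its value instead of becoming NaN.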