Adding a full stop to text when missing

Time:06-21

How can I add a full stop to a text when it is missing? I am not able to get the desired combined text.

# Import libraries
import pandas as pd
import numpy as np
 
# Initialize list of lists
data = [['text with a period.', '111A.'], 
        ['text without a period', '222B'], 
        ['text with many periods...', '333C'],
        [np.nan, '333C'],
        [np.nan, np.nan]]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['text1', 'text2'])

combined_df = df.copy()
combined_df["combined_text"] = df["text1"].fillna("") + ". " + df["text2"].fillna("") + '.'
combined_df

Desired output

[Image: combined_df snapshot]

CodePudding user response:

You can use where and cat:

df['combined_text'] = df.text1.where(df.text1.str.endswith('.'), df.text1 + '.').str.cat(
                        df.text2.where(df.text2.str.endswith('.'), df.text2 + '.'),
                        sep=' ',
                        na_rep=''
                      ).str.strip().replace('', np.nan)

Result:

                       text1  text2                    combined_text
0        text with a period.  111A.        text with a period. 111A.
1      text without a period   222B     text without a period. 222B.
2  text with many periods...   333C  text with many periods... 333C.
3                        NaN   333C                            333C.
4                        NaN    NaN                              NaN

(This also works when text1 is given and text2 is NaN.)
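The same where/cat idea can be wrapped in small helpers so the period check is written only once. This is a minimal sketch, not part of the original answer: the function names ensure_period and combine_with_periods are made up for illustration, and na=False is passed to endswith so missing values are explicitly treated as "no period".

```python
import numpy as np
import pandas as pd


def ensure_period(s: pd.Series) -> pd.Series:
    """Append '.' to strings that do not already end with one (hypothetical helper)."""
    # na=False makes the mask False for missing values; the replacement value
    # for those rows is NaN + '.', which is still NaN, so they stay missing.
    return s.where(s.str.endswith('.', na=False), s + '.')


def combine_with_periods(a: pd.Series, b: pd.Series) -> pd.Series:
    """Join two text columns with a space, each part ending in a period."""
    combined = ensure_period(a).str.cat(ensure_period(b), sep=' ', na_rep='')
    # strip() removes the stray separator left when one side was missing;
    # a row where both sides were missing becomes '' and is turned back into NaN.
    return combined.str.strip().replace('', np.nan)


df = pd.DataFrame(
    [['text with a period.', '111A.'],
     ['text without a period', '222B'],
     ['text with many periods...', '333C'],
     [np.nan, '333C'],
     [np.nan, np.nan]],
    columns=['text1', 'text2'],
)
df['combined_text'] = combine_with_periods(df['text1'], df['text2'])
print(df)
```

Keeping the check in one function means a third text column could be added by calling ensure_period on it as well, rather than repeating the where expression.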

CodePudding user response:

Hope this helps:

data = [['this is the first text with a period.', '111A.'], 
        ['this is the second text without a period', '222B'], 
        ['this is the third text with many periods...', '333C'],
        [np.nan, '333C'],
        [np.nan, np.nan]]

# Create the pandas DataFrame

df = pd.DataFrame(data, columns=['text1', 'text2'])

combined_df = df.copy()
combined_df["combined_text"] = df.text1.str.split('.').str[0] + '. ' + df.text2.str.split('.').str[0]

print(combined_df)
                                         text1  text2                                   combined_text
0        this is the first text with a period.  111A.      this is the first text with a period. 111A
1     this is the second text without a period   222B  this is the second text without a period. 222B
2  this is the third text with many periods...   333C  this is the third text with many periods. 333C
3                                          NaN   333C                                             NaN
4                                          NaN    NaN                                             NaN
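Note that split('.').str[0] keeps only the text before the first period, so a value with a period in the middle would be cut short. If the goal is just to end each value with exactly one period, a sketch along the following lines may be safer. It is an alternative to the split approach, not the answer's own method, and the helper name single_trailing_period and the sample strings are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def single_trailing_period(s: pd.Series) -> pd.Series:
    """End every string with exactly one period (hypothetical helper)."""
    # rstrip('.') removes any run of trailing periods and leaves interior
    # periods alone; one period is then added back. NaN stays NaN.
    return s.str.rstrip('.') + '.'


texts = pd.Series([
    'ends with one.',
    'ends with none',
    'ends with many...',
    'v1.2 release',   # interior period is preserved
    np.nan,
])
normalized = single_trailing_period(texts)
print(normalized.tolist())
```

Unlike the where-based approach, this collapses "..." to a single ".", which matches the output shown above for the third row.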