Home > Back-end >  Moving specific strings between columns in pandas
Moving specific strings between columns in pandas

Time:01-06

I have a pandas dataframe of properties.

**Address**      | **Added on**.       | 
15 Smith Close   |  Added on 17/11/22  |
1 Apple Drive    |  Reduced on 19/11/22|
27 Pride place   |  Added on 18/1//22  |

I would like to move all those instances of 'reduced on ...' in the 'Added on' column to another column named 'Reduced on' in the dataframe. How would I do this?

Many thanks.

CodePudding user response:

You could use pd.DataFrame.where:

df['Reduced on'] = df['Added on'].where(df['Added on'].str.contains('Reduced on'))
df['Added on'] = df['Added on'].where(~ df['Added on'].str.contains('Reduced on'))

df

          Address           Added on           Reduced on
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN

Or another is to use pd.Series.str.extract & pd.DataFrame.concat:

pd.concat([df['Address'], df['Added on'].str.extract('(?P<Added_on>Add.*)|(?P<Reduced_on>Reduced.*)')], axis=1)

          Address           Added_on           Reduced_on
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN

CodePudding user response:

Proposed code :

import pandas as pd
import numpy as np

# Build Dataframe to work on
df = pd.DataFrame({"**Address** ": ['15 Smith Close' , '1 Apple Drive', '27 Pride place'], 
                   "**Added on**": ['Added on 17/11/22', 'Reduced on 19/11/22', 'Added on 18/1//22']})

# Define the mask m
m = df['**Added on**'].str.contains('Reduced')               

# 1- Move 'Reduced' rows to **New Col**                       
df['**Reduced on**'] = df['**Added on**'].where(m, np.nan)
# 2- Erase 'Reduced' rows from  **Added on**
df['**Added on**'] = df['**Added on**'].where(~m, np.nan) 

print(df)

Result :

     **Address**        **Added on**          **Reduced on**
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN

CodePudding user response:

This should work as well:

(df[['Address']].join(df[['Added on']]
.set_index(df['Added on']
.str.rsplit(n=1)
.str[0]
.rename(None),append=True)['Added on']
.unstack()))

Output:

          Address           Added on           Reduced on
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN
  • Related