Home > Enterprise >  Filter and move text in another column in substring
Filter and move text in another column in substring

Time:10-01

I have the following dataset:

df = pd.DataFrame([
{'Phone': 'Fax(925) 482-1195', 'Fax': None},
{'Phone': 'Fax(406) 226-0317', 'Fax': None},
{'Phone': 'Fax 1 650-383-6305', 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
{'Phone': None, 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}]
)

Which should look like:

enter image description here

What I'm trying to do is: for every row that I see "Fax", I want to truncate it and transfer this record to the column "Fax".

At first, I was trying to query only the matching with this filtering:

df[df['Phone'].str.contains("Fax") == True, "Fax"] = df[df['Phone'].str.contains("Fax") == True]

But it does not works, with the error: "TypeError: unhashable type: 'Series'".

Any ideas?

CodePudding user response:

You have a bunch of rows, that is, a list of dicts. Simplest approach would be to massage each row prior to adding it to the dataframe.

rows = [ ... ]

def get_contacts(rows):
    for row in rows:
        phone, fax = row['Phone'], row['Fax']
        if 'Fax' in phone:
            phone, fax = None, phone
        yield phone, fax

df = pd.DataFrame(get_contacts(rows))

You can force str instead of None with a filter like this:

        ...
        yield clean(phone), clean(fax)
        ...

def clean(s, default=''):
    if s is None:
        return default
    return s

If you really prefer to stick to using Pandas, you might want to

  1. identify a mask of rows where df.Phone contains 'Fax', then
  2. copy that subset into df['Fax'], then
  3. blank out selected df['Phone'] entries.

You can verify / debug each step by itself -- get (1) right before moving on to attempt (2).

If you choose to go this route, please post your final solution.

CodePudding user response:

Another option using .assign and np.where

import numpy as np
import pandas as pd


df = pd.DataFrame(
    [
        {'Phone': 'Fax(925) 482-1195', 'Fax': None},
        {'Phone': 'Fax(406) 226-0317', 'Fax': None},
        {'Phone': 'Fax 1 650-383-6305', 'Fax': None},
        {'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
        {'Phone': None, 'Fax': None},
        {'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}
    ]
)

print(f"{df}\n")

condition = df["Phone"].str.contains("fax", case=False)
df = df.assign(
    Fax=np.where(condition, df["Phone"], df["Fax"]),
    Phone=np.where(condition, "", df["Phone"])
).fillna("")

print(df)

                 Phone                 Fax
0                        Fax(925) 482-1195
1                        Fax(406) 226-0317
2                       Fax 1 650-383-6305
3  Phone(334) 585-1171   Fax(334) 585-1182
4                                         
5  Phone(334) 585-1171   Fax(334) 585-1176
  • Related