I have the following dataset:
df = pd.DataFrame([
{'Phone': 'Fax(925) 482-1195', 'Fax': None},
{'Phone': 'Fax(406) 226-0317', 'Fax': None},
{'Phone': 'Fax 1 650-383-6305', 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
{'Phone': None, 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}]
)
Which should look like:
What I'm trying to do is: for every row that I see "Fax", I want to truncate it and transfer this record to the column "Fax".
At first, I was trying to query only the matching with this filtering:
df[df['Phone'].str.contains("Fax") == True, "Fax"] = df[df['Phone'].str.contains("Fax") == True]
But it does not works, with the error: "TypeError: unhashable type: 'Series'".
Any ideas?
CodePudding user response:
You have a bunch of rows, that is, a list of dicts. Simplest approach would be to massage each row prior to adding it to the dataframe.
rows = [ ... ]
def get_contacts(rows):
for row in rows:
phone, fax = row['Phone'], row['Fax']
if 'Fax' in phone:
phone, fax = None, phone
yield phone, fax
df = pd.DataFrame(get_contacts(rows))
You can force str
instead of None
with a filter like this:
...
yield clean(phone), clean(fax)
...
def clean(s, default=''):
if s is None:
return default
return s
If you really prefer to stick to using Pandas, you might want to
- identify a mask of rows where
df.Phone
contains 'Fax', then - copy that subset into
df['Fax']
, then - blank out selected
df['Phone']
entries.
You can verify / debug each step by itself -- get (1) right before moving on to attempt (2).
If you choose to go this route, please post your final solution.
CodePudding user response:
Another option using .assign and np.where
import numpy as np
import pandas as pd
df = pd.DataFrame(
[
{'Phone': 'Fax(925) 482-1195', 'Fax': None},
{'Phone': 'Fax(406) 226-0317', 'Fax': None},
{'Phone': 'Fax 1 650-383-6305', 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
{'Phone': None, 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}
]
)
print(f"{df}\n")
condition = df["Phone"].str.contains("fax", case=False)
df = df.assign(
Fax=np.where(condition, df["Phone"], df["Fax"]),
Phone=np.where(condition, "", df["Phone"])
).fillna("")
print(df)
Phone Fax
0 Fax(925) 482-1195
1 Fax(406) 226-0317
2 Fax 1 650-383-6305
3 Phone(334) 585-1171 Fax(334) 585-1182
4
5 Phone(334) 585-1171 Fax(334) 585-1176