I have a dataframe of construction titles and names arranged in a random order (but someone's name is always in the cell to the right of their title) like so:
   contact_1_title  contact_1_name  contact_2_title  contact_2_name  contact_3_title  contact_3_name  contact_4_title      contact_4_name
0  owner_architect  joe             other_string     other_string    other_string     other_string    other_string         other_string
1  other_string     other_string    architect        jack            other_string     other_string    other_string         other_string
2  other_string     other_string    other_string     other_string    other_string     other_string    self_cert_architect  mary
3  other_string     other_string    other_string     other_string    owner            phil            other_string         other_string
4  contractor       sarah           other_string     other_string    other_string     other_string    other_string         other_string
5  other_string     other_string    expeditor        kate            other_string     other_string    other_string         other_string
I want to pull every title with the word "architect" in it and insert it into its own, new column. I also want to pull every name in the cell immediately to the right and insert it into its own column as well. My desired output:
   arch_title_col       arch_name_col
0  owner_architect      joe
1  architect            jack
2  self_cert_architect  mary
I'm at a loss as to how to go about this. I tried working with itertuples() but I didn't get very far.
CodePudding user response:
What you need is pd.wide_to_long, but I couldn't get the syntax right for how your columns are formatted. So here it is manually:
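import pandas as pd
# Stack every title column into one long Series, and likewise every name column.
title = pd.concat([df[col] for col in df.filter(like='title')], axis=0)
name = pd.concat([df[col] for col in df.filter(like='name')], axis=0)
# Put the two Series side by side; they share the same (repeated) row index.
df = pd.concat([title, name], axis=1)
df.columns = ['title', 'name']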
Now that we have things in a good format, it's a simple check:
out = df[df.title.str.contains('architect')]
print(out)
Output:
   title                name
0  owner_architect      joe
1  architect            jack
2  self_cert_architect  mary
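For reference, the pd.wide_to_long route mentioned at the top of this answer might look roughly like the sketch below. It assumes every column follows the contact_<n>_<field> pattern from the question; the sample frame is rebuilt from the values shown there, and the names stub_first, renamed, long_df and out are just illustrative. wide_to_long wants the stub name first (title_1 rather than contact_1_title), so the columns are renamed before reshaping.

```python
import pandas as pd

# Sample frame rebuilt from the values shown in the question.
o = "other_string"
df = pd.DataFrame({
    "contact_1_title": ["owner_architect", o, o, o, "contractor", o],
    "contact_1_name": ["joe", o, o, o, "sarah", o],
    "contact_2_title": [o, "architect", o, o, o, "expeditor"],
    "contact_2_name": [o, "jack", o, o, o, "kate"],
    "contact_3_title": [o, o, o, "owner", o, o],
    "contact_3_name": [o, o, o, "phil", o, o],
    "contact_4_title": [o, o, "self_cert_architect", o, o, o],
    "contact_4_name": [o, o, "mary", o, o, o],
})


def stub_first(col):
    """Turn 'contact_1_title' into 'title_1' (hypothetical helper)."""
    _, number, field = col.split("_")
    return f"{field}_{number}"


renamed = df.rename(columns=stub_first)

# Reshape: one row per (original row, contact number) with title/name columns.
long_df = pd.wide_to_long(
    renamed.reset_index(),  # the "index" column remembers the original row
    stubnames=["title", "name"],
    i="index",
    j="contact",
    sep="_",
)

# Keep only the architect rows and restore the original row labels.
mask = long_df["title"].str.contains("architect")
out = (
    long_df.loc[mask, ["title", "name"]]
    .reset_index(level="contact", drop=True)
    .sort_index()
)
out.columns = ["arch_title_col", "arch_name_col"]
out.index.name = None
print(out)
```

On the sample data this should give the same three architect rows as the manual version, just with the column names from the question's desired output.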
I promise you that 99% of the time, iter... is not what you want, and there is a far better pandas-specific way to do whatever you want to do.