Pandas - Repeat row if found in list and count-CodePudding

Need help in repeating rows if found in list.

If found value in list increment count
If more than one instance found then repeat the row by incrementing count

data = {'First_Name':['Tom', 'Nick', 'Daniel', 'Jack'],
        'oter_col':['other_value1', 'other_value2', 'other_value3', 'other_value4']}
df = pd.DataFrame(data)
df["repeation"] = 0

DF looks like

  First_Name      oter_col  count
0        Tom  other_value1      0
1       Nick  other_value2      0
2     Daniel  other_value3      0
3       Jack  other_value4      0

Input list:

Full_Name = ["Tom Cruise", "Tom Holland", "Tom Hardy", "Jack black", "Chris Hemsworth"]

In need output like:

  First_Name    Full_Name      oter_col  count
0        Tom   Tom Cruise  other_value1      1
1        Tom  Tom Holland  other_value1      2
2        Tom    Tom Hardy  other_value1      3
3       Jack   Jack black  other_value4      1
4       Nick               other_value2      1
5     Daniel               other_value3      0

Tried something like this to get first matching index so I can repeat the row but not sure how to do it.

for name in Full_Name:
    m = df.First_Name.str.contains(name.split()[0])
    first_index_found = m.idxmax() if m.any() else None
    if type(first_index_found) == int:
        print(first_index_found)

CodePudding user response：

You can craft a DataFrame to merge and count the non-NA names as cumsum per group:

df2 = (pd.DataFrame({'Full_Name': Full_Name})
         .assign(First_Name=lambda d: d['Full_Name'].str.extract('(\w )'))
      )
(df
 .merge(df2, on='First_Name', how='left')
 .assign(repeation=lambda d: d['Full_Name'].notna().groupby(d['First_Name']).cumsum())
)

output:

  First_Name      oter_col  repeation    Full_Name
0        Tom  other_value1          1   Tom Cruise
1        Tom  other_value1          2  Tom Holland
2        Tom  other_value1          3    Tom Hardy
3       Nick  other_value2          0          NaN
4     Daniel  other_value3          0          NaN
5       Jack  other_value4          1   Jack black

CodePudding user response：

You can use:

# Create a regex pattern to extract First_Name from Full_Name
pat = fr"\b({'|'.join(df['First_Name'])})\b"

# Create a dataframe from Full_Name
df1 = pd.DataFrame({'Full_Name': Full_Name})
df1['First_Name'] = df1['Full_Name'].str.extract(pat)

# Merge them on First_Name column
out = df.merge(df1, on='First_Name', how='left')

# Count (choose one)
out['repeation'] = out.groupby('First_Name').cumcount().add(1)
# OR
out['repeation2'] = (out.dropna().groupby('First_Name').cumcount().add(1)
                        .reindex(out.index, fill_value=0))

Output:

>>> out
  First_Name      oter_col  repeation    Full_Name  repeation2
0        Tom  other_value1          1   Tom Cruise           1
1        Tom  other_value1          2  Tom Holland           2
2        Tom  other_value1          3    Tom Hardy           3
3       Nick  other_value2          1          NaN           0
4     Daniel  other_value3          1          NaN           0
5       Jack  other_value4          1   Jack black           1

CodePudding user response：

You can create a new dataframe from Full_Name to merge with your original data, then group by First_Name and count rows:

data = {'First_Name':['Tom', 'Nick', 'Daniel', 'Jack'],
        'oter_col':['other_value1', 'other_value2', 'other_value3', 'other_value4']}
df = pd.DataFrame(data)

Full_Name = ["Tom Cruise", "Tom Holland", "Tom Hardy", "Jack black", "Chris Hemsworth"]
df2 = pd.DataFrame({ 'First_Name' : [n.split()[0] for n in Full_Name], 'Full_Name' : Full_Name })

df = df.merge(df2, on='First_Name', how='left').fillna('')

df['count'] = df.groupby('First_Name').cumcount().add(1)

Output:

  First_Name      oter_col    Full_Name  count
0        Tom  other_value1   Tom Cruise      1
1        Tom  other_value1  Tom Holland      2
2        Tom  other_value1    Tom Hardy      3
3       Nick  other_value2                   1
4     Daniel  other_value3                   1
5       Jack  other_value4   Jack black      1