Home > Enterprise >  Pandas - Repeat row if found in list and count
Pandas - Repeat row if found in list and count

Time:06-06

Need help in repeating rows if found in list.

  • If found value in list increment count
  • If more than one instance found then repeat the row by incrementing count

DF

data = {'First_Name':['Tom', 'Nick', 'Daniel', 'Jack'],
        'oter_col':['other_value1', 'other_value2', 'other_value3', 'other_value4']}
df = pd.DataFrame(data)
df["repeation"] = 0

DF looks like

  First_Name      oter_col  count
0        Tom  other_value1      0
1       Nick  other_value2      0
2     Daniel  other_value3      0
3       Jack  other_value4      0

Input list:

Full_Name = ["Tom Cruise", "Tom Holland", "Tom Hardy", "Jack black", "Chris Hemsworth"]

In need output like:

  First_Name    Full_Name      oter_col  count
0        Tom   Tom Cruise  other_value1      1
1        Tom  Tom Holland  other_value1      2
2        Tom    Tom Hardy  other_value1      3
3       Jack   Jack black  other_value4      1
4       Nick               other_value2      1
5     Daniel               other_value3      0

Tried something like this to get first matching index so I can repeat the row but not sure how to do it.

for name in Full_Name:
    m = df.First_Name.str.contains(name.split()[0])
    first_index_found = m.idxmax() if m.any() else None
    if type(first_index_found) == int:
        print(first_index_found)

CodePudding user response:

You can craft a DataFrame to merge and count the non-NA names as cumsum per group:

df2 = (pd.DataFrame({'Full_Name': Full_Name})
         .assign(First_Name=lambda d: d['Full_Name'].str.extract('(\w )'))
      )
(df
 .merge(df2, on='First_Name', how='left')
 .assign(repeation=lambda d: d['Full_Name'].notna().groupby(d['First_Name']).cumsum())
)

output:

  First_Name      oter_col  repeation    Full_Name
0        Tom  other_value1          1   Tom Cruise
1        Tom  other_value1          2  Tom Holland
2        Tom  other_value1          3    Tom Hardy
3       Nick  other_value2          0          NaN
4     Daniel  other_value3          0          NaN
5       Jack  other_value4          1   Jack black

CodePudding user response:

You can use:

# Create a regex pattern to extract First_Name from Full_Name
pat = fr"\b({'|'.join(df['First_Name'])})\b"

# Create a dataframe from Full_Name
df1 = pd.DataFrame({'Full_Name': Full_Name})
df1['First_Name'] = df1['Full_Name'].str.extract(pat)

# Merge them on First_Name column
out = df.merge(df1, on='First_Name', how='left')

# Count (choose one)
out['repeation'] = out.groupby('First_Name').cumcount().add(1)
# OR
out['repeation2'] = (out.dropna().groupby('First_Name').cumcount().add(1)
                        .reindex(out.index, fill_value=0))

Output:

>>> out
  First_Name      oter_col  repeation    Full_Name  repeation2
0        Tom  other_value1          1   Tom Cruise           1
1        Tom  other_value1          2  Tom Holland           2
2        Tom  other_value1          3    Tom Hardy           3
3       Nick  other_value2          1          NaN           0
4     Daniel  other_value3          1          NaN           0
5       Jack  other_value4          1   Jack black           1

CodePudding user response:

You can create a new dataframe from Full_Name to merge with your original data, then group by First_Name and count rows:

data = {'First_Name':['Tom', 'Nick', 'Daniel', 'Jack'],
        'oter_col':['other_value1', 'other_value2', 'other_value3', 'other_value4']}
df = pd.DataFrame(data)

Full_Name = ["Tom Cruise", "Tom Holland", "Tom Hardy", "Jack black", "Chris Hemsworth"]
df2 = pd.DataFrame({ 'First_Name' : [n.split()[0] for n in Full_Name], 'Full_Name' : Full_Name })

df = df.merge(df2, on='First_Name', how='left').fillna('')

df['count'] = df.groupby('First_Name').cumcount().add(1)

Output:

  First_Name      oter_col    Full_Name  count
0        Tom  other_value1   Tom Cruise      1
1        Tom  other_value1  Tom Holland      2
2        Tom  other_value1    Tom Hardy      3
3       Nick  other_value2                   1
4     Daniel  other_value3                   1
5       Jack  other_value4   Jack black      1
  • Related