Home > Back-end >  Python apply function to each row of DataFrame
Python apply function to each row of DataFrame

Time:06-25

I have DataFrame with two columns: Type and Name. The values in each cell are lists of equal length, i.e we have pairs (Type, Name). I want to:

  1. Group Name by it's Type
  2. Create column Type with the values of Names

My current code is a for loop:

for idx, row in df.iterrows():
    for t in list(set(row["Type"])):
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]

but it works very slow. How can I speed up this code?

EDIT Here is the code example which ilustrates what I want to obtain but in a faster way:

import pandas as pd
df = pd.DataFrame({"Type": [["1", "1", "2", "3"], ["2","3"]], "Name": [["A", "B", "C", "D"], ["E", "F"]]})

unique = list(set(row["Type"]))
for t in unique:
    df[t] = None
    df[t] = df[t].astype('object')

for idx, row in df.iterrows():
    for t in unique:
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]

CodePudding user response:

You could write a function my_function(param) and then do something like this:

df['type'] = df['name'].apply(lambda x: my_function(x))

There are likely better alternatives to using lambda functions, but lambdas are what I remember. If you post a simplified mock of your original data and what the desired output should look like, it may help you find the best answer to your question. I'm not certain I understand what you're trying to do. A literal group by should be done using enter image description here

This reduces the problem to a single column. Finally produce the dataframe required by:

l = ','.join(df['Name:Type'].to_list()).split(',')
pd.DataFrame([i.split(':') for i in l], columns=['Name','Type'])

Giving: enter image description here

  • Related