Looping through dataframe with conditions

Time:10-19

This is my input data, which is stored in the dataframe df.

Now I want to convert all the values in column B to a yearly format, based on the unit in column C. Here is my code:

D = []
for i in df['B']:
    for j in df['C']:
        if j == 'Year':
            D.append(int(i)/1) 
        elif j == 'Month':
            D.append(int(i)/12)
        elif j == 'Day':
            D.append(int(i)/365)

print(len(df))
print(len(D))

My original df has a length of 10, but the output (list D) has a length of 100. Does anyone know how to fix this?

CodePudding user response:

You can try map:

df['D'] = df['B'].div(df['C'].map({'Year': 1, 'Month': 12, 'Day': 365}))
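As a rough illustration of the map approach, here is a minimal self-contained sketch. The sample values below are made up, since the original input table is not shown, and the name units_per_year is only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real input from the question is not available.
df = pd.DataFrame({
    "B": [2, 24, 730, 6],
    "C": ["Year", "Month", "Day", "Month"],
})

# How many of each unit make up one year (same divisors as in the question).
units_per_year = {"Year": 1, "Month": 12, "Day": 365}

# map() looks up the divisor for each row of C; div() divides element-wise,
# so D has exactly one value per row of df.
df["D"] = df["B"].div(df["C"].map(units_per_year))

print(df)
print(len(df), len(df["D"]))
```

If column B holds strings rather than numbers, it would likely need converting first, for example with df["B"].astype(int). Any unit in C that is not a key of the dictionary maps to NaN, so the corresponding D value is NaN as well.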

CodePudding user response:

B and C each have 10 rows. For every value in B, your inner loop runs through all 10 values of C, so D ends up with 10 × 10 = 100 entries. Because the two columns have the same length, you only need one for loop to populate D.
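One way to write that single loop is to pair the two columns with zip. This is only a sketch: the helper name to_years and the sample lists are hypothetical, and the lists stand in for df['B'] and df['C'], which could be passed in the same way.

```python
# Divisors for converting each unit to years (same values as in the question).
UNITS_PER_YEAR = {"Year": 1, "Month": 12, "Day": 365}


def to_years(values, units):
    """Pair each value with its unit and convert it to years."""
    result = []
    # zip() walks both sequences in step, so the loop body runs once per row.
    for value, unit in zip(values, units):
        result.append(int(value) / UNITS_PER_YEAR[unit])
    return result


# Hypothetical stand-ins for df['B'] and df['C'].
B = ["2", "24", "730"]
C = ["Year", "Month", "Day"]

D = to_years(B, C)
print(len(B), len(D))
```

Unlike the if/elif chain in the question, which silently skips an unknown unit, this version raises a KeyError for one, which makes a mismatch in lengths harder to miss.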

CodePudding user response:

Your code iterates 10 × 10 times because, for each i in df['B'], it iterates through all the rows of df['C'].

https://pandas.pydata.org/docs/user_guide/basics.html#iteration

D = []
for row in df.itertuples():
    # Each row is visited once, so D gets one entry per row.
    if row.C == 'Year':
        D.append(int(row.B) / 1)
    elif row.C == 'Month':
        D.append(int(row.B) / 12)
    elif row.C == 'Day':
        D.append(int(row.B) / 365)

CodePudding user response:

You should apply a function using the df.apply() method.

Just define your logic inside a function to return the value you want for each row, like:

def convertValues(row):
    # The value to convert is in column B; column C holds its unit.
    if row['C'] == 'Year':
        return int(row['B'])
    elif row['C'] == 'Month':
        return int(row['B'])/12
    elif row['C'] == 'Day':
        return int(row['B'])/365
    return 'Invalid String'

Then you simply apply to the dataframe:

yearly_values = df.apply(convertValues, axis=1)

The result will be a pandas Series, which you can then convert to a list or process further.

You can also create a new column on that dataframe with the respective values using:

df['D'] = df.apply(convertValues, axis=1)