This is my input data, which is stored in dataframe df.
Now I want to convert all the values in column B to a yearly format. Here is my code:
    D = []
    for i in df['B']:
        for j in df['C']:
            if j == 'Year':
                D.append(int(i)/1)
            elif j == 'Month':
                D.append(int(i)/12)
            elif j == 'Day':
                D.append(int(i)/365)
    print(len(df))
    print(len(D))
My original df has a length of only 10, but the output (list D) has a length of 100. Does anyone know how to fix this?
CodePudding user response:
You can try with map, which looks up the divisor for each row of C and avoids the loop entirely:
    df['D'] = df['B'].div(df['C'].map({'Year':1, 'Month':12, 'Day':365}))
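For reference, here is a minimal self-contained sketch of the map approach. The sample data is hypothetical, since the original input table is not shown in the question, and the `pd.to_numeric` call is an added assumption that B may be stored as strings (the question's code calls `int(i)` on each value).

```python
import pandas as pd

# Hypothetical sample data; the real input table is not shown in the question.
df = pd.DataFrame({
    "B": ["2", "24", "730"],
    "C": ["Year", "Month", "Day"],
})

# How many of each unit make up one year.
units_per_year = {"Year": 1, "Month": 12, "Day": 365}

# Convert B to numbers (in case it holds strings), then divide each value
# by the divisor looked up from the same row of C.
df["D"] = pd.to_numeric(df["B"]).div(df["C"].map(units_per_year))

print(len(df), len(df["D"]))  # both lengths match: one result per row
print(df["D"].tolist())       # [2.0, 2.0, 2.0]
```

Any value in C that is missing from the dictionary maps to NaN, so the corresponding entry in D would be NaN rather than raising an error.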
CodePudding user response:
B has 10 values and C has 10 values. Your inner loop runs through all of C once for every value of B, so D ends up with 10 × 10 = 100 entries. Because the two columns are the same length, you only need one for loop that walks through both of them together to populate D.
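A single loop over both columns could look like the sketch below. It uses `zip` to pair each value of B with the unit from the same row of C; the sample data is hypothetical, since the question's input is not shown.

```python
import pandas as pd

# Hypothetical sample data; the real input table is not shown in the question.
df = pd.DataFrame({
    "B": ["3", "6", "73"],
    "C": ["Year", "Month", "Day"],
})

D = []
# zip pairs the i-th value of B with the i-th value of C,
# so the loop body runs exactly once per row.
for value, unit in zip(df["B"], df["C"]):
    if unit == "Year":
        D.append(int(value) / 1)
    elif unit == "Month":
        D.append(int(value) / 12)
    elif unit == "Day":
        D.append(int(value) / 365)

print(len(df))  # 3
print(len(D))   # 3
```

Note that a row whose unit is none of the three strings is skipped silently, which would make D shorter than df.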
CodePudding user response:
Your code iterates 10 × 10 = 100 times because, for each i in df['B'], it iterates over all the rows of df['C']. Iterate over the rows once instead:
https://pandas.pydata.org/docs/user_guide/basics.html#iteration
    D = []
    for row in df.itertuples():
        # row.B and row.C come from the same row, so D gets one value per row
        if row.C == 'Year':
            D.append(int(row.B)/1)
        elif row.C == 'Month':
            D.append(int(row.B)/12)
        elif row.C == 'Day':
            D.append(int(row.B)/365)
CodePudding user response:
You should apply a function using the df.apply() method.
Define your logic inside a function that returns the value you want for each row, like:
    def convertValues(row):
        if row['C'] == 'Year':
            return int(row['B'])
        elif row['C'] == 'Month':
            return int(row['B'])/12
        elif row['C'] == 'Day':
            return int(row['B'])/365
        # fallback for any unit other than 'Year', 'Month' or 'Day'
        return 'Invalid String'
Then you simply apply to the dataframe:
yearly_values = df.apply(convertValues, axis=1)
Your result will be a Pandas Series, which you can then convert to a list (for example with yearly_values.tolist()) or process further.
You can also create a new column on that dataframe with the respective values using:
df['D'] = df.apply(convertValues, axis=1)