I created a list holding the mean of two other columns; the list has the same length as the number of rows in the dataframe. But when I try to add that list as a column to the dataframe, the entire list gets assigned to each row instead of only the corresponding value.
glucose_mean = []
for i in range(len(df)):
    mean = (df['h1_glucose_max'] + df['h1_glucose_min'])/2
    glucose_mean.append(mean)
df['glucose'] = glucose_mean
CodePudding user response:
I think you overcomplicated it. You don't need a for-loop, just one line:
df['glucose'] = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2
EDIT:
If you want to work with every row separately, you can use .apply():
def func(row):
    return (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
df['glucose'] = df.apply(func, axis=1)
And if you really need a for-loop, you can use .iterrows() (or similar functions):
glucose_mean = []
for index, row in df.iterrows():
    mean = (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    glucose_mean.append(mean)
df['glucose'] = glucose_mean
Minimal working example:
import pandas as pd

data = {
    'h1_glucose_min': [1, 2, 3],
    'h1_glucose_max': [4, 5, 6],
}
df = pd.DataFrame(data)

# - version 1 -
df['glucose_1'] = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2

# - version 2 -
def func(row):
    return (row['h1_glucose_max'] + row['h1_glucose_min']) / 2

df['glucose_2'] = df.apply(func, axis=1)

# - version 3 -
glucose_mean = []
for index, row in df.iterrows():
    mean = (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    glucose_mean.append(mean)
df['glucose_3'] = glucose_mean

print(df)
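As a quick sanity check that the three versions really compute the same thing, here is a small sketch on the same toy data. The names v1, v2 and v3 are only for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'h1_glucose_min': [1, 2, 3],
    'h1_glucose_max': [4, 5, 6],
})

# version 1: arithmetic on whole columns at once
v1 = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2

# version 2: row-wise apply
v2 = df.apply(
    lambda row: (row['h1_glucose_max'] + row['h1_glucose_min']) / 2,
    axis=1,
)

# version 3: explicit loop that collects one scalar per row
v3 = [
    (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    for _, row in df.iterrows()
]

# compare as plain Python lists so that only the values matter
print(v1.tolist() == v2.tolist() == v3)
```

On larger frames version 1 is usually the one to prefer, since the other two call Python code once per row.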
CodePudding user response:
You do not need to iterate over your frame. Use this instead (example with a dummy data frame):
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8], 'col2': [10, 9, 8, 7, 6, 5, 4, 100]})
df['mean_col1_col2'] = df[['col1', 'col2']].mean(axis=1)
df
-----------------------------------
   col1  col2  mean_col1_col2
0     1    10             5.5
1     2     9             5.5
2     3     8             5.5
3     4     7             5.5
4     5     6             5.5
5     6     5             5.5
6     7     4             5.5
7     8   100            54.0
-----------------------------------
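One difference worth knowing between .mean(axis=1) and adding the columns by hand shows up when values are missing. The sketch below uses made-up data with NaN entries (the question does not say whether the glucose columns contain any): .mean() skips NaN by default, while the arithmetic version propagates it.

```python
import numpy as np
import pandas as pd

# hypothetical data with gaps, only for illustration
df = pd.DataFrame({
    'col1': [1.0, np.nan, 3.0],
    'col2': [10.0, 9.0, np.nan],
})

# skipna=True is the default: rows with one NaN use the remaining value
by_mean = df[['col1', 'col2']].mean(axis=1)

# plain arithmetic: any NaN in a row makes the result NaN
by_arith = (df['col1'] + df['col2']) / 2

# passing skipna=False makes .mean() behave like the arithmetic version
by_mean_strict = df[['col1', 'col2']].mean(axis=1, skipna=False)

print(pd.DataFrame({
    'by_mean': by_mean,
    'by_arith': by_arith,
    'by_mean_strict': by_mean_strict,
}))
```

Which behaviour is right depends on whether a row with only one reading should still get a value.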
CodePudding user response:
As the following example shows, your code appends the entire column on every iteration of the for loop, so when you assign the glucose_mean list as a column, each element is a whole Series rather than a single value:
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [2, 3, 4, 5]})
glucose_mean = []
for i in range(len(df)):
    glucose_mean.append(df['col1'])
print(glucose_mean[0])
df['col2'] = [5, 6, 7, 8]
print(df)
Output:
0    1
1    2
2    3
3    4
Name: col1, dtype: int64
   col1  col2
0     1     5
1     2     6
2     3     7
3     4     8
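If the loop from the question has to stay, one possible fix is to actually use the loop index, so that a single number is appended per iteration. This is a minimal sketch on toy data with the question's column names; it picks values by position with .iloc, which is one option among several.

```python
import pandas as pd

df = pd.DataFrame({
    'h1_glucose_min': [1, 2, 3],
    'h1_glucose_max': [4, 5, 6],
})

# In the original loop this expression never uses i, so it is the whole column
whole = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2
print(type(whole).__name__)

glucose_mean = []
for i in range(len(df)):
    # select the i-th value of each column by position, giving one scalar
    mean = (df['h1_glucose_max'].iloc[i] + df['h1_glucose_min'].iloc[i]) / 2
    glucose_mean.append(mean)

# now the list holds one number per row and lines up with the frame
df['glucose'] = glucose_mean
print(df)
```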