Complete list assigned to each row in python

I created a list containing the mean of two other columns; its length is the same as the number of rows in the dataframe. But when I add that list as a column to the dataframe, the entire list gets assigned to each row instead of only the corresponding value.

glucose_mean = []
for i in range(len(df)):
    mean = (df['h1_glucose_max'] + df['h1_glucose_min'])/2
    glucose_mean.append(mean)

df['glucose'] = glucose_mean

(Screenshot of the data after adding the list.)

Answer:

I think you overcomplicated it. You don't need a for-loop, only one line:

df['glucose'] = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2

EDIT:

If you want to work with every row separately, you can use .apply() with axis=1:

def func(row):
    return (row['h1_glucose_max'] + row['h1_glucose_min']) / 2

df['glucose'] = df.apply(func, axis=1)

And if you really need a for-loop, you can use .iterrows() (or similar functions):

glucose_mean = []

for index, row in df.iterrows():
    mean = (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    glucose_mean.append(mean)

df['glucose'] = glucose_mean
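The "similar functions" mentioned above include .itertuples(), which yields one namedtuple per row instead of a Series and is generally faster than .iterrows(). A minimal sketch, assuming the column names from the question; the toy values are made up for illustration:

```python
import pandas as pd

# Toy data (made-up values) using the column names from the question
df = pd.DataFrame({
    'h1_glucose_min': [1, 2, 3],
    'h1_glucose_max': [4, 5, 6],
})

glucose_mean = []

# itertuples() yields one namedtuple per row; column names become attributes
for row in df.itertuples(index=False):
    mean = (row.h1_glucose_max + row.h1_glucose_min) / 2
    glucose_mean.append(mean)

# glucose_mean holds one scalar per row, so it lines up with the rows
df['glucose'] = glucose_mean
print(df)
```

Attribute access only works because these column names are valid Python identifiers; for other names, index the tuple by position instead.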

Minimal working example:

import pandas as pd

data = {
    'h1_glucose_min': [1, 2, 3],
    'h1_glucose_max': [4, 5, 6],
}

df = pd.DataFrame(data)

# - version 1 -

df['glucose_1'] = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2

# - version 2 -

def func(row):
    return (row['h1_glucose_max'] + row['h1_glucose_min']) / 2

df['glucose_2'] = df.apply(func, axis=1)

# - version 3 -

glucose_mean = []

for index, row in df.iterrows():
    mean = (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    glucose_mean.append(mean)

df['glucose_3'] = glucose_mean

print(df)

Answer:

You do not need to iterate over your frame. Use this instead (example with a toy data frame):

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8], 'col2': [10, 9, 8, 7, 6, 5, 4, 100]})
# mean across the two columns, computed for each row (axis=1)
df['mean_col1_col2'] = df[['col1', 'col2']].mean(axis=1)
print(df)

Output:

   col1  col2  mean_col1_col2
0     1    10             5.5
1     2     9             5.5
2     3     8             5.5
3     4     7             5.5
4     5     6             5.5
5     6     5             5.5
6     7     4             5.5
7     8   100            54.0
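One behavioral difference between .mean(axis=1) and the plain (a + b) / 2 expression: DataFrame.mean skips missing values by default (skipna=True), whereas column arithmetic propagates NaN. If the glucose columns can contain missing values (an assumption; the question doesn't say), the two approaches give different results. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in col2 (row 1)
df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [10.0, np.nan, 8.0]})

# mean(axis=1) ignores NaN by default, so row 1 becomes just col1's value
df['mean_skipna'] = df[['col1', 'col2']].mean(axis=1)

# Column arithmetic propagates NaN, so row 1 stays missing
df['mean_arith'] = (df['col1'] + df['col2']) / 2

# skipna=False makes mean(axis=1) behave like the arithmetic version
df['mean_no_skip'] = df[['col1', 'col2']].mean(axis=1, skipna=False)

print(df)
```

Which behavior is right depends on whether a row with only one reading should still get a "mean".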

Answer:

As you can see in the following example, your code appends the entire column each time the for loop runs, so when you assign the glucose_mean list as a column, each element is a whole Series instead of a single value:

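import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [2, 3, 4, 5]})

glucose_mean = []
for i in range(len(df)):
    # appends the whole column (a Series), not the value in row i
    glucose_mean.append(df['col1'])

# each element of the list is a full Series
print(glucose_mean[0])

# for comparison: a list with one scalar per row is assigned row by row
df['col2'] = [5, 6, 7, 8]
print(df)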

Output:

0    1
1    2
2    3
3    4
Name: col1, dtype: int64
   col1  col2
0     1     5
1     2     6
2     3     7
3     4     8
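The fix that follows from this explanation is to append one number per row instead of a whole column. A minimal sketch that keeps the question's loop structure, assuming positional row access with .iloc; the toy values are made up:

```python
import pandas as pd

# Toy data (made-up values) using the column names from the question
df = pd.DataFrame({
    'h1_glucose_min': [1, 2, 3, 4],
    'h1_glucose_max': [2, 3, 4, 5],
})

glucose_mean = []
for i in range(len(df)):
    # .iloc[i] picks the single value in row i, so `mean` is a scalar
    mean = (df['h1_glucose_max'].iloc[i] + df['h1_glucose_min'].iloc[i]) / 2
    glucose_mean.append(mean)

# One scalar per row: the list lines up with the rows as intended
df['glucose'] = glucose_mean
print(df)
```

This still loops in Python, so for large frames the vectorized one-liner from the first answer is the better choice.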