pandas: faster method than df.at[x,y]?

Time:09-16

I have df1

df1 = pd.DataFrame({'x':[1,2,3,5],
                    'y':[2,3,4,6],
                    'value':[1.5,2.0,0.5,3.0]})

df1
    x   y   value
0   1   2   1.5
1   2   3   2.0
2   3   4   0.5
3   5   6   3.0

and I want to assign the value at x and y coordinates to another dataframe df2

df2 = pd.DataFrame(0.0, index=range(df1['x'].max() + 1), columns=range(df1['y'].max() + 1))

df2
    0   1   2   3   4   5   6
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 0.0 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 0.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.0 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0
5   0.0 0.0 0.0 0.0 0.0 0.0 0.0

by

for x, y, value in zip(df1['x'], df1['y'], df1['value']):
    df2.at[x, y] = value

to give

    0   1   2   3   4   5   6
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 1.5 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 2.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.5 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0
5   0.0 0.0 0.0 0.0 0.0 0.0 3.0

However, it is a bit slow because I have a long df1.

Do we have a faster method than df.at[x,y]?

CodePudding user response:

You can avoid creating the zero-filled df2 and calling df.at in a loop by using DataFrame.pivot, DataFrame.fillna and DataFrame.reindex:

df2 = (df1.pivot(index='x', columns='y', values='value')
          .fillna(0)
          .reindex(index=range(df1['x'].max() + 1),
                   columns=range(df1['y'].max() + 1), fill_value=0))
print (df2)
y    0    1    2    3    4    5    6
x                                   
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  1.5  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  2.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.5  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  3.0
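One caveat with the pivot approach: DataFrame.pivot raises a ValueError when the same (x, y) pair appears more than once, whereas the df.at loop silently keeps the last value written. If your df1 may contain repeats, a minimal sketch that mimics the loop could look like this. The helper name grid_via_pivot is made up for illustration, and keeping the last duplicate is an assumption about what you want:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3, 5],
                    'y': [2, 3, 4, 6],
                    'value': [1.5, 2.0, 0.5, 3.0]})

def grid_via_pivot(df):
    """Hypothetical helper: build the dense x-by-y grid with pivot."""
    # Assumption: for repeated (x, y) pairs keep the last row,
    # which is what the df.at loop ends up doing.
    deduped = df.drop_duplicates(subset=['x', 'y'], keep='last')
    return (deduped.pivot(index='x', columns='y', values='value')
                   # add the rows/columns that have no entry in df
                   .reindex(index=range(df['x'].max() + 1),
                            columns=range(df['y'].max() + 1))
                   # every missing cell becomes 0.0
                   .fillna(0.0))

grid = grid_via_pivot(df1)
print(grid)
```

If you would rather combine duplicates (sum, mean, and so on), DataFrame.pivot_table with an aggfunc is the usual alternative.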

CodePudding user response:

Since your data is all numbers, you can use numpy; with a larger dataset, it might be faster than using pd.pivot:

import numpy as np

# create a flattened copy of df2's values (a copy, so df2 itself is not modified)
temp = df2.to_numpy(copy=True).ravel()
# get indices for a flattened array, based on df1.x and df1.y
arr = np.ravel_multi_index((df1.x, df1.y), df2.shape)
# replace at the positions with df1.value
temp[arr] = df1.value
# reshape and create dataframe
temp = temp.reshape(df2.shape)
pd.DataFrame(temp, index = df2.index, columns = df2.columns)

     0    1    2    3    4    5    6
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  1.5  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  2.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.5  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  3.0
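The same idea can probably be written without the zero-filled df2 or the flattening step, by indexing a 2-D NumPy array with the x and y columns directly. This is only a sketch: it assumes x and y are non-negative integers and that each (x, y) pair occurs once, since NumPy does not guarantee which value wins when an index is repeated in an assignment.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3, 5],
                    'y': [2, 3, 4, 6],
                    'value': [1.5, 2.0, 0.5, 3.0]})

# Grid size: one row per x in 0..max(x), one column per y in 0..max(y)
n_rows = df1['x'].max() + 1
n_cols = df1['y'].max() + 1

# Start from zeros and write every value in one vectorised assignment
grid = np.zeros((n_rows, n_cols))
grid[df1['x'].to_numpy(), df1['y'].to_numpy()] = df1['value'].to_numpy()

# Wrap the array in a DataFrame with the default 0..n-1 labels
df2 = pd.DataFrame(grid)
print(df2)
```

Whether this beats pivot for your data is worth measuring with timeit on a realistically sized df1 rather than assuming.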