Home > database >  Grid-like dataframe to list
Grid-like dataframe to list

Time:05-23

I have an excel dataset which contains 100 rows and 100 clolumns with order frequencies in locations described by x and y.(gird like structure)

I'd like to convert it to the following structure with 3 columns:

x-Coördinaten | y-Coördinaten | value

The "value" column only contains positive integers. The x and y column contain float type data (geograohical coordinates. The order does not matter, as it can easily be sorted afterwards.

So,basicly a merge of lists could work, e.g.:

[[1,5,3,5], [4,2,5,6], [2,3,1,5]] ==> [1,5,3,5,4,2,5,6,2,3,1,5]

But then i would lose the location...which is key for my project.

What is the best way to accomplish this?

CodePudding user response:

Assuming this input:

l = [[1,5,3,5],[4,2,5,6],[2,3,1,5]]
df = pd.DataFrame(l)

you can use stack:

df2 = df.rename_axis(index='x', columns='y').stack().reset_index(name='value')

output:

    x  y  value
0   0  0      1
1   0  1      5
2   0  2      3
3   0  3      5
4   1  0      4
5   1  1      2
6   1  2      5
7   1  3      6
8   2  0      2
9   2  1      3
10  2  2      1
11  2  3      5

or melt for a different order:

df2 = df.rename_axis('x').reset_index().melt('x', var_name='y', value_name='value')

output:

    x  y  value
0   0  0      1
1   1  0      4
2   2  0      2
3   0  1      5
4   1  1      2
5   2  1      3
6   0  2      3
7   1  2      5
8   2  2      1
9   0  3      5
10  1  3      6
11  2  3      5

CodePudding user response:

You should be able to get the results with a melt operation -

df = pd.DataFrame(np.arange(9).reshape(3, 3))
df.columns = [2, 3, 4] 
df.loc[:, 'x'] = [3, 4, 5] 

This is what df looks like

   2  3  4  x
0  0  1  2  3
1  3  4  5  4
2  6  7  8  5

The melt operation -

df.melt(id_vars='x', var_name='y')

output -

   x  y  value
0  3  2      0
1  4  2      3
2  5  2      6
3  3  3      1
4  4  3      4
5  5  3      7
6  3  4      2
7  4  4      5
8  5  4      8
  • Related