I have a csv file representing measurements done to the matrix:
A B C
1
2
3
in the csv:
- The first 2 columns ("Col" & "row") represent the coordinates of the matrix
- The rest of the columns represent measurement setup (0, 1 or 2)
Since measurements are repeated the setup is repeated: (0,1,2,1,0) columns repeat.
Here is the datasets
Col row 0 1 2 1 0 #these represent measurement setup
1 A 1 0.7 0.5 0.4 0.5
2 B 1 0.4 0.2 0.1 0.1
3 C 1 0.3 0.2 0.1 0.1
i want to:
- import file
- pivot to create columns "col - row - setup - iteration"
where setup represents the measurement setup (0,1,2,1,0)
and iteration counts the measurements done for a single pixel (col,row pair) (1,2,3,4,5)
The final dateset should look like this:
Col row setup iteration value
1 A 0 1 1
1 A 1 2 0.7
1 A 2 3 0.5
1 A 1 4 0.4
1 A 0 5 0.5
2 B 0 1 1
2 B 1 2 0.4
2 B 2 3 0.2
2 B 1 4 0.1
2 B 0 5 0.1
.....etc
CodePudding user response:
You want to unpivot or melt your dataframe:
df_orig = pd.read_csv('file.csv')
df = df_orig.melt(['Col', 'row'], var_name='setup')
It gives:
Col row setup value
0 1 A 0 1.0
1 2 B 0 1.0
2 3 C 0 1.0
3 1 A 1 0.7
4 2 B 1 0.4
5 3 C 1 0.3
6 1 A 2 0.5
7 2 B 2 0.2
8 3 C 2 0.2
9 1 A 1.1 0.4
10 2 B 1.1 0.1
11 3 C 1.1 0.1
12 1 A 0.1 0.5
13 2 B 0.1 0.1
14 3 C 0.1 0.1
Because by default (or for older version) read_csv
has mangled duplicate column name, we have to demangle the setup column:
df['setup'] = df['setup'].replace('\\..*', '', regex=True)
Just add an iteration column:
df['iteration'] = 1 df.index // len(df_orig)
and sort the dataframe:
df.sort_values(['Col', 'row', 'iteration']).reset_index(drop=True)
to get:
Col row setup value iteration
0 1 A 0 1.0 0
1 1 A 1 0.7 1
2 1 A 2 0.5 2
3 1 A 1 0.4 3
4 1 A 0 0.5 4
5 2 B 0 1.0 0
6 2 B 1 0.4 1
7 2 B 2 0.2 2
8 2 B 1 0.1 3
9 2 B 0 0.1 4
10 3 C 0 1.0 0
11 3 C 1 0.3 1
12 3 C 2 0.2 2
13 3 C 1 0.1 3
14 3 C 0 0.1 4