Home > Software design >  Pandas Crosstab with third column
Pandas Crosstab with third column

Time:11-11

I have a Dataframe df that looks like this, although I have 20 P(n) columns;

Time    Speed    P1     P2    Ratio
1       100      2      NaN     1.2
2       150      3      NaN     1.3
3       500      4      NaN     1.4
4       500      5      NaN     1.5
5       900      NaN      7     1.6
6       150      NaN      6     1.7
7       100      NaN      8     1.8

I'm trying to crosstabulate this data to build a new Dataframe, with the Ratio value for every value of Speed and P(n). So that the result would look something like this;

Time    Speed    P1       P2    
1       100      1.2      NaN
2       150      1.3      NaN
3       500      1.4      NaN
4       500      1.5      NaN
5       900      NaN      1.6
6       150      NaN      1.7
7       100      NaN      1.8

Essentially, my aim is to replace the values for P(n) with the values for Ratio.

It seems like a simple task, however I'm quite stuck here. I have tried to use pandas crosstab but can only seem to utilise two variables to return the number of instances rather than the values. I tried this;

new_df = pd.crosstab(df['Speed'], df['Ratio'].fillna('n/a'))

Is it possible to use pd.crosstab to achieve this or is another method available?

CodePudding user response:

Use DataFrame.filter for get DataFrame with columsn starting by P and then for replace use DataFrame.mask for not missing values tested by DataFrame.notna:

df1 = df.filter(regex='^P')
df[df1.columns] = df1.mask(df1.notna(), df['Ratio'], axis=0)

print (df)
   Time  Speed   P1   P2  Ratio
0     1    100  1.2  NaN    1.2
1     2    150  1.3  NaN    1.3
2     3    500  1.4  NaN    1.4
3     4    500  1.5  NaN    1.5
4     5    900  NaN  1.6    1.6
5     6    150  NaN  1.7    1.7
6     7    100  NaN  1.8    1.8
  • Related