I have a Dataframe df
that looks like this, although I have 20 P(n)
columns;
Time Speed P1 P2 Ratio
1 100 2 NaN 1.2
2 150 3 NaN 1.3
3 500 4 NaN 1.4
4 500 5 NaN 1.5
5 900 NaN 7 1.6
6 150 NaN 6 1.7
7 100 NaN 8 1.8
I'm trying to crosstabulate this data to build a new Dataframe, with the Ratio
value for every value of Speed
and P(n)
. So that the result would look something like this;
Time Speed P1 P2
1 100 1.2 NaN
2 150 1.3 NaN
3 500 1.4 NaN
4 500 1.5 NaN
5 900 NaN 1.6
6 150 NaN 1.7
7 100 NaN 1.8
Essentially, my aim is to replace the values for P(n)
with the values for Ratio
.
It seems like a simple task, however I'm quite stuck here. I have tried to use pandas crosstab but can only seem to utilise two variables to return the number of instances rather than the values. I tried this;
new_df = pd.crosstab(df['Speed'], df['Ratio'].fillna('n/a'))
Is it possible to use pd.crosstab
to achieve this or is another method available?
CodePudding user response:
Use DataFrame.filter
for get DataFrame with columsn starting by P
and then for replace use DataFrame.mask
for not missing values tested by DataFrame.notna
:
df1 = df.filter(regex='^P')
df[df1.columns] = df1.mask(df1.notna(), df['Ratio'], axis=0)
print (df)
Time Speed P1 P2 Ratio
0 1 100 1.2 NaN 1.2
1 2 150 1.3 NaN 1.3
2 3 500 1.4 NaN 1.4
3 4 500 1.5 NaN 1.5
4 5 900 NaN 1.6 1.6
5 6 150 NaN 1.7 1.7
6 7 100 NaN 1.8 1.8