I have a dataframe that has multiple time columns and an assigned value.
import pandas as pd
df = pd.DataFrame({'time': {0: 0.2, 1: 0.3, 2: 0.4, 3: nan}, 'val1': {0: 1.0, 1: 3.0, 2: 1.0, 3: nan}, 'time2': {0: 0.1, 1: 0.4, 2: 0.8, 3: 1.0}, 'val2': {0: 2, 1: 2, 2: 9, 3: 2}})
Which looks like this:
time val1 time2 val2
0 0.2 1.0 0.1 2
1 0.3 3.0 0.4 2
2 0.4 1.0 0.8 9
3 NaN NaN 1.0 2
There can be many more time and value columns (but they always come in pairs). I would like to merge all the times columns into ONE column, while keeping and filling in the val
's to their corressponding times.
Example output:
time val1 val2
0 0.1 1.0 2.0
1 0.2 1.0 2.0
2 0.3 3.0 2.0
3 0.4 1.0 2.0
4 0.8 1.0 9.0
5 1.0 1.0 2.0
I have asked this question previously, and one answer got very close! Answer and output below:
df1 = (pd.wide_to_long(df.rename(columns={'time':'time1'}).reset_index(),
'time', i='index', j='t')
.sort_values(['time','val2'])
.drop_duplicates('time')
.dropna(subset=['time'])
.reset_index(drop=True))
output:
val1 val2 time
0 1.0 2 0.1
1 1.0 2 0.2
2 3.0 2 0.3
3 3.0 2 0.4 <- val1 incorrect
4 1.0 9 0.8
5 NaN 2 1.0
CodePudding user response:
IIUC, you can't achieve this with wide_to_long
.
You don't have a canonical reshaping. There are duplicate values (e.g. time 0.4) and you need to make a choice here.
So, I guess you need to perform two merges and combine in the desired order:
m1 = (
df[['time', 'val1']]
.merge(df[['time2', 'val2']]
.rename(columns={'time2': 'time'}),
on='time', how='outer')
.sort_values(by='time')
)
m2 = (
df[['time', 'val2']]
.merge(df[['time2', 'val1']]
.rename(columns={'time2': 'time'}),
on='time', how='outer')
.sort_values(by='time')
)
out = m1.combine_first(m2).dropna(subset='time')
output:
time val1 val2
4 0.1 1.0 2.0
0 0.2 1.0 2.0
1 0.3 3.0 2.0
2 0.4 1.0 2.0
5 0.8 1.0 9.0
6 1.0 NaN 2.0
CodePudding user response:
Here is another simple approach. melt
the data, sort to have time1 before time2 and in case of duplicate times, get the first row for val1 and last for val2.
cols = ['val1', 'val2']
(df
.rename(columns={'time': 'time1'})
.melt(id_vars=cols, value_name='time')
.dropna(subset='time')
.sort_values(by=['time', 'variable'])
.groupby('time').agg({'val1': 'first', 'val2': 'last'})
.reset_index()
)
output:
time val1 val2
0 0.1 1.0 2
1 0.2 1.0 2
2 0.3 3.0 2
3 0.4 1.0 2
4 0.8 1.0 9
5 1.0 NaN 2