Dataframe with column of type list: Append to selected rows-CodePudding

I have two dataframes (created with code below) as

df1
       Fecha Vals
0 2001-01-01   []
1 2001-01-02   []
2 2001-01-03   []
3 2001-01-04   []
4 2001-01-05   []
5 2001-01-06   []
6 2001-01-07   []
7 2001-01-08   []
8 2001-01-09   []

df2
       Fecha  Vals
0 2001-01-01   0.0
1 2001-01-03   1.0
2 2001-01-05   2.0
3 2001-01-07   3.0
4 2001-01-09   4.0

I want to append values in df2 to each corresponding row in df1 to obtain

df1
       Fecha Vals
0 2001-01-01   [0.0]
1 2001-01-02   []
2 2001-01-03   [1.0]
3 2001-01-04   []
4 2001-01-05   [2.0]
5 2001-01-06   []
6 2001-01-07   [3.0]
7 2001-01-08   []
8 2001-01-09   [4.0]

I am close to finishing this with for loops, but for large dataframes my partial work already shows this becomes very slow. I suspect there is a way to do it faster, without looping, but I couldn't so far get there.

As a first step, I could filter rows in df1 with

df1['Fecha'].isin(df2['Fecha'].values)

Notes:

I will next need to repeat the operation with df3, etc., appending to other rows in df1. I wouldn't want to remove duplicates.
The uniform skipping in df2 is a fabricated case.
After appending is complete, I would like to create one column for the averages of each row, and another column for the standard deviation.
Code to create my dfs

import datetime
import pandas as pd
yy = 2001
date_list = ['{:4d}-{:02d}-{:02d}'.format(yy, mm, dd) for mm in range(1, 2) for dd in range(1, 10)]
fechas1 = [datetime.datetime.strptime(date_base, '%Y-%m-%d') for date_base in date_list]
nf1 = len(fechas1)
vals1 = [[] for _ in range(nf1)]
dic1 = { 'Fecha': fechas1, 'Vals': vals1 }
df1 = pd.DataFrame(dic1)
fechas2 = [datetime.datetime.strptime(date_list[idx], '%Y-%m-%d') for idx in range(0, nf1, 2)]
nf2 = len(fechas2)
vals2 = [float(idx) for idx in range(nf2)]
dic2 = { 'Fecha': fechas2, 'Vals': vals2 }
df2 = pd.DataFrame(dic2)

Related:

CodePudding user response：

You can use merge instead of looping and a couple of lambda like this to update non-matched rows like this-

import pandas as pd

df1 = pd.DataFrame({'Fecha': ['2001-01-01', '2001-01-02', '2001-01-03', '2001-01-04', '2001-01-05', '2001-01-06', '2001-01-07', '2001-01-08', '2001-01-09'], 'Vals': [[] for _ in range(9)]})
df2 = pd.DataFrame({'Fecha': ['2001-01-01', '2001-01-03', '2001-01-05', '2001-01-07', '2001-01-09'], 'Vals': [0.0, 1.0, 2.0, 3.0, 4.0]})

# Merge df1 and df2 on the 'Fecha' column, using an outer join
result = pd.merge(df1, df2, on='Fecha', how='left')
# Fill the null values in the 'Vals_y' column with an empty list
result['Vals_y'] = result['Vals_y'].apply(lambda x: [] if pd.isnull(x) else x)
# Append the values in the 'Vals_y' column to the 'Vals_x' column as a new element in a list for all rows where the 'Vals_y' column is not an empty list
result['Vals'] = result.apply(lambda row: row['Vals_x']   [row['Vals_y']] if pd.notnull(row['Vals_y']) else row['Vals_x'], axis=1)

# drop unnecessary columns
result.drop(['Vals_x', 'Vals_y'], axis=1, inplace=True)
print(result)

Output:

        Fecha   Vals
0  2001-01-01  [0.0]
1  2001-01-02     []
2  2001-01-03  [1.0]
3  2001-01-04     []
4  2001-01-05  [2.0]
5  2001-01-06     []
6  2001-01-07  [3.0]
7  2001-01-08     []
8  2001-01-09  [4.0]

CodePudding user response：

Using a simple loop:

s = df2.set_index('Fecha')['Vals']
idx = df1.loc[df1['Fecha'].isin(df2['Fecha'])].index

for i in idx:
    df1.loc[i, 'Vals'].append(s[df1.loc[i, 'Fecha']])

Updated df1:

        Fecha   Vals
0  2001-01-01  [0.0]
1  2001-01-02     []
2  2001-01-03  [1.0]
3  2001-01-04     []
4  2001-01-05  [2.0]
5  2001-01-06     []
6  2001-01-07  [3.0]
7  2001-01-08     []
8  2001-01-09  [4.0]