I have two dataframes (created with code below) as
df1
Fecha Vals
0 2001-01-01 []
1 2001-01-02 []
2 2001-01-03 []
3 2001-01-04 []
4 2001-01-05 []
5 2001-01-06 []
6 2001-01-07 []
7 2001-01-08 []
8 2001-01-09 []
df2
Fecha Vals
0 2001-01-01 0.0
1 2001-01-03 1.0
2 2001-01-05 2.0
3 2001-01-07 3.0
4 2001-01-09 4.0
I want to append values in df2
to each corresponding row in df1
to obtain
df1
Fecha Vals
0 2001-01-01 [0.0]
1 2001-01-02 []
2 2001-01-03 [1.0]
3 2001-01-04 []
4 2001-01-05 [2.0]
5 2001-01-06 []
6 2001-01-07 [3.0]
7 2001-01-08 []
8 2001-01-09 [4.0]
I am close to finishing this with for
loops, but for large dataframes my partial work already shows this becomes very slow.
I suspect there is a way to do it faster, without looping, but I couldn't so far get there.
As a first step, I could filter rows in df1
with
df1['Fecha'].isin(df2['Fecha'].values)
Notes:
- I will next need to repeat the operation with
df3
, etc., appending to other rows indf1
. I wouldn't want to remove duplicates. - The uniform skipping in
df2
is a fabricated case. - After appending is complete, I would like to create one column for the averages of each row, and another column for the standard deviation.
- Code to create my
df
s
import datetime
import pandas as pd
yy = 2001
date_list = ['{:4d}-{:02d}-{:02d}'.format(yy, mm, dd) for mm in range(1, 2) for dd in range(1, 10)]
fechas1 = [datetime.datetime.strptime(date_base, '%Y-%m-%d') for date_base in date_list]
nf1 = len(fechas1)
vals1 = [[] for _ in range(nf1)]
dic1 = { 'Fecha': fechas1, 'Vals': vals1 }
df1 = pd.DataFrame(dic1)
fechas2 = [datetime.datetime.strptime(date_list[idx], '%Y-%m-%d') for idx in range(0, nf1, 2)]
nf2 = len(fechas2)
vals2 = [float(idx) for idx in range(nf2)]
dic2 = { 'Fecha': fechas2, 'Vals': vals2 }
df2 = pd.DataFrame(dic2)
Related:
- Python intersection of 2 dataframes with list-type columns
- How to append list of values to a column of list in dataframe
- Python appending a list to dataframe column
- Pandas dataframe append to column containing list
- Define a column type as 'list' in Pandas
- https://towardsdatascience.com/dealing-with-list-values-in-pandas-dataframes-a177e534f173
CodePudding user response:
You can use merge
instead of looping and a couple of lambda
like this to update non-matched rows like this-
import pandas as pd
df1 = pd.DataFrame({'Fecha': ['2001-01-01', '2001-01-02', '2001-01-03', '2001-01-04', '2001-01-05', '2001-01-06', '2001-01-07', '2001-01-08', '2001-01-09'], 'Vals': [[] for _ in range(9)]})
df2 = pd.DataFrame({'Fecha': ['2001-01-01', '2001-01-03', '2001-01-05', '2001-01-07', '2001-01-09'], 'Vals': [0.0, 1.0, 2.0, 3.0, 4.0]})
# Merge df1 and df2 on the 'Fecha' column, using an outer join
result = pd.merge(df1, df2, on='Fecha', how='left')
# Fill the null values in the 'Vals_y' column with an empty list
result['Vals_y'] = result['Vals_y'].apply(lambda x: [] if pd.isnull(x) else x)
# Append the values in the 'Vals_y' column to the 'Vals_x' column as a new element in a list for all rows where the 'Vals_y' column is not an empty list
result['Vals'] = result.apply(lambda row: row['Vals_x'] [row['Vals_y']] if pd.notnull(row['Vals_y']) else row['Vals_x'], axis=1)
# drop unnecessary columns
result.drop(['Vals_x', 'Vals_y'], axis=1, inplace=True)
print(result)
Output:
Fecha Vals
0 2001-01-01 [0.0]
1 2001-01-02 []
2 2001-01-03 [1.0]
3 2001-01-04 []
4 2001-01-05 [2.0]
5 2001-01-06 []
6 2001-01-07 [3.0]
7 2001-01-08 []
8 2001-01-09 [4.0]
CodePudding user response:
Using a simple loop:
s = df2.set_index('Fecha')['Vals']
idx = df1.loc[df1['Fecha'].isin(df2['Fecha'])].index
for i in idx:
df1.loc[i, 'Vals'].append(s[df1.loc[i, 'Fecha']])
Updated df1
:
Fecha Vals
0 2001-01-01 [0.0]
1 2001-01-02 []
2 2001-01-03 [1.0]
3 2001-01-04 []
4 2001-01-05 [2.0]
5 2001-01-06 []
6 2001-01-07 [3.0]
7 2001-01-08 []
8 2001-01-09 [4.0]