I have one dataframe df1
that looks like this. The date format is M/D/Y.
date | a |
---|---|
1/1/2001 | 100 |
2/1/2001 | 101 |
And another dataframe df2
that represents outflows and looks like this:
date | a |
---|---|
1/1/2001 | 2 |
1/2/2001 | 3 |
1/3/2001 | -2 |
1/4/2001 | 0 |
Desired output df:
date | a |
---|---|
1/1/2001 | 100 |
1/2/2001 | 98 |
1/3/2001 | 101 |
1/4/2001 | 99 |
I need to take the df1
values from the first of every month as the starting value. For each subsequent day of the month, I want to add the values from df2
from the starting value of df1
and place the results in df3
.
CodePudding user response:
OK, this does what you ask. Your own data is inconsistent, however. Your text says you ADD the number to the inventory; on the first day, you SUBTRACT the number, but you add the rest.
So, given "x1.csv":
date,inventory
1/1/2001,100
2/1/2001,101
3/1/2001,103
and "x2.csv":
date,update
1/1/2001,2
1/2/2001,3
1/3/2001,-2
1/4/2001,0
2/1/2001,5
2/2/2001,3
2/3/2001,-10
3/1/2001,0
3/2/2001,0
This Python code:
import csv
import pandas as pd
f1 = csv.reader( open('x1.csv'))
f2 = csv.reader( open('x2.csv'))
# Read the first file into a dictionary.
inventory = {}
for row in f1:
if row[0] != 'date':
inventory[row[0]] = int(row[1])
# Now process each line in the detail list.
current = 0
rows = []
for row in f2:
if row[0] == 'date':
continue
if row[0] in inventory:
current = inventory[row[0]]
rows.append( (row[0],str(current)) )
current = int(row[1])
df = pd.DataFrame( rows, columns=['date','level'])
print(df)
produces this output:
date level
0 1/1/2001 100
1 1/2/2001 102
2 1/3/2001 105
3 1/4/2001 103
4 2/1/2001 101
5 2/2/2001 106
6 2/3/2001 109
7 3/1/2001 103
8 3/2/2001 103
CodePudding user response:
The exact logic is still unclear, but for the global logic of merging, you can use merge
on the month period:
df2['a'] = df2.merge(df1,
left_on=pd.to_datetime(df2['date'], dayfirst=False).dt.to_period('M'),
right_on=pd.to_datetime(df1['date'], dayfirst=False).dt.to_period('M'),
how='left', suffixes=('_', '')
)['a']
output:
date a
0 1/1/2001 102
1 1/2/2001 103
2 1/3/2001 98
3 1/4/2001 100
output of the intermediate:
df2.merge(df1,
left_on=pd.to_datetime(df2['date'], dayfirst=False).dt.to_period('M'),
right_on=pd.to_datetime(df1['date'], dayfirst=False).dt.to_period('M'),
how='left', suffixes=('_', '')
)['a']
0 100
1 100
2 100
3 100
Name: a, dtype: int64