Home > Back-end >  Python Dataframes merge multi match
Python Dataframes merge multi match

Time:11-28

I'm new with Dataframe. I would like to kwon how (if possible) can I merge 2 Dataframes with multiple match For example

[df1]

date                  ZipCode Weather
2022-11-25 00:00:00   123456   34
2022-11-25 00:00:15   123456   35
2022-11-25 00:00:30   123456   36
[df2]
date                  ZipCode   host
2022-11-25 00:00:00   123456   host1
2022-11-25 00:00:00   123456   host2
2022-11-25 00:00:00   123456   host3
2022-11-25 00:00:15   123456   host1
2022-11-25 00:00:30   123456   host2
2022-11-25 00:00:30   123456   host3

Expected results:

date                  ZipCode   host   Weather
2022-11-25 00:00:00   123456   host1   34
2022-11-25 00:00:00   123456   host2   34
2022-11-25 00:00:00   123456   host3   34
2022-11-25 00:00:15   123456   host1   35
2022-11-25 00:00:30   123456   host2   36
2022-11-25 00:00:30   123456   host3   36

My objetive is assign weather measures to each host. I have weather measurements every 15 minutes for one ZipCode (One line) By the other hand, I have several host KPIs for one time and one ZipCode (multiples lines)

Can I perfomr this activity with Dataframes?

Thanks in advance!

CodePudding user response:

You could use the join function in pandas which joins one dataframe's index to the index of the other. Try something like

import pandas as pd

data1 = \
[['2022-11-25 00:00:00',  123456, 34],
['2022-11-25 00:00:15',  123456, 35],
['2022-11-25 00:00:30',  123456, 36]]

columns1 =['date', 'ZipCode', 'Weather']

data2 = \
[['2022-11-25 00:00:00',  123456, 'host1'],
['2022-11-25 00:00:00',  123456, 'host2'],
['2022-11-25 00:00:00',  123456, 'host3'],
['2022-11-25 00:00:15',  123456, 'host1'],
['2022-11-25 00:00:30',  123456, 'host2'],
['2022-11-25 00:00:30',  123456, 'host3']]

columns2 =['date', 'ZipCode', 'host']

df1 = pd.DataFrame(data=data1, columns=columns1)
df1.date = pd.to_datetime(df1.date)
df1.set_index('date', inplace=True)
df2 = pd.DataFrame(data=data2, columns=columns2)
df2.date = pd.to_datetime(df2.date)
df2.set_index('date', inplace=True)
df3 = df1.join(df2['host'], on='date')
df3

CodePudding user response:

We do that by using merge and setting the argument on to ['date', 'ZipCode']:

new_df = pd.merge(df2, df1, on=['date', 'ZipCode'])

Output

>>> new_df
...      date           ZipCode host    Weather
0   2022-11-25 00:00:00 123456  host1   34
1   2022-11-25 00:00:00 123456  host2   34
2   2022-11-25 00:00:00 123456  host3   34
3   2022-11-25 00:00:15 123456  host1   35
4   2022-11-25 00:00:30 123456  host2   36
5   2022-11-25 00:00:30 123456  host3   36
  • Related