I have two data source:
raw_data = {'site_394$line_2420$tag_144': {1670231589000: 7,
1671231589000: 7,
1672231589000: 9,
1673231589000: 7},
'site_395$line_2420$tag_154': {1670231589000: 9,
1671231589000: 10,
1672231589000: 25,
1673231589000: 6}}
and
events_data=[
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1670231589000,
"to_date": 1670232589000,
"event_name": "Event One"
},
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1671231589000,
"to_date": 1671332589000,
"event_name": "Event Two"
},
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1671231589000,
"to_date": 1671332589000,
"event_name": "Event Two Update"
},
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1670231589100,
"to_date": 1670232589200,
"event_name": "Event Three"
},
{
"tag":"site_395$line_2420$tag_154",
"from_date": 1670231589000,
"to_date": 1670232589000,
"event_name": "Event One"
},
{
"tag":"site_395$line_2420$tag_154",
"from_date": 1671231589000,
"to_date": 1671332589000,
"event_name": "Event Two"
},
{
"tag":"site_395$line_2420$tag_154",
"from_date": 1670231589100,
"to_date": 1670232589200,
"event_name": "Event Three"
}
]
I would like to combine the two into a single dataframe as shown below. The logic is, for a column in raw_data
, if the index of raw data falls between from_date
and to_date
in events_data
, then event_name
should be replaced in place of value of the respective column. One catch is, if there are multiple matches, then the value should be appended comma separated. If the value of the column in raw_data
is integer,
Expected result:
site_394$line_2420$tag_144 | site_395$line_2420$tag_154 | |
---|---|---|
1670231589000 | Event One | Event One |
1671231589000 | Event Two,Event Two update | Event Two |
1672231589000 | 9 | 25.0 |
1673231589000 | 7 | 6.0 |
Any help or hint is appreciated.
CodePudding user response:
Create dataframe from raw_data
and than iterate over events_data
and assign values when the condition is met
df = pd.DataFrame(raw_data)
for event in events_data:
mask = df.index.to_series().between(event['from_date'], event['to_date'])
if not (val := df[mask][event['tag']]).empty:
val = val.iloc[0]
if type(val) in [int, np.int64]:
df.loc[mask, event['tag']] = event['event_name']
else:
df.loc[mask, event['tag']] = f"{val}, {event['event_name']}"
print(df)
Output:
site_394$line_2420$tag_144 site_395$line_2420$tag_154
1670231589000 Event One Event One
1671231589000 Event Two, Event Two Update Event Two
1672231589000 9 25
1673231589000 7 6
CodePudding user response:
import pandas as pd
raw_data = {'site_394$line_2420$tag_144': {1670231589000: 7,
1671231589000: 7,
1672231589000: 9,
1673231589000: 7},
'site_395$line_2420$tag_154': {1670231589000: 9,
1671231589000: 10,
1672231589000: 25,
1673231589000: 6}}
events_data=[
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1670231589000,
"to_date": 1670232589000,
"event_name": "Event One"
},
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1671231589000,
"to_date": 1671332589000,
"event_name": "Event Two"
},
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1671231589000,
"to_date": 1671332589000,
"event_name": "Event Two Update"
},
{
"tag":"site_394$line_2420$tag_144",
"from_date": 1670231589100,
"to_date": 1670232589200,
"event_name": "Event Three"
},
{
"tag":"site_395$line_2420$tag_154",
"from_date": 1670231589000,
"to_date": 1670232589000,
"event_name": "Event One"
},
{
"tag":"site_395$line_2420$tag_154",
"from_date": 1671231589000,
"to_date": 1671332589000,
"event_name": "Event Two"
},
{
"tag":"site_395$line_2420$tag_154",
"from_date": 1670231589100,
"to_date": 1670232589200,
"event_name": "Event Three"
}
]
raw_data_df = pd.DataFrame(raw_data)
events_data_df = pd.DataFrame(events_data)
for tag in raw_data_df.columns:
for index, row in events_data_df.iterrows():
if row['tag'] == tag:
for i in raw_data_df[tag]:
if i >= row['from_date'] and i <= row['to_date'] :
if type(raw_data_df[tag][i]) == int:
raw_data_df[tag][i] = raw_data_df[tag][i] ',' row['event_name']
raw_data_df = pd.DataFrame(raw_data)
events_data_df = pd.DataFrame(events_data)
for tag in raw_data_df.columns:
for index, row in events_data_df.iterrows():
if row['tag'] == tag:
for i in raw_data_df[tag]:
if i >= row['from_date'] and i <= row['to_date'] :
if type(raw_data_df[tag][i]) == int:
raw_data_df[tag][i] = str(raw_data_df[tag][i]) ',' row['event_name']
else:
raw_data_df[tag][i] = raw_data_df[tag][i] ',' row['event_name']
print(raw_data_df)
i hope this is what you want