Home > Software engineering >  How to create a dataframe using pandas based on comparing index with values of multiple columns in o
How to create a dataframe using pandas based on comparing index with values of multiple columns in o

Time:12-14

I have two data source:

raw_data = {'site_394$line_2420$tag_144': {1670231589000: 7,
  1671231589000: 7,
  1672231589000: 9,
  1673231589000: 7},
 'site_395$line_2420$tag_154': {1670231589000: 9,
  1671231589000: 10,
  1672231589000: 25,
  1673231589000: 6}}

and

events_data=[
    {
      "tag":"site_394$line_2420$tag_144",
      "from_date": 1670231589000,
      "to_date": 1670232589000,
      "event_name": "Event One"
    },
    {
      "tag":"site_394$line_2420$tag_144",
      "from_date": 1671231589000,
      "to_date": 1671332589000,
      "event_name": "Event Two"
    },
    {
        "tag":"site_394$line_2420$tag_144",
      "from_date": 1671231589000,
      "to_date": 1671332589000,
      "event_name": "Event Two Update"
    },
    {
        "tag":"site_394$line_2420$tag_144",
      "from_date": 1670231589100,
      "to_date": 1670232589200,
      "event_name": "Event Three"
    },
     {
         "tag":"site_395$line_2420$tag_154",
      "from_date": 1670231589000,
      "to_date": 1670232589000,
      "event_name": "Event One"
    },
    {
        "tag":"site_395$line_2420$tag_154",
      "from_date": 1671231589000,
      "to_date": 1671332589000,
      "event_name": "Event Two"
    },
    {
      "tag":"site_395$line_2420$tag_154",
      "from_date": 1670231589100,
      "to_date": 1670232589200,
      "event_name": "Event Three"
    }
]

I would like to combine the two into a single dataframe as shown below. The logic is, for a column in raw_data, if the index of raw data falls between from_date and to_date in events_data, then event_name should be replaced in place of value of the respective column. One catch is, if there are multiple matches, then the value should be appended comma separated. If the value of the column in raw_data is integer,

Expected result:

site_394$line_2420$tag_144 site_395$line_2420$tag_154
1670231589000 Event One Event One
1671231589000 Event Two,Event Two update Event Two
1672231589000 9 25.0
1673231589000 7 6.0

Any help or hint is appreciated.

CodePudding user response:

Create dataframe from raw_data and than iterate over events_data and assign values when the condition is met

df = pd.DataFrame(raw_data)

for event in events_data:
    mask = df.index.to_series().between(event['from_date'], event['to_date'])
    if not (val := df[mask][event['tag']]).empty:
        val = val.iloc[0]
        if type(val) in [int, np.int64]:
            df.loc[mask, event['tag']] = event['event_name']
        else:
            df.loc[mask, event['tag']] = f"{val}, {event['event_name']}"

print(df)

Output:

                site_394$line_2420$tag_144 site_395$line_2420$tag_154
1670231589000                    Event One                  Event One
1671231589000  Event Two, Event Two Update                  Event Two
1672231589000                            9                         25
1673231589000                            7                          6

CodePudding user response:

import pandas as pd

raw_data = {'site_394$line_2420$tag_144': {1670231589000: 7,
  1671231589000: 7,
  1672231589000: 9,
  1673231589000: 7},
 'site_395$line_2420$tag_154': {1670231589000: 9,
  1671231589000: 10,
  1672231589000: 25,
  1673231589000: 6}}

events_data=[
    {
      "tag":"site_394$line_2420$tag_144",
      "from_date": 1670231589000,
      "to_date": 1670232589000,
      "event_name": "Event One"
    },
    {
      "tag":"site_394$line_2420$tag_144",
      "from_date": 1671231589000,
      "to_date": 1671332589000,
      "event_name": "Event Two"
    },
    {
        "tag":"site_394$line_2420$tag_144",
      "from_date": 1671231589000,
      "to_date": 1671332589000,
      "event_name": "Event Two Update"
    },
    {
        "tag":"site_394$line_2420$tag_144",
      "from_date": 1670231589100,
      "to_date": 1670232589200,
      "event_name": "Event Three"
    },
     {
         "tag":"site_395$line_2420$tag_154",
      "from_date": 1670231589000,
      "to_date": 1670232589000,
      "event_name": "Event One"
    },
    {
        "tag":"site_395$line_2420$tag_154",
      "from_date": 1671231589000,
      "to_date": 1671332589000,
      "event_name": "Event Two"
    },
    {
      "tag":"site_395$line_2420$tag_154",
      "from_date": 1670231589100,
      "to_date": 1670232589200,
      "event_name": "Event Three"
    }
]

raw_data_df = pd.DataFrame(raw_data)
events_data_df = pd.DataFrame(events_data)

for tag in raw_data_df.columns:
    for index, row in events_data_df.iterrows():
        if row['tag'] == tag:
            for i in raw_data_df[tag]:
                if i >= row['from_date'] and i <= row['to_date'] :
                    if type(raw_data_df[tag][i]) == int:
                        raw_data_df[tag][i] = raw_data_df[tag][i]   ','   row['event_name']

raw_data_df = pd.DataFrame(raw_data)
events_data_df = pd.DataFrame(events_data)

for tag in raw_data_df.columns:
    for index, row in events_data_df.iterrows():
        if row['tag'] == tag:
            for i in raw_data_df[tag]:
                if i >= row['from_date'] and i <= row['to_date'] :
                    if type(raw_data_df[tag][i]) == int:
                        raw_data_df[tag][i] = str(raw_data_df[tag][i])   ','   row['event_name']
                    else:
                        raw_data_df[tag][i] = raw_data_df[tag][i]   ','   row['event_name']

print(raw_data_df)

i hope this is what you want

  • Related