Home > front end >  Insert values from a column in a DataFrame to another based on column values
Insert values from a column in a DataFrame to another based on column values

Time:11-09

I am wondering how I can add one column from a dataframe, to another based on the dates in both columns. I tried to simplify this as much as I can.

I have two dataframes one is filled with event types and the cost done for the event, the other being the observation from a nearby station giving how much rainfall occurred as well. But

Here is what the dataframes looks like:

d1 = {'ObservationDate': ['2021-09-11', '2021-06-24', '2020-09-23', '2015-10-09'],
        'EventType': ['Flood', 'Flood', 'Wind', 'Hail'],
      'Closures': ['Schools', 'Schools', 'None', 'Offices'],
      'Cost': [1000, 4000, 100, 8000]}
events = pd.DataFrame(data = d1)

display(events)

    ObservationDate EventType   Closures    Cost
0   2021-09-11       Flood      Schools     1000
1   2021-06-24       Flood      Schools     4000
2   2020-09-23       Wind       None         100
3   2015-10-09       Hail       Offices     8000

And,

d2 = {'ObservationDate': ['2021-09-11', '2021-06-24', '2015-10-09', '2018-07-16'], 
      'Rainfall': [45, 90, 32, 22]}
observation = pd.DataFrame(data = d2)

display(observation)

    ObservationDate Rainfall
0   2021-09-11         45
1   2021-06-24         90
2   2015-10-09         32
3   2018-07-16         22

What I then did was filtered the observation DataFrame to get only the relevant event dates (in my actual code I do have the dates converted to the Pandas DateTime type).

filtered = observation[observation.ObservationDate.dt.strftime('%y%m%d').isin(events.ObservationDate.dt.strftime('%y%m%d'))]

What I am trying to do is take the Dates in the filtered dataframe and then add the "Event Type" column from the Event Dataframe, but only for those specific dates.

The ideal result would look like:

    ObservationDate Rainfall    EventType
0   2021-09-11        45          Flood
1   2021-06-24        90          Flood
2   2015-10-09        32          Hail

Things I have tried but I can't seem to get it to work right is merging them, also trying to take the index of where the "isin" is true, as well as a few other mindless attempts.

I intuitively know how I would go about this, I would honestly just loop through each of the rows then find where the dates are equal to one another, then assign that value to the column. But I am trying to find a more "Python" way of doing this.

Any help is appreciated.

CodePudding user response:

use merge

observation.merge(events.iloc[:, :2])

CodePudding user response:

Following Clegane's proposal, using merge() in this case would require you to previously modify slightly the events df:

output = observation.merge(events[['ObservationDate','EventType']],how='left',on='ObservationDate').dropna()

Returning:

  ObservationDate  Rainfall EventType
0      2021-09-11        45     Flood
1      2021-06-24        90     Flood
2      2015-10-09        32      Hail

Besides using merge() as Clegane proposed, you can also consider using map():

observation['EventType'] = observation['ObservationDate'].map(events[['ObservationDate','EventType']].set_index('ObservationDate').to_dict()['EventType'])

Returning:

  ObservationDate  Rainfall EventType
0      2021-09-11        45     Flood
1      2021-06-24        90     Flood
2      2015-10-09        32      Hail
3      2018-07-16        22       NaN

You can then simply drop the Null row with the NaN value

  • Related