Extract data and add to new column based on value id

Time:08-02

I am trying to extract elevation data from my stations information dataframe and add it to my rides dataframe.

Take df1 and df2 for example:

import pandas as pd

df1 = pd.DataFrame(
    {
        "Ride ID": ["100", "101", "102", "103"],
        "StartStation ID": ["2", "3", "4", "1"],
        "Endstation ID": ["3", "1", "2", "4"],
    })

df2 = pd.DataFrame(
    {
        "Station ID": ["1", "2", "3", "4"],
        "Elevation": ["24", "13", "10", "20"],
    })

I want to look up the elevation of each station (based on its ID number) and add this data to the main dataset.

So I end up with this:

df3 = pd.DataFrame(
    {
        "Ride ID": ["100", "101", "102", "103"],
        "StartStation ID": ["2", "3", "4", "1"],
        "StartStation Elevation": ["13", "10", "20", "24"],
        "Endstation ID": ["3", "1", "2", "4"],
        "Endstation Elevation": ["10", "24", "13", "20"],
    })

Should I use a loop or write a function to do this?
I was thinking about a for loop with an if statement, but I have not managed to make it work.

Thank you

CodePudding user response:

A simple merge on the Station ID column will do. Try this:

pd.merge(df1, df2, left_on=['StartStation ID'], right_on=['Station ID'])

Output:

  Ride ID StartStation ID Endstation ID Station ID Elevation
0     100               2             3          2        13
1     101               3             1          3        10
2     102               4             2          4        20
3     103               1             4          1        24

Note: Rearrange the columns as you wish.
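
The merge above only brings in the elevation of the start station. One possible way to get both columns is to merge twice, once per station column, after renaming the df2 columns to match. This is only a sketch: the names start, end and merged are made up for illustration, and how="left" is an assumption so that every ride is kept.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Ride ID": ["100", "101", "102", "103"],
        "StartStation ID": ["2", "3", "4", "1"],
        "Endstation ID": ["3", "1", "2", "4"],
    })

df2 = pd.DataFrame(
    {
        "Station ID": ["1", "2", "3", "4"],
        "Elevation": ["24", "13", "10", "20"],
    })

# Two renamed views of df2, one per station column in df1 (illustrative names).
start = df2.rename(columns={"Station ID": "StartStation ID",
                            "Elevation": "StartStation Elevation"})
end = df2.rename(columns={"Station ID": "Endstation ID",
                          "Elevation": "Endstation Elevation"})

# A left merge keeps every ride and preserves the row order of df1.
merged = (
    df1.merge(start, on="StartStation ID", how="left")
       .merge(end, on="Endstation ID", how="left")
)

# Reorder the columns to match the layout asked for in the question.
merged = merged[["Ride ID", "StartStation ID", "StartStation Elevation",
                 "Endstation ID", "Endstation Elevation"]]
print(merged)
```

Renaming first avoids ending up with an extra Station ID column and with two columns both called Elevation.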

CodePudding user response:

I suggest using pandas functions instead of writing your own loop; it is more efficient.
What you want to do is a merge: https://pandas.pydata.org/docs/reference/api/pandas.merge.html

You can merge on columns that have different names by using left_on and right_on.

df3 = pd.merge(left=df1, right=df2, left_on="StartStation ID", right_on="Station ID", how="inner")

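You can change the how parameter ("inner", "left", "right" or "outer") depending on which rows you want to keep. For example, "left" keeps every row of df1 even if its station has no match in df2.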
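
To illustrate what how changes, here is a small sketch with made-up data: the frames rides and stations and the unknown station ID "9" are assumptions for the example, not part of the question.

```python
import pandas as pd

# Hypothetical data: ride 104 starts at station "9", which has no elevation record.
rides = pd.DataFrame({"Ride ID": ["100", "104"], "StartStation ID": ["2", "9"]})
stations = pd.DataFrame({"Station ID": ["1", "2"], "Elevation": ["24", "13"]})

# how="inner" keeps only rides whose station exists in both frames.
inner = pd.merge(rides, stations, left_on="StartStation ID",
                 right_on="Station ID", how="inner")

# how="left" keeps every ride; unmatched rides get NaN in the station columns.
left = pd.merge(rides, stations, left_on="StartStation ID",
                right_on="Station ID", how="left")

print(len(inner))  # 1
print(len(left))   # 2
print(left["Elevation"].isna().tolist())  # [False, True]
```

If silently losing rides would be a problem, "left" is probably the safer choice, since the missing elevations then show up as NaN instead of disappearing.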

CodePudding user response:

You can do:

df3 = df1.copy()
d = df2.set_index('Station ID')['Elevation']
df3['StartStation Elevation'] = df1['StartStation ID'].map(d)
df3['Endstation Elevation'] = df1['Endstation ID'].map(d)

print(df3) gives:

  Ride ID StartStation ID Endstation ID StartStation Elevation  \
0     100               2             3                     13   
1     101               3             1                     10   
2     102               4             2                     20   
3     103               1             4                     24   

  Endstation Elevation  
0                   10  
1                   24  
2                   13  
3                   20  
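
Since the question asked about a loop: if one is wanted at all, it can run over the two column names rather than over the rows. A minimal sketch of the same map approach, where the name elevation_by_station and the prefix list are my own choices for illustration:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Ride ID": ["100", "101", "102", "103"],
        "StartStation ID": ["2", "3", "4", "1"],
        "Endstation ID": ["3", "1", "2", "4"],
    })

df2 = pd.DataFrame(
    {
        "Station ID": ["1", "2", "3", "4"],
        "Elevation": ["24", "13", "10", "20"],
    })

# Series indexed by station ID, used as a lookup table by map().
elevation_by_station = df2.set_index("Station ID")["Elevation"]

df3 = df1.copy()
# Loop over the column prefixes, not the rows; map() does the per-row lookup.
for prefix in ["StartStation", "Endstation"]:
    df3[f"{prefix} Elevation"] = df3[f"{prefix} ID"].map(elevation_by_station)

print(df3)
```

Any ID that is not present in df2 would come out as NaN in the new column, so no if statement is needed.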