How can I delete some rows according to different numbers-CodePudding

pretty new to python. I'm currently trying to modify a dataframe that looks like this

 stations  level_1   distance
    6          1     0.870363
    6          2     1.851696
    6          3     2.019918
    8          1     1.261097
    8          2     1.16174
    8          3     1.261097
    25         1     1.14675
    25         2     1.83467
    25         3     1.83602

as you can see, each station has 3 types of level and corresponding distance. What I want to achieve is, delete some number of rows according to both 'stations' and 'level_1'.

For example for station 6, I want only 2 of level_1 to stay, for station 8, I want 2 and 3 of level_1 to stay and for station 25, I only want 3 to stay.

Consequently I want my dataframe to look like this

stations  level_1   distance
    6          2     1.851696
    8          2     1.16174
    8          3     1.261097
    25         3     1.83602

This is not my real data though, I have 137 stations and each station has 954 levels so I need a pythonic way to achieve this. So far, I couldn't come up with something. Thanks in advance!

Edit: The logic to delete the rows is different for each station. I have created another dataframe for that. Looks like this: 1

CodePudding user response：

There are multiple ways to do this:

pandas.DataFrame.loc:

df.loc[
    # Keep station 6 and level 2
    ((df['stations'] == 6) & (df['level_1'] == 2)) |
    # Keep station 8 and levels 2 and 3
    ((df['stations'] == 8) & (df['level_1'].isin([2, 3]))) |
    # Keep station 25 and level 3
    ((df['stations'] == 25) & (df['level_1'] == 3))
]

Another way would be to create a spreadsheet containing the levels you'd like to keep along with the stations, something like:

station    level
6          2
8          2
8          3

Then we can merge this dataframe with your one, and pandas will automatically inner join and only keep them:

# Read your spreadsheet:
new_table = pd.read_csv(r"path/to/your/sheet")

# Merge the data, this would keep only the columns found in both dataframes by default.
df = df.merge(new_table)

This uses pandas.DataFrame.merge.