pretty new to python. I'm currently trying to modify a dataframe that looks like this
stations level_1 distance
6 1 0.870363
6 2 1.851696
6 3 2.019918
8 1 1.261097
8 2 1.16174
8 3 1.261097
25 1 1.14675
25 2 1.83467
25 3 1.83602
as you can see, each station has 3 types of level and corresponding distance. What I want to achieve is, delete some number of rows according to both 'stations' and 'level_1'.
For example for station 6, I want only 2 of level_1 to stay, for station 8, I want 2 and 3 of level_1 to stay and for station 25, I only want 3 to stay.
Consequently I want my dataframe to look like this
stations level_1 distance
6 2 1.851696
8 2 1.16174
8 3 1.261097
25 3 1.83602
This is not my real data though, I have 137 stations and each station has 954 levels so I need a pythonic way to achieve this. So far, I couldn't come up with something. Thanks in advance!
Edit: The logic to delete the rows is different for each station. I have created another dataframe for that. Looks like this: 1
CodePudding user response:
There are multiple ways to do this:
df.loc[
# Keep station 6 and level 2
((df['stations'] == 6) & (df['level_1'] == 2)) |
# Keep station 8 and levels 2 and 3
((df['stations'] == 8) & (df['level_1'].isin([2, 3]))) |
# Keep station 25 and level 3
((df['stations'] == 25) & (df['level_1'] == 3))
]
Another way would be to create a spreadsheet containing the levels you'd like to keep along with the stations, something like:
station level
6 2
8 2
8 3
Then we can merge this dataframe with your one, and pandas will automatically inner join and only keep them:
# Read your spreadsheet:
new_table = pd.read_csv(r"path/to/your/sheet")
# Merge the data, this would keep only the columns found in both dataframes by default.
df = df.merge(new_table)
This uses pandas.DataFrame.merge
.