I have a tropical cyclone dataset like this: TC dataset
I want to group the dataset according to the column 'TC_name', i.e. one TC one group, but 'TC_name' may be identical for different years and the same TC may travel interannually, so groupby()
maybe doesn't work.
I also tried to loop line by line and find where the 'TC_name' changes, cut the dataset there to get a dataframe for each TC. However, this gives me over 1000 dataframes and I don't know how to represent this 2D information in a 3D formation.
I guess multi index
might work through assigning a high level index
to each TC, but I cannot reach it after trying many times.
An example is provided below for clarification: (I need seperate 3 tropical cyclones)
import pandas as pd
data = {'TC_name':['RITA','RITA','IDA','IDA','IDA','IDA','RITA','RITA','RITA'],
'Year':[2020,2020,2020,2020,2020,2021,2021,2021,2021]}
df = pd.DataFrame(data)
df
TC_name Year
0 RITA 2020
1 RITA 2020
2 IDA 2020
3 IDA 2020
4 IDA 2020
5 IDA 2021
6 RITA 2021
7 RITA 2021
8 RITA 2021
And it would be better to get this format, or any other efficient 3D data structure representation.
TC_name Year
1 1 RITA 2020
2 RITA 2020
2 1 IDA 2020
2 IDA 2020
3 IDA 2020
4 IDA 2021
3 1 RITA 2021
2 RITA 2021
3 RITA 2021
Thanks a lot for your help!
CodePudding user response:
Edit
Here another possibility. We reset the index to keep the original sorting:
df = df.reset_index()
Then we assign a new id within each TC_name
group:
df['id2'] = df.groupby('TC_name')['index'].transform(lambda x: pd.factorize(x)[0] 1)
Here the intermediate output:
index | TC_name | Year | id2 | |
---|---|---|---|---|
0 | 0 | RITA | 2020 | 1 |
1 | 1 | RITA | 2020 | 2 |
2 | 2 | IDA | 2020 | 1 |
3 | 3 | IDA | 2020 | 2 |
4 | 4 | IDA | 2020 | 3 |
5 | 5 | IDA | 2021 | 4 |
Then we assign a new id every time the TC_name changes:
df['id'] = (df["TC_name"] != df["TC_name"].shift(1)).cumsum()
And finally we drop the index
and set the multiindex:
df = df.drop('index', axis=1).set_index(['id', 'id2'])
Here the final output, which looks like yours:
TC_name | Year | ||
---|---|---|---|
1 | 1 | RITA | 2020 |
2 | RITA | 2020 | |
2 | 1 | IDA | 2020 |
2 | IDA | 2020 | |
3 | IDA | 2020 | |
4 | IDA | 2021 | |
3 | 3 | RITA | 2021 |
4 | RITA | 2021 | |
5 | RITA | 2021 |