Home > Net >  Sorting Data from a Data Frame to a specific format using Pandas Python
Sorting Data from a Data Frame to a specific format using Pandas Python

Time:10-09

I have a got below mentioned Dataframe extracted from an unstructured txt. file. I need it sorted as shown below mentioned example format in the exact order. Please also note that when capturing the date only the date next to(or following) TDY should be captured.

Please note this is a simplified version of the Data frame and the data continues for 3 months.

Dummy initial txt file https://drive.google.com/drive/u/0/folders/1CRzCMRo_TV2B-6ZDPA0kb7S7DGuDg7IY

Dataframe

0 5 36 43 79
1 TDY 01JUN2022 1.2 TOM 1.2
2 SPT 05JUN2022 1.4 1WK 1.4
3 2WK 01JUN2022 1.5 1MO 1.6
4 2MO 06JUN2022 1.7 3MO 1.7
5 6MO 01JUN2022 1.4 9MO 1.8
6 1YR 01JUN2022 1.7 2YR 1.3
7 3YR 01JUN2022 1.2 4YR 1.2
8 5YR 01JUN2022 1.4 6YR 1.4
9 TDY 02JUN2022 1.2 TOM 1.2
10 SPT 02JUN2022 1.4 1WK 1.4
11 2WK 07JUN2022 1.5 1MO 1.6
12 2MO 02JUN2022 1.7 3MO 1.7
13 6MO 02JUN2022 1.4 9MO 1.8
14 1YR 02JUN2022 1.7 2YR 1.3
15 3YR 02JUN2022 1.2 4YR 1.2
16 5YR 07JUN2022 1.4 6YR 1.4
17 TDY 03JUN2022 1.2 TOM 1.2
18 SPT 03JUN2022 1.4 1WK 1.4
19 2WK 03JUN2022 1.5 1MO 1.6
20 2MO 09JUN2022 1.7 3MO 1.7
21 6MO 03JUN2022 1.4 9MO 1.8
22 1YR 02JUN2022 1.7 2YR 1.3
23 3YR 03JUN2022 1.2 4YR 1.2
24 5YR 03JUN2022 1.4 6YR 1.4

Output required

Date TDY TOM SPT 1WK 2WK 1MO 2MO 3MO 6MO 9MO 1YR 2YR 3YR 4YR 5YR 6YR
01JUN2022 1.2 1.2 1.4 1.4 1.5 1.6 1.7 1.7 1.4 1.8 1.7 1.3 1.2 1.2 1.4 1.4
02JUN2022 1.2 1.2 1.4 1.4 1.5 1.6 1.7 1.7 1.4 1.8 1.7 1.3 1.2 1.2 1.4 1.4
03JUN2022 1.2 1.2 1.4 1.4 1.5 1.6 1.7 1.7 1.4 1.8 1.7 1.3 1.2 1.2 1.4 1.4

CodePudding user response:

Not sure if you wanted to keep that date format or not?

dates_df = pd.to_datetime(df.loc[df["0"] == "TDY"]["5"]).reset_index(drop=True)

column_order = ["TDY", "TOM", "SPT", "1WK", "2WK", "1MO", "2MO", "3MO",
                "6MO", "9MO", "1YR", "2YR", "3YR", "4YR", "5YR", "6YR"]

header_df = df[["0", "43"]].melt(value_name="headers")
value_df = df[["36", "79"]].melt(value_name="values")
melted_df = pd.concat([header_df, value_df], axis=1)[["headers", "values"]]

grouped_mapping = melted_df.groupby("headers")["values"].apply(list).to_dict()
grouped_df = pd.DataFrame(grouped_mapping).reindex(columns=column_order)

final_df = pd.merge(dates_df, grouped_df, left_index=True, right_index=True).rename(columns={"5": "Dates"})
print(final_df)

       Dates  TDY  TOM  SPT  1WK  2WK  1MO  ...  9MO  1YR  2YR  3YR  4YR  5YR  6YR
0 2022-06-01  1.2  1.2  1.4  1.4  1.5  1.6  ...  1.8  1.7  1.3  1.2  1.2  1.4  1.4
1 2022-06-02  1.2  1.2  1.4  1.4  1.5  1.6  ...  1.8  1.7  1.3  1.2  1.2  1.4  1.4
2 2022-06-03  1.2  1.2  1.4  1.4  1.5  1.6  ...  1.8  1.7  1.3  1.2  1.2  1.4  1.4
  • Related