I have the following dataset:
Name Year Date Value
x year1 date1 v1
x year1 date2 v2
x year1 date3 v3
x year2 date1 v4
x year2 date2 v5
x year2 date3 v6
z year1 date1 v7
z year1 date2 v8
z year1 date3 v9
z year2 date1 v10
z year2 date2 v11
z year2 date3 v12
y year1 date1 v13
y year1 date2 v14
y year1 date3 v15
y year2 date1 v16
y year2 date2 v17
y year2 date3 v18
I would like the following dataset output:
Name Year Date Value
x year1 date1 v1
x year2 date1 v4
x year1 date2 v2
x year2 date2 v5
x year1 date3 v3
x year2 date3 v6
z year1 date1 v7
z year2 date1 v10
z year1 date2 v8
z year2 date2 v11
z year1 date3 v9
z year2 date3 v12
y year1 date1 v13
y year2 date1 v16
y year1 date2 v14
y year2 date2 v17
y year1 date3 v15
y year2 date3 v18
I tried the following code but my 'Name' column is being sorted to 'x,y,z' as well. I want the 'Name' column order to stay as 'x,z,y':
df.sort_values(['Name', 'Date'])
CodePudding user response:
IIUC, let's create a new categorical dtype with set order:
namedtype = pd.CategoricalDtype([*'xzy'], ordered=True)
df['Name'] = df['Name'].astype(namedtype)
df.sort_values(['Name', 'Date', 'Year'])
Output:
Name Year Date Value
0 x year1 date1 v1
3 x year2 date1 v4
1 x year1 date2 v2
4 x year2 date2 v5
2 x year1 date3 v3
5 x year2 date3 v6
6 z year1 date1 v7
9 z year2 date1 v10
7 z year1 date2 v8
10 z year2 date2 v11
8 z year1 date3 v9
11 z year2 date3 v12
12 y year1 date1 v13
15 y year2 date1 v16
13 y year1 date2 v14
16 y year2 date2 v17
14 y year1 date3 v15
17 y year2 date3 v18
CodePudding user response:
datar
is a reimagination of pandas APIs.
It's very easy to implement using datar:
>>> from datar.all import f, tribble, arrange, match
>>> df = tribble(
... f.Name, f.Year, f.Date, f.Value,
... "x", "year1", "date1", "v1",
... "x", "year1", "date2", "v2",
... "x", "year1", "date3", "v3",
... "x", "year2", "date1", "v4",
... "x", "year2", "date2", "v5",
... "x", "year2", "date3", "v6",
... "z", "year1", "date1", "v7",
... "z", "year1", "date2", "v8",
... "z", "year1", "date3", "v9",
... "z", "year2", "date1", "v10",
... "z", "year2", "date2", "v11",
... "z", "year2", "date3", "v12",
... "y", "year1", "date1", "v13",
... "y", "year1", "date2", "v14",
... "y", "year1", "date3", "v15",
... "y", "year2", "date1", "v16",
... "y", "year2", "date2", "v17",
... "y", "year2", "date3", "v18",
... )
>>> df >> arrange(match(f.Name, f.Name), f.Date, f.Year)
Name Year Date Value
<object> <object> <object> <object>
0 x year1 date1 v1
3 x year2 date1 v4
1 x year1 date2 v2
4 x year2 date2 v5
2 x year1 date3 v3
5 x year2 date3 v6
6 z year1 date1 v7
9 z year2 date1 v10
7 z year1 date2 v8
10 z year2 date2 v11
8 z year1 date3 v9
11 z year2 date3 v12
12 y year1 date1 v13
15 y year2 date1 v16
13 y year1 date2 v14
16 y year2 date2 v17
14 y year1 date3 v15
17 y year2 date3 v18