Home > Net >  Can you group multiple rows all into one row by column value with Python using pandas?
Can you group multiple rows all into one row by column value with Python using pandas?

Time:06-18

How do I change this:

Date URL Description Category
2022-06-17 14:24:52 /XYBkLO public A
2022-06-17 14:24:52 /XYBkLO public B
2022-06-17 14:24:52 /XYBkLO public C
2022-06-17 14:25:05 /ZWrTVu public A
2022-06-17 14:25:05 /ZWrTVu public B
2022-06-17 14:25:05 /ZWrTVu public C

To this:

Date URL Description Category Date URL Description Category Date URL Description Category
2022-06-17 14:24:52 /XYBkLO public A 2022-06-17 14:24:52 /XYBkLO public B 2022-06-17 14:24:52 /XYBkLO public C
2022-06-17 14:25:05 /ZWrTVu public A 2022-06-17 14:25:05 /ZWrTVu public B 2022-06-17 14:25:05 /ZWrTVu public C

I would like to keep everything with the same URL on one row, but I don't know how I could implement this using pandas. Is there perhaps another way or another library I should use? Could really use some help

CodePudding user response:

You can try:

from itertools import cycle, count, islice
from collections import defaultdict


def fn(x):
    d = defaultdict(lambda: count(1))
    names = cycle(x.columns)
    vals = x.values.ravel()

    return pd.DataFrame(
        [vals],
        columns=[f"{n}.{next(d[n])}" for n in islice(names, len(vals))],
    )


x = df.groupby("URL").apply(fn).reset_index(drop=True)
print(x)

Prints:

                Date.1    URL.1 Description.1 Category.1               Date.2    URL.2 Description.2 Category.2               Date.3    URL.3 Description.3 Category.3
0  2022-06-17 14:24:52  /XYBkLO        public          A  2022-06-17 14:24:52  /XYBkLO        public          B  2022-06-17 14:24:52  /XYBkLO        public          C
1  2022-06-17 14:25:05  /ZWrTVu        public          A  2022-06-17 14:25:05  /ZWrTVu        public          B  2022-06-17 14:25:05  /ZWrTVu        public          C

CodePudding user response:

You can look into pandas transpose function. transpose

Not 100% sure it will fit your use case but it seems like a good place to start.

CodePudding user response:

Here is another way:

(pd.concat(
    [df.set_index('URL',drop=False) for _,df in df.groupby('Category')]
    ,axis=1)
.reset_index(drop=True))

Output:

    Date             URL    Description Category    Date             URL    Description Category  Date               URL    Description Category
0   6/17/2022 14:24 /XYBkLO public       A          6/17/2022 14:24 /XYBkLO public       B        6/17/2022 14:24   /XYBkLO public      C
1   6/17/2022 14:25 /ZWrTVu public       A          6/17/2022 14:25 /ZWrTVu public       B        6/17/2022 14:25   /ZWrTVu public      C
  • Related