Pandas simple groupby and apply complains "Columns must be same length as key"

Time:02-15

Essentially, I have a table of timestamps and some data. I want to group rows that share a timestamp and adjust the timestamps within each group. I got something working based on the question "Interpolate seconds to milliseconds in dataset?".

The solution seems to work fine for many rows but not for simple datasets and I can't figure out why. I've narrowed it down to a simple example below.

Data:

    t  val
    0  0.3
    0  0.2
    0  0.6
    0  0.4

Expected result:

    t  val
    1  0.3
    1  0.2
    1  0.6
    1  0.4

Code:

import pandas as pd

df = pd.DataFrame([[0, 0.3], [0, 0.2], [0, 0.6], [0, 0.4]], columns=["t", "val"])

# Group by timestamp and add 1 to each (just for demonstration)
df.t = df.groupby("t", group_keys=False).apply(lambda df: df.t + 1)

This raises ValueError: Columns must be same length as key and I can't see what I'm doing wrong. Any help appreciated.

CodePudding user response:

If you need to write the result back to a column, use GroupBy.transform and select the column to process right after groupby:

df.t = df.groupby('t')['t'].transform(lambda x: x + 1)
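
As a minimal sketch of why this works on the one-group example from the question: transform returns a Series with the same index as the original frame, so it can be assigned straight back to the column. My understanding (not confirmed in this thread, and possibly version dependent) is that apply with a single group instead stacks the returned Series into a one-row DataFrame, which would explain the error. The name `shifted` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[0, 0.3], [0, 0.2], [0, 0.6], [0, 0.4]], columns=["t", "val"])

# transform yields one value per input row, indexed like df,
# regardless of how many groups there are (here: just one)
shifted = df.groupby("t")["t"].transform(lambda x: x + 1)

# Safe to assign back because the index lines up with df
df["t"] = shifted
print(df)
```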

The linked solution with np.linspace should be changed to:

import numpy as np

df.t = df.groupby('t')['t'].transform(lambda x: x + np.linspace(0, 1, len(x)))
print(df)
          t  val
0  0.000000  0.3
1  0.333333  0.2
2  0.666667  0.6
3  1.000000  0.4 

Or add a per-group counter with GroupBy.cumcount:

df.t += df.groupby('t').cumcount()
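
To show what the counter looks like with more than one timestamp, here is a small sketch. The two-group data is made up for illustration and is not from the question. cumcount numbers the rows inside each group starting at 0, and the result is indexed like the original frame.

```python
import pandas as pd

# Hypothetical data with two distinct timestamps (0 and 5)
df = pd.DataFrame({"t": [0, 0, 0, 5, 5], "val": [0.3, 0.2, 0.6, 0.4, 0.1]})

# Position of each row within its own group
counter = df.groupby("t").cumcount()

# Offset every timestamp by its position in the group
df["t"] += counter
print(df)
```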