Pandas: merging column entries into a single row-CodePudding

I have the following dataset, where the 'transaction details' column stores all the values for every entry, but in different rows, when it should be storing every entry in the same row. Is there any way this could be done, so that if the first value of 'Transaction Details' is called, all the relevant lines are printed out, instead of just the first line?

Perhaps

CodePudding user response：

With df your dataframe you could do something like this:

import numpy as np
import pandas as pd

df_new = df[~df.Date.isna()].reset_index(drop=True)
df_new["Transaction Details"] = (
    df["Transaction Details"]
        .groupby(np.where(df.Date.isna(), 0, 1).cumsum())
        .apply(lambda col: ", ".join(str(item) for item in col))
        .reset_index(drop=True)
)

Just as an illustration: Result - df_new - for the following dataframe

df = pd.DataFrame(
    {
        "Date": [1, np.NaN, np.NaN, 2, np.NaN, np.NaN, np.NaN],
        "Transaction Details": ["a", "b", "c", "d", "e", "f", "g"]
    }
)

   Date Transaction Details
0   1.0                   a
1   NaN                   b
2   NaN                   c
3   2.0                   d
4   NaN                   e
5   NaN                   f
6   NaN                   g

   Date Transaction Details
0   1.0             a, b, c
1   2.0          d, e, f, g

If df["Transaction Details"] only contains strings, then you can replace

        .apply(lambda col: ", ".join(str(item) for item in col))

with .apply(", ".join).

CodePudding user response：

Setting up

Let's create some sample data first.

df = pd.DataFrame({
    "Date": ["01-Apr", np.nan, np.nan, "02-Apr", np.nan],
    "Details": ["Payment", "Supplier Payment", "1000", "Payment", "SGD 1658.5"]
})

     Date           Details
0  01-Apr           Payment
1     NaN  Supplier Payment
2     NaN              1000
3  02-Apr           Payment
4     NaN        SGD 1658.5

Merging without separators

If you want to merge the rows without any separators in between, you can try this.

df["Date"] = df["Date"].ffill()
df = df.fillna("").groupby("Date", as_index=False).sum()

Output

This yields the following result.

     Date                      Details
0  01-Apr  PaymentSupplier Payment1000
1  02-Apr            PaymentSGD 1658.5

Merging with separators

Things will get more complicated if you want to have some separators between the merged values.

sep = ", "
df["Date"] = df["Date"].ffill()
df["Details"]  = sep
df = df.fillna("").groupby("Date", as_index=False).sum()
df["Details"] = df["Details"].str[:-1 * len(sep)]

Output

This gives the following result.

     Date                          Details
0  01-Apr  Payment, Supplier Payment, 1000
1  02-Apr              Payment, SGD 1658.5