Home > Back-end >  How to apply pivot to the data frame columns if there is specific pattern like numbers then letters?
How to apply pivot to the data frame columns if there is specific pattern like numbers then letters?

Time:12-27

I am trying to write a code inside Python3 to convert the first table into the expected table. Would you please help me to convert the one I have into the expected one at the bottom?

CURRENT ONE:

1104369592
MA01609Worcester
MA01604Worcester
MA01604Worcester
1104842379
MA01040Hampden
MA01040Hampden
1104896613
MA02745Bristol
MA02745Bristol

EXPECTED ONE:

1104469592 MA01609Worcester
1104469592 MA01609Worcester
1104469592 MA01609Worcester
1104842379 MA01040Hampden
1104842379 MA01040Hampden
1104896613 MA02745Bristol
1104896613 MA02745Bristol

I did not find any solution. I tried pivot, but I dont know how to do it.

CodePudding user response:

May not be the cleanest, but it could work:

import pandas as pd

current = pd.Series([
    "1104369592",
    "MA01609Worcester",
    "MA01604Worcester",
    "MA01604Worcester",
    "1104842379",
    "MA01040Hampden",
    "MA01040Hampden",
    "1104896613",
    "MA02745Bristol",
    "MA02745Bristol"
])

def get_subframe(x):
    return pd.DataFrame({
        "num": x.iloc[0],
        "val": x.iloc[1:]
    })

groups = s.str.isnumeric().cumsum()
out = s.groupby(groups).apply(get_subframe).reset_index(drop=True)

out:

          num               val
0  1104369592  MA01609Worcester
1  1104369592  MA01604Worcester
2  1104369592  MA01604Worcester
3  1104842379    MA01040Hampden
4  1104842379    MA01040Hampden
5  1104896613    MA02745Bristol
6  1104896613    MA02745Bristol

CodePudding user response:

Another method:

mask = df["Column1"].str.match(r"\d{9}")

df["Column2"] = df.loc[~mask, "Column1"]
df.loc[~mask, "Column1"] = np.nan
df["Column1"] = df["Column1"].ffill()
df = df.dropna()

print(df)

Prints:

      Column1           Column2
1  1104369592  MA01609Worcester
2  1104369592  MA01604Worcester
3  1104369592  MA01604Worcester
5  1104842379    MA01040Hampden
6  1104842379    MA01040Hampden
8  1104896613    MA02745Bristol
9  1104896613    MA02745Bristol

Input dataframe:

            Column1
0        1104369592
1  MA01609Worcester
2  MA01604Worcester
3  MA01604Worcester
4        1104842379
5    MA01040Hampden
6    MA01040Hampden
7        1104896613
8    MA02745Bristol
9    MA02745Bristol
  • Related