I am trying to write a code inside Python3 to convert the first table into the expected table. Would you please help me to convert the one I have into the expected one at the bottom?
CURRENT ONE:
1104369592
MA01609Worcester
MA01604Worcester
MA01604Worcester
1104842379
MA01040Hampden
MA01040Hampden
1104896613
MA02745Bristol
MA02745Bristol
EXPECTED ONE:
1104469592 MA01609Worcester
1104469592 MA01609Worcester
1104469592 MA01609Worcester
1104842379 MA01040Hampden
1104842379 MA01040Hampden
1104896613 MA02745Bristol
1104896613 MA02745Bristol
I did not find any solution. I tried pivot, but I dont know how to do it.
CodePudding user response:
May not be the cleanest, but it could work:
import pandas as pd
current = pd.Series([
"1104369592",
"MA01609Worcester",
"MA01604Worcester",
"MA01604Worcester",
"1104842379",
"MA01040Hampden",
"MA01040Hampden",
"1104896613",
"MA02745Bristol",
"MA02745Bristol"
])
def get_subframe(x):
return pd.DataFrame({
"num": x.iloc[0],
"val": x.iloc[1:]
})
groups = s.str.isnumeric().cumsum()
out = s.groupby(groups).apply(get_subframe).reset_index(drop=True)
out:
num val
0 1104369592 MA01609Worcester
1 1104369592 MA01604Worcester
2 1104369592 MA01604Worcester
3 1104842379 MA01040Hampden
4 1104842379 MA01040Hampden
5 1104896613 MA02745Bristol
6 1104896613 MA02745Bristol
CodePudding user response:
Another method:
mask = df["Column1"].str.match(r"\d{9}")
df["Column2"] = df.loc[~mask, "Column1"]
df.loc[~mask, "Column1"] = np.nan
df["Column1"] = df["Column1"].ffill()
df = df.dropna()
print(df)
Prints:
Column1 Column2
1 1104369592 MA01609Worcester
2 1104369592 MA01604Worcester
3 1104369592 MA01604Worcester
5 1104842379 MA01040Hampden
6 1104842379 MA01040Hampden
8 1104896613 MA02745Bristol
9 1104896613 MA02745Bristol
Input dataframe:
Column1
0 1104369592
1 MA01609Worcester
2 MA01604Worcester
3 MA01604Worcester
4 1104842379
5 MA01040Hampden
6 MA01040Hampden
7 1104896613
8 MA02745Bristol
9 MA02745Bristol