How to create two columns of a dataframe from separate lines in a text file-CodePudding

I have a text file where every other row either begins with "A" or "B" like this

A810 WE WILDWOOD DR
B20220901BROOKE
A6223 AMHERST BAY
B20221001SARAI

How can I read the text file and create a two column pandas dataframe where the line beginning with "A" is a column and likewise for the "B", on a single row. Like this

|A                  |B              |
|:------------------|:--------------|
|A810 WE WILDWOOD DR|B20220901BROOKE|
|:------------------|---------------|
|A6223 AMHERST BAY  |B20221001SARAI |
|:------------------|---------------|

CodePudding user response：

You can approach this by using pandas.DataFrame.shift and pandas.DataFrame.join :

from io import StringIO    
import pandas as pd

s = """A810 WE WILDWOOD DR
B20220901BROOKE
A6223 AMHERST BAY
B20221001SARAI
"""

df = pd.read_csv(StringIO(s), header=None, names=["A"])
#in your case, df = pd.read_csv("path_of_your_txtfile", header=None, names=["A"])

out = (
            df
              .join(df.shift(-1).rename(columns= {"A": "B"}))
              .iloc[::2]
              .reset_index(drop=True)
       )

# Output :

print(out)
                     A                B
0  A810 WE WILDWOOD DR  B20220901BROOKE
1    A6223 AMHERST BAY   B20221001SARAI

CodePudding user response：

What about using a pivot?

col = df[0].str.extract('(.)', expand=False)

out = (df
   .assign(col=col, idx=df.groupby(col).cumcount())
   .pivot(index='idx', columns='col', values=0)
   .rename_axis(index=None, columns=None)
)

Output:

                     A                B
0  A810 WE WILDWOOD DR  B20220901BROOKE
1    A6223 AMHERST BAY   B20221001SARAI

CodePudding user response：

Another possible solution, which only works if the strings alternate, regularly, between A and B, as the OP states:

pd.DataFrame(df.values.reshape((-1, 2)), columns=list('AB'))

Output:

                     A                B
0  A810 WE WILDWOOD DR  B20220901BROOKE
1    A6223 AMHERST BAY   B20221001SARAI