I have a text file where every other row either begins with "A" or "B" like this
A810 WE WILDWOOD DR
B20220901BROOKE
A6223 AMHERST BAY
B20221001SARAI
How can I read the text file and create a two column pandas dataframe where the line beginning with "A" is a column and likewise for the "B", on a single row. Like this
|A |B |
|:------------------|:--------------|
|A810 WE WILDWOOD DR|B20220901BROOKE|
|:------------------|---------------|
|A6223 AMHERST BAY |B20221001SARAI |
|:------------------|---------------|
CodePudding user response:
You can approach this by using pandas.DataFrame.shift
and pandas.DataFrame.join
:
from io import StringIO
import pandas as pd
s = """A810 WE WILDWOOD DR
B20220901BROOKE
A6223 AMHERST BAY
B20221001SARAI
"""
df = pd.read_csv(StringIO(s), header=None, names=["A"])
#in your case, df = pd.read_csv("path_of_your_txtfile", header=None, names=["A"])
out = (
df
.join(df.shift(-1).rename(columns= {"A": "B"}))
.iloc[::2]
.reset_index(drop=True)
)
# Output :
print(out)
A B
0 A810 WE WILDWOOD DR B20220901BROOKE
1 A6223 AMHERST BAY B20221001SARAI
CodePudding user response:
What about using a pivot
?
col = df[0].str.extract('(.)', expand=False)
out = (df
.assign(col=col, idx=df.groupby(col).cumcount())
.pivot(index='idx', columns='col', values=0)
.rename_axis(index=None, columns=None)
)
Output:
A B
0 A810 WE WILDWOOD DR B20220901BROOKE
1 A6223 AMHERST BAY B20221001SARAI
CodePudding user response:
Another possible solution, which only works if the strings alternate, regularly, between A
and B
, as the OP states:
pd.DataFrame(df.values.reshape((-1, 2)), columns=list('AB'))
Output:
A B
0 A810 WE WILDWOOD DR B20220901BROOKE
1 A6223 AMHERST BAY B20221001SARAI