How to add data to a pandas DataFrame dynamically using Python?

I want to extract the data from the text file shown below and store it in a pandas DataFrame that has 8 columns.

Lorem | Ipsum | is | simply | dummy
text | of | the | printing | and
typesetting | industry. | Lorem
more | recently | with | desktop | publishing | software | like | Aldus
Ipsum | has | been | the | industry's
standard | dummy | text | ever | since | the | 1500s
took | a | galley | of | type | and
scrambled | it | to | make | a | type | specimen | book
It | has | survived | not | only | five | centuries, | but
the | leap | into | electronic | typesetting
remaining | essentially | unchanged
It | was | popularised | in | the | 1960s | with | the
Lorem | Ipsum | passages, | and
PageMaker | including | versions | of | Lorem | Ipsum

The values on each line are separated by a pipe sign (|); each value belongs in one cell of the corresponding row. My end goal is to have the data inserted into the DataFrame in the format below.

Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Column 8
-------------------------------------------------------------------------------------
Lorem     |    Ipsum   |    is      |    simply  | dummy     |
text      |      of    |    the     |   printing |  and      |
typesetting| industry. |    Lorem   |
more       | recently  |    with    |   desktop  | publishing|  software  |   like     | Aldus    |

and so on.....

I tried the code below, but I am unable to add the data to the DataFrame dynamically.

import pandas as pd

with open(file) as f:
   data = f.read().split('\n')

columns = ['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 5', 'Column 6', 'Column 7', 'Column 8']

df = pd.DataFrame(columns=columns)

for i in data:
    row = i.split(' | ')
    df = df.append({'Column 1': f'{row[0]}', 'Column 2': f'{row[1]}', 'Column 3': f'{row[2]}', 'Column 4': f'{row[3]}', 'Column 5': f'{row[4]}'}, ignore_index = True)

The above is a manual way of adding a row's cells to the DataFrame, but I need a dynamic way, i.e. how do I append the rows so that each one is added regardless of how many cells it contains?

CodePudding user response：

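Use read_csv to read the text file:

import pandas as pd

names = [f"Column {i}" for i in range(1, 9)]
# The separator is a regex: a pipe with whitespace on both sides.
# Regex separators are handled by the Python parsing engine.
df = pd.read_csv(file, sep=r"\s+\|\s+", names=names, header=None, engine="python")

print(df)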
       Column 1     Column 2     Column 3    Column 4     Column 5  Column 6  \
0         Lorem        Ipsum           is      simply        dummy      None   
1          text           of          the    printing          and      None   
2   typesetting    industry.        Lorem        None         None      None   
3          more     recently         with     desktop   publishing  software   
4         Ipsum          has         been         the   industry's      None   
5      standard        dummy         text        ever        since       the   
6          took            a       galley          of         type       and   
7     scrambled           it           to        make            a      type   
8            It          has     survived         not         only      five   
9           the         leap         into  electronic  typesetting      None   
10    remaining  essentially    unchanged        None         None      None   
11           It          was  popularised          in          the     1960s   
12        Lorem        Ipsum    passages,         and         None      None   
13    PageMaker    including     versions          of        Lorem     Ipsum   

      Column 7 Column 8  
0         None     None  
1         None     None  
2         None     None  
3         like    Aldus  
4         None     None  
5        1500s     None  
6         None     None  
7     specimen     book  
8   centuries,      but  
9         None     None  
10        None     None  
11        with      the  
12        None     None  
13        None     None
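
To try this without a file on disk, the same call can be run against an in-memory string. This is a minimal sketch, assuming the separator is always a pipe with whitespace on both sides; the `sample` string and the use of `io.StringIO` are illustrative and not part of the answer above.

```python
import io

import pandas as pd

# Illustrative sample: the first four lines of the question's file.
sample = """Lorem | Ipsum | is | simply | dummy
text | of | the | printing | and
typesetting | industry. | Lorem
more | recently | with | desktop | publishing | software | like | Aldus
"""

names = [f"Column {i}" for i in range(1, 9)]

# Regex separator, so the Python engine is requested explicitly.
# Rows with fewer than 8 cells are padded with missing values.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+\|\s+",
    names=names,
    header=None,
    engine="python",
)

print(df.shape)
```

Depending on the pandas version, the padded cells may display as None or NaN; `pd.isna` treats both as missing.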

CodePudding user response：

import pandas as pd

text = """
Lorem | Ipsum | is | simply | dummy
text | of | the | printing | and
typesetting | industry. | Lorem
more | recently | with | desktop | publishing | software | like | Aldus
Ipsum | has | been | the | industry's
standard | dummy | text | ever | since | the | 1500s
took | a | galley | of | type | and
scrambled | it | to | make | a | type | specimen | book
It | has | survived | not | only | five | centuries, | but
the | leap | into | electronic | typesetting
remaining | essentially | unchanged
It | was | popularised | in | the | 1960s | with | the
Lorem | Ipsum | passages, | and
PageMaker | including | versions | of | Lorem | Ipsum
"""

# Create a 'jagged' list of words...
data = [i.split(" | ") for i in text.strip().split("\n")]

# ... which you can pass to pd.DataFrame directly:
columns = [f"Column {i}" for i in range(1, 9)]
df = pd.DataFrame(data, columns=columns)

df:

       Column 1     Column 2     Column 3    Column 4     Column 5  Column 6    Column 7 Column 8
0         Lorem        Ipsum           is      simply        dummy      None        None     None
1          text           of          the    printing          and      None        None     None
2   typesetting    industry.        Lorem        None         None      None        None     None
3          more     recently         with     desktop   publishing  software        like    Aldus
4         Ipsum          has         been         the   industry's      None        None     None
5      standard        dummy         text        ever        since       the       1500s     None
6          took            a       galley          of         type       and        None     None
7     scrambled           it           to        make            a      type    specimen     book
8            It          has     survived         not         only      five  centuries,      but
9           the         leap         into  electronic  typesetting      None        None     None
10    remaining  essentially    unchanged        None         None      None        None     None
11           It          was  popularised          in          the     1960s        with      the
12        Lorem        Ipsum    passages,         and         None      None        None     None
13    PageMaker    including     versions          of        Lorem     Ipsum        None     None
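
The snippet above hard-codes eight column names, which happens to match the longest line in this sample. When the maximum number of cells is not known in advance, the width can be derived from the data instead. A minimal sketch, where `lines`, `rows` and `width` are illustrative names:

```python
import pandas as pd

# Illustrative sample lines with 5, 3 and 8 cells.
lines = [
    "Lorem | Ipsum | is | simply | dummy",
    "typesetting | industry. | Lorem",
    "more | recently | with | desktop | publishing | software | like | Aldus",
]

# Split every line into its cells (a jagged list of lists).
rows = [line.split(" | ") for line in lines]

# The longest row decides how many columns are needed.
width = max(len(row) for row in rows)
columns = [f"Column {i}" for i in range(1, width + 1)]

# Shorter rows are padded with missing values.
df = pd.DataFrame(rows, columns=columns)
print(df)
```

As far as I know, pandas raises an error when more column names are passed than the longest row has cells, so deriving the width from the data keeps the two in step.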

CodePudding user response：

You can do it by creating a Series for each line and then building the DataFrame by concatenating those Series.

import pandas as pd

with open(file) as f:
    # splitlines() avoids an empty last row when the file ends with a newline
    data = f.read().splitlines()

lines = []

for i in data:
    row = i.split(' | ')
    lines.append(pd.Series(row))

df = pd.concat(lines, axis=1).T

You will dynamically get the right number of columns. The columns will be named just 0, 1, 2... but if you need to rename them to Column 1, Column 2... you can easily do it via:

df = df.rename(columns={c: f"Column {c + 1}" for c in df.columns})
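
Put together on in-memory lines, the Series approach looks roughly like this. The `lines` list is an illustrative sample, and the columns are renamed with a one-based offset so that they start at Column 1.

```python
import pandas as pd

# Illustrative sample lines with 5, 3 and 8 cells.
lines = [
    "Lorem | Ipsum | is | simply | dummy",
    "typesetting | industry. | Lorem",
    "It | has | survived | not | only | five | centuries, | but",
]

# One Series per line; each Series is indexed 0, 1, 2, ...
series_list = [pd.Series(line.split(" | ")) for line in lines]

# Concatenating along axis=1 aligns on the index and pads short lines;
# transposing turns each Series back into a row.
df = pd.concat(series_list, axis=1).T

# The column labels are the integers 0..7, so shift them by one.
df.columns = [f"Column {c + 1}" for c in df.columns]
print(df)
```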

CodePudding user response：

You are trying to append 5 columns to a DataFrame with 8 columns, right? Try reading "Append Dataframe with Different Number of Columns", and also check the documentation page "Ways to Merge Data on Pandas".

That is probably enough to solve this problem.
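
For the row-by-row loop from the question, one option is to collect each line as a dictionary and build the DataFrame once at the end. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the original loop does not run on current versions. A minimal sketch; `records` and the sample `lines` are illustrative names:

```python
import pandas as pd

columns = [f"Column {i}" for i in range(1, 9)]

# Illustrative sample lines with 5 and 7 cells.
lines = [
    "Lorem | Ipsum | is | simply | dummy",
    "standard | dummy | text | ever | since | the | 1500s",
]

records = []
for line in lines:
    cells = line.split(" | ")
    # zip stops at the shorter input, so only the cells that exist get a key.
    records.append(dict(zip(columns, cells)))

# Keys missing from a record become missing values in that row.
df = pd.DataFrame(records, columns=columns)
print(df)
```

Building the DataFrame once from a list is generally cheaper than growing it inside the loop, because each append-style call copies the existing data.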