pandas: taking one column raises an error when replacing spaces and line breaks - CodePudding

import codecs
import os
import regex
import pandas

# clean single quotes out of the text data before pandas reads it

# process the data with pandas
def pandas_fomat(filepath):
    print('began to clean the data============')

    read_table = pandas.read_table(filepath, engine='python', sep='|', header=None, error_bad_lines=False)
    # process column 73: each row has runs of multiple spaces, and there are line breaks
    read_table[[73]] = read_table[[73]].apply(lambda x: fix_lines(x), axis=1)
    # the lambda above is where I am lost; I would appreciate an expert's help
    read_table[[0, 1, 3, 73]].to_csv('AppendTxt.txt', mode='a', header=True, index=None)

    print('============clean data finish')

def fix_lines(x):  # process the data in the list before updating the SQL database
    result = ''
    x = ''.join(x.split())
    # x = x.replace(' ', ',')
    result = x.strip()
    return result

# traverse the folder
def walkFile(filespath):
    for root, dirs, files in os.walk(filespath):

        # root is the path of the folder currently being visited
        # dirs is the list of subdirectory names in that folder
        # files is the list of file names in that folder

        # iterate over the files
        for f in files:

            fullpath = os.path.join(root, f)
            print(fullpath)
            pandas_fomat(fullpath)

        # iterate over all the subfolders
        for d in dirs:
            print(os.path.join(root, d))

def main():
    current_dir = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'TextOut\\')
    # print(current_dir)
    # print('yishagndizhi')
    walkFile(current_dir)

if __name__ == "__main__":
    main()

Data format:
1 2 2, 3, 5
6 6 4 f 4 4 s er
This is a single row of data (it contains a line break). In the end I need to turn it into:
1,2,2,3,5,6,6,4,f,4,4,s,er
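
One way to get from the sample row to a comma-separated line is to split on any run of commas or whitespace and join the pieces again. This is only a sketch: the function name `normalize_row` is made up for illustration, and it assumes that commas, spaces and line breaks all count as separators.

```python
import re

def normalize_row(text):
    """Collapse commas, spaces and line breaks into single commas."""
    # Split on any run of commas and/or whitespace (this includes "\n").
    tokens = re.split(r"[,\s]+", text.strip())
    # Drop empty tokens left behind by leading or trailing separators.
    return ",".join(t for t in tokens if t)

print(normalize_row("1 2  2, 3, 5\n6 6 4 f 4 4 s er"))
```

Splitting and re-joining avoids the doubled commas that a plain replace of " " with "," would produce wherever a comma is followed by a space.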

CodePudding user response:

1 2 2, 3, 5
6 6 4 f 4 4 s er
Is this the content of column 73?

CodePudding user response:

Quoting chuifengde's reply on the 1st floor:

1 2 2, 3, 5
6 6 4 f 4 4 s er
Is this the content of column 73?

Yes, this row is the content of one cell in column 73.
The format is not the same on every row. In fact, I only need the second-to-last item after splitting on commas, that is, the s
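
Taking only the second-to-last item could look roughly like the sketch below. The helper name `second_to_last` is hypothetical, and it assumes that whitespace and line breaks separate items just like commas do, and that a cell with fewer than two items should be returned unchanged apart from stripping.

```python
import re

def second_to_last(text):
    """Return the penultimate item of a cell, or the stripped cell if too short."""
    # Treat runs of commas and whitespace (including line breaks) as separators.
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    return tokens[-2] if len(tokens) >= 2 else text.strip()

print(second_to_last("1 2 2, 3, 5\n6 6 4 f 4 4 s er"))
```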

CodePudding user response:

def fix_lines(x):
    c = x.replace("\n", "")
    while c.find("  ") >= 0:
        c = c.replace("  ", " ")
    c = c.strip().replace(" ", ",")
    return c

CodePudding user response:

# if you only need the second-to-last item:
def fix_lines(x):
    c = x.replace("\n", "")
    while c.find("  ") >= 0:
        c = c.replace("  ", " ")
    c = c.strip().replace(" ", ",")
    return c.split(",")[-2] if c.find(",") > 0 else c

# call it like this:
df[73].fillna("").apply(fix_lines)

CodePudding user response:

    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'find'

It raises this error. I have uploaded the data to Baidu Cloud; could someone please take a look?
https://pan.baidu.com/s/1ScKuEfyhFmILpp8FDQ9KDQ
Extraction code: xfou
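
A plausible cause of the AttributeError above is that the function is still being applied to `read_table[[73]]` with `axis=1`. In that case pandas passes each row to the function as a Series, which has no string methods such as `find` or `split`. The sketch below uses a small made-up DataFrame and a simplified `fix_lines` to show the difference; it is an illustration, not the asker's real data.

```python
import pandas as pd

def fix_lines(x):
    # Expects a plain string: collapse all whitespace runs into commas.
    return ",".join(x.split())

# Hypothetical stand-in for the real table; the column label 73 mirrors the question.
df = pd.DataFrame({73: ["a  b\nc", None]})

# DataFrame.apply(axis=1) hands each ROW over as a Series, so x.split fails.
try:
    df[[73]].apply(fix_lines, axis=1)
    row_error = None
except AttributeError as exc:
    row_error = str(exc)

# Series.apply hands over each CELL value, which is a string after fillna("").
cleaned = df[73].fillna("").apply(fix_lines)

print(row_error)
print(cleaned.tolist())
```

Selecting with single brackets (`df[73]`) gives a Series, while double brackets (`df[[73]]`) give a one-column DataFrame, and that difference decides what the applied function receives.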

10|000010|B|可口可乐330ml4 |Coke 330ml4 |组(Pcs) |6| 330.00|ml(毫升) |4| |G|M|0|0|N|6| |20200409|C|N| | | |正品 |上海 | | |.00|9999999.99|.00|201107| | |0| |.00|0| |315|S|4957|YVON|6|可口可乐330ml4 |Y| |可口可乐330ml4 |1|0|360|D | | |585958|67|P |Y|Y|Y|N|N|N|N| | |N|N|N|6|1| P|99| 7 COCACOLA COCACOLA 可口可乐可口可乐L2 INV99901L20000 10000810 P 99 0 1030307010000000000

Above is a sample of the data.
----------------------------------------
After splitting this data on |, I need four columns: 1, 2, 4 and 73. From column 73 I only need the four characters 可口可乐 (Coca-Cola), that is, the four characters immediately before L2.
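
The requirement above can be sketched in plain Python as follows. The function name `pick_fields` and its parameters are hypothetical. The default indexes are 0-based and mirror the `[[0, 1, 3, 73]]` selection in the question. The sketch assumes the wanted text is the four non-space characters before the first "L2" in the last selected field. The sample line is a shortened, made-up version of the record, so the call passes its own indexes.

```python
import re

def pick_fields(line, indexes=(0, 1, 3, 73), marker="L2", width=4):
    """Split a '|'-delimited line, keep the chosen fields, trim the last one."""
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    picked = [fields[i] for i in indexes]
    # In the last selected field keep only the `width` non-space characters
    # that sit directly before the first occurrence of `marker`.
    pattern = r"(\S{" + str(width) + r"})" + re.escape(marker)
    match = re.search(pattern, picked[-1])
    if match:
        picked[-1] = match.group(1)
    return picked

# Shortened, hypothetical record with only six fields.
sample = "10|000010|B|可口可乐330ml4 |x| 7 COCACOLA COCACOLA 可口可乐可口可乐L2 INV99901L20000"
print(pick_fields(sample, indexes=(0, 1, 3, 5)))
```

If no "L2" is found, the last field is returned stripped but otherwise unchanged, which makes unexpected rows easy to spot in the output.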

CodePudding user response:

import pandas as pd

def fix_lines(x):
    c = x.replace("\n", "")
    while "  " in c:
        c = c.replace("  ", " ")