The python for loop 1 million data processing

Time:09-22

How can I optimize this for loop? Once the index length exceeds 100,000, processing becomes very slow. How should I handle this situation? Advice from experienced users appreciated.
    for ind in range(len(index)):
        temp = df[df.index == index[ind]]
        try:
            dsflux.append(temp.sort_values(by='vn', axis=0, ascending=False)[0:2]['dsflux'].values.mean())
            TEM.append(temp.sort_values(by='vn', axis=0, ascending=False)[0:4]['TEM'].values.mean())
            HUM.append(temp.sort_values(by='vn', axis=0, ascending=False)[0:4]['HUM'].values.mean())
            WIND.append(temp.sort_values(by='vn', axis=0, ascending=False)[0:4]['WIND'].values.mean())
        except:
            dsflux.append(np.nan)
            TEM.append(np.nan)
            HUM.append(np.nan)
            WIND.append(np.nan)
        try:
            ALBEDO.append(temp.sort_values(by='vn', axis=0, ascending=False)[0:2]['ALBEDO'].values.mean())
        except:
            ALBEDO.append(np.nan)
        try:
            WTYPE.append(temp.sort_values(by='vn', axis=0, ascending=False)[0:1]['WTYPE'].values.mean())
        except:
            WTYPE.append(np.nan)
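The filter `df[df.index == index[ind]]` scans the entire frame once per index value, which makes the whole loop roughly O(n²). A sketch of the same "mean of the top-N rows by 'vn' per index value" done with one sort and a `groupby` instead (the column names follow the post; the toy data here is invented):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the post's df: repeated index labels,
# a 'vn' sort key, and two of the measurement columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "vn": rng.random(12),
        "dsflux": rng.random(12),
        "TEM": rng.random(12),
    },
    index=np.repeat(["a", "b", "c"], 4),
)

# Sort once by 'vn' descending, then take the top rows per index label
# with groupby().head() and average them: one pass over the data
# instead of re-filtering df for every group.
top2 = df.sort_values("vn", ascending=False).groupby(level=0).head(2)
dsflux_mean = top2.groupby(level=0)["dsflux"].mean()

top4 = df.sort_values("vn", ascending=False).groupby(level=0).head(4)
tem_mean = top4.groupby(level=0)["TEM"].mean()

out = pd.DataFrame({"dsflux": dsflux_mean, "TEM": tem_mean})
print(out)
```

Because the sort and the grouping each happen once, the cost stays near O(n log n) regardless of how many distinct index values there are.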

CodePudding user response:

First, filter out the elements you don't need. Removing even a few rows up front means less data for the loop to process.
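Concretely, "eliminate unwanted elements" means shrinking the frame before the loop ever runs. A minimal sketch, reusing the 0.539 wtype value that is filtered later in the thread (the other values are invented):

```python
import pandas as pd

df = pd.DataFrame(
    {"wtype": [1.0, 0.539, 2.0, 0.539], "tem": [10.0, 11.0, 12.0, 13.0]}
)

# Drop rows you will never aggregate *before* iterating,
# so every subsequent pass touches fewer rows.
df = df[df["wtype"] != 0.539]
print(len(df))
```

The same idea applies to columns: `df[["vn", "dsflux"]]` before the loop avoids carrying data the aggregation never reads.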

CodePudding user response:

Python runs at roughly 1/200 the speed of C++! My advice: learn C.
Tip: get rid of the useless data first.

CodePudding user response:

    ft, radi, tem, vn, hum, wind, albedo, wtype = [], [], [], [], [], [], [], []
    if len(data) > 0:
        for i in range(len(data)):
            ft.append(data[i][0])
            radi.append(data[i][1])
            tem.append(data[i][2])
            vn.append(data[i][3])
            hum.append(data[i][4])
            wind.append(data[i][5])
            albedo.append(data[i][6])
            if data[i][7] == 6:
                if initialize == 'YES':
                    wtype.append(1)
                else:
                    wtype.append(np.nan)
            else:
                wtype.append(wtype_dict[data[i][7]][0])

        df = pd.DataFrame(
            {'dsflux': radi, 'tem': tem, 'vn': vn, 'hum': hum, 'wind': wind, 'albedo': albedo, 'wtype': wtype},
            index=ft)
        df = df.fillna(method='ffill')
        df = df.fillna(method='bfill')
        df = df[~df['wtype'].isin([0.539])]
        index = df.groupby(df.index).mean().index
        dsflux, TEM, HUM, WIND, ALBEDO, WTYPE = [], [], [], [], [], []
        for ind in range(len(index)):
            temp = df[df.index == index[ind]]
            temp_data = temp.sort_values(by='vn', axis=0, ascending=False)
            try:
                dsflux.append(temp_data[0:2]['dsflux'].values.mean())
                TEM.append(temp_data[0:4]['TEM'].values.mean())
                HUM.append(temp_data[0:4]['HUM'].values.mean())
                WIND.append(temp_data[0:4]['WIND'].values.mean())
            except:
                dsflux.append(np.nan)
                TEM.append(np.nan)
                HUM.append(np.nan)
                WIND.append(np.nan)
            try:
                ALBEDO.append(temp_data[0:2]['ALBEDO'].values.mean())
            except:
                ALBEDO.append(np.nan)
            try:
                WTYPE.append(temp_data[0:1]['WTYPE'].values.mean())
            except:
                WTYPE.append(np.nan)
        odc = {'dsflux': dsflux, 'tem': TEM, 'vn': df.groupby(df.index)['vn'].max().values, 'hum': HUM,
               'wind': WIND, 'albedo': ALBEDO, 'wtype': WTYPE}
        outdf = pd.DataFrame(odc, index=index).dropna()
    else:
        outdf = pd.DataFrame(columns=['dsflux', 'tem', 'vn', 'hum', 'wind', 'albedo', 'wtype'])


With these changes it is about 30% faster, but it still feels a bit slow. How should it be optimized further?

CodePudding user response:

Then stop using a for loop; compute on slices in a vectorized way instead.

Or process the data in chunks.
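The chunking suggestion can be sketched as follows: walk the frame in fixed-size slices and combine the partial results, so no single pass holds all intermediate state at once (the chunk size and column name here are illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for a large frame: one million rows.
df = pd.DataFrame({"vn": np.arange(1_000_000, dtype=float)})

# Process the frame in fixed-size slices via iloc, accumulating
# one partial result per chunk instead of materializing everything.
chunk_size = 100_000
parts = []
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    parts.append(chunk["vn"].sum())

total = sum(parts)
print(total)
```

When the data comes from disk rather than memory, the same pattern is available directly via `pd.read_csv(..., chunksize=...)`, which yields one DataFrame per chunk.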