Is there a faster way to build a DataFrame from .describe() values?

Time:10-31

I have a DataFrame with 20000 rows and 1600 columns. Each row represents an observed object and each column is a date. Example:

import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 3, 4, 5], [6, np.nan, np.nan, np.nan, 10], [np.nan, np.nan, 14, 13, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]),
                   columns=['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05'],
                   index=[1, 2, 3, 4, 5])

I want to get a new DataFrame that includes the statistics from .describe() plus a few more: the first value, the last value, and the density (number of observations / number of dates since the first observation).

I've made this:

df = pd.DataFrame()
for i in df2.index:
    # one column of describe() statistics per observed object
    df[i] = df2.T[i].describe()

But it is very slow, so I am looking for a faster solution, and also for help computing the extra columns.

The expected result is:

    count   mean  std         min   max   first_v  last_v  density  
1   5       3     1.581139    1     5     1        5       1
2   2       8     2.828427    6     10    6        10      0.4
3   3       14    1.000000    13    15    14       15      1
4   5       18    1.581139    16    20    16       20      1
5   5       23    1.581139    21    25    21       25      1
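The three extra columns (first_v, last_v, density) can be computed for all rows at once instead of row by row. This is a minimal sketch, not the asker's or answerer's code, and it assumes the columns are consecutive daily dates in order (so "dates since first observation" is the number of columns from the first non-missing value through the last column) and that every row has at least one observation.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array([[1, 2, 3, 4, 5],
              [6, np.nan, np.nan, np.nan, 10],
              [np.nan, np.nan, 14, 13, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25]]),
    columns=['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05'],
    index=[1, 2, 3, 4, 5],
)

# First non-missing value per row: back-fill along the columns, take column 0.
first_v = df2.bfill(axis=1).iloc[:, 0]
# Last non-missing value per row: forward-fill along the columns, take the last column.
last_v = df2.ffill(axis=1).iloc[:, -1]

# 0-based column position of the first observation in each row.
# Assumption: no all-NaN rows (argmax would return 0 for such a row).
first_pos = df2.notna().to_numpy().argmax(axis=1)
# Assumption: one column per consecutive date, so this counts the dates
# from the first observation through the last column, inclusive.
dates_since_first = df2.shape[1] - first_pos
density = df2.count(axis=1) / dates_since_first

extra = pd.DataFrame({'first_v': first_v, 'last_v': last_v, 'density': density})
print(extra)
```

If the real column labels are actual dates with gaps, the count of dates would need to come from the labels themselves (e.g. after converting them with pd.to_datetime) rather than from column positions.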

Answer:

Instead of your loop, describe the transposed frame in a single call:

df = df2.T.describe()  # same layout as the loop: one column per observed object
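The one-call version returns one column per object, while the expected result has one row per object and no quartile columns. A small sketch, assuming the expected layout is what's wanted, transposes back and keeps only the statistics shown in the question:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array([[1, 2, 3, 4, 5],
              [6, np.nan, np.nan, np.nan, 10],
              [np.nan, np.nan, 14, 13, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25]]),
    columns=['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05'],
    index=[1, 2, 3, 4, 5],
)

# Describe every object in one call (objects become columns after .T),
# then transpose back so there is one row per object.
summary = df2.T.describe().T

# Keep only the statistics that appear in the expected result (drop 25%/50%/75%).
summary = summary[['count', 'mean', 'std', 'min', 'max']]
print(summary)
```

The original loop re-transposes the whole frame and calls describe() once per row (20000 times); here each of those happens once. Note that describe() skips NaN values, so count is the number of actual observations per object.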