Find t confidence interval across rows in dataframe

This is an example dataframe; my actual dataframe has hundreds more rows.

nums_1  nums_2  nums_3
1       1       8
2       1       7
3       5       9

Is there a method that will calculate the 95% confidence interval across each row, and that would also work for a large dataframe?

import pandas as pd
df = pd.DataFrame({'nums_1': [1, 2, 3], 'nums_2': [1, 1, 5], 'nums_3': [8, 7, 9]})

CodePudding user response：

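You can use stats.norm.interval to compute an interval at the 95% level for each row from the numpy.mean and numpy.std of its values, as below. Note that this uses the normal distribution and the standard deviation of the row values (numpy.std, ddof=0), not the t distribution and the standard error of the mean: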

import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'nums_1': [1, 2, 3], 'nums_2': [1, 1, 5], 'nums_3': [8, 7, 9]})

# 95% normal-based interval per row: loc is the row mean, scale is the row std (ddof=0)
df['95_interval'] = df.apply(
    lambda row: stats.norm.interval(0.95, loc=np.mean(row), scale=np.std(row)),
    axis=1,
)

Output:

>>> df
   nums_1  nums_2  nums_3  95_interval
0       1       1       8  (-3.134217846965163, 9.80088451363183)
1       2       1       7  (-1.8109239490159825, 8.477590615682649)
2       3       5       9  (0.7776575196232134, 10.55567581371012)

CodePudding user response：

You can use:

import numpy as np
from scipy import stats

df.apply(lambda x: stats.t.interval(0.95, len(x)-1, loc=np.mean(x), scale=stats.sem(x)), axis=1)

You will obtain essentially the same results by using the following:

import statsmodels.stats.api as sms

df.apply(lambda x: sms.DescrStatsW(x).tconfint_mean(), axis=1)

Both approaches return the same kind of result: a Series holding one (lower, upper) tuple per row.
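
Because the result is a column of tuples, it can be handy to split it into two numeric columns. The following is only a sketch of one way to do that, reusing the question's example data; the names ci, bounds, ci_lower and ci_upper are made up for illustration and are not part of either answer.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'nums_1': [1, 2, 3], 'nums_2': [1, 1, 5], 'nums_3': [8, 7, 9]})

# Series of (lower, upper) tuples, one per row, from the t-based call shown earlier
ci = df.apply(
    lambda x: stats.t.interval(0.95, len(x) - 1, loc=np.mean(x), scale=stats.sem(x)),
    axis=1,
)

# Split the tuples into two float columns (column names are hypothetical)
bounds = pd.DataFrame(ci.tolist(), index=df.index, columns=['ci_lower', 'ci_upper'])
print(bounds)
```

The bounds frame shares the index of df, so it can be joined back with df.join(bounds) if the limits are needed next to the original values.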

The answer is described here: "Compute a confidence interval from sample data". What is important to understand is that it works correctly only if the values in each row (each sample) are drawn independently from a normal distribution with an unknown standard deviation.

When it comes to large dataframes, an easy option is to use swifter. However, it only speeds up the calculation by about a factor of two. Nevertheless, it is worth trying: https://towardsdatascience.com/do-you-use-apply-in-pandas-there-is-a-600x-faster-way-d2497facfa66

import statsmodels.stats.api as sms
import swifter

df.swifter.apply(lambda x: sms.DescrStatsW(x).tconfint_mean(), axis=1)
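
For large dataframes, another option (not taken from the answers above, so treat it as a sketch) is to skip the row-wise apply and compute the same t interval with NumPy array operations. It assumes every column is numeric, there are no missing values, and every row has the same number of observations; the names values, half_width and result are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'nums_1': [1, 2, 3], 'nums_2': [1, 1, 5], 'nums_3': [8, 7, 9]})

values = df.to_numpy(dtype=float)
n = values.shape[1]                             # observations per row

mean = values.mean(axis=1)                      # row means
sem = values.std(axis=1, ddof=1) / np.sqrt(n)   # standard error of each row mean
t_crit = stats.t.ppf(0.975, n - 1)              # two-sided 95% critical value, n - 1 degrees of freedom
half_width = t_crit * sem

# Lower and upper limits as two float columns (hypothetical column names)
result = pd.DataFrame(
    {'ci_lower': mean - half_width, 'ci_upper': mean + half_width},
    index=df.index,
)
print(result)
```

This computes mean ± t * sem for all rows at once, which is the same quantity stats.t.interval returns when given loc=mean and scale=sem, so the limits should agree with the apply-based version up to floating-point error.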