python - how to create a more compact group for dictionary

Hi, this is part of my code for a biology project:

import pandas as pd
from scipy import stats

# choosing and loading the file:
df = pd.read_csv('Dafniyot_Data.csv', delimiter=',')
#grouping data by C/I groups:
CII = df[df['group'].str.contains('CII')]
CCI = df[df['group'].str.contains('CCI')]
CCC = df[df['group'].str.contains('CCC')]
III = df[df['group'].str.contains('III')]
CIC = df[df['group'].str.contains('CIC')]
ICC = df[df['group'].str.contains('ICC')]
IIC = df[df['group'].str.contains('IIC')]
ICI = df[df['group'].str.contains('ICI')]
#creating a dictionary of the groups:
dict = {'CII':CII, 'CCI':CCI, 'CCC':CCC,'III':III,'CIC':CIC,'ICC':ICC,'IIC':IIC,'ICI':ICI}

#T test
#FERTUNITY
#using ttest for checking FERTUNITY - grandmaternal(F0)
t_F0a = stats.ttest_ind(CCC['N_offspring'],ICC['N_offspring'],nan_policy='omit')
t_F0b = stats.ttest_ind(CCI['N_offspring'],ICI['N_offspring'],nan_policy='omit')
t_F0c = stats.ttest_ind(IIC['N_offspring'],CIC['N_offspring'],nan_policy='omit')
t_F0d = stats.ttest_ind(CCI['N_offspring'],III['N_offspring'],nan_policy='omit')
t_F0 = {'FERTUNITY - grandmaternal(F0)':[t_F0a,t_F0b,t_F0c,t_F0d]}

I need to repeat the t-test part 6 more times, changing either the groups (CCC, etc.) or the column of the df ('N_offspring', 'survival'), which takes a lot of lines in the project.

I'd like to still end up with a dictionary like this one for each set of tests:

t_F0 = {'FERTUNITY - grandmaternal(F0)':[t_F0a,t_F0b,t_F0c,t_F0d]}

because it's very useful to me later, but built in a less repetitive way, with fewer lines.

Answer:

Use itertools.product to generate all the keys, and a dict comprehension to generate the values:

from itertools import product
keys = [''.join(items) for items in product("CI", repeat=3)]
the_dict = { key: df[df['group'].str.contains(key)] for key in keys }
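To see what the comprehension produces, here is a small sketch on a made-up DataFrame (the values are invented, not the asker's data). It also shows `dict(tuple(df.groupby("group")))`, a swapped-in alternative that is not part of this answer: groupby matches exact labels only, whereas `str.contains` also matches longer labels that contain the key (e.g. a hypothetical 'CCI_rep2'), so on exactly-labelled data both give the same sub-DataFrames.

```python
from itertools import product

import pandas as pd

# Hypothetical toy data: two rows per C/I group, values invented
df = pd.DataFrame({
    "group": ["CCC", "ICC", "CCI", "ICI", "CIC", "IIC", "CII", "III"] * 2,
    "N_offspring": [12, 9, 11, 8, 10, 7, 13, 6,
                    14, 10, 12, 9, 11, 8, 12, 5],
})

# The answer's approach: substring match for each of the 8 generated keys
keys = ["".join(items) for items in product("CI", repeat=3)]
the_dict = {key: df[df["group"].str.contains(key)] for key in keys}

# Alternative (not from the answer): one sub-DataFrame per exact group value
grouped = dict(tuple(df.groupby("group")))

print(keys)  # ['CCC', 'CCI', 'CIC', 'CII', 'ICC', 'ICI', 'IIC', 'III']
print(the_dict["CCC"]["N_offspring"].tolist())  # [12, 14]
```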

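Similarly, you can generate the last two letters that each compared pair has in common, and build the list of tests from them. This pairs every C-prefixed group with the I-prefixed group sharing its last two letters (CCC/ICC, CCI/ICI, CIC/IIC, CII/III). Two differences from the question's code: its fourth test compares CCI with III, which looks like a typo for CII, and its third test puts IIC before CIC, so that t statistic comes out here with the opposite sign (the p-value is unchanged):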

half_keys = [''.join(items) for items in product("CI", repeat=2)]
t_F0 = {
    'FERTUNITY - grandmaternal(F0)': [
        stats.ttest_ind(
            the_dict[f"C{half_key}"]['N_offspring'],
            the_dict[f"I{half_key}"]['N_offspring'],
            nan_policy='omit'
        ) for half_key in half_keys
    ],
}
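Since the question needs the same pattern six more times with a different column or different pairs, one option is to keep the comparisons as data and loop over them. This is a sketch under assumptions: `run_ttests`, the `tests` mapping, the 'SURVIVAL' label and the `survival` column name are hypothetical, and the DataFrame is toy data so the block runs on its own.

```python
from itertools import product

import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for Dafniyot_Data.csv; the 'survival'
# column name and all values are assumptions for illustration only.
labels = ["CCC", "ICC", "CCI", "ICI", "CIC", "IIC", "CII", "III"]
df = pd.DataFrame({
    "group": labels * 3,
    "N_offspring": [12, 9, 11, 8, 10, 7, 13, 6,
                    14, 10, 12, 9, 11, 8, 12, 5,
                    13, 8, 10, 7, 12, 9, 14, 7],
    "survival": [0.9, 0.7, float("nan"), 0.6, 0.8, 0.5, 0.9, 0.4,
                 0.8, 0.6, 0.85, 0.65, 0.75, 0.55, 0.95, 0.45,
                 0.95, 0.75, 0.8, 0.7, 0.7, 0.6, 0.85, 0.5],
})

# One sub-DataFrame per C/I combination, as in the answer above
keys = ["".join(items) for items in product("CI", repeat=3)]
groups = {key: df[df["group"].str.contains(key)] for key in keys}


def run_ttests(groups, column, pairs):
    """Hypothetical helper: one independent t-test on `column` per (a, b) pair."""
    return [
        stats.ttest_ind(groups[a][column], groups[b][column], nan_policy="omit")
        for a, b in pairs
    ]


# Grandmaternal (F0) comparisons: same last two letters, first letter C vs I
half_keys = ["".join(items) for items in product("CI", repeat=2)]
f0_pairs = [(f"C{half}", f"I{half}") for half in half_keys]

# Each entry maps a dictionary label to (column to test, list of group pairs)
tests = {
    "FERTUNITY - grandmaternal(F0)": ("N_offspring", f0_pairs),
    "SURVIVAL - grandmaternal(F0)": ("survival", f0_pairs),  # hypothetical label
}
results = {
    label: run_ttests(groups, column, pairs)
    for label, (column, pairs) in tests.items()
}
```

Keeping the pairs as plain data also makes the order of each comparison explicit: if the question's original sign matters, a pair such as ('IIC', 'CIC') can be listed directly instead of being generated.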

As an aside, you should not use dict as a variable name: it shadows the built-in dict type, so any later call such as dict(a=1) in the same scope fails with a TypeError.
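A minimal illustration of that shadowing problem; the dictionary contents are placeholders, and a name such as `groups` avoids the issue entirely.

```python
# Rebinding the name `dict` hides the built-in type in this (module) scope
dict = {"CCC": [1, 2], "ICC": [3, 4]}

try:
    dict(a=1)  # calls the dictionary object, not the type
except TypeError as err:
    message = str(err)  # "'dict' object is not callable"

del dict  # remove the shadowing name; lookup falls back to the built-in
restored = dict(a=1)  # works again: {'a': 1}
```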

As a second aside, this deals with the literal question of how to DRY up creating a dictionary. However, do consider what Chris said in comments; this may be an XY problem.