Populating multiple variables in each iteration with pandas

I have several variables that I want to populate over a number of iterations, where each variable needs a different expression to extract its value. A rough equivalent of what I am trying to do is the following for loop.

import pandas as pd

pairs = {('Ams', 'Rot') : 10, ('Del', 'Utr') : 12, ('Ams', 'Utr') : 14, ('Del', 'Rot') : 16}

var_1 = []
var_2 = []
var_3 = []
var_4 = []

for i in range(3):
    for (j, k) in pairs:
        var_1.append(i)
        var_2.append(j)
        var_3.append(k)
        var_4.append(pairs[(j, k)])

df = {'Var_1' : var_1, 'Var_2' : var_2, 'Var_3' : var_3, 'Var_4' : var_4}
df = pd.DataFrame(df)
print(df)

My desired output:

    Var_1 Var_2 Var_3  Var_4
0       0   Ams   Rot     10
1       0   Del   Utr     12
2       0   Ams   Utr     14
3       0   Del   Rot     16
4       1   Ams   Rot     10
5       1   Del   Utr     12
6       1   Ams   Utr     14
7       1   Del   Rot     16
8       2   Ams   Rot     10
9       2   Del   Utr     12
10      2   Ams   Utr     14
11      2   Del   Rot     16

However, I am curious whether there is a more efficient way of doing this, in particular with pandas. In the end I would like to create a pandas DataFrame from the dictionary above.

CodePudding user response：

You could just use a dict comprehension to set that up easily:

names = ['var_1', 'var_2', 'var_3', 'var_4']
values = {n: range(3) for n in names}
df = pd.DataFrame(values)

   var_1  var_2  var_3  var_4
0      0      0      0      0
1      1      1      1      1
2      2      2      2      2

But creating a DataFrame with identical columns is a bit strange, since it doesn't carry much information.

CodePudding user response：

Try:

df = (pd.DataFrame({n: pd.Series(pairs) for n in range(3)})
        .stack()
        .rename_axis(["Var_2", "Var_3", "Var_1"])
        .rename("Var_4")
        .reset_index()
        .sort_values("Var_1", kind="stable", ignore_index=True)  # stable sort keeps the pair order within each Var_1
        .sort_index(axis=1)
        )

>>> df

    Var_1 Var_2 Var_3  Var_4
0       0   Ams   Rot     10
1       0   Del   Utr     12
2       0   Ams   Utr     14
3       0   Del   Rot     16
4       1   Ams   Rot     10
5       1   Del   Utr     12
6       1   Ams   Utr     14
7       1   Del   Rot     16
8       2   Ams   Rot     10
9       2   Del   Utr     12
10      2   Ams   Utr     14
11      2   Del   Rot     16
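
To see why the chain above works, it may help to look at the two intermediate shapes. The following is a minimal sketch, not part of the original answer; the names `wide` and `tall` are made up for illustration.

```python
import pandas as pd

pairs = {('Ams', 'Rot'): 10, ('Del', 'Utr'): 12, ('Ams', 'Utr'): 14, ('Del', 'Rot'): 16}

# One identical column per iteration: rows are the (j, k) pairs, columns are 0, 1, 2.
wide = pd.DataFrame({n: pd.Series(pairs) for n in range(3)})

# stack() moves the column labels into a third index level,
# so there is one value per (j, k, i) combination.
tall = wide.stack()

print(wide.shape)          # (4, 3)
print(tall.index.nlevels)  # 3
print(len(tall))           # 12
```

The remaining steps in the answer only name the three index levels, turn them into columns, and reorder rows and columns.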

CodePudding user response：

You can use a solution based on indexes:

Since you have a dict, create a DataFrame whose data are the dict's values and whose index is built from the keys. In your case the keys are tuples, so the index will be a pd.MultiIndex. At this point you have Var_2, Var_3 and Var_4.

The tricky part is to generate Var_1 from this DataFrame. Repeat the index 3 times and reindex the DataFrame, so that every row is duplicated: you get 3 x (Ams, Rot, 10), 3 x (Del, Utr, 12) and so on. If you now group these duplicated rows together, you can use cumcount to create an ID (0 -> first instance, 1 -> second instance, ...). Finally, set that ID as the index (Var_1), sort the DataFrame by it, and reset the index to get your expected result.

# Part 1: create Var_2, Var_3 and Var_4
mi = pd.MultiIndex.from_tuples(pairs.keys(), names=['Var_2', 'Var_3'])
df = pd.DataFrame({'Var_4': pairs.values()}, index=mi).reset_index()

# Part 2: create Var_1
df = df.reindex(df.index.repeat(3))
df = df.set_index(df.groupby(df.columns.tolist()).cumcount().rename('Var_1')) \
       .sort_index().reset_index()

Output:

>>> df
    Var_1 Var_2 Var_3  Var_4
0       0   Ams   Rot     10
1       0   Del   Utr     12
2       0   Ams   Utr     14
3       0   Del   Rot     16
4       1   Ams   Rot     10
5       1   Del   Utr     12
6       1   Ams   Utr     14
7       1   Del   Rot     16
8       2   Ams   Rot     10
9       2   Del   Utr     12
10      2   Ams   Utr     14
11      2   Del   Rot     16
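
The cumcount step is the least obvious part, so here is a small sketch of it in isolation. It uses a reduced two-row frame instead of the full data; `base`, `repeated` and `counter` are illustrative names only.

```python
import pandas as pd

# A tiny frame with two distinct rows.
base = pd.DataFrame({'Var_2': ['Ams', 'Del'], 'Var_4': [10, 12]})

# Repeat every row three times (index becomes 0, 0, 0, 1, 1, 1).
repeated = base.reindex(base.index.repeat(3))

# Group identical rows and number the members of each group: 0, 1, 2.
counter = repeated.groupby(repeated.columns.tolist()).cumcount()

print(counter.tolist())  # [0, 1, 2, 0, 1, 2]
```

That counter is what becomes Var_1 once it is set as the index and sorted.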

CodePudding user response：

Try:

df = pd.concat([pd.Series(pairs, name='Var_4').to_frame()]*3, keys=range(3),
               names=['Var_1', 'Var_2', 'Var_3']).reset_index()

Output:

    Var_1 Var_2 Var_3  Var_4
0       0   Ams   Rot     10
1       0   Del   Utr     12
2       0   Ams   Utr     14
3       0   Del   Rot     16
4       1   Ams   Rot     10
5       1   Del   Utr     12
6       1   Ams   Utr     14
7       1   Del   Rot     16
8       2   Ams   Rot     10
9       2   Del   Utr     12
10      2   Ams   Utr     14
11      2   Del   Rot     16
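
For readers unfamiliar with the keys argument, this sketch shows what the concat call produces before reset_index: keys adds an outer index level holding 0, 1, 2, and names labels all three levels. The variable names `frame` and `stacked` are assumptions for illustration.

```python
import pandas as pd

pairs = {('Ams', 'Rot'): 10, ('Del', 'Utr'): 12, ('Ams', 'Utr'): 14, ('Del', 'Rot'): 16}

# A single-column frame indexed by the (j, k) pairs.
frame = pd.Series(pairs, name='Var_4').to_frame()

# Concatenate three copies; each copy is tagged with its key in a new outer level.
stacked = pd.concat([frame] * 3, keys=range(3), names=['Var_1', 'Var_2', 'Var_3'])

print(list(stacked.index.names))
print(stacked.index.get_level_values('Var_1').tolist())
```

Because the copies are concatenated in key order, no sorting is needed afterwards.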

CodePudding user response：

We could also create a DataFrame from pairs (this gives a DataFrame with one row and MultiIndex columns) and repeat that row 3 times with Index.repeat + reindex. Then reset_index + rename_axis + reset_index gets "Var_1" correctly named and positioned, and melt delivers the data in the desired shape. Finally, sort_values + reset_index yields a DataFrame identical to the one you built.

tmp = pd.DataFrame(pairs, index=[0])
out = (tmp.reindex(tmp.index.repeat(3))
       .reset_index(drop=True)
       .rename_axis('Var_1')
       .reset_index()
       .melt(id_vars=['Var_1'], var_name=['Var_2', 'Var_3'], value_name='Var_4')
       .sort_values(by='Var_1')
       .reset_index(drop=True))

Output:

    Var_1 Var_2 Var_3  Var_4
0       0   Ams   Rot     10
1       0   Del   Utr     12
2       0   Ams   Utr     14
3       0   Del   Rot     16
4       1   Ams   Rot     10
5       1   Del   Utr     12
6       1   Ams   Utr     14
7       1   Del   Rot     16
8       2   Ams   Rot     10
9       2   Del   Utr     12
10      2   Ams   Utr     14
11      2   Del   Rot     16

Or you could write a list comprehension and build the DataFrame from a single list. This is very similar to what you already have; the only difference is that it builds one list of rows instead of 4 separate lists.

tmp = [[i, j, k, v] for i in range(3) for (j, k), v in pairs.items()]
df = pd.DataFrame(tmp, columns=['Var_1', 'Var_2', 'Var_3', 'Var_4'])
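
As a quick sanity check, the list-comprehension version can be compared with the loop from the question. This is only a sketch; `loop_df` and `comp_df` are illustrative names, and the loop is condensed into a single dict of lists.

```python
import pandas as pd

pairs = {('Ams', 'Rot'): 10, ('Del', 'Utr'): 12, ('Ams', 'Utr'): 14, ('Del', 'Rot'): 16}

# Loop version from the question, collecting four separate lists.
cols = {'Var_1': [], 'Var_2': [], 'Var_3': [], 'Var_4': []}
for i in range(3):
    for (j, k) in pairs:
        cols['Var_1'].append(i)
        cols['Var_2'].append(j)
        cols['Var_3'].append(k)
        cols['Var_4'].append(pairs[(j, k)])
loop_df = pd.DataFrame(cols)

# List-comprehension version, collecting one list of rows.
rows = [[i, j, k, v] for i in range(3) for (j, k), v in pairs.items()]
comp_df = pd.DataFrame(rows, columns=['Var_1', 'Var_2', 'Var_3', 'Var_4'])

# True when both frames have the same shape, labels, dtypes and values.
print(loop_df.equals(comp_df))
```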