Home > Net >  How to spilt each column into more columns in a given dataframe
How to spilt each column into more columns in a given dataframe

Time:11-19

I have over 100 columns of the week and for each column of the week, I want to proportion it into days and assign row-specific values to each row over 7 new columns. Like this

enter image description here

I am new to python, I know I need a while loops and for loops but now sure how to go about doing this. Can anyone help?

Based on previous advice I had from this forum, the below work for Week1, can someone advise me on how to loop each week for weeks 2, 3, 4 to nth week?

import pandas as pd

df = pd.DataFrame({"Week1": [9, 30, 35, 65],"Week2": [20, 10, 25, 55],"Week3": [19, 35, 40, 15],"Week4": [7, 10, 70, 105]})

# define which columns need to be created
# this will be the range between 1 and the maximum of the Total Number column
columns_to_fill = ["col"   str(i) for i in range(1, 8)]
# columns_to_fill = [ col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, .... , col28 ]


# now, go through each row of your dataframe
for indx, row in df.iterrows():
    # and for each column in the new columns to be filled
    # check if the number is smaller or equal than the row's Total Number
    # if it is smaller, fill the column with 1
    # else fill the column with 0
    
    for number, column in enumerate(columns_to_fill):
        if number   1 <= row["Week1"]:
            df.loc[indx, column] = 1
        else:
            df.loc[indx, column] = 0
    

    # now check if there is a remainder
    remainder = row["Week1"] - 7
    
    # while remainder is greater than 0
    # we need to continue adding  1 to the columns
    while remainder > 0:
        for number, column in enumerate(columns_to_fill):
            if number   1 <= remainder:
                df.loc[indx, column]  = 1
            else:
                continue
        # update remainder
        remainder = remainder - 7

CodePudding user response:

Here is a vectorized option, first repeat each row 7 times (number of days a week), and add an extra index level with set_index being the day number.

_df = (
    df.loc[df.index.repeat(7)]
      .set_index(np.array(list(range(1,8))*len(df)), append=True)
)
print(_df.head(10))
#      Week1  Week2  Week3  Week4
# 0 1      9     20     19      7
#   2      9     20     19      7
#   3      9     20     19      7
#   4      9     20     19      7
#   5      9     20     19      7
#   6      9     20     19      7
#   7      9     20     19      7
# 1 1     30     10     35     10
#   2     30     10     35     10
#   3     30     10     35     10

now calculate the result of the entire division with //7, then add the rest where needed using the modulo % that you can compare with the extra index level created as it is the day number.

# entire division
res = _df//7

# add the rest where needed
res  = (_df%7 >= _df.index.get_level_values(1).to_numpy()[:, None]).astype(int)
print(res)
#      Week1  Week2  Week3  Week4
# 0 1      2      3      3      1
#   2      2      3      3      1
#   3      1      3      3      1
#   4      1      3      3      1
#   5      1      3      3      1
#   6      1      3      2      1
#   7      1      2      2      1
# 1 1      5      2      5      2
#   2      5      2      5      2
#   3      4      2      5      2

Finally, reshape and rename columns if wanted.

# reshape the result
res = res.unstack()

# rename the columns if you don't want multiindex
res.columns = [f'{w}_col{i}' for w, i in res.columns]

print(res)
#    Week1_col1  Week1_col2  Week1_col3  Week1_col4  Week1_col5  Week1_col6  \
# 0           2           2           1           1           1           1   
# 1           5           5           4           4           4           4   
# 2           5           5           5           5           5           5   
# 3          10          10           9           9           9           9   

#    Week1_col7  Week2_col1  Week2_col2  Week2_col3  Week2_col4  Week2_col5  \
# 0           1           3           3           3           3           3   
# 1           4           2           2           2           1           1   
# ...

and you can still join to your original dataframe

res = df.join(res)

CodePudding user response:

This iterates over the original data frame. The data frame rows can be of any length and have any number of header elements. The sub-header can have any number of elements (1D).

#Import
import numpy as np
import pandas as pd

#Example data frame and sub-header.
df = pd.DataFrame({"Week1": [9, 30, 35, 65],"Week2": [20, 10, 25, 55],"Week3": [19, 35, 40, 15],"Week4": [7, 10, 70, 105]})
subHeader = ['day1','day2','day3','day4','day5','day6','day7']

#Sort data frame and sub header.
df = df.reindex(sorted(df.columns), axis=1)
subHeader.sort()

#Extract relevant variables.
cols = df.shape[1]
rows = df.shape[0]
subHeadLen = len(subHeader)
mainHeader = list(df.columns)
meanHeadLen = len(mainHeader)

#MultiIndex main header with sub-header.
header = pd.MultiIndex.from_product([mainHeader,subHeader], names=['Week','Day'])

#Hold vals in temporary matrix.
mat = np.zeros((rows,meanHeadLen*subHeadLen))

#Iterate over data frame weeks. For every value in each row distribute over matrix indices by incrementing elements daily.
for col in range(cols):
    for val in range(rows):
        while df.iat[val,col] > 0:
            for subVal in range(subHeadLen):
                if df.iat[val,col] > 0:
                    mat[val][col*subHeadLen   subVal] = mat[val][col*subHeadLen   subVal]   1
                    df.iat[val,col] = df.iat[val,col] - 1

#Final data frame.
df2 = pd.DataFrame(mat,columns=header)
print(df2)
  • Related