Home > Blockchain >  Adding values from list as new column and set column name to date
Adding values from list as new column and set column name to date

Time:04-26

I have dataframe with weekly data from which I took only unique IDs:

weekID = df["meterID"].unique().tolist()

Lets say it looks like this:

[1,
2,
3,
4,
5,
6,
7,
8,
9,
10]

For every day of this weekly data I took only IDs that fits my conditions, so lets say I have 7 list:

day1

[2,
3]

day2

[3,
4,
5]

day3

[6,
7]

day4

[2,
3,
4,
5,
6,
7,
8]

day5

[8,
9]

day6

[2,
3]

day7

[2,
3]

I want to put this things together into one .csv file in which:

  1. all IDs from weekID are index
  2. every day-list will append as new column with column_name = date So the final output would look lite this:
ID     day1       day2          day3             day5           day6        day7
1                                                                                 
2      YES                                                      YES          YES           
3      YES       YES                                            YES          YES              
4                 YES                                                                
5                 YES                                                                
6                               YES                                                  
7                               YES                                                  
8                                                YES                                 
9                                                YES                                 
10                                                                                 

I have tried:

df = pd.DataFrame()
df["weekID"] = weekID
dfSuspects["day1"] = ID_day1
output: ValueError: Length of values (94) does not match length of index (495)

I would be very gratefull for telling my what I am doing wrong.

CodePudding user response:

Let's start with your arrays:

day1 = [2, 3]
day2 = [3, 4, 5]
day3 = [6, 7]
day4 = [2, 3, 4, 5, 6, 7, 8]
day5 = [8, 9]
day6 = [2, 3]
day7 = [2, 3]

For convenience, let's gather these into a single structure:

>>> local_vars = locals()
>>> days = [local_vars[f'day{i}'] for i in range(1, 8)]
[[2, 3], [3, 4, 5], [6, 7], [2, 3, 4, 5, 6, 7, 8], [8, 9], [2, 3], [2, 3]]

You can get to this stage any other way you like, this is just to set-up an end-to-end solution.

Now, you just need to iterate through each day - create a list for each that would include a value for each of your index values, whether present in the list for that day or not, and then finally wrap it all into a dataframe:

pd.DataFrame(
  {
     f'day{i   1}': [x in day for x in range(10)]
     for i, day in enumerate(days)
  }
)

Final result:

    day1   day2   day3   day4   day5   day6   day7
0  False  False  False  False  False  False  False
1  False  False  False  False  False  False  False
2   True  False  False   True  False   True   True
3   True   True  False   True  False   True   True
4  False   True  False   True  False  False  False
5  False   True  False   True  False  False  False
6  False  False   True   True  False  False  False
7  False  False   True   True  False  False  False
8  False  False  False   True   True  False  False
9  False  False  False  False   True  False  False

If you really want strings replace x in day with 'TRUE' if x in day else '':

   day1  day2  day3  day4  day5  day6  day7
0                                          
1                                          
2  TRUE              TRUE        TRUE  TRUE
3  TRUE  TRUE        TRUE        TRUE  TRUE
4        TRUE        TRUE                  
5        TRUE        TRUE                  
6              TRUE  TRUE                  
7              TRUE  TRUE                  
8                    TRUE  TRUE            
9                          TRUE            

CodePudding user response:

Assume you have the seven list, first find the max index among them

day1 = [2, 3]
day2 = [3, 4, 5]
day3 = [6, 7]
day4 = [2, 3, 4, 5, 6, 7, 8]
day5 = [8, 9]
day6 = [2, 3]
day7 = [2, 3]

days = [globals()[f'day{i 1}'] for i in range(7)]

max_id = max(map(max, days))

Then create an empty dataframe with columns number matches the day count and index range ends at max_id

df = pd.DataFrame(columns=[f'day{i 1}' for i in range(7)], index=range(1, max_id 1))

Then apply on columns to check if index is in day list.

df = df.apply(lambda col: col.index.isin(globals()[col.name]), axis=0)
print(df)

    day1   day2   day3   day4   day5   day6   day7
1  False  False  False  False  False  False  False
2   True  False  False   True  False   True   True
3   True   True  False   True  False   True   True
4  False   True  False   True  False  False  False
5  False   True  False   True  False  False  False
6  False  False   True   True  False  False  False
7  False  False   True   True  False  False  False
8  False  False  False   True   True  False  False
9  False  False  False  False   True  False  False

At last, replace the True and False value according to your needs

df = df.replace({True: 'Yes', False: ''})
print(df)

  day1 day2 day3 day4 day5 day6 day7
1
2  Yes            Yes       Yes  Yes
3  Yes  Yes       Yes       Yes  Yes
4       Yes       Yes
5       Yes       Yes
6            Yes  Yes
7            Yes  Yes
8                 Yes  Yes
9                      Yes

CodePudding user response:

This looks a bit like something .str.get_dummies() could solve:

days = [[2, 3], [3, 4, 5], [6, 7], [2, 3, 4, 5, 6, 7, 8], [8, 9], [2, 3], [2, 3]]

ser = pd.Series(
    ["|".join(map(str, day)) for day in days],
    index=[f"day{i}" for i in range(1, len(days)   1)]
)
result = (
    ser.str.get_dummies().replace({1: "YES", 0: ""}).rename(int, axis=1).T
       .sort_index()
)

Result:

  day1 day2 day3 day4 day5 day6 day7
2  YES            YES       YES  YES
3  YES  YES       YES       YES  YES
4       YES       YES               
5       YES       YES               
6            YES  YES               
7            YES  YES               
8                 YES  YES          
9                      YES          

But since 1 isn't present in any day the respective row is missing. So is this accident or can that really happen?

  • Related