I have dataframe with weekly data from which I took only unique IDs:
weekID = df["meterID"].unique().tolist()
Lets say it looks like this:
[1,
2,
3,
4,
5,
6,
7,
8,
9,
10]
For every day of this weekly data I took only IDs that fits my conditions, so lets say I have 7 list:
day1
[2,
3]
day2
[3,
4,
5]
day3
[6,
7]
day4
[2,
3,
4,
5,
6,
7,
8]
day5
[8,
9]
day6
[2,
3]
day7
[2,
3]
I want to put this things together into one .csv file in which:
- all IDs from weekID are index
- every day-list will append as new column with column_name = date So the final output would look lite this:
ID day1 day2 day3 day5 day6 day7
1
2 YES YES YES
3 YES YES YES YES
4 YES
5 YES
6 YES
7 YES
8 YES
9 YES
10
I have tried:
df = pd.DataFrame()
df["weekID"] = weekID
dfSuspects["day1"] = ID_day1
output: ValueError: Length of values (94) does not match length of index (495)
I would be very gratefull for telling my what I am doing wrong.
CodePudding user response:
Let's start with your arrays:
day1 = [2, 3]
day2 = [3, 4, 5]
day3 = [6, 7]
day4 = [2, 3, 4, 5, 6, 7, 8]
day5 = [8, 9]
day6 = [2, 3]
day7 = [2, 3]
For convenience, let's gather these into a single structure:
>>> local_vars = locals()
>>> days = [local_vars[f'day{i}'] for i in range(1, 8)]
[[2, 3], [3, 4, 5], [6, 7], [2, 3, 4, 5, 6, 7, 8], [8, 9], [2, 3], [2, 3]]
You can get to this stage any other way you like, this is just to set-up an end-to-end solution.
Now, you just need to iterate through each day - create a list for each that would include a value for each of your index values, whether present in the list for that day or not, and then finally wrap it all into a dataframe:
pd.DataFrame(
{
f'day{i 1}': [x in day for x in range(10)]
for i, day in enumerate(days)
}
)
Final result:
day1 day2 day3 day4 day5 day6 day7
0 False False False False False False False
1 False False False False False False False
2 True False False True False True True
3 True True False True False True True
4 False True False True False False False
5 False True False True False False False
6 False False True True False False False
7 False False True True False False False
8 False False False True True False False
9 False False False False True False False
If you really want strings replace x in day
with 'TRUE' if x in day else ''
:
day1 day2 day3 day4 day5 day6 day7
0
1
2 TRUE TRUE TRUE TRUE
3 TRUE TRUE TRUE TRUE TRUE
4 TRUE TRUE
5 TRUE TRUE
6 TRUE TRUE
7 TRUE TRUE
8 TRUE TRUE
9 TRUE
CodePudding user response:
Assume you have the seven list, first find the max index among them
day1 = [2, 3]
day2 = [3, 4, 5]
day3 = [6, 7]
day4 = [2, 3, 4, 5, 6, 7, 8]
day5 = [8, 9]
day6 = [2, 3]
day7 = [2, 3]
days = [globals()[f'day{i 1}'] for i in range(7)]
max_id = max(map(max, days))
Then create an empty dataframe with columns number matches the day count and index range ends at max_id
df = pd.DataFrame(columns=[f'day{i 1}' for i in range(7)], index=range(1, max_id 1))
Then apply
on columns to check if index is in day list.
df = df.apply(lambda col: col.index.isin(globals()[col.name]), axis=0)
print(df)
day1 day2 day3 day4 day5 day6 day7
1 False False False False False False False
2 True False False True False True True
3 True True False True False True True
4 False True False True False False False
5 False True False True False False False
6 False False True True False False False
7 False False True True False False False
8 False False False True True False False
9 False False False False True False False
At last, replace the True
and False
value according to your needs
df = df.replace({True: 'Yes', False: ''})
print(df)
day1 day2 day3 day4 day5 day6 day7
1
2 Yes Yes Yes Yes
3 Yes Yes Yes Yes Yes
4 Yes Yes
5 Yes Yes
6 Yes Yes
7 Yes Yes
8 Yes Yes
9 Yes
CodePudding user response:
This looks a bit like something .str.get_dummies()
could solve:
days = [[2, 3], [3, 4, 5], [6, 7], [2, 3, 4, 5, 6, 7, 8], [8, 9], [2, 3], [2, 3]]
ser = pd.Series(
["|".join(map(str, day)) for day in days],
index=[f"day{i}" for i in range(1, len(days) 1)]
)
result = (
ser.str.get_dummies().replace({1: "YES", 0: ""}).rename(int, axis=1).T
.sort_index()
)
Result:
day1 day2 day3 day4 day5 day6 day7
2 YES YES YES YES
3 YES YES YES YES YES
4 YES YES
5 YES YES
6 YES YES
7 YES YES
8 YES YES
9 YES
But since 1
isn't present in any day the respective row is missing. So is this accident or can that really happen?