My goal is to iterate through a list of possible B values, such that each ID (col A) will have new rows added with C = 0 where the possible B value did not previously exist in the DF.
I have a dataframe with:
A B C
0 id1 2 10
1 id1 3 20
2 id2 1 30
possible_B_values = [1 2 3]
Resulting in:
A B C
0 id1 1 0
1 id1 2 10
2 id1 3 20
3 id2 1 30
4 id2 2 0
5 id2 3 0
Thanks in advance!
CodePudding user response:
Using some index trickery:
import pandas as pd
df = pd.read_clipboard() # Your df here
possible_B_values = [1, 2, 3]
extrapolate_columns = ["A", "B"]
index = pd.MultiIndex.from_product(
[df["A"].unique(), possible_B_values],
names=extrapolate_columns
)
out = df.set_index(extrapolate_columns).reindex(index, fill_value=0).reset_index()
out:
A B C
0 id1 1 0
1 id1 2 10
2 id1 3 20
3 id2 1 30
4 id2 2 0
5 id2 3 0
CodePudding user response:
Maybe you can create a dataframe with list of tuples with possible B values and merge it with original one
import pandas as pd
# Create a list of tuples with the possible B values and a C value of 0
possible_b_values = [1, 2, 3]
possible_b_rows = [(id, b, 0) for id in df['A'].unique() for b in possible_b_values]
# Create a new DataFrame from the list of tuples
possible_b_df = pd.DataFrame(possible_b_rows, columns=['A', 'B', 'C'])
# Merge the new DataFrame with the original one, using the 'A' and 'B' columns as the keys
df = df.merge(possible_b_df, on=['A', 'B'], how='outer')
# Fill any null values in the 'C' column with 0
df['C'] = df['C'].fillna(0)
print(df)
CodePudding user response:
Here is a one-liner pure pandas
way of solving this -
- You set the index as
B
(this will help in re-indexing later) - Gropuby column
A
and then for columnC
apply the following apply function to reindexB
- The lambda function
x.reindex(range(1,4), fill_value=0)
basically takes each group of dataframex
for each id, and then reindexes it fromrange(1,4) = 1,2,3
and fills the nan values with 0. - Finally you
reset_index
to bringA
andB
back into the dataframe.
out = df.set_index('B') \ # Set index as B
.groupby(['A'])['C'] \ # Groupby A and use apply on column C
.apply(lambda x: x.reindex(range(1,4), fill_value=0))\ # Reindex B to range(1,4) for each group and fill 0
.reset_index() # Reset index
print(out)
A B C
0 id1 1 0
1 id1 2 10
2 id1 3 20
3 id2 1 30
4 id2 2 0
5 id2 3 0