I am looking to generate multiple rows from a single record in a list.
For example, I have a CSV file (e.g. File A) as follows:
User ID | Total Value | Multiple Value | Remaining Value |
---|---|---|---|
123 | 1007.25 | 11 | 7.25 |
456 | 804.25 | 9 | 4.25 |
I want to create another CSV file (e.g. File B) like this:
User ID | Final Value |
---|---|
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 100.00 |
123 | 7.25 |
456 | 100.00 |
456 | 100.00 |
456 | 100.00 |
456 | 100.00 |
456 | 100.00 |
456 | 100.00 |
456 | 100.00 |
456 | 100.00 |
456 | 4.25 |
I tried using the pandas.concat function, along with a for loop, but I can't seem to get that to work properly (it errors out).
CodePudding user response:
If I understand the logic correctly, you can do this without the columns "Multiple Value" and "Remaining Value":
import numpy as np
import pandas as pd
df = pd.read_clipboard() # Your df here
df["Final Value"] = df["Total Value"].apply(lambda x: np.minimum(x - np.arange(0, x, 100), 100))
out = df[["User ID", "Final Value"]].explode("Final Value")
User ID Final Value
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 100.0
0 123 7.25
1 456 100.0
1 456 100.0
1 456 100.0
1 456 100.0
1 456 100.0
1 456 100.0
1 456 100.0
1 456 100.0
1 456 4.25
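To see what the lambda is doing, here is the same step pulled out for a single total. This is only a sketch; the helper name split_total and its chunk parameter are made up for illustration and are not part of the answer above.

```python
import numpy as np

def split_total(total, chunk=100):
    """Split `total` into full chunks of size `chunk` plus a final remainder."""
    # Starting offset of each chunk: 0, 100, 200, ... (all strictly below total).
    offsets = np.arange(0, total, chunk)
    # Amount still left at each offset, capped at the chunk size.
    return np.minimum(total - offsets, chunk)

parts = split_total(1007.25)
print(parts)
```

Each element is what is still left of the total at that offset, capped at 100, so only the last element can fall below 100.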
CodePudding user response:
This could be a solution:
User_ID_List = df["User ID"].to_list()
Total_Value_List = df["Total Value"].to_list()
Multiple_Value_List = df["Multiple Value"].to_list()
Remaining_Value_List = df["Remaining Value"].to_list()
New_User_ID_List = []
New_Final_Value_List = []
for x in range(len(User_ID_List)):
    Multiple_Value = int(Multiple_Value_List[x])
    # Every row but the last gets an equal share of (total - remaining); 100 in the sample data.
    Full_Value = (Total_Value_List[x] - Remaining_Value_List[x]) / max(Multiple_Value - 1, 1)
    for y in range(Multiple_Value - 1):
        New_User_ID_List.append(User_ID_List[x])
        New_Final_Value_List.append(Full_Value)
    # The last row for each user carries the remaining value.
    New_User_ID_List.append(User_ID_List[x])
    New_Final_Value_List.append(Remaining_Value_List[x])
df = pd.DataFrame()
df["User ID"] = New_User_ID_List
df["Final Value"] = New_Final_Value_List
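The same idea can probably be written without an explicit loop by repeating each row "Multiple Value" times and overwriting the last repeat with the remainder. A rough sketch, assuming the column names from the question; the sample frame and the names rep, is_last, full and out_b are just for illustration:

```python
import numpy as np
import pandas as pd

# Sample data copied from File A in the question.
df = pd.DataFrame({
    "User ID": [123, 456],
    "Total Value": [1007.25, 804.25],
    "Multiple Value": [11, 9],
    "Remaining Value": [7.25, 4.25],
})

# Repeat each source row "Multiple Value" times.
rep = df.loc[df.index.repeat(df["Multiple Value"])].copy()

# True only on the last repeated row of each source record.
is_last = ~rep.index.duplicated(keep="last")

# Size of each full row, derived from the data (100 in the sample).
full = (rep["Total Value"] - rep["Remaining Value"]) / (rep["Multiple Value"] - 1)

# The last row gets the remainder, every other row gets the full amount.
rep["Final Value"] = np.where(is_last, rep["Remaining Value"], full)
out_b = rep[["User ID", "Final Value"]].reset_index(drop=True)
```

This relies on "Multiple Value" already counting the remainder row, as it does in the sample data.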
CodePudding user response:
Perhaps something like this?
import numpy as np
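def spread(g):
    # Each group holds a single row, so squeeze() yields its three values.
    tot, rem, n = g[['Total Value', 'Remaining Value', 'Multiple Value']].squeeze()
    # squeeze() upcasts the count to float; np.repeat needs an integer.
    n = int(n) - 1
    val = (tot - rem) / n
    # n full rows followed by one remainder row.
    return np.r_[np.repeat(val, n), rem]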
out = df.groupby('User ID').apply(spread).explode().to_frame('Final Value')
>>> out
Final Value
User ID
123 100.0
123 100.0
123 100.0
123 100.0
123 100.0
123 100.0
123 100.0
123 100.0
123 100.0
123 100.0
123 7.25
456 100.0
456 100.0
456 100.0
456 100.0
456 100.0
456 100.0
456 100.0
456 100.0
456 4.25
Then:
>>> print(out.to_csv())
User ID,Final Value
123,100.0
123,100.0
...
Or rather: out.to_csv(my_file_b).
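For completeness, a small end-to-end sketch of going from File A to File B, reusing the arange/minimum approach from the first answer. The in-memory buffers stand in for real files, and the chunk size of 100 is an assumption taken from the sample output; with real files you would pass paths to pd.read_csv and to_csv instead.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for File A; in practice pass a path to pd.read_csv instead.
file_a = io.StringIO(
    "User ID,Total Value,Multiple Value,Remaining Value\n"
    "123,1007.25,11,7.25\n"
    "456,804.25,9,4.25\n"
)
df = pd.read_csv(file_a)

# One array of row values per user, then one output row per array element.
df["Final Value"] = df["Total Value"].apply(
    lambda x: np.minimum(x - np.arange(0, x, 100), 100)
)
out = df[["User ID", "Final Value"]].explode("Final Value")

# explode() leaves an object column; cast back so float_format applies.
out = out.astype({"Final Value": float})

# index=False drops the repeated 0/1 row labels left over from explode().
file_b = io.StringIO()
out.to_csv(file_b, index=False, float_format="%.2f")
csv_text = file_b.getvalue()
```

Passing index=False matters here because explode() keeps the original row labels, which would otherwise end up as an extra first column in File B.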