How can I create more lines based on the value of a column?

Time:07-14

I've written Python code that suggests which vehicles I should use based on the volume I need to fit inside them.

Now I need to create X separate lines (rows), where X is the total number of vehicles my code suggested. In this example, it suggested 4 type_1 vehicles and 1 type_3 vehicle, so I need 5 rows that each carry the same information (route, date), with the "vehicle type" column holding that row's vehicle type.

Example of input and output

I hope this is clear; please let me know if I can clarify anything about my question.

Thanks!

CodePudding user response:

This method loops over every row and builds the new dataframe you described. I added a second row to show that it handles more than one route.

import pandas as pd

df = pd.DataFrame({
    "route" : ["Route ABC", "Route DEF"],
    "date" : ["2022-07-11", "2022-09-10"],
    "vehicle_type_1" : [4,1],
    "vehicle_type_2" : [0,2],
    "vehicle_type_3" : [1,3]
})

# Empty lists to append data into
route = []
date = []
vType = []

for x in range(len(df)):
    for y in df.columns:
        if "vehicle_type" in y:
            route.append([df['route'][x]] * df[y][x]) # Repeat the value once per vehicle of this type
            date.append([df['date'][x]] * df[y][x])
            vType.append([y.split("_")[2]] * df[y][x])
            
# "Explode" your list of lists into a list of values
route = [x for xx in route for x in xx]
date = [x for xx in date for x in xx]
vType = [x for xx in vType for x in xx]

newDF = pd.DataFrame({"route":route, "date":date, "vehicle_type":vType})
print(newDF)

Output:

    route       date        vehicle_type
0   Route ABC   2022-07-11  1
1   Route ABC   2022-07-11  1
2   Route ABC   2022-07-11  1
3   Route ABC   2022-07-11  1
4   Route ABC   2022-07-11  3
5   Route DEF   2022-09-10  1
6   Route DEF   2022-09-10  2
7   Route DEF   2022-09-10  2
8   Route DEF   2022-09-10  3
9   Route DEF   2022-09-10  3
10  Route DEF   2022-09-10  3
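The same reshaping can probably be done without explicit Python loops. The following is only a sketch, not part of the answer above: it reuses the sample data from that answer, and the names long_df, expanded and the helper column "count" are made up for illustration. It melts the vehicle columns into long format and then repeats each row by its count.

```python
import pandas as pd

df = pd.DataFrame({
    "route": ["Route ABC", "Route DEF"],
    "date": ["2022-07-11", "2022-09-10"],
    "vehicle_type_1": [4, 1],
    "vehicle_type_2": [0, 2],
    "vehicle_type_3": [1, 3],
})

# Long format: one row per (route, date, vehicle column) holding its count.
# "count" is a hypothetical helper column name.
long_df = df.melt(id_vars=["route", "date"],
                  var_name="vehicle_type", value_name="count")

# Keep only the type number, e.g. "vehicle_type_3" -> "3" (still a string)
long_df["vehicle_type"] = long_df["vehicle_type"].str.replace(
    "vehicle_type_", "", regex=False)

# Repeat each row `count` times; rows with a count of 0 drop out
expanded = long_df.loc[long_df.index.repeat(long_df["count"])]

# Drop the helper column and sort so each route's rows sit together
expanded = (
    expanded.drop(columns="count")
    .sort_values(["route", "date", "vehicle_type"])
    .reset_index(drop=True)
)
print(expanded)
```

Note that the final sort is alphabetical by route, which matches the original order here only because the sample routes happen to be sorted already.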

CodePudding user response:

One approach is to build a list of rows from your input, adding a variable number of rows based on the count for each vehicle type. You could do something like this:

rows = []
# `input` is the original dataframe (note that this name shadows Python's built-in input())
data_dictionaries = input.to_dict(orient='records')
for entry in data_dictionaries:
    route = entry['route']
    date = entry['date']
    num_type1 = entry['vehicle_type_1']
    num_type2 = entry['vehicle_type_2']
    num_type3 = entry['vehicle_type_3']

    # Extend the list with one row per suggested vehicle of each type
    rows += [{'route': route, 'date': date, 'vehicle_type': 1}] * num_type1
    rows += [{'route': route, 'date': date, 'vehicle_type': 2}] * num_type2
    rows += [{'route': route, 'date': date, 'vehicle_type': 3}] * num_type3

And then convert back to a dataframe at the end!

output = pd.DataFrame(rows)
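The snippet above hard-codes three vehicle types. If the number of types can vary, one possible variation is to discover the count columns by their prefix. This is a rough sketch under assumptions: the sample data is borrowed from the other answer, input_df stands in for the asker's dataframe, and the "vehicle_type_" prefix is assumed to be followed by an integer.

```python
import pandas as pd

# Sample data borrowed from the other answer; input_df is a stand-in name
input_df = pd.DataFrame({
    "route": ["Route ABC", "Route DEF"],
    "date": ["2022-07-11", "2022-09-10"],
    "vehicle_type_1": [4, 1],
    "vehicle_type_2": [0, 2],
    "vehicle_type_3": [1, 3],
})

# Assumption: every count column is named "vehicle_type_<integer>"
prefix = "vehicle_type_"
type_columns = [c for c in input_df.columns if c.startswith(prefix)]

rows = []
for entry in input_df.to_dict(orient="records"):
    for column in type_columns:
        vehicle_type = int(column[len(prefix):])
        # Append one fresh dict per suggested vehicle of this type
        for _ in range(entry[column]):
            rows.append({
                "route": entry["route"],
                "date": entry["date"],
                "vehicle_type": vehicle_type,
            })

output = pd.DataFrame(rows)
print(output)
```

Appending a new dict on each pass, rather than multiplying a one-element list, avoids several rows sharing the same dict object, which matters only if the rows are modified before building the dataframe.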