Home > Enterprise >  Csv file split comma separated values into separate rows and dividing the corresponding dollar amoun
Csv file split comma separated values into separate rows and dividing the corresponding dollar amoun

Time:01-12

beginner here! I have a csv file with comma separated values. I want to split each comma separated value in different rows in pandas. However, the corresponding dollar amounts should be divided by the number of comma separated values in each cell and export the result in a different csv file. the csv table and the desired output table

I have used df.explode(IDs) but couldn’t figure out how to divide the Dollar_Amount by the number of IDs in the corresponding cells.

import pandas as pd
in_csv = pd.read_csv(‘inputCSV.csv’)
new_csv = df.explode(‘IDs’)
new_csv.to_csv(‘outputCSV.csv’)

CodePudding user response:

There will be a one line way to do this with a lambda function (if you are new, read up on lambda functions!) but as a slightly less new beginner, I think its easier to think about this as two separate operations.

Operation 1 - get the count of ids, Operation 2 - do the division

If you take a look here https://towardsdatascience.com/count-occurrences-of-a-value-pandas-e5dad02303e9 you'll get a good lesson on how to do the group by you need to get the count of ids and join it back to your data frame. I'd read that because its a much more detailed explainer, but if you want a simple line of code consider this Pandas, how to count the occurance within grouped dataframe and create new column?

Once you have it, the divison is as simple as df['new_col'] = df['col1']/df['col2']

CodePudding user response:

You can divide the dollar amount by the number of ids in each row before using explode. This can be done as follows:

# Preprocessing
df['Dollar_Amount'] = df['Dollar_Amount'].str[1:].str.replace(',', '').astype(float)
df['IDs'] = df['IDs'].str.split(",")

# Compute the new dollar amount and explode
df['Dollar_Amount'] = df['Dollar_Amount'] / df['IDs'].str.len()
df = df.explode('IDs')

# Postprocessing
df['Dollar_Amount'] = df['Dollar_Amount'].round(2).apply(lambda x: '${0:,.2f}'.format(x))

With an example input:

        IDs  Dollar_Amount  A
0   1,2,3,4    $100,000.00  4
1     5,6,7     $50,000.00  3
2         9     $20,000.00  1
3     10,11     $20,000.00  2

The result is as follows:

   IDs  Dollar_Amount   A
0    1     $25,000.00   4
0    2     $25,000.00   4
0    3     $25,000.00   4
0    4     $25,000.00   4
1    5     $16,666.67   3
1    6     $16,666.67   3
1    7     $16,666.67   3
2    9     $20,000.00   1
3   10     $10,000.00   2
3   11     $10,000.00   2
  • Related