Home > Blockchain >  How to Explode row into multiple rows based on value of another column?
How to Explode row into multiple rows based on value of another column?

Time:09-01

I have a dataframe with values similar to this (values changed for security) and there are around 1000 lines of data here:

dataframe = pd.DataFrame({'Subnet.1.1': ['514.71.90', '871.84.0','33.45.16'], 'Difference' : 
['10','16','4'], 'Location': ['Alaska', 'Hawaii', 'Maine']})

# Result:
                Subnet.1.1  Difference  Location
           0    514.71.90   10          Alaska
           1    871.84.0    16          Hawaii
           2    33.45.16    4           Maine

I need to explode the third octet of the IP address depending on the value of the 'Difference' column, while keeping the information intact.

the problem is I have 255 (unique values) in the 'Difference' column. and I have written the below code but I really don't want to do a block of text for every occurrence (unique value)and then concat all the new resulting frames together at the end. I am sure there is a more positive and efficient way to do this but I can't figure it out.

#So what I am thinking is to separate each value into a new dataframe:

subnet_10 = dataframe[dataframe['Difference'] == '10']   # ..and so on

Then run the below code for each subnet:

try:
  df2_10 = subnet_10["Subnet.1.1"].str.split(".", expand=True)

  df3_10 = (df2_10.join(df2_10.
                pop(2). # Extract the third element
                astype(int) # Cast the string to int
                .apply(lambda start: range(start, start 10)) # expand the range
                .explode() # explode into individual rows
                .astype(str) # cast back into str
               )
      )

  df_final_10 = (df3_10[[0, 1, 2]] # Select IPs in the right order
            .agg('.'.join, axis=1) # Rebuild them
            .to_frame(name="Subnet.1.1") # turn into df
            .join(subnet_10.drop("Subnet.1.1", axis=1))) # join with subnet_10 dataframe
except:
  df_final_10 = pd.DataFrame()

So my question is:

**Is it possible to write a code that will check for each value in the 'Difference' column and and explode the Subnet.1.1 column accordingly without writing a block for each occurrence in the 'Difference' Column? **

Expected Output for each 'Difference' value:

Subnet.1.1   Difference   Location
514.71.90    10           Alaska
514.71.91    10           Alaska
....etc to 
514.71.100   10           Alaska
871.84.0     4            Hawaii
871.84.1     4            Hawaii
etc

CodePudding user response:

I'm sure there has to be a more efficient way to solve this, but this is the best I could come up with for now. The idea is to make a new temporary column containing a list of all the IPs of the subnet and then use the explode function to explode the column. The downside to this is that the Subnet1.1 column stays the same so you just have to drop it and replace it with the new column. In order to do that string manipulation I built a simple function that is passed as argument in the apply function.

# create DataFrame
df = pd.DataFrame({'Subnet.1.1': ['514.71.90', '871.84.0', '33.45.16'],
                   'Difference': ['10', '16', '4'],
                   'Location': ['Alaska', 'Hawaii', 'Maine']})
# define function
def list_of_subnets(row):
    # split IP into 3 its three components
    comps = row['Subnet.1.1'].split(".")
    # get value of Difference column as an integer 
    diff = int(row['Difference'])
    # assemble the IP's components (range goes to diff   1 as to include the last element)
    return [comps[0]   '.'   comps[1]   '.'   str(int(comps[2]) i) for i in range(diff 1)]

# apply function to a new column
df['temp'] = df.apply(list_of_subnets,axis=1)
# explode 'temp' column
df = df.explode('temp')
# drop and rename columns 
df = df.drop('Subnet.1.1',axis=1).rename(columns={'temp':'Subnet.1.1'})
# rearrange columns to original format
df = df[['Subnet.1.1','Difference','Location']]

Output is the same as yours (except the Difference for IP: 871.84.0 was 16 in the start and 4 in the end, I guess that was a typo).

CodePudding user response:

I think for now the only way is for me to write a block for each occurrence.

I appreciate anyone's time on this and any answer/ response later would be greatly appreciated.

  • Related