Convert one column to rows and columns-CodePudding

I keep running into this use and I haven't found a good solution. I am asking for a solution in python, but a solution in R would also be helpful.

I've been getting data that looks something like this:

import pandas as pd

data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '']}

df = pd.DataFrame(data)


             Col1
0             Bob
1             101
3
4             Sue
5             102
6   Second Street
7
8            Alex
9             200
10   Third Street
11

The pattern in my real data does repeat like this. Sometimes there is a blank row (or more than 1), and sometimes there are not any blank rows. The important part here is that I need to convert this column into a row.

I want the data to look like this.

   Name Address         Street
0   Bob     101   First Street
1   Sue     102  Second Street
2  Alex     200   Third Street

I have tried playing around with this, but nothing has worked. My thought was to iterate through a few rows at a time, assign the values to the appropriate column, and just build a data frame row by row.

x = len(df['Col1'])
holder = pd.DataFrame()
new_df = pd.DataFrame()

while x < 4:
    temp = df.iloc[:5]
    holder['Name'] = temp['Col1'].iloc[0]
    holder['Address'] = temp['Col1'].iloc[1]
    holder['Street'] = temp['Col1'].iloc[2]

    new_df = pd.concat([new_df, holder])

    df = temp[5:]
    df.reset_index()

    holder = pd.DataFrame()

    x = len(df['Col1'])


new_df.head(10)

CodePudding user response：

In R,

data <- data.frame(
  Col1 = c('Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '')
)

k<-which(grepl("Street", data$Col1) == TRUE)
j <- k-1
i <- k-2
data.frame(
  Name = data[i,],
  Adress = data[j,],
  Street = data[k,]
)

  Name Adress        Street
1  Bob    101  First Street
2  Sue    102 Second Street
3 Alex    200  Third Street

Or, if Street not ends with Street but Adress are always a number, you can also try

j <- which(apply(data, 1, function(x) !is.na(as.numeric(x)) ))
i <- j-1
k <- j 1

CodePudding user response：

Python3

In Python 3, you can convert your DataFrame into an array and then reshape it.

n = df.shape[0]
df2 = pd.DataFrame(
    data=df.to_numpy().reshape((n//4, 4), order='C'),
    columns=['Name', 'Address', 'Street', 'Empty'])

This produces for your sample data this:

   Name Address         Street Empty
0   Bob     101   First Street      
1   Sue     102  Second Street      
2  Alex     200   Third Street

If you like you can remove the last column:

df2 = df2.drop(['Empty'], axis=1)

   Name Address         Street
0   Bob     101   First Street
1   Sue     102  Second Street
2  Alex     200   Third Street

One-liner code

df2 =  pd.DataFrame(data=df.to_numpy().reshape((df.shape[0]//4, 4), order='C' ), columns=['Name', 'Address', 'Street', 'Empty']).drop(['Empty'], axis=1)

   Name Address         Street
0   Bob     101   First Street
1   Sue     102  Second Street
2  Alex     200   Third Street

CodePudding user response：

In python i believe this may help u.

  1 import pandas as pd
  2
  3 data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '']}
  4
  5 var = list(data.values())[0]
  6 var2 = []
  7 for aux in range(int(len(var)/4)):
  8     var2.append(var[aux*4: aux*4 3])
  9 data = pd.DataFrame(var2, columns=['Name', 'Address','Street',])
 10 print(data)

CodePudding user response：

With pandas we can do it pivot after use cumcount and mod get the key

df.index=df.index//4
df['key'] = df.groupby(df.index).cumcount()
out = df.pivot(columns='key',values='Col1')
out.columns = ['Name', 'Address', 'Street', 'Empty']
out
Name Address         Street Empty
0   Bob     101   First Street
1   Sue     102  Second Street
2  Alex     200   Third Street