I keep running into this use and I haven't found a good solution. I am asking for a solution in python, but a solution in R would also be helpful.
I've been getting data that looks something like this:
import pandas as pd
data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '']}
df = pd.DataFrame(data)
Col1
0 Bob
1 101
3
4 Sue
5 102
6 Second Street
7
8 Alex
9 200
10 Third Street
11
The pattern in my real data does repeat like this. Sometimes there is a blank row (or more than 1), and sometimes there are not any blank rows. The important part here is that I need to convert this column into a row.
I want the data to look like this.
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
I have tried playing around with this, but nothing has worked. My thought was to iterate through a few rows at a time, assign the values to the appropriate column, and just build a data frame row by row.
x = len(df['Col1'])
holder = pd.DataFrame()
new_df = pd.DataFrame()
while x < 4:
temp = df.iloc[:5]
holder['Name'] = temp['Col1'].iloc[0]
holder['Address'] = temp['Col1'].iloc[1]
holder['Street'] = temp['Col1'].iloc[2]
new_df = pd.concat([new_df, holder])
df = temp[5:]
df.reset_index()
holder = pd.DataFrame()
x = len(df['Col1'])
new_df.head(10)
CodePudding user response:
In R
,
data <- data.frame(
Col1 = c('Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '')
)
k<-which(grepl("Street", data$Col1) == TRUE)
j <- k-1
i <- k-2
data.frame(
Name = data[i,],
Adress = data[j,],
Street = data[k,]
)
Name Adress Street
1 Bob 101 First Street
2 Sue 102 Second Street
3 Alex 200 Third Street
Or, if Street
not ends with Street
but Adress
are always a number, you can also try
j <- which(apply(data, 1, function(x) !is.na(as.numeric(x)) ))
i <- j-1
k <- j 1
CodePudding user response:
Python3
In Python 3, you can convert your DataFrame into an array and then reshape it.
n = df.shape[0]
df2 = pd.DataFrame(
data=df.to_numpy().reshape((n//4, 4), order='C'),
columns=['Name', 'Address', 'Street', 'Empty'])
This produces for your sample data this:
Name Address Street Empty
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
If you like you can remove the last column:
df2 = df2.drop(['Empty'], axis=1)
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
One-liner code
df2 = pd.DataFrame(data=df.to_numpy().reshape((df.shape[0]//4, 4), order='C' ), columns=['Name', 'Address', 'Street', 'Empty']).drop(['Empty'], axis=1)
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
CodePudding user response:
In python i believe this may help u.
1 import pandas as pd
2
3 data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '']}
4
5 var = list(data.values())[0]
6 var2 = []
7 for aux in range(int(len(var)/4)):
8 var2.append(var[aux*4: aux*4 3])
9 data = pd.DataFrame(var2, columns=['Name', 'Address','Street',])
10 print(data)
CodePudding user response:
With pandas we can do it pivot
after use cumcount
and mod get the key
df.index=df.index//4
df['key'] = df.groupby(df.index).cumcount()
out = df.pivot(columns='key',values='Col1')
out.columns = ['Name', 'Address', 'Street', 'Empty']
out
Name Address Street Empty
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street