I have dataframes as follows;
df1:
Col1 | Col2 |
---|---|
1 | hey.1 |
2 | and.5 |
3 | hai.6 |
4 | six.4 |
I want to split Col2 by delimiter ".". I am doing that by executing this code:
df1[["Col3", "Col2"]] = df1["Col2"].str.split(".", expand = True)
My outcome is this (which is ultimately what I want):
df1:
Col1 | Col2 | Col3 |
---|---|---|
1 | hey | 1 |
2 | and | 5 |
3 | hai | 6 |
4 | six | 4 |
MY PROBLEM IS; I also have another df called df2, where col2 does not have the delimiter ".".
df2:
Col1 | Col2 |
---|---|
1 | ho |
2 | has |
3 | text |
4 | map |
The problem is that I still need to execute the same code on this df as well. But when the col2 does not have the delimiter, the desired outcome I want is this;
df2:
Col1 | Col2 | Col3 |
---|---|---|
1 | ho | NA |
2 | has | NA |
3 | text | NA |
4 | map | NA |
But when running this code on df2;
df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True)
I receive the following error: "ValueError: Columns must be same length as key"
My question is: Is there a way to write the code so that it successfully executes on dfs like df1, but it also execute succesfully on dfs like df2 (by passing NAs when delimiter does not exist)?
CodePudding user response:
Trick is create DataFrame with 2 columns after split - because swapped order is used default columns names [1, 0]
in DataFrame.reindex
:
df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True).reindex([1, 0], axis=1)
Or if original order use [0, 1]
:
df2[["Col2", "Col3"]] = df2["Col2"].str.split(".", expand = True).reindex([0, 1], axis=1)
Sample:
print (df2)
Col1 Col2
0 1 ho
1 2 has
print (df2["Col2"].str.split(".", expand = True))
0
0 ho
1 has
df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True).reindex([1, 0], axis=1)
print (df2)
Col1 Col2 Col3
0 1 ho NaN
1 2 has NaN
If 2 columns DataFrame after split same solution working:
print (df2)
Col1 Col2
0 1 ho.3
1 2 has.4
print (df2["Col2"].str.split(".", expand = True))
0 1
0 ho 3
1 has 4
df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True).reindex([1, 0], axis=1)
print (df2)
Col1 Col2 Col3
0 1 ho 3
1 2 has 4