pandas.Series.str.len not behaving as expected

Time:11-30

I am trying to work with lists in pandas cells. Maybe not the best idea. Still, I find that the pandas.Series.str.len method does not work as shown in the documentation.

Here is my code:

    import pandas as pd

    df3 = pd.DataFrame({"comas": ["1,2,3",
                                  "4,5,6,7,8,9",
                                  "7,8",
                                  "9,10,11,12"]})

    df3["split"] = df3["comas"].str.split()

This gives me the following DataFrame, as expected:

             comas          split
    0        1,2,3        [1,2,3]
    1  4,5,6,7,8,9  [4,5,6,7,8,9]
    2          7,8          [7,8]
    3   9,10,11,12   [9,10,11,12]

Now I want to know the length of each list.

    df3["split"].str.len()

and I get:

    0    1
    1    1
    2    1
    3    1
    Name: split, dtype: int64

The documentation shows this example:

    >>> s = pd.Series(['dog',
    ...                '',
    ...                5,
    ...                {'foo': 'bar'},
    ...                [2, 3, 5, 7],
    ...                ('one', 'two', 'three')])
    >>> s
    0                  dog
    1
    2                    5
    3       {'foo': 'bar'}
    4         [2, 3, 5, 7]
    5    (one, two, three)
    dtype: object
    >>> s.str.len()
    0    3.0
    1    0.0
    2    NaN
    3    1.0
    4    4.0
    5    3.0
    dtype: float64
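To check that str.len counts list items no matter what the lists hold, here is a minimal sketch, assuming only that pandas is installed. The Series contents are illustrative: a one-item list holding a whole string, a three-item list of separate strings, and the list from the documentation example.

```python
import pandas as pd

# Illustrative data: a one-item list containing the whole string "1,2,3",
# a three-item list of separate strings, and the list from the docs example.
s = pd.Series([["1,2,3"], ["1", "2", "3"], [2, 3, 5, 7]])

# For list cells, str.len() returns the number of items in each list.
lengths = s.str.len()
print(lengths.tolist())  # [1, 3, 4]
```

So str.len behaves the same on both kinds of list; the difference lies in how many items each list actually contains.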

Can somebody explain the difference between the list at index 4 of the documentation example and the lists in my Series? I am using pandas version 1.3.3.

Thank you in advance!


Edit:

RoseGod is right: I did not pass a separator to split. I was confused because Jupyter displays the cells of the DataFrame in almost the same way whether or not the string was actually split (split_separator is the column I created with the separator):

    df3.loc[0]

    comas                  1,2,3
    split                [1,2,3]
    split_separator    [1, 2, 3]
    Name: 0, dtype: object

Selecting a single cell, however, shows that the first list holds just one string:

    df3.loc[0, "split"]
    ['1,2,3']

    df3.loc[0, "split_separator"]
    ['1', '2', '3']
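Because the DataFrame display drops the quotes around strings, the two columns look alike. One way to tell them apart is to print each cell's repr() and len(). This is a minimal sketch that rebuilds the frame from the question, assuming split_separator was created with .str.split(","):

```python
import pandas as pd

df3 = pd.DataFrame({"comas": ["1,2,3", "4,5,6,7,8,9", "7,8", "9,10,11,12"]})
df3["split"] = df3["comas"].str.split()               # no separator given
df3["split_separator"] = df3["comas"].str.split(",")  # assumption: comma separator

# repr() keeps the quotes that the DataFrame display hides, and len()
# shows how many items each list really holds.
for col in ["split", "split_separator"]:
    cell = df3.loc[0, col]
    print(col, repr(cell), len(cell))
# split ['1,2,3'] 1
# split_separator ['1', '2', '3'] 3
```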

Answer:

When calling split, you need to specify the separator to split on:

    df3["split"] = df3["comas"].str.split(',')

Then str.len() returns the number of items in each list, as you want:

    df3["split"].str.len()

    0    3
    1    6
    2    2
    3    4
    Name: split, dtype: int64
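Putting the fix together, a self-contained sketch of the answer's approach (the column name comas and the data are taken from the question):

```python
import pandas as pd

df3 = pd.DataFrame({"comas": ["1,2,3", "4,5,6,7,8,9", "7,8", "9,10,11,12"]})

# Split on the comma explicitly, so each cell becomes a list of substrings.
df3["split"] = df3["comas"].str.split(",")

# On a column of lists, str.len() returns the number of items in each list.
lengths = df3["split"].str.len()
print(lengths.tolist())  # [3, 6, 2, 4]
```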

If you print the value in the first row after splitting without specifying a delimiter, this is the output:

    ['1,2,3']

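As you can see, it did not split the string into separate items; it just created a list containing the whole string value. With no separator, str.split() splits on whitespace, and these strings contain none, so every list ends up with a single item.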
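The whitespace default is easy to see on a small illustrative Series that mixes a comma-separated string with a space-separated one (a minimal sketch, assuming only pandas):

```python
import pandas as pd

s = pd.Series(["1,2,3", "1 2 3"])

# With no pat argument, str.split() splits on runs of whitespace (like Python's
# str.split()), so a string without spaces comes back as a one-item list.
whitespace_split = s.str.split()

# Passing the separator explicitly splits at each comma instead.
comma_split = s.str.split(",")

print(whitespace_split.tolist())  # [['1,2,3'], ['1', '2', '3']]
print(comma_split.tolist())       # [['1', '2', '3'], ['1 2 3']]
```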
