Concatenating chars with String.charAt() and the + operator turns them into encoded UTF-8

Time:11-23

I'm having trouble properly concatenating characters out of a String[] using String.charAt() and the + operator with the following method.

private PcbGroup createPcbGroup(String[] metadata, PcbGroup pcbGroup) {
    char group_short = metadata[2].charAt(0);
    
    pcbGroup.setId(Integer.parseInt(metadata[0]));
    pcbGroup.setGroup_name(metadata[1]);
    for (int i = 1; i < metadata[2].length(); i++) {
        group_short += metadata[2].charAt(i);
    }
    pcbGroup.setGroup_short(group_short);

    // create and return pcbGroup of this metadata
    return pcbGroup;
}  

I'm reading a CSV file with a BufferedReader and populating String[] metadata from it. The content of metadata is [3, "Foo", ML]. The line char group_short = metadata[2].charAt(0); correctly assigns 'M' to group_short. It then turns into ? (space intended) when I concatenate it with the second character, 'L'.

When I save this object, Hibernate complains about an incorrect string value, which appears to be '\xC2\x99'.

Hibernate: insert into pcb_group (group_name, group_short, id) values (?, ?, ?)

2022-11-22 10:41:35.902  WARN 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 1366, SQLState: HY000

2022-11-22 10:41:35.903 ERROR 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : (conn=3030) Incorrect string value: '\xC2\x99' for column 'group_short' at row 1

2022-11-22 10:41:35.903  INFO 2988 --- [           main] o.h.e.j.b.internal.AbstractBatchImpl     : HHH000010: On release of batch it still contained JDBC statements

I've been struggling with this little thing for a few hours now and it's getting on my nerves, so I could use some help.

Answer:

This line doesn't do what you seem to think it does:

group_short += metadata[2].charAt(i);

While this might look like a string concatenation to you, it's not. group_short is of type char, meaning it holds a single character*.

What this does is add the numeric (code unit) value of each subsequent character to that of the first character, which doesn't result in anything semantically meaningful for your use case (one could argue that it's a very simple kind of hashing, but it's not even good at that).
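To make the arithmetic concrete, here is a small sketch that reproduces what the loop does to "ML". The class and method names (CharSumDemo, sumChars, utf8Hex) are made up for illustration and are not part of the original code. 'M' is 77 and 'L' is 76, so the sum is 153, which is U+0099, a control character whose UTF-8 encoding is the byte pair C2 99 from the error message.

```java
import java.nio.charset.StandardCharsets;

class CharSumDemo {
    // Mimics the loop from the question: += on a char adds the numeric
    // values and implicitly narrows the int result back to char.
    static char sumChars(String s) {
        char result = s.charAt(0);
        for (int i = 1; i < s.length(); i++) {
            result += s.charAt(i);
        }
        return result;
    }

    // Renders the UTF-8 bytes of a single char in the \xNN style of the log.
    static String utf8Hex(char c) {
        StringBuilder hex = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("\\x%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        char c = sumChars("ML");
        System.out.println((int) c);    // 153, i.e. 77 + 76
        System.out.println(utf8Hex(c)); // \xC2\x99
    }
}
```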

What you want to do is have a String (or ideally StringBuilder) variable and do proper concatenation:

String group_short = "" + metadata[2].charAt(0);

// and later in the loop:
group_short += metadata[2].charAt(i);

Note that with this code, group_short will have exactly the same value as metadata[2] at the end of the loop, making all of that code equivalent to (but far less efficient than) group_short = Objects.requireNonNull(metadata[2]).
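A minimal sketch of the StringBuilder variant might look like the following. GroupShortDemo and buildGroupShort are hypothetical names, and the sample array mirrors the values from the question. It also shows that the result is simply equal to metadata[2].

```java
class GroupShortDemo {
    // Builds the short name character by character, as the loop intended.
    static String buildGroupShort(String[] metadata) {
        StringBuilder groupShort = new StringBuilder();
        for (int i = 0; i < metadata[2].length(); i++) {
            groupShort.append(metadata[2].charAt(i)); // real concatenation
        }
        return groupShort.toString();
    }

    public static void main(String[] args) {
        String[] metadata = {"3", "Foo", "ML"};
        String groupShort = buildGroupShort(metadata);
        System.out.println(groupShort);                     // ML
        System.out.println(groupShort.equals(metadata[2])); // true
    }
}
```

Using this in createPcbGroup assumes that setGroup_short accepts a String. The entity is not shown in the question, so its field may need to change from char to String.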

* Due to the complexity of Unicode and Java Strings using UTF-16, this is not entirely accurate: multiple char values can be required to make up a single "character" in the "human language" sense. It's close enough for this issue, though.
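As a brief illustration of the footnote, the sketch below uses U+1F600, a code point outside the Basic Multilingual Plane that is chosen here only as an example. It occupies two char values in a Java String. The class and method names are hypothetical.

```java
class SurrogateDemo {
    // Number of UTF-16 code units (char values) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(charCount(s));      // 2
        System.out.println(codePointCount(s)); // 1
    }
}
```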
