Java: Concatenating chars with String.charAt() and the + operator turns them into encoded UTF-8

Time:11-22

I'm having trouble properly concatenating characters out of a String[] using String.charAt() and the + operator with the following method.

private PcbGroup createPcbGroup(String[] metadata, PcbGroup pcbGroup) {
    char group_short = metadata[2].charAt(0);
    
    pcbGroup.setId(Integer.parseInt(metadata[0]));
    pcbGroup.setGroup_name(metadata[1]);
    for (int i = 1; i < metadata[2].length(); i++) {
        group_short += metadata[2].charAt(i);
    }
    pcbGroup.setGroup_short(group_short);

    // create and return pcbGroup of this metadata
    return pcbGroup;
}  

I'm reading a CSV file with BufferedReader and populating String[] metadata with it. The content of String[] metadata is [3, "Foo", ML]. The line char group_short = metadata[2].charAt(0); correctly assigns 'M' to char group_short. It then turns into ? (space intended) when I concatenate it with the second character 'L'.

When I save this object, Hibernate complains about an incorrect string value, which appears to be '\xC2\x99'. So first ML turned into ? and then got interpreted by Hibernate as '\xC2\x99'.

Hibernate: insert into pcb_group (group_name, group_short, id) values (?, ?, ?)

2022-11-22 10:41:35.902  WARN 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 1366, SQLState: HY000

2022-11-22 10:41:35.903 ERROR 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : (conn=3030) Incorrect string value: '\xC2\x99' for column 'group_short' at row 1

2022-11-22 10:41:35.903  INFO 2988 --- [           main] o.h.e.j.b.internal.AbstractBatchImpl     : HHH000010: On release of batch it still contained JDBC statements

I've been struggling with this little thing for a few hours now and it's getting on my nerves; I could use some help.

Answer:

This line doesn't do what you seem to think it does:

group_short += metadata[2].charAt(i);

While this might look like a string concatenation to you, it's not. group_short is of type char, meaning it holds a single character*.

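What this does is add the numeric code point value of each following character to that of the first character, which doesn't result in anything semantically meaningful for your use case (one could argue that it's a very simple kind of hashing, but it's not even good at that). Here, 'M' (77) plus 'L' (76) gives 153, which is U+0099, a control character whose UTF-8 encoding is exactly the '\xC2\x99' that the database rejects.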
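The arithmetic is easy to check in isolation. The following is a minimal sketch, not part of the original code: the class name CharAddDemo and the helper addChars are made up for illustration, and the helper simply repeats the loop from the question on an arbitrary string.

```java
import java.nio.charset.StandardCharsets;

public class CharAddDemo {

    // Hypothetical helper that mirrors the loop in the question:
    // += on a char adds numeric values, it does not concatenate.
    static char addChars(String s) {
        char c = s.charAt(0);
        for (int i = 1; i < s.length(); i++) {
            // The sum is computed as an int and silently narrowed back to char.
            c += s.charAt(i);
        }
        return c;
    }

    public static void main(String[] args) {
        char result = addChars("ML");
        System.out.println((int) result);             // 153, i.e. 77 + 76
        System.out.printf("U+%04X%n", (int) result);  // U+0099

        // Encode the resulting char as UTF-8 to see the bytes from the error message.
        byte[] utf8 = String.valueOf(result).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            System.out.printf("%02X ", b & 0xFF);     // C2 99
        }
        System.out.println();
    }
}
```

Note that the compiler accepts c += s.charAt(i) without complaint because compound assignment includes an implicit cast, whereas c = c + s.charAt(i) would not compile without an explicit (char) cast.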

What you want to do is have a String (or ideally StringBuilder) variable and do proper concatenation:

String group_short = "" + metadata[2].charAt(0);

// and later in the loop:
group_short += metadata[2].charAt(i);
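For completeness, here is a sketch of the StringBuilder variant mentioned above. The class GroupShortDemo and the helper buildGroupShort are hypothetical names, and the sketch assumes that the group_short field and its setter in PcbGroup can be changed from char to String, which the question does not show.

```java
public class GroupShortDemo {

    // Hypothetical helper: rebuilds the short name character by character
    // with a StringBuilder instead of adding char values together.
    static String buildGroupShort(String field) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            sb.append(field.charAt(i)); // append(char) concatenates, it does not add
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same sample row as in the question.
        String[] metadata = {"3", "Foo", "ML"};
        System.out.println(buildGroupShort(metadata[2])); // ML
    }
}
```

Since the loop copies every character unchanged, the result equals metadata[2] itself, so under the same assumption the string could probably be passed to the setter directly without any loop.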

* Due to the complexity of Unicode and Java Strings using UTF-16, this is not entirely accurate, as multiple char values can be required to make up a single "character" in the "human language" sense, but it's close enough for this issue.
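As a small illustration of the footnote, the sketch below (class and method names are made up) compares the number of char values in a string with the number of Unicode code points. The sample character U+1F600 lies outside the Basic Multilingual Plane, so it is written as a surrogate pair.

```java
public class SurrogateDemo {

    // Hypothetical helper: number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600, stored as two UTF-16 code units
        System.out.println(s.length());    // 2 char values
        System.out.println(codePoints(s)); // 1 code point
    }
}
```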
