I have been attempting to save data to a MySQL db and getting a very puzzling error.
Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'alternatenames'
A bit of searching lead me to believe that this was entirely down to unicode characters, but it is more complex than that.
But first, ensure that your connection string has:
${database.url}?amp;useUnicode=true&characterEncoding=UTF-8"
Then ensure that the columns to store the characters are:
- CHARACTER SET utf8
or
- CHARACTER SET utf8mb4
Finally check the way your characters are represented as you could be attempting to store UTF-16 "surrogate pairs". The term "surrogate pair" refers to a means of encoding Unicode characters with high code-points in the UTF-16 encoding scheme. I wasn't able to get UTF-16 to work with Hibernate so I have resorted to stripping this data.
The following functions assisted me in cleansing the data:
/**
* I am facing a situation where i get Surrogate characters in text that i am saving to MySql 5.1. As the UTF-16 is not supported in this, I want to remove these surrogate pairs manually by a java method before saving it to the database.
* http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
* @param query
* @return
* @see #setAlternatenames(String) and {@link #setAlternatenamesSafe(String)}
*/
public static String removeSurrogatesFromCSV(String query)
{
String parts[] = query.split(",");
String sep = "";
StringBuilder sb = new StringBuilder();
for (String part:parts)
{
sb.append(sep);
part = StringUtils.trimToEmpty(removeSurrogates(part,false));
if (!part.isEmpty())
{
sb.append(part);
sep=",";
}
}
return sb.toString();
}
/**
* I am facing a situation where i get Surrogate characters in text that i am saving to MySql 5.1. As the UTF-16 is not supported in this, I want to remove these surrogate pairs manually by a java method before saving it to the database.
* http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
* @param query
* @return
*/
public static String removeSurrogates(String query)
{
return removeSurrogates(query,false);
}
/**
* I am facing a situation where i get Surrogate characters in text that i am saving to MySql 5.1. As the UTF-16 is not supported in this, I want to remove these surrogate pairs manually by a java method before saving it to the database.
* http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
* @param query
* @param preserveLength if true then the surrogates are turned into '[]'.
* @return
*/
public static String removeSurrogates(String query,boolean preserveLength)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < query.length(); i++) {
char ch = query.charAt(i);
if (Character.isHighSurrogate(ch))
{
if (preserveLength) sb.append("[]");
i++;//skip the next char is it's supposed to be low surrogate
}
else
sb.append(ch);
}
return sb.toString();
}
Hope this helps!
See: