Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'alternatenames'A bit of searching lead me to believe that this was entirely down to unicode characters, but it is more complex than that.
But first, ensure that your connection string has:
${database.url}?amp;useUnicode=true&characterEncoding=UTF-8"Then ensure that the columns to store the characters are:
- CHARACTER SET utf8
or - CHARACTER SET utf8mb4
Finally check the way your characters are represented as you could be attempting to store UTF-16 "surrogate pairs". The term "surrogate pair" refers to a means of encoding Unicode characters with high code-points in the UTF-16 encoding scheme. I wasn't able to get UTF-16 to work with Hibernate so I have resorted to stripping this data.
The following functions assisted me in cleansing the data:
/**
* I am facing a situation where i get Surrogate characters in text that i am saving to MySql 5.1. As the UTF-16 is not supported in this, I want to remove these surrogate pairs manually by a java method before saving it to the database.
* http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
* @param query
* @return
* @see #setAlternatenames(String) and {@link #setAlternatenamesSafe(String)}
*/
public static String removeSurrogatesFromCSV(String query)
{
String parts[] = query.split(",");
String sep = "";
StringBuilder sb = new StringBuilder();
for (String part:parts)
{
sb.append(sep);
part = StringUtils.trimToEmpty(removeSurrogates(part,false));
if (!part.isEmpty())
{
sb.append(part);
sep=",";
}
}
return sb.toString();
}
/**
* I am facing a situation where i get Surrogate characters in text that i am saving to MySql 5.1. As the UTF-16 is not supported in this, I want to remove these surrogate pairs manually by a java method before saving it to the database.
* http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
* @param query
* @return
*/
public static String removeSurrogates(String query)
{
return removeSurrogates(query,false);
}
/**
* I am facing a situation where i get Surrogate characters in text that i am saving to MySql 5.1. As the UTF-16 is not supported in this, I want to remove these surrogate pairs manually by a java method before saving it to the database.
* http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
* @param query
* @param preserveLength if true then the surrogates are turned into '[]'.
* @return
*/
public static String removeSurrogates(String query,boolean preserveLength)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < query.length(); i++) {
char ch = query.charAt(i);
if (Character.isHighSurrogate(ch))
{
if (preserveLength) sb.append("[]");
i++;//skip the next char is it's supposed to be low surrogate
}
else
sb.append(ch);
}
return sb.toString();
}
Hope this helps!
See:
- Test Text "walmart obama 👽💔"
- http://rishida.net/tools/conversion/
- http://mathiasbynens.be/notes/mysql-utf8mb4
- http://stackoverflow.com/questions/13653712/java-sql-sqlexception-incorrect-string-value-xf0-x9f-x91-xbd-xf0-x9f
- http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
No comments:
Post a Comment