[Tapestry5] Upgrading to 5.3: Invalid byte 3 of 3-byte UTF-8 sequence

hxzon 2011-12-05
After upgrading to 5.3-rc-1, none of my templates could be parsed. The error message points to the lines that contain Chinese characters.

Stack trace:
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 3 of 3-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:432)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at org.apache.tapestry5.internal.services.XMLTokenStream.parse(XMLTokenStream.java:306)
        at org.apache.tapestry5.internal.services.SaxTemplateParser.parse(SaxTemplateParser.java:163)
        ... 85 more
Solution:
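This means the template's byte stream is decoded with the system's default encoding and then re-encoded before it reaches the XML parser, which reads it as UTF-8. When the default encoding is not UTF-8, this round trip is lossy: some multi-byte characters are corrupted, and the parser rejects the result. In my case, the fix was to pass the UTF-8 charset explicitly on these two lines of the Tapestry source:
337: InputStreamReader rawReader = new InputStreamReader(rawStream, "UTF-8"); // decode the template bytes as UTF-8
341: PrintWriter writer = new PrintWriter(new OutputStreamWriter(bos, "UTF-8")); // write them back out as UTF-8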
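To see why the missing charset matters, here is a small self-contained sketch (not Tapestry code) that copies UTF-8 bytes through a Reader/Writer pair, once with GBK standing in for a non-UTF-8 platform default and once with UTF-8 on both sides. The class and method names are hypothetical, and using GBK as the "system encoding" is an assumption chosen to mimic a Chinese-locale machine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {

    // Hypothetical helper: decode `input` with `readCs`, then re-encode with `writeCs`.
    // This mirrors the read-then-rewrite pattern of the two patched lines.
    static byte[] copy(byte[] input, Charset readCs, Charset writeCs) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), readCs);
             Writer writer = new OutputStreamWriter(bos, writeCs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } // closing the writer flushes all encoded bytes into bos
        return bos.toByteArray();
    }

    // Strict UTF-8 check: reports malformed input instead of silently replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // "中" is the 3-byte UTF-8 sequence E4 B8 AD, followed here by '<'.
        byte[] template = "中<".getBytes(StandardCharsets.UTF_8);
        // Assumption: GBK plays the role of a non-UTF-8 default encoding.
        Charset gbk = Charset.forName("GBK");

        byte[] broken = copy(template, gbk, gbk);
        byte[] fixed = copy(template, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("GBK round trip:   same bytes=" + Arrays.equals(template, broken)
                + ", valid UTF-8=" + isValidUtf8(broken));
        System.out.println("UTF-8 round trip: same bytes=" + Arrays.equals(template, fixed)
                + ", valid UTF-8=" + isValidUtf8(fixed));
    }
}
```

With GBK, the first two bytes of 中 (E4 B8) decode as one GBK character, while the third byte (AD) cannot pair with the following '<' and is replaced, so the re-encoded bytes are no longer valid UTF-8. That is the same kind of "invalid byte 3" failure the XML parser reports. With UTF-8 on both sides, the bytes come back unchanged.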
Reference:
https://issues.apache.org/jira/browse/TAP5-1741