Java Forum

converting farsi charset

Asked By: Caleb Date: Jul 26 Category: Java Views: 2543

I want to convert Farsi characters from ISO-8859-1 to UTF-8.
Please help if you know something about it.

4 Answers Found

Answer #1 Answered By: Tara Ryan Answered On: Jul 26

Is your ISO-8859-1 content stored in a file? If so, try this:


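// Requires java.io.*; decodes the file as ISO-8859-1 and re-encodes it as UTF-8.
FileInputStream fis = new FileInputStream("iso8859fileName");
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");

FileOutputStream fos = new FileOutputStream("utf8fileName");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");

// Copy the decoded characters; the writer encodes each one as UTF-8.
int c;
while ((c = isr.read()) != -1) osw.write(c);
isr.close();
osw.close();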

Answer #2 Answered By: Sam Anderson Answered On: Jul 26

Thanks for your help, but my ISO-8859-1 content is stored in an Oracle 9i
database, and I want to use it in my JSP web pages.

Answer #3 Answered By: Mehreen Malik Answered On: Jul 26

String Class --> http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html
Useful methods for your request:
public byte[] getBytes(String enc) throws UnsupportedEncodingException
Convert this String into bytes according to the specified character encoding, storing the result into a new byte array.
Parameters:
enc - The name of a supported character encoding
Returns:
The resultant byte array
Throws:
UnsupportedEncodingException - If the named encoding is not supported
Since:
JDK1.1
-------------------------------------------------
public String(byte[] bytes, String enc) throws UnsupportedEncodingException
Construct a new String by converting the specified array of bytes using the specified character encoding. The length of the new String is a function of the encoding, and hence may not be equal to the length of the byte array.
Parameters:
bytes - The bytes to be converted into characters
enc - The name of a supported character encoding
Throws:
UnsupportedEncodingException - If the named encoding is not supported
Since:
JDK1.1
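As a rough illustration of the two methods quoted above, the sketch below encodes a short Farsi string to UTF-8 bytes and decodes it back. The class name EncodeDecodeSketch and the helper names encode and decode are made up for this example; only getBytes(String) and the String(byte[], String) constructor come from the documentation.

```java
import java.io.UnsupportedEncodingException;

class EncodeDecodeSketch {
    // Hypothetical helper: encodes text with the named encoding via String.getBytes(String).
    static byte[] encode(String text, String enc) throws UnsupportedEncodingException {
        return text.getBytes(enc);
    }

    // Hypothetical helper: decodes bytes with the named encoding via new String(byte[], String).
    static String decode(byte[] bytes, String enc) throws UnsupportedEncodingException {
        return new String(bytes, enc);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "salam" written in Persian script: four characters.
        String farsi = "\u0633\u0644\u0627\u0645";
        byte[] utf8 = encode(farsi, "UTF-8");

        // Each of these letters takes two bytes in UTF-8, so the lengths differ.
        System.out.println(farsi.length() + " chars -> " + utf8.length + " bytes"); // 4 chars -> 8 bytes
        System.out.println(decode(utf8, "UTF-8").equals(farsi)); // true
    }
}
```

This also shows why the documentation warns that the String length may not equal the length of the byte array.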


http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc
Character Encodings
Various constructors and methods in the java.lang and java.io packages accept string arguments that specify the character encoding to be used when converting between raw eight-bit bytes and sixteen-bit Unicode characters. Such encodings are named by strings composed of the following characters:
The uppercase letters 'A' through 'Z' ('\u0041' through '\u005a'),
The lowercase letters 'a' through 'z' ('\u0061' through '\u007a'),
The digits '0' through '9' ('\u0030' through '\u0039'),
The dash character '-' ('\u002d', HYPHEN-MINUS),
The colon character ':' ('\u003a', COLON), and
The underscore character '_' ('\u005f', LOW LINE).

An encoding name must begin with either a letter or a digit. The empty string is not a legal encoding name.
An encoding may have more than one name. One of an encoding's names is considered to be its canonical name. The canonical name of an encoding is the name returned by the getEncoding methods of the InputStreamReader and OutputStreamWriter classes.

Encoding names generally follow the conventions documented in RFC2278: IANA Charset Registration Procedures. If an encoding listed in the IANA Charset Registry is supported by an implementation of the Java platform then one of its names must be the name listed in the registry. Many encodings are given more than one name in the registry, in which case the registry identifies one of the names as MIME-preferred. An implementation of the Java platform must support the MIME-preferred registry name for a supported encoding if there is one; for convenience it may additionally support other registry names. The IANA MIME-preferred name of an encoding, if there is one, is often, but not necessarily, its canonical name. Following IANA convention, the mapping from IANA registry names to encodings is not case-sensitive.

Every implementation of the Java platform is required to support the following character encodings. Consult the release documentation for your implementation to see if any other encodings are supported.

US-ASCII    Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1  ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8       Eight-bit Unicode Transformation Format
UTF-16BE    Sixteen-bit Unicode Transformation Format, big-endian byte order
UTF-16LE    Sixteen-bit Unicode Transformation Format, little-endian byte order
UTF-16      Sixteen-bit Unicode Transformation Format, byte order specified by a mandatory initial byte-order mark (either order accepted on input, big-endian used on output)

The various Unicode Transformation Formats are described in detail in The Unicode Standard and in the Unicode FAQ.
Every instance of the Java virtual machine has a default character encoding. The default encoding is determined during virtual-machine startup and typically depends upon the locale and encoding being used by the underlying operating system.
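If you want to check these points on your own JVM, something like the sketch below may help. Note that it uses java.nio.charset.Charset, which is newer than the 1.3 documentation quoted above (the class arrived in 1.4 and defaultCharset() in Java 5), so treat it as an assumption about your runtime. The class name CharsetInfoSketch and the helper canonicalName are invented for the example.

```java
import java.nio.charset.Charset;

class CharsetInfoSketch {
    // Hypothetical helper: returns the canonical name for a charset name or alias.
    // Lookup is not case-sensitive, matching the IANA convention described above.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // The six encodings every Java platform implementation must support.
        String[] required = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"};
        for (String name : required) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }

        // A lower-case name resolves to the same encoding.
        System.out.println(canonicalName("utf-8"));

        // The default depends on the operating system and locale, so the output varies.
        System.out.println("default: " + Charset.defaultCharset().name());
    }
}
```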

Answer #4 Answered By: Daya Sharma Answered On: Jul 26

You can convert your ISO-8859-1 String in your ActionForm with this statement:


name = new String(name.getBytes("ISO-8859-1"), "UTF-8");
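A note on why this one-liner can work: ISO-8859-1 has no Farsi letters, so the usual situation is that the stored bytes are really UTF-8 and were decoded as ISO-8859-1 somewhere along the way. Because ISO-8859-1 maps every byte to exactly one character, getBytes recovers the original bytes unchanged and they can then be decoded properly. The sketch below simulates that fault and repairs it. It assumes the underlying bytes are UTF-8; if your database actually holds another encoding, this will not help as written. The class name MojibakeRepairSketch and the method repair are made up, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepairSketch {
    // Hypothetical helper: recovers the original bytes of a String that was
    // wrongly decoded as ISO-8859-1, then decodes them as UTF-8 (an assumption).
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "salam" written in Persian script.
        String farsi = "\u0633\u0644\u0627\u0645";

        // Simulate the fault: UTF-8 bytes read as if they were ISO-8859-1.
        String misdecoded = new String(farsi.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);

        // Each UTF-8 byte became one character, so the broken text is twice as long.
        System.out.println(misdecoded.length());                 // 8
        System.out.println(repair(misdecoded).equals(farsi));    // true
    }
}
```

Fixing the encoding at the source (the JDBC connection or the request encoding of the JSP page) is generally cleaner than repairing each String afterwards.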
