Q: Detect encoding in PHP without multibyte extension?

Is there a way to detect the encoding of a string in PHP without having the mbstring extension loaded? I know it is possible to do so with mb_detect_encoding(), but is there an equivalent, non-multibyte function?

If not, what would it take to implement a detect_encoding() function that would at least detect UTF-8?

Answer 1:

Strings in PHP are just byte sequences; they carry no encoding information with them. mb_detect_encoding doesn't actually detect the string's encoding. It tries to make an educated guess by running the byte sequence against a series of identification functions, one per encoding (by default those given by mb_detect_order), and returns the first encoding for which the sequence is valid. These functions are very basic and don't even exist for many popular encodings.

There is no way, with or without the mbstring extension, to ascertain the encoding of a string. You can only rule some encodings out, and only if the string happens to contain byte sequences that would be invalid in those particular encodings.
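As a minimal sketch of the "ruling out" idea, the following checks whether a byte sequence is at least well-formed UTF-8 without mbstring. It assumes PHP 7.1+ and a PCRE build with UTF-8 support (the usual case); `is_valid_utf8()` and `detect_encoding()` are hypothetical names taken from the question, not built-in functions. A `'UTF-8'` result only means the bytes could be UTF-8, not that they are.

```php
<?php
// Hypothetical helper: reports whether $bytes is well-formed UTF-8.
// Relies on the PCRE /u modifier, under which preg_match() fails on
// subjects that are not valid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    // The empty pattern matches any well-formed subject (result 1);
    // for malformed UTF-8, preg_match() returns false instead.
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical detect_encoding(): returns 'UTF-8' when the bytes are
// consistent with UTF-8, or null when UTF-8 can be ruled out.
function detect_encoding(string $bytes): ?string
{
    return is_valid_utf8($bytes) ? 'UTF-8' : null;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // bool(true): "é" as two UTF-8 bytes
var_dump(is_valid_utf8("caf\xE9"));     // bool(false): lone ISO-8859-1 byte
```

Plain ASCII passes this check as well, since ASCII is a subset of UTF-8; the check can never confirm an encoding, only reject one.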

You will never know whether "\xC2\xA4" is supposed to be the UTF-8 "¤" or the ISO-8859-1 "Â¤" just by looking at it, because they're the exact same bytes.
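The ambiguity can be illustrated with a short sketch that uses only core functions (again assuming PCRE with UTF-8 support; the variable names are illustrative). The same two bytes are acceptable under both readings, and only the interpretation differs:

```php
<?php
$bytes = "\xC2\xA4";

// The bytes are well-formed UTF-8, so UTF-8 cannot be ruled out.
$validUtf8 = preg_match('//u', $bytes) === 1;

// Read as UTF-8: one code point (U+00A4, the currency sign).
$utf8Length = preg_match_all('/./su', $bytes);

// Read as ISO-8859-1: every byte is one character, so two characters.
$latin1Length = strlen($bytes);

var_dump($validUtf8);    // bool(true)
var_dump($utf8Length);   // int(1)
var_dump($latin1Length); // int(2)
```

Because every byte value is defined in ISO-8859-1, no byte sequence can ever rule that encoding out, which is why a guess is the best any detector can offer here.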

For more information see: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

Answer 2:

There's always iconv, which is generally enabled in PHP by default.

<pre>
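<?php
// Set iconv's default encodings. Note: these settings are deprecated
// since PHP 5.6, where default_charset is the preferred setting.
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "ISO-8859-1");
// Dump the configured values. This reports iconv's configuration,
// not the encoding of any particular string.
var_dump(iconv_get_encoding('all'));
?>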
</pre>
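One way iconv might be used for the original question is to attempt a conversion and see whether it fails. This is a sketch under assumptions: `iconv_accepts()` is a hypothetical helper name, PHP 7.0+ with the iconv extension is assumed, and exact behavior can vary between the underlying iconv implementations (glibc, libiconv, musl). As with any such check, success only means the bytes are consistent with the encoding.

```php
<?php
// Hypothetical helper: true if iconv can decode $bytes as $encoding.
function iconv_accepts(string $bytes, string $encoding): bool
{
    // iconv() returns false (and raises a notice, silenced here with @)
    // when it meets a byte sequence that is illegal or incomplete in the
    // source encoding. The target encoding is arbitrary; UTF-16LE is used
    // because it can represent any Unicode character.
    return @iconv($encoding, 'UTF-16LE', $bytes) !== false;
}

var_dump(iconv_accepts("caf\xC3\xA9 au lait", 'UTF-8')); // well-formed UTF-8
var_dump(iconv_accepts("caf\xE9 au lait", 'UTF-8'));     // 0xE9 followed by a space is not valid UTF-8
```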

php  encoding  utf-8  multibyte