PHP – How to detect character encoding using mb_detect_encoding()
In PHP, mb_detect_encoding() is used to detect the character encoding of a string from an ordered list of candidates. This function is particularly useful when working with multibyte encodings where not all byte sequences form valid strings. If the input contains invalid sequences for a particular encoding, that encoding is rejected and the next one is tested.
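The reject-and-try-the-next-candidate behaviour described above can be approximated in userland code. The following is a minimal sketch, not mbstring's internal algorithm (recent PHP versions may also rank candidates heuristically). The helper name firstValidEncoding() is hypothetical; the only library call it relies on is mb_check_encoding().

```php
<?php
// Hypothetical helper: return the first candidate in which $bytes is valid,
// or null when every candidate is rejected.
function firstValidEncoding(string $bytes, array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        // mb_check_encoding() returns false if $bytes contains a sequence
        // that is invalid in $candidate.
        if (mb_check_encoding($bytes, $candidate)) {
            return $candidate;
        }
    }
    return null;
}

$bytes = "\xE1\xE9\xF3\xFA"; // áéóú in ISO-8859-1
echo firstValidEncoding($bytes, ['ASCII', 'UTF-8', 'ISO-8859-1']) ?? 'none', "\n";
```

Here ASCII and UTF-8 are both rejected because the bytes are invalid in them, so the loop falls through to ISO-8859-1.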
Syntax
string|false mb_detect_encoding(string $string, array|string|null $encoding_list = null, bool $strict = false)
Note: Character encoding detection is not entirely reliable without additional context. It's similar to decoding encrypted data without a key. A Content-Type HTTP header can provide hints about the encoding used.
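As the note suggests, a declared charset is usually a better starting point than detection alone. The sketch below assumes PHP 8.0 or later and that you already have the raw Content-Type header value as a string. Both helper names (charsetFromContentType() and resolveEncoding()) and the fallback candidate list are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: extract the charset label from a Content-Type value,
// e.g. "text/html; charset=ISO-8859-1" or 'text/html; charset="utf-8"'.
function charsetFromContentType(string $header): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $header, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: trust the declared charset only if the bytes are
// actually valid in it; otherwise fall back to strict detection.
function resolveEncoding(string $body, string $contentType): string|false
{
    $declared = charsetFromContentType($contentType);
    if ($declared !== null) {
        try {
            if (mb_check_encoding($body, $declared)) {
                return $declared;
            }
        } catch (\ValueError $e) {
            // mbstring does not know this label: ignore the hint.
        }
    }
    // Example fallback list; adjust it to the encodings you expect.
    return mb_detect_encoding($body, ['UTF-8', 'ISO-8859-1'], true);
}

var_dump(resolveEncoding("\xE1\xE9", 'text/html; charset=ISO-8859-1'));
```

Validating the declared charset before trusting it guards against headers that are present but wrong.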
Parameters
The mb_detect_encoding() function accepts three parameters −
- $string − The string being examined for character encoding.
- $encoding_list − A list of character encodings to try, in order. It can be an array of strings or a comma-separated string. If omitted or null, the current detect order is used, as set by the mbstring.detect_order configuration option or by mb_detect_order(). In PHP 8.0 and later this parameter is named $encodings, which matters if you pass it as a named argument.
- $strict − Controls what happens when the string is not valid in any of the listed encodings. If false, the closest matching encoding is returned. If true, the function returns false.
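When the second argument is omitted or null, detection falls back to the configured detect order. The sketch below shows one way to read, replace, and restore that order with mb_detect_order(); the candidate list is only an example.

```php
<?php
// Remember the current order (returned as an array when called without arguments).
$previousOrder = mb_detect_order();

// Replace the order for the rest of this request; the list is only an example.
mb_detect_order(['UTF-8', 'ISO-8859-1']);

// Passing null as the second argument means "use the current detect order".
$bytes = "\xE1\xE9\xF3\xFA"; // áéóú in ISO-8859-1, not valid UTF-8
$detected = mb_detect_encoding($bytes, null, true);

// Restore the previous order so other code is unaffected.
mb_detect_order($previousOrder);

echo $detected === false ? 'false' : $detected, "\n";
```

Because the bytes are invalid as UTF-8 but valid as ISO-8859-1, strict detection should report ISO-8859-1 here.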
Return Value
Returns the detected character encoding as a string, or false if the string is not valid in any of the listed encodings.
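Because the function can return false, callers should compare the result with === rather than rely on truthiness alone. A minimal sketch, assuming you prefer an exception to a silent guess; detectOrFail() is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper: detect strictly and fail loudly instead of guessing.
function detectOrFail(string $bytes, array $candidates): string
{
    $encoding = mb_detect_encoding($bytes, $candidates, true);
    if ($encoding === false) {
        throw new RuntimeException('Input is not valid in any candidate encoding');
    }
    return $encoding;
}

try {
    echo detectOrFail("\xE1\xE9\xF3\xFA", ['ASCII', 'UTF-8']), "\n";
} catch (RuntimeException $e) {
    echo 'Detection failed: ', $e->getMessage(), "\n";
}
```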
Basic Detection Example
<?php
$string = "Hello World";

// Detect with current detect_order
echo "Default detection: " . mb_detect_encoding($string) . "\n";

// Use "auto" (expanded according to mbstring.language)
echo "Auto detection: " . mb_detect_encoding($string, "auto") . "\n";

// Specify specific encodings
echo "Custom list: " . mb_detect_encoding($string, "UTF-8,ASCII,ISO-8859-1") . "\n";

// Use array format
$encodings = ["ASCII", "UTF-8", "ISO-8859-1"];
echo "Array format: " . mb_detect_encoding($string, $encodings) . "\n";
?>
Default detection: ASCII
Auto detection: ASCII
Custom list: UTF-8
Array format: ASCII
Strict Mode Example
<?php
// String with Latin-1 encoded characters
$string = "\xE1\xE9\xF3\xFA"; // áéóú in ISO-8859-1

// Non-strict mode - returns closest match
$result1 = mb_detect_encoding($string, ['ASCII', 'UTF-8'], false);
echo "Non-strict: " . ($result1 ?: 'false') . "\n";

// Strict mode - returns false if no exact match
$result2 = mb_detect_encoding($string, ['ASCII', 'UTF-8'], true);
echo "Strict: " . ($result2 ?: 'false') . "\n";

// Including correct encoding works in both modes
$result3 = mb_detect_encoding($string, ['ASCII', 'UTF-8', 'ISO-8859-1'], false);
echo "With correct encoding (non-strict): " . $result3 . "\n";

$result4 = mb_detect_encoding($string, ['ASCII', 'UTF-8', 'ISO-8859-1'], true);
echo "With correct encoding (strict): " . $result4 . "\n";
?>
Non-strict: UTF-8
Strict: false
With correct encoding (non-strict): ISO-8859-1
With correct encoding (strict): ISO-8859-1
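Detection is usually a first step before normalising text to a single encoding. The sketch below assumes UTF-8 is the target and uses mb_convert_encoding(), converting only when strict detection succeeds. The candidate list is an example, not a recommendation.

```php
<?php
$bytes = "\xE1\xE9\xF3\xFA"; // áéóú in ISO-8859-1

// Strict detection over an example candidate list.
$from = mb_detect_encoding($bytes, ['ASCII', 'UTF-8', 'ISO-8859-1'], true);

if ($from === false) {
    // No candidate fits: refuse to convert rather than guess.
    $utf8 = null;
} else {
    // Argument order is (string, target encoding, source encoding).
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', $from);
}

echo $utf8 ?? 'unknown encoding', "\n";
```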
Conclusion
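The mb_detect_encoding() function is useful for handling strings whose character encoding is unknown, but its result is a best guess rather than a guarantee. Use strict mode when the string must be valid in the returned encoding, and pass an explicit, ordered list of the encodings you actually expect. Single-byte encodings such as ISO-8859-1 accept any byte sequence, so they are best placed at the end of the list.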
