Convert all types of smart quotes with PHP
Smart quotes are special Unicode quotation marks that appear curved or angled, unlike regular straight quotes. In PHP, you can convert these various smart quote types to standard ASCII quotes using character mapping and string replacement:
Basic Smart Quote Conversion
The following function converts common smart quotes to regular quotes using a character mapping array:
<?php
function convertSmartQuotes($str) {
    $chr_map = array(
        // Windows codepage 1252 bytes that were decoded as C1 control characters
        "\xC2\x82" => "'", // U+0082 -> U+201A single low-9 quotation mark
        "\xC2\x84" => '"', // U+0084 -> U+201E double low-9 quotation mark
        "\xC2\x8B" => "'", // U+008B -> U+2039 single left-pointing angle quotation mark
        "\xC2\x91" => "'", // U+0091 -> U+2018 left single quotation mark
        "\xC2\x92" => "'", // U+0092 -> U+2019 right single quotation mark
        "\xC2\x93" => '"', // U+0093 -> U+201C left double quotation mark
        "\xC2\x94" => '"', // U+0094 -> U+201D right double quotation mark
        "\xC2\x9B" => "'", // U+009B -> U+203A single right-pointing angle quotation mark
        // Regular Unicode
        "\xC2\xAB" => '"', // U+00AB left-pointing double angle quotation mark
        "\xC2\xBB" => '"', // U+00BB right-pointing double angle quotation mark
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
        "\xE2\x80\x9A" => "'", // U+201A single low-9 quotation mark
        "\xE2\x80\x9B" => "'", // U+201B single high-reversed-9 quotation mark
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x9E" => '"', // U+201E double low-9 quotation mark
        "\xE2\x80\x9F" => '"', // U+201F double high-reversed-9 quotation mark
        "\xE2\x80\xB9" => "'", // U+2039 single left-pointing angle quotation mark
        "\xE2\x80\xBA" => "'", // U+203A single right-pointing angle quotation mark
    );
    $char_val = array_keys($chr_map);
    $rpl = array_values($chr_map);
    // Decode HTML entities (such as &ldquo;) first, then replace the mapped byte sequences
    return str_replace($char_val, $rpl, html_entity_decode($str, ENT_QUOTES, "UTF-8"));
}
// Example usage
$text = "“Hello, this is a ‘smart quote’ test”";
echo convertSmartQuotes($text);
?>
"Hello, this is a 'smart quote' test"
UTF-8 Validation
If you're unsure whether the input is UTF-8 encoded, validate it first:
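<?php
function validateAndConvert($str) {
    // Check if string is valid UTF-8 (requires the mbstring extension)
    if (!mb_check_encoding($str, 'UTF-8')) {
        // Treat the input as ISO-8859-1, as utf8_encode() did;
        // utf8_encode() is deprecated as of PHP 8.2
        $str = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
    }
    // Apply smart quote conversion (convertSmartQuotes() is defined above)
    return convertSmartQuotes($str);
}
$text = "“Test with smart quotes”";
echo validateAndConvert($text);
?>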
"Test with smart quotes"
Unicode Categories
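Smart quotes belong to specific Unicode punctuation categories. The curly and angle quotes are classified as initial (Pi) or final (Pf) quotes, while the low-9 marks count as opening punctuation (Ps):
| Category | Description | Examples |
|---|---|---|
| Ps | Punctuation, Open | ‚ (U+201A), „ (U+201E) |
| Pe | Punctuation, Close | 」 (U+300D), 〞 (U+301E) |
| Pi | Punctuation, Initial quote | “ (U+201C), ‘ (U+2018), « (U+00AB) |
| Pf | Punctuation, Final quote | ” (U+201D), ’ (U+2019), » (U+00BB) |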
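These categories can be matched directly in PHP regular expressions with \p{...} properties when the u modifier is set. The sketch below lists every Pi or Pf character in a string; the function name findCurlyQuotes is hypothetical. Because the categories do not distinguish single from double quotes, this is suited to detecting smart quotes, not to choosing their ASCII replacement.

```php
<?php
// Hypothetical helper: returns each Pi/Pf character found in a UTF-8 string.
function findCurlyQuotes(string $str): array {
    // \p{Pi} = initial quote punctuation, \p{Pf} = final quote punctuation
    preg_match_all('/[\p{Pi}\p{Pf}]/u', $str, $matches);
    return $matches[0];
}

$found = findCurlyQuotes("\u{201C}Hi\u{201D} and \u{00AB}yo\u{00BB}");
echo count($found), "\n"; // 4
```

Straight ASCII quotes are not matched, and neither are the low-9 marks, which would need \p{Ps} added to the character class.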
Complete Character Normalization
For comprehensive normalization of Windows-1252 characters (0x80-0x9F range), use this extended mapping:
<?php
function normalizeWindowsChars($str) {
    $normalization_map = array(
        "\xC2\x80" => "\xE2\x82\xAC", // U+20AC Euro sign
        "\xC2\x82" => "\xE2\x80\x9A", // U+201A single low-9 quotation mark
        "\xC2\x83" => "\xC6\x92", // U+0192 latin small letter f with hook
        "\xC2\x84" => "\xE2\x80\x9E", // U+201E double low-9 quotation mark
        "\xC2\x85" => "\xE2\x80\xA6", // U+2026 horizontal ellipsis
        "\xC2\x86" => "\xE2\x80\xA0", // U+2020 dagger
        "\xC2\x91" => "\xE2\x80\x98", // U+2018 left single quotation mark
        "\xC2\x92" => "\xE2\x80\x99", // U+2019 right single quotation mark
        "\xC2\x93" => "\xE2\x80\x9C", // U+201C left double quotation mark
        "\xC2\x94" => "\xE2\x80\x9D", // U+201D right double quotation mark
        // ... additional mappings
    );
    return str_replace(array_keys($normalization_map), array_values($normalization_map), $str);
}
?>
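A mapping of this shape does not have to be typed by hand. As a rough sketch, and not the article's own table, the pairs can be derived at run time by asking mbstring what each Windows-1252 byte decodes to. The function name buildC1ToCp1252Map is hypothetical, the mbstring extension is assumed to be loaded, and the five bytes that Windows-1252 leaves undefined are skipped explicitly.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is loaded.
function buildC1ToCp1252Map(): array {
    // Bytes that Windows-1252 leaves undefined; skipped on purpose.
    $undefined = array(0x81, 0x8D, 0x8F, 0x90, 0x9D);
    $map = array();
    for ($byte = 0x80; $byte <= 0x9F; $byte++) {
        if (in_array($byte, $undefined, true)) {
            continue;
        }
        // Key: UTF-8 encoding of the C1 control character U+0080..U+009F.
        // Value: UTF-8 encoding of the character Windows-1252 assigns to that byte.
        $map["\xC2" . chr($byte)] = mb_convert_encoding(chr($byte), 'UTF-8', 'Windows-1252');
    }
    return $map;
}

$map = buildC1ToCp1252Map();
echo bin2hex(strtr("\xC2\x85", $map)), "\n"; // e280a6 (U+2026 horizontal ellipsis)
```

The generated array can be passed to strtr() or split with array_keys() and array_values() for str_replace(), in the same way as the hand-written map.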
Conclusion
Converting smart quotes in PHP requires mapping Unicode byte sequences to standard ASCII quotes using str_replace(). Always validate UTF-8 encoding first and consider comprehensive character normalization for Windows-1252 compatibility.
