Channel: MySQL Forums - Character Sets, Collation, Unicode
Viewing all 294 articles

MATCH AGAINST (1 reply)

Hi.

How can I make the following search work with stopwords?
Match(name) AGAINST('+it +service' IN BOOLEAN MODE)

charset problem (1 reply)

Hello Everybody,

I have a table named 'words' that contains words (columns: id long, word char(50)).

I ran a SELECT statement to search for a word. The statement doesn't distinguish between the two words 'cipó' and 'cipö':

select * from words where word = 'cipö'
The query found the word 'cipó' too. Why?

The charset is utf8 and the collation is utf8_general_ci.

I tried this:

select * from words where BINARY word = 'cipö'
It works perfectly, but Hibernate, which I use, doesn't support this syntax.


Thanks
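
A rough way to see why the two words compare equal: utf8_general_ci ignores case and treats accented letters as their base letter, so 'cipó' and 'cipö' end up with the same comparison key. The Python sketch below only approximates that behaviour; the helper names general_ci_key and binary_equal are made up for illustration and do not reproduce MySQL's actual weight tables. A byte-wise comparison, which is what BINARY or a _bin collation performs, still tells the words apart.

```python
import unicodedata


def general_ci_key(text: str) -> str:
    """Approximate an accent- and case-insensitive comparison key.

    Hypothetical helper: strips combining marks after NFD decomposition and
    case-folds. This mimics, but is not identical to, utf8_general_ci.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def binary_equal(a: str, b: str) -> bool:
    """Compare the UTF-8 byte sequences, as a binary collation would."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    # Accent-insensitive comparison: both words reduce to the same key.
    print(general_ci_key("cip\u00f3") == general_ci_key("cip\u00f6"))
    # Byte-wise comparison: the words differ.
    print(binary_equal("cip\u00f3", "cip\u00f6"))
```

If the ORM cannot emit BINARY, one option worth checking is to give the column itself a binary collation such as utf8_bin, so that a plain = comparison becomes byte-wise.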

Question about Double Encoding (1 reply)

Hi, all. I was lucky to find Rick James' post on double encoding. I believe that is the exact problem I'm having. My tables are in UTF8, but the relevant system variables are all in latin1. I'm getting the same symptoms described. I do have two questions, though.

First, I don't fully understand how the characters are being translated from latin1 to utf8. Specifically, how:

⚈ latin1 E1 = utf8 C3A1
⚈ latin1 83 = utf8 C692
⚈ latin1 A1 = utf8 C2A1

The first and last ones are correct (a with accent and inverted exclamation mark, respectively). In UTF8, c6 92 is a "small f with hook", while in latin1, 83 is a capital S.

Second, in diagnosing this problem, when we looked at the selected columns through the MySQL command line client, the characters actually looked correct. When we ran SET NAMES utf8, they then appeared garbled. How was the client able to interpret them correctly?

Thanks for any insight.
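
The three mappings above can be reproduced with a short Python sketch, assuming that MySQL's latin1 behaves like Windows cp1252 for these bytes (the MySQL manual describes latin1 as "cp1252 West European"). In cp1252, hex 83 is the "small f with hook" (U+0192), which is why it becomes C6 92 in UTF-8; a capital S is decimal 83 (hex 53), which may be where the confusion comes from. The function names are invented for this illustration. The last two helpers simulate double encoding and the reverse conversion on SELECT, which would explain why a client whose connection is still declared as latin1 shows the text correctly.

```python
def latin1_byte_to_utf8_hex(byte_value: int) -> str:
    """Decode one byte as cp1252 (assumed equivalent of MySQL latin1), re-encode as UTF-8."""
    character = bytes([byte_value]).decode("cp1252")
    return character.encode("utf-8").hex().upper()


def double_encode(text: str) -> bytes:
    """Simulate a UTF-8 client writing over a connection declared as latin1."""
    utf8_bytes = text.encode("utf-8")      # bytes the client really sends
    misread = utf8_bytes.decode("cp1252")  # server treats each byte as a latin1 character
    return misread.encode("utf-8")         # server converts those characters for the utf8 column


def read_back_as_latin1(stored: bytes) -> bytes:
    """Simulate the server converting the utf8 column back to latin1 on SELECT."""
    return stored.decode("utf-8").encode("cp1252")


if __name__ == "__main__":
    for value in (0xE1, 0x83, 0xA1):
        print(f"latin1 {value:02X} = utf8 {latin1_byte_to_utf8_hex(value)}")
    stored = double_encode("\u00e1")
    print(stored.hex().upper())  # bytes held in the table after double encoding
    # The reverse conversion restores the original UTF-8 bytes, so a UTF-8
    # terminal renders them correctly as long as the connection stays latin1.
    print(read_back_as_latin1(stored) == "\u00e1".encode("utf-8"))
```

Under this model, SET NAMES utf8 removes the reverse conversion, so the client receives the double-encoded bytes unchanged and displays them as garbage.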

change calendar type (1 reply)

Hi everybody,

I want to create a database to store data in Persian rather than English. In Persian (Farsi) we use a different calendar, called the Shamsi calendar (Persian calendar). My question is: how can I change the default Gregorian dates to Shamsi dates in MySQL so that I can store Persian dates?

Note:
2011/12/19 = 1390/9/28

I would appreciate any help or reply.
Kind Regards.

Storing & Displaying Chinese characters [at the same time] (1 reply)

Hi,

I'm new to this topic.

I have a form on a website. The site is encoded in UTF-8 and supports several languages.

I've been able to enter Chinese characters in this form and display them afterwards from the database, so displaying these characters on my website is not the problem.

My problem is that when I check the data directly with the MySqlAdmin client, the Chinese symbols aren't there. They appear to be coded in hexadecimal (I guess). This field in my database is encoded in UTF8. (I tried utf8_general_ci, utf8_unicode_ci and other utf8 collations, and the result is the same.)

If I change the character set/collation in mysqladmin to gb2312 or gb2312_chinese_ci and then edit the field and type Chinese characters into it, I can modify the field and store it with the Chinese characters. But then the website doesn't display the Chinese field correctly.

I would like to be able to show the Chinese characters on the website and see them stored as Chinese characters in the database at the same time (is that possible?), because I want to export this database afterwards.

I'm using MySQL 5.5 + PHP.

Thank you so much for your help.

storing hungarian characters (1 reply)

Hi there, I need to store special Hungarian accented characters, and no matter how I change the charsets and collation in MySQL, it inserts/displays a '?' for some of the characters.
Any help would be much appreciated.

replication from 4.1 to 5.5 and character set change (no replies)

Hello everyone,
At our company we want to migrate our MySQL database from an obsolete version (4.1) to a slightly newer version (5.5).

Obviously, direct replication between these two versions is not feasible.
For this we will use a bridge server (with the BLACKHOLE engine).

So the process will be:
4.1 (iso) -> 5.0 (iso) -> 5.1 (utf8) -> 5.5 (utf8).
My questions are:
Can I leave out the 5.1 instance?
That is, can I replicate to 5.5 (utf8) directly from 5.0 (iso)?
Do you know of any drawbacks?
We also have some tables with a blob field, stored in latin. How can I replicate the data with the blob content converted into utf8?

Thanks to all
Gaspare

File .SQL set collation LATIN1 to UTF8. (1 reply)

Hello everyone;

I'm using MySQL Server 5.5.20 running on CentOS. I have developed a script to back up my databases, but when the SQL file is done and I try to import it again, the collation is set to latin1 instead of UTF8.

This is the command that I'm using:
mysqldump --skip-set-charset --default-character-set=UTF8 db_masterdb > db-test1-dump.sql

Do you have any ideas that could help me?

TIA....

UpperCase -> Lowercase (1 reply)

After executing these statements:

SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL';

CREATE SCHEMA IF NOT EXISTS `Salma_WTF` DEFAULT CHARACTER SET latin2 COLLATE latin2_general_ci ;
USE `Salma_WTF` ;

-- -----------------------------------------------------
-- Table `Salma_WTF`.`stCountries`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Salma_WTF`.`stCountries` (
`fid` INT NOT NULL AUTO_INCREMENT ,
`fISO3` CHAR(3) NULL ,
`fName` VARCHAR(45) NULL ,
PRIMARY KEY (`fid`) )
ENGINE = InnoDB;


Why are the field names kept as defined, while the table name is in lowercase?

kr/Werner

UTF-8 vs UCS-2 (especially on ndb) (no replies)

I'm in the process of migrating an InnoDB database to ndb, and we're frequently running into the 14000-byte row size limit imposed by ndb. There are quite a number of VARCHAR columns in our DB, and we're using the utf8 character set.
My question is: what (if any) is the benefit of using UTF-8 over UCS-2?
Since VARCHAR needs to allocate memory for the worst-case scenario (i.e., the maximum length) VARCHARs in UTF-8 require 3 times the length, whereas UCS-2 only requires 2 times the length. UTF-8 is optimized for scenarios with mainly one- and two-byte characters, but if the storage mechanism has to assume three bytes anyway, UCS-2 seems to be the better choice.
Am I overlooking something here? It seems like using UTF-8 for VARCHARs is a waste of space (especially problematic for MySQL Cluster with the smaller row memory limit of 14000 bytes).
Any insights?
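
The arithmetic behind the question can be sketched in Python. The sketch takes the post's premise (the engine reserves the worst case per character) as given rather than verified, ignores length prefixes and other per-row overhead, and uses Python's utf-16-be codec as a stand-in for ucs2, which matches it only for characters in the Basic Multilingual Plane. All function names are hypothetical.

```python
# Maximum bytes per character: MySQL's 3-byte utf8 and 2-byte ucs2.
MAX_BYTES_PER_CHAR = {"utf8": 3, "ucs2": 2}

# Python codecs used as stand-ins (assumption: BMP characters only).
PYTHON_CODEC = {"utf8": "utf-8", "ucs2": "utf-16-be"}


def worst_case_bytes(char_length: int, charset: str) -> int:
    """Bytes reserved for VARCHAR(char_length) if the worst case is allocated."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


def actual_bytes(text: str, charset: str) -> int:
    """Bytes a concrete string really occupies in the given charset."""
    return len(text.encode(PYTHON_CODEC[charset]))


def max_chars_in_row(row_limit: int, charset: str) -> int:
    """Characters that fit in a row limit under worst-case allocation."""
    return row_limit // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for charset in ("utf8", "ucs2"):
        print(charset,
              worst_case_bytes(255, charset),  # reserved for VARCHAR(255)
              actual_bytes("hello", charset),  # ASCII text as actually encoded
              max_chars_in_row(14000, charset))
```

The trade-off shows up in the middle column: for mostly ASCII data, utf8 stores one byte per character where ucs2 always stores two, so utf8 only loses where storage is sized by the worst case instead of by the actual content.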

Latin1 character set driving me mad! (2 replies)

I have a database that I am moving between webhosts. On the prior webhost, this SQL code worked perfectly:

SELECT addresses.address AS address, storedbits.bits AS bits, addresses.id AS id
FROM addresses LEFT JOIN storedbits ON storedbits.addressesid = addresses.id
WHERE address='$address' COLLATE utf8_bin AND block != -1 AND block < $lastconf ORDER BY block, addresses.id, storedbits.bits LIMIT 1

But, after dumping the table as a .sql, and importing it into the new webhost, I get the following error:

Error in query: SELECT addresses.address AS address, storedbits.bits AS bits, addresses.id AS id FROM addresses LEFT JOIN storedbits ON storedbits.addressesid = addresses.id WHERE address='blahblahblah' COLLATE utf8_bin AND block != -1 AND block < 173962 ORDER BY block, addresses.id, storedbits.bits LIMIT 1. COLLATION 'utf8_bin' is not valid for CHARACTER SET 'latin1'

Ok, fine. So I check my tables. They are set to utf8 - default collation. I check my database. It is set to utf8 - default collation. I check my rows. They are set to utf8 - utf8_bin.

If I remove the COLLATE clause and instead use BINARY just before the address comparison, then it works: the query runs fine. But it isn't fast enough. It takes 3-4 times as long to run the query as it does using collation on the old host. I don't know if I can attribute this speed difference entirely to query differences, as the hosts could very well have CPUs of different speeds, but I certainly want to be sure I am doing everything I can to make the queries run as quickly and efficiently as possible.

So, where are the latin1 characters, if none of my database is set up with them? Why am I getting this error on the new host, but not the old one with the same query?

EDIT: Also, I tried the guide located here: http://docs.moodle.org/22/en/Converting_your_MySQL_database_to_UTF8

I did these steps:
mysqldump -uusername -ppassword -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql
cp dump.sql dump-fixed.sql
vim dump-fixed.sql
:%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/
:%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/
:wq
mysql -uusername -ppassword < dump-fixed.sql

The two find/replace commands in VIM didn't find anything, but I re-imported the dump-fixed.sql anyway. Still comes up with the latin1 problem. I don't understand where/why these latin1 characters are coming from!

WHERE without COLLATE does not work (no replies)

I have a SELECT ... WHERE problem when not using COLLATE.

mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> SET ONE_SHOT collation_connection = utf8_unicode_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT sysname FROM tbl_system WHERE sysname LIKE '%mysql%';
Empty set (0.01 sec)

mysql> SELECT sysname FROM tbl_system WHERE sysname LIKE '%mysql%' COLLATE utf8_unicode_ci;
+------------------------+
| sysname                |
+------------------------+
| mysqldata01.domain.tld |
| mysqldata02.domain.tld |
| mysqlsrv01.domain.tld  |
| mysqlsrv02.domain.tld  |
| mysqlsrv03.domain.tld  |
| mysqldata03.domain.tld |
+------------------------+
6 rows in set (0.02 sec)

The server variables are set to utf8:
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE '%coll%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

Also the db/table/columns are utf8:
mysql> SHOW TABLE STATUS WHERE Name = 'tbl_system';
+------------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| Name       | Engine     | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+------------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| tbl_system | ndbcluster |      10 | Dynamic    | 1766 |             60 |      360448 |               0 |            0 |         0 |           1767 | NULL        | NULL        | NULL       | utf8_unicode_ci | NULL     |                |         |
+------------+------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.01 sec)

mysql> SHOW FULL COLUMNS FROM tbl_system WHERE Field='sysname';
+---------+--------------+-----------------+------+-----+---------+-------+----------------------+---------+
| Field   | Type         | Collation       | Null | Key | Default | Extra | Privileges           | Comment |
+---------+--------------+-----------------+------+-----+---------+-------+----------------------+---------+
| sysname | varchar(255) | utf8_unicode_ci | NO   | MUL | NULL    |       | select,insert,update |         |
+---------+--------------+-----------------+------+-----+---------+-------+----------------------+---------+
1 row in set (0.01 sec)

Can I check something more, or is it not possible to use a SELECT without COLLATE in this case? The database runs on a mysql cluster:

mysql> SHOW VARIABLES LIKE '%version%';
+-------------------------+-----------------------------------+
| Variable_name           | Value                             |
+-------------------------+-----------------------------------+
| ndbinfo_version         | 459027                            |
| protocol_version        | 10                                |
| slave_type_conversions  |                                   |
| version                 | 5.1.56-ndb-7.1.19-cluster-gpl-log |
| version_comment         | MySQL Cluster Server (GPL)        |
| version_compile_machine | x86_64                            |
| version_compile_os      | unknown-linux-gnu                 |
+-------------------------+-----------------------------------+
7 rows in set (0.00 sec)

saving 'url_encode' with no modifications (1 reply)

Hi, I am using PHP with MySQL, and for security reasons I save all strings with 'urlencode', which, among other things, converts quotes, double quotes (which are string end chars) and basically everything that is not a letter or a number into URL encoding, that is, a '%' followed by a number. I expect it to save messages written in Spanish, so chars like á, ñ, é and others are converted by the URL encoding.

There are some escape functions included in PHP to make a string 'database safe', which work more or less the same way (escaping end-of-string chars and other stuff), but most of my code has urlencodes everywhere (so if your answer is 'use those functions', please don't).

When I store the urlencoded string in the database and then retrieve it, I don't find the same symbols I stored. For example, if I store an 'ñ', what I get after I save it and then read it is a sequence of weird chars: for 'españa', what I get is 'españa'. For a previous project I made a replace function, but that feels like cutting corners. So how do I store my %## and retrieve the same %## I saved?

Somebody told me I should check the definition of the table. What definition should I use? (I use the default engine when declaring tables, since I do not specify a particular engine.)
tnx
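
The symptom can be reproduced outside the database. The Python sketch below uses urllib.parse.quote and unquote as rough stand-ins for PHP's urlencode and urldecode (they are not byte-for-byte identical; spaces, for example, are handled differently). It assumes that the garbling comes from UTF-8 bytes being decoded as latin1 at some point, which is a guess based on the 'españa' output, not something the post confirms. The function names are invented for the example.

```python
from urllib.parse import quote, unquote


def to_stored_form(text: str) -> str:
    """Percent-encode every byte of the UTF-8 representation that is not alphanumeric."""
    return quote(text, safe="")


def from_stored_form(stored: str, charset: str = "utf-8") -> str:
    """Decode the percent-escapes, interpreting the resulting bytes in `charset`."""
    return unquote(stored, encoding=charset)


if __name__ == "__main__":
    stored = to_stored_form("espa\u00f1a")
    print(stored)                               # pure ASCII, safe in any column charset
    print(from_stored_form(stored))             # decoded as UTF-8: the original text
    print(from_stored_form(stored, "latin-1"))  # decoded as latin1: the garbled form
```

Because the percent-encoded form is plain ASCII, the table definition is unlikely to alter it in this model; the step to check is where the bytes are decoded again (page encoding, connection charset, or the decode call itself).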

sort / search / display - columns (1 reply)

Hello guys,

We have a problem with collations and the sorting behavior of MySQL.

In a table "names" we have names like this:
Müller, Měyer, Śebastian, Ałan, ..

Now we want to find the row "Müller" when we search for Mueller or Müller (but not for Muller).
We want to find the row "Měyer" when we search for Měyer or Meyer.
We want to find the row "Ałan" when we search for Ałan or Alan.

So ä = ae, ě = e, ł = l, and so on (sorting must follow the same rules).

We also need "non-sorting words".

For example, we enter the name as "Wolfgang Johann {von}".
The {} signs tell the system that "von" is a non-sorting word.

Our solution:
We have 3 columns for the same name: "name", "name_sort" and "name_search".
name_search is used for boolean fulltext search (on save we change ü/ä/ö/ě/ł.. to ue/ae/oe/e/l..), name_sort is used for sorting (we do the same and also remove the words in {} signs), and name is only the display column (we remove the {} signs on output).

Is there a way to do this without having these 3 columns?
This table (and other tables, like titles) has many rows, so we can't transform the data before a query.
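
For reference, the three-column approach described above can be sketched as follows. This is a minimal Python illustration of deriving name_search and name_sort at save time, not the poster's actual code: the replacement table covers only the letters mentioned in the post, and every function and constant name is hypothetical.

```python
import re
import unicodedata

# Assumed replacement rules, taken from the examples in the post.
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
# Letters that Unicode cannot decompose into base letter + combining mark.
NO_DECOMPOSITION = {"ł": "l", "Ł": "L"}
NON_SORTING = re.compile(r"\{[^}]*\}")


def fold(text: str) -> str:
    """Map umlauts to digraphs and strip the remaining diacritics."""
    parts = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in DIGRAPHS:
            parts.append(DIGRAPHS[ch])
        elif ch in NO_DECOMPOSITION:
            parts.append(NO_DECOMPOSITION[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            parts.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(parts)


def display_name(raw_name: str) -> str:
    """Remove the {} markers for output, keeping the marked words."""
    return raw_name.replace("{", "").replace("}", "")


def derive_columns(raw_name: str) -> dict:
    """Build the three column values from the name as entered."""
    sort_source = " ".join(NON_SORTING.sub(" ", raw_name).split())
    return {
        "name": raw_name,
        "name_search": fold(display_name(raw_name)),
        "name_sort": fold(sort_source),
    }


if __name__ == "__main__":
    for entered in ("Müller", "Měyer", "Ałan", "Wolfgang Johann {von}"):
        print(derive_columns(entered))
```

A search term would be passed through the same fold() before being matched against name_search, so that both "Mueller" and "Müller" find the row while "Muller" does not.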

Reading text from a BLOB field (no replies)

I was having trouble with question marks being displayed in the text read back from a blob field in MySql. I found a partial solution when I read this conversation: http://stackoverflow.com/questions/948174/how-do-i-convert-from-blob-to-text-in-mysql Converting to utf8 (aka UTF-8) solved part of the problem but my text truncated when it ran into a special character (the em or long dash). The CONVERT() function is still the answer, but I had to figure out which character set our database was using. Below are the steps I followed to figure this out. I'm new to MySQL, so there are probably better ways to get at this information.
1. SHOW FULL COLUMNS FROM mytable IN mydatabase; -- note the collation on the text fields;
2. SHOW CHARACTER SET; -- find the collation you saw in step one in either the Description or Default collation columns. Copy the value from the Charset column.
3. In your SQL SELECT statement add a field like this: CONVERT(someBlobTextField USING copiedCharsetValue)
4. Run your query and review your data. Voilà! It works.
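
Why the choice of character set in step 3 matters can be illustrated with a small Python sketch. It is an analogy only: the sample bytes are made up, cp1252 is assumed as the real encoding of the blob, and MySQL's CONVERT(... USING ...) may react differently to invalid bytes (for instance by truncating) than Python's decoder does.

```python
def decode_blob(raw: bytes, charset: str) -> str:
    """Interpret raw blob bytes in `charset`, marking undecodable bytes with U+FFFD."""
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    # Hypothetical blob content: text with an em dash, stored as cp1252 bytes.
    blob = "before \u2014 after".encode("cp1252")
    print(blob)                                    # the dash is the single byte 0x97
    print(ascii(decode_blob(blob, "cp1252")))      # right charset: dash recovered
    print(ascii(decode_blob(blob, "utf-8")))       # wrong charset: replacement character
```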

I also ran across a PHP function called nl2br() that adds the line breaks the user had entered back in.

'þ' Will shows '?' in table when I read it from text file (no replies)

Hello,

I am facing a problem with the 'þ' character.
I am reading this character from a text file that was generated using "INTO OUTFILE...", and now I am trying to insert the data from this text file into a database table using "LOAD DATA INFILE". After performing these steps, the data in my database table is not written properly: it shows '?' everywhere 'þ' should be.

Can you give me tips on what is wrong here?

Below are my current settings for the MySQL variables
=====================================
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'latin1'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'latin1'
'character_set_system', 'utf8'


'collation_connection', 'utf8_general_ci'
'collation_database', 'latin1_bin'
'collation_server', 'latin1_bin'
=====================================

Thanks
Chintan Patel

How to support full Unicode in MySQL databases (no replies)

Problem setting the "default-character-set" parameter in my.ini file. (no replies)

Hello,
I am trying to change the value for the variable "default-character-set" to latin1 as below in the "my.ini" file:

before change
===============

mysql> show variables like 'character%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    |
| character_set_connection | utf8                                                    |
| character_set_database   | utf8                                                    |
| character_set_filesystem | binary                                                  |
| character_set_results    | utf8                                                    |
| character_set_server     | utf8                                                    |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.5\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

Now, I made the following change to the "my.ini" file:

[mysql]
default-character-set=latin1

Now, after making the above change, when I save the file & reconnect as "root" user & execute the command "show variables like 'character%';" I get the following output:
(I even tried restarting the mysql service after making the changes to the "my.ini" file, but did not get the expected results.)

mysql> show variables like 'character%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    |
| character_set_connection | utf8                                                    |
| character_set_database   | utf8                                                    |
| character_set_filesystem | binary                                                  |
| character_set_results    | utf8                                                    |
| character_set_server     | utf8                                                    |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.5\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

Query
=====
Shouldn't the values for :
1) character_set_client
2) character_set_connection
3) character_set_server
be = latin1.

When I try doing the same change from command line using the command, it works as expected :

mysql -u root -pmanager --default-character-set=latin1

Now, I execute the following at the mysql prompt:

mysql> show variables like 'character%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | latin1                                                  |
| character_set_connection | latin1                                                  |
| character_set_database   | utf8                                                    |
| character_set_filesystem | binary                                                  |
| character_set_results    | latin1                                                  |
| character_set_server     | utf8                                                    |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.5\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

As per the MySQL documentation http://dev.mysql.com/doc/refman/5.5/en/server-options.html#option_mysqld_default-character-set
I changed the variable name to character-set-server, but that didn't help either.

But if the above link is correct, then how did it work for the command:
mysql -u root -pmanager --default-character-set=latin1
since --default-character-set is deprecated as of MySQL 5.5.3?

Default collation used in STRCMP() function (4 replies)

Hello,
What is the default collation used in a STRCMP() function if no collation is specified?
(I know that we can explicitly specify a collation for comparing the 2 strings in STRCMP()).

Where does MySQL get this value from: collation_database, collation_server, or the collation of the table?


Regards,
Sachin Vyas.

utf8_general_mysql500_ci performance (1 reply)

Hi,

I've been looking but haven't found any performance comparison between utf8_general_mysql500_ci and utf8_general_ci.

I know utf8_general_mysql500_ci is supposed to behave the same as utf8_general_ci pre v5.1.24 in terms of ordering, but what about performance? Will it be the same?

Thanks.
- Jose -