Thread 'teamname - encoding problem'

old_user423803
Joined: 15 Nov 06
Posts: 3
Credit: 52,258
RAC: 0
Message 27691 - Posted: 2 Apr 2007, 16:30:01 UTC

Hello,

My team has a problem with the encoding of its name, "Universität der Bundeswehr München":

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624.

On BOINCstats the name is shown as "UniversitÃ¤t der Bundeswehr MÃ¼nchen":

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

But it should belong to the right team:
http://www.boincstats.com/stats/boinc_team_graph.php?pr=bo&id=27493

With QMC my team had the same problem. Only an administrator of QMC was able to solve this.

Can anybody help?

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 27692 - Posted: 2 Apr 2007, 17:08:10 UTC
Last modified: 2 Apr 2007, 17:08:26 UTC

For ä have you tried both ALT + 132 on the number keypad, and ALT + 0228?

For ü try both ALT + 129 and ALT + 0252.

Let us know whether one of those combinations works. (You also have to press the Number Lock key.)
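
The two kinds of ALT code differ because they index different code pages. A minimal Python sketch, assuming the OEM code page is CP437 and the ANSI code page is Windows-1252 (typical for Western European Windows, but an assumption here), shows that each pair selects the same character:

```python
# Assumption: ALT+nnn is looked up in the OEM code page (CP437 here),
# while ALT+0nnn is looked up in the ANSI code page (Windows-1252 here).
alt_codes = {
    "ALT+132": bytes([132]).decode("cp437"),
    "ALT+0228": bytes([228]).decode("cp1252"),
    "ALT+129": bytes([129]).decode("cp437"),
    "ALT+0252": bytes([252]).decode("cp1252"),
}

# Print each key combination with the character and code point it selects.
for keys, char in alt_codes.items():
    print(f"{keys} -> {char} (U+{ord(char):04X})")
```

Both routes give the same Unicode character, so the choice of ALT code by itself should not change what is stored.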

old_user423803
Joined: 15 Nov 06
Posts: 3
Credit: 52,258
RAC: 0
Message 27831 - Posted: 12 Apr 2007, 5:41:41 UTC - in response to Message 27692.  

I tried these combinations. The team name was then written differently, but still not correctly.

Can you help?

Thank you

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 27858 - Posted: 13 Apr 2007, 1:23:58 UTC
Last modified: 13 Apr 2007, 1:25:18 UTC

Sorry about that. On this page I see it correctly:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624

Try this instead.

*Windows Start menu
*Programs
*Accessories
*System Tools
*Character map

(or you may need to click Start, Run, then type Charmap)

In the character map, you then need to

*choose a font
*double-click the character you want
*click Copy
*return to your document
*click Paste


If that doesn't work, I'll ask JohnofWem or Richard Rodway to come and make suggestions. I think they'll know what to do.

My team description doesn't display correctly either. (I didn't write it.) The word should be món, but ó shows as a square.
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=201


old_user423803
Joined: 15 Nov 06
Posts: 3
Credit: 52,258
RAC: 0
Message 27931 - Posted: 16 Apr 2007, 19:52:20 UTC - in response to Message 27858.  

I tried this, but the problem is still the same. Within Climate Prediction the name is still correct, but within the Boincstats it is not correct.

It is still written in the wrong way.
You can see it here: http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

But it should belong to the right team:
http://www.boincstats.com/stats/boinc_team_graph.php?pr=bo&id=27493

For us it does not matter in which way the team-name is written within Climate-Prediction. For us it is important that the results of Climate Prediction belong to the right team within the boincstats.

Thank you for your help.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 27940 - Posted: 17 Apr 2007, 10:39:42 UTC

Here's a live link where the problem is obvious:

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

I'll see if I can get John and/or Richard to advise.

JohnofWem
Joined: 15 Feb 06
Posts: 16
Credit: 7,232,179
RAC: 9,636
Message 27943 - Posted: 17 Apr 2007, 13:13:08 UTC - in response to Message 27940.  
Last modified: 17 Apr 2007, 13:13:55 UTC

Here's a live link where the problem is obvious:

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

I'll see if I can get John and/or Richard to advise.


Hmm. Looks like a misinterpretation of the Unicode characters between the Windows fonts and international universal fonts. There is often more than one way of getting the same symbol, even in the same font set. They look the same to the viewer, but the underlying string of bytes representing the characters is different and often, as in this case, there are two bytes for a single non-ASCII character, the first byte indicating where to start counting in the table of more than 256 characters. All standard printable ASCII characters are numbered between 32 and 126, with extended ones up to 255. Some of these extended ones are accented letters, but these could be repeated later in the table, often more than once if they are used in different language sets; this is probably where the problem arises.

I have had this problem, especially with German characters, but I get round it by always using the same method for any string input, output, or matching, and by ensuring that Windows is always using the same international settings and fonts for a particular country-based database. Obviously you can't do this here, as you don't know what international setting is used for the string matching or the string input/output. The real problem is that Microsoft is American English based at its core. For British users this only means spelling some words like colour and programme differently, but for languages with accented or even different characters this can be much worse.
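
One concrete case of "same symbol, different bytes" is Unicode's precomposed versus combining forms. The sketch below is only an illustration of that idea and of using one consistent method for matching; the helper name `same_text` is made up for this example, and nothing in this thread shows that normalization is the cause of the team-name problem.

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS; renders the same

def same_text(left: str, right: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)

print(precomposed == decomposed)           # raw comparison of code points
print(same_text(precomposed, decomposed))  # comparison after normalization
print(precomposed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())
```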


Sorry if this isn't much help, but it may at least explain the problem. This might help, or try googling unicode for more information.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 27945 - Posted: 17 Apr 2007, 18:46:08 UTC

Following what John said, I wonder whether using this would help?

http://www.atm.ox.ac.uk/user/iwi/charmap.html

old_user221094
Joined: 19 Jan 07
Posts: 9
Credit: 2,233,821
RAC: 0
Message 27947 - Posted: 17 Apr 2007, 19:07:57 UTC
Last modified: 17 Apr 2007, 19:14:36 UTC

You're sending UTF-8 (a Unicode encoding) to the Boincstats site, but that site is probably expecting CP1252 (possibly ISO 8859-1). Solution: EITHER whatever is sending the data to Boincstats needs to send what that site expects (try CP1252), OR (better) Boincstats should support Unicode (UTF-8).

There's a Japanese term for this... mojibake :)

--Richard

<edit> Just verified. Boincstats is serving the page up encoded as ISO8859-1. And stuffing UTF-8 into it. A bit naughty! They should change the content="text/html; charset=iso-8859-1"> at the top of their served pages to content="text/html; charset=utf-8">
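
The effect of that charset declaration can be sketched in Python. This assumes the page body really does contain the UTF-8 bytes of the team name, and it uses Python's `iso-8859-1` codec as a stand-in for what a browser does (browsers may treat that label as Windows-1252, which gives the same result for these particular bytes). The `render` helper is hypothetical.

```python
team_name = "Universität der Bundeswehr München"
body = team_name.encode("utf-8")  # the bytes assumed to be in the served page

def render(body: bytes, declared_charset: str) -> str:
    """Decode page bytes the way a browser would for the declared charset."""
    return body.decode(declared_charset)

print(render(body, "iso-8859-1"))  # what the current declaration produces
print(render(body, "utf-8"))       # what the suggested declaration produces
```

The bytes are identical in both calls; only the declared charset changes what the reader sees.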

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 27952 - Posted: 17 Apr 2007, 19:41:33 UTC
Last modified: 17 Apr 2007, 19:42:46 UTC

Hi Richard

Does that mean there's no real list of code numbers that members can use to make their team names display properly on these boincstats pages?

If you confirm that this is the case, I'll post about the problem on the boinc_dev forum.

old_user221094
Joined: 19 Jan 07
Posts: 9
Credit: 2,233,821
RAC: 0
Message 27958 - Posted: 17 Apr 2007, 21:54:31 UTC

Sorry about the delay in reply, had to get my daughter to bed.

I wouldn't think so. It's definitely UTF-8 that's appearing on the boincstats pages, and it looks like the correct (2 byte) UTF-8 sequences are being used. Unfortunately the page is being served as an ISO8859-1 page, and as a result the 2 byte sequence is not being interpreted as one character, but as two.

I notice that the climateprediction page for that team is also an 8859-1 encoded page, but in this case the correct code values are being used. 'ä' is encoded as the single byte 0xE4 in 8859-1, and this is being used on the cpdn pages.

I don't know how the team name is getting propagated to the boincstats servers, but something on the way has translated that to UTF-8. The encoding for 'ä' in UTF-8 is the 2 byte sequence 0xC3 0xA4. However, if you read that as 8859-1, then instead of translating that sequence into the one character U+00E4 (ä) it gets viewed as the two 8859-1 characters 0xC3 and 0xA4. 0xC3 is a Ã, 0xA4 is a ¤.
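
These byte values can be checked in a few lines of Python. The `undo_mojibake` helper is a hypothetical name for this sketch; it reverses only this one kind of mix-up (UTF-8 read as ISO-8859-1) and is not a general repair tool.

```python
char = "ä"
latin1_bytes = char.encode("iso-8859-1")    # single byte 0xE4
utf8_bytes = char.encode("utf-8")           # two bytes 0xC3 0xA4
misread = utf8_bytes.decode("iso-8859-1")   # two characters instead of one

def undo_mojibake(text: str) -> str:
    """Reverse a UTF-8-read-as-ISO-8859-1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(latin1_bytes.hex(), utf8_bytes.hex())
print([f"U+{ord(c):04X}" for c in misread])
print(undo_mojibake(misread) == char)
```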

To fix the problem you need to make sure that whatever is sending the team names to boincstats is doing so in an encoding that boincstats understands. There's nothing at all wrong with UTF-8, and my preferred solution is for boincstats to use UTF-8 in its webpages. Not only would this fix this problem, it'd also allow teams (and names) to use any character, such as Japanese or Korean characters. That is quite impossible in 8859-1: there are only 256 code points in that character set, as opposed to about 1.1 million in Unicode (although I think only about 150,000 are currently in use).
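
As a sketch of the first option (the sender matching the receiver's encoding), the hypothetical function below encodes a name for an ISO-8859-1 receiver and falls back to numeric character references for anything that charset cannot hold. The function name and the Japanese sample name are made up for illustration; nothing here describes the real export code.

```python
def encode_for_receiver(name: str, charset: str = "iso-8859-1") -> bytes:
    """Encode a team name for a receiver that expects `charset`.

    Characters the charset cannot represent become XML numeric
    character references instead of raising an error.
    """
    return name.encode(charset, errors="xmlcharrefreplace")

german = encode_for_receiver("Universität der Bundeswehr München")
japanese = encode_for_receiver("チーム日本")  # hypothetical team name

print(german)    # umlauts fit in ISO-8859-1 as single bytes
print(japanese)  # every character falls back to a &#...; reference
```

The fallback shows why UTF-8 on the receiving side is the cleaner fix: an 8859-1 receiver can only carry such names as escapes.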

--Richard

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 27959 - Posted: 17 Apr 2007, 22:17:37 UTC

There are other stats sites, so I guess that a check on how they're handling this is also needed. It may just be BOINCstats.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 27981 - Posted: 18 Apr 2007, 20:53:56 UTC

With a lot of help from Richard, I've now posted about this problem on the boinc_dev forum:

http://boinc.berkeley.edu/dev/forum_thread.php?id=1734

[BOINCstats] Willy
Joined: 12 Aug 04
Posts: 36
Credit: 488,399
RAC: 0
Message 28078 - Posted: 23 Apr 2007, 20:50:12 UTC
Last modified: 23 Apr 2007, 20:50:31 UTC

I think the problem is in the way CPDN is exporting stats.

In the XML file are these lines:
<team>
 <id>5624</id>
 <type>6</type>
 <name>Universit&#239;&#191;&#189;t der Bundeswehr M&#239;&#191;&#189;nchen</name>
....


Notice the HTML codes. When you put the team name in an HTML file and view it in a browser, it translates to the wrong characters seen on BOINCstats.

The wrong characters are also seen on other stats sites.
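
To see what those numeric references turn into, here is a small Python sketch using the standard library's `html.unescape` as a stand-in for a browser's handling of character references (an approximation, not a model of any particular stats site):

```python
import html

# The <name> content exactly as quoted from the exported XML above.
exported_name = "Universit&#239;&#191;&#189;t der Bundeswehr M&#239;&#191;&#189;nchen"

shown = html.unescape(exported_name)
non_ascii = [f"U+{ord(c):04X}" for c in shown if ord(c) > 127]

print(shown)
print(non_ascii)
```

Each umlaut position comes out as the same three characters, so the original letters are no longer recoverable from this export.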

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28079 - Posted: 23 Apr 2007, 23:40:01 UTC
Last modified: 23 Apr 2007, 23:48:27 UTC

Hi Willy

What I don't understand is why the combinations that the member types for ä and ü, which must be two different combinations, both translate to the same string &#239;&#191;&#189;. This looks like a list of 3 items.

Richard Rodway and I submitted this problem as boinc Trac ticket #57

http://boinc.ssl.berkeley.edu/trac/query

We thought this was a boinc problem rather than a defect in the cpdn (and other project) software. I think I'd better ask Milo in Oxford to have a look at this thread.


old_user221094
Joined: 19 Jan 07
Posts: 9
Credit: 2,233,821
RAC: 0
Message 28712 - Posted: 15 May 2007, 15:14:57 UTC

Fossilised reply, but just for interest. That encoded sequence in the XML is the UTF-8 encoding of the Unicode U+FFFD, which is the 'replacement' character. It's used when you are trying to convert something to Unicode and that conversion failed. So in other words, whatever is generating that XML is trying to translate the a umlaut and u umlaut to UTF-8 and failing (maybe because it's assuming ASCII source or something?)
However, this doesn't explain what actually ended up in BOINCstats. Somehow the 'real' data got through to it, otherwise we'd have seen � in the team name on the pages, not Ã¤ (for the a umlaut).
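
One way such an export could arise, purely as a guess, is sketched below: ISO-8859-1 bytes are decoded as if they were UTF-8, each failure becomes U+FFFD, and the UTF-8 bytes of the result are then written out as one numeric reference per byte. The helper `bytes_as_char_refs` is hypothetical, and the real exporter may work quite differently.

```python
stored = "Universität der Bundeswehr München".encode("iso-8859-1")

# Assumed failure: the stored bytes are treated as UTF-8, so 0xE4 and 0xFC
# are invalid and each is replaced by U+FFFD.
as_unicode = stored.decode("utf-8", errors="replace")

def bytes_as_char_refs(text: str) -> str:
    """Write each non-ASCII UTF-8 byte of `text` as its own &#n; reference."""
    return "".join(f"&#{b};" if b > 127 else chr(b) for b in text.encode("utf-8"))

exported = bytes_as_char_refs(as_unicode)
print([f"U+{ord(c):04X}" for c in as_unicode if ord(c) > 127])
print(exported)
```

Under these assumptions ä and ü both collapse to the same three references, which would account for the two different letters producing one identical string.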

As a matter of interest I had a look through some Japanese team names. Most just use English names (probably because they worked out that Japanese names didn't work :)). I didn't find any with correctly displaying Japanese names; I did find some with names displaying the same symptoms as we see here (UTF-8 displayed as 8859-1).

All of this is too late to be of any interest I suspect, I've been way way too busy recently.

--Richard

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28713 - Posted: 15 May 2007, 19:29:56 UTC
Last modified: 15 May 2007, 19:38:13 UTC

Well, it looks as if we wasted our time submitting the problem to the wrong people/place. And Milo didn't get an answer to the query he added either. Here's the fate of our ticket - wontfix.

https://boinc.berkeley.edu/trac/ticket/57

Does anyone know who might be willing and able to fix this defect? I've reopened the ticket to ask.