teamname - encoding problem
Author | Message |
---|---|
Joined: 15 Nov 06 Posts: 3 Credit: 52,258 RAC: 0 |
Hello, my team has a problem with the encoding of its name, "Universität der Bundeswehr München": http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624. On Boincstats the name is shown as "UniversitÃ¤t der Bundeswehr MÃ¼nchen": http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624 But it should belong to the right team: http://www.boincstats.com/stats/boinc_team_graph.php?pr=bo&id=27493 My team had the same problem with QMC. Only a QMC administrator was able to solve it. Can anybody help? |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
For ä have you tried both ALT + 132 on the number keypad, and ALT + 0228? For ü try both ALT + 129 and ALT + 0252. Let us know whether one of those combinations works. (You also have to press the Number Lock key.) Cpdn news |
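The four key combinations above can be checked against the code pages they draw from. This is a minimal sketch, assuming that Alt codes without a leading zero are looked up in the OEM code page (taken here to be CP437) and that codes with a leading zero are looked up in the Windows ANSI code page (taken here to be CP1252); the actual code pages depend on the regional settings of the machine. The names `ALT_CODES` and `alt_code_to_char` are hypothetical.

```python
# Map each Alt combination to (numeric value, assumed code page).
# Assumption: no leading zero -> OEM code page (CP437 here),
# leading zero -> Windows ANSI code page (CP1252 here).
ALT_CODES = {
    "ALT+132": (132, "cp437"),
    "ALT+0228": (228, "cp1252"),
    "ALT+129": (129, "cp437"),
    "ALT+0252": (252, "cp1252"),
}


def alt_code_to_char(code: str) -> str:
    """Return the character that the given Alt combination produces."""
    value, codepage = ALT_CODES[code]
    return bytes([value]).decode(codepage)


for code in ALT_CODES:
    print(code, "->", alt_code_to_char(code))
```

Under these assumptions both combinations for each letter yield the same character, so the choice of Alt code only changes how the letter is typed, not what the server later does with it.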
Joined: 15 Nov 06 Posts: 3 Credit: 52,258 RAC: 0 |
I tried these combinations, with the result that the team name was written in another way, but still not the correct one. Can you help? Thank you |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Sorry about that. On this page I see it correctly: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624 Try this instead: open the Windows Start menu > Programs > Accessories > System Tools > Character Map (or you may need to click Start, Run, then type Charmap). In the character map, choose a font, double-click the character you want, click Copy, return to your document, and click Paste. If that doesn't work, I'll ask JohnofWem or Richard Rodway to come and make suggestions. I think they'll know what to do. My team description doesn't display correctly either. (I didn't write it.) The word should be món, but ó shows as a square. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=201 Cpdn news |
Joined: 15 Nov 06 Posts: 3 Credit: 52,258 RAC: 0 |
I tried this, but the problem is still the same. On Climate Prediction the name is still correct, but on Boincstats it is still written the wrong way. You can see it here: http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624 But it should belong to the right team: http://www.boincstats.com/stats/boinc_team_graph.php?pr=bo&id=27493 It does not matter to us how the team name is written on Climate Prediction. What matters is that our Climate Prediction results are credited to the right team on Boincstats. Thank you for your help. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Here's a live link where the problem is obvious: http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624 I'll see if I can get John and/or Richard to advise. Cpdn news |
Joined: 15 Feb 06 Posts: 16 Credit: 7,232,179 RAC: 9,636 |
"Here's a live link where the problem is obvious" Hmm. Looks like a misinterpretation of the Unicode characters between the Windows fonts and international universal fonts. There is often more than one way of getting the same symbol, even in the same font set. They look the same to the viewer, but the underlying string of bytes representing the characters is different and often, as in this case, there are two bytes for a single non-ASCII character, the first byte indicating where to start counting in the table of more than 256 characters. Standard ASCII characters are numbered from 0 to 127 (the printable ones from 32 to 126), with extended 8-bit sets running up to 255. Some of these extended ones are accented letters, but these could be repeated later in the table, often more than once if they are used in different language sets; this is probably where the problem arises. I have had this problem, especially with German characters, but I get round it by always using the same method for any string input, output or matches, and by ensuring that Windows is always using the same international settings and fonts for a particular country-based database. Obviously you can't do this here, as you don't know what international setting is used for the string matching or the string input/output. The real problem is that Microsoft is American English based at its core. For British users this only means spelling some words like colour and programme differently, but for languages with accented or even different characters this can be much worse. Sorry if this isn't much help, but it may at least explain the problem. This might help, or try googling unicode for more information. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Following what John said, I wonder whether using this would help? http://www.atm.ox.ac.uk/user/iwi/charmap.html Cpdn news |
Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
You're sending UTF-8 (a Unicode encoding) to the Boincstats site, but that site is expecting (probably) CP1252 (possibly ISO 8859-1). Solution: EITHER whatever is sending the data to Boincstats needs to send what that site expects (try CP1252), OR (better) Boincstats should support Unicode (UTF-8). There's a Japanese term for this... mojibake :) --Richard <edit> Just verified. Boincstats is serving the page up encoded as ISO 8859-1, and stuffing UTF-8 into it. A bit naughty! They should change the content="text/html; charset=iso-8859-1"> at the top of their served pages to content="text/html; charset=utf-8"> |
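The mismatch described here can be reproduced in a few lines. This is a minimal sketch, not a description of how either site actually processes the data: it takes the team name from this thread, encodes it as UTF-8, and then decodes the same bytes under each charset a page might declare. The variable names are illustrative only.

```python
# The team name as the members intend it.
team_name = "Universität der Bundeswehr München"

# What gets sent: the UTF-8 byte sequence for the name.
utf8_bytes = team_name.encode("utf-8")

# What a browser shows if the page declares charset=iso-8859-1
# (or CP1252) but the body contains UTF-8 bytes.
as_latin1 = utf8_bytes.decode("iso-8859-1")
as_cp1252 = utf8_bytes.decode("cp1252")

# What a browser shows if the page declares charset=utf-8.
as_utf8 = utf8_bytes.decode("utf-8")

print(as_latin1)  # UniversitÃ¤t der Bundeswehr MÃ¼nchen
print(as_cp1252)  # same result for these particular bytes
print(as_utf8)    # Universität der Bundeswehr München
```

For this name ISO 8859-1 and CP1252 give the same garbled result, so the displayed page alone does not reveal which of the two the site assumes; only the declared charset does.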
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Richard, does that mean there's no real list of code numbers that members can use to make their team names display properly on these boincstats pages? If you confirm that this is the case, I'll post about the problem on the boinc_dev forum. Cpdn news |
Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
Sorry about the delay in replying, I had to get my daughter to bed. I wouldn't think so. It's definitely UTF-8 that's appearing on the boincstats pages, and it looks like the correct (two-byte) UTF-8 sequences are being used. Unfortunately the page is being served as an ISO 8859-1 page, and as a result the two-byte sequence is being interpreted not as one character but as two. I notice that the climateprediction page for that team is also an 8859-1 encoded page, but in this case the correct code values are being used. 'ä' is encoded as the single byte 0xE4 in 8859-1, and this is being used on the cpdn pages. I don't know how the team name is getting propagated to the boincstats servers, but something along the way has translated it to UTF-8. The encoding for 'ä' in UTF-8 is the two-byte sequence 0xC3 0xA4. However, if you read that as 8859-1, then instead of translating that sequence into the one character U+00E4 (ä) it gets viewed as the two 8859-1 characters 0xC3 and 0xA4. 0xC3 is a Ã, 0xA4 is a ¤. To fix the problem you need to make sure that whatever is sending the team names to boincstats is doing so in an encoding that boincstats understands. There's nothing at all wrong with UTF-8, and my preferred solution is for boincstats to use UTF-8 in its webpages. Not only would this fix this problem, it'd also allow teams (and names) to use any character, such as Japanese or Korean characters... which is quite impossible in 8859-1: there are only 256 characters in that character set, as opposed to about 1.1 million in Unicode... (although I think only about 150,000 are currently in use) --Richard |
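The byte values quoted above can be verified directly, and the same reasoning run backwards gives a way to undo the damage. This is a minimal sketch: `repair_mojibake` is a hypothetical helper, not something either site is known to use, and it only handles the specific case of UTF-8 bytes that were decoded as ISO 8859-1.

```python
def repair_mojibake(text: str) -> str:
    """Reverse a UTF-8-read-as-ISO-8859-1 mix-up, if that is what happened.

    The wrong decode step is undone by re-encoding as ISO 8859-1, then the
    resulting bytes are decoded as UTF-8. Text that does not fit this
    pattern is returned unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


a_umlaut = "\u00e4"  # ä

print(a_umlaut.encode("iso-8859-1").hex())  # e4
print(a_umlaut.encode("utf-8").hex())       # c3a4

# Reading the two UTF-8 bytes as ISO 8859-1 yields two separate characters.
garbled = a_umlaut.encode("utf-8").decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in garbled])  # ['U+00C3', 'U+00A4']

print(repair_mojibake("Universit\u00c3\u00a4t der Bundeswehr M\u00c3\u00bcnchen"))
```

A correctly encoded name such as "München" passes through unchanged, because its ISO 8859-1 bytes are not valid UTF-8. The helper would, however, also rewrite any text that legitimately contains a sequence like Ã¤, so it is only safe where the mix-up is known to have happened.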
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There are other stats sites, so I guess that a check on how they're handling this is also needed. It may just be BOINCstats. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
With a lot of help from Richard, I've now posted about this problem on the boinc_dev forum http://boinc.berkeley.edu/dev/forum_thread.php?id=1734 Cpdn news |
Joined: 12 Aug 04 Posts: 36 Credit: 488,399 RAC: 0 |
I think the problem is in the way CPDN is exporting stats. In the XML file are these lines: <team> <id>5624</id> <type>6</type> <name>Universit�t der Bundeswehr M�nchen</name> .... Notice the HTML codes. When you put the team name in an HTML file and view it in a browser, it translates to the wrong characters seen on BOINCstats. The wrong characters are also seen on other stats sites. BOINCstats |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Willy, what I don't understand is why the combinations that the member types for ä and ü, which must be two different combinations, both translate to the same string �. This looks like a list of 3 items. Richard Rodway and I submitted this problem as boinc Trac ticket #57 http://boinc.ssl.berkeley.edu/trac/query We thought this was a boinc problem rather than a defect in the cpdn (and other project) software. I think I'd better ask Milo in Oxford to have a look at this thread. Cpdn news |
Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
Fossilised reply, but just for interest. That encoded sequence in the XML is the UTF-8 encoding of the Unicode character U+FFFD, which is the 'replacement' character. It's used when you are trying to convert something to Unicode and that conversion has failed. So in other words, whatever is generating that XML is trying to translate the a umlaut and u umlaut to UTF-8 and failing (maybe because it's assuming an ASCII source or something?). However, this doesn't explain what actually ended up in BOINCstats. Somehow the 'real' data got through to it, otherwise we'd have seen � in the team name on the pages, not Ã¤ (for the a umlaut). As a matter of interest, I had a look through some Japanese team names. Most just use English names (probably because they worked out that Japanese names didn't work :)). I didn't find any with correctly displaying Japanese names, but I did find some with names displaying the same symptoms as we see here (UTF-8 displayed as 8859-1). All of this is too late to be of any interest, I suspect; I've been way, way too busy recently. --Richard |
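The failed conversion described here also explains why ä and ü end up as the same string. This is a minimal sketch under a stated assumption: the exporter is imagined to treat the stored ISO 8859-1 bytes as ASCII and substitute U+FFFD for anything it cannot decode. How the real export script works is not known from this thread.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# The name as stored on an ISO 8859-1 page: one byte per accented letter.
latin1_bytes = "Universität der Bundeswehr München".encode("iso-8859-1")

# Assumption: the exporter decodes as ASCII and replaces undecodable bytes.
lossy = latin1_bytes.decode("ascii", errors="replace")
print(lossy)

# Both accented letters collapse to the same character, so the original
# letters cannot be recovered from the exported name.
print(lossy.count(REPLACEMENT))  # 2

# U+FFFD is three bytes in UTF-8 ...
print(REPLACEMENT.encode("utf-8").hex())  # efbfbd

# ... which show up as three separate characters on an ISO 8859-1 page.
print(REPLACEMENT.encode("utf-8").decode("iso-8859-1"))  # ï¿½
```

Unlike the UTF-8-as-8859-1 case, this loss is not reversible: once a byte has been replaced by U+FFFD, nothing in the output records whether it was ä, ü, or any other non-ASCII letter.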
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Well, it looks as if we wasted our time submitting the problem to the wrong people/place. And Milo didn't get an answer to the query he added either. Here's the fate of our ticket - wontfix. https://boinc.berkeley.edu/trac/ticket/57 Does anyone know who might be willing and able to fix this defect? I've reopened the ticket to ask. Cpdn news |