QunheIBMer的共享空间 (QunheIBMer's shared space)
This is my work website; you are welcome here.
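Women in the eyes of a programmer. ---Reposted.
Some women are like Windows: excellent, but with too many security risks. Some women are like UNIX: she has a lot going for her, but not everyone can afford to play. Some women are like C#: pretty, but no good at housework. Some women are like C++: she quietly does a great many things for you. Some women are like Java: with only a little effort from you, she will serve you everywhere. Some women are like JavaScript: however careful you are with her, in the end nothing comes of it. Some women are like assembly: a lot of trouble, but sometimes you still have to ask it for help. Some women are like SQL: she brings enormous help to your development. Love is an infinite loop: once it starts executing, you are stuck in it. Falling in love with someone is a memory leak: you can never free it. When you truly love someone, that is a const qualifier: it will never change. A girlfriend is a private variable: only my class can access it. A lover is a pointer: be careful when you use it, or it will bring great disaster.

It is nearly the end of the Chinese lunar year. Every year I want to write a summary and a plan for the next year; about the abnormal 2007, however, I have a lot to say. Personally, I don't want to write. I just want to wait, and let the time pass slowly. The next year, 2008, is a big year for China and also for me. I have no plan; I just hope life becomes better and better. I would rewrite my life in 2007, if I could!

A brief introduction to Unicode. --Cited from someone else's blog.
Unicode is a character encoding standard.

Let's start with ASCII. ASCII is an encoding standard for English characters. Each ASCII character occupies 1 byte (8 bits), so a byte can represent at most 256 characters. English does not need that many, so normally only the first 128 are used (high bit 0), covering control characters, digits, upper- and lower-case letters, and some other symbols. The other 128 values, with the high bit set to 1, are called "extended ASCII" and usually hold box-drawing characters, some phonetic symbols, and other such symbols.

This encoding is obviously fine for English. (It can also handle French, German, and other Western European scripts, but not interchangeably with English.) For complex scripts such as Chinese or Arabic, however, 256 characters are clearly not enough.

So each country defined its own encoding standard. The Chinese one is called "GB2312-80". It is compatible with ASCII: it takes advantage of the fact that extended ASCII was never truly standardized, and represents one Chinese character with two extended-ASCII bytes.

This approach has problems. The biggest is that Chinese characters have no codes that truly belong to them alone. Although extended ASCII was never formally standardized, the PC had a de facto standard for it (the box-drawing characters), and a lot of software used those symbols to draw tables. When such software runs on a Chinese system, the box-drawing bytes are mistaken for Chinese characters and the layout is wrecked. Counting the characters in a mixed Chinese-English string is also awkward: we must check whether a byte is in the extended range, and whether the next byte is too, and only then "guess" that the pair might be one Chinese character.

In short, handling Chinese was painful in those days. More painful still, GB2312 was the mainland national standard, while Taiwan had its own Big5 standard, and many of its code values overlap with GB's, so... heh.

At this point it became clear that truly solving the Chinese problem could not start from extended ASCII, and could not be done by China alone. A brand-new encoding system was needed, one that considers Chinese, English, French, German, and every other script together and gives every character its own unique code, so that the problems above cannot occur.

And so Unicode was born.

Unicode has two sets of standards. One is UCS-2 (Unicode-16), which encodes characters in 2 bytes; the other is UCS-4 (Unicode-32), which encodes characters in 4 bytes.

Taking the commonly used UCS-2 as an example, it can represent 2^16 = 65536 characters, which is enough for essentially all European and American characters and the great majority of Asian ones.

UTF-8 is discussed later.

In Unicode, all characters are treated alike. A Chinese character is no longer "two extended-ASCII bytes" but "one Unicode character". Note that a Chinese character is now a single character, so problems such as splitting characters and counting characters are solved naturally.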
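To make the character-counting point concrete, here is a minimal Python sketch. It relies on Python's built-in `gb2312` codec; the helper name `count_chars_gb2312` is hypothetical and introduced only for illustration. It uses the same "guess" described above: any byte with the high bit set is assumed to be the first half of a two-byte Chinese character.

```python
def count_chars_gb2312(data: bytes) -> int:
    """Count characters in GB2312-encoded bytes by inspecting high bits."""
    count = 0
    i = 0
    while i < len(data):
        # A byte >= 0x80 is assumed to start a two-byte Chinese character;
        # anything else is treated as a single ASCII byte.
        i += 2 if data[i] >= 0x80 else 1
        count += 1
    return count


text = "中文abc"                 # two Chinese characters plus three ASCII letters
raw = text.encode("gb2312")      # legacy double-byte form

print(len(raw))                  # byte count of the GB2312 form
print(count_chars_gb2312(raw))   # characters recovered by the high-bit guess
print(len(text))                 # a Unicode string counts one item per character
```

With a Unicode string the count needs no guessing, which is the point the paragraph makes.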
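But the world is not ideal. Not every system could switch to Unicode overnight, so from the day it was born Unicode had to face a serious problem: incompatibility with the ASCII character set.

An ASCII character is a single byte; for example, the ASCII code of "A" is 65 (hex 41). Unicode is double-byte; for example, the Unicode code of "A" is 0041 (hex). This causes a very big problem: the machinery that used to process ASCII cannot be used to process Unicode.

An even more serious problem is that C uses '\0' as the string terminator, and in Unicode a great many characters have one byte equal to 0. C's string functions therefore cannot handle Unicode properly, unless every program written in C, and every library those programs use, is replaced.

So something even greater than Unicode was born. It is greater because it took Unicode off the paper and put it into all our computers. That thing is UTF.

UTF = UCS Transformation Format

It is a rule that maps Unicode code values to the actual encoding used inside the computer. Two UTFs are popular today: UTF-8 and UTF-16.

UTF-16 matches the Unicode encoding described above, so I will not say more about it. UTF-8 is different: it defines a "range rule" that stays as compatible with ASCII as possible.

UTF-8 is somewhat like Huffman coding:
Unicode characters in 00000000-0000007F are represented with a single byte;
characters in 00000080-000007FF are represented with two bytes;
characters in 00000800-0000FFFF are represented with three bytes.

Because the 16-bit Unicode specification of the time assigned no characters above FFFF, UTF-8 used at most 3 bytes per character. In theory the original design allowed up to 6 bytes per character (RFC 3629 has since limited UTF-8 to 4 bytes).

In UTF-8, English characters are encoded exactly as in ASCII, so existing libraries can continue to be used. Common Chinese characters lie in 4E00-9FFF, which falls in the 0800-FFFF range, so they take 3 bytes each (and these bytes are unrelated to the two bytes of the GB encoding). A dedicated Unicode-handling class can be used to process UTF encodings.

Now for the Chinese problem.

For historical reasons, three Chinese encoding standards existed before Unicode.

GB2312-80 is the national standard used in mainland China; it encodes 6763 commonly used simplified Chinese characters. Big5 is the standard used in Taiwan; it encodes the traditional characters used there, a little over 8,000 of them. HKSCS is the standard used in Hong Kong; it is also traditional, but differs somewhat from Big5.

All three use the two-extended-ASCII-bytes method, so they are mutually incompatible and their code ranges differ as well.

Because of this incompatibility, displaying GB and Big5 text together on one system was essentially impossible. Software of the time such as NJStar (南极星) and RichWin put a lot of effort into automatically detecting the Chinese encoding and displaying it correctly.

I do not know what techniques they used; I recall that NJStar once advertised showing simplified and traditional Chinese on the same screen as a selling point.

Later, for various reasons, the unified Chinese character sets GBK and GB18030 were defined. GBK has been implemented on Windows, Linux, and other operating systems.

GBK is compatible with GB2312, adds a large number of rarely used characters, and includes nearly all the traditional characters in Big5. However, the code values of those traditional characters in GBK are almost entirely incompatible with their Big5 values.

GB18030 is effectively a superset of GBK and contains even more characters. As far as the author knew at the time of writing, no operating system supported GB18030 directly.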
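Returning to the UTF-8 range rule described earlier in this post, the following is a minimal Python sketch of it. The function name `utf8_length` is hypothetical. It assumes the modern four-byte limit from RFC 3629 rather than the three-byte limit the text describes for 16-bit Unicode, and it compares the rule with Python's own UTF-8 encoder. The last two lines illustrate the '\0' problem: a 16-bit encoding of "A" contains a zero byte, while UTF-8 does not.

```python
def utf8_length(code_point: int) -> int:
    """Number of UTF-8 bytes for a code point, following the range rule."""
    if code_point < 0x80:        # 0000-007F: same single byte as ASCII
        return 1
    if code_point < 0x800:       # 0080-07FF
        return 2
    if code_point < 0x10000:     # 0800-FFFF (includes common Chinese characters)
        return 3
    return 4                     # above FFFF (assumption: RFC 3629 limit)


for ch in ("A", "é", "汉"):
    cp = ord(ch)
    encoded = ch.encode("utf-8")
    print(f"U+{cp:04X}: rule says {utf8_length(cp)} byte(s), bytes are {encoded.hex(' ')}")

# A 16-bit encoding puts a zero byte inside "A"; UTF-8 keeps it as plain ASCII.
print("A".encode("utf-16-le"))
print("A".encode("utf-8"))
```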
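Question 1: Long ago I noticed that txt files saved as Unicode, Unicode big endian, and UTF-8 have a few extra bytes at the start: FF FE (Unicode), FE FF (Unicode big endian), and EF BB BF (UTF-8). But what standard are these markers based on?

Question 2:

0. big endian and little endian
The word "endian" comes from Gulliver's Travels. The civil war in Lilliput began over whether an egg should be cracked at the big end (Big-Endian) or the little end (Little-Endian). Six rebellions resulted; one emperor lost his life and another lost his crown.

We usually translate "endian" as "byte order" (字节序), and call big endian and little endian "大尾" and "小尾".

1. Character encodings and internal codes, with a side introduction to Chinese encodings
GB2312 (1980) contains 7445 characters in total: 6763 Chinese characters and 682 other symbols. The internal codes of the Chinese character area run from B0 to F7 in the high byte and from A1 to FE in the low byte, occupying 72*94 = 6768 code positions. Five of them, D7FA-D7FE, are empty.

GB2312 supports too few Chinese characters. The 1995 extension specification GBK 1.0 contains 21886 symbols, divided into a Chinese character area and a graphic symbol area; the Chinese character area holds 21003 characters. GB18030 (2000) is the official national standard that replaced GBK 1.0. It contains 27484 Chinese characters and also covers the scripts of the main ethnic minorities, such as Tibetan, Mongolian, and Uyghur. PC platforms are now required to support GB18030; embedded products are exempt for the time being, so phones and MP3 players generally support only GB2312.

From ASCII through GB2312 and GBK to GB18030, these encodings are backward compatible: the same character always has the same code in each scheme, and the later standards support more characters. In these encodings English and Chinese can be processed uniformly. A Chinese code is recognized by the high bit of its high byte not being 0. In programmers' terms, GB2312, GBK, and GB18030 all belong to the double-byte character sets (DBCS); strictly speaking, GB18030 also defines four-byte codes.

The default internal code of some Chinese versions of Windows is still GBK, which can be upgraded to GB18030 with the GB18030 upgrade pack. However, ordinary users rarely need the characters that GB18030 adds over GBK, so we usually still say GBK when referring to the internal code of Chinese Windows.

A few more details:
The GB2312 standard itself is written in terms of zone-position codes (区位码). To convert a zone-position code to an internal code, add A0 to both the high byte and the low byte.
In DBCS, GB internal codes are always stored big endian, that is, high byte first.
In GB2312 the high bit of both bytes is 1, but only 128*128 = 16384 code positions satisfy that condition, so in GBK and GB18030 the high bit of the low byte may not be 1. This does not affect parsing of a DBCS stream: when reading the stream, whenever a byte with its high bit set is encountered, that byte and the next can be taken together as one double-byte code, regardless of the high bit of the low byte.

2. Unicode, UCS, and UTF
Unicode is also a character encoding method, but it was designed by international organizations and can accommodate all the world's scripts. The formal name of Unicode is "Universal Multiple-Octet Coded Character Set", abbreviated UCS. UCS can also be read as an abbreviation of "Unicode Character Set".

According to Wikipedia (http://zh.wikipedia.org/wiki/), two organizations historically tried to design Unicode independently: the International Organization for Standardization (ISO) and an association of software manufacturers (unicode.org). ISO developed the ISO 10646 project, and the Unicode Consortium developed the Unicode project.

Around 1991, both sides recognized that the world did not need two incompatible character sets. They began to merge their work and to cooperate on a single code table. Starting with Unicode 2.0, the Unicode project adopted the same repertoire and code values as ISO 10646-1.

Both projects still exist and publish their standards independently. At the time of writing, the latest version from the Unicode Consortium was Unicode 4.1.0 (2005), and the latest ISO standard was 10646-3:2003.

UCS specifies how to represent the various scripts with multiple bytes. How these codes are transmitted is specified by the UTF (UCS Transformation Format) specifications; common ones include UTF-8, UTF-7, and UTF-16.

The IETF's RFC2781 and RFC3629 describe the UTF-16 and UTF-8 encoding methods in the usual RFC style: clear, brisk, and still rigorous. I can never remember that IETF stands for Internet Engineering Task Force, but the RFCs maintained by the IETF are the foundation of every specification on the Internet.

3. UCS-2, UCS-4, BMP
UCS has two formats: UCS-2 and UCS-4. As the names suggest, UCS-2 encodes with two bytes and UCS-4 with four bytes (in fact only 31 bits are used; the highest bit must be 0). Let's do some simple arithmetic:

UCS-2 has 2^16 = 65536 code positions, and UCS-4 has 2^31 = 2147483648 code positions.

UCS-4 is divided into 2^7 = 128 groups according to the highest byte (whose highest bit is 0). Each group is divided into 256 planes according to the second-highest byte. Each plane is divided into 256 rows according to the third byte, and each row contains 256 cells. Cells in the same row differ only in the last byte.

Plane 0 of group 0 is called the Basic Multilingual Plane, or BMP. In other words, the UCS-4 code positions whose two high bytes are 0 are called the BMP.

Remove the two leading zero bytes from the UCS-4 BMP and you get UCS-2. Put two zero bytes in front of the two bytes of UCS-2 and you get the UCS-4 BMP. The article states that the UCS-4 specification of its day had no characters assigned outside the BMP; in fact, Unicode has assigned characters outside the BMP since version 3.1.

4. UTF encodings
UTF-8 encodes UCS in units of 8 bits. The mapping from UCS-2 to UTF-8 is as follows:

UCS-2 code (hex)    UTF-8 byte stream (binary)
0000 - 007F         0xxxxxxx
0080 - 07FF         110xxxxx 10xxxxxx
0800 - FFFF         1110xxxx 10xxxxxx 10xxxxxx

For example, the Unicode code of the character "汉" is 6C49. 6C49 lies between 0800 and FFFF, so the 3-byte template must be used: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 6C49 is 0110 110001 001001. Substituting this bit stream for the x's in the template gives 11100110 10110001 10001001, that is, E6 B1 89.

Readers can use Notepad to check that this encoding is correct.

UTF-16 encodes UCS in units of 16 bits. For UCS codes below 0x10000, the UTF-16 encoding is simply the 16-bit unsigned integer equal to the UCS code. For UCS codes of 0x10000 and above, an algorithm is defined. Since the codes in UCS-2, or in the BMP of UCS-4, are necessarily below 0x10000, UTF-16 and UCS-2 can for now be considered essentially the same. But UCS-2 is only an encoding scheme, whereas UTF-16 is used for actual transmission, so byte order has to be considered.

5. UTF byte order and the BOM
The method the Unicode specification recommends for marking byte order is the BOM. This BOM is not the "Bill Of Material" BOM, but the Byte Order Mark. The BOM is a rather clever idea:

UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in actual transmission. The UCS specification suggests transmitting the character "ZERO WIDTH NO-BREAK SPACE" before transmitting the byte stream.

If the receiver gets FEFF, the byte stream is Big-Endian; if it gets FFFE, the byte stream is Little-Endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF (readers can verify this with the encoding method introduced above). So if the receiver gets a byte stream that begins with EF BB BF, it knows the stream is UTF-8.

Windows uses the BOM to mark the encoding of text files.

6. Further references
I also found two references that look good, but since my original questions had been answered, I did not read them:

"Understanding Unicode: A general introduction to the Unicode Standard" (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter04a)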
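The worked example for "汉" in section 4 and the BOM convention in section 5 can be sketched in a few lines of Python. This is an illustration only: `encode_3byte` and `sniff_bom` are hypothetical helper names, the encoder handles just the three-byte range and does not exclude surrogates, and the sniffing ignores UTF-32 (whose little-endian BOM also begins with FF FE). The BOM constants come from Python's standard `codecs` module.

```python
import codecs


def encode_3byte(code_point: int) -> bytes:
    """Fill the template 1110xxxx 10xxxxxx 10xxxxxx for a code in 0800-FFFF."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles the three-byte range")
    return bytes([
        0xE0 | (code_point >> 12),           # top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),   # middle 6 bits
        0x80 | (code_point & 0x3F),          # low 6 bits
    ])


def sniff_bom(data: bytes):
    """Guess an encoding from a leading BOM; return None if there is none."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None


print(encode_3byte(0x6C49).hex(" "))   # e6 b1 89, the bytes for 汉
print(encode_3byte(0xFEFF).hex(" "))   # ef bb bf, the UTF-8 form of the BOM
print(sniff_bom(b"\xff\xfeA\x00"))     # utf-16-le
```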
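Time
PP: But what can you actually do with time? Hehe
Me: Time is not for "doing" (做); it is for "getting things done" (干). Ah, youth!
PP: Sweat... aren't 做 and 干 the same?
Me: The tone is different, and so is the attitude!
Me: Sit (坐) and "do" (做), and the time is gone!
Me: Dare (敢) and "get it done" (干), and there is plenty of time!
Me: And it is time that belongs to you.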
Life is not easy to stand.
Too many things happened this month, and I am glad to have learned this fact: "Life is not easy and I have to try my best."
After all that has happened, I think I am a lucky dog with a group of good friends.
However, I cannot push my luck, for I know I am only a small boat.
It is hard for me to sail the terrifying oceans and seas, so I have to take care of myself.
Keep an active eye on the changeable world.
Thanks to all, and thanks to my dear friends who helped me get through this darkest period.
Thank you, my dear buddies and sisters; thank you, my friends.
I will try my best to be mature. I won't let you down, my family.
Stay foolish, stay hungry. Stay active and passive.

Swim and English--the 2 I love most.
Swim and English
Author: Adam Zhang
Just as Karl Marx said: “All common things in the world have relationships between each other.” Today, I will show you the relationship between swimming and English, the two things I love most.
Swimming is a sporting skill; English is a communication ability.
There are different styles of swimming: breaststroke, backstroke, butterfly, etc. There are distinctive accents of English: British, American, Indian, and so on.
It is hard to take a breath in the water; it is not easy to answer a question in English.
Rhythm and coordination are very important to swimming; intonation and grammar are the key to English.
Swimming lets you enjoy yourself in the water; English lets you meet friends around the world.
With swimming, you gain a strong body and can survive the southern floods; with English, you win huge opportunities and can thrive in the outsourcing field.
As for learning, coaches raise you into an excellent swimmer; teachers bring you up into an erudite linguist.
Fish swim; the British speak (English).
For us Chinese, at first most of us take the lack of a language environment as an excuse. However, we can win Olympic gold in swimming, so trust that mastering excellent English is not a dream!
To learn to swim, we change the plain ground into a swimming pool, imagine we are in the water to learn and practice the strokes, and finally feel the improvement in our swimming ability; then the next round of improvement begins. For English, it is the same: change our ordinary room into a language laboratory, imagine speaking to a foreigner while talking to the wall, and feel the achievement in our English. Imagine, change, and feel. This is a running cycle. Besides my testing career, swimming and English are almost the same. (Note: This is the passage I used in the final match of the 3rd English Contest. However, the passage was a little too long, and I am sorry that I failed to finish it on time. Keep on and come on.) ----Swallow my pride, prove my worth, and start again and again.

Imagine, change and feel about testing career.
Imagine, Change and Feel about Testing Career
Author: Adam Zhang
There are three words that give me much more than I expected: imagine, change, and feel. These three great words give me more and more passion in my daily life. In the end, they made me love my career as a tester deep inside every cell of my body.
Imagine is a verb about the future, for it paints a brilliant and glorious picture of my life. Imagine is a verb about sightseeing, for it turns different offices into beautiful scenic spots. Imagine is a verb about imagination, for you can imagine anything in this imaginative world.
Change is a verb about attitude, for it changes our female colleagues’ attitude toward bugs: they love bugs instead of avoiding them. When they come across a bug, maybe they are still scared enough to shout; however, they shout “Wow, golden beetles!” instead of a shrill “Help!” Change is a verb about life, for it changes the lives of people who are willing to start their careers; it changes the bachelors’ bachelor status.
Feel is a verb about collection: it brings the results of imagination and change together. Feel as you imagine; feel as you change; feel as your clients feel. Finally, feel the happiness in your success and feel the success in your happy life.
Let’s take the iPhone as an example. Its success does not depend on technology; rather, the iPhone is the beautiful and wonderful child of these three words, imagine, change, and feel: imagine a new model, change the way it is operated, and feel the wonderful feelings of a cell phone, as both developers and users. Feel and enjoy, and then imagine and change again. (Note: This is my passage from the trial match of the 3rd English Contest of Beyondsoft, and also a summary of my testing career over the past 2 years.)

My feelings about Beijing over the past two years.
In the blink of an eye, time has flown by; it has been more than two years since I came to Beijing. Thinking it over carefully, I have many feelings: so many people coming and going, so many joys and sorrows, meetings and partings, have dissolved into the tunnel of time and turned into threads of memory.

Daylight saving time in 2007.
The U.S. Energy Policy Act of 2005, passed by the U.S. Congress in July 2005, extended Daylight Saving Time (DST) in the U.S. by approximately four weeks. As a result, beginning in 2007, DST will start three weeks earlier, on March 11, 2007, and end one week later, on November 4, 2007, resulting in a new DST period that is four weeks longer than previously observed. These four weeks are referred to in this article as the "extended DST period". Visit MSN Encarta for more general information on DST.
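The dates quoted above can be checked with a little date arithmetic. The sketch below uses only Python's standard `datetime` module. The rules it encodes are assumptions not stated in the cited text: that the new schedule runs from the second Sunday in March to the first Sunday in November, and that the previous schedule ran from the first Sunday in April to the last Sunday in October. The helper names `nth_sunday` and `last_sunday` are hypothetical.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday of the given month (n starts at 1)."""
    first = date(year, month, 1)
    # weekday(): Monday is 0 and Sunday is 6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    # First day of the following month, then step back one day.
    following = date(year + (month == 12), month % 12 + 1, 1)
    last_day = following - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


# Assumed rules, as described in the lead-in.
new_start = nth_sunday(2007, 3, 2)    # second Sunday in March
new_end = nth_sunday(2007, 11, 1)     # first Sunday in November
old_start = nth_sunday(2007, 4, 1)    # first Sunday in April
old_end = last_sunday(2007, 10)       # last Sunday in October

print(new_start, new_end)
print((old_start - new_start).days // 7, "weeks earlier")
print((new_end - old_end).days // 7, "week later")
```

Under those assumed rules the computed 2007 dates match the ones in the cited text: a start three weeks earlier and an end one week later, four extra weeks in total.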
Unless certain updates are applied to your computer, the time zone settings for your computer's system clock may be incorrect during this four-week period. This depends on where you live and which time zone you have selected. To see the time zone settings on your computer, follow these directions. When your time zone settings are incorrect, your clock may be off by one hour, and certain applications running on your Windows-based computer may not display the correct time. To address this, Microsoft is providing many free updates and tools that will update your system automatically. While the change in daylight saving time applies to the U.S. and Canada, it may also impact customers based outside North America. Companies or organizations with operations, customers, or vendors based in North America may be affected. In addition, customers who interact or integrate with systems that are based in North America or that rely on date/time calculations may be impacted. Customers who live outside North America and yet are impacted should follow the guidance provided on this site to prepare for the adjusted daylight saving time.
Cited from www.microsoft.com

Select your star sign, and remember the names of the Zodiac.
<<Chinese is about feeling; English is about association>>
---Word roots and association are very useful in language learning,
and my Indian colleague Vivek likewise memorizes words by association:
How to remember the names of the Zodiac: Aries (a for the first letter), Taurus (from the sound I can imagine how strong the man is), Gemini (double i), Cancer (my own sign; I feel angry about the other meaning of cancer, and so I remember it), Leo (for the warrior's name, Aiolia, 艾奥利亚), Virgo (for virgin), Libra (this is from bra, linked with its two shields), Scorpio (remember it by heart), Sagittarius (horse and man, so it is the longest word in the Zodiac; S for shoot, and tarius links to Taurus, cow with horse), Capricorn (the sheep needs corn under the cap), Aquarius (qu from liquid), Pisces (Ph for f, and then you can turn it into fishes). After this one night of imagination, I remember these words by heart! That is great!
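If you were born on the first or last day of a Sun sign, in astrological terms you were born on a cusp. If that's the case, you will probably benefit from reading your own Sun sign and the Sun sign that ends or begins right before or after your date of birth. For example, if your birth date is 22 December, your Sun sign is Capricorn, but you probably have some Sagittarian traits as well. BTW, there is a very good dictionary, <<Word Power Made Easy>>; try your best to use this book.

Some happy things from 2006.
The bells of 2007 are already drifting behind us, and the Spring Festival draws ever nearer. It has been more than a year since I graduated. In this busy and winding, wonderful and painful 2006, my zodiac year (本命年) has finally passed, and I send it off with both hands. I hope the coming year will be as good: that I will have my wife, a home in Beijing, and a car to travel in, and that I will spend a happy new year. Although there were quite a few sad things, here I will talk only about the happy ones, because I believe happiness brings more happiness. The events are in no particular order, just as they come to mind.
That's all I will write. Happy things are best enjoyed quietly! Haha. Looking back on the past to guide the coming year, I hope I keep soaring. Eagles fly high in the sky!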