Five things every developer should know about Unicode to prevent fraud

In November 2017, the BBC reported on a fake WhatsApp app. The fake app appeared to come from the same developer as the official one. It turned out that the scammers had bypassed validation by adding a non-printable Unicode space to the developer's name. The fake app was downloaded more than 1 million times before the Google Play maintainers discovered it.

Unicode is an extremely valuable standard that allows computers, smartphones, and watches to display the same message in the same way around the world. Unfortunately, its complexity makes it a gold mine for scammers and pranksters. If a giant like Google can't defend itself against basic problems caused by Unicode, then for a smaller company this may look like a losing battle. However, most of these issues revolve around a handful of exploits. Here are the top five things that all developers should know about Unicode and use to prevent fraud.

1. Many Unicode code points are not visible

Unicode has several zero-width code points, such as the zero-width joiner (U+200D) and the zero-width non-joiner (U+200C), both of which are formatting hints for text-layout tools. Zero-width code points have no visible effect on the screen, but they still affect string comparisons. This is why the fake WhatsApp app could go undiscovered for such a long time. Most of these characters are in the General Punctuation block (U+2000 to U+206F). In general, there is no reason to allow anyone to use code points from this block in an identifier, so they are the easiest to filter out. However, there are other invisible code points outside that block, such as the Mongolian vowel separator (U+180E).

In general, it is dangerous to enforce uniqueness constraints on Unicode text with simple string comparisons. A possible workaround is to limit the character set allowed in identifiers and in any other data that scammers could abuse. Unfortunately, this is not a complete solution to the problem.
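As a rough illustration of such filtering, the sketch below flags code points whose Unicode general category suggests they render as nothing or as blank space. The function name `hidden_code_points` and the choice of categories are assumptions made for this example, not a complete or authoritative list.

```python
import unicodedata

# General categories that usually render as nothing or as blank space:
# Cf = format controls (e.g. zero-width joiner), Cc = control characters,
# Zl/Zp = line and paragraph separators, Zs = space separators.
SUSPICIOUS_CATEGORIES = {"Cf", "Cc", "Zl", "Zp", "Zs"}


def hidden_code_points(identifier, allow_plain_space=True):
    """Return invisible or blank code points in identifier as 'U+XXXX' strings."""
    found = []
    for ch in identifier:
        if ch == " " and allow_plain_space:
            continue  # an ordinary ASCII space is visible enough to allow
        if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
            found.append(f"U+{ord(ch):04X}")
    return found
```

A name would then be rejected, or at least queued for manual review, whenever the returned list is not empty.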

2. A lot of code points look very similar

Unicode strives to cover all the symbols of the world's written languages, so it inevitably contains many characters that look alike. Humans often can't tell them apart, but computers easily see the difference. A surprising abuse of this problem is Mimic, an amusing utility that replaces common symbols used in software development, such as colons and semicolons, with similar-looking Unicode characters. This can wreak havoc in compilers and other code tools and leave developers baffled.


The problems caused by similar-looking symbols go far beyond simple mischief. The fancy name for them is homograph attacks, and exploiting them can lead to serious security issues. In April 2017, a security researcher successfully registered a domain name that looked very similar to apple.com by using letters from a different character set, and even got an SSL certificate for it. All major browsers happily displayed the SSL padlock and listed the domain as secure.

As with mixing visible and invisible characters, there is rarely a reason to allow identifiers, especially domain names, to mix character sets. Most browsers have taken action to penalize mixed-character-set domain names by displaying them in their encoded (Punycode) form, so that users are not easily confused. If you display identifiers to users, such as in search results, consider a similar approach to avoid confusion. However, this is not a perfect solution. Some domain names can easily be constructed entirely from a single non-Latin character set, such as lookalikes of apple.com or chase.com.

The Unicode Consortium has published a list of confusable characters that may serve as a good reference for automatically checking for potential scams. On the other hand, if you're looking for a quick way to create confusables, check out Shapecatcher. It's a wonderful tool that lists Unicode symbols that visually resemble a drawing.
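A mixed-character-set check can be approximated in a few lines. Python's standard library does not expose the Unicode Script property, so this sketch falls back on a heuristic and treats the first word of each letter's Unicode name (for example LATIN or CYRILLIC) as its script. Both that heuristic and the function names are assumptions of this example, and the heuristic misclassifies some characters, such as fullwidth Latin letters.

```python
import unicodedata


def scripts_used(label):
    """Approximate the scripts in label via the first word of each letter's name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label):
    """True when the letters in label appear to come from more than one script."""
    return len(scripts_used(label)) > 1
```

As the text notes, a label built entirely from one non-Latin script passes this check, so it can only be one layer of defence.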
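To show how such a reference might be used, here is a minimal sketch that maps lookalike characters to a common "skeleton" before comparing two strings. The `LOOKALIKES` table is a tiny hand-picked sample for illustration only, and the function names are hypothetical. A real check would load the consortium's full data rather than this handful of pairs.

```python
# Hand-picked pairs for illustration only; not the Unicode Consortium's data.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(text):
    """Replace each known lookalike with the Latin letter it resembles."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


def is_confusable(a, b):
    """True when two different strings collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)
```

A registration form could compare the skeleton of a new name against the skeletons of existing names and reject collisions.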

3. Normalization is not so standardized

Normalization is very important for identifiers like usernames, so that people can enter a value in different ways and still have it treated consistently. A common way to normalize identifiers is to convert all characters to lowercase, making sure that JamesBond and jamesbond are the same.

Because there are so many similar characters and overlapping sets, different languages or Unicode processing libraries may apply different normalization strategies. If normalization happens in several places, this can become a security risk. In simple terms, don't assume that lowercase conversion works the same way in different parts of an application. Mikael Goldmann from Spotify wrote an incident analysis on this issue in 2013, after one of their users discovered a way to hijack accounts. An attacker could register a Unicode variant of another person's username, such as ᴮᴵᴳᴮᴵᴿᴰ, which was translated to the canonical account name bigbird. Different layers of the application normalized the name differently, which allowed people to register a lookalike account and then reset the password of the target account.
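One way to reduce this risk is to route every layer through a single canonicalization function. The sketch below is an assumption about what such a function could look like, not Spotify's actual fix: it applies NFKC compatibility normalization, then case folding, then NFKC again so that running it twice gives the same result. The name `canonical_username` is hypothetical.

```python
import unicodedata


def canonical_username(name):
    """Canonicalize a username: NFKC, case-fold, then NFKC again for stability."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


# The superscript-style letters from the example, written as escapes.
FANCY_BIGBIRD = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"
```

With this function, `canonical_username(FANCY_BIGBIRD)` gives `bigbird`, while a plain `FANCY_BIGBIRD.lower()` leaves the string unchanged, which is exactly the kind of disagreement between layers that the incident exploited.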

4. The screen display length is independent of the memory size

When using the basic Latin character set and most European character sets, the space a piece of text occupies on screen or paper is roughly proportional to the number of symbols, which is in turn roughly proportional to the memory size of the text. This is why EM and EN became popular units of length. But with Unicode, any assumption of this kind becomes dangerous. There are cute characters like the Bismillah Ar-Rahman Ar-Raheem ligature (U+FDFD), a single code point that is wider than most English words and can easily overflow the visual frame assumed for it on a page. This means that any automatic line-wrapping or text-breaking algorithm based on the number of characters in a string can easily be fooled. Most terminal programs require fixed-width fonts, so if you display this character in one of them, you will see the closing quotation mark land in completely the wrong place.


One abuse of this problem is the zalgo text generator, which adds garbage marks around a piece of text so that it takes up much more vertical space.

Of course, the whole problem of invisible code points also makes screen length independent of memory size. Something that fits nicely into an input field on screen may be long enough to blow up a database field. Filtering invisible characters is not enough to get around the problem, as there are many other code points that take up no space of their own.

Combining Latin characters occupy the space above the previous letter (for example U+036B and U+036C), so you can write several lines of text within a single line (N\u036BO\u036C produces NͫOͬ). Cantillation marks are used to indicate how verses of the Hebrew Bible should be chanted. These cantillation marks can be stacked indefinitely in the same visual space, which means they can easily be abused to pack a lot of information into the space of a single character on screen. Martin Kleppe used cantillation marks to encode an implementation of Conway's Game of Life for the browser. Look at the source code of the page and you will be shocked.
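The gap between what a field shows and what it stores is easy to measure. This sketch compares the number of code points, the UTF-8 byte length, and a rough count of code points that occupy their own slot on screen. The `measure` helper and the `own_slot` heuristic (skipping combining marks and format controls) are assumptions for illustration; neither predicts the real rendered width.

```python
import unicodedata


def measure(text):
    """Compare stored size with a rough count of self-standing code points."""
    own_slot = sum(
        1 for ch in text
        # Combining marks (Mn, Me) and format controls (Cf) take no slot of their own.
        if unicodedata.category(ch) not in ("Mn", "Me", "Cf")
    )
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "own_slot": own_slot,
    }


# One visible letter followed by 1000 zero-width spaces.
PADDED = "a" + "\u200b" * 1000
```

Here `measure(PADDED)` reports 1001 code points and 3001 bytes for something that looks like a single letter, so length limits need to be enforced on the stored form and not on what the user sees.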
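One possible defence against stacked marks is to cap how many combining marks may follow a single base character. The sketch below is a minimal example under that assumption. The function name and the default limit of two marks are arbitrary choices for illustration, and a real limit would need to respect languages that legitimately stack several marks.

```python
import unicodedata


def limit_combining_marks(text, max_marks=2):
    """Drop combining marks beyond max_marks in any consecutive run."""
    out = []
    run = 0  # number of combining marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # skip the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)
```

The two-line example from the text, `"N\u036bO\u036c"`, passes through unchanged, while a letter followed by fifty accents is cut back to the letter plus two accents.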

5. Unicode is much more than passive data

Some code points exist to affect how the characters after them are displayed, which means that users can copy, paste, and type not only data but also processing instructions. A common prank is to use the right-to-left override (U+202E) to change the text direction. For example, take a crafted Google Maps search for ninjas. The query string reverses the direction of the search term: the search box on the page shows "ninjas", but the page actually searches for "sajnin".

This exploit is so popular that there is an XKCD comic about it.

Mixing data with processing instructions, which are effectively executable code, is never a good idea, especially if users can enter them directly. This is a big problem for any user input that ends up in a page. Most web developers know to sanitize user input by removing HTML tags, but they also need to watch out for Unicode control characters in that input. This is also a simple way to get around swearing or content filters: just reverse the words and add a right-to-left override at the beginning.

Right-to-left hacks may not be able to embed malicious code, but if you are not careful, they can disrupt content or flip an entire page. A common defence is to put user-provided content into an input field or text area, so that processing instructions do not affect the rest of the page.

Another particular problem with processing instructions that work in pairs is the variation selector. To avoid creating a separate code point for each color of each emoji, Unicode allows a basic symbol to be combined with a color using a variation selector. A white flag, a variation selector, and a rainbow produce a rainbow-colored flag. But not all variations are valid. In January 2017, a bug in iOS Unicode processing allowed pranksters to crash iPhones remotely simply by sending a specially crafted message. The message contained a white flag, a variation selector, and a zero. iOS CoreText went into panic mode trying to pick the right variation and crashed the OS. The prank worked in direct messages, group chats, and even shared contact cards. The issue also affected iPads and even some MacBooks. The targets of the prank could do nothing to prevent the crash.

Similar bugs occur every few years. In 2013, an Arabic character handling bug caused OS X and iOS to crash. All of these bugs are buried deep in the OS text processing modules, so typical client application developers simply cannot avoid them.
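Where isolating user content in its own field is not practical, another option is to strip directional controls before displaying or filtering the text. The sketch below removes the explicit embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069). The function names are hypothetical, and dropping these characters outright is an assumption that may not suit applications with legitimate mixed-direction text.

```python
# Explicit bidirectional formatting controls:
# U+202A LRE, U+202B RLE, U+202C PDF, U+202D LRO, U+202E RLO,
# U+2066 LRI, U+2067 RLI, U+2068 FSI, U+2069 PDI.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def has_bidi_controls(text):
    """True when text contains any explicit directional control."""
    return any(ch in BIDI_CONTROLS for ch in text)


def strip_bidi_controls(text):
    """Return text with explicit directional controls removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```

Running a content filter on the stripped text means that a reversed word prefixed with U+202E is checked as the stored characters really are, not as they appear on screen.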
