Towards the methods of analysis malicious applications for Android operating system

. It is considered to the problem of analysis of Android applications to study a malicious behaviour. The methods of analysis are presented, the general method, which combines different analysis techniques (static, dynamic, decompilation, debugging, logging) is proposed, and information of our software based on it is given.


Necessity for the work
In the recent years, mobile phones with a mobile operating system are widely being used, and now we observe the explosive growth of mobile devices in the world. According to forecasts [1], in 2018 the number of smartphones in the world will be over 50% of the total number of phones (see fig. 1). More and more people do their daily tasks by using software for phones, including making the financial transactions. However, the computer literacy for today's smartphone users is not keeping pace with the progress in the field of mobile devices. It can mean that soon we are going to have about 2.5 billion potential victims of intruders; they would use mobile phones as an instrument to steal private data and money. It is known that in the past year hackers in Russia were stolen 349 million rubles [2] from the owners of phones under Android OS (about 2 rubles in mean from each Russian citizen), which is five times more than a year ago. Fig. 1. Number of smartphones in the world will be doubled for the last 5 years [1] Choosing the Android OS by intruders is primarily caused by its prevalence, availability of cheap Chinese phones under it. Here we have a problem: typical users (especially from the countryside) are not even aware that they work with a minicomputer which can hook a trojan program and such program could get access to user data, receive commands to work following the hacker's request. According to a Kaspersky's report (see fig. 2), the most frequent malware types now are "RiskTools" (special software or software with vulnerabilities used by intruders to enter and control the system), "Trojan-SMS" (trojans for sending SMS to short numbers) and some other trojan types including "Trojan-Banker" (trojans for stealing money from a bank account). In 2015, local representatives of the Police addressed to the Department of Applied Mathematics at Altai State Technical University in order to develop methods, algorithms, and software for the analysis of the smartphones injured by the trojans (like types as "SMS" and "Bankers"), to do digital forensic science to analyze the malicious applications. We have a situation now: there are no generally accepted methods to do it.

The goals of the analysis
Android OS itself is a special Linux Kernel, which contains Dalvik virtual machine for execution of Java applications using their own API. The application for Android is an apk (Android Package) file, which is a zip archive and it contains some compiled Java-classes of the application in the form of .dex file (Dalvik EXecutable), resources and application's descriptor AndroidManifest.xml within. A trojan hereis a dual-purpose Android application. It looks like an ordinary application but it was created to steal data or money from victims. A victimis a people who handed his smartphone with some trojans to the Police for the analysis since he had suffered by unknown intruder's actions to withdraw his money via the phone. According to the problem described in the previous section, the goals of the analysis are:  identify malicious applications among given applications from given phones;  make some proof of harmfulness of given application;  study algorithms of the work of applications without having sources of it;  find the remote hosts which application communicates to, discover sending content, format and protocols);  discover user's private data leaks;  compare various malicious applications and group them;  develop and test a general method to make suchlike analysis. So, we need dynamic and static methods for analysis the behavior of Android applications. About 50 real recent malicious applications from the real victims (who had lost the money because of intruder's actions) were analyzed and identified, and methods for it were developed.

Related work
There is a high number of research papers devoted to malicious applications for Android but most often to identify whether or not the given application is a trojan. For example, in the work [3] authors explain how to detect a malicious program for Android with using machine learning and static analysis methods. But it is only a part of current research, the main our goal is to make the analysis of the behaviour and prove the harmfulness by the application's actions. The fact of malware can be detected by using VirusTotal tool [5], which checks the signature of given application by a lot of modern anti-viruses. The Pennsylvania State University, Duke University and Intel lab have developed the series of patches [4] for Android Kernel and libraries to track the data passing through the functions calls in running Android applications to prevent privacy leaks (taints). This approach can be applied to our research with some changes to do dynamic analysis of application's behavior.

Malicious applications identifying
The detection of malicious applications in a phone is not so difficult. As a rule, such dual-purpose applications have a small size, a name and an icon disguised as the popular applications (WhatsApp, Skype, Flash Player, Kaspersky, Sberbank Online, etc.). Since the majority of identified by this research viruses carry out sending messages in a background to short phone numbers or performing USSD-requests, it is possible to identify them by looking at the permissions of applications. If the Android app uses some API, it must request the appropriate permissions from the system. As a rule, such a request is made during the installation of the application; however, users just ignore the warnings, even if they say that the application will send SMS that can cost money or it is going to control the phone as an administrator. Permissions are set in AndroidManifest.xml and can be seen in the information about an application. An example of malicious application's permissions is given on table 1. The application can be sent for an analysis to one of the anti-viruses (for example, DrWeb online), as well as viewed in the Virus Encyclopedia for its possible actions. Now, we are using the online tool VirusTotal [5], which accumulates information about the signatures of malware files from various antivirus software. However, applications are modified, and some recent viruses could not be recognized, so this analysis is made for educational purposes only. When the malicious application is detected, a further analysis on the phone would be inappropriate. As a rule, the application as an apk file is retrieved, and the browser's logs and download history in the phone are analyzed to search from where such application could be obtained. Next, the application is analyzing on a desktop computer running Android OS emulator (included in the Android Studio application development tools). Application analysis is complicated because we deal with a compiled application without the presence of the source code.

Low-level debugging the Android applications
Without the presence of the source code, there is a way to understand the logic of the applicationto analyze it with using a debugger. Starting with version 6.6, an interactive debugger IDA Pro [6] supports debugging applications for Dalvik virtual machine. It is possible to trace the logic of the application, explorer function calls from a Java library, set breakpoints on them and analyze the data transferred as parameters, etc.

Рис. 3. Исследование графа поведения в IDA Fig 3. Discovering a behaviour graph with IDA
With using IDA we can use a feature "proximity browser" for some kind of static analysis (see fig. 3). Once .dex file is loaded, the app creates a graph containing functions, variables, other objects and the relationships between them. Without actual running the debugging it is possible to find a suspicious function call (like sending the SMS) and view a path for its execution. With using the debugger we can identify, for example, process of interaction with an intruder's server (in this case, addresses and sending data were not explicitly programmed in the code but collected in the values of local variables), and this approach was used to prove the harmfulness work with SMS messaging with a remote banking in one trojan application. However, using this method of analysis requires knowledge of the virtual machine assembler and approaches of low-level debugging, the process of analysis is long and complicated. In addition, IDA Pro is a paid product. The future research work may be applied here to create own solution to detect calls from application's disassembled code started from some usable functions from Android API liked by the intruders (send and receive SMS, networking, etc) and programmable create a graph to do future analysis.

Analysis of network interactions in the Android applications
To determine what data is passed over the network to/from the analysing application, we can propose a very simple wayto install a proxy server on a host that is running the phone emulator, and make the settings in the Android emulator phone instance to specify the address and port of the proxy server. Fig. 4

. A proxy server offers to monitor the traffic
After that, all the HTTP connections will go through a proxy, which may log us information is sent and received by the applications on the phone (the process is shown on fig. 4). As a result, we can identify intruder's domains, through which the trojan applications receive data and transmit the control commands.
Having the recorded network data, we also can match various versions of the malware, identifying the groups of trojans by comparing the remote hosts and interactions data.

Tracing the application's logic with a patched OS kernel
To understand the logic behavior of Android applications, it is decided to use the method of dynamic analysis with logging tools built into Android. However, the standard "adb logcat" command creates a very sparse dump of the application's operations during the run and not intended for the analysis. The TaintDroid project [4] was created to tell the user about private data leaks during the work of Android OS applications in the form of push notifications on the top of a phone screen. It patches Android kernel and library code and inserts the own code to reveal and process taint data. They had to do a lot of work to pass some additional arguments between every library function call in order to collect the data (a cut from the patch for the file vm/InlineNative.cpp is shown on table 2).

Рис. 5. Пример пути СМС с использованием DroidBox Fig. 5. An example of an SMS flow with using DroidBox
After doing it they have inserted their code to check the fact of opened files, network access, used encryption algorithms and data pass to encrypt/decrypt functions, sending and receiving SMS (a number and a text of the message), phone calls from applications, class loading into memoryjust what is necessary for the analysis of malicious code. Next, we found Droidbox [7]a framework for the dynamic analysis of applications for Android, which is built on the top of TaintDroid's patches and consists of two parts:  a Python script to start and produce results;  a modified OS kernel, containing TaintDroid's patches and additional patches for improving the logging tools. These patches work as follows: because of some added additional lines to the OS code to preserve TaintDroid data to the log (see fig. 5), information of applications' behavior now can be accessible with 'adb logcat' command and can be processed with some external software. It should be stressed that some viruses perform checks the OS version of Android, the phone name, phone number, the IMEI, in order to determine the work in the emulator with standard predefined values and possible to stop working after such detection. So, we have to use one's own binary patches to the system image with the replacement of the required parameters with some new values without having to recompile the entire kernel.

Decompilation and source code analysis
If the log during the time of the analysis of the application shows us no activity (for example, the server that receives a command is not working now), we can try to decompile the application code to observe the logic. As every Java application could be decompiled to a source code (especially if the application developer did not use any obfuscation algorithms after building the code), we can try to decompile the application and get some semblance of its source code. For such decompilation firstly we apply dex2jar program [8] (to convert from a.dex archive inside of an .apk file to a set of .class files containing compiled Java code), and Java decompiler program (a tool for process .class and obtain .java files from them). Because of a difficult Java structure, currently there is not exist a good decompiler for all cases, so we propose to use some decompilers simultaneously (FernFlower [9], CPR [10], Jadx [11], Procyon [12]), and to select the best result after decompiling. The results of decompiling to a source code can be used against intruders to prove the harmfulness of the application. However, if the application file structure is just restored (though not their contents), then it can also be used for comparing the trojans from the various crime groups. The process of restoring normal application code and understanding what it doesis a handwork, but not a complicated enough. The issue of combating obfuscation requires special studies. For example, this is a fragment of decompiled code that possible had been obfuscated before distributed, it is unreadable and currently the method of source code observing is not used for such files:

The general method
The proposed general method is given on fig. 6. Suppose we have a phone with some possible trojans. We look at the applications, get them from the phone and select them to make the analysis by some indirect indications (like name, icon, size, permissions). And possible we try to check some applications in the antivirus databases. If we found a suspicious application, we are going to make a comprehensive analysis. The app in binary classes form could be checked by a static analyzer for possible security vulnerabilities (currently the static analyzer has not been selected yet). After, the app goes to the decompilation, and if the result of it is good the code behavior can be observed from the sources.
In addition, sources could be statically analyzed by the static analyzer, which works with .java sources. If the decompilation has failed, the app goes to the debugger and the behavior graph, which is received after disassembling is observing. For analysis the dynamic behavior we use the emulator with a modified OS kernel for logging, it is patched to change the IMEI, phone and so on, and proxy to detect a network activity. Some applications cannot be run from the emulator for various reasons, they could be run in the real phone with using the proxy monitoring. After finishing the process, we will have some data about the application's behavior. The completeness of the resulting data depends on the application but in the almost all cases of analysis that we did, the information about the application is sufficed to threat is it a malware or not.

Developing an application based on proposed methods of the analysis
Our students have developed [13] a small aggregated single-window application that performs semi-automatically the functions described here, without having to manually run all the utilities, copying files, and so on. It can be used by a specialist of digital forensic science. The application communicates with the Android Platform Tools for managing connected phone, the emulator and examines the logs. The app allows (see fig. 7):  getting the permissions and copy the application from an Android phone. y malicious applications among given applications from given phones;  make requests to the VirusTotal to check the application in the antivirus databases;  start the Android emulator with the patched kernel and system image, make the dynamic analysis of the application, display the collected logs;  decompile the application by the described decompilers and navigate through the decompiled files;  run the integrated proxy server and monitor application's networking activity.  Later in this application, we plan to generate reports in the standard form for digital forensic science.

Results and conclusions
 The article described the basic methods of analysis the behavior of malicious applications for Android. The general analysis process is proposed. Currently, the results are actually being used to make proofs of committing computer crimes (it is a research work by the contract).  An extension of this work will be a higher automation of the analysis as well as analysis of the applications that have sophisticated protection, some strong research work for call graph building from the disassembled and obfuscated code, providing patches for modern Android kernels.