An Interactive Specializer Based on Partial Evaluation for a Java Subset

Abstract. Specialization is a program optimization approach that relies on a priori information about the values of some variables. Specialization methods have been under development since the 1970s (mixed computation, partial evaluation, supercompilation). It is surprising, however, that even after three decades these promising methods have not entered wide programming practice. One may wonder: what is the reason? Our hypothesis is that specialization requires much greater human involvement in the specialization process, in the analysis of its results, and in conducting computer experiments than ordinary program optimization in compilers does. Hence, specializers should be embedded into integrated development environments (IDEs) familiar to programmers, and appropriate interactive tools should be developed. In this paper we give a work-in-progress report on the development of an interactive specializer based on partial evaluation for a subset of the Java programming language. The specializer has been implemented within the popular Eclipse IDE. Scenarios of the human-machine dialogue with the specializer, and interactive tools for composing the specialization task and controlling the specialization process, are under development. An example of applying the current version of the specializer is shown. The residual program runs several times faster than the source one.


Introduction
The method of program specialization known as partial evaluation was invented more than 30 years ago, along with the famous result [1], [2] of computing the first, second and third Futamura projections [3]-[5] for a tiny Lisp subset. The first round of research was completed in the early 1990s, when the main textbook on partial evaluation was published [2]. Many programming problems were found to be solvable by program specialization (the best known being the generation of a compiler from an interpreter by the second Futamura projection), and the emergence of a new class of program development tools based on specialization was expected. Other program specialization techniques, e.g., supercompilation [6], [7], were developed in parallel. However, it is surprising that even after three decades these promising methods have not entered wide programming practice. One may wonder: what is the reason? Our hypothesis is that the main expectation that governed the development of specializers was wrong. The developers of these methods hoped that specializers could work in fully automatic mode and that only finitely many features and improvements remained to be invented, after which "the great goal" would be achieved and happy programmers would start using the new tools. They expected that specializers could work in the same "black-box mode" as optimizing compilers. However, this did not happen. The time and space complexity of the program transformations necessary for specialization turned out to be much higher than that of the program optimizations that can be used as "black boxes" with short and predictable run time and memory consumption. We argue that automatic methods of program optimization have reached certain inherent limits.
In order to develop and use more powerful tools, we must give up the expectation that program analysis and transformation systems will operate in automatic mode without human intervention. Program specializers possess too many degrees of freedom and choice, which their algorithms cannot resolve on their own and which therefore call for human help. Based on this observation, we put forward the goal of constructing an interactive specializer embedded in a familiar integrated development environment (IDE) such as Eclipse [8]. Eclipse provides a rich open-source toolkit referred to as Java development tools (JDT) [9], which allows a developer to deal only with the essential tasks of analysis, visualization and transformation of Java code. Adequate human-machine dialogue tools for controlling the specializer and working with the results of specialization are to be developed. We would like to emphasize that there is a strict separation of concerns between the machine and the human: the specializer guarantees that the program transformation preserves functional equivalence, and the user is responsible for controlling the specializer in such a way that it produces code that satisfies the user's goals and needs (which the machine does not know).

Fig. 1. Source code of Ackermann function
We think that partial evaluation is better suited than other specialization methods (such as supercompilation) for a human-machine dialogue in which the user comprehends what is happening in the specializer, receives valuable and interesting information about the code, can adjust the source code so that it specializes better, and controls the specializer. The reason is that the method of partial evaluation consists of two stages: binding-time analysis (BTA) of the source code, which selects the parts of the code that are to be evaluated at specialization time, and residual program generation (RPG), which follows the information supplied by BTA, performs the specialization proper and produces the resulting code (referred to as residual). A pleasant feature of BTA is that its result (called the BT annotation) can be shown naturally on the source code by highlighting, and thanks to this visualization the residual code is intuitively predictable. We hope that this will allow rank-and-file programmers to adopt specializers easily as new programming tools.
Terminological remark. In the theory of partial evaluation, the parts of source code to be evaluated during specialization are called static. The remaining source code, which is transferred to the residual program (residualized), is referred to as dynamic. The term static conflicts with the static modifier in Java, and the term dynamic may be confused with run-time notions. That is why we avoid using these words in the partial evaluation sense and use the abbreviations S and D instead, e.g., S-annotation, D-annotation, S-code, D-code, S-part and D-part of a program.
The contributions of this paper are as follows.
 We show the first results of development of the Java specializer, where partially evaluated code is restricted to operations on primitive types.
 We demonstrate the work of the specializer by an example of specialization of the Ackermann function with respect to the first argument.
 We discuss some details of the implementation in Eclipse and the methods and features to be implemented in the future.

Fig. 2. Residual code of Ackermann function
The outline of the paper is as follows. In Section 2 we present the basics of partial evaluation for Java using the example of specializing the Ackermann function. In Section 3 a bird's-eye view of the implementation of the specializer in the Eclipse IDE is presented. Section 4 contains a survey of related work in comparison with our specializer. In Section 5 we conclude.
In Fig. 1, the method A implements the Ackermann function, and the method test invokes it with the constant 3 as its first argument. The Java annotation @Specialize on the method test specifies that it should be specialized, i.e., its body is to be replaced with the residual code, and the specialized versions of the methods that it invokes are to be generated and added to the program. The names of the methods A and test in their headers are marked in orange in order to show that they are involved in BTA. The bodies of these methods are analyzed and annotated: green highlighting marks the S-parts of the code. (In a monochrome print, the highlighting in Fig. 1 appears gray.)

Binding-Time Analysis
The BTA algorithm for variables and operations of primitive types is rather straightforward. First, all constants are annotated with S. Then, recursively: a subexpression containing only S-parts becomes S; a local variable declaration and an assignment with S right-hand sides become S; a method parameter that corresponds to S arguments at all points of invocation becomes S; in case of a conflict between several invocations of the same method, the conflicting parameter becomes D; a conflict between several assignments to a local variable turns it to D as well; an if statement with an S conditional expression is annotated with S regardless of the annotation of its branches (this means that the if-else will disappear, while one of the branches will be residualized); other control statements are analyzed and annotated similarly. When the recursion reaches the fixed point, the remaining parts of the code are annotated with D. D-parts are not highlighted in Fig. 1. This mode of operation of BTA, in which each code fragment gets a single annotation, S or D, is referred to as monovariant. The more general mode, in which several versions of the annotation are allowed, is called polyvariant. The current version of BTA is monovariant. In the future we plan to implement polyvariant BTA for classes and reference types according to the theory developed in [10]-[18]. Monovariant BTA on primitive types can be defined formally as abstract interpretation on a lattice with 3 elements: undefined < S < D.
As an illustration of monovariance, notice that in Fig. 1 method A is invoked three times in the source code: in one invocation both arguments are S, and in the other two the first argument is S and the second is D. The first invocation is processed in the same way as the other two: its S second argument is passed to the D formal parameter.
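The three-element lattice and the "conflict turns the parameter to D" rule can be sketched as a join over call sites. This is only an illustration under stated assumptions, not the SpecCore implementation: the class name BtLatticeSketch, the enum BT and the helper joinAll are hypothetical, and the list of call sites is assumed from the description of Fig. 1, which is not reproduced here. The argument binding times are those at the fixed point.

```java
import java.util.Arrays;
import java.util.List;

public class BtLatticeSketch {
    /** Binding times ordered as in the text: UNDEFINED < S < D. */
    enum BT {
        UNDEFINED, S, D;

        /** Least upper bound: the larger of the two elements of the chain. */
        BT join(BT other) {
            return this.ordinal() >= other.ordinal() ? this : other;
        }
    }

    /** Monovariant rule: a parameter gets the join of its arguments over all call sites. */
    static BT joinAll(List<BT> arguments) {
        BT result = BT.UNDEFINED;
        for (BT argument : arguments) {
            result = result.join(argument);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed call sites of A(m, n): A(3, n) in test, and A(m-1, 1),
        // A(m-1, A(m, n-1)), A(m, n-1) inside A itself.
        List<BT> firstArguments = Arrays.asList(BT.S, BT.S, BT.S, BT.S);
        List<BT> secondArguments = Arrays.asList(BT.D, BT.S, BT.D, BT.D);
        System.out.println("m: " + joinAll(firstArguments));  // prints "m: S"
        System.out.println("n: " + joinAll(secondArguments)); // prints "n: D"
    }
}
```

The single S argument in the second position (the constant 1) is absorbed by the join, which is why that invocation is treated like the others.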

Residual Program Generation
At the generation stage, partial evaluation starts from the method with the @Specialize annotation and recursively visits all invoked methods in turn.
Notice that, since all statements and methods with side effects are considered D and hence are residualized rather than executed at specialization time, the order of specialization of methods does not matter. For each of the specialized methods, several residual versions can be produced, one for each combination of values of the S arguments. They get different names of the form (in the current version) source-name_number. They have only those parameters that correspond to D parameters in the source code. The current version of the specializer can loop forever if infinitely many values of S arguments are generated. A production version of the specializer should contain special debugging means for leaving such situations gracefully. This is our future work. In Fig. 2 there are 4 versions of the residual method A, corresponding to the values 0, 1, 2, 3 of its first argument. Notice that, because of monovariance, the invocations A_2(1), A_1(1), and A_0(1) have not been evaluated, since the constant 1 corresponds to the D parameter of method A.
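Since the figures are not reproduced in this text, the following sketch shows what a source program and a residual program of the described shape might look like. It is written by hand from the description above, not copied from Fig. 1 or Fig. 2: the class name AckermannSketch, the method testResidual and the locally declared stand-in for the @Specialize annotation are assumptions made for illustration.

```java
public class AckermannSketch {
    /** Local stand-in for the @Specialize marker; the real annotation's package is not given here. */
    @interface Specialize {}

    // Source program: parameter m is S (every call site passes an S value), n is D.
    static int A(int m, int n) {
        if (m == 0) return n + 1;
        if (n == 0) return A(m - 1, 1);
        return A(m - 1, A(m, n - 1));
    }

    @Specialize
    static int test(int n) {
        return A(3, n);
    }

    // Residual program: one version of A per value of m, keeping only the D parameter n.
    static int testResidual(int n) {
        return A_3(n);
    }

    static int A_3(int n) {
        if (n == 0) return A_2(1); // the constant 1 is passed to a D parameter, so the call stays
        return A_2(A_3(n - 1));
    }

    static int A_2(int n) {
        if (n == 0) return A_1(1);
        return A_1(A_2(n - 1));
    }

    static int A_1(int n) {
        if (n == 0) return A_0(1);
        return A_0(A_1(n - 1));
    }

    static int A_0(int n) {
        return n + 1; // the S test m == 0 has been decided at specialization time
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++) {
            System.out.println(n + ": " + test(n) + " " + testResidual(n));
        }
    }
}
```

In this sketch all tests on m and all computations of m - 1 have disappeared from the residual methods, while the tests on n and the calls with the argument 1 remain.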

Running Source and Residual Programs
We have chosen this example for presentation because it demonstrates all the main features of the current version of the specializer. We did not expect a significant speed-up, as it seemed that asymptotically the number of method invocations was almost the same and the invocations were the most expensive operations in this example. Thus we were very surprised when the speed-up turned out to be about threefold. The acceleration can be explained by several factors. First, a calculation showed that the specialized version executes 1.86 times fewer Java bytecode instructions. Second, and more important, it is natural to suppose that the JIT compiler in the JVM inlines the specialized methods, which are simpler and more compact than the methods in the source code. This example illustrates a principle that we have observed many times in experiments with various specializers: a specializer does not replace the classic optimizing compilers. Rather, we observe a "composition" of optimizations by a specializer and a low-level optimizing compiler, and hence a multiplication of speed-ups. Residual code produced by specializers is more amenable to classic optimizations than code written by a human being. We may conclude that specialization opens up additional opportunities for program optimization.

Architecture of Specializer
The specializer has been implemented in the Eclipse development environment (IDE) [8].
The IDE is open source and provides extension points and tools for extending it. The basis for extending Eclipse is the concept of a plug-in. Each plug-in is a JAR archive containing a so-called manifest, a set of files describing the dependencies of the plug-in and the ways in which it can be extended (extension points).
Other plug-ins can add their functionality to these extension points. For example, one might want to add toolbar extensions to an already implemented toolbar plug-in. A small tool is usually implemented as a single plug-in, while a large one is often provided as a set of plug-ins. Our specializer is implemented as three Eclipse plug-ins.
The specializer consists of the following plug-ins: the plug-in SpecCore is the core of the specializer, which implements its main functionality; the plug-in SpecMarkers is responsible for text highlighting in the Eclipse editor in accordance with the annotation produced by the SpecCore plug-in; the plug-in SpecMenus implements interactions with various menus (including context menus) and toolbars to provide a user-friendly interface.
The SpecCore plug-in implements the binding-time analysis (BTA) and the generation of a residual program. When analyzing the source program, SpecCore uses the abstract syntax tree (AST) built by the Eclipse Java development tools (JDT). JDT is a set of plug-ins that provides an easy way to manipulate Java source code. The second of the three plug-ins that form the specializer is SpecMarkers. It is responsible for highlighting the source code, which allows the programmer to see which parts of the program are evaluated at specialization time and which are residualized. This helps the programmer to understand how to change the code so that it specializes better. The last part of the specializer is the SpecMenus plug-in. It uses the extension points of other plug-ins to add the necessary elements to some menus. It adds two new buttons to the main toolbar of Eclipse: the "Enable/Disable the highlighting" button and the "Generate optimized Java files" button. This plug-in also adds items to the context menus of the Project Explorer and Package Explorer views.

Related Work and Comparison
Many works are devoted to partial evaluation for functional languages. The book [2] summarizes the first wave of development of this method. Later, partial evaluation for imperative "Algol-like" languages [19], [20] and for C [21] was investigated. In the early 1990s, the first (to our knowledge) specializer for C, called C-MIX [21], [22], was developed. Chapter 11 of the book [2] contains its detailed presentation. C-MIX specializes a program in three stages. The first stage is the analysis of references. For each reference variable, the set of variables that it could refer to is built. If the analysis finds that several reference variables can refer to the same memory, they are labeled identically. The second stage is the construction of a binding-time annotation of the source code. References to the same memory area are annotated identically. In case of conflicts, the annotation is reduced to D as usual. The third stage is the generation of the residual program. Specialization of reference types in Java can be similar to the treatment of pointers in C-MIX. However, Java's stricter typing and managed run-time can be leveraged for deeper specialization. The current version of our specializer annotates all reference variables with D and, therefore, they are left unchanged. Our future work is to add the binding-time analysis of reference types. Unlike C-MIX, we expect that our specializer will still work in two stages, without the reference analysis phase. Further development of the ideas of C-MIX led to the creation of Tempo [23], [24], a specializer for programs written in C. This specializer is much like C-MIX. The next important step was the development of the first specializer for an object-oriented language, JSpec for Java [25]. JSpec uses the Harissa compiler [26] to translate the Java program into C. Then the Tempo specializer mentioned above transforms the program.
The resulting C representation of the specialized Java program is mapped back into Java using the Assirah translator [25]. Finally, the AspectJ tool weaves the specialized program with the source program to get the executable Java bytecode. The main limitation of JSpec is that it is capable of partially evaluating only immutable classes and objects, while mutable objects are always residualized. Our goal is to lift this restriction. The most advanced (to our knowledge) partial evaluation method for object-oriented languages like C# and Java has been developed in CILPE [10]-[18], a partial evaluator for the Common Intermediate Language (CIL), the bytecode of the Microsoft .NET Framework. It supports almost all of the basic constructs of object-oriented languages such as C# and Java. In CILPE, the new concept of a binding-time heap (BT heap) has been introduced. A BT heap is an abstract description of the state of a run-time heap, which makes it possible to separate reference type data into data evaluated at specialization time and residualized data, and to avoid the reference analysis stage used in C-MIX. As a result of specialization, some of the objects are no longer created in the residual program and, if necessary, local variables are used instead of object fields. We will build on the results of this research in our future work to implement BTA of classes and partial evaluation of objects.
A relatively new specializer of Java programs is Civet [27]. Civet is based on the so-called Hybrid Partial Evaluation (HPE) approach. Specialization in HPE is performed in online mode, i.e., in one pass, while the programmer can specify which parts of the program have the S-annotation. In contrast, in our specializer we choose the offline approach, i.e., the residual program is built at the residual program generation stage after the completion of the binding-time analysis, in which information about the S-parts of the program is collected automatically rather than specified by the user as in Civet.
PE-KeY [28] is a partial evaluator for Java programs based on the KeY verification system [29]. PE-KeY works in two stages. At the first stage, the program is executed in symbolic form with the application of a special set of rules. At the second stage, a residual program is synthesized, with the rules applied in the opposite direction. The PE-KeY approach is similar to the classical offline specialization that our specializer uses: a specialized program is produced in two stages. However, in the first stage of PE-KeY the program is executed symbolically, while our binding-time analysis performs an abstract interpretation of the program. In addition, due to limitations of the KeY verification system, PE-KeY does not support floating-point arithmetic, while our specializer does.
JSpec, Civet and PE-KeY deal with objects at specialization time, while the current version of our specializer annotates classes and variables of reference types with D and thus residualizes them unchanged. The extension of our specializer to partial evaluation of classes and objects is our future work. The specializers considered above interact with the user through the command line, which makes them extremely difficult to use. For specialization to be widely used, methods of interaction with the user must be developed and the specializer must be embedded into an integrated development environment convenient for the programmer, which is what we are implementing in our specializer. This is a crucial difference.
We know of just one work on partial evaluation carried out in a practical setting: the GraalVM toolkit at Oracle Labs [30], [31]. The toolkit is designed for defining domain-specific languages by interpreters while nevertheless achieving high performance by using a specializer. The first Futamura projection provides an opportunity for such acceleration (see [3], [4] and [2, Chapter 1.5.1]): given a program and an interpreter that executes the program, GraalVM specializes the interpreter with respect to a part of the given program and produces the machine code of this part. The resulting code is executed much faster than the original one in the interpreter. The main goal of GraalVM is to provide the developer of a programming language with a technology similar to just-in-time (JIT) compilation without the need to implement the complex machinery of JIT. The interpreter specialization in GraalVM is not automatic and uses hints from the interpreter developer. This case of implementation of partial evaluation confirms that practical application of specialization requires guidance from the programmer. We conduct our research in the same direction: methods and tools are being developed to provide the programmer with information about program behavior under specialization and with levers to control the partial evaluation process.
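The idea behind the first Futamura projection can be illustrated with a toy interpreter that is staged by hand. The sketch below is an analogy only and does not use any GraalVM API: the instruction set (ADD, MUL), the class FutamuraSketch and the method specialize are hypothetical, and the "specializer" is hand-written for this one interpreter, whereas a partial evaluator would derive the same effect from the interpreter's code. Here the program is the S input and the number fed to it is the D input.

```java
import java.util.function.IntUnaryOperator;

public class FutamuraSketch {
    static final int ADD = 0;
    static final int MUL = 1;

    /** Plain interpreter: decodes every instruction again on every run. */
    static int interpret(int[][] program, int input) {
        int x = input;
        for (int[] instruction : program) {
            switch (instruction[0]) {
                case ADD: x += instruction[1]; break;
                case MUL: x *= instruction[1]; break;
                default: throw new IllegalArgumentException("unknown opcode " + instruction[0]);
            }
        }
        return x;
    }

    /** Hand-staged interpreter: the opcode dispatch is done once, while the program is known. */
    static IntUnaryOperator specialize(int[][] program) {
        IntUnaryOperator compiled = IntUnaryOperator.identity();
        for (int[] instruction : program) {
            final int operand = instruction[1];
            IntUnaryOperator step;
            if (instruction[0] == ADD) {
                step = x -> x + operand;
            } else if (instruction[0] == MUL) {
                step = x -> x * operand;
            } else {
                throw new IllegalArgumentException("unknown opcode " + instruction[0]);
            }
            compiled = compiled.andThen(step);
        }
        return compiled;
    }

    public static void main(String[] args) {
        int[][] program = {{ADD, 2}, {MUL, 10}, {ADD, 1}}; // computes (x + 2) * 10 + 1
        IntUnaryOperator residual = specialize(program);
        System.out.println(interpret(program, 4));   // prints 61
        System.out.println(residual.applyAsInt(4));  // prints 61
    }
}
```

The residual function no longer inspects opcodes when it runs; only the arithmetic on the D input remains.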

Conclusion
In this paper we put forward the task of development of an interactive specializer.
We argue that program specialization methods at their current stage have reached certain limits, because the previously implemented specializers do not involve the user in the process of specialization. Our specializer uses the offline partial evaluation approach, where the program transformation is performed in two stages: binding-time analysis (BTA) and residual program generation (RPG). We briefly described the architecture of our interactive specializer in the Eclipse development environment. We illustrated the work of the specializer with the famous example of the Ackermann function and the result of its specialization with respect to its first argument. The specialized program runs several times (about three) faster than the original one. We see the following directions for further development of the specializer:
 to develop and implement binding-time analysis and residual program generation for classes and objects;
 to implement interactive tools for composing a specialization task and controlling the process of binding-time analysis and residual program generation;
 to implement tools to visualize the correspondence between source and residual code;
 to demonstrate that a well-developed specializer can convert well-structured, high-level, human-oriented code that cannot be parallelized automatically into code that can be parallelized by existing methods and tools;
 to prepare demo programs that benefit from specialization, for example, building a compiler from an interpreter;
 to generalize the binding-time analysis from monovariant to polyvariant;
 to develop an interactive tracer (similar to run-time debuggers) that allows the user to observe the analysis and generation processes in order to improve the behavior of the code under specialization.
Abstract. Specialization is the optimization of programs based on information given in advance about the values of some of the variables. Program specialization methods have been under development since the 1970s (mixed computation, partial evaluation, supercompilation). Surprisingly, however, after three decades the specializers that have been built still have not reached the level at which they become suitable for wide practical use. The question arises: what is the reason? Our hypothesis is that the task of specialization requires far greater human participation in controlling the specialization process, analyzing the results, and conducting computer experiments than ordinary program optimization in compilers does. Specializers need to be embedded into the integrated development environments that programmers are used to, which includes creating appropriate dialogue tools. This paper describes the results of developing and implementing methods of interactive specialization based on partial evaluation for a subset of the Java language. The implementation has been carried out within the popular Eclipse development environment (IDE). Scenarios of the human-machine dialogue with the specialization subsystem, and interactive tools for composing a specialization task and controlling the specialization process, are being developed. An example of the successful application of the developed specializer is given. The residual program runs several times faster than the source one.