FRIS Language Service for Extended Fortran Support in Microsoft Visual Studio

. This report deals with the construction of the language service for extended support of the Fortran programming language in the integrated development environment (IDE) Microsoft Visual Studio. The model and general approach for language service construction is offered. The proposed general model of a language service consists of five key blocks: the IDE integration block; the analysis block; the recognized elements storage block; the elements serialization/deserialization block; the elements view model block. The IDE integration block connects a language service with a basic IDE infrastructure. It’s responsible for subscription of Language Service for text editing events and for providing corresponding responses. The Analysis block is responsible for accomplishing lexical, syntactic and semantic analysis. It gathers all needed information about the elements of a programming language and puts it into the recognized elements storage block. The second task of this block is to provide information for syntax highlighting of edited text. The Recognized elements storage block is like a database of all elements needed for the Language Service operation. In general case, it is kind of a symbol table. The storage block could be filled from two sources: from analysis block, as a result of analysis of a source files, and from elements serialization/deserialization block, as a result of deserialization from a previously existing specialized program description in the case of using model of API (Application Programming Interface) for arbitrary programming libraries. The elements serialization/deserialization block performs two functions. Firstly, it allows saving the content of programming projects as XML files for description of API and documentation comments. Secondary, The proof of operability of proposed general model of a language service is given on the example of the FRIS language service developed by author. The material could be equally applied for construction language services both for other programming languages and for other development environments.


Introduction
Fortran [1], [2] is one of the first high-level programming languages. It was created in the 50s of XX century and it was intended for development of programs for scientific calculations. Fortran is still used by its intended purpose in the development of simulation programs. Nowadays the most widespread Fortran standard is Fortran 2003 [2] (however there is the Fortran 2008 standard, and the Fortran 2015 standard is in development stage). It cardinally differs from previous standards because it introduces the support of object-oriented programming in a Fortran language. This feature changes the language syntax, where many new statements are added in conjunction with new conceptions. Definitely, such modernizations are necessary, but at the same time they are objectively making the language more complicated. However these difficulties may be hidden or even eliminated, if the Fortranprogrammer will have appropriate assistance from the IDE in which he writes his programs code. The most widely used IDE on Windows is Microsoft Visual Studio. It is extensible and allows adding practically any feature into it. As an example, Visual Studio may be extended to support various programming languages. The most widely used Visual Studio integrations of the Fortran language are being developed in Intel [3] and PGI [4] in conjunction with corresponding compilers. However the supported features of those integrations significantly inferior to integrations developed by Microsoft, e.g. for C# programming language. Primarily it applies to the support of InlelliSense [5] technology, which consists of the following features: List Members, Parameter Info, Quick Info и Complete Word (table 1). It must be noted that in all implemented IntelliSense features, excluding those for intrinsic procedures, there is essentially absent any description of the elements except for their definitions. This great difference between Fortran support and support for languages, developed by Microsoft, became a key factor for author in the decision to implement the FRIS (Fortran Intelligent Solutions) language service, that is intended to cover this gap and implement all IntelliSense features to support Fortran-programmer in effective development of programs.

Making model of a language service
Language service [6] is responsible for providing language-specific support for editing source code in the Visual Studio IDE, or, generally speaking, in any IDE. Basic language service must by definition [7] to provide a program syntax highlighting, all other features, including the IntelliSense support, are extra (or extended) features. The main question that must be answered at first when starting a new language service development is what features are needed for a programmer. After that, those features must be ranked by priority (or by usability). Next, it is needed to identify the sources of data that must be used in the implementation of the language service. The main data source for any language service, no doubt, is source files containing programs text on a target language, but in some cases additional data sources may be needed. The next stage is to estimate implementation complexity of needed functions. This estimation may include as the IDE restrictions to different components of a language service, and the analysis complexity of the target programming language itself. After this the aggregate language service model is constructed, that reflects its major structural elements and interconnections between them. This report contains generalized and optimal, in author's opinion, language service model, which provides extended support for a target programming language. When the aggregate language service model is constructed, each of its structural elements is detailed according to specific requirements to implementation of different features, and also depending on the restrictions of the target programming language. Next in the report each of aforementioned steps in making language service will be examined in details, on example of the Fortran programming language, but the given material, without loss of generality, could be applied to any other programming language.

Analysis of requirements and the necessary features
The first thing, that definitely wants to see any programmer is a program syntax highlighting, for keywords, data type names, string literals, comments and so on. At the same time, it's important to provide the ability to configure such highlighting, for example, for significant to user procedure names and data type names of program libraries, say, OpenMP, MPI, and others. Such syntax highlighting helps to focus attention on the most important details. The second thing, that is important to a programmer, is the amount of provided context help, that at least must consist of the definition for a programming language element with which programmer works or wants to work (in the case of word completion lists). But in most cases the element definition is not enough to understand, how exactly the element must be used, as an example, a procedure that has more than a dozen parameters, some of which may be optional. In this case it's necessary to accompany the element definition with some meaningful description. When the data that must be provided to user, and, respectively, that must be collected and stored, are identified, the sources, from which this could be obtained, must be analyzed.

Analysis of data sources
The most obvious way to get the definitions of programming language elements is the analysis of program source files. The form of such definitions is fixed in the programming language standard, e.g. in the Fortran standard. The meaningful description of the elements may be obtained, if to complement the program text with comments in a special formdocumentation comments. The XML documentation comments are the standard for Visual Studio. So, the program text contains two languages: the base language -Fortran, and the embedded languagedocumentation comments language. It should be noted, that Fortran has a distinctive feature in using of the programming libraries. There are three ways to connect the programming library to the main Fortran project:  with source code files, that contains the library API, including procedure definitions, data types definitions and so on;  with compiled binary files of Fortran modules, that have a closed format, which understandable just by compiler. Those files also contains the library API definitions;  without any descriptions of library API. In such case the compiler will deduce the outer interfaces for used procedures, and will try to resolve external references by their names. In the first case, it is possible to analyze source file that contains the library API and get all necessary information from it, but in the other two cases, it's impossible to do so, and it's necessary to provide other mechanisms to get such information. As a basis for implementation of this task, was taken the idea that is used in the program for automatic documentation generation for so called managed applications -Sandcastle [8]. It uses two files for generation of program documentation: one with the API description, and the other with the documentation for the API. Fortran isn't managed language, so it's impossible to use the standard Sandcastle API format for description of its elements. Therefore the model for description Fortran API was developed in FRIS for this purpose. It is the XML file in the special format, which contains a description of main Fortran elements. FRIS can save (serialization) the structure of elements, which is obtained from the analysis of program texts, into XML format and restore (deserialization) Fortran elements from their XML representation. The XML model for Fortran documentation comments is also developed, including the features for its serialization and deserialization. This will allow to develop a special Sandcastle plug-in, and to use files of Fortran API and documentation comments description to automatically generate a developer or/and user help files.

Analysis of main operating characteristics of a language service
When developing a language service it's necessary to take into account that analysis of program texts will operate in a real time. This means that in most cases the text under analysis will be in the lexical, syntactic or semantic incorrect state, in terms of programming language specification. This peculiarity must be considered in the construction of corresponding analyzers. The second peculiarity is in the fact that the analysis for a syntax highlighting is carried out in Visual Studio line-by-line (one line a time). The analyzer, colorizer in terms of VS, is transmitted for analysis a string of text and the analyzer state in which it was at the end of analysis of the previous line. This means that the corresponding analyzer must be constructed with the ability to save its state in any time and to restore its work from any such state. This approach makes it possible to carry out incremental analysis, which is very important for large source files (approx more than 10000 lines). Then, when some lines are changed, it's necessary to analyze just the changed lines, but not a whole file. The third peculiarity that must be considered to create effective full-text analyzers is the need to take into account the state of source files. In terms of using program project source files in the IDE, file could be in a one of two essential states:  opened in editor;  doesn't opened in editor. In the first case, it's needed to accomplish full-text analysis of a source files, but in the second one it's possible to accomplish a simplified analysis to collect information about just externally visible program elements. For example, it's not necessary to analyze whole body of procedure, because information, say, about its local variables could be needed to user just in a moment of editing a procedure body, which automatically transfers file with procedure to the sate "opened in editor", and consequently, the other analysis rules will be applied to it. Thus the requirement to analyzer to operate in two modes, for convenience "full" and "simplified" analysis, will significantly increase the analysis speed of programming project source files.

General model of a language service
The author proposes the following general model for building any language services, which is the result of summarizing author's experience in developing FRIS ( Fig. 1  The IDE integration block contains interfaces implementation, which are required for interaction with IDE. It's responsible for subscription of a language service on the text editing editor events, and for corresponding responses, for example, for syntax highlighting and information providing for work of IntelliSense features. The analysis block is responsible for lexical, syntactic and semantic analysis. When it receives events from the IDE integration block, it performs appropriate actions. For example, in response to file open event or text changed event, it will provide the information for syntax highlighting. It's also responsible for providing source files analysis depending on their states. The recognized elements storage block is central data storage about all elements, necessary for language service. In general case, it is kind of a symbol table. The storage block could be filled from two sources: from analysis block, as a result of analysis of a source files, and from serialization/deserialization block, in the case of using model of Fortran API for any program libraries.
The elements serialization/deserialization block performs two functions. Firstly, it allows saving the content of programming projects as XML files for description of Fortran API and documentation comments. Secondary, it allows restoring the content of programming projects from their XML models. This approach reflects the dual nature of programming projects. Thus, for author of programming project, for example, program library, it is accessible in source files and it is perceived as "internal", but for a user of this library, it is perceived as "external", and its source files may be inaccessible to user. The elements view model block is a link, a kind of adaptor for elements of storage block to their representation needed by IDE integration block. Thus, recognized elements may contain some information that is not necessary to IntelliSense technology features, or on the contrary, does not contain some needed information. The elements view model is playing this interconnection role. It contains data types that are wrappers for elements of storage block, which fulfils requirements of the IDE integration block. There is also implemented various functions of filtering and selecting of different kinds necessary information. It could be said, that the storage block is like a database, and the view model block is like a data selection procedures.

IDE integration block
The IDE integration block connects a language service with a basic IDE infrastructure. In the case of Visual Studio, the base language service must implement the IVsLanguageInfo [9] interface. This interface is responsible for providing information about target language including its name, associated file extensions, and component for a syntax highlighting (colorizer). Colorizer must to implement the IVsColorizer [10] interface, which is responsible for providing character-by-character information about colors of buffered program text representation in memory. In order to provide the IntelliSense technology support it is needed to implement 5 additional interfaces [11]: IVsCodeWindowManager, IVsMethodData, IVsCompletionSet, IVsTextViewFilter and IOleCommandTarget.
To simplify for developers the task of creating new language services, and the other tasks of Visual Studio extension, Microsoft created MPF (Managed Package Framework) [12] library, which supplies a set of base classes that implements many needed interfaces, and thus provides to developers the ability to implement only the features that is needed to them. Let's take a brief look at the key classes that are necessary for the implementation of the language service and its various features. The LanugageService abstract class provides basic implementation of a language service. It contains a number of abstract methods responsible for different features of a language service, such as syntax highlighting, and initialization of full-text source files analysis in order to provide information for various IntelliSense features, and so on. The Source class is a source file abstraction in terms of a language service. It is used to store all information about edited file, as well as for interoperability with other language service model classes, which require information about current source file. In particular, it contains an instance of the Colorizer class, which is responsible for syntax highlighting. The Colorizer class implements IVsColorizer interface. This class is used by the core editor of IDE for providing of syntax highlighting in current source file. For even more flexibility and abstraction MPF Colorizer from concrete programming language, the scanner abstraction is used. The scanner must to implement IScanner interface. Each scanner is essentially a specialized lexical analyzer, which must to be able to save its current state and to restore its state for continuation of analysis as if it is doing a simple linear analysis of character stream.
The AuthoringScope class contains all information about a source file which is the result of parsing of this file. It is the central place for providing information for basic

Analysis block
The FRIS analysis block consists of two sub blocks: analysis for syntax highlighting and full-text analysis (in "full" and "simplified" mode) for a collection of information about elements in a source file. The FRIS analyzers are built with the ability to support sublanguages. In this case, the base language is Fortran, and sublanguages are any other languages, other than Fortran, that are used in the program text, for example, the XML documentation comments language and the OpenMP directives language. Fig. 2 shows the general scheme of working of the analyzers stack, on the example of analysis of a part of XML documentation comment. The base language analyzer (Fortran) generates tokens, which are then passed through a tokens filter. If token matches with one of registered sublanguages, the appropriate analyzer is called. The output is a set of fully recognized tokens for all supported languages. The peculiarity of work of a syntax highlighting analyzing block is that it is essentially some kind of extended version of a lexical analyzer, since there are strict requirements on the speed of operation of a syntax highlighting. Support for arbitrary program library in FRIS is, in particular, in the ability of a visual highlighting of their elements such as procedures, modules, data types, etc. Such highlighting is performed in a syntax highlighting block based on the current context. For any identifier under analysis the check depending on current scope is performed, whether it belongs to arbitrary library, which elements necessary to highlight. Then, if necessary, the identifier is highlighted with a defined earlier color.
The peculiarity of full-text analysis is in the used analysis strategy. Since the analysis is need to be performed in the real time, while the user modifies the text of program, all analyzers must to work in the error suppression mode. It must be noted that Fortran is very complicated language for analysis, because of its lexical and syntactical peculiarities. The most striking examples are:  the ability to use multiline tokens, for example, identifiers. Next is given the sample of a multiline identifier "my_id". The special attention must be given the fact that in between a start and end lines of any multiline lexeme, it is allowed to use comments and blank lines.
!another comment after blank line 5 &id  the absence of reserved keywords. The decision whether identifier is a keyword depends on a context of its usage in a statement. Therefore, it is not statements that are identified by keywords, as in languages with reserved keywords, but the keywords are identified by statements. Taking into account that analysis is performed in a real time, it is impossible to determine the identity of incomplete statement. For example, it is unclear, whether "if" is a keyword that belongs to conditional statement, or it is a name of an array, in the following part of statement: "if(".

Fig. 2. The general analyzers operation scheme
The emphasized peculiarities greatly complicate the development of analyzers for Fortran. But all of them are taken into account in FRIS. In particular, the optimistic parsing strategy is used. The parser processes a source file statement-by-statement. For every statement the abstract syntax tree (AST) is built. If the statement could not be matched, e.g. as a result of that the user just not has time to completely type it; the special AST is generated for it, which includes all mismatched tokens. In conjunction with a parser the full AST builder is operating (Fig.3). It builds the full AST from the individual statement ASTs. It also stores the AST that is already built. The builder task is to track operations of opening and closing of syntactical contexts, in particular their optimistic completion. For example, if now the operator "if(…)then" is analyzed, then according to standard, it could be completed only by "endif" statement. However, the user could not have enough time to fully type this statement, then the builder will interpret the "end" statement as a completion of a "if(…)then" operator. Similarly to it, if in the end of parsing of source file the stack of open contexts of the builder is not empty, then they are completing in a special modecompletion by the end of the file. It is also have ability of priority processing of high level element statements. For example, if the subroutine element is processed now, and as a result of a parsing the function element definition statement is discovered, then the current subroutine element is being completed with a special flag, and the function element processing is being started.

Fig.3. The FRIS parser operation scheme
Thus, the parser is always outputs the correct AST, which has no error nodes. This allows simplifying the semantic analysis algorithm. The semantic analyzer walks the AST and collects information about all needed Fortran elements, which then stores in the recognized elements storage block.

The recognized elements storage block
The recognized elements storage block is a central storage for all known in the current programming project elements (modules, data types, variables, etc.). It is filled from two sources: as a result of a source files parsing, and as a result of deserializing information about arbitrary libraries. This block is essentially a kind of a symbol table. Its design must take into account that information in it will be continuously updating as a result of the user editing of source files.
Consider the proposed generic model of the storage block (Fig. 4). It consists of following parts:  the class for a symbol table description;  the class for an interface description for a typical element of the programming language;  the class for an interface description for a typical scope of the programming language;  the classes describing specific elements of the programming language, that implement interfaces of a typical element and of a typical scope, for elements, which are scopes.

Fig. 4. The model of the recognized elements storage block
The class for a symbol table description must be built as indexed data storage, in order to effectively processing operations of update and elements search. For maximum flexibility it must store the references on the interface for a typical element, instead of references to specific elements. The specific element could be obtained from an abstract interface as a result of type casting. The following scheme of a symbol table is proposed ( Table 2). In this approach, firstly there is an access to all elements (Elements field). Secondly, for any project there is a list of its dependencies from other projects, which allows simplify a search procedure of needed elements, and to exclude from the search result the elements that is not visible in target project. Thirdly, every project contains a dictionary of its source files, and elements, which contained in every file that allows to effectively performing the update operations. The update operation is a result of a source file parsing operation, due to a text changes made by user. Thus, since all elements that are connected with file is known, so their deletion from other dictionaries and insertion a newly recognized elements, is a relatively simple task. Next consider the proposed interface for a typical element of a programming language (table 3). Every element must have at least a name, a scope, where it's defined, a description, for example, that is obtained from documentation comments, and a location. An element location consists from a declaration location and a definition location. Each of which is in turn consists from a file name, and an element region in it. Consider the proposed interface for a typical scope of the programming language (table 4). The scope, in a general case, is a container of elements. Every scope contains a reference to a parent scope and a list of elements that make up this scope. Every specific element of a programming language must be derived from an interface for a typical element, and if it is a scope, from an interface of a typical scope.

The elements serialization/deserialization block
The elements serialization/deserialization block is a key element for the implementation of a mechanism to support arbitrary user libraries. The serialization mechanism performs a saving of a given programming project in a form of two special XML files: description of Fortran API and description of documentation comments. The optional level of refinement could be additionally specified. In the case, when the serialization is performed for creation a developer documentation of a programming project, then all elements are saved, but in the case of creation a user documentation or interface for a programming project as an external library, then just externally visible elements are saved. It should be recalled that for each element in the Fortran module, could be specified the access mode: public or private. The public elements are externally accessible when the module is used, but the private elements could be used just inside the module and inaccessible outside of it. The deserialization mechanism operation is slightly different, because in deserialization there is just one operation modereading all information describing an arbitrary library. In this case, even if there will be provided XML files, that contains full description of arbitrary library, only externally visible elements will be read. This allows reducing the amount of memory needed to store a library description, and also eliminates the need to store elements, which will not be accessed to user under no circumstances, for example, private module elements, or internal elements of procedures.
For serialization and deserialization are used the models for description of Fortran API and XML documentation comments, that is developed by author and are expressed in the form of appropriate XML Schema Definitions (XSD) [13], [14]. Let's consider each of these models. The model of Fortran API (Fig.5) allows describing external interfaces of any library as a Fortran interfaces. The meaning and purpose some of the model elements are given in table 5.  As can be seen from the above figure, tag "apis" contains a description of all project elements. The tag "api" is used for a direct element description. In order to uniquely identify the type of element: a module, a function, a subroutine, a data type and so on, the special switches, like a "moduledata" tag, are used. One more remark should be made regarding the tag "elements", which is used to describe the internal elements of current element. It's allowed to specify here referencesfully qualified element names, and their description place next in a main "apis" tag, and also it's allowed to provide the description of child elements directly in this tag.
It should be noted that description of Fortran API may be used for a creation of Fortran procedure interfaces for their calls from other programming languages, that is solves the inverse problem. Consider the model of documentation comments. It conceptually consists of two interconnected parts: a description of documentation tags for documenting program elements (Fig.6), and a description of documentation comments XML file format (Fig. 7). The meaning and purpose of the model elements are given in table 6. For description of any element may be used 4 tags, two of which are high-level: "summary" and "remarks", and other two are nested, it means that they could be used just inside of other tags: "see" and "para". In addition to them, for description of:  derived type parameters is used "typeparam" tag;  arguments of subroutines, functions and entry points is used "param" tag;  result of function is used "result" tag.

The elements view model block
The elements view model block is a link between the IDE integration block and the data storage block. It performs two basic functions: converts a data from a storage block to a form required by the IDE, and performs various search operations in a storage block. The convert operation of stored data to the form required by the IDE produces elements that are complemented by the properties of visual representation. For example, such properties as text color and element icon, which used in various completion lists, are set. In other words, the elements view model block contains various aspects of data presentation to user. Thus the structure of the view model block is analogue to the structure of the storage block. It also defines interfaces for typical presentation elements and scopes, and a set of their specific implementations for each element of the storage block. The second function of this block is the search function. Here are performed various operations of elements resolution in a scope, a search for elements with the specified name and type, etc. That is, it performs the selection of needed elements from the storage block that taking into account a different aspects of a programming language. Then, selected data converted to the form required for user representation.

Proof of concept
The FRIS language service is built on the basis of the general model of a language service, and implements all described blocks. Figures 8-13 are examples of work of its various functions, proving the presented conception of a generalized language service model, including providing extended support for user libraries. Consider the pivot table of the language services from Intel, PGI and FRIS (table 7).