Intelligent Design of Class Structure Model based on Ontological Data Analysis

This paper investigates a formal approach that supports a critical step in object-oriented analysis and software engineering. We propose building an object class structure model from an Ontological Data Analysis of empirical data about the target domain. This technology develops the well-known method of Formal Concept Analysis. It can work with incomplete (contradictory, inaccurate, vague, etc.) empirical information about the domain, naturally supports the construction of arbitrary binary relationships between classes of objects, and takes into account the information available to the researcher about interconnections between the object properties that matter to the designer. Multi-valued vector logic models and tools are used to account for the realities of empirical data accumulation. On this basis a non-strict formal context is formed to represent the conceptual structure of the domain. In this context the truth values of basic semantic propositions of the form “object x has property y” are represented in vector form. The non-strict context is transformed into a binary formal context, for which efficient algorithms for deriving formal concepts are known, by an intelligent alpha-approximation algorithm. This algorithm takes into account typical relationships between object properties and, above all, the conceptual conjugation of properties that arises from a fundamental cognitive procedure of the designer: conceptual scaling of the detected properties. The formal concepts derived from the context are partially ordered by inclusion of properties, which is known in object-oriented analysis as inheritance. The closed lattice of formal concepts determined by this order is transformed into a model that describes the object class structure, following a number of pragmatic design principles for this key software component.


Introduction
Creating a Class Structure Model in object-oriented (OO) analysis and software engineering still depends on the experience of an expert [1][2][3][4][5][6][7]. Objects and classes are the basis for all subsequent steps of the analysis, yet they "are there just for the picking" (i.e. they appear naturally in the problem statement) or are borrowed from colleagues, with or without modification [5]. In other words, in practice there is no systematic procedure or formalism supporting this step, which is critical for the rest of the software engineering process. At the same time, most leading authorities on OO methodologies have pointed out the need for some conceptual analysis of the domain in order to describe its "concepts". That is why Formal Concept Analysis (FCA) [8], a rigorous mathematical theory, attracted the interest of experts in object-oriented analysis and software engineering. Numerous studies and developments have used FCA to create a Class Structure Model, for example [9][10][11]. FCA is a theoretically well-founded and actively developing method of data analysis that reflects the classical view of a concept as a fundamental epistemological element defined by its extent and intent. We illustrate FCA's potential with the example of generating a taxonomy of well-known OO methodologies in terms of their diagram techniques [10]. Table 1 describes the correspondence I between two sets: the set of "objects" G (methodologies) and the set of "attributes" M (diagram techniques).
Table 1. OO methodologies (objects) and diagram types (attributes).

The tuple K = (G, M, I), called a Formal Context (FC), gathers the basic data for FCA. In particular, FCA methods allow us to derive from K:
• the set of all formal concepts B(K) = {(X, Y) | X ⊆ G, Y ⊆ M, X′ = Y, Y′ = X}, where (X, Y) is a formal concept, X is the extent and Y is the intent of the concept; «′» denotes the Galois (derivation) operators; X′ = {m ∈ M | ∀g ∈ X: gIm} is the set of attributes common to all objects in X; Y′ = {g ∈ G | ∀m ∈ Y: gIm} is the set of objects that have all attributes from Y;
• the complete concept lattice (B(K); ≤), in which the sub-/super-concept relation is (X1, Y1) ≤ (X2, Y2) ⇔ X1 ⊆ X2 (equivalently, Y2 ⊆ Y1).
From Table 1 we can extract the set of formal concepts shown in Table 2. In terms of content, these are all the generalizations of OO methodologies with respect to diagram techniques. The partial order ("inheritance") between the extracted concepts is shown in the lattice diagram in Fig. 1. FCA therefore delivers the conceptual structure of a domain from available data of the form "objects-attributes". This structure was proposed by FCA's proponents as a basis for creating a model that describes the class structure of the software being designed. However, FCA's usability turned out to be limited:
• Construction of arbitrary relationships between object classes is not supported; only the generalization relationship "is-a" is available.
• Contradictions in the original data, a set of Basic Semantic Propositions of the form "object x has attribute y", are prohibited. In particular, evidence "for" and "against" the truth of such propositions cannot be taken into account.
• Information available to the designer about relationships between the attributes of objects, the so-called "constraints of existence" of attributes, is ignored.
Although this somewhat dampened the interest in FCA in software engineering, the method continued to develop, especially in the field of ontological modeling, for example [12,13]. The main aim of this paper is to draw the attention of developers (especially designers of class structure models) to Ontological Data Analysis (ODA), an evolution of FCA that can process vague and contradictory data about the modeled reality, discover arbitrary relationships between object classes, and take into account the constraints of existence of properties [14][15][16]. The topic of the article is summarized in Fig. 2, a diagram of how ODA is applied to class structure model design.

Ontological Data Analysis and Formal Concept Analysis
ODA is a customization and pragmatic adaptation of FCA. In FCA the primary source of initial data is a multi-valued context, an "objects-attributes" incidence (OAI) table that records the attributes of observed domain objects that interest the researcher. In ODA the format of the OAI is made more elaborate in order to represent empirical information about the domain, such as multiple independent records of an object's attributes, detection of the same attribute by several procedures, and different degrees of confidence in different sources of information. In addition, since ODA treats the presence of a relation as a manifestation of the objects' inner attributes, special pairs of associated attributes (valences) are used in the OAI to represent arbitrary binary relations. This approach allows the modeling of arbitrary relations between objects to be "inserted" naturally into FCA [15]. Only "weak" estimates of the Basic Semantic Propositions about the domain can be extracted from such a generalized OAI. In ODA these estimates form a non-strict FC from which the conceptual framework is extracted. FCA, however, requires a binary FC. Therefore ODA offers an approach for generating such a binary FC from the initial non-strict FC.
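The binary-context machinery of FCA that ODA ultimately relies on (the derivation operators and the set of formal concepts) can be illustrated with a minimal Python sketch. The toy context below is hypothetical and is not the data of Table 1; the function names are illustrative, and the brute-force enumeration over attribute subsets is chosen for readability rather than efficiency.

```python
from itertools import combinations

# Hypothetical toy context K = (G, M, I); NOT the data of Table 1.
G = ["g1", "g2", "g3"]
M = ["a", "b", "c"]
I = {("g1", "a"), ("g1", "b"), ("g2", "b"), ("g2", "c"), ("g3", "b")}

def up(X):
    """X': the attributes common to all objects in X."""
    return frozenset(m for m in M if all((g, m) in I for g in X))

def down(Y):
    """Y': the objects that have all attributes in Y."""
    return frozenset(g for g in G if all((g, m) in I for m in Y))

def formal_concepts():
    """All pairs (X, Y) with X' = Y and Y' = X, found by closing every subset of M."""
    concepts = set()
    for size in range(len(M) + 1):
        for Y in combinations(M, size):
            X = down(Y)              # extent generated by Y
            concepts.add((X, up(X)))  # (X, X') is always a formal concept
    return concepts

def is_subconcept(c1, c2):
    """(X1, Y1) <= (X2, Y2) iff X1 is a subset of X2."""
    return c1[0] <= c2[0]

concepts = formal_concepts()
```

For this toy context the sketch yields four concepts, with the greatest concept having all three objects in its extent and the single shared attribute `b` in its intent.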

Non-strict Formal Context generation
In the OAI (a general scientific form for logging empirical information), rows correspond to domain objects and columns correspond to the set of object attributes recorded by the measurement procedures available to the analyst. The following notation is used:
• Pr is the set of procedures used to estimate the value of each attribute mj ∈ M, where every procedure pr(j)k has a degree of confidence t(j)k in its results;
• A = (aij), i = 1, …, m; j = 1, …, n, is the matrix of results of the measurement series Se of the attributes M of objects from the sample G*, obtained using the measurement procedures Pr.
The elements of this matrix can be the linguistic constants NM, None, Failure and X:
• None is a result showing that the measured attribute value lies outside the sensitivity threshold or the dynamic range of the measuring instrument; it indicates a "semantic mismatch" between the object and the measuring procedure, etc.;
• Failure is a result that records a measurement failure (refusal, malfunction of the measuring equipment, abstention, etc.);
• NM (not measured) is a result indicating that the corresponding property was simply not measured in this series of measurements;
• X stands for any symbol of the scales of the dynamic ranges of the measurement procedures Pr.
A non-strict FC is a tuple (G*, M, I), where G* is the empirical training sample of the domain, M is the set of object attributes recorded by the measuring procedures available to the researcher, and I is the matrix of estimates of all the Basic Semantic Propositions. Each element bij is determined in accordance with the multi-valued vector logic V^TF and is a vector of truth components ⟨True, False⟩ [17]: the True component b⁺ij is formed by the evidence confirming the Basic Semantic Proposition, and the False component b⁻ij by the evidence denying it. Building the non-strict "objects-attributes" incidence I begins with the transition from the primary data, structured in the form of the matrix A, to their semantic interpretation in the form of a non-strict "series-procedures" incidence I′, in which T, F and N denote the truth constants "True", "False" and "Neutral" of V^TF logic, respectively.
Then I′ is transformed into the non-strict "objects-attributes" incidence I by combining the truth values of the Basic Semantic Propositions obtained for the object gi over all series and for the property mj over all procedures (taking into account the confidence in each procedure). The combination is performed using the various composition rules of V^TF logic [17].

Creating a binary Formal Context
The "objects-attributes" incidence I of a non-strict FC can be decomposed into its binary alpha-sections. In practice, an alpha-section I(α) is usually used as an approximation (the so-called "α-approximation") of the original non-strict incidence I. However, as a method of forming a binary FC from its non-strict prototype this is in general incorrect, because a priori relationships, the "constraints of existence", may hold on the set of measured properties M. Characteristic types of such binary relations are considered in [18]. Thus a pair of properties mj, mk ∈ M, j ≠ k, for each domain object (and hence for every gi ∈ G*) can be:
• inconsistent, if an object gi possessing property mj certainly does not have property mk, and vice versa;
• conditioned, if an object gi possessing property mj certainly has property mk, although the reverse may not hold;
• interdependent, if an object gi possessing property mj certainly has property mk, and vice versa.
The usual alpha-section method is insensitive to such relations. Therefore, applying it to form a binary FC from the original non-strict context may violate the "constraints of existence". The idea of an intelligent alpha-section of a non-strict FC is to formalize the "constraints of existence" of the context as a single predicate "the α-section is correct" with the argument "threshold α of confidence in the source data", and then to identify the admissible range of α for which this predicate is True.
In general, determining this range for a non-strict FC is very difficult, and the range may turn out to be empty. Therefore ODA takes the path of a reasonable compromise to solve the problem of a correct binary approximation of a non-strict FC. Instead of working with a single common confidence threshold α, it is proposed to manipulate a set of confidence thresholds for the data fragments that describe each object gi ∈ G*, at the level of each individual "constraint of existence". A very important case is when the inconsistency of attributes results from a fundamental cognitive procedure known in FCA as conceptual scaling [8]. This case is considered in [16], which proposes a method of rational alpha-section of a non-strict FC.
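The idea of the predicate "the α-section is correct" can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from the paper: each element of I is stored as a pair (b_plus, b_minus) of numbers in [0, 1]; the alpha-section rule used here (b_plus ≥ α and b_minus < α) is only one plausible choice; the admissible range is searched over a finite grid of thresholds; and all data values and names are hypothetical.

```python
# Non-strict incidence: (object, attribute) -> (b_plus, b_minus), both in [0, 1].
# All values below are hypothetical illustration data.
I = {
    ("g1", "m1"): (0.9, 0.0), ("g1", "m2"): (0.6, 0.3), ("g1", "m3"): (0.9, 0.1),
    ("g2", "m1"): (0.2, 0.7), ("g2", "m2"): (0.8, 0.1), ("g2", "m3"): (0.1, 0.8),
}
OBJECTS = ["g1", "g2"]

# Constraints of existence as (kind, m_j, m_k) triples.
CONSTRAINTS = [
    ("inconsistent", "m1", "m2"),    # m_j and m_k never hold together
    ("interdependent", "m1", "m3"),  # m_j holds exactly when m_k holds
]

def alpha_section(alpha):
    """Binary incidence under the ASSUMED rule: b_plus >= alpha and b_minus < alpha."""
    return {key for key, (bp, bm) in I.items() if bp >= alpha and bm < alpha}

def section_is_correct(alpha):
    """True if the alpha-section violates no constraint of existence."""
    binary = alpha_section(alpha)
    for g in OBJECTS:
        for kind, mj, mk in CONSTRAINTS:
            has_j, has_k = (g, mj) in binary, (g, mk) in binary
            if kind == "inconsistent" and has_j and has_k:
                return False
            if kind == "conditioned" and has_j and not has_k:
                return False
            if kind == "interdependent" and has_j != has_k:
                return False
    return True

def admissible_thresholds(grid):
    """Thresholds from the grid for which the predicate is True."""
    return [alpha for alpha in grid if section_is_correct(alpha)]

GRID = [0.5, 0.7, 0.9]
```

With this data the threshold 0.5 is rejected, because object g1 would receive both of the inconsistent properties m1 and m2, while 0.7 and 0.9 are admissible. The per-object, per-constraint thresholds that ODA proposes as a compromise are not modeled here.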

Formation of Class Structure Model
Analysis of a binary FC makes it possible to derive all the formal concepts of the domain. Formal concepts are partially ordered by inclusion of their extents (the extent of a concept is the set of objects described by this concept) and form a complete lattice [8]. To use this result in software design, the formal concept lattice has to be transformed into a Class Structure Model. According to how their extents are formed, formal concepts are divided into three types:
• Concepts of the first type describe objects that really exist in the analyzed domain. These concepts define classes of objects that deserve to be called "fundamental".
• Concepts of the second type only generalize other concepts. In software design such classes are known as "virtual".
• Concepts of the third type combine the features of concepts of the first and second types.
When designing the Class Structure Model, pragmatic considerations require that the model be confined to fundamental and virtual classes of objects. In general, the following principles can be stated for transforming a formal concept lattice into a Class Structure Model:
• all concepts of the lattice are candidates for fundamental classes of the model;
• the fundamental class of an object is the minimal (in lattice terminology) concept containing the object in its extent;
• an attribute is retained in the maximal concept that contains this attribute in its intent;
• the greatest concept of the lattice (its distinguishing feature is that the cardinality of its extent equals the cardinality of the set of objects) is always excluded from the model if its intent is empty;
• the smallest concept of the lattice (its distinguishing feature is that the cardinality of its intent equals the cardinality of the set of attributes) is always excluded from the model if its extent is empty;
• the analysis of candidates for fundamental classes begins with the smallest concept and proceeds level by level through the nearest super-concepts.
An algorithm that follows these principles is shown in Table 3.

Table 3. Algorithm for converting a concept lattice into a class structure model

Step  Operation
1     The initial version of the model is formed as a copy of the formal concept lattice.
2     The greatest concept is located in the model. If the intent of this concept is empty, it is excluded from the model and its links to its sub-concepts are broken.
3     The smallest concept is located in the model. If the extent of the smallest concept is empty:
      • this concept is excluded from the model and its links to its super-concepts are broken;
      • the set of candidates for fundamental classes is formed from its nearest super-concepts.
      If the extent of the smallest concept is not empty, the set of candidates for fundamental classes consists of the smallest concept alone.
4     Loop through the set of candidates.
4.1   For each super-concept of the candidate under consideration, the objects that belong to the extent of this candidate are removed from the super-concept's extent (the extent of a super-concept is never smaller than the extent of its sub-concept).
4.2   Every attribute that belongs to the intent of at least one super-concept is removed from the intent of the candidate under consideration (the union of the intents of all super-concepts is never larger than the intent of the concept they generalize).
4.3   If the extent of the candidate is not empty, the candidate gives rise to a fundamental class. In this case one of two alternatives applies:
      • if the candidate has no sub-concepts, it is recorded directly as a fundamental class;
      • otherwise a new sub-concept is created for this candidate, and the extent (and only the extent) of the candidate is transferred to it. This new sub-concept is recorded as a fundamental class of objects. The intent of such a fundamental class is empty. The candidate is retained in the model as a virtual class with an empty extent.
4.4   The set of prospective candidates is unconditionally extended with the super-concepts of the current candidate.
5     The set of prospective candidates is reduced: only the root concepts of the generalization relationship defined on this set remain.
6     If the set of prospective candidates is not empty, the algorithm repeats from Step 4.
7     Classes with both an empty extent and an empty intent are excluded from the resulting set. These can only be intermediate classes (i.e. neither root nor leaf classes) of the developed taxonomy.
Fig. 3 shows the class taxonomy obtained by converting the formal concept lattice shown in Fig. 2. Notably, concepts 13 and 17 (highlighted in Fig. 2) are absent from this taxonomy: both lose their extent and intent during the conversion. In addition, in Fig. 3 concept 5 defines a fundamental class (all such classes are highlighted), while a special virtual class 05 is introduced into the model to describe its intent (and only its intent).
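The effect of the conversion can be sketched in Python under simplifying assumptions. The sketch does not reproduce the level-by-level traversal of Table 3: it assumes that steps 4.1 and 4.2 amount to the usual reduced labeling of the lattice (each object stays only in its minimal concept, each attribute only in its maximal concept) and computes this directly. It also omits the generalization links between the resulting classes. The function names and the example lattice are hypothetical.

```python
def fs(*items):
    """Shorthand for building a frozenset."""
    return frozenset(items)

def class_model(concepts):
    """concepts: list of (extent, intent) frozenset pairs forming a concept lattice.

    Returns a list of (kind, objects, attributes) records.
    """
    all_objects = frozenset().union(*(e for e, _ in concepts))
    all_attrs = frozenset().union(*(i for _, i in concepts))
    # Steps 2-3: drop the greatest concept if its intent is empty
    # and the smallest concept if its extent is empty.
    kept = [(e, i) for e, i in concepts
            if not (e == all_objects and not i)
            and not (i == all_attrs and not e)]
    model = []
    for extent, intent in kept:
        subs = [c for c in kept if c[0] < extent]    # proper sub-concepts
        supers = [c for c in kept if extent < c[0]]  # proper super-concepts
        # Reduced labeling (assumed equivalent to steps 4.1 and 4.2).
        own_objects = extent - frozenset().union(*(c[0] for c in subs))
        own_attrs = intent - frozenset().union(*(c[1] for c in supers))
        if own_objects and not subs:
            # Step 4.3, first alternative: recorded directly as fundamental.
            model.append(("fundamental", own_objects, own_attrs))
        elif own_objects:
            # Step 4.3, second alternative: split off a fundamental class
            # that carries only the extent; the concept stays virtual.
            model.append(("fundamental", own_objects, frozenset()))
            if own_attrs:  # step 7: skip classes with nothing left
                model.append(("virtual", frozenset(), own_attrs))
        elif own_attrs:
            model.append(("virtual", frozenset(), own_attrs))
        # Step 7: concepts with empty extent and intent are dropped.
    return model

# Hypothetical lattice with four concepts.
LATTICE = [
    (fs("g1", "g2", "g3"), fs("b")),
    (fs("g1"), fs("a", "b")),
    (fs("g2"), fs("b", "c")),
    (fs(), fs("a", "b", "c")),
]
model = class_model(LATTICE)
```

In this example the smallest concept is dropped because its extent is empty, and the greatest concept is a third-type concept: it becomes a virtual class holding attribute `b`, with a separate fundamental class for object `g3`.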

Conclusion
Formal Concept Analysis (FCA) has shown its benefits in many application areas, including software engineering. Its use is especially valuable in the early stages of software development, which involve identifying the types (classes) of domain objects and the relationships between these types. The methodological toolkit of Ontological Data Analysis significantly expands and strengthens these advantages:
• it can deal with incomplete and contradictory information about the domain, a situation typical of the beginning of the software life cycle;
• it naturally describes and analyzes arbitrary relations between domain classes;
• it takes into account the numerous relationships between domain properties known a priori to the analyst (in effect an additional cognitive resource that classical FCA did not use).
Finally, the ODA arsenal includes a pragmatically oriented algorithm for transforming a formal concept lattice into a model describing the class structure. The resulting model is distinctive in that it describes only two kinds of classes, with fundamentally different technical realizations.