Abstract
Multiview models, as the name stipulates, are models capturing a real-world system perceived from different points of view (perspectives), typically engaging locally available features (attributes, input variables). When considered together, a collection of multiview models has to be aggregated. Multiview models also arise in the presence of data with a massive number of variables, when building a monolithic model involving all attributes is neither feasible nor computationally sound. In this paper, two categories of scenarios are formulated and discussed by focusing on fuzzy rule-based architectures. An important task when building an aggregate of multiview models is to equip the overall global model with a sound measure of quality, with which one can efficiently assess the relevance of the individual results produced by the rule-based models. It is, therefore, advocated that the quality of the results can be quantified by an output information granule rather than a single numeric outcome. In the two scenarios outlined above, the results produced by a family of multiview models are aggregated with the use of the augmented principle of justifiable granularity, one of the fundamentals of Granular Computing. It is also advocated that the diversity of the results delivered by multiview models can be captured and quantified in the granular form of the produced result. The related optimization criterion along with the associated optimization process are discussed.
With the visible advancements and the broad spectrum of applications of rule-based models, and fuzzy rule-based models in particular, two open design questions surface more vividly:
(1) Highly dimensional data. The first design question is about developing rules for highly dimensional data. Such data come from problems in which a large number of independent variables are encountered. While large masses of data with quite a limited number of variables are manageable by engaging specialized computing environments (e.g., Hadoop or Spark), the high dimensionality of the input space implies an eminent problem whose essence arises due to the so-called concentration effect.
(2) Variety of data sources. The second compelling question is implied by practical scenarios where the knowledge about some system (phenomenon) arises from different sources. The system can be described by different features (attributes) depending upon situations and available resources used to collect data. Some variables may not be accessible because there are no sensors or there are limited abilities to access data. This gives rise to a collection of so-called multiview models.
In both categories of situations described above, we encounter a collection of results which differ from each other. They need to be reconciled in some fusion (aggregation) process, giving rise to a result of a global nature. We advocate that, in light of the existing diversity of multiview models or models built on a basis of subsets of features, the result can be described as an information granule, where the granularity is reflective of the existing differences among locally obtained data being then subject to aggregation.
There are two main objectives of this study:
(1) To construct a suite of low-dimensional models to alleviate the detrimental aspect of the concentration effect. In this study, we resort to building a slew of one-dimensional rule-based models for which the design overhead becomes minimal.
(2) To develop a mechanism of granular aggregation of results provided by multiview models.
Along with the above objectives, two general design schemes are sought, which could be schematically portrayed as follows:
(1) Individual models → aggregation → granular evaluation of the aggregation result.
(2) Individual models → granular evaluation of the models → aggregation → granular evaluation of the aggregation result.
The paper is structured as follows. Section 2 is devoted to the design of one-dimensional rule-based models; we discuss their characteristics and several development alternatives calling for different levels of optimization activities. In Section 3, we highlight the relational property of the data associated with the number of input variables; this property is also quantified. The principle of justifiable granularity is covered in Section 4. The augmentation of the principle, including error profiles, is included in Section 5. The aggregation operators are discussed in Section 6. The overall architecture involving granular aggregation and its refinements are discussed in Sections 7 and 8, respectively.
In the study, we consider all data assuming values in the unit interval.
These single-input rule-based models come in the following form:
$\text{if } x \text{ is } A_i \text{ then } y = f_i(\boldsymbol{a}_i, x)$   (1)
i=1, 2, …, c. Here Ai is a fuzzy set defined in the input space while fi is a local function forming the conclusion of the rule. The vector of parameters of the function is ai. The rule-based models are endowed with the inference (mapping) mechanism carried out as follows:
$y = \dfrac{\sum_{i=1}^{c} A_i(x)\, f_i(\boldsymbol{a}_i, x)}{\sum_{i=1}^{c} A_i(x)}$   (2)
In what follows, with the ultimate intent to eliminate any optimization overhead associated with the design of the rules, we outline a design process which is effortless and does not call for any optimization procedures. Consider the data coming in the form of input-output pairs D = (xk, targetk), k = 1, 2, …, N.
(1) The fuzzy sets Ai are triangular with 1/2 overlap between the adjacent membership functions. The modal values of these fuzzy sets are distributed uniformly across the input space.
(2) The conclusion part is a constant function, namely f(x; ai) = yi*, where these constants are determined on the basis of the data. We have
$y_j^{*} = \dfrac{\sum_{k=1}^{N} A_j(x_k)\,\text{target}_k}{\sum_{k=1}^{N} A_j(x_k)}$   (3)
One can easily show that under these assumptions, the input-output mapping of the above rule-based model is nonlinear and realizes a piecewise linear function; refer to Fig. 1.
Fig. 1 Input-output piecewise-linear characteristics of the fuzzy rule-based model
The nonlinear model (function) produced in this way is fully described by the coordinates of the cutoff points (mj,yj*), j=1, 2,…,c. These points are specified by the modal values of the fuzzy sets and the constant conclusions.
Interestingly, the rule-based model can be regarded as a result of multiple linearization of an unknown mapping, where the linearization is completed around the modal values of the fuzzy sets Ai. Linearization is commonly used in coping with nonlinear problems; the multi-linearization (viz. linearization with several linearization (cutoff) points at the same time) arises as an efficient strategy. Not engaging any optimization process (which eliminates any computing overhead), we form the rules.
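As an illustration of this optimization-free design, the following Python sketch builds the rules and carries out the inference of Eqs. (1)-(3); the synthetic data and the function names are ours and serve illustration only.

```python
import numpy as np

def triangular_memberships(x, modal_values):
    """Triangular fuzzy sets with 1/2 overlap; the modal values are spaced
    uniformly, so at most two adjacent grades are nonzero and they sum to 1."""
    m = np.asarray(modal_values, dtype=float)
    h = m[1] - m[0]                       # spacing between modal values
    return np.maximum(0.0, 1.0 - np.abs(x - m) / h)

def build_rules(x_data, targets, c):
    """Constant conclusions y_j*, Eq. (3): membership-weighted means of the
    target data; no optimization is involved."""
    modal_values = np.linspace(0.0, 1.0, c)
    act = np.array([triangular_memberships(x, modal_values) for x in x_data])
    y_star = act.T @ targets / np.maximum(act.sum(axis=0), 1e-12)
    return modal_values, y_star

def infer(x, modal_values, y_star):
    """Rule-based inference, Eq. (2): a piecewise-linear mapping passing
    through the cutoff points (m_j, y_j*)."""
    a = triangular_memberships(x, modal_values)
    return float(a @ y_star / a.sum())

# usage on synthetic data in the unit interval (illustrative only)
rng = np.random.default_rng(0)
x_data = rng.uniform(0, 1, 200)
targets = 0.5 + 0.4 * np.sin(2 * np.pi * x_data)   # hypothetical target values
m, y_star = build_rules(x_data, targets, c=5)
print(infer(0.37, m, y_star))
```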
If required, the improvement of the performance of such rule-based model can be achieved in several ways:
(1) Increasing the number of rules, which increases the number of linear segments used to approximate the nonlinear function.
(2) Optimizing membership functions of Ai; their location across the unit interval could be optimized
(3) Optimizing constant conclusions of the rules. Here we can determine the constants by solving the following optimization problem:
$Q = \sum_{k=1}^{N}\left(\text{target}_k - \sum_{i=1}^{c} A_i(x_k)\, y_i^{*}\right)^{2}$   (4)
where the minimization is carried out by adjusting the values of yi*. Arranging the activation levels Ai(xk) in the N×c matrix A and the values targetk in the vector target, the minimization of (4) becomes a standard least-squares problem whose solution reads

$\boldsymbol{y}^{*} = (\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A})^{-1}\boldsymbol{A}^{\mathrm{T}}\,\boldsymbol{target}$   (5)
(4) Incorporation of a polynomial format of the functions forming the conclusion parts of the rules. Considering the fuzzy sets Ai as above, when the conclusion of the rule is a polynomial of order p, the input-output characteristics of the model is a polynomial of order p+1. It is easy to demonstrate this. For any x in the interval [mi, mi+1] there are two rules activated; mi and mi+1 are the modal values of the corresponding fuzzy sets. This yields the output y = Ai(x)Pi(x, ai) + Ai+1(x)Pi+1(x, ai+1). Note that the activation levels Ai(x) and Ai+1(x) are linear functions of x over this interval, say Ai(x) = b0i + b1ix and Ai+1(x) = b0,i+1 + b1,i+1x. Therefore y = (b0i + b1ix)·Pi(x, ai) + (b0,i+1 + b1,i+1x)·Pi+1(x, ai+1). It is apparent that y is a polynomial of order p+1.
The construction of a one-dimensional model does not require any design effort and associated computing overhead. There are some limitations. When only a single input variable is used to form the input-output model, we encounter an emerging relational phenomenon of the data. This means that for two very closely located inputs, the outputs could vary significantly. In particular, because all but a single variable are discarded, one might encounter an extreme situation when for the same input we have two or more different outputs. Note that if the distance between xk and xl becomes smaller in comparison to the distance between yk and yl, we say the data exhibit a more visible relational character. In the limit, if for the same inputs xk and xl (xk = xl) we have yk different from yl, these data cannot be modeled by a function but only by a relation.
We are interested in the quantification of the degree to which the data are of a relational nature, viz.
rel= degree (data exhibit relational character)
For instance, intuitively we envision that the data shown in Fig. 2 exhibit a visible relational character.

Fig. 2 Input-output data and their relational nature

For any pair of data (xk, yk) and (xl, yl), the degree of relationality is quantified as

$rel_{kl} = \dfrac{|y_k - y_l|}{|x_k - x_l| + \delta}$   (6)

where a small positive value of δ prevents division by zero. Note that when |xk − xl| becomes lower for the same value of |yk − yl|, the value of the relationality degree increases. The global index is determined by taking a sum of relkl over the corresponding pairs of the data, namely
$rel = \sum_{k<l} rel_{kl}$   (7)
The higher the value of rel, the more evident the relational nature of the one-dimensional data D.
Note that this index exhibits some linkages with the Lipschitz constant K expressing the relationship |yk-yl|<K|xk-xl|.
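The index is straightforward to compute; a short Python sketch following Eqs. (6)-(7) is given below (the two synthetic data sets are purely illustrative).

```python
import numpy as np

def relational_index(x, y, delta=1e-3):
    """Global relationality index, Eqs. (6)-(7): the sum of the pairwise
    degrees rel_kl = |y_k - y_l| / (|x_k - x_l| + delta) over all pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rel, n = 0.0, len(x)
    for k in range(n):
        for l in range(k + 1, n):
            rel += abs(y[k] - y[l]) / (abs(x[k] - x[l]) + delta)
    return rel

# two synthetic one-dimensional data sets: a functional one and a highly
# relational one (the same inputs, scattered outputs)
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y_functional = 0.2 + 0.6 * x
y_relational = rng.uniform(0, 1, 100)
print(relational_index(x, y_functional), relational_index(x, y_relational))
```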
To illustrate the relational character of some data, we consider publicly available datasets coming from the Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). The values of rel obtained for the concrete, abalone, and superconductivity data are reported in Figs. 3-5.
Fig. 3 Concrete data: rel

Fig. 4 Abalone data: rel

Fig. 5 Superconductivity data: rel
The principle of justifiable granularity guides the construction of an information granule based on available experimental evidence.
In a nutshell, when using this principle, we emphasize that a resulting information granule becomes a summarization of data (viz. the available experimental evidence). The underlying rationale behind the principle is to deliver a concise and abstract characterization of the data such that (1) the produced granule is justified in light of the available experimental data, and (2) the granule comes with a well-defined semantics meaning that it can be easily interpreted and becomes distinguishable from the others.
Formally speaking, these two intuitively appealing requirements are expressed by the criterion of coverage and the criterion of specificity. Coverage states how much data are positioned behind the constructed information granule. Put differently, coverage quantifies the extent to which the information granule is supported by available experimental evidence. Specificity, on the other hand, is concerned with the semantics (meaning) of the information granule. We focus here on a one-dimensional case of data for which we design the information granule.
Coverage and specificity
The definition of coverage and specificity requires formalization, which depends upon the formal nature of the information granule to be formed. As an illustration, consider an interval form of information granule A. In the case of intervals built on the basis of one-dimensional numeric data (evidence) y1, y2, …, yn, the coverage measure is associated with a count of the number of data embraced by (contained in) A, namely
$\mathrm{cov}(A) = \dfrac{\mathrm{card}\{y_k \mid y_k \in A\}}{n}$   (8)
card(·) denotes the cardinality of A, viz. the number (count) of elements yk belonging to A. In essence, coverage exhibits a visible probabilistic flavor. Let us recall that the specificity of A, sp(A), is regarded as some decreasing function g of the size (length, in particular) of the information granule. If the granule is composed of a single element, sp(A) attains the highest value and returns 1. If A is included in some other information granule B, then sp(A) > sp(B). In the limit, if A is the entire space of interest, sp(A) returns zero. For an interval-valued information granule A = [a, b], a simple implementation of specificity with g being a linearly decreasing function comes as
$\mathrm{sp}(A) = g(\mathrm{length}(A)) = 1 - \dfrac{|b - a|}{\mathrm{range}}$   (9)
where range stands for an entire space of interest over which intervals (information granules) are defined.
The criteria of coverage and specificity are in an obvious relationship, as shown in Fig. 6.
Fig. 6 Relationships between abstraction (coverage) and specificity of information granules of temperature
From the practical perspective, we require that an information granule describing a piece of knowledge has to be meaningful in terms of its existence in light of the experimental evidence and, at the same time, specific enough. For instance, when making a prediction about temperature, the statement about the predicted temperature of 17.65 degrees is highly specific, but the likelihood of this prediction being true is practically zero. On the other hand, the piece of knowledge (information granule) describing temperature as an interval [−10, 34] lacks specificity (albeit it is heavily supported by experimental evidence) and thus its usefulness is highly questionable; as such, this information granule is very likely regarded as non-actionable. No doubt, some sound compromise is needed, and this is what stands behind the principle of justifiable granularity.
Witnessing the conflicting nature of the two criteria, we introduce the following product of coverage and specificity:
$V = \mathrm{cov}(A)\,\mathrm{sp}(A)$   (10)
The desired solution (viz. the developed information granule) is the one for which the value of V attains its maximum. Formally speaking, consider that an information granule is described by the vector of parameters p, so that V = V(p); in the case of the interval, p = [a, b]. The principle of justifiable granularity applied to experimental evidence returns the information granule that maximizes V, popt = arg maxp V(p).
To maximize the index V through the adjustment of the parameters of the information granule, two different strategies are encountered:
(1) A two-phase development. First, a numeric representative (mean, median, modal value, etc.) is determined. It can be viewed as an initial representation of the data. Next, the parameters of the information granule are optimized by maximizing V. For instance, in the case of an interval [a, b], one has the bounds (a and b) to be determined. These two parameters are determined separately, viz. the values of a and b are determined by maximizing V(a) and V(b). The data used in the maximization of V(b) involve the data larger than the numeric representative; likewise, V(a) is optimized based on the data lower than this representative.
(2) A single-phase procedure in which all parameters of the information granule are determined at the same time. Here a numeric representative is not required.
The two-phase algorithm works as follows. A certain numeric representative of the data, say the mean, is regarded as a rough initial representative. In the second phase, we separately determine the lower bound (a) and the upper bound (b) of the interval by maximizing the product of the coverage and specificity as formulated by the optimization criterion. This simplifies the process of building the granule as we encounter two separate optimization tasks
$b_{\mathrm{opt}} = \arg\max_{b}\,[\mathrm{cov}([r,b])\,\mathrm{sp}([r,b])], \qquad a_{\mathrm{opt}} = \arg\max_{a}\,[\mathrm{cov}([a,r])\,\mathrm{sp}([a,r])]$   (11)
We calculate cov([r, b]) = card{yk | yk ∈ [r, b]}/n. The specificity model has to be provided in advance; its simplest linear version is expressed as sp([r, b]) = 1 − |b − r|/(ymax − r). By sweeping through possible values of b positioned within the range [r, ymax], we observe that the coverage is a stair-wise increasing function whereas the specificity decreases linearly (see Fig. 7).
Fig. 7 Example plots of coverage and specificity (linear model) regarded as a function of b
The determination of the optimal value of the lower bound of the interval, a, is completed in the same way as above. We determine the coverage by counting the data located to the left of the numeric representative r, namely cov([a, r]) = card{yk | yk ∈ [a, r]}/n, and compute the specificity as sp([a, r]) = 1 − |a − r|/(r − ymin).
The algorithmic essence of the principle is captured in Fig. 8.
Fig. 8 Two-step design of the interval information granule
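A minimal Python sketch of the two-phase construction is given below; it sweeps the candidate bounds over the data, uses the median as the numeric representative, and adopts the linear specificity model of Eq. (9). It illustrates the procedure rather than reproduces the original implementation.

```python
import numpy as np

def justifiable_interval(data):
    """Two-phase design of an interval information granule [a, b]: the bounds
    are optimized separately around the numeric representative r by
    maximizing V = coverage * specificity (linear specificity model)."""
    y = np.sort(np.asarray(data, dtype=float))
    n = len(y)
    r = float(np.median(y))                 # numeric representative
    y_min, y_max = y[0], y[-1]

    def best_bound(candidates, side):
        best_v, best_c = 0.0, r
        for c in candidates:
            if side == "upper":
                cov = np.sum((y >= r) & (y <= c)) / n
                sp = 1.0 - abs(c - r) / max(y_max - r, 1e-12)
            else:
                cov = np.sum((y >= c) & (y <= r)) / n
                sp = 1.0 - abs(r - c) / max(r - y_min, 1e-12)
            v = cov * sp
            if v > best_v:
                best_v, best_c = v, c
        return best_c

    b = best_bound(y[y >= r], "upper")
    a = best_bound(y[y <= r], "lower")
    return a, b

# usage: an interval granule summarizing synthetic evidence in [0, 1]
rng = np.random.default_rng(2)
evidence = np.clip(rng.normal(0.5, 0.12, 300), 0, 1)
print(justifiable_interval(evidence))
```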
As a way of constructing information granules, the principle of justifiable granularity exhibits a significant level of generality in two essential ways. First, given the underlying requirements of coverage and specificity, different formalisms of information granules can be engaged. Second, experimental evidence could be expressed as information granules articulated in different formalisms based on which certain information granule is being formed.
The principle of justifiable granularity highlights an important facet of the elevation of the type of information granularity: the result of capturing a number of pieces of numeric experimental evidence comes as a single abstract entity, an information granule. As the individual numeric data can be thought of as information granules of type-0, the result becomes a single information granule of type-1. This is a general phenomenon of the elevation of the type of information granularity. The increased level of abstraction is a direct consequence of the diversity present in the originally available granules. This elevation effect is of a general nature and can be emphasized by stating that when dealing with experimental evidence composed of a set of information granules of type-n, the result becomes a single information granule of type (n+1).
The principle of justifiable granularity can be further extended by introducing a so-called error profile.
The coverage criterion conveyed by Eq. (8) reflects how the information granule covers the data. Quite often in a modeling environment, one obtains an information granule whose quality has to be evaluated vis-à-vis the error, viz. the difference between the numeric result of the model and the numeric experimental data, expressed as ek = targetk − yk, where yk is the k-th result produced by the model. Obviously, we anticipate that a sound model should produce values of error close to zero; however, the notion of "close to zero" requires more elaboration. Here the concept of an error profile comes into the picture.
The error profile f(e) is an information granule (typically, a fuzzy set) whose membership function describes to which extent the error is acceptable. Some examples of the profiles are shown in Fig. 9.
Fig. 9 Examples of error profiles f(e)
The error profile is made a part of the computing concerning the coverage criterion; namely, in the calculations of b one has
$\mathrm{cov}([r,b]) = \dfrac{1}{n}\sum_{k=1}^{n} f(e_k)\,\mathrm{incl}(y_k, [r,b])$   (12)
where incl(x, [a, b]) is a Boolean predicate that returns 1 if x is included (covered) in the interval and 0 otherwise. The computing of specificity is carried out as before.
The principle can also involve weights and, in this situation, they become a part of the coverage criterion. The values of the weights are determined based on the performance of the individual local models. Given that their performance is described by the indexes Q1, Q2, …, Qn (those could be the RMSE values produced for each model), the weight can be made a decreasing function of Qk, say wk = exp(−(Qk − Qmin)/(Qmax − Qmin)), with Qmin and Qmax being the values of Q for the best and the worst model.
The coverage expression comes in the form of
$\mathrm{cov}([r,b]) = \dfrac{1}{n}\sum_{k=1}^{n} w_k\,\mathrm{incl}(y_k, [r,b])$   (13)
The accommodation of both the weights and the error profile gives rise to the following expression:
$\mathrm{cov}([r,b]) = \dfrac{1}{n}\sum_{k=1}^{n} w_k\, f(e_k)\,\mathrm{incl}(y_k, [r,b])$   (14)
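A sketch of the augmented coverage of Eq. (14) is given below (Python); the triangular error profile, its tolerance, and the sample values are assumptions made for illustration only.

```python
import numpy as np

def error_profile(e, tol=0.1):
    """A triangular error profile f(e): fully acceptable at e = 0, linearly
    decreasing to 0 for |e| >= tol (one possible choice of profile)."""
    return np.maximum(0.0, 1.0 - np.abs(e) / tol)

def weights_from_quality(Q):
    """Weights w_k = exp(-(Q_k - Q_min)/(Q_max - Q_min)); the best model
    receives weight 1, the worst exp(-1)."""
    Q = np.asarray(Q, dtype=float)
    return np.exp(-(Q - Q.min()) / max(Q.max() - Q.min(), 1e-12))

def augmented_coverage(y, e, w, a, b):
    """Coverage of Eq. (14): each covered datum contributes w_k * f(e_k)."""
    y, e, w = map(np.asarray, (y, e, w))
    incl = (y >= a) & (y <= b)
    return float(np.sum(w * error_profile(e) * incl) / len(y))

# illustrative call: outputs of n local models, their errors and RMSE values
y = np.array([0.42, 0.47, 0.55, 0.61, 0.39])
e = np.array([0.02, -0.05, 0.08, 0.15, -0.01])
w = weights_from_quality([0.05, 0.07, 0.10, 0.20, 0.04])
print(augmented_coverage(y, e, w, a=0.40, b=0.60))
```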
The data y1, y2, …, yn are to be aggregated (fused). Formally, an aggregation operation agg: [0, 1]^n → [0, 1] satisfies the following requirements:
(1) Boundary conditions: agg(0, 0, …, 0) = 0, agg(1, 1, …, 1) = 1.
(2) Monotonicity:

$\mathrm{agg}(x_1, x_2, \ldots, x_n) \le \mathrm{agg}(z_1, z_2, \ldots, z_n) \ \text{ if } \ x_i \le z_i, \; i = 1, 2, \ldots, n$   (15)
Triangular norms and conorms conform to these requirements. Another family of aggregation operators of interest here is the generalized mean

$\mathrm{agg}(y_1, y_2, \ldots, y_n) = \left(\dfrac{1}{n}\sum_{i=1}^{n} y_i^{\,p}\right)^{1/p}$   (16)

where p is a certain parameter; by varying it, the generalized mean forms a broad class of averaging operators. The averaging operator is idempotent, commutative, monotonic, and satisfies the boundary conditions agg(0, 0, …, 0) = 0 and agg(1, 1, …, 1) = 1.
Depending on the values of the parameter p, there are several interesting cases:
p = 1, arithmetic mean: agg(y1, y2, …, yn) = (y1 + y2 + … + yn)/n
p → 0, geometric mean: agg(y1, y2, …, yn) = (y1y2…yn)^(1/n)
p = −1, harmonic mean: agg(y1, y2, …, yn) = n/(1/y1 + 1/y2 + … + 1/yn)
p → ∞, maximum: agg(y1, y2, …, yn) = max(y1, y2, …, yn)
p → −∞, minimum: agg(y1, y2, …, yn) = min(y1, y2, …, yn)
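The family is easy to realize directly; the short Python sketch below evaluates Eq. (16) together with its limiting cases.

```python
import numpy as np

def generalized_mean(y, p):
    """Generalized mean of Eq. (16) with its limiting cases handled explicitly."""
    y = np.asarray(y, dtype=float)
    if p == 0:                               # geometric mean (p -> 0)
        return float(np.exp(np.mean(np.log(y))))
    if np.isposinf(p):                       # maximum (p -> +inf)
        return float(np.max(y))
    if np.isneginf(p):                       # minimum (p -> -inf)
        return float(np.min(y))
    return float(np.mean(y ** p) ** (1.0 / p))

# the aggregated value moves from min to max as p increases
y = [0.2, 0.5, 0.9]
for p in [-np.inf, -1, 0, 1, np.inf]:
    print(p, generalized_mean(y, p))
```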
The overall architecture of the global model is shown in Fig. 10.
Fig. 10 Granular aggregation of multiview models
Here we encounter n one-dimensional rule-based models M1, M2, …, Mn followed by the aggregation module where the results are aggregated. In contrast to commonly studied methods of aggregation, an important point is that the aggregation of numeric results gives rise to an information granule. The granularity of the results is crucial to the evaluation of the quality of the overall architecture. Again, here we look at the coverage and specificity criteria as means to evaluate the obtained result.
In more detail, let us consider that the models were constructed based on the input-output data (xk, targetk), where xk is an n-dimensional vector of inputs, k = 1, 2, …, N. The model Mi is constructed by taking the i-th coordinate of the input data, viz. (xki, targetk). The coverage is expressed as
$\mathrm{cov} = \dfrac{1}{N}\sum_{k=1}^{N} \mathrm{incl}(\text{target}_k, Y_k)$   (17)

where Yk denotes the granular result of aggregation obtained for xk,
while the specificity is given in the form of
$\mathrm{sp} = \dfrac{1}{N}\sum_{k=1}^{N} \mathrm{sp}(Y_k)$   (18)
The quality of the architecture is expressed as the product of these two criteria, cov·sp; the higher the product, the better the quality.
The one-dimensional models are not ideal (especially because of the relational format of the data). We augment the numeric output of the model by its granular extension by admitting that it comes as an interval of some level of information granularity e spread around the numeric result produced by the rule-based model. Information granularity is reflective of the diversity of the results.
Let us refer to the piecewise-linear characteristics of the model (see Fig. 11).
Fig. 11 Piecewise relationships with interval-valued cutoff points
Recall that the input-output relationship is fully described by the cutoff points (mj, yj*), j = 1, 2, …, c. The granular extension is formed by spreading an interval of size e around each numeric conclusion:

$Y_j = [\,y_j^{*} - e,\; y_j^{*} + e\,], \quad j = 1, 2, \ldots, c$   (19)
This yields the granular output Y for a given x

$Y = \sum_{i=1}^{c} A_i(x) \otimes Y_i$   (20)
where ⊗ stands for the interval multiplication. This gives rise to the following interval $Y = \left[\sum_{i=1}^{c} A_i(x)(y_i^{*} - e),\; \sum_{i=1}^{c} A_i(x)(y_i^{*} + e)\right]$.
For any xk in the data set, we determine the corresponding Yk and next compute the coverage and specificity, V = cov·sp. Evidently, V is a function of e, so its value has to be optimized by searching for the maximal value of V, eopt = arg maxe V(e).
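The optimization of e can be realized as a simple sweep maximizing V = cov·sp; the Python sketch below assumes the additive interval spread of Eq. (19) and uses synthetic activation levels and targets purely for illustration.

```python
import numpy as np

def granular_output(act, y_star, e):
    """Interval output [sum_i A_i(x)(y_i*-e), sum_i A_i(x)(y_i*+e)],
    cf. Eqs. (19)-(20); act holds the activation levels A_i(x)."""
    return act @ (y_star - e), act @ (y_star + e)

def optimize_granularity(act_all, y_star, targets, y_range=1.0):
    """Sweep candidate levels e and keep the one maximizing V = cov * sp."""
    best_e, best_v = 0.0, 0.0
    for e in np.linspace(0.0, 0.5, 101):
        lo, hi = granular_output(act_all, y_star, e)
        cov = np.mean((targets >= lo) & (targets <= hi))
        sp = np.mean(1.0 - (hi - lo) / y_range)
        v = cov * sp
        if v > best_v:
            best_e, best_v = e, v
    return best_e, best_v

# illustrative use: stand-ins for the activation levels A_i(x_k) of c rules
# over N data, together with the target values
rng = np.random.default_rng(3)
N, c = 200, 5
act_all = rng.dirichlet(np.ones(c), size=N)     # rows sum to 1
y_star = np.linspace(0.1, 0.9, c)
targets = act_all @ y_star + rng.normal(0, 0.05, N)
print(optimize_granularity(act_all, y_star, targets))
```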
Proceeding in this way with all the one-dimensional models, we obtain the associated optimal levels of information granularity, e1, e2, …, en. As a result, for any x, these models return Y1, Y2, …, Yn. This gives rise to the augmented architecture illustrated in Fig. 12.
Fig. 12 Augmented architecture of the model: note elevation of type of information granularity when progressing towards consecutive phases of aggregation
Noticeable is the fact that the arguments entering the aggregation process are information granules themselves. This implies, in light of the elevation of the type of information granularity, that the result becomes an information granule of a higher type than the arguments being aggregated, viz. in this case a so-called granular interval. Granular intervals are intervals whose bounds are information granules themselves. One can denote the granular interval as Y = [Y−, Y+], where the bounds Y− and Y+ are intervals themselves.
The detailed calculations concerning the lower bound Y− of the granular interval are carried out by engaging the lower bounds of Y1, Y2, …, Yn. Likewise, the computing of the upper bound Y+ engages the upper bounds of Y1, Y2, …, Yn.
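A compact Python sketch of this two-level aggregation is given below; the interval outputs Y1, …, Yn are hypothetical, and the interval construction is a simplified version of the principle discussed earlier, used here only to illustrate how granular bounds emerge from the bounds of the arguments.

```python
import numpy as np

def justifiable_interval(values):
    """Interval [a, b] built around the median of the values by maximizing
    coverage * specificity (linear specificity model), cf. Section 4."""
    y = np.sort(np.asarray(values, dtype=float))
    r, n = float(np.median(y)), len(y)

    def bound(cands, lo, hi, upper):
        best, best_v = r, 0.0
        for c in cands:
            cov = np.mean((y >= (r if upper else c)) & (y <= (c if upper else r)))
            sp = 1.0 - abs(c - r) / max((hi - r) if upper else (r - lo), 1e-12)
            if cov * sp > best_v:
                best, best_v = c, cov * sp
        return best

    return bound(y[y <= r], y[0], y[-1], False), bound(y[y >= r], y[0], y[-1], True)

# hypothetical interval outputs Y_1, ..., Y_n of the local models for one input x
Y = np.array([[0.40, 0.55], [0.44, 0.58], [0.38, 0.52], [0.47, 0.61], [0.42, 0.56]])

# granular interval: its lower bound aggregates the lower bounds of Y_i,
# its upper bound aggregates the upper bounds of Y_i
lower_granule = justifiable_interval(Y[:, 0])
upper_granule = justifiable_interval(Y[:, 1])
print(lower_granule, upper_granule)
```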
In this paper, we formulated and delivered a solution to the problem of granular aggregation of multiview models and identified a sound argument behind the emergence of the problem. It is demonstrated that a highly dimensional problem can fit well into the developed framework. It has been advocated that the aggregation mechanism producing information granules helps quantify the quality of fusion and reflect upon the diversity present among the results produced by the individual models. The principle of justifiable granularity becomes instrumental in constructing information granules. It is also shown that the elevation of the type of information granularity (to type-1 or type-2) becomes reflective of the increased level of abstraction of modeling.
There are several directions worth pursuing as long-term objectives. While in this study, for illustrative purposes, we engage interval calculus to realize the discussed architecture, other alternatives of formal frameworks, say fuzzy sets and rough sets, are to be discussed. The general architecture remains the same; however, some interesting conceptual and computing insights could be gained in this way. Architecturally, we studied a two-level topology: a collection of one-dimensional rule-based models followed by an aggregation module. An interesting alternative could be to investigate low-dimensional rules (with two or three conditions, which is still feasible) and ensuing hierarchical structures along with a granular quantification of the model.
References
DYCKHOFF H, PEDRYCZ W. Generalized means as a model of compensative connectives[J]. Fuzzy Sets and Systems, 1984, 14: 143.
HOLTE R C. Very simple classification rules perform well on most commonly used datasets[J]. Machine Learning, 1993, 11: 63.
KNOX S W. Machine learning: A concise introduction[M]. Hoboken: Wiley & Sons, 2018.
KOSHELEVA O, KREINOVICH V. Measures of specificity used in the principle of justifiable granularity: A theoretical explanation of empirically optimal selections[C]//2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018: 1-7.
MENGER K. Statistical metric spaces[J]. Proceedings of the National Academy of Sciences of the USA, 1942, 28: 535.
PEDRYCZ W, GOMIDE F. An introduction to fuzzy sets: Analysis and design[M]. Cambridge: MIT Press, 1998.
PEDRYCZ W, HOMENDA W. Building the fundamentals of granular computing: A principle of justifiable granularity[J]. Applied Soft Computing, 2013, 13: 4209.
PEDRYCZ W. Granular computing[M]. Boca Raton: CRC Press, 2013.
PEDRYCZ W, WANG X. Designing fuzzy sets with the use of the parametric principle of justifiable granularity[J]. IEEE Transactions on Fuzzy Systems, 2016, 24: 489.
SCHWEIZER B, SKLAR A. Probabilistic metric spaces[M]. New York: North Holland, 1983.
TAKAGI T, SUGENO M. Fuzzy identification of systems and its applications to modeling and control[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1985, 15: 116.
WALKER E A, NGUYEN H T, WALKER C L. A first course in fuzzy logic[M].
WANG X, PEDRYCZ W, GACEK A, et al. From numeric data to information granules: A design through clustering and the principle of justifiable granularity[J]. Knowledge-Based Systems, 2016, 101: 100.
ZHANG Zhongjie, HUANG Jian. Stabilizing the information granules formed by the principle of justifiable granularity[J]. Information Sciences, 2019, 503: 183.
ZHAO J, XIE X, XU X, et al. Multi-view learning overview: Recent progress and new challenges[J]. Information Fusion, 2017, 38: 43.