A benchmark proposal for design pattern detection
Francesca Arcelli
Marco Zanoni
Christian Tosi
Università degli Studi di Milano Bicocca
Dipartimento di Informatica Sistemistica e Comunicazione
Viale Sarca, 336 20126 Milano, Italy
Design pattern detection has received great interest in
recent years. Finding design patterns (DP)
[3] in a software system gives useful hints for
understanding the system and the kinds of
problems that were addressed during the development of
the system itself; their presence can be considered as an indicator of good software design. Moreover, they are very
important during the re-documentation process, in particular when the documentation is very poor, incomplete or not
Several design pattern detection approaches and tools
have been developed for both forward and reverse engineering, using different detection techniques
such as fuzzy logic, constraint solving, theorem provers, template matching and classification
techniques (e.g. [6], [7], [4], [5], [2]). Despite the many
approaches proposed, the results obtained are quite unsatisfactory and differ from one tool to another.
Many tools report many false positive instances while
missing other, correct instances. One common problem in
design pattern detection is the so-called variant problem:
design patterns can be implemented in several ways, often
very different from one another. The main variants of each
pattern are described in the catalog of [3]; others are applied
when the context of application requires it. Because of these variations,
rigid detection approaches, which are based only on canonical pattern instances,
fail to recognize most pattern instances.
Moreover, no real benchmark has yet been proposed
to compare design pattern detection tools. If one tries to
compare design pattern detection tools on the same system,
one usually obtains very different results, and often it is not even possible to replicate the results described by the
authors of a tool. Regardless of the validity of the results obtained by one tool with respect to another, one relevant problem
is the lack of a real benchmark that can easily be used to compare
the results in a sound way.
We face these problems because we are developing a tool
called MARPLE (Metrics and Architecture Reconstruction
plug-in for Eclipse) [1], whose main aims are design pattern detection (DPD) and software architecture reconstruction. As far as DPD is concerned, MARPLE is characterized by the following steps:
• the detection of subcomponents or micro-architectures,
which give useful hints for DP detection and which
aim to mitigate the variant problem
• the detection of all the possible DP candidates, performed by a module called Joiner, whose results are
characterized by high recall values
• the refinement of the previous results through data mining techniques, namely a clustering step followed by a supervised classification step (using
naïve Bayes and support vector machine classifiers); in this way the Classifier module reduces the output
size and sorts the results by relevance
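The trade-off behind these steps (a candidate set with high recall that is later refined) can be illustrated with a minimal sketch that computes precision and recall against a set of known instances. This is not MARPLE code: the function name, the representation of an instance as a set of class names, and the sample data are all assumptions made for illustration.

```python
def precision_recall(detected, validated):
    """Return (precision, recall) of detected candidates against validated instances."""
    detected, validated = set(detected), set(validated)
    true_positives = detected & validated
    precision = len(true_positives) / len(detected) if detected else 0.0
    recall = len(true_positives) / len(validated) if validated else 0.0
    return precision, recall


# Hypothetical data: each pattern instance is identified by the set of its classes.
validated = {frozenset({"A", "B"}), frozenset({"C", "D"})}
# A recall-oriented candidate set: every known instance plus two false positives.
joiner_output = validated | {frozenset({"E", "F"}), frozenset({"G", "H"})}
# A refined set after discarding one false positive.
refined_output = validated | {frozenset({"E", "F"})}

print(precision_recall(joiner_output, validated))   # (0.5, 1.0)
print(precision_recall(refined_output, validated))  # recall unchanged, precision higher
```

In this toy example the refinement step keeps recall at 1.0 while raising precision, which is the effect the filtering and ranking step aims for.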
In this work we propose a benchmark and
an approach to be used to compare DPD tools. In this way
we aim to find mechanisms for obtaining reliable results and
for making them, and the DPD tools, available to the community more easily. The adoption of a benchmark by the DPD
community could improve cooperation among
researchers and encourage the reuse of tools written by others instead
of the development of new ones.
Our benchmark proposal is based on the definition of a
standard for the representation of the results of DPD tools.
Having a common standard will make it possible to write applications
that are able to compare the results coming from different
What do we have to know about a design pattern in order to represent it?
A design pattern is an organized set of classes working
together. These classes respect the pattern’s design rules
[3], and each one has a specific role assigned. So a representation of a design pattern instance must contain the classes
belonging to it and the role assigned to each one.
Another issue is that in a design pattern some roles can
be played by more than one class, so we need a way to specify this kind of situation. We propose a tree organization
for the classes of design patterns. In this representation it
is possible to specify one-to-one or one-to-many relationships between the roles. Each pattern has a root that can be
composed of one or more roles.
• DPDef : it contains the name of the defined design pattern and is associated with the root level definition.
The instance of a design pattern must follow the definition and is a more complicated structure linked to the definition in all its levels. In Figure 2 we see the UML class
diagram of the instance model.
A benchmark proposal for DP Detection
Each design pattern must follow a definition, that is, a schema.
As introduced above, we model DPs as trees, so the schema is the
one represented in Figure 1.
Figure 1. UML for the DP definition model
The DP is essentially defined by a name and a tree of
level definitions. Each level definition is a container of roles
that belong to that level and of child level definitions. The
defined classes are:
• RoleDef : it represents a role belonging to the design
pattern we are defining; it is characterized only by its
identification name.
• LevelDef : it is a container of roles that will have to be
in a one-to-one relationship when associated, and can
have child level definitions, implementing the tree.
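A minimal sketch of the definition model as plain data classes may help to fix ideas. The class names follow the model described here; the attribute names and the `role_names` helper are assumptions, and the tree shape chosen for the Abstract Factory example is only one plausible arrangement of the four roles, not necessarily the one in the paper's figures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleDef:
    name: str  # a role is characterized only by its identifying name


@dataclass
class LevelDef:
    # Roles in the same level are in a one-to-one relationship with each other.
    roles: List[RoleDef] = field(default_factory=list)
    # Child level definitions implement the tree (one-to-many relationships).
    children: List["LevelDef"] = field(default_factory=list)


@dataclass
class DPDef:
    name: str
    root: LevelDef


def role_names(level):
    """Collect the role names of a level and its descendants, depth first."""
    names = [role.name for role in level.roles]
    for child in level.children:
        names.extend(role_names(child))
    return names


# Assumed tree shape: products and concrete factories below the abstract factory.
products = LevelDef([RoleDef("Abstract Product")],
                    [LevelDef([RoleDef("Concrete Product")])])
factories = LevelDef([RoleDef("Concrete Factory")])
abstract_factory = DPDef("Abstract Factory",
                         LevelDef([RoleDef("Abstract Factory")], [factories, products]))

print(role_names(abstract_factory.root))
# ['Abstract Factory', 'Concrete Factory', 'Abstract Product', 'Concrete Product']
```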
Figure 2. UML for the dp instance model
In the instance model new classes are defined:
• DPInstance: it refers to a dp definition and it is connected to the root level instance, that must follow the
• Level: it is a container of level instances that must follow the associated level definition.
• LevelInstance: it is a set of role associations and links
to its child levels; the correct child levels for an instance are the ones linked to the level definitions that are children
of the level definition of the instance's parent level.
• RoleAssociation: it expresses the assignment of a class
to a specific role.
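The instance classes above can be sketched in the same way, together with a check that an instance follows its definition. This is an illustrative sketch under assumptions: the attribute names and the `follows_definition` function are not part of the proposal, `DPInstance` refers directly to the root level definition instead of a full DPDef to keep the example short, and the class names in the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleDef:
    name: str


@dataclass
class LevelDef:
    roles: List[RoleDef] = field(default_factory=list)
    children: List["LevelDef"] = field(default_factory=list)


@dataclass
class RoleAssociation:
    role: RoleDef
    class_name: str  # the class assigned to the role


@dataclass
class LevelInstance:
    associations: List[RoleAssociation] = field(default_factory=list)
    sublevels: List["Level"] = field(default_factory=list)


@dataclass
class Level:
    leveldef: LevelDef
    instances: List[LevelInstance] = field(default_factory=list)


@dataclass
class DPInstance:
    root_def: LevelDef   # simplification: stands in for the DPDef and its root
    root: LevelInstance


def follows_definition(level):
    """True if every level instance fills exactly the roles of its level
    definition and its sublevels refer to child level definitions."""
    expected = sorted(role.name for role in level.leveldef.roles)
    for inst in level.instances:
        if sorted(a.role.name for a in inst.associations) != expected:
            return False
        for sub in inst.sublevels:
            if not any(sub.leveldef is child for child in level.leveldef.children):
                return False
            if not follows_definition(sub):
                return False
    return True


# Hypothetical example: one abstract factory played together with two concrete factories.
af, cf = RoleDef("Abstract Factory"), RoleDef("Concrete Factory")
child_def = LevelDef([cf])
root_def = LevelDef([af], [child_def])
factories = Level(child_def, [LevelInstance([RoleAssociation(cf, "ConcreteFactory1")]),
                              LevelInstance([RoleAssociation(cf, "ConcreteFactory2")])])
instance = DPInstance(root_def,
                      LevelInstance([RoleAssociation(af, "AbstractFactory1")], [factories]))

print(follows_definition(Level(instance.root_def, [instance.root])))  # True
```

The two level instances under `factories` show how a one-to-many relationship between roles is expressed: one level, several instances.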
The model could be used to compare DP instances detected by different tools, following the same definition: at
each level all the result differences can be clearly seen. In
the same way it is possible to compare an instance to a validated set of known instances. If a common role naming
is adopted, it will also be possible to compare instances
coming from slightly different DP definitions (provided that
they define the same pattern).
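A level-by-level comparison of two results could look like the following sketch. It assumes a flattened representation in which each instance maps a level name to a set of (role, class name) pairs; this representation, the `diff_instances` function, and the sample class names are all hypothetical and chosen only to keep the example short.

```python
def diff_instances(a, b):
    """Compare two instances level by level.

    Each instance maps a level name to a set of (role, class_name) pairs.
    Only levels where the two instances differ appear in the report.
    """
    report = {}
    for level in sorted(set(a) | set(b)):
        left, right = a.get(level, set()), b.get(level, set())
        if left != right:
            report[level] = {"only_in_a": left - right, "only_in_b": right - left}
    return report


# Hypothetical results of two tools for the same pattern instance.
tool_a = {
    "L1": {("Abstract Factory", "ShapeFactory")},
    "L2": {("Concrete Factory", "RoundFactory"), ("Concrete Factory", "SquareFactory")},
}
tool_b = {
    "L1": {("Abstract Factory", "ShapeFactory")},
    "L2": {("Concrete Factory", "RoundFactory")},
}

for level, delta in diff_instances(tool_a, tool_b).items():
    print(level, delta)
```

The same function can compare a tool's output with a validated instance by passing the validated instance as one of the two arguments.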
In Figure 3 we show an example of definition for the
abstract factory pattern.
[Figure 3 diagram content: AbstractFactoryDef : DPDef (name = "Abstract Factory"), L1 : LevelDef, AF : RoleDef (name = "Abstract Factory")]
of other tools and discuss the experiments. This
will require an extension of the model to
keep track of the user who submitted each instance,
to tag whether each instance is a good one or not (or
only partially), and so on.
Our final goal concerns not only the tool competition but also the creation of a container for design patterns that, through the users’ voting, will let us build
a large and “community validated” dataset for tool testing
and benchmarking.
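How user votes might be turned into a "community validated" label is not specified here; the following is one possible sketch. The verdict labels, the `community_verdict` function, the one-vote-per-user mapping, and the minimum of three votes are all assumptions made for illustration.

```python
from collections import Counter

# Assumed verdict labels: an instance is good, partially good, or not good.
VERDICTS = ("good", "partial", "bad")


def community_verdict(votes, min_votes=3):
    """Return the majority verdict for one instance, or None if undecided.

    votes maps a user name to one of VERDICTS (one vote per user).
    The instance stays undecided with too few votes or with a tie.
    """
    if any(v not in VERDICTS for v in votes.values()):
        raise ValueError("unknown verdict")
    if len(votes) < min_votes:
        return None
    ranked = Counter(votes.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


print(community_verdict({"u1": "good", "u2": "good", "u3": "partial"}))  # good
print(community_verdict({"u1": "good", "u2": "bad"}))                    # None
```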
All of these reasons convinced us that this proposal is
essential for this research area because it allows the real
sharing of information and knowledge among all research
groups interested in design patterns for both reverse and forward engineering.
[Figure 3 diagram content, continued: level definitions L2, L3 and L4, linked by parent level / child level relations; role definitions AP (name = "Abstract Product"), CP (name = "Concrete Product") and CF (name = "Concrete Factory")]
Figure 3. UML Object diagram for the definition of Abstract Factory DP
In Figure 4 we show an example of an instance of the abstract factory pattern that follows the definition depicted in
Figure 3. The models and the examples are represented using UML but it’s simple to define them through for example
Conclusion and Future Developments
The model proposed in this paper is only a draft and
could be improved and changed in the near future. It
essentially represents the way we think about a design pattern’s structure. We will need an XML schema for the
definition of patterns and pattern instances, in order to have
easily exchangeable data.
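As a rough idea of what exchangeable data could look like, the sketch below serializes a pattern definition to XML with Python's standard `xml.etree.ElementTree` module. No schema has been defined yet, so the element and attribute names (`dpdef`, `leveldef`, `roledef`, `name`) are assumptions borrowed from the model's class names, and a level is represented here simply as a pair of role names and child levels.

```python
import xml.etree.ElementTree as ET


def level_to_xml(level):
    """Convert a level, given as (role_names, child_levels), to an XML element."""
    roles, children = level
    node = ET.Element("leveldef")
    for name in roles:
        ET.SubElement(node, "roledef", name=name)
    for child in children:
        node.append(level_to_xml(child))
    return node


def dpdef_to_xml(name, root_level):
    """Serialize a pattern definition to an XML string (hypothetical format)."""
    root = ET.Element("dpdef", name=name)
    root.append(level_to_xml(root_level))
    return ET.tostring(root, encoding="unicode")


# A reduced, hypothetical definition with two levels.
xml_text = dpdef_to_xml("Abstract Factory",
                        (["Abstract Factory"], [(["Concrete Factory"], [])]))
print(xml_text)

# Reading the data back shows that the tree structure survives the exchange.
parsed = ET.fromstring(xml_text)
print([role.get("name") for role in parsed.iter("roledef")])
```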
We hope that this proposal will allow us and the community to have a standard for the representation of the results of
a design pattern detection tool and a way to compare them.
We would like to set up a public service which will allow
users to submit their results, compare them with the results
[1] F. Arcelli, C. Tosi, M. Zanoni, and S. Maggioni. The MARPLE
project - a tool for design pattern detection and software architecture reconstruction. In Proceedings of the International
Workshop on Advanced Software Development Tools and
Techniques (WASDeTT 2008), Paphos, Cyprus, July 2008.
[2] J. Dietrich and C. Elgar. Towards a web of patterns. Web
Semant, 5(2):108–116, 2007.
[3] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: elements of reusable object-oriented software.
Addison-Wesley Professional, 1995.
[4] Y.-G. Guéhéneuc. Ptidej: Promoting patterns with patterns.
In Proceedings of the 1st ECOOP workshop on Building a
System using Patterns. Springer Verlag, 2005.
[5] J. Niere, W. Schäfer, J. P. Wadsack, L. Wendehals, and
J. Welsh. Towards pattern-based design recovery. In ICSE
’02: Proceedings of the 24th International Conference on
Software Engineering, pages 338–348, New York, NY, USA,
2002. ACM.
[6] N. Shi and R. A. Olsson. Reverse engineering of design patterns from Java source code. In ASE ’06: Proceedings of the
21st IEEE/ACM International Conference on Automated Software Engineering, pages 123–134, Washington, DC, USA,
2006. IEEE Computer Society.
[7] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T.
Halkidis. Design pattern detection using similarity scoring.
IEEE Transactions on Software Engineering, 32(11):896–
909, 2006.
Figure 4. UML for the example of the Abstract Factory DP instance model
[Figure 4 diagram content: AFInstance : DPInstance, linked to AbstractFactoryDef : DPDef (name = "Abstract Factory"); level instances L1i, L2i1i1, L3i1i1 and L4i1i1; levels L2i1, L3i1 and L4i1, linked to the level definitions L2, L3 and L4; role associations AF1, CFi1, APi1 and CPi1, linked to the role definitions AF, CF, AP and CP, with class names "Abstract Factory", "ConcreteFactory1", "AbstractProduct1" and "ConcreteProduct1"]
