Open Source AddOn for deduplication, data modelling and advanced HTML reporting


At one of the Sparx sites I support, there was an urgent need to deduplicate elements in a large EA repository. In the preceding period a large number of architects worked in the repository and there was no procedure for preventing duplicates. To improve the content of the repository, the first step is to deduplicate content using scripts.

What is deduplication?

Deduplicating elements means merging two or more elements into one, where the child entities of these elements are merged as well. The duplicate validation is based on the following properties:

  • Name
  • Object_type
  • Stereotype
  • Version

When all of these match, an element is considered a duplicate. The values need to be exactly the same, which can be a problem for the name in particular. For example, eBS and E-BS are not the same and are not considered duplicates.
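
The matching rule above can be illustrated with a minimal Python sketch. This is not the AddOn's code: the Element class, its field names and the sample data are assumptions made for illustration only. Elements are grouped on the exact combination of name, object type, stereotype and version, and every group with more than one member is reported as a set of duplicates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    # Hypothetical stand-in for a repository element, not the EA API.
    element_id: int
    name: str
    object_type: str
    stereotype: str
    version: str


def find_duplicates(elements):
    """Group elements whose four key properties match exactly."""
    groups = defaultdict(list)
    for element in elements:
        key = (element.name, element.object_type,
               element.stereotype, element.version)
        groups[key].append(element)
    # Only groups with more than one member contain duplicates.
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample data: elements 1 and 3 match, "E-BS" does not.
elements = [
    Element(1, "eBS", "Component", "application", "1.0"),
    Element(2, "E-BS", "Component", "application", "1.0"),
    Element(3, "eBS", "Component", "application", "1.0"),
]
duplicate_groups = find_duplicates(elements)
```

Because the comparison is exact, a normalisation step (for example lower-casing names and removing hyphens) would have to be added before grouping if eBS and E-BS should be treated as the same element.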

When a deduplication is done the following child elements are merged:

  • Properties or attributes
  • Methods or operations
  • Notes
  • Requirements
  • Scenarios
  • Tagged values
  • Linked files

This list is easy to extend; see the code in the AddOn that transfers the child elements from the duplicate to the original.
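
As a rough illustration of such a transfer, the following Python sketch moves child items from a duplicate to the original. It assumes that elements are plain dictionaries with one list per child kind; the names CHILD_KINDS and merge_children are hypothetical and do not come from the AddOn. How the AddOn itself handles identical or conflicting child items is not described here, so the sketch simply skips items the original already has.

```python
# Hypothetical names for the child kinds listed above.
CHILD_KINDS = (
    "attributes", "operations", "notes", "requirements",
    "scenarios", "tagged_values", "linked_files",
)


def merge_children(original, duplicate, kinds=CHILD_KINDS):
    """Move child items of the selected kinds from duplicate to original."""
    for kind in kinds:
        target = original.setdefault(kind, [])
        for item in duplicate.get(kind, []):
            # Assumption: an item the original already has is not copied again.
            if item not in target:
                target.append(item)
        # The duplicate is left empty for this kind after the transfer.
        duplicate[kind] = []
    return original


original = {"attributes": ["id"], "tagged_values": [("owner", "HR")]}
duplicate = {"attributes": ["id", "name"], "notes": ["Legacy system"]}
merge_children(original, duplicate)
```

Supporting an extra kind of child element in this sketch only requires adding its name to the tuple, or passing a shorter tuple as kinds to merge a selection.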

Screen shots

Screen 1 Validate

Before we do the deduplication, we have the possibility to generate a report of the duplicate elements, or we can get a summary overview within this extension (see screen 3).

Screen 2 Deduplicate

This is the tab page for selecting the merge aspects for the duplicated elements. On the left-hand side you see which child entities will be merged. On the right-hand side you can select a number of options:

  • Create a folder where the processed duplicate will be transferred to
  • Close this window when the processing is ready
  • Do a recursive processing for the different subpackages of the selected package
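
The recursive option can be pictured with a small Python sketch. The Package class and the collect_elements function are assumptions for illustration and not the AddOn's implementation: with recursion switched off only the selected package is processed, and with recursion switched on every nested subpackage is visited as well.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Hypothetical stand-in for a repository package.
    name: str
    elements: list = field(default_factory=list)
    subpackages: list = field(default_factory=list)


def collect_elements(package, recursive=True):
    """Return the elements of a package, optionally including subpackages."""
    found = list(package.elements)
    if recursive:
        for sub in package.subpackages:
            found.extend(collect_elements(sub, recursive=True))
    return found


# Made-up package tree with two levels of nesting.
leaf = Package("Interfaces", elements=["eBS API"])
child = Package("Applications", elements=["eBS"], subpackages=[leaf])
root = Package("Landscape", elements=["Overview"], subpackages=[child])

only_root = collect_elements(root, recursive=False)
everything = collect_elements(root, recursive=True)
```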

Screen 3 Validation result

This screen shows the result of the validation list option. It gives a list of the duplicates, including the name, the package and the author of each duplicate. This gives you the option to modify the elements before you merge them with the deduplicate routine.

Screen 4 HTML publication

This screen holds the options for creating an HTML report based on the packages and the diagrams. The idea is that all elements are collected from the packages, subpackages and diagrams within the selected package.

Data modelling

For data modelling an extension is included that generates mappings, mergers and refactoring options. It consists of a number of forms in the AddOn as well as a number of scripts that are available in an example EAP file.

More information on IDEA can be found on the IDEA website.

BizzDesign migration extension

Organisations can choose from multiple tools for modelling enterprise architectures. In the Netherlands numerous organisations use BizzDesign as their modelling tool. However, license costs are relatively high, and some EAxpertise customers are therefore considering migrating to Sparx.

The functionality offered by the two tools overlaps to a great extent, and in data modelling in particular Sparx has more functionality than BizzDesign. For one of our customers we researched the migration options. The enterprise architecture contains three models:

  • Enterprise Architecture modelled in ArchiMate 3 models. This can be migrated very well with the Model Exchange Format standard in the ArchiMate 3 definition.
  • Business processes modelled in BPMN models. This can be migrated with the BPMN exchange format.
  • Data models modelled in the Enterprise Data Model. For this part of the model there is no exchange format, so we had to develop something ourselves.

For this last part of the enterprise model we had to develop an extension. The rest of this article describes this extension.

We first tried to develop a solution based on a script, but we needed a simple user interface to control the steps, so we decided to extend the IDEA DLL with this functionality. The image gives an idea of the functionality.

In the screenshot you see the following functionalities:

  • Options to load the data from an Excel file. BizzDesign can define Excel files for the entities that cannot be exported through the standard XML formats such as MEF and BPMN. This applies to the EDM entities.
  • Migrating the Entities data from the Excel file to the Sparx repository.
  • Migrating the Associations from the Excel file to the Sparx repository.
  • Transferring the Attributes to Sparx.
  • The hyperlink option needs a little explanation. BizzDesign, just like Sparx, can add hyperlinks in the notes that link to other elements in the repository, but it uses a different internal format for these hyperlinks. This functionality transforms the hyperlinks to the Sparx format.
  • The counter under the load data button needs an extra remark: it selects the worksheet at that ordinal position in the workbook file.

See the template file for the format of the Excel sheet.


Installation is relatively easy with the following steps:

  • Download the ZIP file and extract it on your computer.
  • Register the DLL which is stored in based on the steps described on the Sparx website. Please note that you should run the regasm step in the directory of the actively installed .NET version (the latest version folder with regasm available as an executable).
  • Furthermore, the key in the registry is TEA.TEA.TEAAddIn or TEA.TEA.IDEAAddIn, depending on the functionality you want to use. See the sample screen below, which should match yours exactly (case sensitive, including the folder structure).
  • Import the duplicatiereport.XML file in the resources as a user report template (including the report fragment). Use the transfer -> Import Reference Data function in EA.
  • When you want to generate HTML reports with this DLL, please extract and adapt the template to your own company brand style; a sample is included in the installation zip.
  • When you want the tool to generate PDF files within the HTML generator, please extend the resources in EA with extra RTF templates as configured in the htmlgenerator.xml report definitions (import these in the resources module in EA with the transfer -> Import Reference Data function).


Usage of this extension for deduplication has the following logical steps:

  1. Select in the project browser the package (and optionally its subpackages) you want to deduplicate
  2. Right click the package and select extensions -> TenneT Browser Helper
  3. Validate your packages; this gives you an overview of which elements in the repository are considered duplicates (screen 1)
  4. Evaluate the results in your report or on screen (screen 3)
  5. If you wish, check the content of this list and modify elements where needed
  6. Go to the deduplicate tab (screen 2), configure which child entities you want to merge and select the required options in the right-hand box
  7. Press the Deduplicate button and the merge is performed. 
  8. Go to the deduplicate folder and see which elements are merged to an original and are available for delete or archive
  9. Delete or archive the (empty) duplicate elements

Usage of the HTML publication has the following logical steps:

  1. Select in the project browser the package (and optionally its subpackages) you want to publish
  2. Configure the HTML generator for the
    • HTML file location
    • HTML template
    • Report name within the EA document generator
    • Coverpage name within the EA document generator
  3. Generate the HTML site and use the result in your browser

Formal aspects

This AddOn is developed by EAxpertise for the IDEA community in the Netherlands. The source code can be downloaded here. The AddOn is available under the EUPL license.

For support and adaptations of the AddOn you can contact the participants in EAxpertise. However, if you want to extend the product yourself, please feel free to do so. You can find the Visual Studio project here. Sharing your adaptations of this AddOn is encouraged, and they will be published on the EAxpertise website.




Copyright (c) 2017 Van Roosmalen en Interactory. Design by FCT.