Introduction to Benerator
Goals
The core goals of Benerator are:
- Generation of data that satisfies complex data validity requirements
- Anonymization of production data for showcases and serious performance testing projects
- Efficient generation of large data volumes, scaling up to companies with billions of customers and Big Data projects
- Early applicability in projects
- Little maintenance effort with ongoing implementation through configuration by exception
- Wide and easy customizability
- Applicability by non-developers
- Intuitive data definition format
- Satisfying stochastic requirements on data
- Extraction and anonymization of production data
- Supporting distributed and heterogeneous applications
- Establishing a common data generation platform for different business domains and software systems
Features
Data Synthesization
Performance test data can be completely synthesized. A basic setup can be imported, e.g. from DbUnit files, CSV files and fixed-column-width files. A descriptor file configures how imported data should be processed and adds completely synthesized data. The processed or generated data is finally stored in the system under test.
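The flow described above (import a basic setup, process it, add synthesized data, store the result) can be illustrated with a minimal conceptual sketch. It is plain Python, not Benerator's descriptor format or API; the CSV content, the column names and the helper functions `import_rows`, `process` and `synthesize` are all hypothetical and chosen only for illustration.

```python
import csv
import io
import random

# Hypothetical base data, standing in for an imported CSV file.
BASE_CSV = "id,name,country\n1,Alice,US\n2,Bob,DE\n"


def import_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def process(row):
    """Example processing step: convert the id column to an integer."""
    return {**row, "id": int(row["id"])}


def synthesize(start_id, count, rng):
    """Create fully synthetic rows that continue the id sequence."""
    countries = ["US", "DE", "FR"]
    return [
        {
            "id": start_id + i,
            "name": f"user{start_id + i}",
            "country": rng.choice(countries),
        }
        for i in range(count)
    ]


rng = random.Random(42)  # fixed seed so repeated runs give the same data
imported = [process(r) for r in import_rows(BASE_CSV)]
generated = synthesize(start_id=len(imported) + 1, count=3, rng=rng)

# Stand-in for storing the data in the system under test.
target_table = imported + generated
for row in target_table:
    print(row)
```

In Benerator itself, these steps are configured declaratively in the descriptor file rather than written as code.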
Production Data Anonymization
Production data can be easily extracted from production systems. Tables can be imported unmodified, filtered, anonymized and converted.
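As a rough illustration of the filter-and-anonymize idea, the sketch below masks personal fields with a salted hash while leaving other columns unmodified. It is a conceptual example in plain Python, not Benerator's own mechanism; the record layout, the salt value and the functions `pseudonymize` and `anonymize_rows` are assumptions made for this example.

```python
import hashlib


def pseudonymize(value, salt):
    """Map a value to a stable token using a salted SHA-256 hash (truncated)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize_rows(rows, salt, keep):
    """Drop rows rejected by `keep`, then mask the name and email fields."""
    result = []
    for row in rows:
        if not keep(row):
            continue  # filtered out
        result.append({
            "id": row["id"],  # imported unmodified
            "name": pseudonymize(row["name"], salt),
            "email": pseudonymize(row["email"], salt) + "@example.com",
            "country": row["country"],  # imported unmodified
        })
    return result


# Hypothetical production extract.
production = [
    {"id": 1, "name": "Alice Smith", "email": "alice@corp.test", "country": "US"},
    {"id": 2, "name": "Bob Jones", "email": "bob@corp.test", "country": "DE"},
    {"id": 3, "name": "Carol White", "email": "carol@corp.test", "country": "US"},
]

anonymized = anonymize_rows(
    production,
    salt="demo-salt",
    keep=lambda r: r["country"] == "US",
)
for row in anonymized:
    print(row)
```

Because the same input always maps to the same token, references between tables stay consistent after masking. A real setup would keep the salt secret, since hashes of low-entropy values can otherwise be guessed.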
State of the Benerator
Benerator has been developed, extended and improved continuously since June 2006. It is mainly used for, and best tested with, data file and database data generation. For these applications, Benerator should cover almost all your data generation needs out of the box, and extending it for specific needs is easy.
XML-Schema, on the other hand, allows for an extraordinarily wide range of features. Benerator's XML support is limited to features that are useful for generating XML data structures (no mixed content) and does not yet support all variants possible with XML schema. The elements <unique>, <key> and <keyRef> cannot be handled automatically, but require manual configuration. The following features are not yet implemented: <group>, <import>, <all> and <sequence> with minCount != 1 or maxCount != 1. If you need support for some of these, please contact us.
Building Blocks
Database Support
All common SQL data types are supported.
Benerator was tested with and provides examples for
- Oracle 19c (thin driver)
- DB2
- MS SQL Server
- MySQL 5
- PostgreSQL 12
- HSQL 2.x
- H2 1.2
- Derby 10.3
- Firebird
Benerator Editions
Benerator comes in different editions which differ by feature set, scalability and performance:
Performance Comparison
The results below show Benerator's generation and anonymization performance on a plain MacBook Air (2020) with standard equipment and Azul Java Virtual Machine (CE = Community Edition, EE = Enterprise Edition):
Benchmark | CE 1.1.2 | CE 2.0.0 | EE 2.0.0 / 1 Thread | EE 2.0.0 / 4 Threads |
---|---|---|---|---|
gen-string.ben.xml | 37 | 58 | 336 | 1,095 |
gen-person-showcase.ben.xml | 26 | 119 | 111 | 327 |
anon-person-showcase.ben.xml | 31 | 120 | 113 | 328 |
anon-person-regex.ben.xml | 346 | 537 | 838 | 1,381 |
anon-person-hash.ben.xml | 386 | 500 | 1,299 | 1,287 |
anon-person-random.ben.xml | 576 | 838 | 1,514 | 1,736 |
anon-person-constant.ben.xml | 2,210 | 2,745 | 2,646 | 2,162 |
The numbers are in millions of entities generated or anonymized per hour. Compared to CE 1.1.2's generation engine, CE 2.0.0 is typically 1.5 to 2 times faster, and EE 2.0.0 with 4 threads is roughly 4 times faster and scales further with the number of CPUs on your machine.
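The quoted speedups can be checked against the table with a few lines of arithmetic. The sketch below copies the table values and computes each benchmark's speedup relative to CE 1.1.2. Summarizing with the median is an assumption made here, not something the benchmark description states.

```python
from statistics import median

# Throughput in million entities per hour, copied from the table above.
# Columns: CE 1.1.2, CE 2.0.0, EE 2.0.0 / 1 thread, EE 2.0.0 / 4 threads.
RESULTS = {
    "gen-string": (37, 58, 336, 1095),
    "gen-person-showcase": (26, 119, 111, 327),
    "anon-person-showcase": (31, 120, 113, 328),
    "anon-person-regex": (346, 537, 838, 1381),
    "anon-person-hash": (386, 500, 1299, 1287),
    "anon-person-random": (576, 838, 1514, 1736),
    "anon-person-constant": (2210, 2745, 2646, 2162),
}


def speedups(column):
    """Return each benchmark's speedup of `column` relative to CE 1.1.2."""
    return {name: row[column] / row[0] for name, row in RESULTS.items()}


ce2 = speedups(1)  # CE 2.0.0 vs. CE 1.1.2
ee4 = speedups(3)  # EE 2.0.0 / 4 threads vs. CE 1.1.2

print(f"CE 2.0.0 median speedup:             {median(ce2.values()):.2f}x")
print(f"EE 2.0.0 / 4 threads median speedup: {median(ee4.values()):.2f}x")
for name in RESULTS:
    print(f"{name:22s} CE2 {ce2[name]:5.2f}x   EE4 {ee4[name]:5.2f}x")
```

The medians come out at about 1.55x and 3.99x, in line with the figures quoted. Individual benchmarks vary widely, though: with 4 threads the speedup ranges from just under 1x (anon-person-constant) to almost 30x (gen-string).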
Benerator Community Edition (CE)
Benerator started as an open-source project and is committed to further improvement and extension based on feedback from its user base and contributors. It is the most powerful open-source data generator and is competitive with all commercial products.
However, it has two historic limitations:
- No neat graphical user interface
- Only single-threaded generation and anonymization
Even so, Benerator Community Edition still delivers impressive performance.
Benerator Enterprise Edition (EE)
The Enterprise Edition extends Benerator Community Edition and improves it in many respects. With a highly optimized engine and generation-related components, and with multithreaded execution support, its performance on a single machine is about 10x that of the Community Edition, and Benerator can easily scale over multiple machines in your private cloud or cluster setup.
Improvements over the Community Edition are:
- Improved single-threaded performance
- Multithreaded data generation and anonymization
- Anonymization reporting to support compliance checking
- Benerator UI: An integrated graphical development environment with editing support, project and task management
- JSON support
- JMS support: ActiveMQ, RabbitMQ and more
- Kafka support
- Industry modules: Logistics, Insurance, Finance, ...