Introduction to Benerator

Goals

The core goals of Benerator are:

Generation of data that satisfies complex data validity requirements
Anonymization of production data for showcases and serious performance testing projects
Efficient generation of large data volumes, scaling up to companies with billions of customers and Big Data projects
Early applicability in projects
Little maintenance effort with ongoing implementation through configuration by exception
Wide and easy customizability
Applicability by non-developers
Intuitive data definition format
Satisfying stochastic requirements on data
Extraction and anonymization of production data
Supporting distributed and heterogeneous applications
Establishing a common data generation platform for different business domains and software systems

Features

Data Synthesization

Performance test data can be completely synthesized. A basic setup can be imported, e.g. from DbUnit files, CSV files and fixed-column-width files. A descriptor file configures how imported data is processed and adds completely synthesized data. Finally, the processed or generated data is stored in the system under test.
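The combination of imported base data and synthesized data can be pictured with a small Python sketch. It is not Benerator's descriptor format or API, only an illustration of the idea: the sample CSV, the column names and the helper functions `import_rows`, `synthesize_rows` and `build_dataset` are all hypothetical.

```python
import csv
import io
import random

# Hypothetical base data; in a real project this would be a CSV file on disk.
BASE_CSV = "id,name,city\n1,Alice,Berlin\n2,Bob,Hamburg\n"


def import_rows(csv_text):
    """Import the basic setup: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def synthesize_rows(start_id, count, seed=42):
    """Create `count` synthetic rows with consecutive ids (illustrative value pools)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    names = ["Carol", "Dave", "Erin"]
    cities = ["Munich", "Cologne", "Bremen"]
    return [
        {"id": str(start_id + i), "name": rng.choice(names), "city": rng.choice(cities)}
        for i in range(count)
    ]


def build_dataset(csv_text, extra):
    """Imported rows first, then `extra` synthesized rows that continue the id sequence."""
    imported = import_rows(csv_text)
    next_id = max(int(row["id"]) for row in imported) + 1
    return imported + synthesize_rows(next_id, extra)


if __name__ == "__main__":
    for row in build_dataset(BASE_CSV, 3):
        print(row)
```

In this sketch the final step, storing the rows in the system under test, is left out; the returned list stands in for whatever target would receive the data.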

Production Data Anonymization

Production data can be easily extracted from production systems. Tables can be imported unmodified, filtered, anonymized and converted.
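As a rough illustration of the four options (unmodified import, filtering, anonymization and conversion), the following Python sketch processes a table held as a list of dicts. It does not show how Benerator implements anonymization: the salted SHA-256 pseudonymization, the salt value and the names `anonymize_value` and `process_table` are assumptions chosen for the example.

```python
import hashlib


def anonymize_value(value, salt="demo-salt"):
    """Replace a value with a short, deterministic pseudonym (salted SHA-256, truncated)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def process_table(rows, keep=lambda row: True, anonymize=(), convert=None):
    """Filter rows, anonymize the named columns, then apply per-column converters.

    With the default arguments the rows pass through unmodified.
    """
    result = []
    for row in rows:
        if not keep(row):
            continue  # filtered out
        out = dict(row)  # copy, so the source rows stay untouched
        for column in anonymize:
            out[column] = anonymize_value(str(out[column]))
        for column, converter in (convert or {}).items():
            out[column] = converter(out[column])
        result.append(out)
    return result


if __name__ == "__main__":
    customers = [
        {"name": "Alice", "country": "de", "active": "1"},
        {"name": "Bob", "country": "us", "active": "0"},
    ]
    print(process_table(
        customers,
        keep=lambda row: row["active"] == "1",
        anonymize=("name",),
        convert={"country": str.upper},
    ))
```

Because the pseudonym is deterministic, the same input value always maps to the same output, which keeps references between tables consistent in this sketch.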

State of the Benerator

Benerator has been developed, extended and improved continuously since June 2006. It is mainly used for, and best tested with, data file and database data generation. For these applications, Benerator should cover almost all your data generation needs out of the box, and extending it for specific needs is easy.

XML Schema, on the other hand, allows for an extraordinarily wide range of features. Benerator's XML support is limited to features that are useful for generating XML data structures (no mixed content) and does not yet support all variants possible with XML Schema. The elements <unique>, <key> and <keyref> cannot be handled automatically but require manual configuration. The following features are not yet implemented: <group>, <import>, <all> and <sequence> with minOccurs != 1 or maxOccurs != 1. If you need support for some of these, please contact us.
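To see in advance whether a schema touches these limitations, a schema file can be scanned for the elements named above. The Python sketch below is a hypothetical helper, not part of Benerator. It assumes that the cardinality restriction corresponds to the XSD attributes minOccurs and maxOccurs and that it applies to both <all> and <sequence>. The function name `audit_schema` and the category labels are made up for this example.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

NEEDS_MANUAL_CONFIG = {"unique", "key", "keyref"}
NOT_IMPLEMENTED = {"group", "import"}
CARDINALITY_CHECKED = {"all", "sequence"}  # assumption: the restriction covers both


def audit_schema(xsd_text):
    """Return (category, element name) tuples in document order."""
    findings = []
    root = ET.fromstring(xsd_text)
    for elem in root.iter():
        if not elem.tag.startswith("{" + XS + "}"):
            continue  # ignore elements outside the XML Schema namespace
        name = elem.tag.split("}", 1)[1]
        if name in NEEDS_MANUAL_CONFIG:
            findings.append(("manual configuration", name))
        elif name in NOT_IMPLEMENTED:
            findings.append(("not implemented", name))
        elif name in CARDINALITY_CHECKED:
            # XSD defaults both attributes to 1 when they are absent.
            if elem.get("minOccurs", "1") != "1" or elem.get("maxOccurs", "1") != "1":
                findings.append(("not implemented", name))
    return findings


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="urn:other"/>
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="item" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="orderKey">
      <xs:selector xpath="item"/>
      <xs:field xpath="."/>
    </xs:key>
  </xs:element>
</xs:schema>"""

if __name__ == "__main__":
    for category, name in audit_schema(SAMPLE_XSD):
        print(f"{category}: <{name}>")
```

A plain <sequence> without occurrence attributes produces no finding, since both values default to 1.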

Building Blocks

Database Support

All common SQL data types are supported.

Benerator was tested with and provides examples for:

Oracle 19c (thin driver)
DB2
MS SQL Server
MySQL 5
PostgreSQL 12
HSQL 2.x
H2 1.2
Derby 10.3
Firebird

Benerator Editions

Benerator comes in different editions, which differ in feature set, scalability and performance.

Performance Comparison

The results below show Benerator's generation and anonymization performance on a plain MacBook Air (2020) with standard equipment and Azul Java Virtual Machine (CE = Community Edition, EE = Enterprise Edition):

Benchmark	CE 1.1.2	CE 2.0.0	EE 2.0.0 / 1 Thread	EE 2.0.0 / 4 Threads
gen-string.ben.xml	37	58	336	1,095
gen-person-showcase.ben.xml	26	119	111	327
anon-person-showcase.ben.xml	31	120	113	328
anon-person-regex.ben.xml	346	537	838	1,381
anon-person-hash.ben.xml	386	500	1,299	1,287
anon-person-random.ben.xml	576	838	1,514	1,736
anon-person-constant.ben.xml	2,210	2,745	2,646	2,162

The numbers are millions of entities generated or anonymized per hour. Compared to CE 1.1.2's generation engine, CE 2.0.0 is 1.5-2 times faster and EE 2.0.0 with 4 threads is roughly 4 times faster, although the gain varies considerably from benchmark to benchmark. EE scales further with the number of CPUs on your machine.
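The per-benchmark ratios behind this summary can be recomputed from the table. The following Python sketch performs only that arithmetic on the published figures; the dictionary and function names are illustrative and not part of Benerator.

```python
# Throughput in million entities per hour, copied from the table above.
# Column order: CE 1.1.2, CE 2.0.0, EE 2.0.0 / 1 thread, EE 2.0.0 / 4 threads.
THROUGHPUT = {
    "gen-string.ben.xml": (37, 58, 336, 1095),
    "gen-person-showcase.ben.xml": (26, 119, 111, 327),
    "anon-person-showcase.ben.xml": (31, 120, 113, 328),
    "anon-person-regex.ben.xml": (346, 537, 838, 1381),
    "anon-person-hash.ben.xml": (386, 500, 1299, 1287),
    "anon-person-random.ben.xml": (576, 838, 1514, 1736),
    "anon-person-constant.ben.xml": (2210, 2745, 2646, 2162),
}


def speedups_vs_baseline(row):
    """Divide every later column by the first one (CE 1.1.2)."""
    baseline = row[0]
    return tuple(value / baseline for value in row[1:])


def speedup_table(throughput):
    """Map each benchmark to its (CE 2.0.0, EE / 1 thread, EE / 4 threads) speedups."""
    return {name: speedups_vs_baseline(row) for name, row in throughput.items()}


if __name__ == "__main__":
    for name, (ce2, ee1, ee4) in speedup_table(THROUGHPUT).items():
        print(f"{name:32s} CE 2.0.0: {ce2:5.2f}x  EE/1: {ee1:5.2f}x  EE/4: {ee4:5.2f}x")
```

Running it prints one line per benchmark, which makes it easy to see which workloads benefit most from the newer engine and from multithreading.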

Benerator Community Edition (CE)

Benerator started as an open-source project and is committed to further improvement and extension based on the feedback of its user base and contributors. It is the most powerful open-source data generator and is competitive with all commercial products.

However, it has two historic limitations:

No neat graphical user interface
Only single-threaded generation and anonymization

Even so, Benerator Community Edition still delivers impressive performance.

Benerator Enterprise Edition (EE)

The Enterprise Edition extends Benerator Community Edition and improves it in many respects. With a highly optimized engine, optimized generation-related components and multithreaded execution support, its performance on a single machine is about 10x that of the Community Edition, and Benerator can easily scale over multiple machines in your private cloud or cluster setup.

Improvements over the Community Edition are:

Improved single-threaded performance
Multithreaded data generation and anonymization
Anonymization Reporting supports you in compliance checking
Benerator UI: An integrated graphical development environment with editing support, project and task management
JSON support
JMS support: ActiveMQ, RabbitMQ and more
Kafka support
Industry modules: Logistics, Insurance, Finance, ...