Watch out, dirty data! DataCleaner 2.0 is in town!
The Open Source software community eobjects.org is happy to announce the release of DataCleaner 2.0. This release marks the biggest advance in technology and features in the history of the DataCleaner project.
Amongst exciting new features in DataCleaner 2.0 are:
- Data transformations, allowing you to preprocess, extract, refine, combine and calculate data items as a part of your data profiling jobs.
- Filtering, sampling and subflow management, allowing you to define criteria to exclude and include particular items of data.
- Richer reporting with charts, graphs, navigation trees and more.
- A bunch of new data quality functions for date gap analysis, phonetic similarity finding, synonym lookups and more.
- More configuration options and added data quality measures for existing data quality functions like the Pattern finder, String analyzer and more.
- Reusable profiling jobs, where you define your processing flow once and subsequently run it on any data.
- Support for MS Excel 2007+ spreadsheets.
For more information about what’s new in DataCleaner 2.0, see the full list of new features in DataCleaner 2.0.
Today it was also announced that Human Inference, the European data quality authority, has completed its acquisition of the eobjects.org site in order to actively enter the market for entry-level Open Source data quality products. All projects on eobjects.org will remain Open Source, and the benefits for the community and the products are apparent. The release of DataCleaner 2.0 is the first visible outcome of the acquisition, resulting from several months of intense cooperation between Human Inference and community members to put together a state-of-the-art data profiling application.
For more information about the eobjects.org acquisition, see the press release on the Human Inference website.
Times are really exciting in the eobjects.org community these days. We hope you’re all as enthusiastic about the new DataCleaner 2.0 as we are. The application is ready for download and for immediate launch through Java Web Start, so visit the DataCleaner website now.
MetaModel 1.5 released. Unify your view on all datastores
MetaModel 1.5, an Open Source Java framework for accessing, exploring and querying different datastores using a unified API, has just been released. MetaModel provides a single view and a SQL/LINQ-like query engine for everything from relational databases to CSV files, Excel spreadsheets, XML files, and dBase (.dbf), MS Access (.mdb) and OpenOffice.org (.odb) databases.
The 1.5 release has been more than a year in the making and includes substantial new features and enhancements. Three major themes shape the new features of the 1.5 release:
Improved datastore compliancy
In addition to the already extensive set of supported datastore types, the following new datastore features have been added:
- Support for Excel 2007+ (.xlsx) spreadsheets has been added.
- Composite datastores have been added, allowing you to define queries that span multiple datastores.
- Excel formula calculation has been added.
Fluent Query Builder API
MetaModel 1.5 retains the existing Querying API, which is extremely flexible but also complex, and therefore quite easy to make mistakes with. MetaModel 1.5 also adds a new layer of abstraction on top of the Querying API: the Query Builder API. With the Query Builder API you can define queries in an easier, safer and more elegant way. The goal of the Query Builder API is to leverage the compiler as far as possible when expressing queries.
An example demonstrates it quite well:
DataContext dc = DataContextFactory.create[your_datastore_type]DataContext(...);
Query q = dc.query()
    .from(projects).selectCount().and(community)
    .where(license).equals("oss")
    .groupBy(community).toQuery();
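The idea of leaning on the compiler can be illustrated with a staged builder, where each call returns an interface that exposes only the methods that are legal at that point. The following is a minimal, self-contained sketch of that pattern and not MetaModel's actual implementation: every type and method name in it (FluentQuerySketch, FromStage, isEqualTo and so on) is hypothetical, and it produces a plain string rather than a real Query object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a staged fluent builder. None of these types are
// MetaModel classes; they only illustrate how the compiler can guide a query.
final class FluentQuerySketch {

    // Each interface exposes only the calls that make sense at that stage.
    interface FromStage { SelectStage from(String table); }

    interface SelectStage { ColumnsStage selectCount(); }

    interface ColumnsStage {
        ColumnsStage and(String column);
        ConditionStage where(String column);
        FinalStage groupBy(String column);
        String toQuery();
    }

    interface ConditionStage { FilteredStage isEqualTo(String value); }

    interface FilteredStage {
        FinalStage groupBy(String column);
        String toQuery();
    }

    interface FinalStage { String toQuery(); }

    // Entry point: the only thing a caller can do first is pick a table.
    static FromStage query() { return new Builder(); }

    // One mutable builder sits behind all stages; callers only see the stage types.
    private static final class Builder implements FromStage, SelectStage,
            ColumnsStage, ConditionStage, FilteredStage, FinalStage {
        private String table;
        private String pendingColumn;
        private final List<String> selects = new ArrayList<>();
        private final List<String> conditions = new ArrayList<>();
        private final List<String> groups = new ArrayList<>();

        public SelectStage from(String table) { this.table = table; return this; }
        public ColumnsStage selectCount() { selects.add("COUNT(*)"); return this; }
        public ColumnsStage and(String column) { selects.add(column); return this; }
        public ConditionStage where(String column) { pendingColumn = column; return this; }
        public FilteredStage isEqualTo(String value) {
            conditions.add(pendingColumn + " = '" + value + "'");
            return this;
        }
        public FinalStage groupBy(String column) { groups.add(column); return this; }

        public String toQuery() {
            StringBuilder sb = new StringBuilder("SELECT ")
                    .append(String.join(", ", selects))
                    .append(" FROM ").append(table);
            if (!conditions.isEmpty()) {
                sb.append(" WHERE ").append(String.join(" AND ", conditions));
            }
            if (!groups.isEmpty()) {
                sb.append(" GROUP BY ").append(String.join(", ", groups));
            }
            return sb.toString();
        }
    }

    // Mirrors the shape of the example above, using plain strings for names.
    static String demo() {
        return query()
                .from("projects").selectCount().and("community")
                .where("license").isEqualTo("oss")
                .groupBy("community").toQuery();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this shape, a call such as query().selectCount() is rejected at compile time, because FromStage only offers from(...). That is the kind of mistake a builder of this style can catch before the query ever runs.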
Interfaces and immutability
Instead of the previous JavaBeans-based API, the 1.5 release includes interfaces for just about everything in the library. This makes MetaModel easier to test, integrate and deploy. It also allows for better encapsulation internally, as well as improved safety, by exposing only immutable variants of the data structures (like Table, Schema, Column etc.) that are modifiable only by the framework.
Today it was also announced that Human Inference, the European data quality authority, has completed its acquisition of the eobjects.org site in order to actively enter the market for entry-level Open Source data-oriented applications. All projects on eobjects.org, including MetaModel, will remain Open Source and will be reinforced by the time and resources that Human Inference is investing in these projects.
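As a rough illustration of that design, the sketch below exposes a read-only interface to client code while keeping the mutable implementation private to the "framework" side. It is a sketch under assumptions, not MetaModel's code: the names ImmutableViewSketch, MutableTable and loadTable are hypothetical, and the real Table and Column interfaces have far more members than shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: these are illustrative types, not MetaModel's own.
final class ImmutableViewSketch {

    // What client code sees: read-only accessors, no setters.
    interface Column { String getName(); }

    interface Table {
        String getName();
        List<Column> getColumns();
    }

    // What the framework keeps to itself: a mutable implementation.
    private static final class MutableTable implements Table {
        private final String name;
        private final List<Column> columns = new ArrayList<>();

        MutableTable(String name) { this.name = name; }

        // Only framework code holding the concrete type can add columns.
        void addColumn(String columnName) {
            columns.add(() -> columnName);
        }

        public String getName() { return name; }

        // Hand out a read-only view so callers cannot alter the structure.
        public List<Column> getColumns() {
            return Collections.unmodifiableList(columns);
        }
    }

    // Stand-in for the framework discovering a schema from a datastore.
    static Table loadTable() {
        MutableTable table = new MutableTable("projects");
        table.addColumn("community");
        table.addColumn("license");
        return table; // returned as the read-only interface type
    }

    public static void main(String[] args) {
        Table table = loadTable();
        System.out.println(table.getName() + " has "
                + table.getColumns().size() + " columns");
        try {
            table.getColumns().add(null);
        } catch (UnsupportedOperationException e) {
            System.out.println("client code cannot modify the column list");
        }
    }
}
```

Because callers depend only on the interfaces, a test can also supply its own small Table implementation without touching a real datastore.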
For more information about the eobjects.org acquisition, see the press release on the Human Inference website.
MetaModel is already in use in a lot of projects, including the DataCleaner data analysis/profiling application and Quipu, the data warehouse generator. Human Inference also plans to expand the use of MetaModel into its enterprise-grade data matching and deduplication applications. If you think MetaModel 1.5 sounds interesting, head over to the website to learn more. MetaModel is available as a Maven artifact or as a traditional download at Google Code.
