Posts in category community

New book on Open Source Business Intelligence tells the DataCleaner-story

About half a year ago we received an exciting inquiry from Jos van Dongen on behalf of himself and his co-author Roland Bouman, telling us that they were writing a new book about Open Source Business Intelligence and in particular Pentaho-based solutions. And for this they were looking into DataCleaner for the data profiling section of the book!

The book is now out! It's called "Pentaho Solutions" and it's published by Wiley Publishing. You can read about it and buy it on their website as well.

The book contains a walkthrough for building a data warehouse using Open Source tools and, in doing so, applies DataCleaner to the important job of profiling and validation.

We congratulate Roland Bouman and Jos van Dongen on their great work to promote Open Source Business Intelligence and thank them for mentioning DataCleaner while they're at it!

eobjects.org announces Open Source data quality with DataCleaner 1.5.2

Dear DataCleaner users,

We are happy to announce the release of DataCleaner 1.5.2. Users of DataCleaner 1.5.0 or 1.5.1 won't see many changes in the user interface, but this release actually holds quite a lot of improvements “beneath the surface”:

  • The most notable improvement is in the Value Distribution Profile. Previously this profile consumed quite a lot of memory, which could lead to out-of-memory errors in extreme cases. This has been fixed by using on-disk caching with Berkeley DB when necessary.
  • Another notable feature is that we can now distribute DataCleaner as a single JAR file. This means that we will be serving the application as a Java WebStart application (i.e. you can run it as if it were an online application) and we are also considering other distribution options.
  • When starting the application, it automatically downloads regular expressions from the RegexSwap.
  • A bug regarding the matching of number-based columns in dictionaries was reported and fixed.
  • A bug regarding invalid characters in XML-export formats was reported and fixed.
  • When opening files, we are now ignoring suffix case so that .CSV files can be opened as well as .csv.
  • The number of columns shown in the preview window is automatically restricted if there are too many to show on a single screen.
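
As an aside for developers, the suffix handling mentioned in the list above can be sketched roughly as follows. This is only a minimal illustration in Java, not the actual DataCleaner code: the class name FileSuffixes and the method hasSuffix are hypothetical names made up for this example.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not taken from the DataCleaner code base.
final class FileSuffixes {

    private FileSuffixes() {
        // utility class, no instances
    }

    // Returns true if fileName ends with suffix, ignoring upper/lower case.
    static boolean hasSuffix(String fileName, String suffix) {
        if (fileName == null || suffix == null) {
            return false;
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(suffix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(hasSuffix("CUSTOMERS.CSV", ".csv"));
        System.out.println(hasSuffix("customers.csv", ".csv"));
        System.out.println(hasSuffix("customers.xml", ".csv"));
    }
}
```

With a check like this, a file dialog or file filter would treat CUSTOMERS.CSV and customers.csv the same way, while still rejecting files with a different suffix.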

You can download DataCleaner from the downloads page or you can use our new feature: Get it via Java WebStart!

This release underlines the ongoing evolution of DataCleaner into an increasingly professional data profiler and data quality tool. Seeing that DataCleaner is being used in large corporations worldwide, I wish to address a question that I have been thinking about and that I know users are pondering: How do you best combine the low adoption cost of Open Source applications like DataCleaner with the high flexibility that most commercial business software provides? To service this need we've opened up a new division of the company that I work with, Lund&Bendsen. Whether you need to deploy DataCleaner to large-scale installations, integrate the application with your existing systems, develop customized profiles and validation rules, or satisfy other enterprise needs, we offer you first-class services and in-depth expertise you won't find anywhere else.

To cut to the chase: DataCleaner 1.5.2 is here and we wish to extend the community development with a professional effort. So don't hesitate to let us know if you see an opportunity to invest. Adding value by targeting your use of the product is in the interest of customer, developer and community alike, and this is the reason our business exists.

To all you non-business users out there: Sorry for the obvious commercial rant and we hope you all enjoy the newest DataCleaner release.

Best regards,
Kasper Sørensen
Founder of eobjects.org and the DataCleaner project

Data quality pro launches DataCleaner articles

Things are starting to shape up for the big release of DataCleaner 1.5. We are starting off with a bit of excitement in the data quality community.

Probably the most dedicated online magazine about data quality, data quality pro, has launched a series of articles about profiling, validating and comparing data with DataCleaner. So far an introductory tutorial (including a complete and realistic example data set) and a background article/interview have been published:

We hope that you will enjoy the articles and we thank data quality pro for their great interest in our community.

First commercial support company for DataCleaner and MetaModel

Today we are announcing the first company, Lund&Bendsen, to officially support DataCleaner and MetaModel on a commercial level. These eobjects.org projects are, as you know, independent projects that are run with the community in mind. But as time goes on they grow, and to make it easier for companies to pick them up and use them in a commercial setting we also welcome third-party commercial support, which helps spread the projects to environments where community-based support is insufficient.

Lund&Bendsen is a Danish company with strong expertise in Java development and training. Their service offerings include training, customization, integration and enhancement of DataCleaner and MetaModel, so if your company is considering applying DataCleaner, you might be interested in hiring some professionals to aid you in the process.

Over time more companies are expected to join in on commercial support for the eobjects.org projects. Keep up to date on the DataCleaner support page and don't hesitate to contact us with any inquiries in this regard.

Independent analysis firm points at DataCleaner for open source data quality

The Technology Evaluation Centers (TEC) have published an interesting, unbiased and independent analysis of the market for Open Source business intelligence products. We are delighted to see that the article features a section about data quality and that TEC points at DataCleaner as a competent choice among the open source products:

In such situations, where the vendor does not support a specific functionality,
organizations can look to complementary open source solutions; the DataCleaner
project from eobjects.org, for instance, provides functionality to help profile
data and monitor data quality. It also points to a significant advantage with
open source applications: the fact that software is developed by the community
and for the community makes it much simpler to share innovative solutions
quickly and seamlessly.

You can read the whole article by Anna Mallikarjunan from TEC by going to their website (user registration is required).

DataCleaner launches new regex sharing subsite - RegexSwap

Only a few days after the launch of the new DataCleaner website, we are once again ready with new exciting features. This time we are launching the first edition of our new regular expression (regex) sharing subsite called "RegexSwap".

RegexSwap is a specialized forum for sharing, categorizing, commenting and voting on regular expressions that can be used in DataCleaner and other regex-based applications. It is really easy to post your own regular expressions, test them online on the website, comment and vote on the regexes that you have found useful. In time the next releases of DataCleaner will also take advantage of this online "always up to date" regex resource and offer direct integration with RegexSwap.

RegexSwap is still in beta but is ready at a functional level, which is why we are launching it publicly now. It will receive dedicated attention in the weeks and months to come.

A new website for DataCleaner

Dear everybody,

As a special Christmas present we have been working hard to design a new website for DataCleaner! Hopefully you will all enjoy the new site, which has been designed to further support our community and let it grow by incorporating more features to socialize and share ideas online. So go visit it now at the new URL:

Among the new features is a more personal profile system which is linked to some of the communities that our users already use frequently, namely LinkedIn and SourceForge. We have a whole new media section with cool screenshots and webcasts. We are also redesigning our mailing list structure. Instead of the single mailing list that we have been using so far, we are launching new "announcement" and "dev" mailing lists.

Our goal is to continuously launch new features on the website. The first one is a user survey to gain better insight into the minds of our users and community, so be sure to fill it out. In the future we will add more exciting features such as online sharing of regular expressions and reference data for DataCleaner dictionaries.

The old website will continue to exist, but primarily as a wiki and bug tracking system. During the next couple of days we will be editing the wiki pages to make them more suitable for wiki-style editing (by everyone) as opposed to the former read-only strategy.

We hope you like our Christmas present and that you will let us know, and we wish you all a great 2009. Without a doubt, it will bring exciting times for DataCleaner and the DataCleaner community.

New eobjects hosts, return of continuous integration!

I'm happy to announce that eobjects.org has gotten new hosts and that the troubles that we have been experiencing the last couple of months due to weird server crashes are finally over! My final word on the matter is - getting a large OSS-based J2EE environment up and running on a proprietary PowerPC platform is kind of a nasty affair! :-) So luckily we've found a better solution. This also means that we can once again say hello to our friend Hudson, the continuous integration system. While it is already online I will be tweaking it in the days to come, so look out for periodic builds, test reports and all that stuff that we all love!

Update: After some initial problems cloning the old environment I think we have finally ironed out all the small defects. So let's have a cheer for our new PostgreSQL server (humbly hosting the Trac system) and our new Hudson server:

Eobjects announces change in preferred license

We've made a decision in principle at eobjects.org to change the preferred license of our projects from the Apache License 2.0 to the GNU Lesser General Public License (LGPL).

The main difference between the two licenses is that the LGPL requires distributed modifications of the licensed code to be made available to the Open Source community (i.e. licensed under a similar license; LGPL or GPL). The eobjects.org projects gain the obvious advantage of the LGPL: improvements remain open and can be brought back into the projects. This also means that we don't risk anyone selling closed-source modified versions of our projects. It is still just as appropriate to use the projects as a part of commercial applications, but any distributed modifications to the projects themselves must remain available to the community.

Initially this change in license will affect the two flagship projects of eobjects.org: DataCleaner and MetaModel. This means that the next versions of these projects (DataCleaner 1.5 and MetaModel 1.1 respectively) will be LGPL licensed. Also, new projects will be LGPL licensed unless special circumstances suggest otherwise.

Kasper Sørensen presenting DataCleaner at Open Source Days '08

Great news everybody. The Open Source Days '08 conference in Copenhagen will feature a so-called Lightning Speak by Kasper Sørensen on the topic of DataCleaner and the eobjects.org community.

We're really happy to get the message of DataCleaner out to more people and a conference like this is an ideal spot for demonstrations, discussions and experiences. Read more about the lightning speak at Kasper's blog:

Update: The presentation is over and you can now also read the retrospective at Kasper's blog:

eobjects.org has been acquired

During the last year eobjects.dk has grown rapidly and attracted a lot of attention, both from Denmark, where the community was originally founded, and internationally from users and contributors in all parts of the world. We believe that this worldwide interest in eobjects should be reflected in the website name and address, which is why we have acquired the eobjects.org domain name as of today! Eobjects.dk will remain available and the domain names are exact aliases, but going forward we will undergo a gradual name change from .dk to .org. This will be reflected in several ways:

  • The official name of the website will change to eobjects.org.
  • For the sake of compatibility we will not change the package names of our Java classes just yet. Only major version releases will include such changes (i.e. wait for DataCleaner 2.0 and MetaModel 1.1).
  • The same principle goes for our Maven artifacts. In time they will probably change, but this also depends on the repository crew at Apache.

We are happy that we now have a domain name that symbolizes the international appeal of our software and we hope that it will reinforce the community with a similarly global culture and sense of vitality.

Welcome to the new eobjects.dk website

After a great deal of work we're happy to announce the launch of the new eobjects.dk website at our new server host! Thanks to Copenhagen Business School we now have much better bandwidth available as well as more powerful hardware. Take a look around - a lot of things have changed, but the important stuff is still the same.

  • The most remarkable change is probably what you're looking at right now - the News page! With the News page we'll be sure to keep you updated with all that goes on at eobjects.dk - project releases, roadmap changes, events, visions, goals and so on.
  • There's a new left-hand side menu to ease navigation. We've created a new Docs page and a Downloads page for quick access to common inquiries. You'll also notice that the projects have been highlighted in the menu to give a better overview of our work.
  • For contributors and developers, the Hudson continuous integration has not been migrated yet. So we hope you have the patience and discipline to live without CI for a couple of days.

We hope that you like the new website. If there's anything you'd like to comment on or anything that doesn't work as it should, please don't hesitate to go to the discussion forums and point it out to us! We will then make sure that the new website lives up to all the hopes we have for it.