
DataCleaner 1.5.4 released with dBase and MS Access support

Here it is: DataCleaner 1.5.4 :)

Although this is a minor release, it contains a few exciting features and fixes:

  • We've updated MetaModel to version 1.2, which adds support for two new datastores:
    • dBase databases (.dbf files)
    • MS Access databases (.mdb files)
  • We've fixed a bug related to "file not found" errors with text-file dictionaries.
  • Many of the other underlying libraries have been updated, improving performance and stability.

Head on over to the downloads page to grab the new DataCleaner.

MetaModel 1.2 introduces cross-datastore querying and MS Access and dBase support

We're happy to present a new version of the wonderful MetaModel component. This version adds a radical new feature: cross-datastore querying, which means that you can now execute queries that span multiple datastores (i.e. with transparent client-side joining, filtering, grouping etc.). You can check out a simple example of this at Kasper's Source (blog).
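To make the idea of client-side joining, filtering and grouping a bit more concrete, here is a minimal Java sketch. It does not use MetaModel's API: the `Customer` and `Order` records and the `ClientSideJoin` class are hypothetical, and the two in-memory lists simply stand in for rows fetched from two different datastores. It shows one common join strategy (a hash join), not necessarily the one MetaModel uses internally.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical rows, as if read from two different datastores (e.g. a CSV file and a database).
record Customer(int id, String country) {}
record Order(int customerId, double amount) {}

// Illustrative sketch only, not MetaModel's query engine.
class ClientSideJoin {
    // Joins orders to customers on the client, filters small orders,
    // then groups by country and sums the order amounts.
    static Map<String, Double> totalByCountry(List<Customer> customers, List<Order> orders, double minAmount) {
        // Build side of the hash join: index customers by their key.
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.put(c.id(), c);
        }
        // TreeMap keeps the grouped result sorted by country for predictable output.
        Map<String, Double> totals = new TreeMap<>();
        for (Order o : orders) {
            if (o.amount() < minAmount) {
                continue; // client-side filtering
            }
            Customer c = byId.get(o.customerId()); // probe side of the join
            if (c == null) {
                continue; // inner join: drop orders without a matching customer
            }
            totals.merge(c.country(), o.amount(), Double::sum); // client-side grouping
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer(1, "DK"), new Customer(2, "NL"), new Customer(3, "DK"));
        List<Order> orders = List.of(new Order(1, 100.0), new Order(2, 50.0), new Order(3, 25.0),
                new Order(4, 70.0), new Order(1, 5.0));
        System.out.println(totalByCountry(customers, orders, 10.0));
    }
}
```

Running `main` prints `{DK=125.0, NL=50.0}`: the order for customer 4 has no matching customer and is dropped by the inner join, and the 5.0 order falls below the filter threshold.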

Version 1.2 also adds support for two long-awaited datastores: Microsoft Access databases and dBase databases. MetaModel's Access support is built on a core based on the Jackcess project. MetaModel's dBase support is based on a derivative of xBaseJ, courtesy of xBaseJ, American Coders and Joe McVerry.

To get started with MetaModel 1.2, here are the key resources:

  • Downloads at Google Code.
  • Javadocs available online.
  • Maven-support out of the box:
    <dependency>
      <groupId>dk.eobjects.metamodel</groupId>
      <artifactId>MetaModel-full</artifactId>
      <version>1.2</version>
    </dependency>
    

With MetaModel 1.2, we're feature-complete with regard to all of the 1.x features on the MetaModel roadmap. We hope that you will find it as great and useful as we always intended it to be!

DataCleaner 1.5.3 released

After much waiting, we are finally ready to release DataCleaner 1.5.3. Here's the wrap-up on what's been going on:

  • The MetaModel dependency has been upgraded to version 1.1.8, which means:
    • Improved Excel spreadsheet support
    • Improved SQL Server support
    • Improved performance for CSV files
  • Fixed a bug that caused certain database connection errors not to be reported to the user.
  • Fixed a bug that caused re-opening of database dictionaries to throw a NullPointerException.
  • Fixed a bug related to dictionary lookups of null values.
  • Added support for Teradata databases.
  • Added connection templates for SQL Server connections.
  • Added support for selection of custom encodings when reading CSV files.
  • Fixed a minor bug relating to reading files on the classpath when running in Java WebStart mode (which manifested in an exception thrown when clicking on "About DataCleaner").

So as you can see, it's been a mix of minor bugfixes and a couple of improvements to compatibility and performance regarding certain datastores. We hope you enjoy this new release of DataCleaner. As always, you can ...

Let us know what you think!

MetaModel 1.1.8 adds better SQL Server support

I'm happy to announce the release of MetaModel 1.1.8.

This is a minor release with updates relating only to MS SQL Server. The changes are, however, profound in this regard. Microsoft SQL Server JDBC drivers are known to be quirky when it comes to metadata exploration, and we are happy to say that MetaModel now addresses these issues. So if you're an MS SQL Server user, be sure to get the latest version of MetaModel!

MetaModel is as always available at the following locations:

  • Downloads at Google Code.
  • Javadocs available online.
  • Maven-support out of the box:
    <dependency>
     <groupId>dk.eobjects.metamodel</groupId>
     <artifactId>MetaModel-full</artifactId>
     <version>1.1.8</version>
    </dependency>
    

We hope you're all satisfied with the improvements in this release, and don't hesitate to give us any feedback.

New book on Open Source Business Intelligence tells the DataCleaner-story

About half a year ago we received an exciting inquiry from Jos van Dongen on behalf of himself and his co-author Roland Bouman, telling us that they were writing a new book about Open Source Business Intelligence, in particular Pentaho-based solutions. For this, they were looking into DataCleaner for the data profiling section of the book!

The book is now out! It's called "Pentaho Solutions" and it's published by Wiley Publishing. You can read about it and buy it on their website as well.

The book contains a walkthrough of building a data warehouse using Open Source tools, applying DataCleaner to the important job of profiling and validation.

We congratulate Roland Bouman and Jos van Dongen for their great work to promote Open Source Business Intelligence and thank them for mentioning DataCleaner while they're at it!

Explore and query all your datastores with MetaModel 1.1.7

We're pleased to announce the release of MetaModel 1.1.7. The major changes since our last release are two important improvements:

  • Microsoft SQL Server is finally supported, and integration tests for it have been added to our portfolio of tests of supported databases. Thanks to Asbjørn Leeth for the major contributions to this feature.
  • We've added an option to configure the character encoding for opening CSV files.
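As an illustration of why a configurable encoding matters, the following Java sketch reads a CSV file with an explicitly chosen `Charset` rather than the platform default. It is not MetaModel's CSV reader or its configuration API: `EncodedCsvReader` is a hypothetical name, and the naive comma split ignores quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MetaModel: reads CSV rows using a caller-supplied encoding.
class EncodedCsvReader {
    static List<String[]> readRows(Path file, Charset encoding) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // Bytes are decoded with the given charset instead of the platform default.
        try (BufferedReader reader = Files.newBufferedReader(file, encoding)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split: does not handle quoted fields; -1 keeps trailing empty fields.
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("people", ".csv");
        // Write the file as ISO-8859-1 (Latin-1), where o-slash (U+00F8) is the single byte 0xF8.
        Files.write(file, "name,city\nS\u00f8rensen,Aarhus\n".getBytes(StandardCharsets.ISO_8859_1));
        for (String[] row : readRows(file, StandardCharsets.ISO_8859_1)) {
            System.out.println(String.join(" | ", row));
        }
        Files.delete(file);
    }
}
```

Decoding the same file as UTF-8 would fail, because the byte 0xF8 is not valid UTF-8; `Files.newBufferedReader` reports such input by throwing a `MalformedInputException` while reading.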

With these two improvements, we think that we've added some significant "drops in the ocean" on our way to becoming the most comprehensive and advanced framework for object-oriented querying and datastore-independent schema exploration.

If you use Maven, update your dependencies to the following:

<dependency>
 <groupId>dk.eobjects.metamodel</groupId>
 <artifactId>MetaModel-full</artifactId>
 <version>1.1.7</version>
</dependency>

... or if you don't, head on over to our download site at Google Code and download a copy of the release.

eobjects.org announces Open Source data quality with DataCleaner 1.5.2

Dear DataCleaner users,

We are happy to announce the release of DataCleaner 1.5.2. Users of DataCleaner 1.5.0 or 1.5.1 won't see a lot of changes in the user interface, but this release actually holds quite a lot of improvements “beneath the surface”:

  • The most notable improvement is in the Value Distribution Profile. Previously this profile consumed quite a lot of memory, which could lead to out-of-memory errors in extreme cases. This has been fixed by using on-disk caching with Berkeley DB when necessary.
  • Another notable feature is that we can now distribute DataCleaner as a single JAR file. This means that we will be serving the application via Java WebStart (i.e. you can run it as if it's an online application), and we are also considering other distribution options.
  • When starting, the application automatically downloads regular expressions from the RegexSwap.
  • A bug regarding matching of number-based columns in dictionaries was reported and fixed.
  • A bug regarding invalid characters in XML export formats was reported and fixed.
  • When opening files, we now ignore the case of the suffix, so that .CSV files can be opened as well as .csv.
  • The number of columns shown in the preview window is automatically restricted if there are too many to show on a single screen.
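Ignoring suffix case comes down to a case-insensitive comparison of the end of the file name. The sketch below shows one minimal way to do that in Java; `FileSuffix.hasSuffixIgnoreCase` is a hypothetical helper, not DataCleaner's actual code.

```java
// Hypothetical helper illustrating case-insensitive suffix matching; not DataCleaner's code.
class FileSuffix {
    // True if fileName ends with suffix, ignoring case ("DATA.CSV" matches ".csv").
    static boolean hasSuffixIgnoreCase(String fileName, String suffix) {
        int start = fileName.length() - suffix.length();
        // regionMatches returns false for a negative offset, i.e. when the suffix is longer than the name.
        return fileName.regionMatches(true, start, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(hasSuffixIgnoreCase("customers.CSV", ".csv")); // true
        System.out.println(hasSuffixIgnoreCase("customers.txt", ".csv")); // false
    }
}
```

Using `regionMatches` with `ignoreCase` set avoids creating lower-cased copies of both strings and sidesteps the locale-sensitive behaviour of a plain `toLowerCase()`.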

You can download DataCleaner from the downloads page, or you can use our new feature: Get it via Java WebStart!

This release underlines the ongoing evolution of DataCleaner into an increasingly professional data profiler and data quality tool. Seeing that DataCleaner is being used in large corporations worldwide, I wish to address a question that I have been thinking about and that I know users are pondering: How do you best combine the low adoption cost of Open Source applications like DataCleaner with the high flexibility that most commercial business software provides? To serve this need, we've opened up a new division of the company I work with, Lund&Bendsen. Whether you need to deploy DataCleaner to high-scale installations, integrate the application with your existing systems, develop customized profiles or validation rules, or satisfy other enterprise needs, we offer first-class services and in-depth expertise you won't find anywhere else.

To cut to the chase: DataCleaner 1.5.2 is here, and we wish to extend the community development with a professional effort. So don't hesitate to let us know if you see an opportunity to invest. Adding value by targeting your use of the product is in the interest of customer, developer and community alike, and this is why our business exists.

To all you non-business users out there: Sorry for the obvious commercial rant, and we hope you all enjoy the newest DataCleaner release.

Best regards,
Kasper Sørensen
Founder of eobjects.org and the DataCleaner project

MetaModel 1.1.6 released: Small changes, a bug fixed

We've released yet another version of MetaModel, namely version 1.1.6.

This release contains very few changes to the 1.1.5 release:

  • A convenience method was added to the Query class: select(FunctionType, Column).
  • Upgrading the Apache POI version in MetaModel introduced a few bugs that we did not discover in the 1.1.5 milestone. In 1.1.6 we fixed these bugs and significantly improved unit testing for this part of the code to prevent new bugs from emerging.

We hope you enjoy this release, and please excuse the hectic release schedule: the aforementioned bug fixes were critical, and we hope that you appreciate the quick response from the community.

eobjects.org announces the release of MetaModel 1.1.5

We have just released the newest version of MetaModel, 1.1.5. This release is a minor release which means no API changes, but a few upgrades in terms of performance, flexibility and ease of distribution (full list):

  • The most important upgrade has been to CSV performance. We encountered a bug when querying this type of datastore which meant that the whole DataSet was held in memory while it was being used. This code has been substantially refactored so that data now streams through memory as expected, keeping the door open for very large CSV files.
  • A minor change to the column naming scheme has been implemented for the Excel-based DataContexts: if the first row of a spreadsheet contains only blank fields, we will automatically assign the names "[column 1]", "[column 2]" etc. accordingly.
  • The downloadable zip or tar.gz file will now contain a "MetaModel-1.1.5-all.jar" file, an assembled jar file containing the classes of all MetaModel modules (core, csv, jdbc, excel etc.), which should substantially ease deployment of the framework.
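The fallback column naming for Excel spreadsheets can be sketched in a few lines of Java. This is an illustration under assumptions: `ColumnNaming.columnNames` is hypothetical, a row of cell values is modelled as a `List<String>`, and a first row with at least one non-blank cell is assumed to be used as the header row as-is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the fallback column naming; not MetaModel's implementation.
class ColumnNaming {
    static List<String> columnNames(List<String> firstRow) {
        // A cell counts as blank if it is null or contains only whitespace.
        boolean allBlank = firstRow.stream().allMatch(cell -> cell == null || cell.isBlank());
        if (!allBlank) {
            // Assumption: a non-blank first row is used directly as the header row.
            return new ArrayList<>(firstRow);
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= firstRow.size(); i++) {
            names.add("[column " + i + "]");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnNames(Arrays.asList("", null, "  "))); // [[column 1], [column 2], [column 3]]
        System.out.println(columnNames(List.of("id", "name")));        // [id, name]
    }
}
```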

We hope you enjoy the new release of MetaModel, and please keep up the good work of providing the valuable feedback that drives its development.

DataCleaner 1.5.1 released

We're happy to announce the release of DataCleaner version 1.5.1. Although this is a minor release, it contains a few nice features, especially for users who are enjoying the export features that were introduced in 1.5:

  • An additional HTML export format has been added to the built-in export formats (usable when exporting Profiler results in the desktop app and when executing the runjob command-line tool).
  • The export format can now be chosen directly in the desktop app.
  • Four new measures were added to the String Analysis profile: avg. chars and max/min/avg white spaces.
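To show how the new String Analysis measures relate to each other, here is a hedged Java sketch that computes the average character count and the min/max/average white space count over a column of values. The `StringMeasures` class and its `Result` record are hypothetical; how DataCleaner treats null values, or exactly which characters it counts as white space, is not stated here, so the sketch assumes non-null values and uses `Character.isWhitespace`.

```java
import java.util.List;

// Hypothetical sketch of the String Analysis measures; not DataCleaner's implementation.
class StringMeasures {
    record Result(double avgChars, int minWhitespace, int maxWhitespace, double avgWhitespace) {}

    // Counts characters for which Character.isWhitespace is true.
    static int whitespaceCount(String value) {
        int count = 0;
        for (int i = 0; i < value.length(); i++) {
            if (Character.isWhitespace(value.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Assumes a non-empty list of non-null values.
    static Result analyze(List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values to analyze");
        }
        long totalChars = 0;
        long totalWhitespace = 0;
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (String value : values) {
            int ws = whitespaceCount(value);
            totalChars += value.length();
            totalWhitespace += ws;
            min = Math.min(min, ws);
            max = Math.max(max, ws);
        }
        int n = values.size();
        return new Result((double) totalChars / n, min, max, (double) totalWhitespace / n);
    }

    public static void main(String[] args) {
        Result r = analyze(List.of("a b", "abcd", "x  y z"));
        System.out.println(r.minWhitespace() + " " + r.maxWhitespace()); // 0 3
    }
}
```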

The new version of DataCleaner is (as always) downloadable for free on the downloads page, and feedback from users is also greatly appreciated, ie:

We hope that you all enjoy DataCleaner 1.5.1.