Milestone DataCleaner 2.4

Completed 12/14/11 15:05:04

100% complete

Number of tickets: 8 closed, 0 active, 8 total

  • DataCleaner-core: 3 / 3 closed
  • DataCleaner-gui: 5 / 5 closed

Integration with the EasyDataQuality (aka EasyDQ) cloud platform. Concretely, we aim to provide a customer data quality solution that includes:

  • Duplicate detection (aka Deduplication or Fuzzy matching)
  • Address validation and cleansing
  • Name validation and cleansing
  • Phone validation and cleansing
  • Email validation and cleansing
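
How EasyDQ matches duplicates is not described here. As a rough illustration of the general idea of fuzzy matching, here is a minimal Python sketch that compares every pair of normalized strings with the standard library's difflib and flags pairs above a similarity threshold. The function names and the 0.85 threshold are hypothetical choices for this sketch, not EasyDQ's method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    # Lower-case and collapse runs of whitespace so formatting noise is ignored.
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns a score in [0, 1]; 1.0 means identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return (i, j) index pairs whose similarity is at least `threshold`.

    Naive pairwise comparison (O(n^2)); production deduplication usually adds
    blocking/indexing so that not every pair has to be compared.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


records = ["John Smith", "john  smith", "Jon Smith", "Jane Doe"]
print(find_duplicates(records))  # → [(0, 1), (0, 2), (1, 2)]
```

Normalizing before comparing means that "John Smith" and "john  smith" score a perfect 1.0. The spelling variant "Jon Smith" still clears the threshold.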

New analysis job components:

  • "Table lookup", which looks up one or more values in any datastore table, matching on one or more conditions.
  • "Insert into table" writer, which inserts data into e.g. database tables and other writable datastore tables (CSV and Excel).
  • "Timestamp converter", which allows conversion from timestamps.

New datastores supported:

  • MongoDB support (read + write).
  • Streaming XML file support (SAX based).
  • Support for specifying the header line number in fixed-width value files.
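
SAX-based streaming means the XML file is read as a sequence of events, not loaded into a full in-memory tree. This lets large files be processed record by record. Here is a minimal Python sketch of that technique using the standard `xml.sax` module. It assumes flat records, where each field is a direct child element of a repeating record element. The `RecordHandler` class and the element names are hypothetical and are not DataCleaner's XML datastore code.

```python
import xml.sax
from io import BytesIO


class RecordHandler(xml.sax.ContentHandler):
    """Emits one dict per record element, keyed by child element name."""

    def __init__(self, record_tag, on_record):
        super().__init__()
        self.record_tag = record_tag
        self.on_record = on_record  # callback: each record is handed off as soon as it ends
        self.current = None         # dict for the record being read, or None
        self.field = None           # name of the field element being read, or None
        self.text = []

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks; whitespace between elements is ignored.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.record_tag and self.current is not None:
            self.on_record(self.current)
            self.current = None
        elif self.current is not None and name == self.field:
            self.current[name] = "".join(self.text)
            self.field = None


data = (
    b"<customers>"
    b"<customer><name>Ann</name><email>ann@example.com</email></customer>"
    b"<customer><name>Bob</name><email>bob@example.com</email></customer>"
    b"</customers>"
)
rows = []
xml.sax.parse(BytesIO(data), RecordHandler("customer", rows.append))
print(rows)
```

Only the record being read is held in memory. Each finished record is passed straight to the callback.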

Minor updates and bug fixes for issues in DC 2.3:

  • SAS versioning issue (resolved with SassyReader 0.3).
  • CSV writer separator character issue (resolved with MetaModel 2.0.2).

Extensibility and stability:

  • The command line interface now supports specifying job variables.
  • UI components for selecting columns, enum values etc. have been refactored and made much easier to extend and combine in custom extensions.
  • Properties can now have custom serialization strategies, e.g. for encrypting passwords in job XML files.
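
A per-property serialization strategy can be pictured as a pluggable serialize/deserialize pair, chosen by property name when the job file is written and read. Here is a minimal Python sketch of that pattern. All class and function names are hypothetical, and this is not DataCleaner's API. `Base64Strategy` is only a stand-in for an encrypting strategy: Base64 is reversible encoding, not encryption.

```python
import base64
import xml.etree.ElementTree as ET


class PlainStrategy:
    """Default strategy: store the value as-is."""

    def serialize(self, value: str) -> str:
        return value

    def deserialize(self, text: str) -> str:
        return text


class Base64Strategy:
    """Stand-in for an encrypting strategy.

    Base64 only hides the value from casual reading; it is NOT encryption.
    A real strategy would use a proper cipher and key management.
    """

    PREFIX = "enc:"

    def serialize(self, value: str) -> str:
        return self.PREFIX + base64.b64encode(value.encode("utf-8")).decode("ascii")

    def deserialize(self, text: str) -> str:
        if not text.startswith(self.PREFIX):
            return text  # tolerate values stored before the strategy was configured
        return base64.b64decode(text[len(self.PREFIX):]).decode("utf-8")


PLAIN = PlainStrategy()


def to_xml(props, strategies):
    # Each property is written with the strategy registered for its name, else PLAIN.
    root = ET.Element("properties")
    for name, value in props.items():
        stored = strategies.get(name, PLAIN).serialize(value)
        ET.SubElement(root, "property", name=name, value=stored)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text, strategies):
    # Reverse of to_xml: each stored value is decoded by the same per-name strategy.
    root = ET.fromstring(xml_text)
    return {
        p.get("name"): strategies.get(p.get("name"), PLAIN).deserialize(p.get("value"))
        for p in root.iter("property")
    }


strategies = {"password": Base64Strategy()}
xml_text = to_xml({"username": "dc", "password": "secret"}, strategies)
print(xml_text)
print(from_xml(xml_text, strategies))
```

Keeping the strategy outside the property itself means a real cipher can be swapped in without changing how the other properties are written.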