Data Cleanup Tool

Hi All,

This isn't strictly EMu-related, but this seemed the most appropriate section to post in.

I came across this really interesting tool that Google is developing for manipulating, filtering and analysing dirty or inconsistent data. There's a video and a download link here: http://code.google.com/p/google-refine/

I gave it a go and can see it being very useful for data imports.

Cheers,

Warren.

Warren Hindley (Axiell Washington DC)

Re: Data Cleanup Tool

INTERESTING....!

Thanks for this, Warren.
I have about 75,000 records in Access, created as part of a rapid data-capture project from our hand-written registers. Reconciling those records and reducing redundancy is perhaps the most daunting and tedious task, and it has put me off until now.
I'll give this a go and will let you know how I get on.

cheers

Dave Smith
Natural History Museum

Dave Smith
Earth Sciences Data Manager
Natural History Museum, London

Re: Data Cleanup Tool

Dave

Do let us all know. I hadn't heard of Google Refine and can see that, in principle, it could be extremely useful for the various pockets of messy data that still lurk in our database. Presumably it would be possible to export from EMu, clean up the data in Refine, and then import it again?

Thanks

Stephen Johnston
Museum of the History of Science, University of Oxford

Re: Data Cleanup Tool

Has anyone got anything more recent on how useful this tool is/was?

John Peel
Manchester Art Gallery

John Peel
Collection Information Officer | Manchester Art Gallery

Re: Data Cleanup Tool

Hi John,

I shall be giving a talk (NO. Risking a demo!!) at the NA EMu Users Conference on the subject, showing how I have used it to manipulate and cleanse data created from rapid entry of hand-written registers. I've also recently discovered the world of 'open data' and APIs, although I have been frustrated by my programming-language illiteracy. It has huge potential for verifying your data against 'authority' datasets (reconciliation) and for augmenting it with data extracted from them.
John, I know my giving a talk in the States won't help you, but I promise I'll try to write up a synopsis with some useful manipulation expressions and examples. For someone like me who can't get to grips with MS Access for data cleaning, this tool is great.

Dave

Re: Data Cleanup Tool

That all sounds great, Dave. I look forward to it, especially with regard to the authority datasets. If the Collections Trust does eventually give more of a national steer on which authorities are THE authorities, that would make it even more useful.

JP

Re: Data Cleanup Tool

Hi Dave,
Were you able to write up your findings? I'm about to look at this tool, so it would be good to know what your experience of it was and if you have any tips to share from it.
Much appreciated, Lisa.

Lisa Hayes
Museum Collections System Analyst
Macquarie University, Sydney

Re: Data Cleanup Tool

Hi Lisa,
At the very bottom of this page there is a link to David's presentation on the topic:
http://www.kesoftware.com/news-and-even … conference
There isn't a demo, but he talks about the benefits.
Kara

*************
Kara M. Lewis
Collections Information System Administrator/Analyst
National Museum of the American Indian, Smithsonian Institution

Re: Data Cleanup Tool

Hi Kara,
Thanks very much for the link.
Best, Lisa.

Re: Data Cleanup Tool

Hi Lisa (John and Stephen)

Really sorry I haven't got around to writing this up. But actually a demo is probably the best way to present this, and guess what? iDigBio have very kindly posted a tutorial from their Data Carpentry course.

It covers all the key things I mentioned:
- Facets
- Clustering
- Augmenting using web services
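To give a feel for what clustering does: OpenRefine groups values that are probably variants of the same thing. One of its documented methods, "fingerprint" key collision, reduces each value to a normalized key and groups values that share a key. Below is a minimal Python sketch of that idea, assuming the normalization steps roughly as OpenRefine documents them (trim, lowercase, strip accents and punctuation, sort unique words). It is not OpenRefine's own code, and the helper names `fingerprint` and `cluster` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key for a value (hypothetical helper, modelled on
    the 'fingerprint' keying idea; not OpenRefine's implementation)."""
    s = value.strip().lower()
    # Decompose accented characters and drop the combining marks (ü -> u).
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Remove punctuation so "Smith, John" and "Smith John" compare equal.
    s = re.sub(r"[^\w\s]", "", s)
    # Sort and de-duplicate the words so that word order is ignored.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep only groups of two or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    names = [
        "Natural History Museum",
        "natural history museum.",
        "Museum, Natural History",
        "Manchester Art Gallery",
    ]
    print(cluster(names))
```

Run as a script, this prints one cluster holding the three museum-name variants, while "Manchester Art Gallery" stays on its own. In OpenRefine you would review each proposed cluster and pick the value to merge to, rather than accepting every match blindly.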

Here's the link - http://idigbio.adobeconnect.com/p4dsgo0y77y/

And for more resources, there are plenty of links on the OpenRefine website and, as mentioned in the demo, there is the OpenRefine Google Group.
Regular expressions are a powerful way to explore and clean your data, but if code is an alien language to you then there are a number of cheat sheets available online (e.g. http://arcadiafalcone.net/GoogleRefineCheatSheets.pdf).
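As a small taste of the kind of patterns those cheat sheets cover, here is a Python sketch. OpenRefine expressions are written in its own language (GREL), so this only illustrates the regular expressions themselves. The field contents are hypothetical: collapsing stray whitespace left over from rapid data entry, and pulling a four-digit year (assumed here to fall between 1500 and 2099) out of a free-text date note.

```python
import re
from typing import Optional

# Any run of spaces, tabs or newlines.
WHITESPACE = re.compile(r"\s+")
# A standalone four-digit year from 1500 to 2099 (an assumed range for register dates).
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def collapse_whitespace(value: str) -> str:
    """Replace internal runs of whitespace with one space and trim the ends."""
    return WHITESPACE.sub(" ", value).strip()


def extract_year(value: str) -> Optional[str]:
    """Return the first plausible year in a free-text date, or None."""
    match = YEAR.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(collapse_whitespace("  Ammonite,\t Dorset   coast "))
    print(extract_year("Collected 12 May 1887, Lyme Regis"))
```

The `\b` word boundaries stop the year pattern from matching digits buried inside a longer number, such as a six-digit registration number.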

Hope this helps. It may look daunting, but once you get into it and start using it you won't look back.

Good luck.

Dave

Re: Data Cleanup Tool

Hi Dave,
Thanks for the update.  I'll check out the links you recommend.
Cheers, Lisa.
