[hackerspaces] Academic scraping
lokkju at gmail.com
Mon Jan 14 18:51:35 CET 2013
Once you have the raw data in a central location, it becomes much easier
for someone specialized in data processing to convert it to usable form -
even if it is hard to parse. It does help to keep the metadata though...
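To make that concrete, here is a minimal sketch of the "raw data plus metadata" idea: every fetched document gets a JSON sidecar recording where and when it came from, so whoever does the parsing later still has provenance. The function name and field set here are illustrative, not any standard.

```python
# Store raw scraped bytes alongside a .meta.json sidecar, keyed by
# content hash so duplicate fetches collapse to one file.
import hashlib
import json
import time
from pathlib import Path

def save_with_metadata(content: bytes, url: str, out_dir: Path) -> Path:
    """Write raw bytes and a metadata sidecar; return the raw file path."""
    sha = hashlib.sha256(content).hexdigest()
    out_dir.mkdir(parents=True, exist_ok=True)
    raw_path = out_dir / f"{sha}.pdf"
    raw_path.write_bytes(content)
    meta = {
        "source_url": url,                      # where it came from
        "sha256": sha,                          # integrity / dedup key
        "fetched_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "bytes": len(content),
    }
    (out_dir / f"{sha}.meta.json").write_text(json.dumps(meta, indent=2))
    return raw_path
```

Even this much makes a pile of PDFs far more usable than bare files, since the sidecars survive whatever parsing pipeline gets bolted on later.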
On Mon, Jan 14, 2013 at 12:27 PM, Bryan Bishop <kanzure at gmail.com> wrote:
> Hey all,
> The unspoken truth of programmerhood is that many of us write spiders
> and scrapers. But nobody talks about it. I have done some
> introspection on why these initiatives fail in academic contexts, and
> I think a big reason is because of biting off more than one can chew.
> The other reason is that there are no best practices being passed
> around, and no reusable software distributed (for the most part).
> Maybe instead of never communicating about these ideas, it would be
> better to write them down for ourselves. I suspect that there are many
> individuals that are highly motivated this week to start writing out
> silly curl scripts. A pile of pdfs is fairly useless to the broader
> community (especially without metadata, since OCR so rarely works on them).
> I'm dropping this here because for whatever reason many of the people
> in the hackerspace community have approached me separately over the
> past few days about starting projects like these. Maybe instead of
> duplicating effort we could figure out ways to suck less?
> - Bryan
> 1 512 203 0507
> Discuss mailing list
> Discuss at lists.hackerspaces.org