[hackerspaces] Academic scraping

Mon Jan 14 18:27:32 CET 2013

Hey all,

The unspoken truth of programmerhood is that many of us write spiders
and scrapers. But nobody talks about it. I have done some
introspection on why these initiatives fail in academic contexts, and
I think a big reason is biting off more than one can chew.
The other reason is that there are no best practices being passed
around, and no reusable software being distributed (for the most part).

https://groups.google.com/group/science-liberation-front

Maybe instead of never communicating about these ideas, it would be
better to write them down for ourselves. I suspect that there are many
individuals who are highly motivated this week to start writing out
silly curl scripts. A pile of PDFs is fairly useless to the broader
community (especially without metadata, since OCR so rarely works on
\tau\epsilon\tex).
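
To make the metadata point concrete, here is a minimal sketch of keeping a
JSON sidecar next to every file a scraper saves. Everything in it is an
assumption for illustration: the function name save_with_metadata, the
field names, and the choice to name files by content hash are mine, not an
existing tool or an agreed convention.

```python
import hashlib
import json
import time
from pathlib import Path


def save_with_metadata(content: bytes, source_url: str, out_dir: str, **fields) -> Path:
    """Write a fetched document plus a JSON sidecar describing it.

    Hypothetical helper: the layout and field names are assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Name the file by content hash so saving the same bytes twice
    # overwrites one file instead of piling up duplicates.
    digest = hashlib.sha256(content).hexdigest()
    pdf_path = out / f"{digest}.pdf"
    pdf_path.write_bytes(content)

    metadata = {
        "source_url": source_url,
        "sha256": digest,
        "fetched_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        **fields,  # e.g. title, authors, doi, as scraped from the landing page
    }
    sidecar = out / f"{digest}.json"
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return pdf_path
```

A scraper would call this once per download, passing whatever it managed
to pull off the landing page (title, authors, and so on) as keyword
arguments. The sidecar means the pile stays searchable even when OCR on
the PDF itself gets nowhere.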

I'm dropping this here because for whatever reason many of the people
in the hackerspace community have approached me separately over the
past few days about starting projects like these. Maybe instead of
duplicating effort we could figure out ways to suck less?

- Bryan
http://heybryan.org/
1 512 203 0507