[hackerspaces] Academic scraping
kanzure at gmail.com
Mon Jan 14 18:59:38 CET 2013
On Mon, Jan 14, 2013 at 11:51 AM, Lokkju Brennr wrote:
> Once you have the raw data in a central location, it becomes much easier for
> someone specialized in data processing to convert it to usable form - even
> if it is hard to parse. It does help to keep the metadata though...
One of my favorite scraping methods at the moment is phantomjs, a
headless wrapper around webkit.
But for academic projects, I highly recommend zotero's translators.
Here's why. There's already a huge userbase of zotero users actively
updating these scrapers. When one breaks, they fix it immediately.
The translators also extract not just the link
to the pdf but also the maximum amount of metadata. With the help of
the zotero/translation-server project, they can be used headlessly.
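
Roughly, headless use could look like the sketch below. This is only
an illustration under assumptions of mine: that a translation-server
instance is listening locally on port 1969, that it accepts a page URL
as a plain-text POST to /web, and that returned items carry an
"attachments" list with "mimeType" and "url" keys. The function names
are made up for this example, and the endpoint shape may differ between
versions of the server.

```python
import json
import urllib.request

# Assumption: a translation-server instance runs locally on port 1969
# and accepts a page URL as a plain-text POST body on /web.
TRANSLATION_SERVER = "http://127.0.0.1:1969"


def build_web_request(page_url, server=TRANSLATION_SERVER):
    """Build (but do not send) the POST request asking for page_url to be translated."""
    return urllib.request.Request(
        server.rstrip("/") + "/web",
        data=page_url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def fetch_items(page_url, server=TRANSLATION_SERVER):
    """Send the request and decode the JSON list of items that comes back."""
    request = build_web_request(page_url, server)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def pdf_links(items):
    """Collect PDF attachment URLs; the attachment key names are assumed, not confirmed."""
    links = []
    for item in items:
        for attachment in item.get("attachments", []):
            if attachment.get("mimeType") == "application/pdf" and attachment.get("url"):
                links.append(attachment["url"])
    return links
```

With a server actually running, pdf_links(fetch_items(url)) would give
the candidate pdf links for a page, and the rest of each item holds the
metadata.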
I have a demo of this working on irc.freenode.net in ##hplusroadmap
(paperbot). It just grabs links from our conversation and posts the
pdfs so that we don't have to ask each other for copies.
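
The link-grabbing step is the simple part. A minimal sketch of it,
not paperbot's actual code: the pattern and the extract_links name are
placeholders of mine, and the trailing-punctuation cleanup is a crude
simplification that would mangle URLs legitimately ending in ")".

```python
import re

# Deliberately loose: anything starting with http:// or https:// up to whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(message):
    """Return the distinct URLs in one chat message, in order of first appearance."""
    links = []
    for match in URL_PATTERN.findall(message):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        cleaned = match.rstrip(".,;:!?)>\"'")
        if cleaned not in links:
            links.append(cleaned)
    return links
```

Each link found this way would then be handed to the translators to
see whether a pdf comes back.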