Take 2: Download script for springerlink.com Ebooks

NOTE: This script is apparently against the licensing contract between universities and Springer, see: http://www.bib.hm.edu/aktuelles/news/newsdetail_9984.de.html

NOTE 2: I do not maintain this script anymore. Please look for an alternative.

It seems quite a few people are interested in my bash script for downloading ebooks from http://springerlink.com.

That script has some quirks, the biggest being that it was written in bash, which makes it rather hard to implement new features. One requested feature was support for books that span multiple pages on SpringerLink.

So here I present springer_download.py - a Python rewrite which should handle all the old links and some more. This is the very first program I’ve written in Python. And since it has to run on the Zedat servers it’s limited to Python 2.4.x without any fancy shmancy additions (a pity, since I’d love to use urlgrabber or pycurl).

the script

You can find the sources on GitHub: http://milianw.github.com/springer_download/

I plan to put all my future code snippets in public repositories on GitHub. That way you can easily track changes and stay up to date. GitHub also has a nice “download” feature which you can use to get the current version. You can find my profile and my repositories at http://github.com/milianw

Note: This script is intended to be run under Linux or other *nix’es which fulfill the requirements (Python 2.4.x, iconv and pdftk). Windows is not supported.

TODO

introduce multithreading for faster / simultaneous downloads
add speed to progressbar
use progressbar in source-downloader
use one git-repo per project (makes links work properly)

Comments

Want to comment? Send me an email!

Comment by Felix Krull (not verified) (2012-11-08 12:40:00)

Hi! IMHO the script is no longer usable. They have changed the pages and, with them, the naming scheme. It should be possible to adapt it, but I can't do it.

Regards, Felix

Comment by Anonymous (not verified) (2012-11-12 08:42:00)

they pretty much suck. damn, can someone help, or does someone have some tips on how i can fix the code? i would appreciate it verily, thx anyhow.

Comment by Anonymous (not verified) (2013-04-29 08:11:00)

It's easy. Download Firefox and install the download extension called DownThemAll. Right-click on the page (in Firefox), click on DownThemAll, tick the filter for PDF documents and click on Start. You will wonder how easy it was… :)

Comment by Anonymous (not verified) (2012-11-12 09:13:00)

Try out this fork: https://github.com/MalteSchledjewski/springer_download

Comment by Anonymous (not verified) (2012-11-12 15:27:00)

i am baffled and amazed. ingenious. thx mate, verily. well i can’t offer a spoon but maybe an impalpably and puny contribution to the table manners (don’t know if i am right resp. up2date with the possibly new cobweb): try not to come up to 75 files each 1/2 hour.

Comment by Anonymous (not verified) (2012-12-10 02:35:00)

There’s a remake of Milian Wolff’s great tool available from this website: http://tovotu.de/dev/518-Neuer-SpringerLink-Downloader/

Use google translate if you don’t understand the German description: http://translate.google.de/translate?sl=de&tl=en&u=http%3A%2F%2Ftovotu.d…

Seems to work great and preserves table of contents as well as page labeling for downloaded books.

Comment by Matze (not verified) (2012-10-22 23:10:00)

For the past few days/weeks the script has not been merging the chapters correctly! The chapters end up in random order in the PDF. Does anyone have an idea?

Comment by Dennis (not verified) (2012-10-22 08:45:00)

Hi, the script always worked flawlessly until I tried to download a book yesterday. It’s not that it’s not running, it just forgets the last chapter (backmatter.pdf) and the coverpage. Best regards.

Comment by Paul (not verified) (2012-10-13 02:22:00)

Many, many thanks, Milian, and likewise to the community that so diligently strives for improvements. This is a great relief (how nice it would be if ALL reading were online; something the current episteme will probably manage to prevent for some time yet).

For some books, however, I get an error message like this: “… found 500 chapters downloading chapter 1/500 httphttp:http:http:http:http:http:http:http:http://springerlink.com/content/pln81m2474hxmpn6/fulltext.pdf -8192000%

ERROR: downloaded chapter http://springerlink.com/content/pln81m2474hxmpn6/fulltext.pdf has invalid mime type text/html - are you allowed to download Wörterbuch der Psychotherapie? ” Does anyone have any advice?

Comment by Paul (not verified) (2012-10-13 04:18:00)

or: “downloading chapter 72/201 http://springerlink.com/content/x5w42567v106m086/fulltext.pdf 100%

ERROR: downloaded chapter http://springerlink.com/content/x5w42567v106m086/fulltext.pdf has invalid mime type text/plain - are you allowed to download Psychopraxis?”

It apparently does not necessarily depend on the size or number of chapters, as I initially suspected (at least one file with over 400 chapters succeeded).

When it does succeed, chapters only sometimes appear in the wrong order; but at least they are contained somewhere in the document. What could be the cause?

Comment by Paul (not verified) (2012-10-13 02:20:00)

Comment by indianahorst (not verified) (2012-06-06 14:56:00)

It seems the script unfortunately no longer works… whether I use -l LINK or -c ISBN, I always get the following error message:

    $ ./springer_download.py -c 978-3-8348-1937-6
      File "./springer_download.py", line 92
        print "fetching book information...\n\t%s" % link
        ^
    SyntaxError: invalid syntax

or:

    $ ./springer_download.py -l "http://www.springerlink.com/content/978-3-8348-1937-6/#section=1062052&page=1"
      File "./springer_download.py", line 92
        print "fetching book information...\n\t%s" % link
        ^
    SyntaxError: invalid syntax

Comment by Anonymous (not verified) (2012-08-14 12:36:00)

Maybe a bit late, but are you using Arch Linux (or another distro that maps ‘python’ to Python 3 by default)? Then you have to call the script explicitly with python2, or adjust the first line of the script: replace python with python2.

Comment by Anonymous (not verified) (2012-07-09 18:24:00)

The script still works flawlessly. It is invoked e.g. as $ ./springer_download.py -l http://www.springerlink.com/content/978-3-8348-1937-6/

Comment by Anonymous (not verified) (2012-03-10 13:48:00)

Good morning guys, I really want to download some books from SpringerLink, and my library has the rights for it. But as I see here, it requires Python scripting and Linux. I am a total noob at scripting and I'm not familiar with Linux. I hope to have Linux (Ubuntu) installed on my laptop within a few days. I have already searched for the needed software… but I don't understand the “iconv” requirement… is this already available when you download php5?

I could really use some extra help. The best would be if someone gave me steps to follow, from installing Python to downloading a book.

Hope to hear from you guys soon.

Kind regards, Marius

Comment by ubuntifa (not verified) (2012-02-21 15:33:00)

Thank you for this great little piece of free software. Saved a lot of time!

Comment by Mideag (not verified) (2012-02-10 11:22:00)

Hi Milian thank you very much, this script has saved me several clicks, it has been very helpful :P

Comment by Enno (not verified) (2012-01-13 08:32:00)

Dear Milian, thank you for this nice script which is very useful for me. If I’m outside of the university network I have to use a proxy to get the necessary IP to access springerlink.com. Is there a way to use your script with a proxy? Best wishes, E.

Comment by liob (not verified) (2011-11-20 22:38:00)

Hi,

I have taken the liberty to write a script that allows you to download ebooks from the thieme ebook library as pdf. You can find it at http://github.com/liob/thieme2pdf

Comment by Tatome (not verified) (2011-11-18 13:49:00)

Great script, thanks. This saves a lot of fiddling with bash or clicking on download links on my part.

One thing, though: I monkey-patched the script so it doesn’t bail out if it can’t download a chapter. It would be a nice idea to have a switch for that; it could even insert pages into the merged document where chapters were left out.

Cheers, Johannes
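
The "don't bail out" behaviour suggested above could look roughly like the following. This is only a sketch in Python 3, not the patch Johannes actually applied: download_all and the fetch parameter are hypothetical names, and fetch stands in for the script's own geturl(chapterLink, "%d.pdf" % i) call.

```python
def download_all(chapter_links, fetch):
    """Try every chapter; collect failures instead of aborting the whole run.

    `fetch` is a stand-in (assumption) for the script's geturl(link, filename).
    Returns (results_of_successful_fetches, links_that_failed).
    """
    downloaded, failed = [], []
    for i, link in enumerate(chapter_links, start=1):
        try:
            downloaded.append(fetch(link, "%d.pdf" % i))
        except IOError as error:
            # Report the problem and continue with the next chapter.
            print("skipping chapter %d (%s): %s" % (i, link, error))
            failed.append(link)
    return downloaded, failed
```

The list of failed links could then be printed at the end, or used to decide where placeholder pages belong in the merged document.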

Comment by Anonymous (not verified) (2011-11-01 13:01:00)

Hi,

after creating the bookTitle, around line 121, you should check the length of it to avoid max. filename length problems. See http://www.tutorialspoint.com/python/os_fstatvfs.htm how to get the max. filename length. Would be nice ;-) By the way, good job with this program.

Greetings, Mo
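
A length check like the one suggested here might be sketched as follows. It is written for Python 3 (the script itself targets Python 2.4), it uses os.pathconf rather than the os.fstatvfs call from the linked page, and the names max_filename_length and shorten_filename as well as the 255-byte fallback are assumptions made for illustration.

```python
import os


def max_filename_length(directory="."):
    """Return the filename limit (in bytes) for `directory`.

    Falls back to 255 (an assumed, common default) where the limit
    cannot be queried, e.g. on platforms without os.pathconf.
    """
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, ValueError, OSError):
        return 255


def shorten_filename(title, extension=".pdf", limit=255):
    """Truncate `title` so that title + extension fits into `limit` UTF-8 bytes."""
    budget = limit - len(extension.encode("utf-8"))
    encoded = title.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded.decode("utf-8", errors="ignore") + extension
```

A call such as shorten_filename(bookTitle, ".pdf", max_filename_length(".")) would then be made before the merged PDF is written.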

Comment by Anonymous (not verified) (2012-02-04 20:06:00)

I don’t know anything about python but yeah, that would be nice. Or does anyone have another solution for filename length problems? After successfully downloading the whole book, the script is unable to save the merged pdf file (Grundkurs_Statistik_in_den_Sozialwissenschaften_-_Eine_leicht_verstaendliche,_anwendungsorientierte_Einfuehrung_in_das_sozialwissenschaftlich_notwendige_statistische_Wissen.pdf) because the file name is too long.. any ideas?

Comment by Hans (not verified) (2011-09-07 16:53:00)

Hi, I get the following error after a few chapters (it always downloads 3-5 chapters, then aborts):

     found 24 chapters
    downloading chapter 1/24
    http://springerlink.com/content/978-3-540-40306-7/front-matter.pdf100%
    downloading chapter 2/24
    http://springerlink.com/content/w1j138p3140714v2/fulltext.pdf     100%
    downloading chapter 3/24
    http://springerlink.com/content/x066412311u22858/fulltext.pdf     100%
    downloading chapter 4/24
    http://springerlink.com/content/n1hp12738v607147/fulltext.pdf     100%
    downloading chapter 5/24
    Traceback (most recent call last):
      File "./milian/springer_download.py", line 293, in <module>
        main(sys.argv[1:])
      File "./milian/springer_download.py", line 183, in main
        localFile, mimeType = geturl(chapterLink, "%d.pdf" % i)
      File "./milian/springer_download.py", line 279, in geturl
        lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 237, in retrieve
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 205, in open
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 342, in open_http
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 937, in endheaders
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 797, in _send_output
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 759, in send
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 740, in connect
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
    IOError: [Errno socket error] [Errno 60] Operation timed out

Comment by Milian Wolff (2011-09-08 12:25:00)

Hm, maybe this is a new protection on SpringerLink that does not allow downloading many chapters in a short time?

I don’t know… Try adding import time to the other imports at the top of the downloader script. Then, in def geturl(url, dst), add the following before the return response: time.sleep(1). Increase the sleep time if that does not work.

bye
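
The pause described above can be sketched like this. It is a minimal Python 3 illustration, not the script's actual geturl (which uses Python 2's urllib with a progress hook); geturl_throttled and its retrieve and sleep parameters are hypothetical and exist only so the pause can be tried without touching the network.

```python
import time
import urllib.request


def geturl_throttled(url, dst, delay=1.0,
                     retrieve=urllib.request.urlretrieve, sleep=time.sleep):
    """Download `url` to `dst`, then wait `delay` seconds before returning.

    The wait happens after every download, so consecutive chapter
    requests are spaced at least `delay` seconds apart.
    """
    response = retrieve(url, dst)
    sleep(delay)
    return response
```

The right delay is a matter of trial and error; readers in this thread report that one second was not enough and that larger values worked for them.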

Comment by Anonymous (not verified) (2012-10-19 12:48:00)

indeed, it is, so shall i put it into the if-clause or the else-clause?? -because i put it in the else-clause and it worked astonishingly fine, ..so far.. but i rather like to slow it down now.. trust me.

btw, a qualified note: it is better not to go below ~25 sec per download when saving several of these, in order to avoid the cobweb and getting unpleasant mail. thanks to milian, thanks to the community, for sharing and caring, have a good life, with high regards and honesty, -a friend

Comment by Anonymous (not verified) (2012-10-19 13:21:00)

‘kay, sorry mate for toiling and spamming: it was the if-clause. but why did it solve my problem as well when i put it into the else clause?.. just mumbling..

Comment by Hans (not verified) (2011-09-12 17:30:00)

Thanx dude, it seems to work now. With time.sleep(1) it showed the same error. I increased the value to 10 and have now downloaded 2 books without any error.

hans

Comment by Volker (not verified) (2011-09-03 15:24:00)

Hello, thanks for the script. In general it works very well. Unfortunately I have to enter my username and password (VPN access) after every chapter. Does anyone have the same problem? Is there a way around it?

Regards, Volker

Comment by James Bond (not verified) (2011-07-29 01:36:00)

Hello,

another solution for downloading whole books on Springerlink and save it to a pdf file is the Springerlink-Downloader: http://sebastiankusch.de/springerlink/

Grzz Sebastian

Comment by Brater (not verified) (2011-06-16 10:33:00)

If you don’t like python look here: http://code.google.com/p/springer-loader/ Easy to use!

Comment by Anonymous (not verified) (2011-05-19 20:59:00)

For those having problems running the script under Windows (7?): the problem is the findInPath def, which is called without the extension (.exe) for iconv, pdftk and stapler. After adding this to all calls of findInPath, the script works perfectly. I’m running it under Windows 7 with cygwin. Finding this problem took more than an hour … but it was worth it. Maybe you could add a test for Windows and automatically append “.exe” in that case.

Thanks for your work!
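
The automatic “.exe” handling proposed above might look like the sketch below. It is written for Python 3 and is not the script's real findInPath; find_in_path and its platform and search_path parameters are hypothetical names chosen for this illustration.

```python
import os
import sys


def find_in_path(program, platform=sys.platform, search_path=None):
    """Return the full path of `program`, or None if it is not found.

    On Windows and Cygwin the name with an added '.exe' is tried as well.
    """
    candidates = [program]
    if platform in ("win32", "cygwin") and not program.lower().endswith(".exe"):
        candidates.append(program + ".exe")
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        for name in candidates:
            full = os.path.join(directory, name)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                return full
    return None
```

With this, the calls for iconv, pdftk and stapler could stay unchanged on every platform.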

Comment by Blubbafett (not verified) (2012-05-15 18:52:00)

That is it! Thanks for your comment!

And many thanks to this project

Comment by Chris (not verified) (2011-05-02 12:32:00)

Thanks for your script!

Working fine with Cygwin / Windows 7. Don’t forget to install pdftk, iconv and imagemagick (for front cover convert) during setup. I also needed to run rebaseall (cygwin/bin/ash.exe -> bin/rebaseall) before python decided to work…

Comment by Anonymous (not verified) (2011-05-13 22:57:00)

Hi Chris, could you please explain what you did in detail? I have been trying to make the script work for quite a while now, but I have never worked with cygwin or python before. Although I installed pdftk and iconv, I always get errors saying that I have to install them. And if I comment those checks out, I get another error. Thanks in advance!

Comment by Anonymous (not verified) (2011-03-10 02:41:00)

Script is working fine here, thanks a lot!

Comment by exitus_ (not verified) (2011-04-04 02:13:00)

thx for the script! it did not work in the first place for me (arch linux, python2.7) but after messing up with the code (i’m not familiar with python..) it turned out i just had to change the first line:

#! /usr/bin/env python2.7

without this change i get this error:

     
    ./springer_download.py --content=978-3-540-32319-8
        File "./springer_download.py", line 88
        print "fetching book information...\n\t%s" % link
        ^
        SyntaxError: invalid syntax     

this may be because python uses the newest environment (3.2) if no version is given. i hope this helps some other users as well!

Comment by Anonymous (not verified) (2011-06-12 01:25:00)

i still get the same error even if i change the line, or what exactly did you do?

Comment by najmi (not verified) (2011-02-20 19:18:00)

Hi, I got this problem

    $ ./springer_download.py -c 978-0-387-09822-7
    fetching book information...
            http://springerlink.com/content/978-0-387-09822-7/contents/
     
    Now Trying to download book 'Data Mining and Knowledge Discovery Handbook'
     
    found 68 chapters
    downloading chapter 1/68
    Traceback (most recent call last):
      File "./springer_download.py", line 293, in <module>
        main(sys.argv[1:])
      File "./springer_download.py", line 183, in main
        localFile, mimeType = geturl(chapterLink, "%d.pdf" % i)
      File "./springer_download.py", line 279, in geturl
        lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
      File "/usr/lib/python2.6/urllib.py", line 239, in retrieve
        fp = self.open(url, data)
      File "/usr/lib/python2.6/urllib.py", line 207, in open
        return getattr(self, name)(url)
      File "/usr/lib/python2.6/urllib.py", line 355, in open_http
        'got a bad status line', None)
    IOError: ('http protocol error', 0, 'got a bad status line', None)

What to do? Thanks!

Comment by Anonymous (not verified) (2011-03-28 00:36:00)

Hi, great script, runs without any problems - many thanks for developing it!

Comment by horst (not verified) (2011-01-24 15:30:00)

Hello, unfortunately the script no longer seems to work with the new SpringerLink site:

    ./springer_download.py --content=978-3-540-32319-8
      File "./springer_download.py", line 88
        print "fetching book information...\n\t%s" % link
                                                 ^
    SyntaxError: invalid syntax

When calling it via ./springer_download.py -l http://springerlink.com/content/978-3-8348-0645-1/contents/ (for example) I get the same error.

iconv and pdftk are installed.

Comment by Anonymous (not verified) (2011-03-30 13:28:00)

I have the same problem. I have python 2.7; should I downgrade to version 2.6?

Comment by Manman (not verified) (2011-01-18 22:27:00)

Can someone please help me? I'm stuck with iconv; it is installed but it doesn't work.

Comment by Anonymous (not verified) (2011-01-18 22:24:00)

I have a problem with iconv, can someone help me pls?

Comment by ขี้ไก่ (not verified) (2014-01-14 18:06:00)

Inspiring story there. What occurred after? Good luck!

Comment by Milian Wolff (2011-01-19 12:08:00)

Without an error description, no one will be able to help you. sigh how can people still post such lame support requests, I don’t get it…

Comment by Anonymous (not verified) (2011-02-15 22:30:00)

There are 10 kind of people. How can you still not get that?

Comment by Chris (not verified) (2010-11-17 22:21:00)

Great stuff, thank you!

Comment by Fruchtpfote (not verified) (2010-11-05 17:07:00)

On Lubuntu 10.10 I got the following error: “/usr/bin/env: python2: No such file or directory”

I had to change the first line of the script from “#! /usr/bin/env python2” to “#! /usr/bin/env python2.6”

The script worked very well! Thank you!

Comment by Frank (Germany) (not verified) (2010-10-12 16:19:00)

Hi,

I'm using your great script on Windows 7 with cygwin and pdftk. Downloading works, but at the end I get an error message:

    merging chapters
    Error: Failed to open output file: ...pdf
    No output created
    book XXX was successfully downloaded, it was saved to /springerlink/.../...pdf
    Traceback (most recent call last):
      File "./springer_download.py", line 202, in main
        log("downloaded %s chapters (%.2fMiB) of %s\n" % (len(chapters), os.path.getsize(bookTitlePath)/2.0**20, bookTitle))
      File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
        return os.stat(filename).st_size
    OSError: [Errno 2] No such file or directory: '/springerlink/...pdf'

-> “XXX” means in this case the title of the book and “…” the root directory of the springer_download script.

Can anyone help?

Comment by Katzenstreu (not verified) (2010-10-09 16:11:00)

Hi!

Can you help me? What seems to be the problem?

    tim@tim-ubuntu:~/SpringerLink/Thermodynamik$ /home/tim/SpringerLink/springer_download-Kapitel-Fix.py -l http://springerlink.com/content/978-3-8348-0645-1/
    fetching book information...
      http://springerlink.com/content/978-3-8348-0645-1/contents/
     
    Now Trying to download book 'Keine Panik vor Thermodynamik! - Erfolg und Spaß im klassischen „Dickbrettbohrerfach“ des Ingenieurstudiums'
     
    found 16 chapters
    downloading chapter 1/16
    http://springerlink.com/content/978-3-8348-0645-1/front-matter.pdf100%
    downloading chapter 2/16
    http://springerlink.com/content/v355572k12871586/fulltext.pdf     100%
    downloading chapter 3/16
    http://springerlink.com/content/m8g5686153341327/fulltext.pdf     100%
    downloading chapter 4/16
    http://springerlink.com/content/u7t6u3202480077h/fulltext.pdf     100%
    downloading chapter 5/16
    http://springerlink.com/content/xj705040025704h6/fulltext.pdf     100%
    downloading chapter 6/16
    http://springerlink.com/content/h116366326741351/fulltext.pdf     100%
    downloading chapter 7/16
    http://springerlink.com/content/lv70414p3w6wj847/fulltext.pdf     100%
    downloading chapter 8/16
    http://springerlink.com/content/wh26051x3t251244/fulltext.pdf     100%
    downloading chapter 9/16
    http://springerlink.com/content/n26612058123u700/fulltext.pdf     100%
    downloading chapter 10/16
    http://springerlink.com/content/r0m7551412594p30/fulltext.pdf     100%
    downloading chapter 11/16
    http://springerlink.com/content/j780016l201n0232/fulltext.pdf     100%
    downloading chapter 12/16
    http://springerlink.com/content/mx82w653w8236kkn/fulltext.pdf     100%
    downloading chapter 13/16
    http://springerlink.com/content/x74781351u2v2413/fulltext.pdf     100%
    downloading chapter 14/16
    http://springerlink.com/content/n480mn7g71k27276/fulltext.pdf     100%
    downloading chapter 15/16
    http://springerlink.com/content/w481235760232v53/fulltext.pdf     100%
    downloading chapter 16/16
    http://springerlink.com/content/978-3-8348-0645-1/back-matter.pdf 100%
    downloading front cover from http://springerlink.com/content/uu5602/cover-large.gif
    http://springerlink.com/content/uu5602/cover-large.gif            100%
    merging chapters
    Error: Failed to open output file: 
       /home/tim/SpringerLink/Thermodynamik/Keine_Panik_vor_Thermodynamik!_-_Erfolg_und_Spass_im_klassischen_,,Dickbrettbohrerfach"_des_Ingenieurstudiums.pdf
       No output created.
    book Keine Panik vor Thermodynamik! - Erfolg und Spaß im klassischen „Dickbrettbohrerfach“ des Ingenieurstudiums was successfully downloaded, it was saved to /home/tim/SpringerLink/Thermodynamik/Keine_Panik_vor_Thermodynamik!_-_Erfolg_und_Spass_im_klassischen_,,Dickbrettbohrerfach"_des_Ingenieurstudiums.pdf
    Traceback (most recent call last):
      File "/home/tim/SpringerLink/springer_download-Kapitel-Fix.py", line 279, in <module>
        main(sys.argv[1:])
      File "/home/tim/SpringerLink/springer_download-Kapitel-Fix.py", line 202, in main
        log("downloaded %s chapters (%.2fMiB) of %s\n" % (len(chapters),  os.path.getsize(bookTitlePath)/2.0**20, bookTitle))
      File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
        return os.stat(filename).st_size
    OSError: [Errno 71] Protocol error: '/home/tim/SpringerLink/Thermodynamik/Keine_Panik_vor_Thermodynamik!_-_Erfolg_und_Spass_im_klassischen_,,Dickbrettbohrerfach"_des_Ingenieurstudiums.pdf'

Comment by Katzenstreu (not verified) (2010-10-09 17:24:00)

This error did not occur with similar books. Is there a problem creating the file because of certain special characters? The download works. The merging apparently does too?! But opening the file does not, line 43. The same error occurs when downloading with the option “-c” instead of “-l”. I built Jochen's two tips (To fix the wrong order) into “my” script.

Regards, and many thanks for the great work!

Tim

Comment by Anonymous (not verified) (2010-09-02 00:11:00)

Hi,

I spotted a problem: back-matter.pdf is downloaded twice. As a result, pp. 105-152 are inserted right after the TOC and also at the end of the document (after p. 104, where they should be).

Example.

    $ ./springer_download.py -c 978-1-84800-912-7
    fetching book information...
      http://springerlink.com/content/978-1-84800-912-7/contents/

    Now Trying to download book 'A Topological Aperitif'

Best regards, Vitaly

Comment by kynan (not verified) (2010-08-26 10:42:00)

Awesome script! Exactly what I had been looking for for quite some time. I even started coding my own script in python, but only got it to sort-of work. Great job!

Comment by Anonymous (not verified) (2010-08-25 11:28:00)

Hi, a very nice script. Would it be possible for the output file name to include the author as well?

Comment by Timba (not verified) (2010-08-11 09:28:00)

The design of springerlink.com has been changed. The script aborts immediately because the book title is not found. A book can now be found via its ISBN -> http://www.springerlink.com/content/[isbn]/contents/

The functionality for downloading a list of books is deactivated (a pity)..

Comment by Milian Wolff (2010-08-12 22:36:00)

should be fixed now.

Comment by Anonymous (not verified) (2010-08-13 16:22:00)

Almost fixed. You should change line 89 to: if match and match.group(2) and match.group(2).strip() != "":

Cheers!

Comment by Milian Wolff (2010-08-15 17:02:00)

why?

Comment by Seb (not verified) (2010-08-17 16:09:00)

I just tried your great script. Without the change suggested on the 13th, about 30-40% of the books I tried failed to download.

Comment by Milian Wolff (2010-08-18 14:50:00)

well, then I committed it - hope it helps.

thanks for the patch

Comment by Anonymous (not verified) (2010-08-20 09:39:00)

I tried the latest version of the script, but I found that the order in which the chapters are downloaded and merged is not correct. At the moment the order is:

    Front-Matter
    Back-Matter
    chapter 1
    chapter 2
    ...
    Back-Matter

Why is there a Back-Matter right after the Front-Matter? Could you please fix it?

Comment by jochen (not verified) (2010-08-29 11:59:00)

To fix the wrong order problem:

change the code from:

        # get chapters
        for match in re.finditer('href="([^"]+\.pdf)"', page):
            chapterLink = match.group(1)

to:

        # get chapters
        for match in re.finditer('class="sprite pdf-resource-sprite" href="([^"]+\.pdf)"', page):
            chapterLink = match.group(1)

I am not a pro coder, but it should fix this.
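
The effect of the narrower pattern can be tried on a small sample. The HTML below is invented for illustration (only the class attribute is taken from the pattern above; the real SpringerLink markup is not reproduced here), and the snippet is Python 3.

```python
import re

# Hypothetical page excerpt: the back matter is linked twice, once as a
# plain link and once as a PDF icon carrying the "sprite" class.
page = '''
<a class="sprite pdf-resource-sprite" href="/content/abc/front-matter.pdf">PDF</a>
<a href="/content/abc/back-matter.pdf">Back Matter</a>
<a class="sprite pdf-resource-sprite" href="/content/ch1/fulltext.pdf">PDF</a>
<a class="sprite pdf-resource-sprite" href="/content/abc/back-matter.pdf">PDF</a>
'''

# Old pattern: matches every link that ends in .pdf.
loose = re.findall(r'href="([^"]+\.pdf)"', page)

# New pattern: only matches links that carry the PDF icon class.
strict = re.findall(
    r'class="sprite pdf-resource-sprite" href="([^"]+\.pdf)"', page)

print(loose)   # four links, back matter appears twice
print(strict)  # three links, each file once
```

On this sample the old pattern yields the Front-Matter, Back-Matter, chapter, Back-Matter order described earlier, while the new one yields each file once.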

Comment by jochen (not verified) (2010-08-29 13:03:00)

To fix cover download change the following:

    # coverimage
                match = re.search(r'<div class="coverImage" style="background-image: url\(/content/([^/]+)/cover-medium\.gif\)">', page)
                if match:
                    coverLink = "http://springerlink.com/contents/" + match.group(1) + "/cover-large.gif"

To:

    # coverimage
                match = re.search(r'<div class="coverImage" title="Cover Image" style="background-image: url\(/content/([^/]+)/cover-medium\.gif\)">', page)
                if match:
                    coverLink = "http://springerlink.com/content/" + match.group(1) + "/cover-large.gif"

Comment by Anonymous (not verified) (2010-08-22 10:03:00)

Hi,

thanks for this nice script. It worked for me until the new SpringerLink web design.

With the latest script, I can't download e.g. this book: http://springerlink.com/content/x421j52q667r0077/

using this: /home/malte/Downloads/springer_download.py -l http://www.springerlink.com/content/x421lj52q667r0077/?sortorder=asc

And it doesn't work for any other book I tried. I don't have a version of your script from before 13 August, so I can't even try to get the 30 to 40 percent..

Malte

Comment by Seb (not verified) (2010-08-22 14:19:00)

Try using the ISBN instead, in your case ‘springer_download.py -c 978-3-540-24309-0’

Comment by Malte (not verified) (2010-08-22 18:38:00)

Hi,

thanks for your fast reply. But it doesn't work for me.

    root@malte-desktop:/home/malte# /home/malte/Desktop/springer_download.py -c 978-3-540-24309-0
    fetching book information...
      http://springerlink.com/content/978-3-540-24309-0/

    ERROR: Could not evaluate book title - bad link http://springerlink.com/content/978-3-540-24309-0/

I'm using a VMware Ubuntu system, and with the old version it worked very well. I can open the pdf file in my browser, so it isn't a license problem?

Malte

Comment by Seb (not verified) (2010-08-25 11:07:00)

For me it works with exactly the commandline you gave: Now Trying to download book ‘Elektronik für Ingenieure und Naturwissenschaftler’ I am not allowed to download the content, but it started downloading front- and backmatter. Do you use the newest version of the script?

Comment by Anonymous (not verified) (2010-06-08 15:49:00)

This script is a great idea and overall works very well!

I encountered one problem, though, that I was hoping to get some help with. I compiled a list of links to books I want to download and fed them into springer_download.py via a bash script, so that I could download my whole list one book after the other. After a while I kept getting the same error asking whether I had access, no matter which book and no matter the access level (even after I added a one-minute pause between books with ‘sleep 60’ in the bash script). Are you aware of what might cause this?

Thanks!

Comment by Dierk E. (not verified) (2010-06-07 21:58:00)

Thanks for the script, it works great and saves a lot of time!

Comment by Dierk E. (not verified) (2010-06-07 22:17:00)

I found a bug: if the generated file name contains characters that are not allowed on the file system in use, an exception is thrown. If the book title, and therefore the file name, contains e.g. an “ß”, the script turns it into a question mark. Question marks, however, are not allowed in file names on FAT32 file systems. The question mark presumably results from faulty character decoding.

Comment by Dierk E. (not verified) (2010-06-10 13:20:00)

…in addition, the script tries to create file names that contain a colon, which also leads to an error on FAT32 file systems. The disallowed characters would have to be removed from bookTitlePath before line 179.
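
Removing the characters that FAT32 rejects could be sketched as follows. This is a Python 3 illustration, not code from the script: sanitize_filename and FORBIDDEN are hypothetical names, and the function is meant for the file name only, not for a full path, since it also replaces slashes.

```python
import re

# Characters that FAT32/Windows file systems do not allow in file names.
FORBIDDEN = r'[<>:"/\\|?*]'


def sanitize_filename(name, replacement="_"):
    """Replace every character that FAT32 rejects with `replacement`."""
    return re.sub(FORBIDDEN, replacement, name)
```

This only covers the forbidden characters; the suspected decoding problem that turns “ß” into a question mark would still have to be fixed separately.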

Comment by Anonymous (not verified) (2010-05-20 01:14:00)

In my browser I can open the links to the chapters. Seemingly, the geturl function to download the chapters has no useragent set or something.
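
If a missing user agent really is the cause (the comment above is only a guess, and nothing in this thread confirms it), a request with an explicit header could be built as in this Python 3 sketch. build_request and the example user-agent string are assumptions; the script itself uses Python 2's urllib.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example)"):
    """Create a request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The request would then be passed to urllib.request.urlopen(...) in place of the bare URL.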

Comment by Anonymous (not verified) (2010-05-20 00:08:00)

With this book: http://www.springerlink.com/content/t1166x/?p=1b42507155aa4d7387e1980dd0…

it no longer works either.

ERROR: downloaded chapter http://springerlink.com/content/q1n2783365424175/fulltext.pdf has invalid mime type text/html - are you allowed to download Funktionalanalysis Sechste, korrigierte Auflage 2007 - Springer Berlin Heidelberg?

But I am logged in and have VPN, so I can view and download the book manually in the browser.

Comment by Tony (not verified) (2010-04-29 07:56:00)

My apologies for knowing nothing about Python, but is there an easy way to remove the pdftk part of the script and output all the individual pdfs? Combining the chapters is running into problems with books like this one:

http://springerlink.com/content/v12557/

It has the front-matter.pdf and back-matter.pdf files on every page which leads to a lot of repetition in the output. It would also be handy for those running under Mac OS X as it was a pain to build pdftk from source.

Comment by Anonymous (not verified) (2010-06-06 12:29:00)

There is a pre-built version of pdftk available at http://www.accesspdf.com/pdftk/. Works like a charm for me (10.6.x) although it’s built for 10.3 :D just some dylibs …

many thanks to Milian for the great script! it comes in really, really handy sometimes!

Comment by Anonymous (not verified) (2010-06-05 02:23:00)

Well, python isn’t that hard to read. Just search for the term “pdftk” in the source/script file and comment it out (by putting a hash sign in front of the line), then run the script and see if an error comes up. If so, it will direct you to the line where it occurs, so you can investigate further. It’s really simple, try it out :)

Comment by Anonymous (not verified) (2010-04-21 20:52:00)

Apparently they have changed their interface. Example:

http://www.springerlink.com/content/l1x446/?p=79dea83afbc4473f8787e407e5…

Here the script only finds the front and back matter. The remaining chapters are divided into sub-chapters. Perhaps a resourceful programmer could adapt the script?

That would be great!! Many thanks!

Comment by Anonymous (not verified) (2010-04-21 15:13:00)

Do you want to download the entire catalogue? Neither Springer nor your university permits that; even downloading a single complete book already conflicts with the terms of use.

Comment by Milian Wolff (2010-04-22 11:44:00)

Where does it say that you may not even download one complete book? And no: of course I don’t want to download the entire catalogue, and I have always said so to those who asked for something like that. That is complete nonsense.

But I don’t see the harm in downloading a complete book. I have to admit that I would do that even without the script, to get at the books and chapters required for a lecture.

Comment by Angelo (not verified) (2010-04-19 08:22:00)

Hello Milian! Great script! It runs really, really well. Are there any approaches or possibilities for extending the script so that all books can be downloaded with a single command line?

Best regards from Dortmund, also from Jan S. :) Angelo

Comment by Anonymous (not verified) (2010-04-13 17:38:00)

Hello, for me the script always hangs somewhere while downloading under Ubuntu 8.04 (as also reported by Christian in the post of 02/22/2010).

Comment by Anonymous (not verified) (2010-04-13 20:04:00)

I have now tried several more times. Once the script managed to run through without hanging. But that was the only time it worked in about 20 attempts to download the same book. Otherwise it always got stuck, each time on a different article.
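The thread never pins down why the download stalls. One possible cause, and this is only an assumption, is a connection that stops delivering data while no socket timeout is set, in which case the read blocks forever. A Python 3 sketch of a timeout plus a simple retry wrapper; `retry`, the 30 second timeout and the retry count are all illustrative choices:

```python
import socket
import time


def retry(action, attempts=3, delay=1.0, errors=(IOError, socket.timeout)):
    """Call `action()` until it succeeds or `attempts` tries are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except errors:
            if attempt == attempts:
                raise  # give up and let the caller see the last error
            time.sleep(delay)


# Without a timeout a stalled connection can block indefinitely;
# 30 seconds is an arbitrary value chosen for this sketch.
socket.setdefaulttimeout(30)
```

A chapter download would then be wrapped as `retry(lambda: geturl(chapterLink, name))`, so that a stalled transfer raises an error and is attempted again instead of hanging.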

Comment by Anonymous (not verified) (2010-04-08 12:09:00)

Yes, it suddenly no longer works. Could the author adapt the script to the changes? I would be very pleased!

Comment by Anonymous 5 (not verified) (2010-03-25 14:56:00)

They have a kind of “intermediate layer” for some books. The script apparently only searches the initial book page for *.pdf links and downloads them. For some books SpringerLink has now put each chapter into a kind of extra folder, each containing only the front matter and the chapter again (they call that “Teil …” there). The script does not take that into account so far.
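A sketch of how the extra layer described above could be followed, in Python 3. It assumes that part pages are linked as `/content/HASH/`, which is a guess about the markup and not verified against SpringerLink; `collect_pdf_links`, `PART_PAGE` and the `fetch` callable are hypothetical names:

```python
import re
from urllib.parse import urljoin

HREF = re.compile(r'href="([^"]+)"')
# Assumed shape of a link to a "part" sub-page; adjust to the real markup.
PART_PAGE = re.compile(r"^/content/[a-z0-9\-]+/$")


def collect_pdf_links(url, fetch, depth=1, seen=None):
    """Return PDF URLs found on `url` and on part pages up to `depth` levels down.

    `fetch` is any callable that maps a URL to the HTML source of that page.
    """
    seen = set() if seen is None else seen
    if url in seen:
        return []
    seen.add(url)
    found = []
    for match in HREF.finditer(fetch(url)):
        target = urljoin(url, match.group(1))
        if target.endswith(".pdf"):
            candidates = [target]
        elif depth > 0 and PART_PAGE.match(match.group(1)):
            candidates = collect_pdf_links(target, fetch, depth - 1, seen)
        else:
            candidates = []
        # Skip repeats such as front matter listed again on every part page.
        for link in candidates:
            if link not in found:
                found.append(link)
    return found
```

Links are kept in document order, and a PDF is only deduplicated when the repeated link resolves to the same URL.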

Comment by Anonymous 2 (not verified) (2010-03-15 23:27:00)

Hello,

could it be that something was changed on the SpringerLink site? It has worked well so far, but recently it seems to have a problem.

Comment by Anonymous (not verified) (2010-03-13 17:27:00)

hi,

the tool doesn’t work for me… I think you have to log in now. Does anyone have an alternative? That would be great.

Comment by Matthias (not verified) (2010-03-12 05:43:00)

Hello, thanks for the script. I have not been able to test it so far for lack of a VPN, but given the positive feedback I assume it works perfectly. I too would be interested in a version that creates bookmarks and includes the matching description from SpringerLink for each of them. A program for creating bookmarks on Linux is JPdfBookmarks; instructions are available at http://jpdfbookmarks.altervista.org and the program itself at http://flavianopetrocchi.blogspot.com/2008/07/jpsdbookmarks-download-pag… It would be great if you could build that in. Unfortunately I can’t get to it myself at the moment: I have restricted internet access here and cannot download the program. Besides, my VPN isn’t working…

Regards, Matthias

Comment by Seb (not verified) (2010-03-01 14:26:00)

I get an error as well: “WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden” (“The system cannot find the file specified”). What can I do?

    The book you are trying to download is called 'Word 2007'

    found 14 chapters
    downloading chapter 1/14
    http://www.springerlink.com/content/m5427g/front-matter.pdf 100%
    downloading chapter 2/14
    http://springerlink.com/content/u838718169040815/fulltext.pdf 100%
    downloading chapter 3/14
    http://springerlink.com/content/t347382q85766802/fulltext.pdf 100%
    downloading chapter 4/14
    http://springerlink.com/content/rm4183k750h28k27/fulltext.pdf 100%
    downloading chapter 5/14
    http://springerlink.com/content/q25m417372705567/fulltext.pdf 100%
    downloading chapter 6/14
    http://springerlink.com/content/t7303462187j3t17/fulltext.pdf 100%
    downloading chapter 7/14
    http://springerlink.com/content/n191l36489484284/fulltext.pdf 100%
    downloading chapter 8/14
    http://springerlink.com/content/wp7t1657u28p4774/fulltext.pdf 100%
    downloading chapter 9/14
    http://springerlink.com/content/l3126077869171g2/fulltext.pdf 100%
    downloading chapter 10/14
    http://springerlink.com/content/j6399r4330572128/fulltext.pdf 100%
    downloading chapter 11/14
    http://springerlink.com/content/p2262884852pl1w4/fulltext.pdf 100%
    downloading chapter 12/14
    http://springerlink.com/content/u647377241368kl7/fulltext.pdf 100%
    downloading chapter 13/14
    http://springerlink.com/content/t5h400553644360l/fulltext.pdf 100%
    downloading chapter 14/14
    http://www.springerlink.com/content/m5427g/back-matter.pdf 100%
    merging chapters
    Traceback (most recent call last):
      File "C:\Dokumente und Einstellungen\Sebastian\Desktop\sp\springer_download.py", line 238, in <module>
        main(sys.argv[1:])
      File "C:\Dokumente und Einstellungen\Sebastian\Desktop\sp\springer_download.py", line 147, in main
        p1 = subprocess.Popen(["echo", bookTitle], stdout=subprocess.PIPE)
      File "D:\Python26\lib\subprocess.py", line 621, in __init__
        errread, errwrite)
      File "D:\Python26\lib\subprocess.py", line 830, in _execute_child
        startupinfo)
    WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden

Comment by Milian Wolff (2010-03-01 20:07:00)

I don’t know, I won’t support Windows. Try cygwin as the poster above you said that it works.

Comment by Anonymous (not verified) (2010-02-24 20:31:00)

I just wanted to report back that the script works excellently and without problems with Cygwin under Windows 7. When installing Cygwin you of course have to make sure to select the appropriate packages. A big thank-you to the author for the work!

Comment by Katzenstreu (not verified) (2010-10-01 19:29:00)

Can you say which packages are needed? Unfortunately I found no help on the internet.

Comment by Christian (not verified) (2010-02-22 17:26:00)

Having got it partially running under Windows, these are my first steps with Linux, but it still doesn’t download… (VPN is active)

Maybe someone can give me a tip.

    ubuntu@ubuntu-desktop:~/Desktop$ ./springer_download.py -l http://springerlink.com/content/h61v67/
    Please wait, link source is being downloaded...
       http://springerlink.com/content/h61v67/
     
    The book you are trying to download is called 'Dubbel'
     
    Please wait, link source is being downloaded...
     http://springerlink.com/content/h61v67/?sortorder=asc&p_o=10
    Please wait, link source is being downloaded...
       http://springerlink.com/content/h61v67/?sortorder=asc&p_o=20
    found 27 chapters
    downloading chapter 1/27
    http://springerlink.com/content/h61v67/front-matter.pdf            95%

^^ it hangs at this point

Comment by RS(15,11) (not verified) (2010-02-05 20:54:00)

Hello, I’m a long-time Windows user and since yesterday I have been getting to grips with Unix, partly because of your script. I now use Cygwin, and with the latest version of the script everything works great. Many thanks!!

Comment by Vitaly (not verified) (2010-02-03 01:50:00)

Hi,

thank you for this script.

Since about a week ago, it stopped working though:

………………………………….

    $ ./springer_download.py -l http://www.springerlink.com/content/qv89j2/?p=101a335b740a47c7a7578b7d16…
    $ Please wait, link source is being downloaded...
            http://www.springerlink.com/content/qv89j2/

    ERROR: Could not evaluate book title - bad link?

    Usage: springer_download.py [OPTIONS]

    Options:
      -h, --help              Display this usage message
      -l LINK, --link=LINK    defines the link of the book you intend to download
      -c HASH, --content=HASH builds the link from a given HASH (see below)

………………………………….

This error appears for whatever book I try to download. Is it because they changed directory structure or something else @ Springer?

Thank you, Vitaly

Comment by Milian Wolff (2010-02-03 16:01:00)

Thanks for the heads-up. I fixed the code to circumvent this SpringerLink “protection” (it didn’t accept the default User-Agent that was sent by Python…). It should work properly now, assuming you have the rights to access this book, which I / the FU Berlin do not have, it seems.
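The fix described here comes down to sending a browser-like User-Agent header. A minimal sketch in modern Python 3 (the script itself targets Python 2.4 and sets `version` on a `urllib` opener subclass instead); `build_request`, `fetch` and the header value `Mozilla/5.0` are illustrative assumptions, not the script’s actual code:

```python
import urllib.request

# Any browser-like string; the exact value is an assumption for this sketch.
USER_AGENT = "Mozilla/5.0"


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=30):
    """Download `url` and return (body_bytes, content_type)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read(), response.headers.get_content_type()
```

Returning the content type alongside the body makes it easy to repeat the script’s check that a chapter really arrived as `application/pdf` and not as an HTML error page.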

Comment by Vitaly (not verified) (2010-02-12 08:47:00)

By the way, I spotted another glitch: if the book name contains the colon sign (‘:’), the book is downloaded OK but cannot be saved, as file name cannot include colons. You could substitute it with dash or something…

Comment by Milian Wolff (2010-02-12 13:59:00)

Can you give me an example? I don’t see why a colon should be removed from a filename, it’s perfectly valid imo. At least on Unix:

    $> touch "asdf:foobar"
    $> ls
    asdf:foobar
    $> rm asdf\:foobar
    rm: remove regular empty file asdf:foobar? y

Comment by Vitaly (not verified) (2010-02-12 08:38:00)

Thanks a lot for the prompt response, Milian! It works great now.

Comment by Thomas (not verified) (2010-02-04 16:03:00)

Hi, first of all thanks for the script, I have been using it for quite a long time… However, I now have problems downloading as well. I get the following error message:
Please wait, link source is being downloaded…
http://www.springerlink.de/content/q28652/

The book you are trying to download is called ‘Regelungstechnik 1’

found 15 chapters
downloading chapter 1/15
http://www.springerlink.de/content/q28652/front-matter.pdf -819200%

ERROR: downloaded chapter http://www.springerlink.de/content/q28652/front-matter.pdf has invalid mime type text/html - are you allowed to download it?

“By hand”, however, I can download the PDFs of the individual chapters without any problems.
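The odd `-819200%` in the log above is consistent with the progress hook dividing by a file size of -1, which is what `urllib` passes to the hook when the response carries no Content-Length: 1 block × 8192 bytes × 100 / -1 = -819200. A Python 3 sketch of a hook calculation that treats an unknown size separately; `percent_done` and `format_progress` are hypothetical names:

```python
def percent_done(numblocks, blocksize, filesize):
    """Return progress as 0..100, or None when the server sent no size."""
    if filesize <= 0:
        return None
    return min(numblocks * blocksize * 100 // filesize, 100)


def format_progress(url, numblocks, blocksize, filesize):
    """Render one 70-column progress line, like the script's progress output."""
    percent = percent_done(numblocks, blocksize, filesize)
    label = " ??%" if percent is None else "%3d%%" % percent
    return "%-66s%s" % (url, label)
```

An unknown size is shown as `??%` here; that it coincides with an HTML error page in this case is a property of the server response, not of the hook.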

Comment by Milian Wolff (2010-02-05 00:01:00)

Hm, then something is still not right - I’ll have to take a look. Perhaps the Referer is checked as well, or something like that - let’s see what the people at SpringerLink come up with to make it harder for us students to get to the books… sigh

Comment by Thomas (not verified) (2010-02-05 13:35:00)

I tried something, and it even seems to have worked :)

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
     
    import os
    import sys
    import getopt
    import urllib
    import re
    import tempfile
    import shutil
    import subprocess
     
    # Opener used by geturl() for chapter downloads; sends a browser-like User-Agent
    class SpringerDownloader(urllib.URLopener):
        version = "Mozilla"
     
    # Set some kind of User-Agent so we don't get blocked by SpringerLink
    class SpringerURLopener(urllib.FancyURLopener):
        version = "Mozilla"
     
     
    # validate CLI arguments and start downloading
    def main(argv):
        if not findInPath("pdftk"):
            error("You have to install pdftk.")
        if not findInPath("iconv"):
            error("You have to install iconv.")
     
        try:
            opts, args = getopt.getopt(argv, "hl:c:", ["help", "link=","content="])
        except getopt.GetoptError:
            error()
     
        link = ""
     
        for opt, arg in opts:
            if opt in ("-h", "--help"):
                usage()
                sys.exit()
            elif opt in ("-c", "--content"):
                if link != "":
                    error("-c and -l arguments are mutually exclusive")
     
                link = "http://springerlink.com/content/" + arg
            elif opt in ("-l", "--link"):
                if link != "":
                    error("-c and -l arguments are mutually exclusive")
     
                link = arg
     
        if link == "":
            error("You have to define a link.")
        if not re.match(r"https?://(www\.)?springerlink\.(com|de)/content/[a-z0-9\-]+/?(\?[^/]*)?$", link):
            error("Bad link given. See LINK below.")
     
        # remove all arguments from link
        link = re.sub(r"/?\?[^/]*$", "/", link)
     
        #make sure the link ends on a slash
        if link[-1] != "/":
          link += "/"
     
        baseLink = link
     
        chapters = list()
        hasFrontMatter = False
        hasBackMatter = False
     
        loader = SpringerURLopener();
     
     
        bookTitle = ""
     
        while True:
            # download page source
            try:
                print "Please wait, link source is being downloaded...\n\t%s" % link
                page = loader.open(link).read()
            except IOError, e:
                error("Bad link given (%s)" % e)
     
            if re.search(r'403 Forbidden', page):
                error("Could not access page: 403 Forbidden error.")
     
            if bookTitle == "":
                match = re.search(r'<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">([^<]+)</h2>', page)
                if not match or match.group(1).strip() == "":
                    error("Could not evaluate book title - bad link?")
                else:
                    bookTitle = match.group(1).strip()
                print "\nThe book you are trying to download is called '%s'\n" % bookTitle
     
     
            # get chapters
            for match in re.finditer(r'href="([^"]+\.pdf)"', page):
                chapterLink = match.group(1)
                if chapterLink == "back-matter.pdf":
                    hasBackMatter = True
                    continue
                if chapterLink == "front-matter.pdf":
                    hasFrontMatter = True
                    continue
                if chapterLink[:7] == "http://":
                    continue
                chapters.append(chapterLink)
     
            # get next page
            match = re.search(r'<a href="([^"]+)">Next</a>', page)
            if match:
                link = "http://springerlink.com" + match.group(1).replace("&amp;", "&")
            else:
                break
     
        if hasFrontMatter:
            chapters.insert(0, "front-matter.pdf")
     
        if hasBackMatter:
            chapters.append("back-matter.pdf")
     
        if len(chapters) == 0:
            error("No chapters found - bad link?")
     
        print "found %d chapters" % len(chapters)
     
        # setup
        curDir = os.getcwd()
        tempDir = tempfile.mkdtemp()
        os.chdir(tempDir)
     
        i = 1
        fileList = list()
     
        for chapterLink in chapters:
            if chapterLink[0] == "/":
                chapterLink = "http://springerlink.com" + chapterLink
            else:
                chapterLink = baseLink + chapterLink
     
            print "downloading chapter %d/%d" % (i, len(chapters))
            localFile, mimeType = geturl(chapterLink, "%d.pdf" % i)
     
            if mimeType.gettype() != "application/pdf":
                os.chdir(curDir)
                shutil.rmtree(tempDir)
                error("downloaded chapter %s has invalid mime type %s - are you allowed to download it?" % (chapterLink, mimeType.gettype()))
     
            fileList.append(localFile)
            i += 1
     
        print "merging chapters"
     
        p1 = subprocess.Popen(["echo", bookTitle], stdout=subprocess.PIPE)
        p2 = subprocess.Popen(["iconv", "-f", "UTF-8", "-t" ,"ASCII//TRANSLIT"], stdin=p1.stdout, stdout=subprocess.PIPE)
        bookTitlePath = p2.communicate()[0]
        bookTitlePath = bookTitlePath.strip()
        if bookTitlePath == "":
            os.chdir(curDir)
            shutil.rmtree(tempDir)
            error("could not transliterate book title %s" % bookTitle)
     
        bookTitlePath = bookTitlePath.replace("/", "-")
        bookTitlePath = re.sub(r"\s+", "_", bookTitlePath)
     
        bookTitlePath = curDir + "/%s.pdf" % bookTitlePath
     
        if len(fileList) == 1:
            shutil.move(fileList[0], bookTitlePath)
        else:
            # pass the arguments as a list so that quotes in the title cannot break the command
            subprocess.call(["pdftk"] + fileList + ["cat", "output", bookTitlePath])
     
        # cleanup
        os.chdir(curDir)
        shutil.rmtree(tempDir)
     
        print "book %s was successfully downloaded, it was saved to %s" % (bookTitle, bookTitlePath)
     
        sys.exit()
     
    # give a usage message
    def usage():
        print """Usage:
    %s [OPTIONS]
     
    Options:
      -h, --help              Display this usage message
      -l LINK, --link=LINK    defines the link of the book you intend to download
      -c HASH, --content=HASH builds the link from a given HASH (see below)
     
    You have to set exactly one of these options.
     
    LINK:
      The link to the detail page of the ebook of your choice on SpringerLink.
      It lists book metadata and has a possibly paginated list of the chapters of the book.
      It has the form:
        http://springerlink.com/content/HASH/STUFF
      Where: HASH is a string consisting of lower-case, latin chars and numbers.
             It alone identifies the book you intend to download.
             STUFF is optional and looks like ?p=...&p_o=... or similar. Will be stripped.
    """ % os.path.basename(sys.argv[0])
     
    # raise an error and quit
    def error(msg=""):
        if msg != "":
            print "\nERROR: %s\n" % msg
        usage()
        sys.exit(2)
     
        return None
     
    # based on http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python
    def findInPath(prog):
        for path in os.environ["PATH"].split(os.pathsep):
            exe_file = os.path.join(path, prog)
            if os.path.exists(exe_file) and os.access(exe_file, os.X_OK):
                return True
        return False
     
    # based on http://mail.python.org/pipermail/python-list/2005-April/319818.html
    def _reporthook(numblocks, blocksize, filesize, url=None):
        #XXX Should handle possible filesize=-1.
        try:
            percent = min((numblocks*blocksize*100)/filesize, 100)
        except:
            percent = 100
        if numblocks != 0:
            sys.stdout.write("\b"*70)
        sys.stdout.write("%-66s%3d%%" % (url, percent))
     
    def geturl(url, dst):
        downloader = SpringerDownloader();
        if sys.stdout.isatty():
            response = downloader.retrieve(url, dst,
                                           lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
            sys.stdout.write("\n")
        else:
            response = downloader.retrieve(url, dst)
     
        return response
     
     
    # start program
    if __name__ == "__main__":
        main(sys.argv[1:])

I only added this at the top:

    class SpringerDownloader(urllib.URLopener):
        version = "Mozilla"

And changed def geturl(url, dst) to:

    def geturl(url, dst):
        downloader = SpringerDownloader();
        if sys.stdout.isatty():
            response = downloader.retrieve(url, dst,
                                           lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
            sys.stdout.write("\n")
        else:
            response = downloader.retrieve(url, dst)
     
        return response

Comment by Milian Wolff (2010-02-05 14:16:00)

Great, thanks for the patch. I included it now (slightly different). Does it work with the vanilla source from github again now? I ask since I can still not download that one book ;-)

Comment by pappy (not verified) (2009-08-11 14:55:00)

I can’t download with the script now; here is the error:

    Please wait, link source is being downloaded...
    http://springerlink.com/content/f54k582l0w11xj18/

    The book you are trying to download is called 'Architecture of an LBS Platform to Support Privacy Control for Tracking Moving Objects in a Ubiquitous Environments'

    found 1 chapters
    downloading chapter 1/1
    http://springerlink.com/content/f54k582l0w11xj18/fulltext.pdf 100%

    ERROR: downloaded chapter http://springerlink.com/content/f54k582l0w11xj18/fulltext.pdf has invalid mime type text/html - are you allowed to download it?

Plz help :(

Comment by Milian Wolff (2009-08-11 15:18:00)

You need to be authenticated for SpringerLink via VPN. This script does not support any other authentication.

I myself use it from my university where access to springerlink is automatically authenticated. If it is the same for your university, access one of the servers there and run the script from there. Ask your IT department.

Comment by moohh (not verified) (2010-02-03 23:18:00)

I am authenticated via VPN and I can manually download the books by using Firefox, but if I want to try this script, I get the same error.

The download of a single pdf via wget doesn’t work either. I got ERROR 403.

Comment by Milian Wolff (2010-02-04 00:45:00)

update the script, I fixed that a few hours ago.

Comment by Anonymous (not verified) (2010-07-12 19:06:00)

Nevermind, I figured it out. Springer doesn’t like wget, so you need to fake the browser id using -U ‘Mozilla/5.0’

Comment by Anonymous (not verified) (2010-07-12 18:25:00)

Milian,

Thanks for the script. Can you give some insight on how you fixed the 403 error when wget’ing a single PDF?

I’m trying to do something similar and can’t get past this issue …

Comment by Anonymous (not verified) (2009-07-30 17:01:00)

I installed Cygwin and it runs in there. Regarding iconv, I only changed the call to the following:

    p2 = subprocess.Popen(["iconv", "-f", "UTF-8", "-t", "CP1258"],

Umlauts then come out strangely, and there is a problem when the file name contains question marks.

Another difficulty arises when a book consists of several sub-volumes; downloading then fails.

The script is brilliantly made, many thanks to the author!

Comment by Christian (not verified) (2009-07-27 23:49:00)

Hey, great script! I tried to get it running under Windows - and managed it!!! However, I had to disable the two checks for whether pdftk and iconv are present. Both are available for Windows, and I set them up so that they can be called system-wide.

Downloading works, but there is the following problem:

    D:\Desktop\springerlink download>springer_download.py --link=http://springerlink.com/content/w6536t/?p=007eb555ebe6438c861aa6ca3f773b5d  & pi=3
    Please wait, link source is being downloaded...
            http://springerlink.com/content/w6536t/
     
    The book you are trying to download is called 'Thermodynamik'
     
    found 9 chapters
    downloading chapter 1/9
    http://springerlink.com/content/w6536t/front-matter.pdf           100%
    downloading chapter 2/9
    http://springerlink.com/content/v0p76175827831r5/fulltext.pdf     100%
    downloading chapter 3/9
    http://springerlink.com/content/u402v50mx102830w/fulltext.pdf     100%
    downloading chapter 4/9
    http://springerlink.com/content/t4u767j96639714v/fulltext.pdf     100%
    downloading chapter 5/9
    http://springerlink.com/content/v125w51816124w22/fulltext.pdf     100%
    downloading chapter 6/9
    http://springerlink.com/content/vt14m63616714v11/fulltext.pdf     100%
    downloading chapter 7/9
    http://springerlink.com/content/mr2016q56221t25q/fulltext.pdf     100%
    downloading chapter 8/9
    http://springerlink.com/content/j0572760j2273063/fulltext.pdf     100%
    downloading chapter 9/9
    http://springerlink.com/content/w6536t/back-matter.pdf            100%
    merging chapters
    Traceback (most recent call last):
      File "D:\Desktop\springerlink download\springer_download.py", line 231, in <module>
        main(sys.argv[1:])
      File "D:\Desktop\springerlink download\springer_download.py", line 141, in main
        p1 = subprocess.Popen(["echo", bookTitle], stdout=subprocess.PIPE)
      File "d:\Programme\Python\lib\subprocess.py", line 595, in __init__
        errread, errwrite)
      File "d:\Programme\Python\lib\subprocess.py", line 804, in _execute_child
        startupinfo)
    WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden
    Der Befehl "pi" ist entweder falsch geschrieben oder
    konnte nicht gefunden werden.

What is causing the further processing to fail? Best regards

Comment by Milian Wolff (2009-07-28 01:51:00)

Well, I want to pipe the name into iconv; I don’t know whether that works on Windows at all. If need be, just comment it out and live with the fact that it may try to create a file with “bad” characters in its name… Or find another way to call iconv under Windows (without echo). Or maybe install mingw - that could work…
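On Windows `echo` is a shell builtin rather than an executable, which is consistent with the `WindowsError` raised at the `subprocess.Popen(["echo", ...])` call in the traceback above. One way to avoid both `echo` and `iconv` is to transliterate inside Python. A Python 3 sketch using the standard `unicodedata` module; `transliterate` is a hypothetical helper, and its output only approximates `iconv -t ASCII//TRANSLIT`:

```python
import unicodedata


def transliterate(title):
    """Reduce `title` to ASCII without spawning a process.

    Accented letters lose their diacritics; characters that have no
    ASCII decomposition (for example the German sharp s) are dropped.
    """
    decomposed = unicodedata.normalize("NFKD", title)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

The trade-off is fidelity: characters without a decomposition simply vanish, so a title consisting only of such characters would come back empty and would still need the script’s existing empty-title check.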

Comment by Anonymous (not verified) (2009-06-02 02:43:00)

What about a Windows compatible script to allow download of articles from journals in an organized fashion? Thanks for ur consideration

Comment by Milian Wolff (2009-06-02 12:59:00)

I don’t use Windows and won’t make the script Windows-compatible. Yet I’d happily accept patches. Since Python is cross-platform it shouldn’t be too hard. You’d just have to find alternatives to pdftk and iconv. These two dependencies make the script platform-dependent.

Comment by flo (not verified) (2009-04-08 19:29:00)

hi milian!

I quickly added a “just HASH” option, if you want it ..

        try:
            opts, args = getopt.getopt(argv, "hl:c:", ["help", "link=","content="])
        except getopt.GetoptError:
            error()

        link = ""

        for opt, arg in opts:
            if opt in ("-h", "--help"):
                usage()
                sys.exit()
            elif opt in ("-l", "--link"):
                link = arg
            elif opt in ("-c", "--content"):
                link = "http://springerlink.com/content/" + arg + "/?p="
     
    ...
     
    # give a usage message
    def usage():
        print """Usage:
    %s [OPTIONS]
     
    Options:
      -h, --help              Display this usage message
      -l LINK, --link=LINK    using the whole link to start downloading
      -c HASH, --content=HASH uses just the HASH to start
     
    LINK:
      The link to the detail page of the ebook of your choice on SpringerLink.
      It lists book metadata and has a possibly paginated list of the chapters of the book.
      It has the form:
        http://springerlink.com/content/HASH/STUFF
      Where: HASH is a string consisting of lower-case, latin chars and numbers.
             STUFF is optional and looks like ?p=...&p_o=... or similar. Will be stripped.

it’s not a “pretty” solution, but it was quick ^^

Otherwise, your script is great!
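The two ways of naming a book, a full link or a bare HASH, can be folded into one normalisation step. A Python 3 sketch that reuses the link pattern and the query-stripping expression from the script posted in this thread; the function name `book_link` and the use of `ValueError` are assumptions for illustration:

```python
import re

LINK_PATTERN = re.compile(
    r"https?://(www\.)?springerlink\.(com|de)/content/[a-z0-9\-]+/?(\?[^/]*)?$")


def book_link(link=None, content_hash=None):
    """Return the canonical book URL from either a full link or a bare hash."""
    if (link is None) == (content_hash is None):
        raise ValueError("give exactly one of link or content_hash")
    if content_hash is not None:
        link = "http://springerlink.com/content/" + content_hash
    if not LINK_PATTERN.match(link):
        raise ValueError("bad link: %r" % link)
    # Strip the query string (?p=...&p_o=...), then ensure a trailing slash.
    link = re.sub(r"/?\?[^/]*$", "/", link)
    return link if link.endswith("/") else link + "/"
```

Because the query string is stripped anyway, the hash variant does not need to append `/?p=` to the link.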

Comment by Milian Wolff (2009-04-08 20:32:00)

Just pushed a commit to GitHub with your patch (slightly modified). Thanks! :)

Comment by Anonymous (not verified) (2009-04-04 16:37:00)

Thanks for the quick reply. I think the problem is the university firewall which seems to be blocking the traffic as I can use the script from outside the university. One suggestion is to make this work as a firefox plugin (perhaps with imacros). Thanks for the useful script!

Comment by Anonymous (not verified) (2009-04-03 19:55:00)

Hello. This software is an excellent idea but I get the following error:

    4]# ./springer_download.py -l "http://www.springerlink.com/content/h381wp/?sortorder=asc&v=expanded"
    Please wait, link source is being downloaded...
    http://www.springerlink.com/content/h381wp/

    ERROR: Bad link given ([Errno socket error] (110, 'Connection timed out'))

The springerlink address is correct because I can paste it into a browser and both the webpage and the pdfs open up properly. I’m using andlinux which runs ubuntu as a service in windows vista. This could be the cause of the problem but the browser, Synaptic Package Manager and pinging from the console work. I’ve also tried to disable my firewall but this did not fix the problem. Thanks in advance for any insight into this problem.

Comment by Milian Wolff (2009-04-03 20:40:00)

I think I know what is the cause yet can’t test it myself right now. Try with the following link:

http://www.springerlink.com/content/h381wp/?sortorder=asc

Note the different layout of the page, I think that’s the cause. Hope that helps.

Comment by Anonymous (not verified) (2009-03-16 21:20:00)

Very good - exactly what I was looking for! Thanks!

Further wishes:

Synchronize the page numbers of the PDF with the actual ones.
Make the individual chapters selectable via the PDF table of contents!

Comment by Milian Wolff (2009-03-16 22:29:00)

Neither is really possible, since you would have to edit the PDFs for that. And PDF is more or less a read-only file format.

Comment by Anonymous (not verified) (2010-01-04 13:33:00)

And what if the individual chapters could be inserted, with their names, as bookmarks into the final PDF? Would that work?

Comment by Milian Wolff (2010-01-04 17:11:00)

Find out, I have no idea about PDF authoring.
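One avenue that may be worth exploring, under the assumption that the installed pdftk is recent enough for its `update_info` operation to accept bookmark records in the format that `dump_data` prints: generate those records from the chapter titles and page counts and apply them to the merged file. A Python 3 sketch of the generation step only; `bookmark_info` is a hypothetical helper, and the titles and page counts would have to come from SpringerLink and from the chapter PDFs:

```python
def bookmark_info(chapters):
    """Build pdftk-style bookmark records from (title, page_count) pairs.

    Each chapter gets a top-level bookmark that points at its first page
    in the merged document.
    """
    lines = []
    first_page = 1
    for title, page_count in chapters:
        lines += [
            "BookmarkBegin",
            "BookmarkTitle: %s" % title,
            "BookmarkLevel: 1",
            "BookmarkPageNumber: %d" % first_page,
        ]
        first_page += page_count
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and passed to something like `pdftk merged.pdf update_info bookmarks.txt output book.pdf`; whether a given pdftk build honours bookmark records this way needs to be checked against its manual.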

Comment by Matthias (not verified) (2010-03-12 05:42:00)

Regards, Matthias

Published on March 12, 2010.

Tags: