Take 2: Download script for springerlink.com Ebooks

Tue, 02/24/2009 - 22:58

NOTE: This script is apparently against the licensing contract between universities and Springer, see: http://www.bib.hm.edu/aktuelles/news/newsdetail_9984.de.html

NOTE 2: I do not maintain this script anymore. Please look for an alternative.

It seems quite a few people are interested in my bash script for downloading ebooks from http://springerlink.com.

That script has some quirks, the biggest being that it is written in bash, which makes it rather hard to implement new features. One requested feature was support for books which span multiple pages on SpringerLink.

So here I present springer_download.py - a Python rewrite which should handle all the old links and some more. This is the very first program I’ve written in Python. And since it has to run on the Zedat servers it’s limited to Python 2.4.x without any fancy shmancy additions (a pity, since I’d love to use urlgrabber or pycurl).

the script

You can find the sources on GitHub: http://milianw.github.com/springer_download/

I plan to put all my future code snippets in public repositories on GitHub. That way you can easily track changes and stay up to date. GitHub also has a nice “download” feature which you can use to get the current version. You can find my profile and my repositories at http://github.com/milianw

Note: This script is intended to be run under Linux or other *nix’es which fulfill the requirements (Python 2.4.x, iconv and pdftk). Windows is not supported.

TODO
  • introduce multithreading for faster / simultaneous downloads
  • add speed to progressbar
  • use progressbar in source-downloader
  • use one git-repo per project (makes links work properly)

Comments

Hi! IMHO the script is now Thu, 11/08/2012 - 12:40 — Felix Krull (not verified)

Hi! IMHO the script is no longer usable. They have changed the pages and, with them, the naming scheme. It should be possible to adapt it, but I am not able to.

Regards, Felix

they pretty suck. damn, can Mon, 11/12/2012 - 08:42 — Anonymous (not verified)

they pretty much suck. damn, can someone help, or does someone have some tips on how i can fix the code? i would really appreciate it, thx anyhow.

it’s easy. Download firefox Mon, 04/29/2013 - 08:11 — Anonymous (not verified)

it’s easy. Download Firefox and install the download extension called DownThemAll. Right-click on the page (in Firefox), click on DownThemAll, tick the filter for PDF documents and click on start. You will be surprised how easy it is… :)

Try out this fork: Mon, 11/12/2012 - 09:13 — Anonymous (not verified)

i am baffled and amazed. Mon, 11/12/2012 - 15:27 — Anonymous (not verified)

i am baffled and amazed. ingenious. thx mate, verily. well i can’t offer a spoon but maybe an impalpably and puny contribution to the table manners (don’t know if i am right resp. up2date with the possibly new cobweb): try not to come up to 75 files each 1/2 hour.

There’s a remake of Milian Mon, 12/10/2012 - 02:35 — Anonymous (not verified)

There’s a remake of Milian Wolff’s great tool available from this website: http://tovotu.de/dev/518-Neuer-SpringerLink-Downloader/

Use google translate if you don’t understand the German description: http://translate.google.de/translate?sl=de&tl=en&u=http%3A%2F%2Ftovotu.d…

Seems to work great and preserves table of contents as well as page labeling for downloaded books.

Since a few days/weeks the Mon, 10/22/2012 - 23:10 — Matze (not verified)

For a few days/weeks now the script has not been merging the chapters correctly! The chapters are in random order in the PDF. Does anyone have an idea?

Hi, the script always worked Mon, 10/22/2012 - 08:45 — Dennis (not verified)

Hi, the script always worked flawlessly until I tried to download a book yesterday. It’s not that it’s not running, it just forgets the last chapter (backmatter.pdf) and the coverpage. Best regards.

Many, many thanks, Milian Sat, 10/13/2012 - 02:22 — Paul (not verified)

Many, many thanks to Milian and likewise to the community, which diligently strives for improvements. This is a great relief (how nice it would be if ALL reading material were online, something the current episteme will probably manage to prevent for a while yet).

For some books, however, I get an error message like this: “… found 500 chapters downloading chapter 1/500 httphttp:http:http:http:http:http:http:http:http://springerlink.com/content/pln81m2474hxmpn6/fulltext.pdf -8192000%

ERROR: downloaded chapter http://springerlink.com/content/pln81m2474hxmpn6/fulltext.pdf has invalid mime type text/html - are you allowed to download Wörterbuch der Psychotherapie? ” Does anyone have any advice?

or: “downloading chapter Sat, 10/13/2012 - 04:18 — Paul (not verified)

or: “downloading chapter 72/201 http://springerlink.com/content/x5w42567v106m086/fulltext.pdf 100%

ERROR: downloaded chapter http://springerlink.com/content/x5w42567v106m086/fulltext.pdf has invalid mime type text/plain - are you allowed to download Psychopraxis?”

It also does not necessarily seem to depend on the size or number of chapters, as I initially suspected (at least one file with over 400 chapters succeeded).

When it does succeed, chapters only sometimes appear in the wrong order; but at least they are somewhere in the document. What could be the cause?

It seems that the script Wed, 06/06/2012 - 14:56 — indianahorst (not verified)

It seems that the script unfortunately no longer works… no matter whether I use -l LINK or -c ISBN, I always get the following error message:

    $ ./springer_download.py -c 978-3-8348-1937-6
      File "./springer_download.py", line 92
        print "fetching book information...\n\t%s" % link
        ^
    SyntaxError: invalid syntax

or:

    $ ./springer_download.py -l "http://www.springerlink.com/content/978-3-8348-1937-6/#section=1062052&page=1"
      File "./springer_download.py", line 92
        print "fetching book information...\n\t%s" % link
        ^
    SyntaxError: invalid syntax

Perhaps rather late, Tue, 08/14/2012 - 12:36 — Anonymous (not verified)

Perhaps rather late, but are you using Arch Linux (or another distro that maps ‘python’ to Python 3 by default)? Then you have to call the script explicitly with python2, or adjust the first line of the script: replace python with python2
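The SyntaxError reported in this thread comes from the Python 2 print statement, which Python 3 rejects at compile time. Because of that, a version check cannot live inside the script itself; it would have to sit in a separate wrapper. The sketch below shows what such a check could look like. `interpreter_hint` is a hypothetical helper, not part of springer_download.py, and the example is written for a current Python 3.

```python
import sys


def interpreter_hint(version_info=sys.version_info):
    """Return a hint about which interpreter the (Python 2) script needs.

    Hypothetical helper for illustration; not taken from the script.
    """
    major = version_info[0]
    if major >= 3:
        return "python is Python %d: run the script with python2 instead" % major
    return "python is Python 2: the script can be run directly"


if __name__ == "__main__":
    print(interpreter_hint())
```

Passing a version tuple explicitly, for example `interpreter_hint((2, 7, 18))`, makes the helper easy to try out without switching interpreters.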

The script still works Mon, 07/09/2012 - 18:24 — Anonymous (not verified)

The script still works flawlessly. It is called like this, for example: $ ./springer_download.py -l http://www.springerlink.com/content/978-3-8348-1937-6/

Goodmorning guys, I really Sat, 03/10/2012 - 13:48 — Anonymous (not verified)

Good morning guys, I really wanted to download some books from SpringerLink, and my library has the rights for it. But as I see here, it requires Python scripting and Linux. I am a total noob at scripting and I’m not familiar with Linux. I hope to have Linux (Ubuntu) installed on my laptop within a few days. I have already searched for the needed software… but I don’t understand the “iconv” requirement… is this already available when you download php5?

I could really use some extra help. The best would be if someone gave me steps to follow, from installing Python to downloading a book.

Hope to hear from you guys soon.

Kind regards, Marius

Thank you for this great Tue, 02/21/2012 - 15:33 — ubuntifa (not verified)

Thank you for this great little piece of free software. Saved a lot of time!

Hi Milian thank you very Fri, 02/10/2012 - 11:22 — Mideag (not verified)

Hi Milian thank you very much, this script has saved me several clicks, it has been very helpful :P

Dear Milian, thank you for Fri, 01/13/2012 - 08:32 — Enno (not verified)

Dear Milian, thank you for this nice script which is very useful for me. If I’m outside of the university network I have to use a proxy to get the necessary IP to access springerlink.com. Is there a way to use your script with a proxy? Best wishes, E.

Hi, I have taken the liberty Sun, 11/20/2011 - 22:38 — liob (not verified)

Hi,

I have taken the liberty to write a script that allows you to download ebooks from the thieme ebook library as pdf. You can find it at http://github.com/liob/thieme2pdf

Great script, thanks. This Fri, 11/18/2011 - 13:49 — Tatome (not verified)

Great script, thanks. This saves a lot of fiddling with bash or clicking on download links on my part.

One thing, though: I monkey-patched the script so it doesn’t bail out if it can’t download a chapter. It would be a nice idea to have a switch for that; it could even insert pages into the merged document where chapters were left out.
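The idea of not bailing out on a failed chapter could be sketched roughly as follows. This is not the monkey patch mentioned above (which was not posted); `download_all` and its `fetch` argument are hypothetical names, and the example is written for a current Python 3.

```python
def download_all(chapter_links, fetch):
    """Try every chapter and collect failures instead of aborting.

    `fetch` is any callable that takes a URL and returns a local file
    name, raising IOError/OSError when the download fails. Both names
    are hypothetical and do not come from springer_download.py.
    """
    downloaded, failed = [], []
    for link in chapter_links:
        try:
            downloaded.append(fetch(link))
        except (IOError, OSError) as error:
            # Remember what went wrong, then carry on with the next chapter.
            failed.append((link, str(error)))
    return downloaded, failed
```

The `failed` list could then be reported at the end, or used to decide where placeholder pages belong in the merged document.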

Cheers, Johannes

Hi, after creating the Tue, 11/01/2011 - 13:01 — Anonymous (not verified)

Hi,

after creating the bookTitle, around line 121, you should check the length of it to avoid max. filename length problems. See http://www.tutorialspoint.com/python/os_fstatvfs.htm how to get the max. filename length. Would be nice ;-) By the way, good job with this program.
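A length check along these lines might look like the sketch below. It uses `os.statvfs`, the path-based sibling of the `os.fstatvfs` call from the linked page, and falls back to 255 where that call is unavailable. The function names and the fallback value are assumptions for illustration, and the code targets a current Python 3 rather than the Python 2.4 the script was written for.

```python
import os


def max_name_length(directory=".", fallback=255):
    """Ask the file system for its maximum file name length (in bytes)."""
    try:
        limit = os.statvfs(directory).f_namemax
    except (AttributeError, OSError):
        # os.statvfs does not exist on every platform (e.g. Windows).
        return fallback
    return limit if limit > 0 else fallback


def shorten_filename(name, limit):
    """Cut the stem of `name` so the whole name fits into `limit` bytes."""
    stem, extension = os.path.splitext(name)
    room = max(limit - len(extension.encode("utf-8")), 0)
    # Truncate on bytes and silently drop a character that got split.
    stem_bytes = stem.encode("utf-8")[:room]
    return stem_bytes.decode("utf-8", "ignore") + extension
```

A call such as `shorten_filename(title + ".pdf", max_name_length(target_dir))` would keep the `.pdf` extension and only trim the title part.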

Greetings, Mo

I don’t know anything about Sat, 02/04/2012 - 20:06 — Anonymous (not verified)

I don’t know anything about python but yeah, that would be nice. Or does anyone have another solution for filename length problems? After successfully downloading the whole book, the script is unable to save the merged pdf-file (Grundkurs_Statistik_in_den_Sozialwissenschaften_-_Eine_leicht_verstaendliche,_anwendungsorientierte_Einfuehrung_in_das_sozialwissenschaftlich_notwendige_statistische_Wissen.pdf) because the file name is too long.. any ideas?

Hi, I get the following Wed, 09/07/2011 - 16:53 — Hans (not verified)

Hi, I get the following error after a few chapters (it always downloads 3-5 chapters, then it aborts):

    found 24 chapters
    downloading chapter 1/24
    http://springerlink.com/content/978-3-540-40306-7/front-matter.pdf100%
    downloading chapter 2/24
    http://springerlink.com/content/w1j138p3140714v2/fulltext.pdf 100%
    downloading chapter 3/24
    http://springerlink.com/content/x066412311u22858/fulltext.pdf 100%
    downloading chapter 4/24
    http://springerlink.com/content/n1hp12738v607147/fulltext.pdf 100%
    downloading chapter 5/24
    Traceback (most recent call last):
      File "./milian/springer_download.py", line 293, in <module>
        main(sys.argv[1:])
      File "./milian/springer_download.py", line 183, in main
        localFile, mimeType = geturl(chapterLink, "%d.pdf" % i)
      File "./milian/springer_download.py", line 279, in geturl
        lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 237, in retrieve
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 205, in open
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 342, in open_http
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 937, in endheaders
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 797, in _send_output
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 759, in send
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 740, in connect
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
    IOError: [Errno socket error] [Errno 60] Operation timed out

Hm maybe a new protection Thu, 09/08/2011 - 12:25 — Milian Wolff

Hm, maybe a new protection from SpringerLink that does not allow downloading many chapters in a short time?

I don’t know… Try adding import time to the other imports at the top of the downloader script. Then add time.sleep(1) before the return response in def geturl(url, dst). Increase the sleep time if it does not work.
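As an illustration of this kind of throttling, here is a minimal sketch that wraps any download function so that each call is followed by a pause. `throttled` is a hypothetical helper written for a current Python 3; it does not reproduce the script's actual geturl function.

```python
import time


def throttled(fetch, delay_seconds, sleep=time.sleep):
    """Wrap `fetch` so that every call is followed by a pause.

    `fetch` stands for any download function; the `sleep` parameter
    only exists so the pause can be replaced when trying things out.
    """
    def wrapper(*args, **kwargs):
        result = fetch(*args, **kwargs)
        sleep(delay_seconds)  # give the server a break before the next request
        return result
    return wrapper
```

With such a wrapper, something like `geturl = throttled(geturl, 1)` would add the one-second pause without touching the function body, and the delay is easy to raise if one second is not enough.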

bye

indeed, it is, so shall i put Fri, 10/19/2012 - 12:48 — Anonymous (not verified)

indeed, it is, so shall i put it into the if-clause or the else-clause?? -because i put it in the else-clause and it worked astonishingly fine, ..so far.. but i rather like to slow it down now.. trust me.

btw, a qualified note: it is better not to go below ~25 sec per download when saving several of these, in order to avoid the cobweb and getting unpleasant mail. thanks to milian, thanks to the community, for sharing and caring, have a good life, with high regards and honesty, -a friend

‘kay, sorry mate for toiling Fri, 10/19/2012 - 13:21 — Anonymous (not verified)

‘kay, sorry mate for toiling and spamming: it was the if-clause. but why did it solve my problem as well when i put it into the else clause?.. just mumbling..

Thanx dude, it seems to work Mon, 09/12/2011 - 17:30 — Hans (not verified)

Thanx dude, it seems to work now. With time.sleep(1) it showed the same error. I increased the value to 10 and have now downloaded 2 books without any error.

hans

Hello, thanks for the script. Sat, 09/03/2011 - 15:24 — Volker (not verified)

Hello, thanks for the script. In general it works very well. Unfortunately I have to enter my username and password (VPN access) after every chapter. Does anyone have the same problem? Is there a way to fix this?

Regards, Volker

Hello, another solution for Fri, 07/29/2011 - 01:36 — James Bond (not verified)

Hello,

another solution for downloading whole books from SpringerLink and saving them as PDF files is the Springerlink-Downloader: http://sebastiankusch.de/springerlink/

Grzz Sebastian

If you don’t like python look Thu, 06/16/2011 - 10:33 — Brater (not verified)

If you don’t like python look here: http://code.google.com/p/springer-loader/ Easy to use!

For those having problems Thu, 05/19/2011 - 20:59 — Anonymous (not verified)

For those having problems running the script under Windows (7?): the problem is the findInPath def, which is called without the extension (.exe) for iconv, pdftk and stapler. After adding it to all calls of findInPath, the script works perfectly. I’m running it under Windows 7 with cygwin. Finding this problem took more than an hour … but it was worth it. Maybe you could add a test for Windows and automatically append “.exe” in that case.
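A lookup that appends “.exe” automatically could be sketched like this. `find_in_path` below is a hypothetical stand-in written for a current Python 3; the script's real findInPath is not reproduced here, and the platform test is an assumption about how Windows and cygwin might be detected.

```python
import os
import sys


def find_in_path(program, search_path=None, platform=sys.platform):
    """Return the full path of `program`, or None if it is not found.

    Hypothetical stand-in for the script's findInPath helper.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    # Under Windows and cygwin, executables usually carry an .exe suffix.
    suffixes = ["", ".exe"] if platform in ("win32", "cygwin") else [""]
    for directory in search_path.split(os.pathsep):
        for suffix in suffixes:
            candidate = os.path.join(directory, program + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None
```

On Python 3.3 and later, the standard library function `shutil.which` covers the same ground and may be the simpler choice.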

Thanks for your work!

That is it! Thanks for your Tue, 05/15/2012 - 18:52 — Blubbafett (not verified)

That is it! Thanks for your comment!

And many thanks to this project

Thanks for your Mon, 05/02/2011 - 12:32 — Chris (not verified)

Thanks for your script!

Working fine with Cygwin / Windows 7. Don’t forget to install pdftk, iconv and imagemagick (for converting the front cover) during setup. I also needed to run rebaseall (cygwin/bin/ash.exe -> bin/rebaseall) before python decided to work…

Hi Chris, could you please Fri, 05/13/2011 - 22:57 — Anonymous (not verified)

Hi Chris, could you please explain what you did in detail? I have been trying to make the script work for quite a while now, but I have never worked with cygwin or python before. Although I installed pdftk and iconv, I always get errors telling me that I have to install them. And if I comment those checks out, I get another error. Thanks in advance!

Script is working fine here, Thu, 03/10/2011 - 02:41 — Anonymous (not verified)

Script is working fine here, thanks a lot!

thx for the script! it did Mon, 04/04/2011 - 02:13 — exitus_ (not verified)

thx for the script! it did not work at first for me (arch linux, python2.7), but after messing around with the code (i’m not familiar with python..) it turned out i just had to change the first line:

 #! /usr/bin/env python2.7

without this change i get this error:

    ./springer_download.py --content=978-3-540-32319-8
      File "./springer_download.py", line 88
        print "fetching book information...\n\t%s" % link
        ^
    SyntaxError: invalid syntax

this may be because python points to the newest version (3.2) if no version is given. i hope this helps some other users as well!

i still get the same error Sun, 06/12/2011 - 01:25 — Anonymous (not verified)

i still get the same error even if i change the line, or what exactly did you do?

Hi, I got this problem $ Sun, 02/20/2011 - 19:18 — najmi (not verified)

Hi, I got this problem

    $ ./springer_download.py -c 978-0-387-09822-7
    fetching book information...
    http://springerlink.com/content/978-0-387-09822-7/contents/

    Now Trying to download book 'Data Mining and Knowledge Discovery Handbook'

    found 68 chapters
    downloading chapter 1/68
    Traceback (most recent call last):
      File "./springer_download.py", line 293, in <module>
        main(sys.argv[1:])
      File "./springer_download.py", line 183, in main
        localFile, mimeType = geturl(chapterLink, "%d.pdf" % i)
      File "./springer_download.py", line 279, in geturl
        lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
      File "/usr/lib/python2.6/urllib.py", line 239, in retrieve
        fp = self.open(url, data)
      File "/usr/lib/python2.6/urllib.py", line 207, in open
        return getattr(self, name)(url)
      File "/usr/lib/python2.6/urllib.py", line 355, in open_http
        'got a bad status line', None)
    IOError: ('http protocol error', 0, 'got a bad status line', None)

What to do? Thanks!

Hi, great script, runs without Mon, 03/28/2011 - 00:36 — Anonymous (not verified)

Hi, great script, runs without any problems. Many thanks for developing it!

Hello, unfortunately the Mon, 01/24/2011 - 15:30 — horst (not verified)

Hello, unfortunately the script does not seem to work (anymore) with the new SpringerLink site:

    ./springer_download.py --content=978-3-540-32319-8
      File "./springer_download.py", line 88
        print "fetching book information...\n\t%s" % link
        ^
    SyntaxError: invalid syntax

When calling it via ./springer_download.py -l http://springerlink.com/content/978-3-8348-0645-1/contents/ (for example) I get the same error.

iconv and pdftk are installed.

I have the same problem. I Wed, 03/30/2011 - 13:28 — Anonymous (not verified)

I have the same problem. I have Python 2.7; should I downgrade to version 2.6?

can someone please help me Tue, 01/18/2011 - 22:27 — Manman (not verified)

can someone please help me, I’m stuck at iconv: it is installed but it doesn’t work.

I have a problem with iconv Tue, 01/18/2011 - 22:24 — Anonymous (not verified)

I have a problem with iconv, can someone help me pls?

Inspiring story there. What Tue, 01/14/2014 - 18:06 — ขี้ไก่ (not verified)

Inspiring story there. What occurred after? Good luck!

Without an error description, Wed, 01/19/2011 - 12:08 — Milian Wolff

Without an error description, no one will be able to help you. sigh how can people still post such lame support requests, I don’t get it…

There are 10 kind of people. Tue, 02/15/2011 - 22:30 — Anonymous (not verified)

There are 10 kind of people. How can you still not get that?

Great stuff, thank you! Wed, 11/17/2010 - 22:21 — Chris (not verified)

Great stuff, thank you!

On Lubuntu 10.10 I got the Fri, 11/05/2010 - 17:07 — Fruchtpfote (not verified)

On Lubuntu 10.10 I got the following error: “/usr/bin/env: python2: No such file or directory”

I had to change the first line of the script from “#! /usr/bin/env python2” to “#! /usr/bin/env python2.6”

The script worked very well! Thank you!

Hi, I’m using your great Tue, 10/12/2010 - 16:19 — Frank (Germany) (not verified)

Hi,

I’m using your great script on Windows 7 with cygwin and pdftk. Downloading works, but at the end I get an error message:

    merging chapters
    Error: Failed to open output file: ...pdf
    No output created
    book XXX was succesfully download, it was saved to /springerlink/.../...pdf
    Traceback (most recent call last):
      File "./springer_download.py", line 202, in main
        log ("download %s chapters (%2fMiB) of %s\n" % (len(chapters), os.path.getsize(bookTitlePath)/2.0**20, bookTitle))
      File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
        return os.stat(filename) .st_size
    OSError: [Errno 2] No such file or directory: '/springerlink/...pdf

-> “XXX” means in this case the title of the book and “…” the root directory of the springer_download script.

Can anyone help?

Hi! Can you help me? What Sat, 10/09/2010 - 16:11 — Katzenstreu (not verified)

Hi!

Can you help me? What seems to be the problem?

  1. tim@tim-ubuntu:~/SpringerLink/Thermodynamik$ /home/tim/SpringerLink/springer_download-Kapitel-Fix.py -l http://springerlink.com/content/978-3-8348-0645-1/
  2. fetching book information...
  3. http://springerlink.com/content/978-3-8348-0645-1/contents/
  4.  
  5. Now Trying to download book 'Keine Panik vor Thermodynamik! - Erfolg und Spaß im klassischen „Dickbrettbohrerfach“ des Ingenieurstudiums'
  6.  
  7. found 16 chapters
  8. downloading chapter 1/16
  9. http://springerlink.com/content/978-3-8348-0645-1/front-matter.pdf100%
  10. downloading chapter 2/16
  11. http://springerlink.com/content/v355572k12871586/fulltext.pdf 100%
  12. downloading chapter 3/16
  13. http://springerlink.com/content/m8g5686153341327/fulltext.pdf 100%
  14. downloading chapter 4/16
  15. http://springerlink.com/content/u7t6u3202480077h/fulltext.pdf 100%
  16. downloading chapter 5/16
  17. http://springerlink.com/content/xj705040025704h6/fulltext.pdf 100%
  18. downloading chapter 6/16
  19. http://springerlink.com/content/h116366326741351/fulltext.pdf 100%
  20. downloading chapter 7/16
  21. http://springerlink.com/content/lv70414p3w6wj847/fulltext.pdf 100%
  22. downloading chapter 8/16
  23. http://springerlink.com/content/wh26051x3t251244/fulltext.pdf 100%
  24. downloading chapter 9/16
  25. http://springerlink.com/content/n26612058123u700/fulltext.pdf 100%
  26. downloading chapter 10/16
  27. http://springerlink.com/content/r0m7551412594p30/fulltext.pdf 100%
  28. downloading chapter 11/16
  29. http://springerlink.com/content/j780016l201n0232/fulltext.pdf 100%
  30. downloading chapter 12/16
  31. http://springerlink.com/content/mx82w653w8236kkn/fulltext.pdf 100%
  32. downloading chapter 13/16
  33. http://springerlink.com/content/x74781351u2v2413/fulltext.pdf 100%
  34. downloading chapter 14/16
  35. http://springerlink.com/content/n480mn7g71k27276/fulltext.pdf 100%
  36. downloading chapter 15/16
  37. http://springerlink.com/content/w481235760232v53/fulltext.pdf 100%
  38. downloading chapter 16/16
  39. http://springerlink.com/content/978-3-8348-0645-1/back-matter.pdf 100%
  40. downloading front cover from http://springerlink.com/content/uu5602/cover-large.gif
  41. http://springerlink.com/content/uu5602/cover-large.gif 100%
  42. merging chapters
  43. Error: Failed to open output file:
  44. /home/tim/SpringerLink/Thermodynamik/Keine_Panik_vor_Thermodynamik!_-_Erfolg_und_Spass_im_klassischen_,,Dickbrettbohrerfach"_des_Ingenieurstudiums.pdf
  45. No output created.
  46. book Keine Panik vor Thermodynamik! - Erfolg und Spaß im klassischen „Dickbrettbohrerfach“ des Ingenieurstudiums was successfully downloaded, it was saved to /home/tim/SpringerLink/Thermodynamik/Keine_Panik_vor_Thermodynamik!_-_Erfolg_und_Spass_im_klassischen_,,Dickbrettbohrerfach"_des_Ingenieurstudiums.pdf
  47. Traceback (most recent call last):
  48. File "/home/tim/SpringerLink/springer_download-Kapitel-Fix.py", line 279, in <module>
  49. main(sys.argv[1:])
  50. File "/home/tim/SpringerLink/springer_download-Kapitel-Fix.py", line 202, in main
  51. log("downloaded %s chapters (%.2fMiB) of %s\n" % (len(chapters), os.path.getsize(bookTitlePath)/2.0**20, bookTitle))
  52. File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
  53. return os.stat(filename).st_size
  54. OSError: [Errno 71] Protocol error: '/home/tim/SpringerLink/Thermodynamik/Keine_Panik_vor_Thermodynamik!_-_Erfolg_und_Spass_im_klassischen_,,Dickbrettbohrerfach"_des_Ingenieurstudiums.pdf'

This error did not occur with Sat, 10/09/2010 - 17:24 — Katzenstreu (not verified)

This error did not occur with similar books. Are there problems creating the file because of certain special characters? The download works. The merging apparently does too?! But opening the file does not, see line 43. The same error occurs when downloading with the option “-c” instead of “-l”. I have built the two tips from Jochen (To fix the wrong order) into “my” script.

Regards and many thanks for the great work!

Tim

Hi, I spot the problem with Thu, 09/02/2010 - 00:11 — Anonymous (not verified)

Hi,

I spotted a problem: back-matter.pdf is downloaded twice. As a result, pp. 105-152 are inserted right after the TOC, and also at the end of the document (after p. 104, where they should be).

Example.

    $ ./springer_download.py -c 978-1-84800-912-7
    fetching book information...
    http://springerlink.com/content/978-1-84800-912-7/contents/

    Now Trying to download book 'A Topological Aperitif'

    found 9 chapters
    downloading chapter 1/9
    http://springerlink.com/content/978-1-84800-912-7/front-matter.pdf100%
    downloading chapter 2/9
    http://springerlink.com/content/978-1-84800-912-7/back-matter.pdf 100%
    downloading chapter 3/9
    http://springerlink.com/content/g1782312115308j1/fulltext.pdf 100%
    downloading chapter 4/9
    http://springerlink.com/content/x37186361m57ju82/fulltext.pdf 100%
    downloading chapter 5/9
    http://springerlink.com/content/h293v73635134064/fulltext.pdf 100%
    downloading chapter 6/9
    http://springerlink.com/content/r7qk61576h587080/fulltext.pdf 100%
    downloading chapter 7/9
    http://springerlink.com/content/n228263341064731/fulltext.pdf 100%
    downloading chapter 8/9
    http://springerlink.com/content/g7763k24j248262q/fulltext.pdf 100%
    downloading chapter 9/9
    http://springerlink.com/content/978-1-84800-912-7/back-matter.pdf 100%
    merging chapters
    book A Topological Aperitif was successfully downloaded, it was saved to /home/vitaliyb/springer/A_Topological_Aperitif.pdf

Best regards, Vitaly

Awesome script! Exactly what Thu, 08/26/2010 - 10:42 — kynan (not verified)

Awesome script! Exactly what I had been looking for for quite some time. I even started coding my own script in python, but only got it to sort-of work. Great job!

Hi, a very nice Wed, 08/25/2010 - 11:28 — Anonymous (not verified)

Hi, a very nice script. Would it be possible for the name of the output file to include the author as well?

The design of Wed, 08/11/2010 - 09:28 — Timba (not verified)

The design of springerlink.com has been changed. The script aborts immediately because the book title is not found. A book can now be found via its ISBN -> http://www.springerlink.com/content/[isbn]/contents/

The functionality for downloading a list of books is disabled (a pity)..

should be fixed now. Thu, 08/12/2010 - 22:36 — Milian Wolff

should be fixed now.

Almost fixed. You should Fri, 08/13/2010 - 16:22 — Anonymous (not verified)

Almost fixed. You should change line 89 to: if match and match.group(2) and match.group(2).strip() != "":
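Why such a guard can matter: when a regex group is optional and does not take part in a match, `match.group(2)` is `None`, and calling `.strip()` on it fails. The sketch below demonstrates the three cases the suggested condition distinguishes. The pattern is invented for illustration and is not the one used in springer_download.py; the example is written for a current Python 3.

```python
import re

# Invented pattern: group 2 (the title text) is optional and may not
# take part in the match at all.
TITLE_PATTERN = re.compile(r'<h1(\s[^>]*)?>(?:([^<]+))?</h1>')


def extract_title(page):
    """Return the stripped title, or None if it is missing or blank."""
    match = TITLE_PATTERN.search(page)
    if match and match.group(2) and match.group(2).strip() != "":
        return match.group(2).strip()
    return None
```

Here `extract_title("<h1></h1>")` takes the `None` path because group 2 did not participate, while a heading containing only whitespace is caught by the `.strip() != ""` part.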

Cheers!

why? Sun, 08/15/2010 - 17:02 — Milian Wolff

why?

I just tried your great Tue, 08/17/2010 - 16:09 — Seb (not verified)

I just tried your great script. Without the change suggested on the 13th, about 30-40% of the books I tried failed to download.

well, then I committed it - Wed, 08/18/2010 - 14:50 — Milian Wolff

well, then I committed it - hope it helps.

thanks for the patch

I tried the latest version of Fri, 08/20/2010 - 09:39 — Anonymous (not verified)

I tried the latest version of the script, but I found that the order in which the chapters are downloaded and merged is not correct. At the moment the order is:

  1. Front-Matter
  2. Back-Matter
  3. chapter 1
  4. chapter 2
  5. ...
  6. Back-Matter

Why is there a Back-Matter right after the Front-Matter? Could you please fix it?

To fix the wrong order Sun, 08/29/2010 - 11:59 — jochen (not verified)

To fix the wrong order problem:

change the code from:

        # get chapters
        for match in re.finditer('href="([^"]+\.pdf)"', page):
            chapterLink = match.group(1)
to:
        # get chapters
        for match in re.finditer('class="sprite pdf-resource-sprite" href="([^"]+\.pdf)"', page):
            chapterLink = match.group(1)

I am not a pro coder, but it should fix this.
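To see why the narrower pattern might help, the two expressions can be compared on a small piece of markup. The sample page below is invented purely for illustration (the real SpringerLink HTML is not reproduced here), and the sketch is written for a current Python 3 with raw strings for the patterns.

```python
import re

# Invented sample markup: one plain link to the back matter plus the
# download links that carry the "sprite pdf-resource-sprite" class.
SAMPLE_PAGE = """
<a class="sprite pdf-resource-sprite" href="/content/isbn/front-matter.pdf">PDF</a>
<a href="/content/isbn/back-matter.pdf">Back matter</a>
<a class="sprite pdf-resource-sprite" href="/content/abc/fulltext.pdf">PDF</a>
<a class="sprite pdf-resource-sprite" href="/content/isbn/back-matter.pdf">PDF</a>
"""

BROAD = r'href="([^"]+\.pdf)"'
NARROW = r'class="sprite pdf-resource-sprite" href="([^"]+\.pdf)"'


def chapter_links(pattern, page):
    """Collect all PDF links matched by `pattern`, in page order."""
    return [match.group(1) for match in re.finditer(pattern, page)]


if __name__ == "__main__":
    print(chapter_links(BROAD, SAMPLE_PAGE))   # includes the extra link
    print(chapter_links(NARROW, SAMPLE_PAGE))  # only the classed links
```

On this sample the broad pattern picks up the back matter twice, while the narrow one returns each download link once and in page order.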

To fix cover download change Sun, 08/29/2010 - 13:03 — jochen (not verified)

To fix cover download change the following:

        # coverimage
        match = re.search(r'<div class="coverImage" style="background-image: url\(/content/([^/]+)/cover-medium\.gif\)">', page)
        if match:
            coverLink = "http://springerlink.com/contents/" + match.group(1) + "/cover-large.gif"
To:
        # coverimage
        match = re.search(r'<div class="coverImage" title="Cover Image" style="background-image: url\(/content/([^/]+)/cover-medium\.gif\)">', page)
        if match:
            coverLink = "http://springerlink.com/content/" + match.group(1) + "/cover-large.gif"

Hi, thanks for this nice Sun, 08/22/2010 - 10:03 — Anonymous (not verified)

Hi,

thanks for this nice script. It worked for me till the new SpringerLink Webdesign.

With the latest script, I can’t download e.g. this book: http://springerlink.com/content/x421j52q667r0077/

using this: /home/malte/Downloads/springer_download.py -l http://www.springerlink.com/content/x421lj52q667r0077/?sortorder=asc

And it doesn’t work for any other book I tried. I don’t have a version of your script from before 13 August, so I can’t even try to get the 30 to 40 percent..

Malte

Try using the ISBN instead, Sun, 08/22/2010 - 14:19 — Seb (not verified)

Try using the ISBN instead, in your case ‘springer_download.py -c 978-3-540-24309-0’

Hi, thanks for your fast Sun, 08/22/2010 - 18:38 — Malte (not verified)

Hi,

thanks for your fast reply. But it doesn’t work for me.

    root@malte-desktop:/home/malte# /home/malte/Desktop/springer_download.py -c 978-3-540-24309-0
    fetching book information...
    http://springerlink.com/content/978-3-540-24309-0/

    ERROR: Could not evaluate book title - bad link http://springerlink.com/content/978-3-540-24309-0/

I’m using a VMware Ubuntu system, and with the old version it worked very well. I can open the PDF file in my browser, so it can’t be a license problem, can it?

Malte

For me it works with exactly Wed, 08/25/2010 - 11:07 — Seb (not verified)

For me it works with exactly the command line you gave: “Now Trying to download book ‘Elektronik für Ingenieure und Naturwissenschaftler’”. I am not allowed to download the content, but it started downloading front- and backmatter. Are you using the newest version of the script?

This script is a great idea Tue, 06/08/2010 - 15:49 — Anonymous (not verified)

This script is a great idea and overall works very well!

I encountered one problem, though, that I was hoping to get some help with. I compiled a list of links to books I want to download and fed them into springer_download.py via a bash script, so basically I could download my whole list one book after the other. After a while I kept getting the same error asking whether I had access, no matter which book and no matter the access level (this was even after I added a one-minute pause between books by putting ‘sleep 60’ into the bash script). Are you aware of what might be the cause of this issue?

Thanks!

Thanks for the script, Mon, 06/07/2010 - 21:58 — Dierk E. (not verified)

Thanks for the script, it works great and saves a lot of time!

I found a bug: Mon, 06/07/2010 - 22:17 — Dierk E. (not verified)

I found a bug: if the generated file name contains characters that are not allowed on the file system in use, an exception is thrown. For example, if the book title (and therefore the file name) contains an “ß”, the script turns it into a question mark, and question marks are not allowed in file names on FAT32 file systems. The question mark presumably results from faulty character decoding.

…the script also tries to Thu, 06/10/2010 - 13:20 — Dierk E. (not verified)

…the script also tries to create file names containing a colon, which likewise causes an error on FAT32 file systems. The disallowed characters would have to be removed from bookTitlePath before line 179.
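The two reports above come down to characters that FAT32 rejects in file names. A minimal sketch of the clean-up suggested there, written for Python 3 rather than the script's Python 2.4. The helper name `sanitize_filename` and the choice of `_` as the replacement are assumptions for illustration, not taken from springer_download.py:

```python
import re

# Characters that FAT32 (and Windows in general) rejects in file names,
# plus ASCII control characters.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_filename(title, replacement="_"):
    """Return `title` with every forbidden character replaced.

    Hypothetical helper: the real script builds bookTitlePath differently.
    """
    cleaned = _FORBIDDEN.sub(replacement, title)
    # Trailing dots and spaces also cause trouble on Windows file systems.
    cleaned = cleaned.rstrip(". ")
    return cleaned or "untitled"


if __name__ == "__main__":
    # Illustrative title only, not a real book from the thread.
    print(sanitize_filename("Part 1: What is it?"))
```

Something like this would be applied to the transliterated title before the `.pdf` suffix is appended, so that the colon and question mark never reach the file system.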

In my browser I can open the Thu, 05/20/2010 - 01:14 — Anonymous (not verified)

In my browser I can open the links to the chapters. Apparently the geturl function that downloads the chapters does not set a User-Agent, or something similar.

For this book Thu, 05/20/2010 - 00:08 — Anonymous (not verified)

It no longer works for this book either:

http://www.springerlink.com/content/t1166x/?p=1b42507155aa4d7387e1980dd0…

ERROR: downloaded chapter http://springerlink.com/content/q1n2783365424175/fulltext.pdf has invalid mime type text/html - are you allowed to download Funktionalanalysis Sechste, korrigierte Auflage 2007 - Springer Berlin Heidelberg?

But I am logged in and connected via VPN, so I can view and download the book manually in the browser.

My apologies for knowing Thu, 04/29/2010 - 07:56 — Tony (not verified)

My apologies for knowing nothing about Python, but is there an easy way to remove the pdftk part of the script and output all the individual pdfs? Combining the chapters is running into problems with books like this one:

http://springerlink.com/content/v12557/

It has the front-matter.pdf and back-matter.pdf files on every page which leads to a lot of repetition in the output. It would also be handy for those running under Mac OS X as it was a pain to build pdftk from source.
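The repetition described above could be avoided by dropping links that were already collected before anything is downloaded. A minimal sketch, assuming the chapter links are gathered into a plain list; `unique_in_order` is a hypothetical helper and not part of springer_download.py:

```python
def unique_in_order(links):
    """Drop repeated links while keeping the first occurrence of each."""
    seen = set()
    result = []
    for link in links:
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result


if __name__ == "__main__":
    # Made-up link list in which front and back matter repeat on every page.
    collected = [
        "front-matter.pdf", "chapter-a.pdf", "back-matter.pdf",
        "front-matter.pdf", "chapter-b.pdf", "back-matter.pdf",
    ]
    print(unique_in_order(collected))
```

Note that this keeps `back-matter.pdf` at the position where it was first seen; moving it to the end would need the separate front/back handling the script already does with its flags.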

there is an pre build version Sun, 06/06/2010 - 12:29 — Anonymous (not verified)

There is a prebuilt version of pdftk available at http://www.accesspdf.com/pdftk/. Works like a charm for me (10.6.x) although it’s built for 10.3 :D just some dylibs …

Many thanks to Milian for the great script! It comes in really handy sometimes!

Well, python isn’t that hard Sat, 06/05/2010 - 02:23 — Anonymous (not verified)

Well, Python isn’t that hard to read. Just search for the term “pdftk” in the source/script file and comment it out (by putting a hash sign in front of the line), then run the script and see if an error comes up. If so, it will direct you to the line where it occurs, so you can investigate further. It’s really simple, try it out :)
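As an alternative to commenting out lines by trial and error, the merge step could be swapped for a plain copy of the per-chapter files. A sketch for Python 3, assuming a list of downloaded file paths similar to the script's `fileList`; the function `save_chapters` and its `target_dir` parameter are hypothetical names:

```python
import os
import shutil


def save_chapters(file_list, target_dir):
    """Copy downloaded chapter files into `target_dir`, numbered in order.

    Returns the list of paths that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for index, source in enumerate(file_list, start=1):
        # Zero-padded names keep the chapters sorted in file managers.
        destination = os.path.join(target_dir, "%02d.pdf" % index)
        shutil.copy(source, destination)
        saved.append(destination)
    return saved
```

This would have to run before the temporary download directory is removed, and it makes pdftk unnecessary for anyone who is happy with one PDF per chapter.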

Apparently they changed their Wed, 04/21/2010 - 20:52 — Anonymous (not verified)

Apparently they changed their interface. Example:

http://www.springerlink.com/content/l1x446/?p=79dea83afbc4473f8787e407e5…

Here the script only finds the front and back matter. The remaining chapters are divided into subchapters. Perhaps a resourceful programmer could adapt the script?

That would be great!! Many thanks!

Do you want to download the Wed, 04/21/2010 - 15:13 — Anonymous (not verified)

Do you want to download the entire catalogue? Neither Springer nor your university permits that; even downloading a single book in full already conflicts with the terms of use.

Where does it say that you Thu, 04/22/2010 - 11:44 — Milian Wolff

Where does it say that you may not even download one complete book? And no: of course I do not want to download the entire catalogue, and I have always said so to those who asked for something like that. That is complete nonsense.

But I don't see the harm in downloading one complete book. I have to admit I would do that even without the script, to get at the books and chapters required for a lecture.

Hello Milian! Great Mon, 04/19/2010 - 08:22 — Angelo (not verified)

Hello Milian! Great script! It runs really, really well. Are there any plans or ways to extend the script so that all books can be downloaded with a single command line?

Best regards from Dortmund, also from Jan S., among others :) Angelo

Hello, for me the script Tue, 04/13/2010 - 17:38 — Anonymous (not verified)

Hello, for me the script always hangs somewhere while downloading under Ubuntu 8.04 (as also reported by Christian in the post from 02/22/2010).

I have now tried it several Tue, 04/13/2010 - 20:04 — Anonymous (not verified)

I have now tried it several more times. Once the script managed to run through without hanging, but that was the only success in about 20 attempts to download the same book. Otherwise it always hung, each time at a different article.

Yes, it suddenly stopped Thu, 04/08/2010 - 12:09 — Anonymous (not verified)

Yes, it suddenly stopped working. Could the author adapt the script to the changes? I would be very glad!

They have a kind of Thu, 03/25/2010 - 14:56 — Anonymous 5 (not verified)

They have a kind of “intermediate layer” for some books. The script apparently only searches the initial book page for *.pdf links and downloads those. For some books, SpringerLink has now put each chapter into a kind of extra folder that contains only the front matter and the chapter itself (they call it “Teil …” there). The script does not take this into account so far.

Hello, could it be that Mon, 03/15/2010 - 23:27 — Anonymous 2 (not verified)

Hello,

could it be that something was changed on the SpringerLink site? It has worked well so far, but it seems to have a problem lately.

Hi, the tool doesn't work Sat, 03/13/2010 - 17:27 — Anonymous (not verified)

Hi,

the tool doesn't work for me… I think you have to log in now. Does anyone have an alternative? That would be great.

Hello, thanks for the script, Fri, 03/12/2010 - 05:43 — Matthias (not verified)

Hello, thanks for the script. I have not been able to test it so far for lack of a VPN, but given the positive feedback I assume it works very well. I too would be interested in a version that creates bookmarks and includes the matching description from SpringerLink for each. A program for creating bookmarks on Linux is JPdfBookmarks; instructions are here: http://jpdfbookmarks.altervista.org and the program itself is at http://flavianopetrocchi.blogspot.com/2008/07/jpsdbookmarks-download-pag… It would be great if you could build that in. Unfortunately I can't get to it myself at the moment: my Internet access here is restricted and I cannot download the program. Besides, my VPN isn't working…

Regards, Matthias

I get an error as well Mon, 03/01/2010 - 14:26 — Seb (not verified)

I get an error as well: “WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden” (the system cannot find the specified file). What can I do?

The book you are trying to download is called ‘Word 2007’

    found 14 chapters
    downloading chapter 1/14
    http://www.springerlink.com/content/m5427g/front-matter.pdf 100%
    downloading chapter 2/14
    http://springerlink.com/content/u838718169040815/fulltext.pdf 100%
    downloading chapter 3/14
    http://springerlink.com/content/t347382q85766802/fulltext.pdf 100%
    downloading chapter 4/14
    http://springerlink.com/content/rm4183k750h28k27/fulltext.pdf 100%
    downloading chapter 5/14
    http://springerlink.com/content/q25m417372705567/fulltext.pdf 100%
    downloading chapter 6/14
    http://springerlink.com/content/t7303462187j3t17/fulltext.pdf 100%
    downloading chapter 7/14
    http://springerlink.com/content/n191l36489484284/fulltext.pdf 100%
    downloading chapter 8/14
    http://springerlink.com/content/wp7t1657u28p4774/fulltext.pdf 100%
    downloading chapter 9/14
    http://springerlink.com/content/l3126077869171g2/fulltext.pdf 100%
    downloading chapter 10/14
    http://springerlink.com/content/j6399r4330572128/fulltext.pdf 100%
    downloading chapter 11/14
    http://springerlink.com/content/p2262884852pl1w4/fulltext.pdf 100%
    downloading chapter 12/14
    http://springerlink.com/content/u647377241368kl7/fulltext.pdf 100%
    downloading chapter 13/14
    http://springerlink.com/content/t5h400553644360l/fulltext.pdf 100%
    downloading chapter 14/14
    http://www.springerlink.com/content/m5427g/back-matter.pdf 100%
    merging chapters
    Traceback (most recent call last):
      File "C:\Dokumente und Einstellungen\Sebastian\Desktop\sp\springer_download.py", line 238, in <module>
        main(sys.argv[1:])
      File "C:\Dokumente und Einstellungen\Sebastian\Desktop\sp\springer_download.py", line 147, in main
        p1 = subprocess.Popen(["echo", bookTitle], stdout=subprocess.PIPE)
      File "D:\Python26\lib\subprocess.py", line 621, in __init__
        errread, errwrite)
      File "D:\Python26\lib\subprocess.py", line 830, in _execute_child
        startupinfo)
    WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden

I don’t know, I won’t support Mon, 03/01/2010 - 20:07 — Milian Wolff

I don’t know, I won’t support Windows. Try cygwin as the poster above you said that it works.

I just wanted to Wed, 02/24/2010 - 20:31 — Anonymous (not verified)

I just wanted to report back that the script works superbly and without problems with Cygwin under Windows 7. When installing Cygwin you of course have to make sure to select the appropriate packages. A big thank-you to the author for the work!

Can you say which Fri, 10/01/2010 - 19:29 — Katzenstreu (not verified)

Can you say which packages are needed? Unfortunately I found no help on the Internet.

After getting it to Mon, 02/22/2010 - 17:26 — Christian (not verified)

After getting it to run partially under Windows, these are my first steps with Linux, but it still does not download the book… (VPN is activated)

Maybe someone can give me a tip.

    ubuntu@ubuntu-desktop:~/Desktop$ ./springer_download.py -l http://springerlink.com/content/h61v67/
    Please wait, link source is being downloaded...
    http://springerlink.com/content/h61v67/

    The book you are trying to download is called 'Dubbel'

    Please wait, link source is being downloaded...
    http://springerlink.com/content/h61v67/?sortorder=asc&p_o=10
    Please wait, link source is being downloaded...
    http://springerlink.com/content/h61v67/?sortorder=asc&p_o=20
    found 27 chapters
    downloading chapter 1/27
    http://springerlink.com/content/h61v67/front-matter.pdf 95%

^^ it hangs at this point

Hello, I am a long-time Windows Fri, 02/05/2010 - 20:54 — RS(15,11) (not verified)

Hello, I am a long-time Windows user and, because of your script, have been getting into Unix since yesterday. I now use Cygwin, and with the newest version of the script everything runs great. Many thanks!!

Hi, thank you for this Wed, 02/03/2010 - 01:50 — Vitaly (not verified)

Hi,

thank you for this script.

Since about a week ago, it stopped working though:

………………………………….

$ ./springer_download.py -l http://www.springerlink.com/content/qv89j2/?p=101a335b740a47c7a7578b7d16…

$ Please wait, link source is being downloaded… http://www.springerlink.com/content/qv89j2/

ERROR: Could not evaluate book title - bad link?

    Usage: springer_download.py [OPTIONS]

    Options:
      -h, --help              Display this usage message
      -l LINK, --link=LINK    defines the link of the book you intend to download
      -c HASH, --content=HASH builds the link from a given HASH (see below)

………………………………….

This error appears for whatever book I try to download. Is it because they changed directory structure or something else @ Springer?

Thank you, Vitaly

Thanks for the heads up, I Wed, 02/03/2010 - 16:01 — Milian Wolff

Thanks for the heads up, I fixed the code to circumvent this SpringerLink “protection” (it didn’t accept the default User-Agent that is sent by Python…). It should work properly now, assuming you have the rights to access this book, which I / the FU Berlin do not have, it seems.
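In the Python 2 script this kind of fix is done by subclassing `urllib.FancyURLopener` and overriding its `version` attribute. For readers on Python 3, a rough equivalent with `urllib.request` might look like the sketch below. The header value `Mozilla/5.0` and the helper names `build_request` and `fetch` are assumptions for illustration; which User-Agent strings the server accepts is not documented in this thread.

```python
import urllib.request

USER_AGENT = "Mozilla/5.0"  # assumed value; any browser-like string may do


def build_request(url, user_agent=USER_AGENT):
    """Create a request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download `url` and return the body as bytes (needs network access)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Splitting request construction from the actual download keeps the header logic checkable without touching the network.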

By the way, I spotted another Fri, 02/12/2010 - 08:47 — Vitaly (not verified)

By the way, I spotted another glitch: if the book name contains the colon sign (‘:’), the book is downloaded OK but cannot be saved, as file name cannot include colons. You could substitute it with dash or something…

Can you give me an example? I Fri, 02/12/2010 - 13:59 — Milian Wolff

Can you give me an example? I don’t see why a colon should be removed from a filename, it’s perfectly valid imo. At least on Unix:

    $> touch "asdf:foobar"
    $> ls
    asdf:foobar
    $> rm asdf\:foobar
    rm: remove regular empty file asdf:foobar? y

Thanks a lot for prompt Fri, 02/12/2010 - 08:38 — Vitaly (not verified)

Thanks a lot for the prompt response, Milian! It works great now.

Hi, first of all thanks for Thu, 02/04/2010 - 16:03 — Thomas (not verified)

Hi, first of all thanks for the script, I have been using it for quite a long time… However, I now have problems downloading as well. I get the following error message:
Please wait, link source is being downloaded…
http://www.springerlink.de/content/q28652/

The book you are trying to download is called ‘Regelungstechnik 1’

found 15 chapters
downloading chapter 1/15
http://www.springerlink.de/content/q28652/front-matter.pdf -819200%

ERROR: downloaded chapter http://www.springerlink.de/content/q28652/front-matter.pdf has invalid mime type text/html - are you allowed to download it?

“By hand”, however, I can download the PDFs of the individual chapters without any problem.
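The `-819200%` in the output above appears to be what the script's progress calculation yields when no file size is known: urllib passes a size of -1 to the report hook in that case, and 1 block × 8192 bytes × 100 / -1 gives exactly that figure. A minimal sketch of a calculation that treats the size as unknown instead, written for Python 3; `percent_done` is a hypothetical helper, not the script's `_reporthook`:

```python
def percent_done(numblocks, blocksize, filesize):
    """Return download progress as an int in 0..100, or None if unknown."""
    if filesize <= 0:
        # Size not reported by the server (urllib passes -1 in that case).
        return None
    return min(numblocks * blocksize * 100 // filesize, 100)


if __name__ == "__main__":
    for args in [(1, 8192, -1), (1, 8192, 16384), (5, 8192, 16384)]:
        print(args, percent_done(*args))
```

A report hook built on this could print something like `??%` whenever `None` comes back, rather than a large negative number.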

Hm, then something is still Fri, 02/05/2010 - 00:01 — Milian Wolff

Hm, then something is probably still wrong; I'll have to take a look. Maybe the Referer is checked as well, or something like that. Let's see what the SpringerLink people come up with to make it harder for us students to get at the books… sigh

I tried something, Fri, 02/05/2010 - 13:35 — Thomas (not verified)

I tried something, and it even seems to have worked :)

    #! /usr/bin/env python

    # -*- coding: utf-8 -*-

    import os
    import sys
    import getopt
    import urllib
    import re
    import tempfile
    import shutil
    import subprocess

    class SpringerDownloader(urllib.URLopener):
        version = "Mozilla"

    # Set some kind of User-Agent so we don't get blocked by SpringerLink
    class SpringerURLopener(urllib.FancyURLopener):
        version = "Mozilla"


    # validate CLI arguments and start downloading
    def main(argv):
        if not findInPath("pdftk"):
            error("You have to install pdftk.")
        if not findInPath("iconv"):
            error("You have to install iconv.")

        try:
            opts, args = getopt.getopt(argv, "hl:c:", ["help", "link=","content="])
        except getopt.GetoptError:
            error()

        link = ""

        for opt, arg in opts:
            if opt in ("-h", "--help"):
                usage()
                sys.exit()
            elif opt in ("-c", "--content"):
                if link != "":
                    error("-c and -l arguments are mutually exclusive")

                link = "http://springerlink.com/content/" + arg
            elif opt in ("-l", "--link"):
                if link != "":
                    error("-c and -l arguments are mutually exclusive")

                link = arg

        if link == "":
            error("You have to define a link.")
        if not re.match("https?://(www\.)?springerlink.(com|de)/content/[a-z0-9\-]+/?(\?[^/]*)?$", link):
            error("Bad link given. See LINK below.")

        # remove all arguments from link
        link = re.sub(r"/?\?[^/]*$", "/", link)

        #make sure the link ends on a slash
        if link[-1] != "/":
            link += "/"

        baseLink = link

        chapters = list()
        hasFrontMatter = False
        hasBackMatter = False

        loader = SpringerURLopener();


        bookTitle = ""

        while True:
            # download page source
            try:
                print "Please wait, link source is being downloaded...\n\t%s" % link
                page = loader.open(link).read()
            except IOError, e:
                error("Bad link given (%s)" % e)

            if re.search(r'403 Forbidden', page):
                error("Could not access page: 403 Forbidden error.")

            if bookTitle == "":
                match = re.search(r'<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">([^<]+)</h2>', page)
                if not match or match.group(1).strip() == "":
                    error("Could not evaluate book title - bad link?")
                else:
                    bookTitle = match.group(1).strip()
                    print "\nThe book you are trying to download is called '%s'\n" % bookTitle


            # get chapters
            for match in re.finditer('href="([^"]+.pdf)"', page):
                chapterLink = match.group(1)
                if chapterLink == "back-matter.pdf":
                    hasBackMatter = True
                    continue
                if chapterLink == "front-matter.pdf":
                    hasFrontMatter = True
                    continue
                if chapterLink[:7] == "http://":
                    continue
                chapters.append(chapterLink)

            # get next page
            match = re.search(r'<a href="([^"]+)">Next</a>', page)
            if match:
                link = "http://springerlink.com" + match.group(1).replace("&amp;", "&")
            else:
                break

        if hasFrontMatter:
            chapters.insert(0, "front-matter.pdf")

        if hasBackMatter:
            chapters.append("back-matter.pdf")

        if len(chapters) == 0:
            error("No chapters found - bad link?")

        print "found %d chapters" % len(chapters)

        # setup
        curDir = os.getcwd()
        tempDir = tempfile.mkdtemp()
        os.chdir(tempDir)

        i = 1
        fileList = list()

        for chapterLink in chapters:
            if chapterLink[0] == "/":
                chapterLink = "http://springerlink.com" + chapterLink
            else:
                chapterLink = baseLink + chapterLink

            print "downloading chapter %d/%d" % (i, len(chapters))
            localFile, mimeType = geturl(chapterLink, "%d.pdf" % i)

            if mimeType.gettype() != "application/pdf":
                os.chdir(curDir)
                shutil.rmtree(tempDir)
                error("downloaded chapter %s has invalid mime type %s - are you allowed to download it?" % (chapterLink, mimeType.gettype()))

            fileList.append(localFile)
            i += 1

        print "merging chapters"

        p1 = subprocess.Popen(["echo", bookTitle], stdout=subprocess.PIPE)
        p2 = subprocess.Popen(["iconv", "-f", "UTF-8", "-t" ,"ASCII//TRANSLIT"], stdin=p1.stdout, stdout=subprocess.PIPE)
        bookTitlePath = p2.communicate()[0]
        bookTitlePath = bookTitlePath.strip()
        if bookTitlePath == "":
            os.chdir(curDir)
            shutil.rmtree(tempDir)
            error("could not transliterate book title %s" % bookTitle)

        bookTitlePath = bookTitlePath.replace("/", "-")
        bookTitlePath = re.sub("\s+", "_", bookTitlePath)

        bookTitlePath = curDir + "/%s.pdf" % bookTitlePath

        if len(fileList) == 1:
            shutil.move(fileList[0], bookTitlePath)
        else:
            os.system("pdftk %s cat output '%s'" % (" ".join(fileList), bookTitlePath))

        # cleanup
        os.chdir(curDir)
        shutil.rmtree(tempDir)

        print "book %s was successfully downloaded, it was saved to %s" % (bookTitle, bookTitlePath)

        sys.exit()

    # give a usage message
    def usage():
        print """Usage:
    %s [OPTIONS]

    Options:
      -h, --help              Display this usage message
      -l LINK, --link=LINK    defines the link of the book you intend to download
      -c HASH, --content=HASH builds the link from a given HASH (see below)

    You have to set exactly one of these options.

    LINK:
      The link to your the detail page of the ebook of your choice on SpringerLink.
      It lists book metadata and has a possibly paginated list of the chapters of the book.
      It has the form:
        http://springerlink.com/content/HASH/STUFF
      Where: HASH is a string consisting of lower-case, latin chars and numbers.
             It alone identifies the book you intent do download.
             STUFF is optional and looks like ?p=...&p_o=... or similar. Will be stripped.
    """ % os.path.basename(sys.argv[0])

    # raise an error and quit
    def error(msg=""):
        if msg != "":
            print "\nERROR: %s\n" % msg
        usage()
        sys.exit(2)

        return None

    # based on http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python
    def findInPath(prog):
        for path in os.environ["PATH"].split(os.pathsep):
            exe_file = os.path.join(path, prog)
            if os.path.exists(exe_file) and os.access(exe_file, os.X_OK):
                return True
        return False

    # based on http://mail.python.org/pipermail/python-list/2005-April/319818.html
    def _reporthook(numblocks, blocksize, filesize, url=None):
        #XXX Should handle possible filesize=-1.
        try:
            percent = min((numblocks*blocksize*100)/filesize, 100)
        except:
            percent = 100
        if numblocks != 0:
            sys.stdout.write("\b"*70)
        sys.stdout.write("%-66s%3d%%" % (url, percent))

    def geturl(url, dst):
        downloader = SpringerDownloader();
        if sys.stdout.isatty():
            response = downloader.retrieve(url, dst,
                               lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
            sys.stdout.write("\n")
        else:
            response = downloader.retrieve(url, dst)

        return response


    # start program
    if __name__ == "__main__":
        main(sys.argv[1:])

I only added this at the top:

    class SpringerDownloader(urllib.URLopener):
        version = "Mozilla"

And changed def geturl(url, dst) to:

    def geturl(url, dst):
        downloader = SpringerDownloader();
        if sys.stdout.isatty():
            response = downloader.retrieve(url, dst,
                               lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
            sys.stdout.write("\n")
        else:
            response = downloader.retrieve(url, dst)

        return response

Great, thanks for the patch. Fri, 02/05/2010 - 14:16 — Milian Wolff

Great, thanks for the patch. I included it now (slightly different). Does it work with the vanilla source from github again now? I ask since I can still not download that one book ;-)

i can’t download with script Tue, 08/11/2009 - 14:55 — pappy (not verified)

I can’t download with the script now; here is the error:

“Please wait, link source is being downloaded… http://springerlink.com/content/f54k582l0w11xj18/

The book you are trying to download is called ‘Architecture of an LBS Platform to Support Privacy Control for Tracking Moving Objects in a Ubiquitous Environments’

found 1 chapters downloading chapter 1/1 http://springerlink.com/content/f54k582l0w11xj18/fulltext.pdf 100%

ERROR: downloaded chapter http://springerlink.com/content/f54k582l0w11xj18/fulltext.pdf has invalid mime type text/html - are you allowed to download it? “

Plz help :(

You need to be authenticated Tue, 08/11/2009 - 15:18 — Milian Wolff

You need to be authenticated for SpringerLink via VPN. This script does not support any other authentication.

I myself use it from my university where access to springerlink is automatically authenticated. If it is the same for your university, access one of the servers there and run the script from there. Ask your IT department.

I am authenticated via VPN Wed, 02/03/2010 - 23:18 — moohh (not verified)

I am authenticated via VPN and I can manually download the books by using Firefox, but if I want to try this script, I get the same error.

The download of a single pdf via wget doesn’t work either. I got ERROR 403.

update the script, I fixed Thu, 02/04/2010 - 00:45 — Milian Wolff

Update the script; I fixed that a few hours ago.

Nevermind, I figured it out. Mon, 07/12/2010 - 19:06 — Anonymous (not verified)

Nevermind, I figured it out. Springer doesn’t like wget, so you need to fake the browser id using -U ‘Mozilla/5.0’

Milian, Thanks for the Mon, 07/12/2010 - 18:25 — Anonymous (not verified)

Milian,

Thanks for the script. Can you give some insight on how you fixed the 403 error when wget’ing a single PDF?

I’m trying to do something similar and can’t get past this issue …

I installed Cygwin, Thu, 07/30/2009 - 17:01 — Anonymous (not verified)

I installed Cygwin, and it runs in there. Regarding iconv, I only changed the call to the following:

    p2 = subprocess.Popen(["iconv", "-f", "UTF-8", "-t" ,"CP1258"],

Umlauts then come out oddly, and there is a problem when the file name contains question marks.

Another difficulty arises when a book consists of several sub-volumes; the download then fails.

The script is brilliantly done, many thanks to the author!

Hey, great script! I Mon, 07/27/2009 - 23:49 — Christian (not verified)

Hey, great script! I tried to get it running under Windows, and managed it!!! However, I had to disable the two checks for whether pdftk and iconv are present. Both exist for Windows, and I integrated them so that they can be called system-wide.

Downloading works, but there is the following problem:

    D:\Desktop\springerlink download>springer_download.py --link=http://springerlink.com/content/w6536t/?p=007eb555ebe6438c861aa6ca3f773b5d & pi=3
    Please wait, link source is being downloaded...
    http://springerlink.com/content/w6536t/

    The book you are trying to download is called 'Thermodynamik'

    found 9 chapters
    downloading chapter 1/9
    http://springerlink.com/content/w6536t/front-matter.pdf 100%
    downloading chapter 2/9
    http://springerlink.com/content/v0p76175827831r5/fulltext.pdf 100%
    downloading chapter 3/9
    http://springerlink.com/content/u402v50mx102830w/fulltext.pdf 100%
    downloading chapter 4/9
    http://springerlink.com/content/t4u767j96639714v/fulltext.pdf 100%
    downloading chapter 5/9
    http://springerlink.com/content/v125w51816124w22/fulltext.pdf 100%
    downloading chapter 6/9
    http://springerlink.com/content/vt14m63616714v11/fulltext.pdf 100%
    downloading chapter 7/9
    http://springerlink.com/content/mr2016q56221t25q/fulltext.pdf 100%
    downloading chapter 8/9
    http://springerlink.com/content/j0572760j2273063/fulltext.pdf 100%
    downloading chapter 9/9
    http://springerlink.com/content/w6536t/back-matter.pdf 100%
    merging chapters
    Traceback (most recent call last):
      File "D:\Desktop\springerlink download\springer_download.py", line 231, in <module>
        main(sys.argv[1:])
      File "D:\Desktop\springerlink download\springer_download.py", line 141, in main
        p1 = subprocess.Popen(["echo", bookTitle], stdout=subprocess.PIPE)
      File "d:\Programme\Python\lib\subprocess.py", line 595, in __init__
        errread, errwrite)
      File "d:\Programme\Python\lib\subprocess.py", line 804, in _execute_child
        startupinfo)
    WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden
    Der Befehl "pi" ist entweder falsch geschrieben oder
    konnte nicht gefunden werden.

What is causing the further processing to fail? Regards

Well, I want to pipe the name Tue, 07/28/2009 - 01:51 — Milian Wolff

Well, I want to pipe the name into iconv, and I don't know whether that works on Windows at all. If need be, simply comment it out and live with the fact that the script may try to create a file with “bad” characters in its name… Or find another way to call iconv under Windows (without echo). Or maybe install MinGW, which might work…
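One way to sidestep both `echo` and `iconv` on Windows would be to do the transliteration in Python itself. The sketch below is written for Python 3 and uses the standard `unicodedata` module; it is not what the script does, and its output can differ from `iconv -t ASCII//TRANSLIT` (for example, “ü” becomes “u” here). The helper name `to_ascii` and the small `_SPECIAL` table are assumptions for illustration:

```python
import unicodedata

# Letters that NFKD normalisation does not decompose; extend as needed.
_SPECIAL = {"ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o"}


def to_ascii(text):
    """Approximate `text` with plain ASCII, without external tools."""
    replaced = "".join(_SPECIAL.get(char, char) for char in text)
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop combining marks (e.g. the dots of an umlaut) and anything
    # else that has no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Elektronik für Ingenieure und Naturwissenschaftler"))
```

Because nothing is spawned, the `WindowsError` raised by `subprocess.Popen(["echo", ...])` cannot occur on this path.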

What about a Windows Tue, 06/02/2009 - 02:43 — Anonymous (not verified)

What about a Windows-compatible script to allow downloading articles from journals in an organized fashion? Thanks for your consideration.

I don’t use Windows and won’t Tue, 06/02/2009 - 12:59 — Milian Wolff

I don’t use Windows and won’t make the script Windows-compatible. Yet I’d happily accept patches. Since Python is cross-platform it shouldn’t be too hard. You’d just have to find alternatives to pdftk and iconv. These two dependencies make the script platform-dependent.

Hi Milian! I quickly Wed, 04/08/2009 - 19:29 — flo (not verified)

Hi Milian!

I quickly added a “just HASH” option, if you want it…

    try:
        # "l:" and "c:" each need the trailing colon so that -l and -c accept a value
        opts, args = getopt.getopt(argv, "hl:c:", ["help", "link=","content="])
    except getopt.GetoptError:
        error()

    link = ""

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        if opt in ("-l", "--link"):
            link = arg
        if opt in ("-c", "--content"):
            link = "http://springerlink.com/content/" + arg + "/?p="

    ...

    # give a usage message
    def usage():
        print """Usage:
    %s [OPTIONS]

    Options:
      -h, --help              Display this usage message
      -l LINK, --link=LINK    using the whole link to start downloading
      -c HASH, --content=HASH uses just the HASH to start

    LINK:
      The link to your the detail page of the ebook of your choice on SpringerLink.
      It lists book metadata and has a possibly paginated list of the chapters of the book.
      It has the form:
        http://springerlink.com/content/HASH/STUFF
      Where: HASH is a string consisting of lower-case, latin chars and numbers.
             STUFF is optional and looks like ?p=...&p_o=... or similar. Will be stripped.
    """ % os.path.basename(sys.argv[0])

It's not a “pretty” solution, but it was quick ^^

Apart from that, your script is great!
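In a getopt option string, a letter followed by a colon expects a value, so `-l` needs `l:` and `-c` needs `c:` to receive the link or the hash. A small self-contained sketch of that behaviour, written for Python 3; `parse_args` is a hypothetical helper and the URL prefix is the one used in the patch above:

```python
import getopt


def parse_args(argv):
    """Return the book link built from -l/--link or -c/--content, or None."""
    # "hl:c:" -> -h takes no value, -l and -c each expect one (trailing colon).
    opts, _ = getopt.getopt(argv, "hl:c:", ["help", "link=", "content="])
    link = None
    for opt, arg in opts:
        if opt in ("-l", "--link"):
            link = arg
        elif opt in ("-c", "--content"):
            link = "http://springerlink.com/content/" + arg + "/"
    return link


if __name__ == "__main__":
    print(parse_args(["-c", "h61v67"]))
```

With an option string of `"hlc:"` instead, `-l` would be parsed as a flag without a value and the link would end up among the leftover positional arguments.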

Just pushed your patch, slightly Wed, 04/08/2009 - 20:32 — Milian Wolff

I just pushed a commit to GitHub with your patch (slightly modified). Thanks! :)

Thanks for the quick reply. Sat, 04/04/2009 - 16:37 — Anonymous (not verified)

Thanks for the quick reply. I think the problem is the university firewall which seems to be blocking the traffic as I can use the script from outside the university. One suggestion is to make this work as a firefox plugin (perhaps with imacros). Thanks for the useful script!

Hello. This software is an Fri, 04/03/2009 - 19:55 — Anonymous (not verified)

Hello. This software is an excellent idea but I get the following error:

    4]# ./springer_download.py -l "http://www.springerlink.com/content/h381wp/?sortorder=asc&v=expanded"
    Please wait, link source is being downloaded...
    http://www.springerlink.com/content/h381wp/

ERROR: Bad link given ([Errno socket error] (110, ‘Connection timed out’))

The springerlink address is correct because I can paste it into a browser and both the webpage and the pdfs open up properly. I’m using andlinux which runs ubuntu as a service in windows vista. This could be the cause of the problem but the browser, Synaptic Package Manager and pinging from the console work. I’ve also tried to disable my firewall but this did not fix the problem. Thanks in advance for any insight into this problem.

I think I know what is the Fri, 04/03/2009 - 20:40 — Milian Wolff

I think I know what is the cause yet can’t test it myself right now. Try with the following link:

http://www.springerlink.com/content/h381wp/?sortorder=asc

Note the different layout of the page, I think that’s the cause. Hope that helps.

Very good - exactly what I Mon, 03/16/2009 - 21:20 — Anonymous (not verified)

Very good - exactly what I was looking for! Thanks!

Further wishes:

  • Synchronize the PDF's page numbers with the actual ones.
  • Make the individual chapters selectable via the PDF table of contents!

Neither is really Mon, 03/16/2009 - 22:29 — Milian Wolff

Neither is really possible, since you would have to edit the PDFs for that, and PDF is more or less a read-only file format.

And what if one could Mon, 01/04/2010 - 13:33 — Anonymous (not verified)

And what if one could insert the individual chapters, with their names, as bookmarks into the final PDF? Would that work?

Find out, I have no Mon, 01/04/2010 - 17:11 — Milian Wolff

Find out; I have no idea about PDF authoring.

Hello, thanks for the script, Fri, 03/12/2010 - 05:42 — Matthias (not verified)

Hello, thanks for the script. I have not been able to test it so far for lack of a VPN, but given the positive feedback I assume it works very well. I too would be interested in a version that creates bookmarks and includes the matching description from SpringerLink for each. A program for creating bookmarks on Linux is JPdfBookmarks; instructions are here: http://jpdfbookmarks.altervista.org and the program itself is at http://flavianopetrocchi.blogspot.com/2008/07/jpsdbookmarks-download-pag… It would be great if you could build that in. Unfortunately I can't get to it myself at the moment: my Internet access here is restricted and I cannot download the program. Besides, my VPN isn't working…

Regards, Matthias
