Tumblr NSFW Upload thread

Poll results: halp

I'll do my part and help
53.11% 777 votes
I just want to sit back and wank
46.89% 686 votes

Poll ended with 1463 votes.

Xaxu-Slyph

Joltin' Jojo
@BigBuggyBastage  
This is kinda why I’m doing my own thing and not ‘contributing’ to the lists made. Redundancy would be welcome in this case. We can sort things out at a later date. I have a few inquiries about possible repositories for all this data and permanent storage locations.
 
@MaresFillies  
@BigBuggyBastage  
I believe it was mentioned earlier in the thread that you COULD possibly pull from those ‘nothing here’ locations.
 
@LostPone  
This is the post I was talking about.
DBot
@MaresFillies  
I’ve downloaded your list, will also provide torrent file soon.
 
cocoscabin seems to contain mostly random stuff, but I did a full copy anyway.
twkr

@Xaxu-Slyph  
Install Python 3 (I tested 3.6.6 and 3.7.1, but it should run on any version of Python 3). Then, if you are on macOS, a Linux distro or *BSD, follow the instructions from that post. If you’re on Windows, it gets more complicated:
 
  1. Add Python to your PATH (this stackoverflow thread provides a good explanation)
  2. Download the script anywhere you like
  3. Open a command prompt in that directory (folder)
  4. Launch the script in the command prompt using python3 %your_script_name% "http://%some_blog_name%.tumblr.com/rss" (replace the parameters marked with percent signs accordingly). The Python binary’s name should start with “python” and may have numbers at the end; in my case it was “python3”.
furrypony

hopelessly sad filly
@MaresFillies  
@BigBuggyBastage  
Some problems I’ve encountered include pictures not being fully downloaded (like a png that only contains the upper half) and, when using bbolli’s python script, some images not being downloaded at all. I detailed a workaround in the other post, but basically I used shell to grab all the external media urls in the generated html files and ran wget on all of them, then manually examined the output to make sure it “makes sense” (really hard to automate that, there were many wtf moments). Once I was satisfied that all the media files that still exist online were downloaded, I ran another script to change the external http urls to local urls.
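The workaround described above (collect the external media urls from the generated html, then point them at local copies) could be sketched in Python roughly like this. It is only an illustration, not the shell commands actually used: the function names and the `media` folder are made up for the example, and it only looks at `src` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class MediaURLCollector(HTMLParser):
    """Collect absolute http(s) URLs found in src attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_media_urls(html_text):
    """Return the external media URLs in document order."""
    collector = MediaURLCollector()
    collector.feed(html_text)
    return collector.urls


def localize(html_text, media_dir="media"):
    """Rewrite each external URL to media_dir/<file name> (hypothetical layout)."""
    for url in extract_media_urls(html_text):
        file_name = posixpath.basename(urlparse(url).path)
        html_text = html_text.replace(url, posixpath.join(media_dir, file_name))
    return html_text
```

The list returned by `extract_media_urls` can be written to a file and handed to wget. URLs containing escaped characters such as `&amp;` would need extra care before the replacement step, so the output still deserves a manual look.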
 
Maybe the reason I’m getting these weird errors is just that my internet connection is bad. I’m using a VPN and it sometimes gives up after I download through it too heavily.
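For the half-downloaded png problem, one rough check is whether the file still ends with the PNG end marker. A minimal sketch, assuming the files are ordinary PNGs with nothing appended after the final chunk (files with trailing data would be flagged by mistake); the function names are invented for the example:

```python
# Every PNG starts with this 8-byte signature...
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# ...and a complete one ends with the 12-byte IEND chunk
# (zero length, the type "IEND", and its fixed CRC).
PNG_IEND = b"\x00\x00\x00\x00IEND\xaeB`\x82"


def looks_complete_png(data):
    """True if the bytes start and end like a complete PNG file."""
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_IEND)


def truncated_pngs(paths):
    """Return the paths whose contents fail the check above."""
    bad = []
    for path in paths:
        with open(path, "rb") as handle:
            if not looks_complete_png(handle.read()):
                bad.append(path)
    return bad
```

This only catches downloads that were cut off; it does not verify that the image data in between is intact.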
rautamiekka

@MaresFillies
@BigBuggyBastage
Some problems I’ve encountered include pictures not being fully downloaded (like a png that only contains the upper half) and, when using bbolli’s python script, some images not being downloaded at all. I detailed a workaround in the other post, but basically I used shell to grab all the external media urls in the generated html files and ran wget on all of them, then manually examined the output to make sure it “makes sense” (really hard to automate that, there were many wtf moments). Once I was satisfied that all the media files that still exist online were downloaded, I ran another script to change the external http urls to local urls.
Maybe the reason I’m getting these weird errors is just that my internet connection is bad. I’m using a VPN and it sometimes gives up after I download through it too heavily.
 
That’s very likely; I’m using RipMe 1.7.72, made in Java 7+ or 8+, which made quick work outta ripping my Tumblr likes once I figured out the Tumblr app registration. I don’t have a VPN and my Internet is usually solid.
Background Pony #C38C
Does anyone know a way to mass save tumblr background/banner art and avatars? Some pics don’t seem to appear anywhere else except background/banner/avatar.
BigBuggyBastage

Go fsck yourself
@furrypony  
I second that it’s your Internet connection. If your VPN provider offers different servers, you might want to check that you’re on one ‘designated’ for the ‘intended’ usage – some VPN providers are finicky about that. For instance, some servers might be set up for torrents only, a few for general use, others primarily for media/streaming. You’ll probably want to be connected to the latter, IFF your VPN provider offers such an aminal. Check that your client software is up-to-date, as well.
 
 
@Xaxu-Slyph  
Agreed; as long as “everyone grabs something,” and there are no ‘unclaimed’ sites left, I think we should be okay as far as a ‘complete’ archive, disorganized as it might be for now (and assuming all the data is good). Redundancy is truly underrated. :)
 
As far as ‘dead’ Tumblrs, I’ve only tried messing with one so far, and got nowhere. I might have to give it another look, as time permits, but I know we’re running short on that.
 
@twkr  
I’m afraid I’m of little help here, since embedded systems are my thing; C++ is as ‘modern’ a language as I use. Most of the time, I’m doing things in C, or assembly if I need really small firmware. I’ll have a look at the code, but I doubt I can come up with anything useful in time.
Xaxu-Slyph

Joltin' Jojo
@BigBuggyBastage  
1,530 currently ripped with the python script provided. I also noticed that I could go offline, load up an HTML file and ‘view image’, and it would open the image, so one could grab something that way if they wished. I’ve had errors like everyone else, but I’m doing my best to gather as much as possible. Many, MANY of them will be single-post Ask Blogs that never got off the ground, plus a few mistyped folders. I will probably be the last one to be asked when this is all over, since I want to be one of those ‘fill in the blank spot’ types in all of this.
 
May I ask if there is a way we can do some kind of ‘restoration’ effort when this is all said and done? Like some kind of repository for everything? And if one WAS to start uploading everything they had from the beginning of an Ask Blog, but parts of it were already here, how would you go about it? If you wanted to make it linear you could add ‘Previous’ and ‘Next’ links, but if you threw a duplicate flag, how would you ensure the links to the next part of the series survive? I know comments merge, but do descriptions?
Background Pony #EC47
@zontargs  
Just found this place looking for archival efforts out there, will this save animation and such? Movie files? Flash? Archive.org has always been notoriously shitty at capturing sites exactly as they are. Unfamiliar with if they’ve improved that much the past couple of years, but it’d be best to ask before dude rules a tumblr “saved” when it’s only piecemealed. If made aware of the shortcomings early perhaps they can be fixed on the fly while this is done.
Background Pony #EC47
ALSO, here’s something VERY important if you archive a Tumblr blog on the WaybackMachine:
Do not just archive the individual pages of a Tumblr blog; that way you will not get everything. If you archive the pages, the WaybackMachine does not automatically save the pictures themselves, ONLY the thumbnails.
To preserve the pictures themselves in the WaybackMachine as well, you need to:
  1. Enlarge the picture by clicking on the thumbnail.
  2. Right-click on the enlarged picture.
  3. Copy the picture URL.
  4. Paste the picture URL into the WaybackMachine under “Save Page Now” and click on “Save Page”. Only then will you have the picture archived.
This is a VERY important thing to do. Someone archived all pages of Jan’s “Ask the Crusaders” askblog, but did not think of archiving the individual pictures, so as a result, you can’t enlarge them in the WaybackMachine archive and only have tiny thumbnails.
So make sure to archive the pictures themselves, too; only then do you have a fully functional, archived blog.
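If you have a lot of picture URLs, doing the four steps by hand gets tedious. A minimal sketch of the same idea, assuming the Wayback Machine's public "Save Page Now" address form (`https://web.archive.org/save/` followed by the URL) is all that is needed; the function name is made up, and actually requesting each address (and pacing the requests) is left out on purpose:

```python
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_urls(image_urls):
    """Turn picture URLs into 'Save Page Now' addresses, skipping blanks and repeats."""
    seen = set()
    save_urls = []
    for url in image_urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            save_urls.append(SAVE_PREFIX + url)
    return save_urls
```

Opening each resulting address in a browser has the same effect as pasting the picture URL into the form.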
 
See if you can directly access the old image URLs? Tumblr has always had a junky setup, like if you just plain close a blog but it was reblogged even once, the direct image URL still functions. I have a lot of old dead Robotech fan stuff I regularly URL upload onto forums for avatars and such. Of course, with such a massive amount of nukes coming, they might actually clear the database for the first time in a decade on the 17th.
 
@Wiimeiser  
Well, considering the sort of people they originally attracted, and their general stances/takes on mlp over the years, and their utterly selfish scorched earth approach to things… Well, it may not be intentional, but I’m sure they consider it a little bonus and will be in no rush to correct this at all.
 
@hodmann oh good, someone else is familiar with that exploitable flaw! Nice job! 👍  
Excuse me if something was already mentioned btw, I am just doing a quick skim to see if any assists or trivial knowledge were required in the last stretch. I know how much giant robot & srw crossover there is fandom-wise! Also on that note, Giant Robo got absolutely macekred because of Ginrei. (40% sfw loss) Put on some damn pants already, woman!
MaresFillies

Assistant
Twi’s Engineer Hubby
@DBot
 
Okay, done with work. :) Going to continue backing up once I drive through traffic. Haha.
 
My total size is 34 GB. So pretty close. Alright! ;) That’s just those tumblrs right? If so I must be doing something right.
twkr

@Darkenetor  
If you’re getting 403, this means that your IP is being throttled by Tumblr or you are using a bad proxy/VPN.  
Generally, blogs with status 200 are accessible, while 404 ones are completely dead and can be safely ignored. After you separate out the dead ones (I use grep and sed for this; awk or some table processor should work well too), look for blogs with redirect_url containing “www.tumblr.com”. You should be able to figure out what to do with them after that.
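The same sorting can be done without grep and sed. A rough sketch, assuming the status list is a CSV with the columns `url,status,redirects_to` (the header that shows up in the output posted in this thread); the group names and the order of the checks are choices made for this example, not part of the original script:

```python
import csv
import io


def classify(csv_text):
    """Group blog URLs by what their status line suggests."""
    groups = {"accessible": [], "dead": [], "redirected": [], "other": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = (row.get("status") or "").strip()
        target = row.get("redirects_to") or ""
        if status == "404":
            groups["dead"].append(row["url"])        # gone, safe to ignore
        elif "www.tumblr.com" in target:
            groups["redirected"].append(row["url"])  # needs a closer look
        elif status == "200":
            groups["accessible"].append(row["url"])
        else:
            groups["other"].append(row["url"])       # e.g. 403, retry later
    return groups
```

Anything that lands in `other` (a run full of 403s, for instance) probably says more about the connection than about the blogs.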
Darkenetor

@twkr  
No proxies or VPN, the test links are from your lists and work in the browser, and I haven’t downloaded anything at all from Tumblr here yet other than running that script on ~150 Tumblrs. Just checked again with a script run and in the browser, same results.
 
url,status,redirects_to  
http://4clop.tumblr.com,403,http://4clop.tumblr.com  
http://amnestie.tumblr.com,403,http://amnestie.tumblr.com  
http://albdifferent.tumblr.com,403,http://albdifferent.tumblr.com
twkr

@Darkenetor  
I ran it when I posted my reply to you, and at least for me it runs perfectly fine. Not sure why it doesn’t work for you. It’s so simple that pretty much nothing can break it.
 
UPD: it just occurred to me that I have a “white” (public) static IP, which you may not. In that case it may be that someone else (or multiple people) sharing your ISP annoyed Tumblr too much and it decided to start throttling requests.
Darkenetor

@twkr
 
Is this the right file, and what’s your OS and curl version?
 
I’m on Arch with this:  
% curl --version  
curl 7.62.0 (x86_64-pc-linux-gnu) libcurl/7.62.0 OpenSSL/1.1.1a zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) libssh2/1.8.0 nghttp2/1.34.0  
Release-Date: 2018-10-31  
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp  
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL
 
Had to replace the line terminators with just \n, but I can’t think of anything that could break because of this.
twkr

@Darkenetor
 
Server (FreeBSD):  
$ freebsd-version  
11.2-RELEASE-p1  
$ curl --version  
curl 7.62.0 (amd64-portbld-freebsd11.2) libcurl/7.62.0 OpenSSL/1.0.2o zlib/1.2.11 nghttp2/1.33.0  
Release-Date: 2018-10-31  
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smtp smtps telnet tftp  
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy
 
Desktop (macOS):  
$ sw_vers -productVersion  
10.13.6  
$ curl --version  
curl 7.63.0 (x86_64-apple-darwin17.7.0) libcurl/7.63.0 SecureTransport zlib/1.2.11  
Release-Date: 2018-12-12  
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp  
Features: AsynchDNS IPv6 Largefile NTLM NTLM_WB SSL libz UnixSockets
 
I’ve also tested it on VMs with the latest Alpine (curl 7.62) and Void (curl 7.63) and it seems to work fine, though I only tested the first 20 tumblrs from the list, as it takes ages to process the whole thing.
Xaxu-Slyph

Joltin' Jojo
Note to anyone using the Python script: you can apparently just put a space between each blog you want it to grab, and it will go until finished. So if you run C:\Python\Tumblr_backup.py 1 2 3 4, it will grab any media from blogs 1, 2, 3 and 4. I tested it a few times and it seems to be working, for any of you who just want to leave it running for a while. Be warned: some blogs reblog often and end up in the 5,000-plus range, which CAN take some time. Anthroquestria had about 20,000 if I remember, and I had one that actually got up to 60,000 (which I stopped; I’ll probably leave that one for last). I’m going for quantity, as I can grab 20 or more blogs in the time it would take to grab that one. Just a heads up.
furrypony

hopelessly sad filly
@Xaxu-Slyph  
You could also write a shell script that reads in a text file of all the tumblrs you want to grab, line by line, and runs the python script on each line.
 
#!/bin/bash
# Read blog names from a text file, one per line, and back up each in turn.
filename='tumblrs_to_download.txt'
echo Start
while IFS= read -r tumblr_name; do
    python tumblr_backup.py --save-video-tumblr --save-audio "$tumblr_name"
done < "$filename"
 
Note: please update your bbolli python script! 3 days ago it just got updated so that it can grab tumblr videos only. This saves the trouble of downloading ALL videos (including youtube ones) which was a major problem.
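For anyone without bash (Windows, for example), the same loop could be written in Python. This is only a sketch: the two flags and the file name `tumblrs_to_download.txt` come from the shell script above, while the function names, skipping lines that start with `#`, and collecting the failed blogs are additions made for the example.

```python
import subprocess


def read_blog_names(lines):
    """Keep non-empty lines; lines starting with '#' are treated as comments (an assumption)."""
    names = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


def build_command(blog, python="python", script="tumblr_backup.py"):
    """Build the same command line the shell loop runs for one blog."""
    return [python, script, "--save-video-tumblr", "--save-audio", blog]


def run_all(list_path="tumblrs_to_download.txt"):
    """Run the backup script once per blog and return the blogs that failed."""
    with open(list_path, encoding="utf-8") as handle:
        blogs = read_blog_names(handle)
    failed = []
    for blog in blogs:
        result = subprocess.run(build_command(blog))
        if result.returncode != 0:
            failed.append(blog)
    return failed
```

Calling `run_all()` from the folder that holds the script and the list starts the downloads; whatever it returns can be written back to a file and retried later.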