SHA512 checksum integrity problems?

Lapsio

I pulled the entire Derpibooru around a week ago (2 161 243 images with metadata). I'm now verifying the integrity of the data against derpicdn, and it's really confusing because around 50% of the images have a bad sha512 checksum. For example, this one:
 
https://derpibooru.org/images/2185
 
The API says it has sha512: 93f198…something…something  
But the full representation actually has sha512: c19e3d…something…something
 
And it's not like 10 images suffer from this issue, but over 1 200 000, so around half of the pictures report a bad sha. The images themselves tbh don't look corrupted, but it's quite hard to check 1 200 000 files manually to determine whether they're corrupted…
 
Am I missing something?
LemonDrop
I get the same result so something is likely fucked. As far as I can tell from the code, an image's hash is only updated occasionally, whenever its thumbnails are reprocessed, so it's possible the hash was broken before and just hasn't been recalculated since (the original sha512 hash for that old image is null, so maybe that field didn't exist long ago or something). Either way, the current code doesn't look like it calculates the hash wrongly, so maybe it only affects older images.
Rene_Z
The sha512 field is often the hash of the original image before optimization. Post related.  
I don't know if or when that changed, since they always had both the `orig_sha512_hash` and the `sha512_hash` fields, but they apparently didn't work before.  
Cloudflare also applies their own compression to images, but I don’t know if that feature is still in use.
 
The hash is used mostly to prevent duplicate uploads.
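
A minimal sketch of how a downloaded file could be checked against both hash fields mentioned above, in case one of them holds the hash of the pre-optimization upload. The field names `sha512_hash` and `orig_sha512_hash` come from this thread; the function names, the flat metadata dict, and the result labels are assumptions made for illustration.

```python
import hashlib


def file_sha512(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_hash(actual, metadata):
    """Compare a computed digest with the two hash fields of one image record.

    `metadata` is assumed to be a dict of the image's API fields; a missing
    or null field simply never matches.
    """
    if actual == metadata.get("sha512_hash"):
        return "matches sha512_hash"
    if actual == metadata.get("orig_sha512_hash"):
        return "matches orig_sha512_hash"
    return "mismatch"
```

Counting the three outcomes over the whole archive would show whether the "bad" half matches neither field or just matches the other one.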
Background Pony #E705
I did some experiments, and I see that pictures in the range November 2015 - December 2019 have this checksum mismatch problem. Later or earlier images don't have this problem.
 
 
It is possible to manually check 100-1000 random images to verify that they look OK. If a random sample of 1000 images is all OK, it gives very high confidence that more than 99% of the images are OK (a sample of only 100 gives a much weaker guarantee). This is also useful for catching many other bugs in the download process and for verifying the download more thoroughly (for example, pictures missing because of a wrong default filter, or wrong data whose hash correctly matches that wrong data).
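
To put rough numbers on the sampling argument, here is a small sketch. It assumes images are sampled uniformly at random and that the archive is large compared to the sample, so the draws can be treated as independent. Both function names are made up for this example.

```python
import random


def confidence_bad_fraction_below(sample_size, max_bad_fraction):
    """Chance that a sample of `sample_size` would contain at least one bad
    image if the true bad fraction were `max_bad_fraction`.

    If the whole sample turns out clean, this is how confident we can be
    that the real bad fraction is below `max_bad_fraction`.
    """
    return 1.0 - (1.0 - max_bad_fraction) ** sample_size


def pick_sample(image_ids, sample_size, seed=0):
    """Pick a reproducible uniform random sample of IDs to inspect by hand."""
    return random.Random(seed).sample(image_ids, sample_size)
```

Under these assumptions, a clean sample of 100 gives only about 63% confidence that fewer than 1% of images are bad, while a clean sample of 1000 gives more than 99.99%.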
Background Pony #10DD
@Background Pony #E705
 
I was just downloading by ID, not by search. I mean I was scraping all posts from 1 to 2400000 using the API, downloading everything that had the `.representations.full` JSON field available xD
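
A rough sketch of that filtering step, working on metadata that has already been fetched and parsed. Only the `representations.full` field is taken from the post above; the record layout (a dict with `id` and `representations` keys) and the function names are assumptions, and the actual API requests are left out.

```python
def full_representation(record):
    """Return the `representations.full` value of one image record, or None."""
    representations = record.get("representations") or {}
    return representations.get("full")


def downloadable(records):
    """Yield (id, full representation) for records that have that field."""
    for record in records:
        full = full_representation(record)
        if full:
            yield record.get("id"), full
```

Records without the field are skipped silently here, so counting them separately would help spot posts that were never downloaded.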