MD5 checksum to identify the same file


estancia

New Member
What are the chances that two files have the same MD5 checksum and the same file size, but different content? Do the file hosts delete by MD5?

I am also curious what kind of storage they use. Local HD storage, or some high-end storage that does de-duplication and compression? In that case, even when one file is deleted, the blocks it shares with another file might still be present on the storage until every file linked to those blocks has been deleted.
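For what it's worth, here is a minimal Python sketch of the block-level de-duplication idea described above, assuming a toy in-memory store keyed by SHA-256 (the BlockStore class and the 128 KiB block size are made up for illustration, not how any real host works). The point is just that a shared block is only freed once no remaining file references it:

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # hypothetical 128 KiB block size

class BlockStore:
    """Toy content-addressed block store with reference counting."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of files referencing that block

    def put_file(self, data: bytes):
        """Split a file into blocks, store each unique block once, return its recipe."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # new unique block
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            recipe.append(digest)
        return recipe

    def delete_file(self, recipe):
        """Drop one file's references; a block is only freed at refcount zero."""
        for digest in recipe:
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                del self.blocks[digest]
                del self.refcount[digest]
```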

Thoughts/Insights??
 
2 comments
RapidGator and MegaUpload

I do know that once an asshole successfully completes a DMCA takedown with RapidGator, they are sent a "file deleted" email that references letting RapidGator know if the file re-appears, since the MD5 hash is kept to prevent it from being re-uploaded. I've run into this myself: when I've tried to re-upload such a file, I get the error "File is blocked".
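Purely to illustrate what a blocklist like that might look like, here is a hedged Python sketch assuming the host simply keeps a set of MD5 digests from past takedowns (the BLOCKED_MD5 set and the function names are hypothetical, not RapidGator's actual code):

```python
import hashlib

# Hypothetical set of MD5 digests from past takedowns; a real host would
# presumably keep this in a database rather than in memory.
BLOCKED_MD5 = {"9e107d9d372bb6826bd81d3542a419d6"}

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large uploads never need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def accept_upload(path: str) -> bool:
    """Reject the upload if its MD5 matches a previously taken-down file."""
    return md5_of_file(path) not in BLOCKED_MD5
```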

The chances of an accidental hash collision are extremely low. I've heard that even if you took every file Google has and compared all of their MD5 checksums, you wouldn't find a single accidental collision in that data set. MD5 is a 128-bit hash, so by the birthday bound you'd need on the order of 2^64 files before an accidental collision becomes likely; in other words, this simply won't happen by chance.
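To put a rough number on that, here is a small Python sketch of the birthday-bound approximation (P ≈ n² / 2¹²⁹ for n files and a 128-bit hash); the file count used is illustrative, not anyone's measured data, and this only covers accidental collisions, since MD5 collisions can be constructed on purpose:

```python
def md5_collision_probability(n_files: float) -> float:
    """Birthday-bound approximation: P ≈ n^2 / 2^129 for n files and 128-bit hashes."""
    return n_files ** 2 / 2 ** 129

# Even a trillion files gives a vanishingly small chance of an accidental collision.
print(md5_collision_probability(1e12))  # ~1.5e-15
```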

From listening to some interviews Kim Dotcom gave about MegaUpload, it does sound like he had some sort of de-duplication in play for the storage on the 1,100 or so servers he had. My guess is that he was doing this with the ZFS file system, which uses SHA-256 rather than MD5 for its deduplication checksums. To the best of my knowledge no file host currently uses such deduplication methods, and with uploaders like me who intentionally change the checksum with every upload it would be fairly pointless to enable at this time :)
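As a quick illustration of why intentional checksum changes make hash-based dedup and blocklists pointless, appending even a single byte before uploading produces completely different MD5 and SHA-256 digests (a small Python sketch, not anything a host actually runs):

```python
import hashlib
import os

def digests(data: bytes):
    """Return both the MD5 and SHA-256 digests of the given bytes."""
    return hashlib.md5(data).hexdigest(), hashlib.sha256(data).hexdigest()

original = b"some release archive bytes"
tweaked = original + os.urandom(1)  # append one random byte before uploading

print(digests(original))
print(digests(tweaked))  # both digests come out completely different
```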
 