Status
Not open for further replies.

Tango

Moderator
RS,HF,FS,MF,UL,FN LinkBot

Since RS (Rapidshare) introduced the 25GB limit, there have been thousands of dead posts on waz-warez, so I have made a link checker (based on the dman checker).


My coding is poor, so if you want to bash it, make your own and post it, or post some improvements for it.

Checks download links from:
Rapidshare.com
Hotfile.com
Fileserve.com
Mediafire.com
Uploading.com
FileSonic.com

Uses very few resources and checks up to 80k topics per hour, depending on your server location :)


Run the script using the URL linkchecker.php?topic=1000
The number is the starting topic.
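A minimal sketch of how a script like this could read and validate the starting topic from that URL. It assumes the parameter is named topic, as in the example URL; the helper name startingTopic() and the default of 1 are hypothetical and not taken from the LinkBot download.

```php
<?php
// Hypothetical helper: reads the starting topic from a query array such as $_GET.
// Assumes the parameter is called "topic" (linkchecker.php?topic=1000).
function startingTopic(array $query, int $default = 1): int
{
    // Accept only whole numbers >= 1; anything else falls back to $default.
    $value = filter_var(
        $query['topic'] ?? '',
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    return $value === false ? $default : $value;
}

// In the script itself: $topic = startingTopic($_GET);
```

Validating the number up front means a typo in the URL starts from topic 1 instead of feeding junk into the topic query.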

[Screenshot: live links]


[Screenshot: dead links]


Download HERE (MODX format)
 
I have this working on IPB 3, but for some reason it only checks the guest area, even though I have allowed access for it.
 
Script looks really good.
I've got the PHP all configured. Now, how do I create the table?
CREATE TABLE IF NOT EXISTS `checked_topics` (
  `id` int(9) unsigned NOT NULL auto_increment,
  `topicid` varchar(255) default NULL,
  `date` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `oldforumid` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=103796 ;

Where do I put this?
 
I'm using this and it's working nicely.
The only problem I'm having is that it skips a very large number of threads, saying:
Code:
[URL="http://hd-dump.org/viewtopic.php?&t=75"][B]http://hd-dump.org/viewtopic.php?&t=75[/B][/URL]
No  Download Links or Already Binned
Although these threads clearly DO have links and haven't been binned.

Any ideas?
I think it's skipping around 80% of my threads.
Also, another question.
How can I tell the LinkBot to recheck threads that it has already checked in the past? The only way I can see is to delete the contents of the checked_topics table in MySQL.
Is there any other way?
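One possible alternative, sketched under assumptions: rather than emptying checked_topics, treat a topic as due for a recheck once its `date` value is older than some number of days. The helper isDueForRecheck() and the 30-day default are hypothetical; it assumes the `date` column comes back in MySQL's "Y-m-d H:i:s" format.

```php
<?php
// Hypothetical helper: is a topic's last check (the `date` column of
// checked_topics, "Y-m-d H:i:s") older than $maxAgeDays?
function isDueForRecheck(string $checkedAt, DateTimeImmutable $now, int $maxAgeDays = 30): bool
{
    $checked = DateTimeImmutable::createFromFormat('Y-m-d H:i:s', $checkedAt, $now->getTimezone());
    if ($checked === false) {
        return true; // unreadable timestamp: safest to check the topic again
    }
    // Anything checked before this cutoff counts as stale.
    $cutoff = $now->modify("-{$maxAgeDays} days");
    return $checked < $cutoff;
}
```

The same idea can be done directly in MySQL by deleting only the stale rows, e.g. DELETE FROM checked_topics WHERE `date` < NOW() - INTERVAL 30 DAY, so recently checked topics keep being skipped.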
Also, is there a way to set a percentage of dead links above which the topic gets trashed, like 90% or something?
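A rough sketch of what a percentage rule could look like, assuming the checker already counts dead and total links per topic. shouldBinTopic() and the 90% default are hypothetical; the comparison uses whole numbers to avoid floating-point rounding at the boundary.

```php
<?php
// Hypothetical helper: bin a topic only when at least $thresholdPercent
// of its links are dead (90 = "90% or more dead").
function shouldBinTopic(int $deadLinks, int $totalLinks, int $thresholdPercent = 90): bool
{
    if ($totalLinks <= 0) {
        return false; // no links found: leave the topic alone
    }
    // dead / total >= threshold / 100, rearranged to stay in integers
    return $deadLinks * 100 >= $thresholdPercent * $totalLinks;
}
```

With the default, a topic with 9 dead links out of 10 is binned, while one with 8 out of 10 is kept.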

Thanks
 
Make sure bots can view the posts.

Try altering these lines:
  $stag="<div class=\"postbody";
  $etag="postprofile";
Look in your page source for the bit of code just above the download links.


Try:
  $stag="<code>";
  $etag="</body>";
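To show what the $stag/$etag pair is for, here is an illustrative helper that returns the text between a start marker and the next end marker. It is a sketch of the general idea, not the LinkBot's own code; the name extractBetween() is hypothetical.

```php
<?php
// Illustrative helper: return the text between the first $stag and the
// following $etag, or null if either marker is missing from the page.
function extractBetween(string $html, string $stag, string $etag): ?string
{
    $start = strpos($html, $stag);
    if ($start === false) {
        return null; // start marker not found
    }
    $start += strlen($stag); // skip past the marker itself
    $end = strpos($html, $etag, $start);
    if ($end === false) {
        return null; // no end marker after the start marker
    }
    return substr($html, $start, $end - $start);
}
```

If the start marker never appears in the page the bot receives (for example because the theme uses different markup, or the post is hidden from guests), nothing is extracted, which is why matching $stag to your own page source matters.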
 
Thanks for the advice.
It didn't help though :(

I'm not sure it has to do with being able to view them, since it seems pretty random. It can read some threads within a forum but not others, and it doesn't matter which section (permissions) they're in.
 
Awesome script, Gav0 :) Top man, thanks for the share, bro.

Any chance you know how I can get it to post in the thread the date the topic was checked, e.g. (24/04/2010 at 9pm) or something like that? :)
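The date part of that request can be sketched with PHP's built-in date formatting. The helper checkedLabel() is hypothetical, and actually posting the label into the thread would go through the forum's own posting code, which the thread doesn't cover.

```php
<?php
// Hypothetical helper: builds a "checked on" label like "24/04/2010 at 9pm".
function checkedLabel(int $timestamp, string $tz = 'UTC'): string
{
    $dt = (new DateTimeImmutable('@' . $timestamp))
        ->setTimezone(new DateTimeZone($tz));
    // d/m/Y -> 24/04/2010, \a\t -> literal "at", g -> hour without
    // leading zero, a -> am/pm
    return $dt->format('d/m/Y \a\t ga');
}
```

Passing your board's timezone (for example 'Europe/London') as the second argument keeps the label in local time rather than UTC.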
 
Quoting the hd-dump.org post above about threads being skipped:
Try:
  $stag="<code>";
  $etag="</code>";
or:
  $stag="<div class=\"codecontent\">";
  $etag="</div>";
 
I prefer to check links through each host's API, to keep the speed up and avoid getting IP-banned :), but I will look into other hosts.


It has checked 22k topics in 1 hour :D
 

Really nice. You should tune it, make it more user-friendly and submit it to phpBB(dot)com ;)
 
I have not seen your script, but if you are relying on code tags, that may not be the best way.

Scanning the posts for the link types the APIs support (post like '%rapidshare.com/files/%' or post like '%hotfile.com/dl/%' or post like '%megaupload.com/?d=%' or post like '%filefactory.com/file/%' or post like '%freakshare.net/files/%') is a safer way of getting only the links, and then using a regex makes sure each one is a file link.
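A sketch of the regex step, assuming the SQL LIKE scan has already returned the post text. The pattern simply mirrors the LIKE strings above and is an assumption, not an official URL format for any host; extractFileLinks() is a hypothetical name.

```php
<?php
// Illustrative filter: pull out URLs matching the host paths used in the
// LIKE scan, stopping at whitespace, quotes, angle brackets or BBCode brackets.
function extractFileLinks(string $post): array
{
    $pattern = '~https?://(?:www\.)?(?:rapidshare\.com/files/|hotfile\.com/dl/|'
             . 'megaupload\.com/\?d=|filefactory\.com/file/|freakshare\.net/files/)'
             . '[^\s"\'<>\[\]]+~i';
    preg_match_all($pattern, $post, $matches);
    // Drop duplicates (the same link is often posted twice) and reindex.
    return array_values(array_unique($matches[0]));
}
```

Letting the database do the cheap LIKE filtering first means the regex only runs on posts that can contain a link at all.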

It used to take 8 hours to scan 100K posts, but now it takes 13 minutes. Gav0 is right to drop checking links with cURL one at a time.
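A sketch of the batching idea: group links by host and split each group into chunks, so one API request can cover many links instead of one cURL request per link. The batch size of 50 is an assumption (each host's API sets its own limit), and batchLinksByHost() is a hypothetical name; the actual API calls are not shown.

```php
<?php
// Hypothetical helper: group links by host, then chunk each group so one
// API request can check a whole batch.
function batchLinksByHost(array $links, int $batchSize = 50): array
{
    $byHost = [];
    foreach ($links as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // skip malformed URLs with no host
        }
        // Treat www.example.com and example.com as the same host.
        $host = preg_replace('/^www\./', '', strtolower($host));
        $byHost[$host][] = $url;
    }
    $batches = [];
    foreach ($byHost as $host => $urls) {
        // array_chunk() needs a size of at least 1.
        $batches[$host] = array_chunk($urls, max(1, $batchSize));
    }
    return $batches;
}
```

Each inner array would then be sent to that host's link-check API in a single request.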
 

The only other host I know of with an API for checking links is SendSpace, and that API is in beta. It would be really cool if the other big file hosts added an API, as it's much faster.
 