I wanted a way to monitor the web for certain terms (e.g. leaked info on a company). Specifically, I wanted to keep an array of search terms and operators to query against, and then have a nice little HTML report emailed to me. That is the reason for googs.pl.

I used the Google API to run the queries. I also (although I'm not sure how well it works) append the Google operator daterange:, which needs the Julian date, hoping to return only results that are new that day and to email me only if it finds new ones. This way I don't have to look at old stuff all the time or get tons of email. You can comment that feature out if you don't want it. To figure out the date I used the Perl module Cal::Date, which is linked in the references below. Then I just set it up as a cron job to run every day.
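
If Cal::Date is not at hand, the Julian day for the daterange: operator can also be worked out with plain arithmetic. The following is only a sketch, not how Cal::Date does it: it uses the standard integer formula for converting a Gregorian date to a Julian Day Number, and the names julian_day and with_daterange are made up for this illustration.

```perl
use strict;
use warnings;

# Julian Day Number for a Gregorian calendar date
# (standard integer formula; not taken from Cal::Date).
sub julian_day {
    my ($year, $month, $day) = @_;
    my $adj = int((14 - $month) / 12);   # 1 for Jan/Feb, otherwise 0
    my $y   = $year + 4800 - $adj;
    my $m   = $month + 12 * $adj - 3;    # March = 0 ... February = 11
    return $day
         + int((153 * $m + 2) / 5)
         + 365 * $y
         + int($y / 4) - int($y / 100) + int($y / 400)
         - 32045;
}

# Hypothetical helper: append a one-day daterange: restriction to a query,
# in the same form the script builds it.
sub with_daterange {
    my ($query, $jdn) = @_;
    return "$query + daterange:$jdn-$jdn";
}

# localtime returns a 0-based month and a year counted from 1900
my ($day, $month, $year) = (localtime)[3, 4, 5];
my $today = julian_day($year + 1900, $month + 1, $day);
print with_daterange("company + hacking", $today), "\n";
```

As a sanity check, julian_day(2000, 1, 1) gives 2451545, the Julian Day Number of January 1, 2000.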

#!/usr/bin/perl
# Devin Ertel
# googs.pl
#
 
use strict;     
use SOAP::Lite;
use MIME::Lite;
use Net::SMTP;
use Cal::Date qw(DJM MJD today);
 
#Get Today's Date
my $date = today();
 
#convert to julian
my $jul_today= DJM($date);
 
#Put Your Google API Key Here
my $google_key='your_google_key_here';
 
#Google WSDL File Location
my $google_wsdl = "./GoogleSearch.wsdl";
 
#Put queries here, escape any "'s with \"
my $query;
my @query = ("company + hacking",
             "allintext:company + hacking",
             "your queries"
             );
 
 
#assign current julian date to query
my $goog_daterange = " + daterange:".$jul_today."-".$jul_today;
 
#SOAP::Lite instance with GoogleSearch.wsdl.
my $google_soap = SOAP::Lite->service("file:$google_wsdl");
 
 
#Set Up Mail Vars
my $faddy = 'from_address@blah.com';
my $taddy = 'to_address@blah.com';
my $mail_host = 'your_mail_host';
 
my $subject = "New Information Posted!";
my $msg_body ="";
 
#It's Google Time
 
#Loop Through Array of Queries
foreach $query (@query){
 
	#add daterange: operator to current query
	my $query_date=$query.$goog_daterange;
 
	my $results = $google_soap -> 
    		doGoogleSearch(
      			$google_key, $query_date , 0, 10, "false", "",  "false",
      			"", "latin1", "latin1"
    		);
 
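	# Skip to the next query when this one returns nothing
	# (exiting here would drop results already collected from earlier queries)
	my $elements = $results ? $results->{resultElements} : undef;
	next unless $elements && @$elements;
 
	# Loop Results and Output to HTML
	foreach my $result (@$elements) {
 
	$msg_body .= "<br>".
  		      $result->{'title'}."<br>".
  		      "<a href=\"".$result->{URL}."\">".$result->{URL}."</a><br>".
  		      $result->{snippet}.
		      "<br><hr>";
 
	}
}
 
#Only send mail if at least one query found something
exit unless length $msg_body;
 
#Setup Message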
 
my $msg=MIME::Lite->new (
        From => $faddy,
        To => $taddy,
        Subject => $subject,
	Type => 'TEXT/HTML',
	Encoding => 'quoted-printable',
	Data => $msg_body,
)       or die "Could Not Create Msg: $!\n";
 
 
#Send Message
MIME::Lite->send('smtp', $mail_host, Timeout=>60);
$msg->send;
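
For the daily run, a crontab entry along these lines should work. The schedule (7:00 every morning) and the directory are placeholders, not values from my setup. The cd is there because the script loads ./GoogleSearch.wsdl by relative path, so it has to start in the directory that holds the WSDL file.

```
# min hour day-of-month month day-of-week  command
# /path/to/googs is a placeholder for wherever googs.pl and GoogleSearch.wsdl live
0 7 * * * cd /path/to/googs && /usr/bin/perl googs.pl
```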

References:
http://freshmeat.net/projects/caldate/
http://www.google.com/apis/
http://search.cpan.org/~yves/MIME-Lite-3.01/lib/MIME/Lite.pm
