dilbertone Posted February 25, 2011 Share Posted February 25, 2011 good day, hello dear community! i am currently ironing out a little parser-script. I have some bits - but now i need to have some improved spider-logic. See the target-url http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50 This page has got more than 6000 results! Well how do i get all the results? I tried out several things - but i dont helped. I allways got bad results. See i have good csv-data - but unfortunatley no spider logic... I need some bits to get there! How to get there!? I use the module LWP::simple and i need to have some improved arguments that i can use in order to get all the 6150 records #!/usr/bin/perl use warnings; use strict; use LWP::Simple; use HTML::TableExtract; use Text::CSV; my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50'; $html =~ tr/r//d; # strip the carriage returns $html =~ s/ / /g; # expand the spaces my $te = new HTML::TableExtract(); $te->parse($html); my @cols = qw( rownum number name phone type website ); my @fields = qw( rownum number name street postal town phone fax type website ); my $csv = Text::CSV->new({ binary => 1 }); foreach my $ts ($te->table_states) { foreach my $row ($ts->rows) { trim leading/trailing whitespace from base fields s/^s+//, s/\s+$// for @$row; load the fields into the hash using a "hash slice" my %h; @h{@cols} = @$row; derive some fields from base fields, again using a hash slice @h{qw/name street postal town/} = split /n+/, $h{name}; @h{qw/phone fax/} = split /n+/, $h{phone}; trim leading/trailing whitespace from derived fields s/^s+//, s/\s+$// for @h{qw/name street postal town/}; $csv->combine(@h{@fields}); print $csv->string, "\n"; } } Well - with this i have a good csv-output:- but unfortunatley no spider logic. How to add the spider-logic here... !? well i need some help Love to hear from you Quote Link to comment Share on other sites More sharing options...
dilbertone Posted February 25, 2011 Author Share Posted February 25, 2011 good evening - here i am back again!! i run into troubles... i guess that i have made some mistakes while applying some code in the above mentioned script.... #!/usr/bin/perl use warnings; use strict; use LWP::Simple; use HTML::TableExtract; use Text::CSV; my $i_first = "0"; my $i_last = "6100"; my $i_interval = "50"; for (my $i = $i_first; $i <= $i_last; $i += $i_interval) { my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i"; #process pageurl } my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50'; $html =~ tr/r//d; # strip the carriage returns $html =~ s/ / /g; # expand the spaces my $te = new HTML::TableExtract(); $te->parse($html); my @cols = qw( rownum number name phone type website ); my @fields = qw( rownum number name street postal town phone fax type website ); my $csv = Text::CSV->new({ binary => 1 }); foreach my $ts ($te->table_states) { foreach my $row ($ts->rows) { trim leading/trailing whitespace from base fields s/^s+//, s/\s+$// for @$row; load the fields into the hash using a "hash slice" my %h; @h{@cols} = @$row; derive some fields from base fields, again using a hash slice @h{qw/name street postal town/} = split /n+/, $h{name}; @h{qw/phone fax/} = split /n+/, $h{phone}; trim leading/trailing whitespace from derived fields s/^s+//, s/\s+$// for @h{qw/name street postal town/}; $csv->combine(@h{@fields}); print $csv->string, "\n"; } } there have been some issues - i have made a mistake i guess that the error is here: for (my $i = $i_first; $i <= $i_last; $i += $i_interval) { my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i"; #process pageurl } my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50'; $html =~ tr/r//d; # strip the carriage returns $html =~ s/ / /g; # expand the spaces i have written down some kind of double - code. I need to leave out one part ... this one here; What do you think about this!? my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50'; $html =~ tr/r//d; # strip the carriage returns $html =~ s/ / /g; # expand the spaces I get these kind of errors - it looks very very nasty! martin@suse-linux:~> cd perl martin@suse-linux:~/perl> perl bavaria_all_.pl Possible unintended interpolation of %h in string at bavaria_all_.pl line 52. Possible unintended interpolation of %h in string at bavaria_all_.pl line 52. Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52. Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52. syntax error at bavaria_all_.pl line 59, near "/," Global symbol "%h" requires explicit package name at bavaria_all_.pl line 59. Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60. Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60. Substitution replacement not terminated at bavaria_all_.pl line 63. martin@suse-linux:~/perl> what do you think!? i look forward to hear from you! Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.