dilbertone Posted February 26, 2011 Share Posted February 26, 2011 good evening dear community! Howdy, at the moment i am debugging some lines of code... purpose: i want to process multiple webpages, kind of like a web spider/crawler might. I have some bits - but now i need to have some improved spider-logic. See the target-url http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50 This page has got more than 6000 results! Well how do i get all the results? I use the module LWP::simple and i need to have some improved arguments that i can use in order to get all the 6150 records Attempt: Here are the first 5 page URLs: http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=0 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=50 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=100 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=150 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=200 We can see that the "s" attribute in the URL starts at 0 for page 1, then increases by 50 for each page there after. We can use this information to create a loop: #!/usr/bin/perl use warnings; use strict; use LWP::Simple; use HTML::TableExtract; use Text::CSV; my @cols = qw( rownum number name phone type website ); my @fields = qw( rownum number name street postal town phone fax type website ); my $i_first = "0"; my $i_last = "6100"; my $i_interval = "50"; for (my $i = $i_first; $i <= $i_last; $i += $i_interval) { my $html = get("http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i"); $html =~ tr/r//d; # strip the carriage returns $html =~ s/ / /g; # expand the spaces my $te = new HTML::TableExtract(); $te->parse($html); my $csv = Text::CSV->new({ binary => 1 }); foreach my $ts ($te->table_states) { foreach my $row ($ts->rows) { #trim leading/trailing whitespace from base fields s/^s+//, s/\s+$// for @$row; #load the fields into the hash using a "hash slice" my %h; @h{@cols} = @$row; #derive some fields from base fields, again using a hash slice @h{qw/name street postal town/} = split /n+/, $h{name}; @h{qw/phone fax/} = split /n+/, $h{phone}; #trim leading/trailing whitespace from derived fields s/^s+//, s/\s+$// for @h{qw/name street postal town/}; $csv->combine(@h{@fields}); print $csv->string, "\n"; } } } i tested the code and get the following results: .- see below - the error message shown in the command line... btw: here the lines 57 and 58: #trim leading/trailing whitespace from derived fields s/^s+//, s/\s+$// for @h{qw/name street postal town/}; what do you think? Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.