Tuesday, February 7, 2017

Fetch data and build an R statement to load daily data (Perl)


  1. The data comes as "[[2017-01-01,54],[2017-01-02,79],<SKIP>,[2017-02-01,160]]".
  2. It starts and ends with two brackets.
  3. Each day's data is a pair of a date and the number of incidents; pairs are separated by "],[".
  4. Execute the wget command.
  5. Open the retrieved file.
  6. Remove the unnecessary brackets and separators, and store the data in the array @data.
  7. Split each @data entry into its date and its count.
  8. Construct the R statement to be loaded, which is made up of the start date, the end date and the number of incidents for each day.
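
Steps 6 to 8 can be tried on the sample string from step 1 without any network access. The following is a minimal sketch, not the script itself: build_r_statement is a hypothetical helper name, and it uses a regex to pick out each pair instead of cutting the brackets off at fixed offsets, so a trailing newline in the input does not matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_r_statement is a hypothetical helper, used here for illustration only.
# It takes the raw "[[date,count],...]" string and returns the R statement.
sub build_r_statement {
    my ($raw) = @_;
    # capture every "date,count" pair as a flat list (date1, count1, date2, ...)
    my @pairs = $raw =~ /\[(\d{4}-\d{2}-\d{2}),(\d+)\]/g;
    die "no data found" unless @pairs;
    my (@dates, @counts);
    while (my ($date, $count) = splice(@pairs, 0, 2)) {
        push @dates,  $date;
        push @counts, $count;
    }
    # same statement layout as the script: counts, then a daily date sequence
    return sprintf('w <- c(); w <- append(w,xts(c(%s),seq(as.Date("%s"),as.Date("%s"),1)))',
        join(',', @counts), $dates[0], $dates[-1]);
}

# the first two pairs of the sample data, with a trailing newline
print build_r_statement("[[2017-01-01,54],[2017-01-02,79]]\n"), "\n";
```

Like the script below, this rebuilds the date index from the first and last dates only, so it assumes one entry per day with no gaps.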



#! /usr/bin/perl

# $file="./testdata";
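# stop here if the download fails, instead of parsing a stale or missing file.
system("wget http://10.251.66.58/kljstatistics/php/dashboard/maindashboard/dailynew2016.php") == 0
  or die "wget failed: $?";
$file="./dailynew2016.php";
$outfile="./dataget.r";

# 3-argument open keeps the mode separate from the filename.
open(IN,'<',$file) or die "$file: $!";
open(OUT,'>',$outfile) or die "$outfile: $!";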
print("# start\n");
while(<IN>){
 # data is expected to come in a single line.
 # drop a trailing newline, if any, so the last 2 characters really are "]]".
 chomp;
 # remove the 2 brackets at the start and the end of the input.
 $buff = substr($_,2);
 $buff = substr($buff,0,length($buff)-2);
 # split data at the sequence of "],["
 @data = split(/\],\[/, $buff);
}
# get the length of the list.
$size = @data;
print "# file read end. the size of data is $size\n\n\n";

$count = 0;
# prepare start of the output statement.
$output = "w <- c(); w <- append(w,xts(c(";
@xts = split(/,/,$data[0]); # take 1st element
$startdate = $xts[0];  # and store its date part
# print "startdate is ",$startdate; # debug purpose
foreach $element(@data)  # walk through @data from the start again.
{
  @xts = split(/,/,$element); # split at ','
  $output =$output.$xts[1].","; # append the number of incidents and a comma
}

# remove the trailing comma.
$output = substr($output,0,length($output)-1);
# add date sequence with start date and end date. end date comes from the last data element.
$output = $output."),seq(as.Date(\"$startdate\"),as.Date(\"$xts[0]\"),1)))";
print OUT $output;
print OUT "\n######### end date is ",$xts[0]," #####################\n";
print OUT "inc_daily_xts <- w\n";
print OUT "inc_daily <- as.numeric(w)\n";
print "\n\n\n";


$len = scalar(@data);
print "# entry no. $len is $data[$len-1]\n";
# remove the input file. don't forget this; otherwise the next wget run saves the data under a different filename (dailynew2016.php.1).
system("rm dailynew2016.php");
print("# from $startdate until $xts[0].\n");
print "# the end of the process.\n";

close(IN);
close(OUT);
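
The rm step is needed only because wget adds a numeric suffix when the target file already exists. One possible alternative, sketched here on the assumption that GNU wget is installed, is to pass -O so that every run writes to the same file. fetch_command is a hypothetical helper name, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fetch_command is a hypothetical helper: it only builds the argument list.
sub fetch_command {
    my ($url, $file) = @_;
    # -q: quiet, -O: write the document to $file, replacing any old copy
    return ('wget', '-q', '-O', $file, $url);
}

my @cmd = fetch_command(
    'http://10.251.66.58/kljstatistics/php/dashboard/maindashboard/dailynew2016.php',
    './dailynew2016.php');
print join(' ', @cmd), "\n";

# To actually download, use the list form of system, which bypasses the shell:
# system(@cmd) == 0 or die "wget failed: $?";
```

With the output file fixed this way, a leftover copy from an earlier run is overwritten rather than causing a renamed download.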
