Friday, August 19, 2016

Revisiting Perl


CAUTION!

Perl list indices start from ZERO, so $some_list[size_of_list] returns nothing (undef). To access the last element, use $some_list[size_of_list - 1].

This is not R, where indices start from 1.
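A minimal sketch of the indexing rule above. The list contents and variable names are made up for illustration. It also shows two standard Perl shortcuts for the last element: index -1 and $#some_list.

```perl
use strict;
use warnings;

# Made-up sample list, for illustration only.
my @some_list = ('a', 'b', 'c');
my $size = scalar(@some_list);

# Index $size is one past the end, so this is undef.
my $past_end = $some_list[$size];
# The last element sits at index $size - 1.
my $last = $some_list[$size - 1];
# Perl also accepts -1 for the last element; $#some_list is the last index.
my $also_last  = $some_list[-1];
my $last_index = $#some_list;

print "size: $size\n";
print "past the end: ", (defined $past_end ? $past_end : "undef"), "\n";
print "last: $last, also last: $also_last, last index: $last_index\n";
```

Reading one past the end does not grow the list; it just yields undef.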

The input file format is as follows.

[[2016-01-01,111],[2016-01-02,222], <skip>,[2016-09-03,999]]


  • The content comes as a single line.
  • Remove the 2 brackets at the beginning and at the end of the line.
  • Split the line at "],[", so that each entry contains a pair of a date and its data.
  • Split each entry at ",". Do this from the first entry to the last.
  • Save the date of the first entry.
  • Concatenate the retrieved data with ",". This builds the sequence that starts with the data of the first date and runs to the data of the last date.
  • Use the dates of the first and the last entries to create the necessary date sequence, such as "seq(as.Date(\"start_date\"),as.Date(\"end_date\"),1)"
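The steps above can be sketched on a short sample line as follows. This is only an illustration: build_r_statement is a hypothetical helper name, the three-entry sample is made up, and map/join is used here in place of an explicit loop.

```perl
use strict;
use warnings;

# Hypothetical helper: turns one input line into the R statement.
sub build_r_statement {
    my ($line) = @_;
    chomp $line;
    # Remove the 2 brackets at the beginning and at the end.
    my $body = substr($line, 2, length($line) - 4);
    # Split at "],[" so that each entry is "date,value".
    my @entries = split /\],\[/, $body;
    # Split each entry at "," into a [date, value] pair.
    my @pairs = map { [ split /,/, $_ ] } @entries;
    # The first and the last dates delimit the date sequence.
    my $start_date = $pairs[0][0];
    my $end_date   = $pairs[-1][0];
    # Concatenate the values with ",".
    my $values = join ",", map { $_->[1] } @pairs;
    return "w <- c(); w <- append(w,xts(c($values),"
         . "seq(as.Date(\"$start_date\"),as.Date(\"$end_date\"),1)))";
}

# Made-up sample in the input format, with consecutive dates.
my $sample = "[[2016-01-01,111],[2016-01-02,222],[2016-01-03,333]]";
my $statement = build_r_statement($sample);
print "$statement\n";
```

For this sample the printed line is w <- c(); w <- append(w,xts(c(111,222,333),seq(as.Date("2016-01-01"),as.Date("2016-01-03"),1))). Note that the daily seq() only lines up with the values when the input has exactly one entry per day.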


#! /usr/bin/perl

# $file="./testdata";
system("wget http://10.251.66.58/kljstatistics/php/dashboard/maindashboard/dailynew2016.php");
$file="./dailynew2016.php";
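$outfile="./dataget.r";

# use 3-argument open so that the mode is not hidden inside the filename.
open(IN, '<', $file) or die "cannot open $file: $!";
open(OUT, '>', $outfile) or die "cannot open $outfile: $!";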
print("# start\n");
while(<IN>){
# data is expected to come in a single line.
# drop the trailing newline, if any; otherwise one closing bracket would survive.
chomp;
# remove 2 brackets at the start and the end of the input.
$buff = substr($_,2);
$buff = substr($buff,0,length($buff)-2);
# print $buff; # for debug only
# split data at the sequence of "],["
@data = split(/\],\[/, $buff);
}
# get the length of the list.
$size = @data;
print "# file read end. the size of data is $size\n\n\n";


$count = 0;
# continue process from the start till the end of the list
# prepare start of the output statement.
$output = "w <- c(); w <- append(w,xts(c(";
while($count < $size){
# break up the element by comma.
@xts = split(/,/,$data[$count]);
# for the first element of the list.
if($count == 0){
# save start date for the later process.
$startdate = $xts[0];
}
# concatenate incident number with comma.
$output =$output.$xts[1].",";
$count++;
}
# remove comma at the end of $output
$output = substr($output,0,length($output)-1);
# add the date sequence from the start date to the end date.
# @xts still holds the last entry after the loop, so $xts[0] is the end date.
$output = $output."),seq(as.Date(\"$startdate\"),as.Date(\"$xts[0]\"),1)))";
print OUT $output;
print "\n\n\n";


$len = scalar(@data);
print "# $len th entry is $data[$len-1]\n";
# remove the input file. don't forget this; otherwise the next wget run stores its data under a different filename (dailynew2016.php.1).
unlink("dailynew2016.php");
print("# from $startdate till $xts[0].\n");
print "# the end of the process.\n";

close(IN);
close(OUT);
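The script deletes the downloaded file because a second wget run would otherwise save its data under a different name. As an alternative sketch, wget's -O option writes to a fixed filename and replaces any old copy, so a leftover file does no harm. wget_command is a hypothetical helper name; the system() call is left commented out because it needs network access.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the wget argument list.
sub wget_command {
    my ($url, $file) = @_;
    # -q keeps wget quiet; -O writes to $file, replacing any old copy.
    return ('wget', '-q', '-O', $file, $url);
}

my @cmd = wget_command(
    'http://10.251.66.58/kljstatistics/php/dashboard/maindashboard/dailynew2016.php',
    './dailynew2016.php',
);
print "@cmd\n";

# To actually run it (the list form of system() bypasses the shell):
# system(@cmd) == 0 or die "wget failed: $?";
```

Checking the return value of system() also stops the script before it tries to parse a file that was never downloaded.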
