Monday, February 27, 2017

Edit CSV with Perl - setting environment parameters -

Don't forget to set the environment parameters below in order to use Perl's "CSV_XS" library. They might already be included in ".bashrc"; please check your environment first.

PATH, PERL5LIB, PERL_LOCAL_LIB_ROOT, PERL_MB_OPT, PERL_MM_OPT



PATH="<some directry>/perl5/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="<some directry>/perl5/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="<some directry>/perl5${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"<some directry>/perl5\""; export PERL_MB_OPT;
PERL_MM_OPT="INSTALL_BASE=<some directry>/perl5"; export PERL_MM_OPT;

Saturday, February 25, 2017

Get JGB historical data from MOF - Part 2.




  1. Updated the previous version to use foreach, join, push, and regular expressions. This is more like real Perl.
  2. Using "foreach" instead of "for", the script no longer depends on a fixed-length list.
  3. With "push" and "join", the code is more readable.
  4. Prepared the condition clause for the era after Heisei.



system("wget http://www.mof.go.jp/jgbs/reference/interest_rate/data/jgbcm_all.csv");
$file="./jgbcm_all.csv";
$outfile="> ./jgb_seireki.csv";

open(IN,$file) or die "$!";
open(OUT,$outfile) or die "$!";
print("# start\n");
print OUT "Date,oneY,twoY,threeY,fourY,fiveY,sixY,sevenY,eightY,nineY\n";
$line_num = 0;
while(<IN>){
  $line_num++;           # increment the counter before "next"; otherwise $line_num would never advance and every line would be skipped.
  next if($line_num < 3); # skip the iteration while the counter is less than 3, i.e. skip the 1st and 2nd (header) lines.
  $gengou = substr($_,0,1);   # pick up the first character of the line and store into $gengou.
  @buff = split(/,/, $_); # split the line by comma.
  @data = split(/\./, $buff[0]); # split the first field by period.
  $year = substr($data[0],1,length($data[0])); # pick up the numerical part of wareki data.
  if($gengou eq 'S'){  # if the first character = S, the data belongs to showa.
    $year=$year+1925;  # add 1925 to adjust as showa period started in 1926.
  }
  if($gengou eq 'H'){  # for the case of heisei until 2019.
    $year=$year+1988;  # adjust for heisei period.
  }
  # if($gengou eq 'Z'){  # for the case of the era after Heisei. change 'Z' to the appropriate era initial.
  #   $year=$year+2018;  # adjust, assuming the new era's first year corresponds to 2019.
  # }
  $outbuff = ""; # initialize buffer to construct output
  my @interest = (); # the list to store zero padded interest rate
  foreach $i(@buff){  # pick up each element
    if($i !~ /^[A-Z]/ and $i =~ /[0-9]/){ # if the element is numeric (i.e. not the era-prefixed date field).
      push(@interest,sprintf("%.3f", $i));  # pad to three decimal places and push onto the array
    }else{
      next; # otherwise jump to the next element
    }
  }
  $outbuff = join(",",@interest);  # join the array elements with commas.
  print OUT  "$year-$data[1]-$data[2],$outbuff\n";
}
###

close(IN);
close(OUT);
system("rm jgbcm_all.csv")

Friday, February 24, 2017

Get JGB historical data from MOF

Retrieve the interest-rate data for Japanese government bonds (JGB) from the Ministry of Finance site and convert it so that it can be handled in xts format. That is, the date in the first field, which comes in the Japanese calendar (wareki, where S stands for Showa and H for Heisei), is converted to the Western calendar, and the interest-rate data in the second and subsequent fields are converted to a fixed format with three decimal places.


  1. The data comes in CSV format; the filename is jgbcm_all.csv.
  2. Execute the wget command.
  3. Skip the 1st and 2nd lines.
  4. Convert the data (see 7.).
  5. From R, execute system("perl jgb.pl"), then load the result with jgb_xts <- as.xts(read.zoo(read.csv("jgb_seireki.csv"))) (see the snippet after the script below).
  6. Don't forget to remove jgbcm_all.csv!
  7. The data comes as "S49.10.3,10.388,9.378,8.839,8.520,8.354,8.298,8.244,8.120,8.203".
    1. The 1st field is the date. S49.10.3 must be converted to 1974-10-3.
    2. H stands for Heisei, S for Showa.
    3. The other fields contain the interest-rate data, which must be zero padded.




system("wget http://www.mof.go.jp/jgbs/reference/interest_rate/data/jgbcm_all.csv ");
$file="./jgbcm_all.csv";
$outfile="> ./jgb_seireki.csv";

open(IN,$file) or die "$!";
open(OUT,$outfile) or die "$!";
print("# start\n");
print OUT "Date,oneY,twoY,threeY,fourY,fiveY,sixY,sevenY,eightY,nineY\n";  # output head line
$line_num = 0;
while(<IN>){
  if($line_num > 1){  # skip 1st and 2nd lines.

    $gengou = substr($_,0,1);   # pick up the first character of the line.
    @buff = split(/,/, $_); # split the line by comma.
    @data = split(/\./, $buff[0]); # split the first field by period.
    $year = substr($data[0],1,length($data[0])); # pick up the numerical part of the wareki date.
    # print OUT "$gengou";
    if($gengou eq 'S'){  # if the first character = S, the data belongs to Showa.
      $year=$year+1925;  # add 1925 to adjust, as the Showa period started in 1926.
    }
    else{  # just for the case of Heisei, valid until 2019.
      $year=$year+1988;  # adjust for the Heisei period.
    }
    for ($count = 1; $count < 10; $count++){
      $interest[$count] = sprintf("%.3f", $buff[$count]);  # zero padding each interest data.
    }
    $output = $interest[1].",".$interest[2].",".$interest[3].",".$interest[4].",".$interest[5].",".$interest[6].",".$interest[7].",".$interest[8].",".$interest[9];

    print OUT   "$year-$data[1]-$data[2],$output\n";
  }
    #  print OUT "$year-$data[1]-$data[2],$buff[2],$buff[9]\n";}
  $line_num++;
 # print
}
close(IN);
close(OUT);
system("rm jgbcm_all.csv")


Friday, February 17, 2017

Manipulate columns

When day_xts has multiple columns and you want to delete or pick up one of them, writing a simple condition clause is enough to do the job.
However, when you need to deal with more than one column at once, that approach doesn't work and you need another solution.


> head(day_xts)
           new_call new_mail abandon inc
2013-01-01       35       28    12.9  63
2013-01-02       55       21    31.1  76
2013-01-03       86       20    76.7 106


> head(day_xts[,colnames(day_xts) != "inc"])
           new_call new_mail abandon
2013-01-01       35       28    12.9
2013-01-02       55       21    31.1
2013-01-03       86       20    76.7


Using "grep" with regular expression picks up columns either "inc" or "abandon".

> head(day_xts[,grep('inc|abandon',colnames(day_xts))])
           abandon inc
2013-01-01    12.9  63
2013-01-02    31.1  76
2013-01-03    76.7 106


"invert" argument does reverse the result. The columns which are NOT designated in "grep" will be chosen.

> head(day_xts[,grep('inc|abandon',colnames(day_xts),invert=T)])
           new_call new_mail
2013-01-01       35       28
2013-01-02       55       21
2013-01-03       86       20
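
When the column names are known exactly, passing a character vector of names is another option and can read more clearly than a regular expression. A minimal sketch using the columns above (output omitted):

head(day_xts[, c("new_call", "new_mail")])  # pick up several columns by name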

Friday, February 10, 2017

VAR - 1992::2016-12-31


Prepare the data for the extended period.

kikan <- c("1992::2016-12-31")
v_GPC_q_1992_2016 <- merge(GDPC96[kikan],to.quarterly(UNDCONTSA[kikan])[,4])
v_GPC_q_1992_2016 <- merge(v_GPC_q_1992_2016,to.quarterly(PAYEMS[kikan])[,4])
names(v_GPC_q_1992_2016)[2] <- "UNDCONTSA"
names(v_GPC_q_1992_2016)[3] <- "PAYEMS"

> VARselect(v_GPC_q_1992_2016)
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
    10      2      2     10 

$criteria
                  1            2            3            4            5            6            7            8            9           10
AIC(n) 2.785261e+01 2.545883e+01 2.543818e+01 2.542177e+01 2.544280e+01 2.546334e+01 2.552722e+01 2.558417e+01 2.532732e+01 2.523473e+01
HQ(n)  2.798702e+01 2.569405e+01 2.577421e+01 2.585860e+01 2.598043e+01 2.610179e+01 2.626647e+01 2.642423e+01 2.626818e+01 2.627640e+01
SC(n)  2.818592e+01 2.604212e+01 2.627145e+01 2.650502e+01 2.677603e+01 2.704655e+01 2.736041e+01 2.766735e+01 2.766047e+01 2.781786e+01
FPE(n) 1.248282e+12 1.140354e+11 1.119086e+11 1.104548e+11 1.134074e+11 1.166774e+11 1.257451e+11 1.350810e+11 1.064882e+11 9.943996e+10

This will give the forecast from now to 16 quarters later.

predict(VAR(v_GPC_q_1992_2016, lag.max = 10), n.ahead = 16)
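
To keep the result around for plotting or inspection, the forecast can be stored in a variable. A minimal sketch, assuming the vars package is loaded and that the first column of the merged series kept the name GDPC96:

fcst <- predict(VAR(v_GPC_q_1992_2016, lag.max = 10), n.ahead = 16)
plot(fcst)        # point forecasts with confidence bands for each series
fcst$fcst$GDPC96  # forecast matrix for GDPC96: fcst, lower, upper, CI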

Tuesday, February 7, 2017

Get data and construct R statements to load daily data - Perl


  1. data comes as "[[2017-01-01,54],[2017-01-02,79],<SKIP>,[2017-02-01,160]]".
  2. starts and ends with 2 brackets.
  3. each day's data is a pair of date and # of incident separated by "],[".
  4. execute wget command.
  5. open retrieved file.
  6. remove unnecessary brackets and separtor. store data into array @data.
  7. disassemble each @data entry to pick up date and data.
  8. construct R sentences to input, which is made up from start date, end date and # of incident of each day.



#! /usr/bin/perl

# $file="./testdata";
system("wget http://10.251.66.58/kljstatistics/php/dashboard/maindashboard/dailynew2016.php");
$file="./dailynew2016.php";
$outfile="> ./dataget.r";

open(IN,$file) or die "$!";
open(OUT,$outfile) or die "$!";
print("# start\n");
while(<IN>){
 # data is expected to come in a single line.
 chomp;  # remove a possible trailing newline so the closing brackets are stripped correctly.
 # remove the 2 brackets at the start and the end of the input.
 $buff = substr($_,2,length($_));
 $buff = substr($buff,0,length($buff)-2);
 # split data at the sequence of "],["
 @data = split(/\],\[/, $buff);
}
print "# file read end. the size of data  is ";
# get the length of the list.
$size = @data;
print "# ";
print $size;
print "\n\n\n";

$count = 0;
# prepare start of the output statement.
$output = "w <- c(); w <- append(w,xts(c(";
@xts = split(/,/,$data[0]); # take 1st element
$startdate = $xts[0];  # and store its date part
# print "startdate is ",$startdate; # debug purpose
foreach $element(@data)  # disassemble @data from start again.
{
  @xts = split(/,/,$element); #split at ','
  $output =$output.$xts[1].","; # store # of incident and concatenate
}

$output = substr($output,0,length($output)-1);
# add date sequence with start date and end date. end date comes from the last data element.
$output = $output."),seq(as.Date(\"$startdate\"),as.Date(\"$xts[0]\"),1)))";
print OUT $output;
print OUT "\n######### end date is ",$xts[0]," #####################\n";
print OUT "inc_daily_xts <- w\n";
print OUT "inc_daily <- as.numeric(w)\n";
print "\n\n\n";


$len = scalar(@data);
print "# the ${len}th entry is $data[$len-1]\n";
# remove input file. don't forget this otherwise wget stores data in a different filename.
system("rm dailynew2016.php");
print("# from $startdate til $xts[0].\n");
print "# the end of the process.\n";

close(IN);
close(OUT);
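
The generated dataget.r can then be pulled into an R session. A minimal sketch, assuming the script above is saved as dataget.pl (a hypothetical filename) and that the xts package is already loaded:

system("perl dataget.pl")  # hypothetical filename for the Perl script above; writes ./dataget.r
source("dataget.r")        # defines w, inc_daily_xts and inc_daily
head(inc_daily_xts)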