Archive for March, 2010

Analyze billions of data records

Monday, March 22nd, 2010

I am using MapReduce and Hive (on Amazon EC2) to analyze large data sets.

Early Fraud Detection System

Monday, March 22nd, 2010

Traditional fraud prevention methods try to detect fraud when it happens. By exploring the unusually behavior patterns long before when fraud occurs, we are able to proactively prevent fraud with minimum impact to good customers.

SAS vs. Oracle SQL for data preparation?

Friday, March 5th, 2010

As we know, in a data analysis project we spend most of time preparing the data. I have found that using Oracle SQL for data preparation is handy. For example, to calculate the moving average for each patient, it takes more than 15 lines of SAS statements. When using Oracle SQL to do the same thing, it is simply a call of avg function, “avg(num) over(partition by patient order by month rows between 4 preceding and current row)”. Please see the scripts below. Could you share you experience in this area?

SAS scripts: Calculating moving average
proc sort data=ds1;
by patient;
run;

%let n = 4;

data ds2;
set ds1;
by patient;
retain num_sum 0;
if first.patient then do;
count=0;
num_sum=0;
end;
count+1;
last&n=lag&n(num);
if count gt &n then num_sum=sum(num_sum,num,-last&n);
else num_sum=sum(num_sum,num);
if count ge &n then mov_aver=num_sum/&n;
else mov_aver=.;
run;

Oracle SQL: Calculating moving average

create table ds2 as select a.*, avg(num) over(partition by patient order by month rows between 4 preceding and current row) mov_aver from ds1 a;

We provide advanced Oracle SQL training for SAS programmers. For details, please contact me at jzhou@businessdataminers.com. We believe Oracle SQL skill could really make the data work more productive and manageable.