Interestingly, I just found that the most searched keyword is PROC SQL, through the traffic analysis of my tiny blog. The reason possibly is: nowadays everybody knows SQL, more or less; then someone can do some parts of the SAS job by PROC SQL without using any procedure or DATA step.
As a dialect of SQL, PROC SQL can really play a lot of tricks in SAS, which are mostly carried out by DATA step or PROCs. Here I summarize the 10 interesting ones among them. Let's start with the free SAS help dataset SASHELP.CLASS. This dataset contains the weight, height, sex and other information of 19 fake teenagers. In this demo, I primarily paly with the WEIGHT variable from it.
obs = _n_;
1. Calculate the median of a variable
This is the task usually done by PROC MEANS. With the aggregating HAVING clause, PROC SQL can fulfill this purpose as well.
from (select e.weight
from class e, class d
group by e.weight
having sum(case when e.weight = d.weight then 1 else 0 end)
ge abs(sum(sign(e.weight - d.weight))));
2. Draw a horizontal bar chart
It is a substitute in case that we want to have a look at the distribution of a variable but don’t know PROC GPLOT or other plotting procedures.
select age, '|',
repeat('*',count(*)*4) as frequency
group by age
order by age;
3. Return the running total for a variable
DATA step and PROC IML both can get this job done, like what I have shown at a previous post. PROC SQL is the 3rd choice, just with the little help of a subquery at the SELECT clause.
select name, weight,
(select sum(a.weight) from class as a where a.obs <= b.obs) as running_total
from class as b;
4. Report data with subtotal
The first thing in my mind is the powerful REPORT procedure. However, PROC SQL is an alternative with the set operator.
select name, weight
select 'Total', sum(weight)
5. Find the column information from metadata
SAS stores the metadata at its DICTIONARY datasets. PROC SQL can visit the column information easily, without using the CONTENTS procedure.
select name, type, varnum
where libname = 'WORK' and memname = 'CLASS';
6. Rank a variable
PROC RANK is a really handy tool for this functionality. Besides it, PROC SQL can do some simple ranking as well.
select name, a.weight, (select count(distinct b.weight)
from class b
where b.weight <= a.weight) as rank
from class a;
7. Random sampling
PROC SURVEYSELECT involves with too much statistics. If we just need a simple random sampling, PROC SQL is a more popular option. For example, if I want to pick out 8 observations, I can use PROC SQL like:
proc sql outobs = 8;
order by ranuni(1234);
8. Replicate a data set without data
In PROC SQL, it is a fairly simple one-sentence statement to create an empty data set. Not sure DATA step can do it.DATA step also has a way to do it.
create table class2 like class;
9. Transpose data
DATA step ARRAY is a big headache for most people. Suppose that we want to list the names of the teenagers by their genders, PROC SQL has a way to do it, although it may require some complicated queries/subqueries.
select max(case when sex='F'
then name else ' ' end) as Female,
max(case when sex='M'
then name else ' ' end) as Male
from (select e.sex,
(select count(*) from class d
where e.sex=d.sex and e.obs < d.obs) as level
from class e)
group by level;
10. Count the missing values
The great thing in PROC SQL is that the NMISS and N functions under it work for both numeric and character variables. We can also write a macro based on them to search the missing values for all variables.
select count(*), nmiss(weight), n(weight)
- Posts: 98
Join date: 2012-03-14
Permissions in this forum:You cannot reply to topics in this forum