
It is important to develop good habits in organising and saving your work. This is particularly true when working on large datasets.

Running an analysis on a dataset containing 250,000 cases and 3,000 variables takes far longer than running it on a file with 2,000 cases and 350 variables, and so calls for a different approach. If you are not organised, you will spend a lot of time waiting for processes to complete. An important element of any analysis is reproducibility: can you repeat the analysis and get the same result? Many analysis packages offer browser-type interfaces where you choose actions from drop-down menus, and this can make an analysis difficult to reproduce, particularly if it involves a lot of transformations of variables.

Most analysts use menu-driven options only when they want a quick answer to a very simple question, such as the basic distribution of a variable in the population, or to check cell sizes. Developing the habit of working with syntax (a programme that specifies the analysis) rather than menu clicks will make your analysis much more efficient in the long run. For example, if you need to rerun the analysis for any reason, the steps are already set out in the programme, so you do not have to redo each step in sequence by hand.

Example 5: SPSS Syntax


freq vars=cntry freehms prtyban euftf.

* you can insert comments into your syntax to document what you are doing.
* an asterisk at beginning means it will be recognised as a comment not a command

CROSSTABS
/TABLES=cntry BY freehms prtyban
/CELLS= COUNT ROW.
FREQUENCIES
VARIABLES=freehms.
EXECUTE.

* create new variable grouping into 3 categories rather than 5

RECODE
freehms
(1 thru 2=1) (3=2) (4 thru 5=3) (SYSMIS=Copy) INTO freehms3 .
VARIABLE LABELS freehms3 'Gays and lesbians free to live as they wish 3 category'.
VALUE LABELS freehms3
1 'Agree or agree strongly'
2 'neither'
3 'disagree or stronger' .
EXECUTE .

* check the recode by comparing the original and new variables.

FREQUENCIES
VARIABLES=freehms freehms3.

* apply the design weight, run the tables and then remove the weight.

WEIGHT
BY dweight .

CROSSTABS
/TABLES=cntry BY freehms freehms3
/CELLS= COUNT ROW.

WEIGHT
OFF.
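
The example above assumes that the data file is already open. The opening and saving steps can be written in syntax as well, so that the whole analysis can be rerun from start to finish. The lines below are a minimal sketch only: the file names are hypothetical and should be replaced with the location of your own data, and the list of variables to keep is an assumption based on the variables used in this example. Reading only the variables you need is one way of reducing waiting time on a very large file.

* hypothetical file names, replace with the location of your own data.
* read only the variables needed for this analysis.

GET FILE='ess_data.sav'
/KEEP=cntry freehms prtyban euftf dweight.

* the recodes and tables from the example above would go here.

* save the working file, including any new variables, under a new name.

SAVE OUTFILE='ess_data_working.sav'.

Saving under a new name leaves the original data file untouched, so the analysis can always be rerun from the same starting point.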


The University of Manchester; Mimas; ESRC; RDI

Countries and Citizens: Unit 3 Making cross-national comparisons using micro data by Siobhan Carey, Department for International Development is licensed under a Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales Licence.