CAQDAS - A primer

Thomas König

Computer-Assisted Qualitative Data Analysis Software (CAQDAS), sometimes also simply called Qualitative Data Analysis Software (QDAS or QDA software), is becoming ever more popular in social research, but there are still many people within academia who remain unconvinced of the exact value of CAQDAS for their research. This essay argues that CAQDAS are a useful tool for certain methodologies, might aid some other methodologies, and are inferior to non-CAQDAS software, or even to no software at all, for still other methodologies. After a brief description of the basic CAQDAS functions and dysfunctions, I will offer an admittedly incomplete and preliminary overview of the methodological affinities of contemporary CAQDAS. These affinities, in turn, have implications for the choice of a particular CAQDAS and for the precautions one should keep in mind when using CAQDAS.

What CAQDAS can do

Readers familiar with the concept of CAQDAS might want to skip this section.

Computer-Assisted Qualitative Data Analysis Software (CAQDAS) searches, organizes, categorizes, and annotates textual and visual data. Programs of this type also frequently support theory-building through the visualization of relationships between variables that have been coded in the data.

Search Functions

All CAQDAS perform some form of text and code searches. Among the most common search functions are:

Not all CAQDAS perform all of these search types, and those that do offer a certain search type do not necessarily offer it with the same degree of versatility. It is also worth noting that all programs require you to import your data into the CAQDAS. If you merely want to search your data without actually coding them, the (academic) freeware InfoRapid Search&Replace offers one of the most customizable search facilities, which can be used for a very wide range of textual data formats.

Coding Functions

At the heart of CAQDAS is their ability to code data. A number of basic coding strategies exist:

Just like with the search functions, the range of coding functions available varies with the specific CAQDAS.

Conceptual Network

All CAQDAS allow you to arrange your codes in a codebook, which frequently can be organized hierarchically. But some programs also allow you to link your codes in a number of ways and to visualize these links in diagrams. Most newer programs also offer a number of ways to visualize co-occurrences of codes, to assist you in theory building.
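Behind such co-occurrence displays lies a simple count: how often two codes are attached to the same segment of data. The following minimal Python sketch assumes the coded segments are available as sets of code names; the data and the function name `cooccurrences` are hypothetical and do not reproduce any program's internal format.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded segments: each segment (e.g. one paragraph of a transcript)
# carries the set of codes the analyst attached to it.
segments = [
    {"work", "family"},
    {"work", "stress"},
    {"family", "stress", "work"},
    {"leisure"},
]


def cooccurrences(coded_segments):
    """Count how often each unordered pair of codes is attached to the same segment."""
    counts = Counter()
    for codes in coded_segments:
        # sorted() makes ("family", "work") and ("work", "family") the same key
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return counts


for (a, b), n in cooccurrences(segments).most_common():
    print(f"{a} & {b}: {n}")
```

A segment with a single code (here "leisure") contributes no pair, which is why co-occurrence views say nothing about codes that are always used alone.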

What CAQDAS cannot do

It should go without saying that CAQDAS cannot analyze data on their own. But even when taken on their own terms, there are some desirable functions that CAQDAS to date cannot adequately perform.

CAQDAS cannot Undo

Most editing programs social scientists use these days (such as Microsoft™ Office or Adobe™ Audition and Premiere) offer extensive undo and automatic backup functions. Consequently, we have grown used to altering files rather carelessly. However, a similar undo safety net does not currently exist in CAQDAS. Therefore, it is highly advisable to make frequent manual backup copies of project files.
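Such backups can be automated with a few lines of script. A minimal Python sketch, assuming the project is stored as a single file; the function name `backup_project` and the folder name `backups` are hypothetical:

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_file, backup_dir="backups"):
    """Copy a project file to a timestamped backup and return the copy's path."""
    src = Path(project_file)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp down to microseconds so that earlier backups are never overwritten
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    # copy2 copies the contents and also preserves the modification time
    shutil.copy2(src, dest)
    return dest
```

Calling, say, `backup_project("interviews.project")` (a hypothetical file name) before each coding session keeps a trail of earlier states. If a program keeps a project in a folder rather than a single file, `shutil.copytree` would be the analogue.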

CAQDAS cannot swallow all file formats

Even though some CAQDAS handle a few select multimedia data formats, CAQDAS usually accept textual data only if they are stored in plain text and/or rich text format. If you are able to determine the data format yourself, for instance in the case of interview transcripts, you should tailor your data format to the CAQDAS you are using. If you harvest data from the internet, or if you use other already prepared data, you should keep an appropriate file format converter at hand. Either way, once you have imported your data into the CAQDAS database, it is highly advisable not to edit the data in their native editors any longer, except in those cases where multimedia data are merely linked to from the CAQDAS.
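What such a converter does can be illustrated for web pages, which arrive as HTML. A minimal sketch using only Python's standard-library `html.parser`, which keeps the visible text and drops scripts and style sheets; the class and function names are hypothetical, and real pages (tables, navigation menus) will usually need further cleaning by hand:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping scripts and styles."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns "&amp;" into "&"
        self.parts = []
        self._skip = 0  # how deep we are inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text chunks that are not script or style content
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_plain_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text at the end of the input
    return "\n".join(parser.parts)
```

The result could then be saved with, e.g., `Path("page.txt").write_text(text, encoding="utf-8")`; whether a given CAQDAS expects a particular text encoding is worth checking in its documentation.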

CAQDAS cannot exchange data

For those working on joint projects, it should be mentioned that CAQDAS data cannot be shared across programs. While it is fairly easy to import SPSS data into SAS and vice versa, you usually cannot swap data (that is, your coding tree) between different CAQDAS.

We now have a rough idea of what CAQDAS can and cannot do, which leads to the more important question of when to use them.

When to use CAQDAS?

Would the use of CAQDAS improve your analysis? The answer to this question depends on the methodology you intend to use. Sometimes it appears as if "CAQDAS analysis" constituted a method in itself; it does not. In fact, despite their name, CAQDAS programs are not necessarily suitable for all types of qualitative analyses, nor are they necessarily unsuitable for all types of quantitative analyses.

Even though CAQDAS are sometimes advertised as catch-all software for all sorts of qualitative analyses, CAQDAS do have an in-built bias towards one methodological approach or another. In particular, the market leaders (QSR with its N6 and NVivo product ranges, and ATLAS.ti) have been developed against the background of Grounded Theory. Even though the connection between CAQDAS and the various strands of Grounded Theory is certainly not a deterministic one, and even though a number of users may follow different approaches (Lee & Fielding 1996: 3.3), an analysis of citations which refer to CAQDAS clearly shows that Grounded Theory is the dominant approach among CAQDAS users (MacMillan and Koenig 2004: 182f).

Less well-known CAQDAS also contain biases. MAXqda and Kwalitan clearly were developed with Weberian ideal-type analysis in mind, THE ETHNOGRAPH is rooted in ethnography, Qualrus has the most customizable automatic coding features, and QDA Miner has been developed with the integration of qualitative and quantitative methodologies particularly in mind.

Methodologies that are Well Supported by CAQDAS

Unsurprisingly, CAQDAS are particularly useful for those methodological approaches that their developers had in mind when designing their programs, that is, Grounded Theory, Qualitative Content Analysis, and ethnography.

The most distinctive feature of CAQDAS is certainly their ability to code and annotate a wide variety of data effectively, in particular textual data. In contrast to standard quantitative content analysis, CAQDAS-assisted coding is not based solely on mechanically performed string searches; it frequently also involves the analyst's interpretation. Interactive coding, which is at the heart of both Qualitative Content Analysis and Grounded Theory, entails two virtues, which can easily mutate into vices, though:

We should also not forget that coding is, at bottom, a strategy that inherently lends itself to quantitative analysis. No matter how sophisticated codes are, they slice data into standardized categories, which can then be analyzed with statistical tools.

In contrast, much ethnographic work is less rigid in its pursuit of coding. Instead, it often requires an effective search through different types of data and is strongly facilitated by the annotation of data.

Different CAQDAS fare differently in the coding and searching departments, so be sure to check which CAQDAS allow for the coding and searching functions you require for your work. Overall, however, any CAQDAS would facilitate data analysis if you choose to pursue any of the three methodologies listed above.

Methodologies that can be Enhanced by CAQDAS

Because of their relative flexibility, CAQDAS can also be adapted to a number of other methodologies, even if researchers working outside Grounded Theory traditions might still be required to "coax a recalcitrant software package into aiding their preferred style of analysis" (Dohan & Sánchez-Jankowski: 484). Among the approaches that certainly could benefit from such adaptations are frame analysis and Conversation Analysis.

There are a number of papers that use coding practices similar to those used in Qualitative Content Analysis to identify frames in textual data. We have attempted to systematize the identification and validation of frames via keywords, a process that allows for the identification of frames in large corpora of textual data. Indeed, even visual data can to some extent be coded as frame markers.
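The general idea of keyword-based frame markers can be sketched as follows. This is not the procedure referred to above, only an illustration: the frames, keyword lists, and the function `frame_hits` are hypothetical, and the whole-word matching shown here misses inflected forms ("costs" does not match "cost") unless the keyword lists account for them.

```python
import re

# Hypothetical frames and their marker keywords; a real study would derive and
# validate such lists rather than take them as given.
FRAME_MARKERS = {
    "economic": ["cost", "jobs", "market"],
    "security": ["threat", "border", "crime"],
}


def frame_hits(text, markers=FRAME_MARKERS):
    """Count keyword hits per frame in one document (case-insensitive, whole words)."""
    hits = {}
    for frame, keywords in markers.items():
        # One alternation per frame; re.escape guards against special characters
        pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"
        hits[frame] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits


doc = "Opponents warn of the threat at the border; supporters point to jobs."
print(frame_hits(doc))
```

Counts like these only flag candidate passages; whether a passage actually carries a frame remains a matter for the analyst's reading.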

With respect to Conversation Analysis, it has been suggested that those CAQDAS which are able to code segments of video and/or audio files (currently, notably ATLAS.ti, HyperRESEARCH, and Qualrus) would both make it possible to dispense with a lengthy transcription process and allow for a "bird's eye view of the data" through coding. Note, however, that the freeware Transana might serve the same function if one uses only audio and video data.

Methodologies with Limited Potential for CAQDAS Use

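But there are also clear limitations of CAQDAS. For very different reasons, not all forms of Discourse Analysis, Visual Analysis, (International) Comparative Research, or quantitative content analysis are strongly supported by CAQDAS.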

Much of Discourse Analysis requires a "holistic" exegesis of data that does not lend itself to coding, which is, effectively, a first step towards the quantification of data. The structure of a text or an entire communication might, for example, tell you important things that cannot be revealed by any of that text's parts. Conversely, there may be instances where a single word or phrase is more "important" than the rest of the text (Kracauer 1952: 639), which is not amenable to coding either. There are, of course, procedures in CAQDAS that allow you to label entire documents and attach notes to important sections in your data, but these tasks can frequently be performed more easily, and even better, by simple word processors or shareware file organizers such as InfoRapid Cardfile. The latter program also outperforms existing CAQDAS in a second task essential for discourse analysis, that is, an efficient search through data: it offers search capabilities comparable to the most sophisticated searches available in CAQDAS, while at the same time allowing for searches in a wide variety of formats. CAQDAS, in contrast, usually only allow searches in ASCII and/or RTF format.

Visual Analysis is not particularly well supported by any of the existing CAQDAS, because none of the programs in question efficiently codes particular frame sequences in videos or parts of pictures. Some programs do offer native multimedia support, others allow linking to external multimedia data, and still others have no facilities for such files at all. There are some ways around these deficiencies, but the freeware program Transana, which allows you to comfortably transcribe and annotate videos, is equally well or even better suited for most visual analysis purposes than any of the CAQDAS.

(International) Comparative Research, which focuses on institutional and social structural variables, would primarily require the sorting of data, which frequently can be done more swiftly in regular database or even spreadsheet programs, as well as organizers such as Cardfile. Further analysis of these data may be conducted using fs/QCA, which was specifically programmed for the type of fuzzy set Boolean analysis Ragin (2000) has pioneered in the social sciences.
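For readers unfamiliar with fuzzy sets, the basic operations are easy to state: membership scores range from 0 (fully out) to 1 (fully in), logical AND takes the minimum, OR the maximum, and NOT is 1 minus the score. A commonly used measure of how consistently membership in a condition X is a subset of membership in an outcome Y is the sum of min(x, y) over all cases divided by the sum of x. The Python sketch below illustrates these operations with hypothetical scores; it is not fs/QCA's own code.

```python
def f_and(*scores):
    """Fuzzy intersection (logical AND): the minimum membership."""
    return min(scores)


def f_or(*scores):
    """Fuzzy union (logical OR): the maximum membership."""
    return max(scores)


def f_not(score):
    """Fuzzy negation: 1 minus the membership."""
    return 1 - score


def consistency(x, y):
    """Subset consistency of X in Y across cases: sum(min(x, y)) / sum(x)."""
    total = sum(x)
    if total == 0:
        raise ValueError("no case has any membership in X")
    return sum(min(a, b) for a, b in zip(x, y)) / total


# Hypothetical membership scores of four countries in two sets
strong_unions = [0.9, 0.7, 0.2, 0.1]
generous_welfare = [1.0, 0.6, 0.3, 0.0]

print(round(consistency(strong_unions, generous_welfare), 3))
```

Values close to 1 indicate that cases rarely show more membership in X than in Y, which is how set-theoretic analysis expresses "X is (nearly) sufficient for Y".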

As their name suggests, CAQDAS are not meant for quantitative content analysis, even though, for mixed methods studies, most CAQDAS do offer convenient export utilities for statistical packages such as SPSS, SAS, or Stata. Indeed, the quantification of qualitative content analysis is probably one of the most promising fields for extending the use of CAQDAS beyond Grounded Theory and ethnography. For traditional quantitative analysis, however, better-suited programs are available.
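What such an export amounts to can be sketched as a document-by-code frequency table written as CSV, a format SPSS, SAS, and Stata can all import. A minimal Python sketch with hypothetical codings and function names; actual CAQDAS export utilities offer their own formats and options:

```python
import csv
import io

# Hypothetical codings: one (document, code) pair per coded segment
codings = [
    ("interview01", "stress"),
    ("interview01", "work"),
    ("interview01", "stress"),
    ("interview02", "family"),
]


def case_by_code(pairs):
    """Turn (document, code) pairs into a document-by-code frequency table."""
    docs = sorted({d for d, _ in pairs})
    codes = sorted({c for _, c in pairs})
    table = {d: dict.fromkeys(codes, 0) for d in docs}
    for d, c in pairs:
        table[d][c] += 1
    return codes, table


def write_csv(codes, table, stream):
    """Write one row per document and one column per code."""
    writer = csv.writer(stream)
    writer.writerow(["document", *codes])
    for doc, counts in table.items():
        writer.writerow([doc, *(counts[c] for c in codes)])


buffer = io.StringIO()
write_csv(*case_by_code(codings), buffer)
print(buffer.getvalue())
```

To write a file instead of printing, open it with `open("codes.csv", "w", newline="")` (the `newline=""` argument is what the `csv` module documentation recommends) and pass the file object as `stream`.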

Which CAQDAS should be used

To get the most mileage from CAQDAS, the selection of a particular CAQDAS should be based on the suitability of the CAQDAS for the methodology chosen, which in turn should depend on the theoretical framework. Many users tend to choose the program that is available at their department, and/or which has been used by a teacher or a colleague. While the logics of path dependency and institutional inertia seem to pervade academia (Krücken 2003), the benefits of using the most appropriate CAQDAS rather than the most readily available one are huge and largely outweigh the costs.

For starters, a specific CAQDAS might not offer the functionality that is required for a specific research technique or even a specific type of data. NVivo, for instance, cannot be used with more than a couple of hundred documents; Qualrus currently does not offer menu-driven automatic coding; and many of the older programs support the inclusion of pictures, videos, or audio material only rudimentarily, if at all.

The learning curve for all CAQDAS is steep, in the sense that their basics are mastered quickly. This is not because CAQDAS interfaces are intuitive: NVivo uses a non-standard Windows interface; N6 features an archaic interface that pastes command lines into a window; Kwalitan and MAXqda have a Windows 3.11 feel; Qualrus uses its own language for automation processes; ATLAS.ti's tool button icons are often not very straightforward symbols, at times even single letters; and most CAQDAS do not handle their different windows smoothly. Rather, basic CAQDAS functions are not very complex, as they rarely extend beyond searching, coding, and visualization, so the basic functions of any CAQDAS can be learned in a relatively short time.

While the advantage of being able to discuss problems with a particular program with a colleague might encourage the use of the most readily available programs, CAQDAS developers and their support teams usually reply to queries quite swiftly, and for most programs there are user forums in which the particulars of the program can be discussed.


Once a specific CAQDAS has been deemed a suitable tool for one's methodology, there still remain some common hazards in the use of CAQDAS.

The Lure of Coding

Probably the most frequent pitfall of research using CAQDAS is the temptation to substitute coding for analysis. It is very tempting to neatly attach codes to data, as this gives the impression of an analytical ordering of data, particularly when combined with a well-structured coding tree. To be sure, coding may to some extent be a useful precursor to analysis. Indeed, coding already requires some theoretical work. But the inductive coding of data, particularly when done in vivo, that is, the coding of text onto itself, often leads only to summary descriptions rather than analysis. Also, recall that certain types of analyses do not lend themselves to coding. Therefore, it is essential to code data only if that is advisable within one's methodological framework.

Somewhat related to this danger is the tendency to identify too many codes, even when coding in itself is an advisable practice for a given methodology. The ganglia of large, cluttered coding trees more often than not preclude rather than facilitate effective analysis of data.

Inductive Theorizing

Another danger innate to the coding strategies many CAQDAS offer is the lure of inductive theorizing. With tools such as in vivo coding, it becomes all too easy simply to arrange the data so that they result in a theory.

Of course, empirical research will always lead to shifts and extensions of theory, when the initial theory does not represent the data in a meaningful way or even is contradicted (i.e., falsified) by the data.

But the notion that "[q]ualitative sociology relies heavily on the 'emergence' of themes, construction of categories 'out of' data, and linking those categories to form theories" (Singh & Richards 2003: 5) is far from uncontested among sociologists in general and so-called "qualitative" sociologists in particular. In fact, such pre-theoretical inductive theorizing has an elective affinity to positivism (in its initial meaning from Comte to Carnap), an intellectual movement that most qualitatively inclined researchers were debunking from the 1940s to the 1960s. Of late, positivism has found some unexpected allies in those qualitative researchers who advocate goodness criteria such as "authenticity," "voice," or "sacredness"1 in social research (Lincoln 2002).

This cannot be the place to explore the entire argument between inductionism and deductive theorizing, let alone settle it. Instead, it should suffice to say that the idea that "[q]ualitative research seeks to capture and discover meaning by immersion in data; measures are not created in advance but evolve from the data" (Brent & Slusarz 2003: 284) is by no means an uncontested methodological assumption. Notably, some forms of social constructionism and Critical Theory have little patience with inductive theorizing, as they consider it precisely the analyst's task to "break" with existing social categories (e.g., Bourdieu et al. [1973] 1991).2 Moreover, Habermas (1963: 166) exposes the inherently conservative bias in inductive theorizing. Theoretical work in these traditions is not aided but obstructed by in vivo coding and the idea that categories should be constructed "'out of' data."

CAQDAS and Causality

Those CAQDAS that offer network models of codes introduce another small bias: even though links can be bi-directional, flowcharts encourage the development of linear-causal models rather than more dynamic and/or reflexive models, or theories that posit mere elective affinities rather than causal relationships.

How to use CAQDAS

In the end, CAQDAS are useful for a wide variety of methodological approaches, though probably not for quite as wide a variety as some CAQDAS enthusiasts claim. What is important for all analyses that utilize CAQDAS is an awareness that CAQDAS are a tool rather than a method. In other words, to benefit from CAQDAS, it is essential to adapt CAQDAS use to one's methodology rather than vice versa. Therefore, it is advisable to structure analyses that use CAQDAS as follows:

  1. Select a theoretical approach.
  2. Match the theoretical approach with the appropriate methodology and data.
  3. Investigate which, if any, CAQDAS might support the methodological approach best.

In the wake of ever-improving software tools, it has been suggested that this process be turned on its head:

"The research community, in particular its methodologists, now need to study [N6/NVivo] tools, and like composers for the newly invented piano, study the new expressiveness thereby gained."(Richards 2002: 214)3

Even if CAQDAS did not contain a methodological bias, this would be a curious idea, as the techniques performed by CAQDAS are not exactly novel. Data search techniques existed long before the computer age, for instance in the form of indexing systems stored in filing cabinets or on microfiche. Nor are computerized text searches an invention of CAQDAS. As early as 1973, the UNIX operating system featured GREP searches (Hauben & Hauben 1996: chapter 9), which (with respect to plain text files) surpass many modern CAQDAS in power. Neither coding nor conceptual diagrams are novel strategies either. To be sure, it is an accomplishment of CAQDAS to have made these techniques widely available in qualitative social research. But what CAQDAS primarily contribute to all these techniques is efficiency.
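For comparison, a grep-style search takes only a few lines today. The sketch below imitates `grep -n` (print matching lines with their line numbers) using Python's `re` module rather than calling grep itself, so its regular-expression syntax follows Python, not grep; the function name `grep` is hypothetical:

```python
import re
from pathlib import Path


def grep(pattern, paths, ignore_case=False):
    """Yield (file, line number, line) for every line matching a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for path in paths:
        # errors="replace" keeps the search going over badly encoded characters
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield str(path), number, line.rstrip("\n")
```

For instance, `grep(r"\bjobs?\b", Path("transcripts").glob("*.txt"), ignore_case=True)` would list every line in a (hypothetical) `transcripts` folder that mentions "job" or "jobs" as a whole word.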

As it stands, however, CAQDAS are not methodologically neutral. Starting out from CAQDAS in the development of methodologies would therefore allow software products to influence the methodological, and by implication theoretical, development of a discipline. But the development of such methodologies should flow from theoretical developments in the discipline, not vice versa. It may then very well turn out that CAQDAS can help to establish or facilitate new methodologies.


  1. Ironically, by introducing the notion of "sacredness" as a validity criterion into social research, these scholars in a way return to, and even go beyond, Durkheim, whom many regard as the archetypal positivist. Unlike Durkheim, who regarded religious phenomena merely as the epitome of social facts, regardless of their moral value, Lincoln (2002) and her associates also scientifically (and implicitly morally) affirm their notion of sacredness.
  2. Luhmann (1990) elaborates the epistemological necessity of such a procedure in constructionist theories of communication.
  3. Similarly, Bazeley (2002: 242) declares that thanks to QSR CAQDAS "[t]he season is open for creative methodological thinking."

