Nonmetric multidimensional scaling is a method that begins with data about the ordering established by subjective similarity or nearness between pairs of stimuli. The idea is to embed the stimuli into a metric space, that is, a geometry. This method has been successfully applied to phenomena that, on other grounds, are known to be describable in terms of a specific geometric structure; such applications were used to validate the procedures.

Such validation was done, for example, with respect to the perception of colors, which are known to be describable in terms of a particular three-dimensional structure known as the Euclidean color coordinates. Similar applications have been made with Morse code symbols and spoken phonemes. The technique is now used in some biological and engineering applications, as well as in some of the social sciences, as a method of data exploration and simplification. One question of interest is how to develop an axiomatic basis for various geometries using as a primitive concept an observable such as the subject's ordering of the relative similarity of one pair of stimuli to another, which is the typical starting point of such scaling.

The general task is to discover properties of the qualitative data sufficient to ensure that a mapping into the geometric structure exists and, ideally, to discover an algorithm for finding it. Some work of this general type has been carried out, but the more general problem of understanding the conditions under which multidimensional scaling algorithms are suitable remains unsolved. In addition, work is needed on understanding more general, non-Euclidean spatial models.
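To make the ordinal starting point concrete, the sketch below computes Kruskal's stress-1, a standard badness-of-fit measure used in nonmetric scaling. It asks how well the distances in a candidate configuration respect the rank order of the judged dissimilarities after the best monotone (isotonic) transformation. The function names, the dictionary input format, and the toy data are illustrative assumptions; a full scaling program would also search for the configuration that minimizes this stress.

```python
import math


def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the sequence of block means decreases.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def kruskal_stress(dissimilarity, coords):
    """Kruskal stress-1 of a configuration against ordinal dissimilarity data.

    dissimilarity: dict mapping a pair (i, j) to its judged dissimilarity;
                   only the rank order of these values is used.
    coords: list of points (tuples) giving a candidate embedding.
    """
    # Order the pairs from most similar to least similar.
    pairs = sorted(dissimilarity, key=dissimilarity.get)
    distances = [math.dist(coords[i], coords[j]) for i, j in pairs]
    # Disparities: the closest values to the distances that respect the ordering.
    disparities = monotone_fit(distances)
    misfit = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    scale = sum(d ** 2 for d in distances)
    return math.sqrt(misfit / scale)


# Three hypothetical stimuli placed on a line at 0, 1, and 3.
points = [(0.0,), (1.0,), (3.0,)]
agrees = {(0, 1): 1, (1, 2): 2, (0, 2): 3}     # same order as the distances
reversed_ = {(0, 1): 3, (1, 2): 2, (0, 2): 1}  # opposite order
print(kruskal_stress(agrees, points))     # 0.0
print(kruskal_stress(reversed_, points))  # about 0.378
```

A stress of zero means the configuration reproduces the observed ordering exactly; only the ranks of the dissimilarities matter, which is what makes the method nonmetric.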

### Ordered Factorial Systems

One type of structure common throughout the sciences arises when an ordered dependent variable is affected by two or more ordered independent variables.

This is the situation to which regression and analysis-of-variance models are often applied; it is also the structure underlying the familiar physical identities, in which physical units are expressed as products of the powers of other units (for example, energy has the unit of mass times the square of the unit of distance divided by the square of the unit of time).

There are many examples of these types of structures in the behavioral and social sciences. One example is the ordering of preference of commodity bundles (collections of various amounts of commodities), which may be revealed directly by expressions of preference or indirectly by choices among alternative sets of bundles. A related example is preferences among alternative courses of action that involve various outcomes with differing degrees of uncertainty; this is one of the more thoroughly investigated problems because of its potential importance in decision making.

A psychological example is the trade-off between delay and amount of reward, yielding those combinations that are equally reinforcing. In a common, applied kind of problem, a subject is given descriptions of people in terms of several factors, for example, intelligence, creativity, … In all these cases, and a myriad of others like them, the question is whether the regularities of the data permit a numerical representation. Initially, three types of representations were studied quite fully: the dependent variable as a sum, a product, or a weighted average of measures associated with the independent variables. The first two representations underlie some psychological and economic investigations, as well as a considerable portion of physical measurement and modeling in classical statistics.

The third representation, averaging, has proved most useful in understanding preferences among uncertain outcomes and the amalgamation of verbally described traits, as well as some physical variables.

For each of these three cases (adding, multiplying, and averaging), researchers know what properties or axioms of order the data must satisfy for such a numerical representation to be appropriate. On the assumption that one or another of these representations exists, and using numerical ratings by subjects instead of ordering, a scaling technique called functional measurement (referring to the function that describes how the dependent variable relates to the independent ones) has been developed and applied in a number of domains.
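One of the ordinal properties that data must satisfy for an additive representation is double cancellation. The sketch below checks it exhaustively on a small two-factor table. It assumes the entries are ordered observations (ratings or ranks) indexed by the levels of the two factors, and it tests only this one necessary condition, not the full axiom system; the scale values `u` and `v` are hypothetical.

```python
from itertools import product


def satisfies_double_cancellation(table):
    """Check the double-cancellation axiom on a two-factor table.

    table[a][b] is the observed (ordered) response at level a of the first
    factor and level b of the second. An additive representation
    table[a][b] = u(a) + v(b) can exist only if, for all levels,
    (a2, b1) >= (a1, b2) and (a3, b2) >= (a2, b3) imply (a3, b1) >= (a1, b3).
    """
    rows = range(len(table))
    cols = range(len(table[0]))
    for a1, a2, a3 in product(rows, repeat=3):
        for b1, b2, b3 in product(cols, repeat=3):
            if table[a2][b1] >= table[a1][b2] and table[a3][b2] >= table[a2][b3]:
                if not table[a3][b1] >= table[a1][b3]:
                    return False
    return True


# An additive table built from hypothetical scale values u and v.
u, v = [0, 1, 3], [0, 2, 5]
additive = [[ua + vb for vb in v] for ua in u]
# A table whose ordering no additive model can reproduce.
non_additive = [[0, 1, 10], [2, 3, 4], [5, 6, 7]]

print(satisfies_double_cancellation(additive))      # True
print(satisfies_double_cancellation(non_additive))  # False
```

Adding the two premises of the axiom for an additive table cancels the shared terms and yields the conclusion, which is why any additive table passes.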

What remains problematic is how to encompass at the ordinal level the fact that some random error intrudes into nearly all observations and then to show how that randomness is represented at the numerical level; this continues to be an unresolved and challenging research issue. During the past few years considerable progress has been made in understanding certain representations inherently different from those just discussed. The work has involved three related thrusts. The first is a scheme of classifying structures according to how uniquely their representation is constrained.

The three classical numerical representations are known as ordinal, interval, and ratio scale types. For systems with continuous numerical representations and of scale type at least as rich as the ratio one, it has been shown that only one additional type can exist.

A second thrust is to accept structural assumptions, like factorial ones, and to derive for each scale the possible functional relations among the independent variables. The third thrust is to develop axioms for the properties of an order relation that lead to the possible representations. Much is now known about the possible nonadditive representations of both the multifactor case and the case where stimuli can be combined. Closely related to this classification of structures is the question: What statements, formulated in terms of the measures arising in such representations, can be viewed as meaningful in the sense of corresponding to something empirical?

Statements here refer to any scientific assertions, including statistical ones, formulated in terms of the measures of the variables and logical and mathematical connectives.

In particular, statements that remain invariant under certain symmetries of structure have played an important role in classical geometry, dimensional analysis in physics, and in relating measurement and statistical models applied to the same phenomenon.
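A standard illustration of meaningfulness: the statement "group A has a lower mean than group B" keeps its truth value under the positive affine transformations admissible for an interval scale, but not under the arbitrary increasing transformations admissible for an ordinal scale. The data and the particular transformations below are made up for illustration.

```python
from statistics import fmean


def first_mean_is_smaller(group_a, group_b):
    """The statement under test: 'group A has a lower mean than group B'."""
    return fmean(group_a) < fmean(group_b)


def invariant_under(transform, group_a, group_b):
    """True if the statement keeps its truth value after rescaling both groups."""
    before = first_mean_is_smaller(group_a, group_b)
    after = first_mean_is_smaller([transform(x) for x in group_a],
                                  [transform(x) for x in group_b])
    return before == after


group_a = [1, 1, 10]  # mean 4
group_b = [5, 5, 5]   # mean 5


def affine(x):
    """Positive affine map: admissible for interval (and ratio) scales."""
    return 2 * x + 5


def increasing(x):
    """Increasing for x >= 0: admissible for ordinal scales only."""
    return x ** 2


print(invariant_under(affine, group_a, group_b))      # True: meaningful for interval data
print(invariant_under(increasing, group_a, group_b))  # False: truth value flips
```

Squaring preserves every pairwise order of the observations yet reverses the comparison of means, so such a comparison is not meaningful for purely ordinal measures.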

In addition, these ideas have been used to construct models in more formally developed areas of the behavioral and social sciences, such as psychophysics.

Current research has emphasized the communality of these historically independent developments and is attempting both to uncover systematic, philosophically sound arguments as to why invariance under symmetries is as important as it appears to be and to understand what to do when structures lack symmetry, as, for example, when variables have an inherent upper bound.

### Clustering

Many subjects do not seem to be correctly represented in terms of distances in continuous geometric space.

Rather, in some cases, such as the relations among meanings of words (which is of great interest in the study of memory representations), a description in terms of tree-like, hierarchical structures appears to be more illuminating. This kind of description appears appropriate both because of the categorical nature of the judgments and the hierarchical, rather than trade-off, nature of the structure.

Individual items are represented as the terminal nodes of the tree, and groupings by different degrees of similarity are shown as intermediate nodes, with the more general groupings occurring nearer the root of the tree. Clustering techniques, requiring considerable computational power, have been and are being developed.
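As an illustration of how such a tree can be built, the sketch below uses single-linkage agglomerative clustering, one common method among several (the text does not name a specific algorithm). Items start as terminal nodes, and the two closest groups are merged repeatedly, so the last merges, nearest the root, are the most general groupings. The words and their one-dimensional "positions" are invented stand-ins for judged dissimilarities.

```python
def single_linkage(items, dist):
    """Build a binary tree (nested tuples) by repeatedly merging the closest groups.

    dist(x, y) gives the dissimilarity between two individual items; the
    distance between groups is the smallest distance between their members.
    """
    clusters = [(item, frozenset([item])) for item in items]  # (subtree, member set)

    def group_distance(c1, c2):
        return min(dist(x, y) for x in c1[1] for y in c2[1])

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: group_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical positions standing in for judged dissimilarities between words.
positions = {"cat": 0.0, "dog": 1.0, "car": 10.0, "bus": 11.0}
words = list(positions)


def word_distance(x, y):
    return abs(positions[x] - positions[y])


print(single_linkage(words, word_distance))  # (('cat', 'dog'), ('car', 'bus'))
```

Other linkage rules (complete or average linkage) change only `group_distance` and can produce different trees from the same data.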

Some successful applications exist, but much more refinement is anticipated.

### Network Models

Several other lines of advanced modeling have progressed in recent years, opening new possibilities for empirical specification and testing of a variety of theories.

In social network data, relationships among units, rather than the units themselves, are the primary objects of study. Special models for social network data have been developed in the past decade, and they give, among other things, precise new measures of the strengths of relational ties among units.

A major challenge in social network data at present is to handle the statistical dependence that arises when the units sampled are related in complex ways. Some issues of inference and analysis have already been discussed. This section discusses some more general issues of statistical inference and advances in several current approaches to them.

### Causal Inference

Behavioral and social scientists use statistical methods primarily to infer the effects of treatments, interventions, or policy factors. Previous chapters included many instances of causal knowledge gained this way. As noted above, the large experimental study of alternative health care financing discussed in Chapter 2 relied heavily on statistical principles and techniques, including randomization, in the design of the experiment and the analysis of the resulting data.

Sophisticated designs were necessary in order to answer a variety of questions in a single large study without confusing the effects of one program difference (such as prepayment or fee for service) with the effects of another (such as different levels of deductible costs), or with effects of unobserved variables (such as genetic differences). Statistical techniques were also used to ascertain which results applied across the whole enrolled population and which were confined to certain subgroups (such as individuals with high blood pressure) and to translate utilization rates across different programs and types of patients into comparable overall dollar costs and health outcomes for alternative financing options.

A classical experiment, with systematic but randomly assigned variation of the variables of interest (or some reasonable approach to this), is usually considered the most rigorous basis from which to draw such inferences. But random samples or randomized experimental manipulations are not always feasible or ethically acceptable. Then, causal inferences must be drawn from observational studies, which, however well designed, are less able to ensure that the observed or inferred relationships among variables provide clear evidence on the underlying mechanisms of cause and effect.

Certain recurrent challenges have been identified in studying causal inference. One challenge arises from the selection of background variables to be measured, such as the sex, nativity, or parental religion of individuals in a comparative study of how education affects occupational success. The adequacy of classical methods of matching groups in background variables and adjusting for covariates needs further investigation.

Statistical adjustment of biases linked to measured background variables is possible, but it can become complicated. Current work in adjustment for selectivity bias is aimed at weakening implausible assumptions, such as normality, when carrying out these adjustments.
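A minimal sketch of one classical adjustment, stratification on a single measured background variable: the treated-minus-control difference is computed within each stratum and then averaged with weights proportional to stratum size. The records are invented so that treatment is concentrated in the stratum with higher outcomes, which makes the crude difference misleading. This only handles the measured variable; it does nothing about unmeasured ones.

```python
from collections import defaultdict
from statistics import fmean


def crude_difference(records):
    """Treated-minus-control difference in mean outcome, ignoring strata."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return fmean(treated) - fmean(control)


def stratified_difference(records):
    """Average of within-stratum differences, weighted by stratum size.

    Each record is (stratum, treated, outcome). Every stratum is assumed
    to contain at least one treated and one control unit.
    """
    by_stratum = defaultdict(list)
    for record in records:
        by_stratum[record[0]].append(record)
    total = len(records)
    return sum(len(rows) / total * crude_difference(rows) for rows in by_stratum.values())


# Hypothetical data: treatment is common in stratum "A", where outcomes are high.
records = (
    [("A", True, 5.0)] * 3 + [("A", False, 4.0)]
    + [("B", True, 2.0)] + [("B", False, 1.0)] * 3
)
print(crude_difference(records))       # 2.5
print(stratified_difference(records))  # 1.0
```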

Even after adjustment has been made for the measured background variables, other, unmeasured variables (such as family transfers of wealth or reading habits) are almost always still affecting the results. Analysis of how the conclusions might change if such unmeasured variables could be taken into account is … The third important issue arises from the necessity for distinguishing among competing hypotheses when the explanatory variables are measured with different degrees of precision.

Both the estimated size and significance of an effect are diminished when it has large measurement error, and the coefficients of other correlated variables are affected even when the other variables are measured perfectly. Similar results arise from conceptual errors, when one measures only proxies for a theoretical construct (such as years of education to represent amount of learning).
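The shrinking of an effect under measurement error can be seen in a small simulation, assuming classical measurement error (noise independent of the true value). With equal variances for the true predictor and the error, the reliability is 0.5, so the fitted slope on the noisy measure is expected to fall to about half of the true slope of 2. The sample size, seed, and variances are arbitrary choices for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(12345)
n, true_slope = 20_000, 2.0
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * x + rng.gauss(0, 1) for x in x_true]
# The analyst only sees x measured with error of the same variance as x itself.
x_observed = [x + rng.gauss(0, 1) for x in x_true]

reliability = 1.0 / (1.0 + 1.0)  # var(true) / (var(true) + var(error))
print(round(ols_slope(x_true, y), 2))      # close to 2.0
print(round(ols_slope(x_observed, y), 2))  # close to 2.0 * reliability = 1.0
```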

In some cases, there are procedures for simultaneously or iteratively estimating both the precision of complex measures and their effect on a particular criterion. Although complex models are often necessary to infer causes, once their output is available, it should be translated into understandable displays for evaluation. Results that depend on the accuracy of a multivariate model and the associated software need to be subjected to appropriate checks, including the evaluation of graphical displays, group comparisons, and other analyses.

### New Statistical Techniques

#### Internal Resampling

One of the great contributions of twentieth-century statistics was to demonstrate how a properly drawn sample of sufficient size, even if it is only a tiny fraction of the population of interest, can yield very good estimates of most population characteristics. When enough is known at the outset about the characteristic in question (for example, that its distribution is roughly normal), inference from the sample data to the population as a whole is straightforward, and one can easily compute measures of the certainty of inference, a common example being the 95 percent confidence interval around an estimate.
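Under the normal approximation described here, a 95 percent confidence interval for a population mean is the sample mean plus or minus about 1.96 standard errors. A minimal sketch with made-up data:

```python
import math
from statistics import NormalDist, fmean, stdev


def normal_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean, assuming near-normality."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    standard_error = stdev(sample) / math.sqrt(len(sample))
    centre = fmean(sample)
    return centre - z * standard_error, centre + z * standard_error


low, high = normal_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(round(low, 2), round(high, 2))  # 3.52 6.48
```

For a sample this small, an interval based on Student's t distribution would be somewhat wider; the z-based version is shown because it matches the large-sample reasoning in the text.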

But population shapes are sometimes unknown or uncertain, and so inference procedures cannot be so simple. Furthermore, more often than not, it is difficult to assess even the degree of uncertainty associated with complex data and with the statistics needed to unravel complex social and behavioral phenomena. Internal resampling methods attempt to assess this uncertainty by generating a number of simulated data sets similar to the one actually observed. The definition of "similar" is crucial, and many methods that exploit different types of similarity have been devised.

These methods provide researchers the freedom to choose scientifically appropriate procedures and to replace procedures that are valid under assumed distributional shapes with ones that are not so restricted. Flexible and imaginative computer simulation is …

For a simple random sample, the "bootstrap" method repeatedly resamples the obtained data with replacement to generate a distribution of possible data sets. The distribution of any estimator can thereby be simulated and measures of the certainty of inference be derived. The "jackknife" method repeatedly omits a fraction of the data and in this way generates a distribution of possible data sets that can also be used to estimate variability.
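Both resampling ideas fit in a few lines. The sketch below estimates a standard error by the bootstrap (resampling with replacement) and by the delete-one jackknife, the simplest case of omitting a fraction of the data. For the sample mean, the jackknife reproduces the textbook formula exactly, which makes a useful check; the bootstrap also works for statistics such as the median that have no simple formula. The skewed sample, the seeds, and the 2,000 replicates are arbitrary choices.

```python
import math
import random
from statistics import fmean, median, stdev


def bootstrap_se(data, statistic, replicates=2000, seed=0):
    """Standard error from the spread of the statistic over resampled data sets."""
    rng = random.Random(seed)
    estimates = [statistic(rng.choices(data, k=len(data))) for _ in range(replicates)]
    return stdev(estimates)


def jackknife_se(data, statistic):
    """Delete-one jackknife standard error."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((e - centre) ** 2 for e in leave_one_out))


rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(30)]  # a skewed, non-normal sample
classical = stdev(sample) / math.sqrt(len(sample))  # textbook SE of the mean
print(round(classical, 3), round(jackknife_se(sample, fmean), 3),
      round(bootstrap_se(sample, fmean), 3))
print(round(bootstrap_se(sample, median), 3))  # SE of the median by resampling
```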

These methods can also be used to remove or reduce bias. For example, the ratio-estimator, a statistic that is commonly used in analyzing sample surveys and censuses, is known to be biased, and the jackknife method can usually remedy this defect. The methods have been extended to other situations and types of analysis, such as multiple regression.
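The jackknife bias correction combines the full-sample estimate with the leave-one-out estimates: n times the full estimate minus (n - 1) times their average. The sketch applies it to the ratio estimator sum(y)/sum(x). The simulation assumes x and y are independent Uniform(1, 3) draws, so the true ratio of means is 1 and the ordinary ratio estimator is biased upward for small samples; the sample size of 5, the seed, and the number of trials are arbitrary.

```python
import random
from statistics import fmean


def ratio_estimate(pairs):
    """Ratio estimator: total of y divided by total of x over (x, y) pairs."""
    return sum(y for _, y in pairs) / sum(x for x, _ in pairs)


def jackknife_corrected(pairs, statistic):
    """Delete-one jackknife bias correction: n * full - (n - 1) * mean(leave-one-out)."""
    n = len(pairs)
    full = statistic(pairs)
    leave_one_out = fmean([statistic(pairs[:i] + pairs[i + 1:]) for i in range(n)])
    return n * full - (n - 1) * leave_one_out


# Simulation: x and y independent Uniform(1, 3), so E[y] / E[x] = 1.
rng = random.Random(7)
true_ratio = 1.0
plain, corrected = [], []
for _ in range(20_000):
    sample = [(rng.uniform(1, 3), rng.uniform(1, 3)) for _ in range(5)]
    plain.append(ratio_estimate(sample))
    corrected.append(jackknife_corrected(sample, ratio_estimate))

print("average error, plain ratio:    ", round(fmean(plain) - true_ratio, 4))
print("average error, jackknife ratio:", round(fmean(corrected) - true_ratio, 4))
```

The correction removes the leading term of the bias, which shrinks like 1/n, at the cost of a somewhat more variable estimate.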

There are indications that under relatively general conditions, these methods, and others related to them, allow more accurate estimates of the uncertainty of inferences than do the traditional ones that are based on assumed (usually normal) distributions when that distributional assumption is unwarranted.

For complex samples, such internal resampling or subsampling facilitates estimating the sampling variances of complex statistics. An older and simpler, but equally important, idea is to use one independent subsample in searching the data to develop a model and at least one separate subsample for estimating and testing a selected model. Otherwise, it is next to impossible to make allowances for the excessively close fitting of the model that occurs as a result of the creative search for the exact characteristics of the sample data, characteristics that are to some degree random and will not predict well to other samples.
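The value of a separate confirmation subsample shows up in a simulation where no candidate predictor is related to the outcome at all. Searching 200 pure-noise predictors in the exploration half reliably turns up an impressive-looking correlation, which typically shrinks toward zero when the same predictor is checked on the held-out half. The sizes, seed, and helper names are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_indices(n, rng):
    """Randomly divide indices 0..n-1 into an exploration half and a confirmation half."""
    order = list(range(n))
    rng.shuffle(order)
    return order[: n // 2], order[n // 2:]


def subset(values, idx):
    return [values[i] for i in idx]


rng = random.Random(2024)
n, n_candidates = 200, 200
outcome = [rng.gauss(0, 1) for _ in range(n)]
# Candidate predictors that are pure noise: none truly predicts the outcome.
candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]
explore, confirm = split_indices(n, rng)

# "Creative search": keep the candidate most correlated with the outcome in the exploration half.
best = max(candidates, key=lambda c: abs(pearson(subset(c, explore), subset(outcome, explore))))
r_explore = pearson(subset(best, explore), subset(outcome, explore))
r_confirm = pearson(subset(best, confirm), subset(outcome, confirm))
print(round(r_explore, 2), round(r_confirm, 2))
```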

#### Robust Techniques

Many technical assumptions underlie the analysis of data. Some, like the assumption that each item in a sample is drawn independently of other items, can be weakened when the data are sufficiently structured to admit simple alternative models, such as serial correlation.

Usually, these models require that a few parameters be estimated. Assumptions about shapes of distributions, normality being the most common, have proved to be particularly important, and considerable progress has been made in dealing with the consequences of different assumptions.

More recently, robust techniques have been designed that permit sharp, valid discriminations among possible values of parameters of central tendency for a wide variety of alternative distributions by reducing the weight given to occasional extreme deviations. It turns out that by giving up, say, 10 percent of the discrimination that could be provided under the rather unrealistic assumption of normality, one can greatly improve performance in more realistic situations, especially when unusually large deviations are relatively common.
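One widely used robust estimator of central tendency is Huber's M-estimator, sketched below as an iteratively reweighted average. Observations within k robust standard deviations of the current centre keep full weight; farther ones are down-weighted. The tuning constant 1.345 is a conventional choice that gives up about 5 percent efficiency when the data really are normal, and 0.6745 rescales the median absolute deviation to match a normal standard deviation; the data are invented.

```python
from statistics import fmean, median


def huber_location(data, k=1.345, iterations=50):
    """Huber M-estimate of location by iteratively reweighted averaging."""
    centre = median(data)
    # Robust scale: median absolute deviation, rescaled to match a normal SD.
    scale = median(abs(x - centre) for x in data) / 0.6745
    if scale == 0:
        return centre
    for _ in range(iterations):
        # Full weight near the centre; weight k * scale / |deviation| farther out.
        weights = [min(1.0, k * scale / abs(x - centre)) if x != centre else 1.0 for x in data]
        centre = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centre


sample = [1, 2, 3, 4, 100]  # one wild observation
print(fmean(sample))                     # 22.0
print(round(huber_location(sample), 2))  # close to 3
```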

These valuable modifications of classical statistical techniques have been extended to multiple regression, in which procedures of iterative reweighting can now offer relatively good performance for a variety of underlying distributional shapes.
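Iterative reweighting in regression follows the same pattern: fit by weighted least squares, compute residuals, down-weight large residuals, and refit. The sketch fits a straight line with Huber weights and re-estimates a robust residual scale on each pass. It is restricted to one predictor for brevity, and the data (a line with a small alternating wiggle plus one gross error) are invented.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for one predictor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x, y, k=1.345, iterations=50):
    """Straight-line fit by iteratively reweighted least squares with Huber weights."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # ordinary least squares start
    for _ in range(iterations):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust residual scale, floored to avoid division by zero.
        scale = max(median(abs(r) for r in residuals) / 0.6745, 1e-12)
        weights = [min(1.0, k * scale / abs(r)) if r else 1.0 for r in residuals]
        intercept, slope = weighted_line(x, y, weights)
    return intercept, slope


xs = list(range(10))
wiggle = [0.1, -0.1] * 5
ys = [2 * xi + 1 + e for xi, e in zip(xs, wiggle)]
ys[9] = 100.0  # one gross error; the line would predict about 19

ols_intercept, ols_slope = weighted_line(xs, ys, [1.0] * 10)
robust_intercept, robust_slope = huber_line(xs, ys)
print(round(ols_slope, 2))     # pulled far above 2 by the single bad point
print(round(robust_slope, 2))  # close to 2
```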

These robust methods should be extended to more general schemes of analysis.

#### Many Interrelated Parameters

In trying to give a more accurate representation of the real world than is possible with simple models, researchers sometimes use models with many parameters, all of which must be estimated from the data. Classical principles of estimation, such as straightforward maximum likelihood, do not yield reliable estimates unless either the number of observations is much larger than the number of parameters to be estimated or special designs are used in conjunction with strong assumptions.

Bayesian methods do not draw a distinction between fixed and random parameters, and so may be especially appropriate for such problems. A variety of statistical methods have recently been developed that can be interpreted as treating many of the parameters as (or similar to) random quantities, even if they are regarded as representing fixed quantities to be estimated. Theory and practice demonstrate that such methods can improve the simpler fixed-parameter methods from which they evolved, especially when the number of observations is not large relative to the number of parameters.

Successful applications include college and graduate school admissions, where quality of previous school is treated as a random parameter when the data are insufficient to separately estimate it well.
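One simple member of this family is empirical-Bayes shrinkage under a normal model: each group's raw mean is pulled toward the overall mean, more strongly when sampling noise is large relative to the real spread between groups. The sketch assumes every group mean has the same known sampling variance and estimates the between-group variance by the method of moments. The school-level means are hypothetical, not data from any admissions study.

```python
from statistics import fmean, pvariance


def shrink_means(group_means, sampling_variance):
    """Empirical-Bayes (normal-normal) shrinkage of group means toward their average.

    Assumes each observed mean = true group effect + noise with the given
    sampling variance, and that the true effects vary around a common centre.
    """
    grand = fmean(group_means)
    # Method-of-moments estimate of the spread of the true effects (never negative).
    between = max(0.0, pvariance(group_means) - sampling_variance)
    factor = between / (between + sampling_variance)  # 0 = pool fully, 1 = no shrinkage
    return [grand + factor * (m - grand) for m in group_means]


raw = [2.9, 3.4, 3.1, 3.8, 2.6]  # hypothetical mean grades by previous school
print([round(m, 2) for m in shrink_means(raw, sampling_variance=0.1)])
```

Each estimate lands between its own raw mean and the grand mean; when the observed spread is no larger than the sampling noise, every group is pooled to the grand mean.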

Efforts to create appropriate models using this general approach for small-area estimation and underc…

#### Missing Data

In data analysis, serious problems can arise when certain kinds of quantitative or qualitative information are partially or wholly missing. Various approaches to dealing with these problems have been or are being developed. One of the methods developed recently for dealing with certain aspects of missing data is called multiple imputation. It is currently being used to handle a major problem of incompatibility between the … and previous Bureau of the Census public-use tapes with respect to occupation codes.

The extension of these techniques to address such problems as nonresponse to income questions in the Current Population Survey has been examined in exploratory applications with great promise.
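Whatever model is used to fill in the missing values, the analyses of the several completed data sets are usually pooled with Rubin's combining rules. The point estimates are averaged, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m), where m is the number of imputations. The numbers below are invented.

```python
from statistics import fmean, variance


def combine_imputations(estimates, within_variances):
    """Pool m completed-data analyses with Rubin's combining rules.

    estimates: the estimate from each imputed data set
    within_variances: the squared standard error from each analysis
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = fmean(estimates)
    within = fmean(within_variances)  # average sampling variance
    between = variance(estimates)     # extra spread caused by the missing data
    total = within + (1 + 1 / m) * between
    return pooled, total


estimate, total_variance = combine_imputations([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(estimate, round(total_variance, 3))  # 11.0 5.333
```

The between-imputation term is what a single imputation would silently omit, which is why analyses based on one filled-in data set tend to overstate precision.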

### Computing

#### Computer Packages and Expert Systems

The development of high-speed computing and data handling has fundamentally changed statistical analysis. Methodologies for all kinds of situations … This computing capability offers the hope that many data analyses will be more carefully and more effectively done than previously and that better strategies for data analysis will move from the practice of expert statisticians, some of whom may not have tried to articulate their own strategies, to both wide discussion and general use.

But powerful tools can be hazardous, as witnessed by occasional dire misuses of existing statistical packages. Until recently the only strategies available were to train more expert methodologists or to train substantive scientists in more methodology, but without updating, such training tends to become outmoded. Now there is the opportunity to capture in expert systems the current best methodological advice and practice. If that opportunity is exploited, standard methodological training of social scientists will shift to emphasizing strategies in using good expert systems (including understanding the nature and importance of the comments they provide) rather than how to patch together something on one's own.

With expert systems, almost all behavioral and social scientists should become able to conduct any of the more common styles of data analysis more effectively and with more confidence than all but the most expert do today. However, the difficulties in developing expert systems that work as hoped for should not be underestimated. Human experts cannot readily explicate all of the complex cognitive network that constitutes an important part of their knowledge.

Additional work is expected to overcome these limitations, but it is not clear how long it will take. There was relatively little systematic work on realistically rich strategies for the applied researcher to use when attacking real-world problems with their multiplicity of objectives and sources of evidence.

More recently, a species of quantitative detective work, called exploratory data analysis, has received increasing attention. In this approach, the researcher seeks out possible quantitative relations that may be present in the data. The techniques are flexible and include an important component of graphic representations. While current techniques have evolved for single responses in situations of modest complexity, extensions to multiple responses and to single responses in more complex situations are now possible.

Graphic and tabular presentation is a research domain in active renaissance, stemming in part from suggestions for new kinds of graphics made possible by computer capabilities, for example, hanging histograms and easily assimilated representations of numerical vectors. Research on data presentation has … Another influence has been the rapidly increasing availability of powerful computational hardware and software, now available even on desktop computers.

These ideas and capabilities are leading to an increasing number of behavioral experiments with substantial statistical input. Nonetheless, criteria of good graphic and tabular practice are still too much matters of tradition and dogma, without adequate empirical evidence or theoretical coherence. To broaden the respective research outlooks and vigorously develop such evidence and coherence, extended collaborations between statistical and mathematical specialists and other scientists are needed, a major objective being to understand better the visual and cognitive processes (see Chapter 1) relevant to effective use of graphic or tabular approaches.

### Combining Evidence

Combining evidence from separate sources is a recurrent scientific task, and formal statistical methods for doing so go back 30 years or more.

These methods include the theory and practice of combining tests of individual hypotheses, sequential design and analysis of experiments, comparisons of laboratories, and Bayesian and likelihood paradigms. There is now growing interest in more ambitious analytical syntheses, which are often called meta-analyses.

One stimulus has been the appearance of syntheses explicitly combining all existing investigations in particular fields, such as prison parole policy, classroom size in primary schools, cooperative studies of therapeutic treatments for coronary heart disease, early childhood education interventions, and weather modification experiments.

In such fields, a serious approach to even the simplest question (how to put together separate estimates of effect size from separate investigations) leads quickly to difficult and interesting issues. One issue involves the lack of independence among the available studies, due, for example, to the effect of influential teachers on the research projects of their students. Another issue is selection bias, because only some of the studies carried out, usually those with "significant" findings, are available and because the literature search may not find all relevant studies that are available.
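The simplest formal answer to that question is the fixed-effect, inverse-variance weighted average: each study's effect size is weighted by the reciprocal of its squared standard error. The sketch assumes independent studies with known standard errors, which is exactly what the dependence and selection issues just described call into question; the two effect sizes are invented.

```python
import math


def inverse_variance_pool(effects, standard_errors):
    """Fixed-effect meta-analytic average and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


pooled, pooled_se = inverse_variance_pool([0.2, 0.4], [0.1, 0.2])
print(round(pooled, 3), round(pooled_se, 3))  # 0.24 0.089
```

The more precise study dominates: its weight is four times that of the other, so the pooled value sits much closer to 0.2 than to 0.4.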

In addition, experts agree, although informally, that the quality of studies from different laboratories and facilities differs appreciably and that such information probably should be taken into account. Inevitably, the studies to be included used different designs and concepts and controlled or measured different variables, making it difficult to know how to combine them. Rich, informal syntheses, allowing for individual appraisal, may be better than catch-all formal modeling, but the literature on formal meta-analytic models …

As throughout the report, they constitute illustrative examples of what the committee believes to be important areas of research in the coming decade.

Methodological studies, including early computer implementations, have for the most part been carried out by individual investigators with small teams of colleagues or students. Occasionally, such research has been associated with quite large substantive projects, and some of the current developments of computer packages, graphics, and expert systems clearly require large, organized efforts, which often lie at the boundary between grant-supported work and commercial development.

As such research is often a key to understanding complex bodies of behavioral and social sciences data, it is vital to the health of these sciences that research support continue on methods relevant to problems of modeling, statistical analysis, representation, and related aspects of behavioral and social sciences data.

Researchers and funding agencies should also be especially sympathetic to the inclusion of such basic methodological work in large experimental and longitudinal studies.

Additional funding for work in this area, both in terms of individual research grants on methodological issues and in terms of augmentation of large projects to include additional methodological aspects, should be provided largely in the form of investigator-initiated project grants.

Ethnographic and comparative studies also typically rely on project grants to individuals and small groups of investigators. While this type of support should continue, provision should also be made to facilitate the execution of studies using these methods by research teams and to provide appropriate methodological training through the mechanisms outlined below.

Many of the new methods and models described in the chapter, if and when adopted to any large extent, will demand substantially greater amounts of research devoted to appropriate analysis and computer implementation. And even when generally available methods such as maximum likelihood are applicable, model application still requires skillful development in particular contexts. Many of the familiar general methods that are applied in the statistical analysis of data are known to provide good approximations when sample sizes are sufficiently large, but their accuracy varies with the specific model and data used.

To estimate the accuracy requires extensive numerical exploration. Investigating the sensitivity of results to the assumptions of the models is important and requires still more creative, thoughtful research.

It takes substantial efforts of these kinds to bring any new model on line, and the need becomes increasingly important and difficult as statistical models move toward greater realism, usefulness, complexity, and availability in computer form.

More complexity in turn will increase the demand for computational power. Although most of this demand can be satisfied by increasingly powerful desktop computers, some access to mainframe and even supercomputers will be needed in selected cases.

Interaction and cooperation between the developers and the users of statistical and mathematical methods need continual stimulation both ways. Efforts should be made to teach new methods to a wider variety of potential users than is now the case. Several ways appear effective for methodologists to communicate to empirical scientists. Methodologists, in turn, need to become more familiar with the problems actually faced by empirical scientists in the laboratory and especially in the field.

Several ways appear useful for communication in this direction. In addition, research on and development of statistical packages and expert systems should be encouraged to involve the multidisciplinary collaboration of experts with experience in statistical, computer, … As a rule they are updated frequently so that they offer timely discussions of methodological trends. Most of them are introductory in nature, written for student researchers.

Because of the influence of psychology and other social sciences on the development of data collection in educational research, representative works of psychology (Trochim) and of general social sciences (Robson) are included. Available online, Trochim is a reader-friendly introduction that provides succinct explanations of most quantitative and qualitative approaches.

Olsen is helpful in showing how data collection techniques used in other disciplines have implications for educational studies. Specific to education, Gall, et al. Johnson and Christensen offers a more balanced treatment meant for novice researchers and educational research consumers. Finally, Arthur, et al. Research methods and methodologies in education.

A diverse edited text discussing trends in study designs, data collection, and data analysis. It includes twelve chapters devoted to different forms of data collection, written by authors who have recently published extensively on the topic. Annotated bibliographies found at the end of each chapter provide guidance for further reading. Research methods in education.

This long-running, bestselling, comprehensive source offers practical advice with clear theoretical foundations. The newest edition has undergone significant revision. Specific to data collection, revisions include new chapters devoted to data collection via the Internet and visual media. Slides highlighting main points are available on a supplementary website. The SAGE handbook of online research methods. This extensive handbook presents chapters on Internet research design and data collection written by leading scholars in the field.

It discusses using the Internet as an archival resource and a research tool, focusing on the most recent trends in multidisciplinary Internet research. A long-standing, well-respected, nuts-and-bolts perspective on data collection meant to prepare students for conducting original research. Although it tends to emphasize quantitative research methodologies, it has a uniquely rich chapter on historical document analysis. Johnson, Burke, and Larry Christensen.

Telephone interviews are less time-consuming and less expensive, and the researcher has ready access to anyone on the planet who has a telephone.

Disadvantages are that the response rate is not as high as for face-to-face interviews, though considerably higher than for mailed questionnaires. The sample may be biased to the extent that people without phones are part of the population about whom the researcher wants to draw inferences. This method saves time involved in processing the data, as well as saving the interviewer from carrying around hundreds of questionnaires.

However, this type of data collection method can be expensive to set up and requires that interviewers have computer and typing skills. Paper-and-pencil questionnaires can be sent to a large number of people and save the researcher time and money. People are more truthful when responding to questionnaires, regarding controversial issues in particular, because their responses are anonymous.

But they also have drawbacks. The majority of people who receive questionnaires do not return them, and those who do might not be representative of the originally selected sample. A new and inevitably growing methodology is the use of Internet-based research. Typically, a participant receives an e-mail containing a link to a secure website, where they fill in a questionnaire.

This type of research is often quicker and less detailed. Some disadvantages of this method include the exclusion of people who do not have a computer or are unable to access one. Also, the validity of such surveys is in question, as people might be in a hurry to complete them and so might not give accurate responses.

Questionnaires often make use of checklists and rating scales. These devices help simplify and quantify people's behaviors and attitudes. A checklist is a list of behaviors, characteristics, or other entities that the researcher is looking for. Either the researcher or the survey participant simply checks whether each item on the list is observed, present, or true, or vice versa.
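Scoring such instruments is simple arithmetic. The sketch below counts the checked items on a checklist and averages responses on a five-point rating scale, reverse-coding negatively worded items so that a high score always points the same way. The item names and the reverse-coding convention are illustrative assumptions, not part of any particular instrument.

```python
def checklist_score(checked, items):
    """Number of listed items the observer or respondent marked as present."""
    return sum(1 for item in items if item in checked)


def rating_scale_score(responses, reverse_items=(), points=5):
    """Mean of 1..points ratings, flipping reverse-worded items (1 <-> points)."""
    adjusted = [
        (points + 1 - value) if item in reverse_items else value
        for item, value in responses.items()
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical classroom-observation checklist.
behaviors = ["asks questions", "volunteers answers", "helps peers", "interrupts"]
observed = {"asks questions", "helps peers"}
print(checklist_score(observed, behaviors))  # 2

# Hypothetical attitude items; "feels bored" is worded in the opposite direction.
answers = {"enjoys class": 4, "finds work useful": 5, "feels bored": 2}
print(round(rating_scale_score(answers, reverse_items={"feels bored"}), 2))  # 4.33
```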

## Main Topics

Data Collection is an important aspect of any type of research study. Inaccurate data collection can impact the results of a study and ultimately lead to invalid results. Data collection methods for impact evaluation vary along a continuum.

Data Collection. Research methodology: a brief and succinct account of what the techniques for collecting data are, how to apply them, where to find data of any type, and the way to keep records for an optimal management of cost, time, and effort. Magister “Civilisation: Language and Cultural Studies.”