Chapter Four: Information Collection and Evidential Datasets
Predictive cultural resource models are “a simplified set of testable hypotheses, based either on behavioral assumptions or on empirical correlations, which at a minimum attempts to predict the loci of past human activities resulting in the deposition of artifacts or alteration of the landscape” (Kohler 1988:33). Drawing on accumulated experience, most archaeologists could, after a cursory review of a topographic map, predict where archaeological sites are most likely to occur with 50% to 80% accuracy. Predictive capacity alone, however, falls short of the explanatory power expected of rigorous scientific inquiry. Sites that fall outside the predicted pattern are often of greater interest to archaeologists, and to understand and evaluate such outliers one must first have a quantitative means of evaluating the sites that fall within a “normal” distribution (Heidelberg 2001:6).
Determining which environmental and cultural variables to use, and how to analyze them, was a major consideration in developing the planning model. Initial test runs with a limited data set from northeastern Nevada used chi-square analysis to determine the distributional relationship of sites to distinct environmental zones. The process required extensive manipulation of tabular and grid data sets and the subsequent overlay of predictive themes to produce a generalized sensitivity map. Updating and testing a model built this way would require continued technical expertise, reducing the overall utility of the model as a planning tool. A more economical approach to modeling was therefore sought, one in which new data could be easily input and new models generated as additional information became available.
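The chi-square approach used in the initial test runs can be sketched as a goodness-of-fit test: if sites were distributed without regard to environment, the number of sites in each zone would be proportional to that zone's share of the surveyed area. This minimal sketch assumes site counts and zone areas have already been tabulated (for example, from a grid overlay); the zone names and numbers are hypothetical, not data from the study.

```python
def chi_square_by_zone(site_counts, zone_areas):
    """Chi-square goodness of fit: observed sites per zone versus the count
    expected if sites were spread in proportion to zone area."""
    total_sites = sum(site_counts.values())
    total_area = sum(zone_areas.values())
    chi2 = 0.0
    for zone, area in zone_areas.items():
        expected = total_sites * area / total_area
        observed = site_counts.get(zone, 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2, len(zone_areas) - 1  # statistic, degrees of freedom


# Hypothetical site counts and zone areas (km^2), not study data
sites = {"valley bottom": 12, "alluvial fan": 30, "uplands": 8}
areas = {"valley bottom": 40.0, "alluvial fan": 25.0, "uplands": 35.0}
chi2, dof = chi_square_by_zone(sites, areas)
print(f"chi-square = {chi2:.2f} with {dof} df")
```

With 2 degrees of freedom the 0.05 critical value is 5.99, so these hypothetical counts would indicate that sites favor the alluvial fan far more than its area alone predicts.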
Spatial Data Modeler (Kemp et al. 1999), a recently developed weights-of-evidence software package, runs with the ArcView® Spatial Analyst extension (ESRI, Redlands, CA). It integrates several steps similar to those used in the initial chi-square analysis and showed promise as a user-friendly, programmatic approach to developing a predictive model. To test its reliability and to better understand the modeling program, we compared the weights-of-evidence results with a cell-based chi-square analysis.
Spatial Data Modeler (SDM) is an ArcView® extension developed by the United States Geological Survey for mineral exploration. SDM offers several methods for exploring data, including weights of evidence (WofE), logistic regression, fuzzy logic, and neural networks. Weights of evidence is particularly useful for predicting mineral deposits from the locations of known resources, and archaeologists have successfully applied the method to predict the probability of site locations.
Weights-of-evidence is a discrete multivariate method originally developed in a nonspatial context for combining a number of medical symptoms to predict disease (Bonham-Carter 1994; Xu et al. 1992). “In this situation, the response variable (presence/absence of disease) is binary and the predictor variables are also of the presence/absence type” (Bonham-Carter 1994:1). Assuming that the predictor variables are conditionally independent, the data sets are combined to assign a posterior probability to each cell for each unique combination of binary predictors. Bonham-Carter (1998) explains this idea with the following example:
If one wished to predict the likelihood of rain on a given day in an area that receives an average of 80 days of rain a year, a sound estimate of the prior probability of rain would be the ratio 80/365. This initial measure of probability can then be modified using other pieces of information, such as the month, the location of the jet stream, or any other relevant factors. The factors that determine the probability of rain vary with the time of year and can be figured into the equation to produce a model that answers the question: “What is the probability that it will rain tomorrow?” (Bonham-Carter 1998:302-303).
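The rain example is Bayes' rule in odds form: the prior odds are multiplied by a likelihood ratio, and the natural log of that ratio is the "weight" in weights of evidence. This sketch reproduces the 80/365 prior from the text; the jet-stream frequencies (overhead on 60% of rainy days and 20% of dry days) are invented purely for illustration.

```python
import math


def update_probability(prior, p_evidence_if_event, p_evidence_if_no_event):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
    The natural log of the likelihood ratio is the weight of evidence."""
    prior_odds = prior / (1.0 - prior)
    weight = math.log(p_evidence_if_event / p_evidence_if_no_event)
    posterior_odds = prior_odds * math.exp(weight)
    return posterior_odds / (1.0 + posterior_odds), weight


prior_rain = 80 / 365  # 80 rainy days in a 365-day year (from the text)
# Hypothetical: jet stream overhead on 60% of rainy days, 20% of dry days
posterior, weight = update_probability(prior_rain, 0.60, 0.20)
print(f"prior = {prior_rain:.3f}, W = {weight:.3f}, posterior = {posterior:.3f}")
```

The prior of about 0.22 rises to about 0.46 once the evidence is considered; a second, independent piece of evidence would add its own weight to the log odds in the same way.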
Weights-of-evidence methods were adapted for mineral exploration by overlaying geologic and geochemical data sets to predict the locations of ore bodies (Bonham-Carter et al. 1988; Raines 1999), and have also been used to predict the locations of fossil pack rat middens (Mensing et al. 2000). Archaeologists apply the same method spatially, using known archaeological sites in an area as training points to create a probability map that helps predict which locations in the study area are likely to contain sites. The results can serve numerous purposes; most recently, federal agencies have used them to better manage public lands.
The Bayesian weights-of-evidence approach requires a set of training points (in this case, archaeological sites), a set of evidential themes or variables that are assumed to be predictive of training point location, and a spatially defined study area. The training points are compared with the evidential themes to calculate weights that assess the spatial association between the points and each class within a theme. The positive weight (W+) applies where the class is present, and the negative weight (W−) applies where the class is absent. The strength of the association is measured by the contrast, C = W+ − W−. Positive contrast values suggest that more training points occur within the class than would be expected by chance; negative contrasts indicate that fewer training points occur within the class than would be expected by chance. Dividing the contrast by its estimated standard deviation gives a normalized (Studentized) contrast for each class.
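The weights and contrasts described above can be computed from four cell counts. This sketch follows the standard formulation in Bonham-Carter (1994), with the class and the study area measured in unit cells. The function name and example counts are hypothetical, and the sketch assumes none of the four counts is zero (real software needs a rule for that case).

```python
import math


def weights_of_evidence(n_class_sites, n_class_cells, n_sites, n_cells):
    """W+, W-, contrast, and studentized contrast for one binary class.

    n_class_sites: unit cells in the class that contain a training point
    n_class_cells: unit cells in the class
    n_sites:       unit cells in the study area that contain a training point
    n_cells:       unit cells in the study area
    """
    b_d = n_class_sites                     # class present, site present
    b_nd = n_class_cells - n_class_sites    # class present, no site
    nb_d = n_sites - n_class_sites          # class absent, site present
    nb_nd = n_cells - n_class_cells - nb_d  # class absent, no site
    n_d, n_nd = n_sites, n_cells - n_sites

    w_plus = math.log((b_d / n_d) / (b_nd / n_nd))
    w_minus = math.log((nb_d / n_d) / (nb_nd / n_nd))
    contrast = w_plus - w_minus
    # Asymptotic standard deviation of C: sqrt(var(W+) + var(W-))
    s_contrast = math.sqrt(1 / b_d + 1 / b_nd + 1 / nb_d + 1 / nb_nd)
    return w_plus, w_minus, contrast, contrast / s_contrast


# Hypothetical class: 200 of 1,000 cells, holding 20 of the 50 site cells
w_plus, w_minus, c, stud_c = weights_of_evidence(20, 200, 50, 1000)
print(f"W+={w_plus:.3f}  W-={w_minus:.3f}  C={c:.3f}  C/s(C)={stud_c:.2f}")
```

Here the class holds 40% of the site cells but only 20% of the area, so W+ is positive, W− is negative, and the studentized contrast is well above zero.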
Positive contrast values are grouped to assess the relative
strength of the predictive pattern for each class (Table 4.1). Depending upon
contrast values, the user determines which classes are “inside” (predictive) or
“outside” (not predictive) within each evidential theme.
Because the user sets the high or low cutoff points, the user’s decisions directly influence the model outcome. In addition, expert opinion can be used to weight an individual class of data thought to be intrinsically more important, or to discard contrasts that are artificially high because of a disproportionate ratio of unit area to training points.
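One way to make the inside/outside decision explicit is a cutoff on the studentized contrast plus override lists for expert judgment. The cutoff of 2.0, the class names, and the statistics below are assumptions for illustration; the text does not state which cutoffs were used.

```python
def select_inside(class_stats, min_studentized=2.0, forced_inside=(), forced_outside=()):
    """Return the set of classes treated as "inside" (predictive).

    class_stats: {class_name: (contrast, studentized_contrast)}
    A class is inside when its contrast is positive and its studentized
    contrast meets the cutoff; expert overrides are applied last.
    """
    inside = {
        name for name, (contrast, stud) in class_stats.items()
        if contrast > 0 and stud >= min_studentized
    }
    inside |= set(forced_inside)    # expert judgment: add classes
    inside -= set(forced_outside)   # expert judgment: drop inflated contrasts
    return inside


# Hypothetical distance-to-water classes with (contrast, studentized contrast)
stats = {
    "0-200 m": (1.2, 4.1),
    "200-400 m": (0.6, 2.3),
    "400-1000 m": (0.1, 0.4),
    ">2000 m": (-0.9, -3.0),
}
print(select_inside(stats))
print(select_inside(stats, forced_outside=("200-400 m",)))
```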
Prior to running the model, the program calculates a prior
probability assuming a random distribution of sites:
$$\text{Prior probability} = \frac{\text{number of training points}}{\text{total number of unit areas in the study area}}$$
Because the training points make up a very small sample of the entire study area, the prior probability will likely be much smaller than the actual density of sites within the study area. After the weights have been calculated and each theme reclassified into a binary evidential theme, the themes are combined to create a response theme that assigns a posterior probability to all cells within each unique combination of binary classes. Posterior probabilities higher than the prior probability suggest a non-random distribution within that intersection of evidential themes.
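The prior and posterior calculations can be sketched in log-odds form: each binary theme adds W+ where its class is present and W− where it is absent, which is valid only under the independence assumption stated earlier. The study area, site count, theme names, and weights here are hypothetical; the 0.25 km² unit area matches the 500 meter cells used later in the chapter.

```python
import itertools
import math


def prior_probability(n_training_points, study_area_km2, unit_area_km2):
    """Training points divided by the number of unit areas in the study area."""
    return n_training_points / (study_area_km2 / unit_area_km2)


def posterior_probability(prior, weights, present):
    """Add W+ (class present) or W- (class absent) for every binary theme to the
    prior log odds, then convert back to a probability. Assumes the themes are
    conditionally independent."""
    logit = math.log(prior / (1.0 - prior))
    for theme, (w_plus, w_minus) in weights.items():
        logit += w_plus if present[theme] else w_minus
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs: 120 weeded sites in 1,500 km^2 with 0.25 km^2 unit cells
prior = prior_probability(120, 1500.0, 0.25)
weights = {"near_water": (0.75, -0.30), "low_slope": (0.40, -0.55)}  # hypothetical

# One posterior per unique combination of binary evidential classes
for combo in itertools.product([True, False], repeat=len(weights)):
    present = dict(zip(weights, combo))
    post = posterior_probability(prior, weights, present)
    status = "above prior" if post > prior else "not above prior"
    print(present, f"{post:.4f}", status)
```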
Background data used to analyze cultural and landscape features for the planning model were acquired from a number of different sources. The challenge with both the cultural and landscape data sets was to locate evidential themes that could be applied or adapted to the larger study area. In some cases (e.g. geology), consistent data was available for one state, but missing from others. Scale was also considered, especially for layers like vegetation, where detailed regional coverages lacked comparability between analytic units.
Cultural resource layers compiled for the analysis were derived from a number of different sources and required varying degrees of manipulation to maximize their utility. Idaho and Utah have each developed and maintained a geographic information system for cultural resources, and both states graciously supplied that information for the project area. Nevada is in the process of completing a similar conversion to an electronic archive. As the cultural data sets were received, they were merged into a consistent format. All GIS data sets were converted from their default projections to a uniform UTM Zone 11, NAD 1927 projection.
The Utah State Historic Preservation Office provided cultural resource shapefiles and resource inventory shapefiles. Depending on the relative size of the feature, site and inventory locations are displayed as point, line, or polygon shapes. For analytical purposes, points and lines were buffered to create synthetic polygons, which were then merged with the appropriate (site or inventory) polygon layer to create single polygonal site and inventory layers. Attributes for the Utah synthetic shapes included buffer width, area, site or inventory number, confidence in plot location, and data entry specifics. Using ArcView® utilities, a center point was created for each site so that each entity could also be displayed as a single point.
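The buffering and center-point steps can be illustrated without GIS software. This sketch approximates a circular buffer as a 64-sided polygon and computes the area-weighted centroid with the shoelace formula; ArcView's own buffer and center-point utilities may differ in vertex count and in how they define a center point. The function names, coordinates, and 15 meter buffer distance are hypothetical.

```python
import math


def buffer_point(x, y, radius, n_vertices=64):
    """Approximate a circular buffer as a closed polygon ring of (x, y) tuples."""
    ring = [(x + radius * math.cos(2 * math.pi * i / n_vertices),
             y + radius * math.sin(2 * math.pi * i / n_vertices))
            for i in range(n_vertices)]
    return ring + [ring[0]]


def area_and_centroid(ring):
    """Shoelace area and area-weighted centroid of a closed ring. Coordinates are
    shifted to the first vertex to limit rounding error with large UTM values."""
    ox, oy = ring[0]
    pts = [(x - ox, y - oy) for x, y in ring]
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area: positive for counter-clockwise rings
    return abs(a), (ox + cx / (6 * a), oy + cy / (6 * a))


# Hypothetical site point (UTM meters) buffered by 15 m
site_polygon = buffer_point(650000.0, 4400000.0, 15.0)
area_m2, center = area_and_centroid(site_polygon)
print(round(area_m2, 1), center)
```

The buffered polygon's centroid falls back on the original point, which is why a center point can stand in for a buffered site in point-based analyses.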

The Utah site data consisted of a Microsoft Access® database containing Intermountain Antiquities Computer System (IMACS) encoded fields. Site numbers in the IMACS database allowed the data to be linked to the GIS site shapefiles.
The Idaho State Historic Preservation Office provided a Microsoft
Access® database containing UTM coordinates for
each site within the project area. Fields pertaining to a range of feature
types are present in the table structure, and descriptive artifact attributes
are annotated for each site. A separate table containing SHPO National Register
status was provided with the site data. Inventory databases with locational
information have not been compiled for Idaho.
Using the Idaho site UTM coordinates, a point theme of all sites was created for use in the GIS. Attribute tables for the site points contained all tabular data in the Idaho database. Because quarter-section data in the inventory database was inconsistent, an attempt to determine inventory extent from legal descriptions proved futile: composite legal descriptions often produced areas significantly larger than the reported inventory extent, making the data unreliable.
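The unreliability of composite legal descriptions comes down to arithmetic: each aliquot part is a fixed fraction of a 640-acre section, so a small survey that merely touches several quarter sections (in the same section or adjacent ones) is credited with all of them. This sketch assumes standard sections and non-overlapping parts; the function name, descriptions, and reported acreage are hypothetical.

```python
from fractions import Fraction

SECTION_ACRES = 640  # acres in a standard PLSS section


def aliquot_acres(description):
    """Acreage of an aliquot description such as 'NE1/4 NW1/4' or 'S1/2',
    assuming a standard 640-acre section."""
    share = Fraction(1)
    for part in description.split():
        if part.endswith("1/4"):
            share /= 4
        elif part.endswith("1/2"):
            share /= 2
        else:
            raise ValueError(f"unrecognized aliquot part: {part!r}")
    return SECTION_ACRES * share


# Hypothetical linear survey recorded as touching three quarter sections
composite = ["NE1/4", "NW1/4", "SW1/4"]
composite_acres = sum(aliquot_acres(d) for d in composite)
reported_acres = 35  # hypothetical reported inventory extent
print(composite_acres, "acres by legal description vs", reported_acres, "reported")
```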
Nevada SHPO maintains site and inventory archives at the Nevada State Museum for its northern counties and at the University of Nevada, Las Vegas, Harry Reid Center for Environmental Studies for its southern counties. Archival data is currently being converted to an electronic database and GIS format. Site and inventory data for Elko County had previously been entered into the statewide GIS and into a Microsoft Access® database whose fields and codes are identical to those of the IMACS site record. Spatial and database information for sites and inventories lying within the White Pine County and Lincoln County portions of the study area was compiled as part of this project.
Data compilation for the Lincoln and White Pine County portions of the study area involved several steps. First, archival USGS maps (7.5- and 15-minute quadrangles) showing site and inventory locations were scanned at the UNLV Harry Reid Center archive. The scanned quadrangles were then georeferenced to UTM Zone 11, NAD 27 coordinates, and each site and inventory marked on the maps was digitized using GIS software. Sites smaller than 2.5 acres were digitized as points, linear sites as lines, and all other sites as polygons; similar digitizing rules were applied to inventoried areas. Site and inventory metadata, consisting of map source, entry dates, and accuracy or error flags, were appended to the attribute table for each shape.
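Georeferencing a scanned quadrangle is commonly done with a first-order (affine) transformation fitted to ground control points. This sketch solves the affine coefficients exactly from three control points with Cramer's rule, then applies the size rule from the text for choosing a digitized geometry type. The source does not say which transformation or how many control points were used, so treat both as assumptions; the function names and control point values are hypothetical.

```python
ACRE_M2 = 4046.8564224  # square meters per acre


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine(pixels, targets):
    """Exact affine fit from three control points.
    pixels: [(col, row)] on the scan; targets: [(easting, northing)]."""
    m = [[col, row, 1.0] for col, row in pixels]
    d = _det3(m)
    if abs(d) < 1e-9:
        raise ValueError("control points must not be collinear")
    coeffs = []
    for axis in (0, 1):                  # easting coefficients, then northing
        v = [t[axis] for t in targets]
        for k in range(3):               # Cramer's rule: replace column k
            mk = [row[:] for row in m]
            for i in range(3):
                mk[i][k] = v[i]
            coeffs.append(_det3(mk) / d)
    return coeffs                        # [a, b, c, d, e, f]


def to_map(coeffs, col, row):
    """Apply easting = a*col + b*row + c, northing = d*col + e*row + f."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


def geometry_type(area_m2, is_linear):
    """Digitizing rule from the text: < 2.5 acres -> point, linear -> line, else polygon."""
    if area_m2 < 2.5 * ACRE_M2:
        return "point"
    return "line" if is_linear else "polygon"


# Hypothetical control points: scan pixels and their UTM coordinates
gcp_pixels = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
gcp_utm = [(600000.0, 4300000.0), (602500.0, 4299900.0), (600100.0, 4297500.0)]
coeffs = fit_affine(gcp_pixels, gcp_utm)
print(to_map(coeffs, 500.0, 500.0), geometry_type(5000.0, False))
```

In practice more than three control points are usually collected and fitted by least squares, with the residual (RMS) error reported as a check on the scan.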
Site data from records predating IMACS (1982) proved somewhat inconsistent. Early investigations are also generally less complete than more recent ones, and the survey methods used at the time varied considerably. To control for variability in survey method and site reporting, assemblage and administrative site data were entered only for sites occurring within inventories with a cumulative extent greater than 640 acres. This size criterion ensured relatively uniform reconnaissance and reporting techniques and kept site/non-site analysis of the landscape within consistent parameters.
Sites were selected by intersecting inventory areas with site locations. Site records were assembled from archives at the BLM Ely Field Office, the UNLV Harry Reid Center, and the Nevada State Museum. Administrative and assemblage data were compiled in a Microsoft Access® database using the IMACS encoding format, then linked to the spatial data in the GIS attribute tables. As with the Utah data, the shapefiles were transformed into a single polygon layer for analytical purposes by buffering points and lines into synthetic polygons and merging those with the existing polygon shapefile. Site center points were also calculated for each feature for use where point analysis was required.
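The selection of sites within large inventories can be sketched as an area filter followed by a point-in-polygon test. For simplicity this uses site center points rather than full site polygons, and it interprets "cumulative extent" as the summed area of an inventory's polygons; both are assumptions, as are the function names, inventory IDs, and coordinates.

```python
ACRE_M2 = 4046.8564224  # square meters per acre


def ring_area(ring):
    """Shoelace area of a closed ring (first point repeated at the end)."""
    return abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))) / 2


def point_in_ring(x, y, ring):
    """Even-odd (ray casting) point-in-polygon test."""
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


def select_sites(inventories, sites, min_acres=640):
    """inventories: {inventory_id: [ring, ...]}; sites: {site_id: (x, y)}.
    Keeps inventories whose summed polygon area exceeds min_acres, then returns
    the IDs of sites whose center point falls inside any of them."""
    large = [ring for rings in inventories.values()
             if sum(ring_area(r) for r in rings) / ACRE_M2 > min_acres
             for ring in rings]
    return {sid for sid, (x, y) in sites.items()
            if any(point_in_ring(x, y, ring) for ring in large)}


# Hypothetical inventories (local meters): a 3 km square and a 1 km square
inventories = {
    "INV-1": [[(0, 0), (3000, 0), (3000, 3000), (0, 3000), (0, 0)]],
    "INV-2": [[(5000, 5000), (6000, 5000), (6000, 6000), (5000, 6000), (5000, 5000)]],
}
sites = {"A": (1500, 1500), "B": (5500, 5500), "C": (9000, 9000)}
print(select_sites(inventories, sites))
```

Only site A is kept: INV-2 covers about 247 acres, below the 640-acre threshold, and site C lies outside every inventory.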
After site data from all three states were assembled, the GIS shapefiles were merged into a single analytical theme and joined to the respective site assemblage data. Because assemblage data was reported in a slightly different format for each state, attribute fields were reformatted to indicate the presence or absence of specific artifact types or general classes, feature types, and temporal affiliation. The resulting table provided comparable attributes for all site records. It was used to identify historic and prehistoric site affinity and provided a baseline for archaeological and anthropological site analysis. Figure 4.1 depicts the distribution of site center points across the study area; inventories greater than 640 acres are shown for the Nevada and Utah data sets.
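Reformatting each state's assemblage fields into shared presence/absence flags amounts to a lookup table plus a rule for what counts as "present." All field names below are hypothetical placeholders, not actual IMACS or SHPO field names, and the rule treating zero, blank, or "N" as absent is an assumption.

```python
# Hypothetical field names for each state's assemblage table (placeholders only)
FIELD_MAP = {
    "UT": {"lithic_debitage": "flaked_stone", "groundstone": "ground_stone", "ceramics": "ceramics"},
    "NV": {"debitage_ct": "flaked_stone", "metate_mano": "ground_stone", "sherd_ct": "ceramics"},
    "ID": {"flakes": "flaked_stone", "ground_stone": "ground_stone", "pottery": "ceramics"},
}
CLASSES = ("flaked_stone", "ground_stone", "ceramics")
ABSENT = (None, 0, "", "N")  # values treated as "not present" (an assumption)


def harmonize(state, record):
    """Convert one state-specific assemblage record to presence/absence flags."""
    flags = dict.fromkeys(CLASSES, False)
    for field, value in record.items():
        target = FIELD_MAP[state].get(field)
        if target is not None and value not in ABSENT:
            flags[target] = True
    return flags


print(harmonize("NV", {"debitage_ct": 12, "sherd_ct": 0}))
print(harmonize("UT", {"ceramics": "Y", "groundstone": "N"}))
```

Keying every state to the same class names is what makes records from the three states comparable once they are joined to the merged site theme.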
Landscape level analysis required the compilation of a number of environmental data sets or evidential themes that could be used with the site data to construct a probability model. Data sets compiled for the project area included slope, vegetation, landform, and hydrology. A roads layer was compiled for historic resource analysis. GIS layers pertaining to potential marsh habitat were also derived as a means to address research questions relating to prehistoric land use.
Slope was derived from the USGS National Elevation Dataset (NED). The 30 meter NED was clipped to each analytical unit within the project area, and slope was calculated for each cell to produce a slope grid. For analytical purposes, slope was divided into five classes: 0-5 degrees, 5-15 degrees, 15-30 degrees, 30-45 degrees, and greater than 45 degrees. The NED was also used to create shaded relief maps as background graphics for each of the analytic units.
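The slope grid and its five classes can be sketched for a single 3×3 window of elevations. The sketch assumes Horn's weighted finite-difference method, which common GIS slope tools use; the source does not say which algorithm was applied. It also assumes that a value falling exactly on a class break (for example, 5 degrees) goes to the lower class, since the ranges in the text share endpoints. Function names are hypothetical.

```python
import math

SLOPE_BREAKS = (5, 15, 30, 45)  # upper bounds in degrees
SLOPE_LABELS = ("0-5", "5-15", "15-30", "30-45", ">45")


def horn_slope_degrees(z, cell=30.0):
    """Slope in degrees at the center of a 3x3 window z[row][col] (row 0 = north),
    using Horn's weighted finite differences; cell is the grid spacing in meters."""
    dz_dx = ((z[0][2] + 2 * z[1][2] + z[2][2]) - (z[0][0] + 2 * z[1][0] + z[2][0])) / (8 * cell)
    dz_dy = ((z[2][0] + 2 * z[2][1] + z[2][2]) - (z[0][0] + 2 * z[0][1] + z[0][2])) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def slope_class(degrees):
    """Five slope classes; a value on a break falls in the lower class (assumption)."""
    for upper, label in zip(SLOPE_BREAKS, SLOPE_LABELS):
        if degrees <= upper:
            return label
    return SLOPE_LABELS[-1]


ramp = [[1500.0, 1530.0, 1560.0] for _ in range(3)]  # rises 30 m per 30 m cell eastward
print(horn_slope_degrees(ramp), slope_class(horn_slope_degrees(ramp)))
```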
Vegetation layers were derived from the Potential Natural Vegetation Groups data set of the Fire Sciences Laboratory, Rocky Mountain Research Station (Schmidt et al. 2002). These are coarse-scale data developed as part of a national-level fire-planning model. The vegetation data was refined to match terrain using a 500 meter Digital Elevation Model, 4th Code Hydrologic Units, and Ecological Sub-regions (Bailey’s Sections). Classifications follow Küchler (1975) descriptions for ECO Region 4 (Table 4.2).
To derive a general characterization of landform within each analytic unit, slope derived from the NED data set was reclassified into three ranges that roughly approximate flats, piedmont, and mountainous areas. Flats comprise all slopes between 0 and 3%, piedmont slopes lie between 4 and 10%, and mountains include all slopes above 10%. The resulting classes approximate elevational rings of valley bottom, alluvial fan, and upland slopes in each analytic unit.
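The landform classes are defined in percent slope, while the slope classes above are in degrees, so a conversion is needed (percent = 100 × tan(degrees)). This sketch assumes that values in the unstated gap between 3% and 4% are grouped with piedmont; the function names are hypothetical.

```python
import math


def percent_slope(degrees):
    """Convert slope in degrees to percent rise (100 * tan)."""
    return 100.0 * math.tan(math.radians(degrees))


def landform(pct):
    """Flats 0-3%, piedmont 4-10%, mountains above 10%.
    Assumption: values in the 3-4% gap are grouped with piedmont."""
    if pct <= 3:
        return "flat"
    if pct <= 10:
        return "piedmont"
    return "mountain"


for deg in (1.0, 5.0, 20.0):
    print(deg, "degrees ->", round(percent_slope(deg), 1), "% ->", landform(percent_slope(deg)))
```

For reference, the 3% and 10% breaks correspond to roughly 1.7 and 5.7 degrees, so the whole flats and piedmont range falls within the 0-5 and lowest part of the 5-15 degree slope classes.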
A hydrologic layer consisting of springs and streams was compiled for each of the analytic units. Source data was derived from USGS 1:100,000 Digital Line Graphs (DLG), clipped to the project area, and then buffered at intervals of 200, 400, 1000, and 2000 meters. The buffered shapes were then converted into grids for each analytic unit. Both intermittent and perennial stream classes are included in the data set, since watercourses that are intermittent today may have been more productive prehistorically.
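Converting the hydrology buffers to grids is equivalent to labeling each cell center by its distance to the nearest spring or stream segment and binning that distance at 200, 400, 1000, and 2000 meters. This sketch does that directly; the function names and coordinates are hypothetical. The same binning would apply to the roads (200, 400, 1000 m) and marsh (1000, 3000, 5000 m) buffers with different break values.

```python
import math

RING_BREAKS = (200, 400, 1000, 2000)  # buffer distances in meters (from the text)


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_water(px, py, streams, springs):
    """streams: list of polylines [(x, y), ...]; springs: list of (x, y) points."""
    d = min((math.hypot(px - sx, py - sy) for sx, sy in springs), default=math.inf)
    for line in streams:
        for (ax, ay), (bx, by) in zip(line, line[1:]):
            d = min(d, point_segment_distance(px, py, ax, ay, bx, by))
    return d


def ring_class(distance, breaks=RING_BREAKS):
    """Label a distance by the innermost buffer ring that contains it."""
    lower = 0
    for upper in breaks:
        if distance <= upper:
            return f"{lower}-{upper} m"
        lower = upper
    return f">{breaks[-1]} m"


streams = [[(0.0, 0.0), (1000.0, 0.0)]]  # hypothetical stream segment
springs = [(5000.0, 5000.0)]             # hypothetical spring
for cell_center in [(500.0, 150.0), (500.0, 300.0), (5000.0, 6500.0), (5000.0, 8000.0)]:
    print(cell_center, ring_class(distance_to_water(*cell_center, streams, springs)))
```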
The extent of potential marsh habitat was derived from the U.S. Soil Conservation Service State Soil Geographic (STATSGO) database. The STATSGO database was designed for use as a regional, multi-state resource planning, management, and monitoring tool. Soil data is derived from generalized information in the county-wide soils database and extrapolated to 1:250,000 scale USGS quadrangles. STATSGO data sets include fields relating to soil class, structure, texture, engineering capabilities, suitability for agriculture, and potential for various rangeland habitat types. The STATSGO databases were queried for soils with the potential to sustain wetland plants and the potential to sustain wetland wildlife, and the results were used as a proxy for potential wetlands. Those shapes were then buffered at 1000, 3000, and 5000 meter intervals for analytical purposes and converted to grids.
The roads layer was extracted from U.S. Census Bureau 2000 TIGER/Line files, which are derived from a generalized 1:100,000 base layer. The line data was then buffered to 200, 400, and 1000 meter widths for analytic purposes.
Cultural resource inventory data sets allowed multiple approaches to be used in constructing a management model. Inventoried areas provided a controlled setting in which both site and non-site data could be assessed. Within these site/non-site parameters, chi-square analysis could also be conducted to validate the predictive patterns observed in the calculated weights tables.
Weights tables were compiled in Spatial Data Modeler using sites within inventoried areas as the training point theme and inventory extent as a mask over all evidential themes. The unit area suggested by Spatial Data Modeler varies with the size of the analytic unit; the suggested value compensates for variation between study area cell size and the output cell size of the evidential themes:
$$\text{Suggested unit area} = \frac{\text{total study area} / \text{total training points}}{40}$$
Default settings for most of the analytic units ranged from 0.20 to 0.30 square kilometers (a 447.2 to 547.7 meter grid). To maintain consistency across the analytic units, the unit area was arbitrarily set to 500 meter cells (0.25 square kilometers). Multiple training points within a cell greatly inflate prior probabilities, because probability is evaluated as a deviation from an expected distribution of no more than one training point per unit area. SDM automatically weeds out duplicate training points within a cell so that there is no more than one training point (site) per unit area.
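The unit-area rule of thumb and the weeding step can be sketched as follows. The formula is the one quoted above; which duplicate point SDM keeps within a cell is not stated, so keeping the first one encountered is an assumption, as are the function names and example numbers.

```python
import math


def suggested_unit_area(study_area_km2, n_training_points):
    """Rule of thumb quoted in the text: (study area / training points) / 40."""
    return study_area_km2 / n_training_points / 40


def cell_side_m(unit_area_km2):
    """Side length in meters of a square cell with the given area."""
    return math.sqrt(unit_area_km2) * 1000.0


def weed_points(points, cell_size_m, origin=(0.0, 0.0)):
    """Keep at most one training point per grid cell (the first one seen)."""
    kept, occupied = [], set()
    ox, oy = origin
    for x, y in points:
        cell = (math.floor((x - ox) / cell_size_m), math.floor((y - oy) / cell_size_m))
        if cell not in occupied:
            occupied.add(cell)
            kept.append((x, y))
    return kept


print(suggested_unit_area(1000.0, 100), cell_side_m(0.20), cell_side_m(0.30))
print(weed_points([(10.0, 10.0), (400.0, 400.0), (600.0, 100.0)], 500.0))
```

The side-length helper reproduces the 447.2 and 547.7 meter grids that correspond to the 0.20 and 0.30 square kilometer defaults.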
Once the unit area and training point parameters are set, Spatial Data Modeler calculates a weights table for each evidential theme. The resulting contrasts (W+ − W−) indicate the relative strength of each predictive class.
To test the efficacy of the weights calculations and to aid in selecting predictive classes for the final response theme, a chi-square test was run on the inventoried site data set against the evidential themes. To create a site/non-site matrix, the project area was arbitrarily gridded into 250 meter square cells and a center point was calculated for each cell. The center points were clipped to the analytic unit and then clipped again so that only grid points within inventoried areas remained. Using ArcView® Spatial Analyst, any grid point within 100 meters of a site polygon was selected and saved as a site training point. The selection was then inverted, and all remaining grid points were saved as a non-site theme (Figure 4.2).
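The site/non-site matrix can be sketched with plain geometry: build 250 meter cell centers, keep those inside inventories, and split them by distance to the nearest site polygon. This is an illustrative stand-in for the ArcView Spatial Analyst selection, with hypothetical function names and coordinates.

```python
import math


def grid_centerpoints(xmin, ymin, xmax, ymax, cell=250.0):
    """Centers of square cells covering a bounding box."""
    nx = math.ceil((xmax - xmin) / cell)
    ny = math.ceil((ymax - ymin) / cell)
    return [(xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
            for i in range(nx) for j in range(ny)]


def point_in_ring(x, y, ring):
    """Even-odd (ray casting) test; ring is closed (first point repeated)."""
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


def distance_to_ring(x, y, ring):
    """Zero inside the polygon, otherwise distance to the nearest edge."""
    if point_in_ring(x, y, ring):
        return 0.0
    best = math.inf
    for (ax, ay), (bx, by) in zip(ring, ring[1:]):
        dx, dy = bx - ax, by - ay
        seg2 = dx * dx + dy * dy
        t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((x - ax) * dx + (y - ay) * dy) / seg2))
        best = min(best, math.hypot(x - (ax + t * dx), y - (ay + t * dy)))
    return best


def split_site_nonsite(points, inventory_rings, site_rings, buffer_m=100.0):
    """Keep points inside inventories, then label each as site (within buffer_m
    of a site polygon) or non-site (the inverted selection)."""
    sites, nonsites = [], []
    for p in points:
        if not any(point_in_ring(*p, r) for r in inventory_rings):
            continue  # outside inventoried areas: excluded from the matrix
        if any(distance_to_ring(*p, r) <= buffer_m for r in site_rings):
            sites.append(p)
        else:
            nonsites.append(p)
    return sites, nonsites


# Hypothetical 1 km inventory square containing one small site polygon
inventory = [[(0, 0), (1000, 0), (1000, 1000), (0, 1000), (0, 0)]]
site = [[(100, 100), (150, 100), (150, 150), (100, 150), (100, 100)]]
sites, nonsites = split_site_nonsite(grid_centerpoints(0, 0, 1000, 1000), inventory, site)
print(len(sites), "site points,", len(nonsites), "non-site points")
```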
With the Spatial Data Modeler unit area set to a 250 meter cell size, weights were calculated using both the site and non-site training point themes. The resulting contrasts were compared with those from the previous run using weeded, inventoried sites. Classes with the highest contrasts in both the 250 meter grid site and weeded site weights tables were chosen for validation using the chi-square test. The evidential class with the highest contrast was tabulated against site and non-site occurrences (Table 4.3). A chi-square value above 3.84 was considered significant at 1 degree of freedom (p < 0.05). If chi-square testing confirmed the contrast as predictive, that class was chosen as “inside” the pattern.
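For a single evidential class, the validation test is a 2×2 contingency table (site/non-site by inside/outside the class). This sketch computes Pearson's chi-square without Yates' continuity correction; whether the original analysis applied a correction is not stated. The function name and counts are hypothetical.

```python
def chi_square_2x2(site_in, nonsite_in, site_out, nonsite_out):
    """Pearson chi-square (no continuity correction) for site/non-site grid
    points inside and outside one evidential class."""
    a, b, c, d = site_in, nonsite_in, site_out, nonsite_out
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_1DF_05 = 3.84  # chi-square critical value, 1 df, p = 0.05

chi2 = chi_square_2x2(30, 70, 20, 380)  # hypothetical counts
print(f"chi-square = {chi2:.2f}, significant: {chi2 > CRITICAL_1DF_05}")
```

A significant result says only that site frequency differs inside and outside the class; comparing the proportions (here 30% versus 5% site points) confirms that the direction matches a positive contrast.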
Response themes were then run for each of the hydrographic units that contained inventory themes, using all (weeded) sites within each analytic unit and the predictive classes. The normalized posterior probability was then reclassified to reflect high, moderate, and low probability of site occurrence. Summary tables of sites within each probability zone were compared with the results from areas of previous inventory. The probability model was considered accurate if the highest site frequencies were associated with areas of high and moderate probability. Because the Idaho data set lacks spatial data for inventories, these comparisons allowed us to assess the feasibility of using site center points, regardless of inventory status, as valid training points for pattern prediction.
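The reclassification and accuracy check can be sketched as a three-way split of posterior probabilities followed by a count of test sites in each zone. The text does not give the break values, so the rule used here (high at twice the prior or more, moderate above the prior) is purely a hypothetical choice, as are the function names and posterior values.

```python
from collections import Counter


def probability_zone(posterior, prior, high_factor=2.0):
    """Hypothetical three-way split relative to the prior probability."""
    if posterior >= high_factor * prior:
        return "high"
    if posterior > prior:
        return "moderate"
    return "low"


def summarize_zones(site_posteriors, prior):
    """Count sites per zone and the share falling in high or moderate zones."""
    counts = Counter(probability_zone(p, prior) for p in site_posteriors)
    share = (counts["high"] + counts["moderate"]) / max(len(site_posteriors), 1)
    return counts, share


# Hypothetical posterior values at previously inventoried site locations
counts, share = summarize_zones([0.06, 0.05, 0.03, 0.025, 0.01], prior=0.02)
print(dict(counts), f"{share:.0%} in high or moderate zones")
```

Under the accuracy criterion described above, a model would look favorable when most known sites fall in the high and moderate zones, as in this made-up example.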