Introduction
One of the goals of the workshop is to enable people who have already collected distance sampling data to do some preliminary analysis of their data using the computer program Distance. This page explains how to get your data into a format that Distance can easily read. If you can bring your data to the workshop in this format then we will be able to import it more quickly, and so you will have more time to experiment with the analysis.
If you are already using Distance 6 (the current version) then feel free to bring along your data and analyses as a Distance project file.
If you have not got any data to bring along then don’t worry – we will be providing plenty of informative exercises to keep you busy during the computer sessions! Also, there may well be someone with similar interests to you who has brought some data along, so you could discuss the analysis together.
Getting Data into Distance
In Distance, all of your data and analyses are kept in a “project file”. You can get data into a project file either by entering it from the keyboard or by importing it from a text file. If you only have relatively few observations then it may be easiest to re-enter the data from the keyboard. However, most people will already have their data on computer, for example in a spreadsheet or database file, and in this case it is easiest to turn it into a text file and import this file into Distance.
To import a data file into Distance, it must be in “flat file format” – i.e., arranged in rows and columns with one row for each observation. The actual number of columns depends on the type of survey (see later). Here’s an example of part of a data file, from a line transect survey with two strata:
Stratum 1;100;Line 1;10;14
Stratum 1;100;Line 1;10;8
Stratum 1;100;Line 1;10;22
Stratum 1;100;Line 2;10.3;7
Stratum 1;100;Line 2;10.3;37
Stratum 1;100;Line 2;10.3;13
Stratum 2;123;Line 1;5.7;
Stratum 2;123;Line 2;8.4;27
Stratum 2;123;Line 2;8.4;76
Stratum 2;123;Line 2;8.4;44
Stratum 2;123;Line 2;8.4;7
In this file, the columns are separated by semicolons. Column 1 is the stratum name, column 2 is the stratum area, column 3 is the transect name, column 4 is the transect length, and column 5 is the perpendicular distance. Notice that all transects from the same stratum are grouped together on adjacent lines, and all observations from the same transect are grouped together. Notice also that the record “Line 1” in “Stratum 2” has no distance in the final column – this is a transect where no objects were seen.
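Before importing, it can be worth checking that a file follows the layout just described. The following is a minimal Python sketch, not part of Distance itself; the function names read_flat_file and is_grouped are hypothetical, and the column order is assumed to match the example above. It reads a few rows of the example, treats an empty distance as a transect with no detections, and checks that strata and transects sit on adjacent lines.

```python
import csv
import io

# A few rows copied from the example above (semicolon-delimited).
SAMPLE = """Stratum 1;100;Line 1;10;14
Stratum 1;100;Line 1;10;8
Stratum 1;100;Line 2;10.3;7
Stratum 2;123;Line 1;5.7;
Stratum 2;123;Line 2;8.4;27
"""


def read_flat_file(text, delimiter=";"):
    """Parse rows of (stratum, area, transect, length, distance).

    Assumes exactly five columns in the order used in the example;
    a row with a different number of columns raises ValueError.
    An empty distance field becomes None (transect with no detections).
    """
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        stratum, area, transect, length, distance = record
        rows.append((
            stratum,
            float(area),
            transect,
            float(length),
            float(distance) if distance.strip() else None,
        ))
    return rows


def _contiguous(keys):
    """True if every distinct key occupies one unbroken run of positions."""
    seen = set()
    previous = object()  # sentinel that equals no real key
    for key in keys:
        if key != previous:
            if key in seen:
                return False  # key reappears after a different key
            seen.add(key)
            previous = key
    return True


def is_grouped(rows):
    """Check that strata, and transects within strata, are on adjacent lines."""
    strata = [r[0] for r in rows]
    transects = [(r[0], r[2]) for r in rows]
    return _contiguous(strata) and _contiguous(transects)


rows = read_flat_file(SAMPLE)
print(is_grouped(rows))
```

If is_grouped returns False, sorting the data by stratum and then by transect in a spreadsheet before exporting should restore the expected layout.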
There is a narrated video that describes the sequence of steps needed to bring these data into Distance for analysis. The video is 8 minutes long and gives you the information required to import simple data structures into Distance.
Which columns should you include in your data file? As a minimum, your file should contain a column for transect or point name and a column for observed distance. For line transect surveys you will also need a column for transect length. If your survey involved stratification then you will need to include columns for stratum name and stratum area. If you measured radial distance and angle then you will need a column for angle, and if your objects are clusters, rather than individuals, then you should include a column for cluster size. So, you will end up with somewhere between 2 and 7 columns, depending on the type of survey.
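The rules in the paragraph above can be sketched as a small Python function. This is only an illustration of the reasoning: the function name required_columns and the column labels are hypothetical, not names that Distance expects.

```python
def required_columns(line_transect, stratified=False,
                     radial_angle=False, clusters=False):
    """List the columns a data file needs for a given survey design.

    The labels are illustrative only; Distance asks you which column
    is which during import, so the names themselves do not matter.
    """
    columns = []
    if stratified:
        columns += ["stratum name", "stratum area"]
    columns.append("transect name" if line_transect else "point name")
    if line_transect:
        columns.append("transect length")
    columns.append("distance")
    if radial_angle:
        columns.append("angle")
    if clusters:
        columns.append("cluster size")
    return columns


# Simplest case: an unstratified point transect survey of individuals.
print(required_columns(line_transect=False))

# Fullest case: a stratified line transect survey of clusters,
# recorded as radial distance and angle.
print(required_columns(line_transect=True, stratified=True,
                       radial_angle=True, clusters=True))
```

The two calls correspond to the 2-column and 7-column extremes mentioned above.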
The columns should be separated by a delimiter (a single ASCII character), which can be a tab, semicolon, comma or space. The order of the columns is not important, as you tell Distance which column is which during the import process. Each row should end with a Carriage-return + Line-feed combination. This is the default end-of-line indicator used by most Windows-based applications, so you usually don’t have to worry about this.
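If you are preparing the file with a script rather than a spreadsheet, the delimiter and end-of-line requirements above can be met as in the following Python sketch. The function name write_flat_file is hypothetical, and the example row is taken from the line transect file shown earlier. Python's standard csv module is asked to use a semicolon delimiter and a Carriage-return + Line-feed at the end of each row.

```python
import csv
import io


def write_flat_file(rows, handle, delimiter=";"):
    """Write rows to an open text handle, one observation per line.

    Each line ends in CR+LF. None is written as an empty field, which
    is how the example file marks a transect with no detections.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\r\n")
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])


rows = [
    ("Stratum 2", 123, "Line 1", 5.7, None),
    ("Stratum 2", 123, "Line 2", 8.4, 27),
]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO(newline="")
write_flat_file(rows, buffer)
print(repr(buffer.getvalue()))
```

When writing to a real file, open it with open(path, "w", newline="") so that Python does not alter the line endings. Note that the csv module puts quotation marks around any field that contains the delimiter, so with a space delimiter a name such as "Stratum 2" would be quoted; choosing a delimiter that does not appear in your names avoids this.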
Data collected in intervals (bins)
In some distance sampling surveys, the exact distances to the observations are not recorded. Instead, observations are placed in pre-defined intervals, or bins. For example, in a point transect survey of songbirds one could define intervals of 0-50 metres, 50-100, and 100-200. To enter this type of data into Distance, enter each observation at the mid-point of the interval. So, if there were 2 birds seen in 0-50, 3 at 50-100 and 3 at 100-200 on point 1, then the data file would look like this:
Point1,25
Point1,25
Point1,75
Point1,75
Point1,75
Point1,150
Point1,150
Point1,150
…
(In this example we are pretending that there are no strata, so there is no stratum name or stratum area column. Also, because it is a point transect example, there is no transect length column. We are using a comma as delimiter.)
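If your interval data are stored as counts per interval, the expansion to one row per observation can be scripted. The following Python sketch reproduces the songbird example above; the function names expand_binned and format_rows are hypothetical, and the input layout (a list of interval bounds with a count) is an assumption about how you might hold the counts.

```python
def expand_binned(point_name, bins):
    """Turn counts per interval into one (name, midpoint) row per observation.

    bins is a list of ((lower, upper), count) pairs.
    """
    rows = []
    for (lower, upper), count in bins:
        midpoint = (lower + upper) / 2
        rows.extend([(point_name, midpoint)] * count)
    return rows


def format_rows(rows, delimiter=","):
    """Format rows as text lines, writing whole-number midpoints without '.0'."""
    lines = []
    for name, distance in rows:
        if float(distance).is_integer():
            text = str(int(distance))
        else:
            text = str(distance)
        lines.append(f"{name}{delimiter}{text}")
    return lines


# Point 1 from the example: 2 birds in 0-50, 3 in 50-100, 3 in 100-200.
counts = [((0, 50), 2), ((50, 100), 3), ((100, 200), 3)]
for line in format_rows(expand_binned("Point1", counts)):
    print(line)
```

Running this prints the same eight lines as the example data file for Point1.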
Additional information
There are some other features in Distance that I haven’t mentioned here. For example, Distance is capable of importing additional columns of data that you may wish to use in your analysis (such as year of survey in multi-year surveys). This will be covered at the workshop. To keep maximum flexibility, it is best to bring your data along as a text file, as outlined above, but also in its original spreadsheet or database format in case you decide to take on a more complex analysis in a later part of the workshop. The computers that we will be using will have Microsoft Excel loaded, so if you can bring your data in an Excel-compatible format, all the better. From within Excel, it is easy to arrange, sort and filter columns, and then export them into a text file that Distance can read. Alternatively, if you are bringing along your own computer then you can use your favourite package to do the required re-formatting.
Conclusion
I hope that this page has provided you with enough information to get at least a subset of your data into a format that Distance can easily read in. If you are still confused about what to do, don’t worry – we will be on hand to help when you get to the workshop. Just bring some of your data along in some kind of electronic format and we should be able to get at least some of it into Distance relatively speedily!