Preparing your data for use in Distance

Introduction

One of the goals of the workshop is to enable people who have already collected distance sampling data to do some preliminary analysis of their data using the computer program Distance. This page explains how to get your data into a format that Distance can easily read. If you can bring your data to the workshop in this format, we will be able to import it more quickly, and so you will have more time to play with the analysis.

If you are already using Distance 6 (the current version), feel free to bring along your data and analyses as a Distance project file.

If you have not got any data to bring along then don’t worry – we will be providing plenty of informative exercises to keep you busy during the computer sessions! Also, there may well be someone with similar interests to you who has brought some data along, so you could discuss the analysis together.

Getting Data into Distance

In Distance, all of your data and analyses are kept in a “project file”. You can get data into a project file either by entering it from the keyboard or by importing it from a text file. If you only have relatively few observations, it may be easiest to re-enter the data from the keyboard. However, most people will already have their data on computer, for example in a spreadsheet or database file, and in this case it is easiest to turn it into a text file and import this file into Distance.

To import a data file into Distance, it must be in “flat file format” – i.e., arranged in rows and columns with one row for each observation. The actual number of columns depends on the type of survey (see below). Here’s an example of part of a data file from a line transect survey with two strata:

Stratum 1;100;Line 1;10;14
Stratum 1;100;Line 1;10;8
Stratum 1;100;Line 1;10;22
Stratum 1;100;Line 2;10.3;7
Stratum 1;100;Line 2;10.3;37
Stratum 1;100;Line 2;10.3;13
Stratum 2;123;Line 1;5.7;
Stratum 2;123;Line 2;8.4;27
Stratum 2;123;Line 2;8.4;76
Stratum 2;123;Line 2;8.4;44
Stratum 2;123;Line 2;8.4;7

In this file, the columns are separated by semicolons. Column 1 is the stratum name, column 2 is the stratum area, column 3 is the transect name, column 4 is the transect length, and column 5 is the perpendicular distance. Notice that all transects from the same stratum are grouped together on adjacent lines, and all observations from the same transect are grouped together. Notice also that the record “Line 1” in “Stratum 2” has no distance in the final column – this is a transect where no objects were seen.
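To make the layout concrete, here is a minimal Python sketch (not part of Distance) that flattens survey records held in a nested structure, strata containing transects containing distances, into this flat file format. The dictionary layout and the function name `flatten_survey` are hypothetical. The sketch keeps rows from the same stratum and transect on adjacent lines and writes a single row with an empty distance field for a transect with no sightings.

```python
from typing import Dict, List, Tuple

# Hypothetical layout:
# stratum name -> (stratum area, {transect name -> (transect length, [distances])})
Survey = Dict[str, Tuple[float, Dict[str, Tuple[float, List[float]]]]]


def flatten_survey(survey: Survey, delimiter: str = ";") -> List[str]:
    """Return one delimited line per observation.

    Columns: stratum name, stratum area, transect name, transect length,
    perpendicular distance.
    """
    lines: List[str] = []
    # Dicts keep insertion order, so rows stay grouped by stratum, then by transect.
    for stratum, (area, transects) in survey.items():
        for transect, (length, distances) in transects.items():
            prefix = [stratum, str(area), transect, str(length)]
            if not distances:
                # Empty final column: a transect where no objects were seen.
                lines.append(delimiter.join(prefix + [""]))
            for d in distances:
                lines.append(delimiter.join(prefix + [str(d)]))
    return lines


survey: Survey = {
    "Stratum 1": (100, {"Line 1": (10, [14, 8, 22]), "Line 2": (10.3, [7, 37, 13])}),
    "Stratum 2": (123, {"Line 1": (5.7, []), "Line 2": (8.4, [27, 76, 44, 7])}),
}
for line in flatten_survey(survey):
    print(line)
```

Run on the data above, this prints the eleven rows of the example file, including the empty-distance row for “Line 1” in “Stratum 2”.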

There is a narrated video, 8 minutes long, that describes the sequence of instructions you give Distance to import these data for analysis. It covers what you need to know to import simple data structures into Distance.

Which columns should you include in your data file? As a minimum, your file should contain a column for transect or point name and a column for observed distance. For line transect surveys you will also need a column for transect length. If your survey involved stratification then you will need to include columns for stratum name and stratum area. If you measured radial distance and angle then you will need a column for angle, and if your objects are clusters, rather than individuals, then you should include a column for cluster size. So, you will end up with somewhere between 2 and 7 columns, depending on the type of survey.
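The rules in the previous paragraph can be summarised as a small decision function. This is a hypothetical Python helper, `required_columns`, written for illustration only; the column labels are descriptive and are not names that Distance itself requires.

```python
from typing import List


def required_columns(line_transect: bool, stratified: bool,
                     radial_and_angle: bool, clusters: bool) -> List[str]:
    """List the columns an import file needs for a given survey design.

    Labels are illustrative only; Distance asks you which column is which
    during import, so the names and order do not matter.
    """
    columns: List[str] = []
    if stratified:
        columns += ["stratum name", "stratum area"]
    columns.append("transect name" if line_transect else "point name")
    if line_transect:
        columns.append("transect length")
    if radial_and_angle:
        # Radial distance plus angle replaces a single observed-distance column.
        columns += ["radial distance", "angle"]
    else:
        columns.append("distance")
    if clusters:
        columns.append("cluster size")
    return columns
```

For example, `required_columns(False, False, False, False)` gives the two-column minimum for an unstratified point transect survey, while setting all four flags to `True` gives the seven-column maximum.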

The columns should be separated by a delimiter (a single ASCII character), which can be a tab, semicolon, comma or space. The order of the columns is not important, as you tell Distance which column is which during the import process. Each row should end with a carriage-return + line-feed (CR+LF) combination. This is the default end-of-line indicator used by most Windows-based applications, so you usually don’t have to worry about this.
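If you produce the text file from a script rather than a spreadsheet, the sketch below shows one way to join rows with one of the four accepted delimiters and end each row with CR+LF. The function name `to_flat_text` is hypothetical. Rather than quoting, it refuses any field that contains the delimiter (for example a stratum named “Stratum 1” with a space delimiter), since we are not assuming that Distance understands quoted fields.

```python
from typing import Iterable, Sequence

# The four delimiters listed above as acceptable to Distance.
ALLOWED_DELIMITERS = ("\t", ";", ",", " ")


def to_flat_text(rows: Iterable[Sequence[object]], delimiter: str = "\t") -> str:
    """Join rows into delimited text with a CR+LF after every row."""
    if delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    out = []
    for row in rows:
        # None becomes an empty field (e.g. a transect with no sightings).
        fields = ["" if v is None else str(v) for v in row]
        for field in fields:
            # A field containing the delimiter would be split into extra columns.
            if delimiter in field:
                raise ValueError(f"field {field!r} contains the delimiter")
        out.append(delimiter.join(fields) + "\r\n")
    return "".join(out)
```

To save the result, open the output file with `open(path, "w", newline="")`: with `newline=""` Python writes the `\r\n` sequence unchanged instead of translating line endings.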

Data collected in intervals (bins)

In some distance sampling surveys, the exact distances to the observations are not recorded. Instead, observations are placed in pre-defined intervals, or bins. For example, in a point transect survey of songbirds one could define intervals of 0-50, 50-100 and 100-200 metres. To enter this type of data into Distance, enter each observation at the mid-point of its interval. So, if 2 birds were seen in 0-50, 3 in 50-100 and 3 in 100-200 at point 1, the data file would look like this:

Point1,25
Point1,25
Point1,75
Point1,75
Point1,75
Point1,150
Point1,150
Point1,150

(In this example we are pretending that there are no strata, so there is no stratum name or stratum area column. Also, because it is a point transect example, there is no transect length column. We are using a comma as delimiter.)
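The mid-point rule can be applied mechanically. Below is a minimal Python sketch, assuming your counts are held per interval together with the interval cutpoints; the function name `binned_to_rows` and this input layout are hypothetical.

```python
from typing import List, Sequence


def binned_to_rows(point: str, cutpoints: Sequence[float],
                   counts: Sequence[int], delimiter: str = ",") -> List[str]:
    """Expand per-interval counts into one row per observation at the interval mid-point.

    cutpoints has one more entry than counts: interval i runs from
    cutpoints[i] to cutpoints[i + 1].
    """
    if len(cutpoints) != len(counts) + 1:
        raise ValueError("need exactly one more cutpoint than counts")
    rows: List[str] = []
    for i, n in enumerate(counts):
        mid = (cutpoints[i] + cutpoints[i + 1]) / 2
        # Write whole-number mid-points without a trailing ".0".
        label = str(int(mid)) if mid == int(mid) else str(mid)
        rows.extend([f"{point}{delimiter}{label}"] * n)
    return rows


for row in binned_to_rows("Point1", [0, 50, 100, 200], [2, 3, 3]):
    print(row)
```

With cutpoints 0, 50, 100 and 200 and counts 2, 3 and 3, this reproduces the eight rows of the example above.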

Additional information

There are some other features in Distance that I haven’t mentioned here. For example, Distance is capable of importing additional columns of data that you may wish to use in your analysis (such as year of survey in multi-year surveys). This will be covered at the workshop. To keep maximum flexibility, it is best to bring your data along as a text file, as outlined above, but also in its original spreadsheet or database format in case you decide to take on a more complex analysis in a later part of the workshop. The computers that we will be using will have Microsoft Excel loaded, so if you can bring your data in an Excel-compatible format, all the better. From within Excel, it is easy to arrange, sort and filter columns, and then export them into a text file that Distance can read. Alternatively, if you are bringing along your own computer, you can use your favourite package to do the required re-formatting.
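If you would rather script the arrange, sort and filter steps than do them in Excel, Python’s standard `csv` module can do the same job on a spreadsheet saved as CSV. This is a sketch under assumptions: the function name `prepare_for_distance` and the header names (`Year`, `Stratum`, `Transect` and so on) are hypothetical, and the year filter stands in for whatever selection your analysis needs.

```python
import csv
import io
from typing import List


def prepare_for_distance(csv_text: str, columns: List[str], year: str,
                         delimiter: str = "\t") -> str:
    """Filter a spreadsheet export to one year, keep the chosen columns in
    order, group rows by stratum and transect, and return delimited text
    with CR+LF line endings.

    Header names ("Year", "Stratum", "Transect", ...) are hypothetical;
    substitute the headers used in your own spreadsheet.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Filter: keep only the survey year of interest.
    rows = [r for r in reader if r["Year"] == year]
    # Sort: put rows from the same stratum and transect on adjacent lines.
    # list.sort is stable, so observations keep their order within a transect.
    rows.sort(key=lambda r: (r["Stratum"], r["Transect"]))
    # Arrange: emit only the requested columns, in the requested order.
    lines = [delimiter.join(r[c] for c in columns) for r in rows]
    return "".join(line + "\r\n" for line in lines)
```

Sorting by stratum and then transect is only there to put related rows on adjacent lines, as in the example file; the alphabetical order it produces (for instance “Line 10” before “Line 2”) does not matter.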

Conclusion

I hope that this page has provided you with enough information to get at least a subset of your data into a format that Distance can easily read. If you are still confused about what to do, don’t worry – we will be on hand to help when you get to the workshop. Just bring some of your data along in some kind of electronic format and we should be able to get at least some of it into Distance relatively speedily!