The Goal: expanding understanding of biodiversity through sound
Many of the most biodiverse ecosystems left on the planet are poorly studied, highly threatened, and in need of information that can be helpful in moving toward better protection and management. As part of the Elephant Listening Project’s efforts to better understand the ecology of forest elephant movements, and the spatial and temporal pattern of poaching, we have recorded the rainforest’s sounds 24/7 at many places in Central Africa. But we only extract a tiny bit of the rich information that is contained in these continuous recordings, now representing more than 1 million hours of the sounds of birds, primates, insects, frogs, you name it. If it makes sound, we record it.
To make this rich acoustic data available to anyone who would like to mine it, we have partnered with Amazon Web Services (AWS) to make the sounds available in the cloud through their object storage service, S3. While the raw sound files are available for download, AWS also provides access to flexible cloud computing resources in the form of EC2 instances. Especially after exploring a subset of the sound files, users may want to analyze large portions of these data on an AWS EC2 instance, because data transfer between S3 and EC2 is much faster than downloading the data to a local machine. An additional benefit of using EC2 is access to instances with more powerful computing resources than a desktop or laptop. Depending on the type of analysis, working with the Nouabalé-Ndoki dataset can be extremely memory intensive, for example when running detectors or quantifying noise profiles. One possibility is to use RStudio with its associated acoustic analysis packages (see Louis Aslett’s detailed guide for setting up RStudio on an AWS EC2 instance).
This is a very simple dataset, composed almost entirely of 24-hr, 8 kHz, 16-bit audio recordings in .wav format recorded on an acoustic grid in northern Republic of Congo (Figure 1). Collection of these data was made possible through collaboration with the Wildlife Conservation Society and the Nouabalé-Ndoki Foundation, who together administer activities in the national park and surrounds. Funders of this project include the U.S. Fish and Wildlife Service, the Born Free Foundation, and Lisa Yang.
The landscape is lowland tropical rainforest composed of a number of forest types, including mixed semi-evergreen forest, monodominant Gilbertiodendron forest, and swamp forest. Fifty acoustic units were placed using a stratified random sampling method on a grid of 25 km² cells.
Figure 1. Location of recording grid in Africa (upper left) and location of 50 recording sites in the Nouabalé-Ndoki National Park and adjacent forestry concession in northern Congo (right). ‘Mbeli’ and ‘Goualougo’ are research camps in the national park with permanent staff. ‘Camp Bonio’ is an old hunting camp in the forestry concession (slightly shaded area south of the river near Mbeli camp and south of acoustic unit 4e). Slightly more than half of the recording sites are in the national park.
Coverage Extent: Because of ongoing elephant poaching in this area, exact locations of recording units will be provided only on specific request from researchers. The rectangular extent of the entire grid is latitude 1.89020 to 2.32474, longitude 16.44315 to 16.74016 (WGS84).
These sound files are freely available for scientific study and exploration, including for the development of detection algorithms.
These sounds are not available for any media use including advertising, promotion, radio or movie production, etc. without express permission from the owners.
While this dataset is freely available to all users, the wide diversity of ways to interface with the relevant Amazon Web Services S3 bucket means that it is not intuitive to all users exactly how to get to the individual sound files.
Main registry page for the elephant dataset. This link shows the basic information about the dataset and some links to documentation and other information, but the information in the right-hand column is hard for many of us to interpret.
The easiest way to access the files is to create a free AWS account. There is no cost to accessing files in the public-access datasets, or to downloading files from these datasets (however, using other AWS services might incur charges). This link will then take you to the parent directory of the public dataset (to the ‘Objects’ tab) where you will see a listing of directory entries, including ‘recordings/’. Clicking on this link and then the ‘wav/’ link takes you to the listing of each recording site, and within each of these, the actual sound files. (Note that each of these files is about 1.2GB in size, so they take a fair amount of time to download, and that there are numerous pages of files within each site folder).
An alternative is to use the AWS Command Line Interface (CLI), a text-based tool commonly used to perform tasks on AWS, including accessing S3. This does not need an AWS account, provided you use the “--no-sign-request” option in the command line.
For example, to list all files in the ‘nn01a’ site folder, you would enter:
aws s3 ls s3://congo8khz-pnnn/recordings/wav/nn01a/ --no-sign-request
(Note that each site folder has hundreds of 24hr sound files to be listed).
To download a single object (sound file) once you know the exact name of the file, use the ‘cp’ (copy) command:
aws s3 cp s3://congo8khz-pnnn/recordings/wav/nn01a/nn01a_20201209_000100.wav <file path to download location> --no-sign-request
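For scripted access, the same listing and download steps can be sketched in Python. This is a minimal sketch, not an official client: it assumes the third-party boto3 package is installed, and the function names (`site_prefix`, `anonymous_client`, `list_site_files`, `download`) are made up for illustration. The bucket name and key prefix are taken from the CLI examples above, and the unsigned configuration plays the role of the no-sign-request option.

```python
"""Sketch: anonymous access to the public bucket with boto3 (pip install boto3)."""

BUCKET = "congo8khz-pnnn"       # bucket name from the CLI examples
WAV_PREFIX = "recordings/wav/"  # key prefix from the CLI examples


def site_prefix(site):
    """Return the S3 key prefix for one recording site, e.g. 'nn01a'."""
    return f"{WAV_PREFIX}{site}/"


def anonymous_client():
    """Create an S3 client that sends unsigned requests (no AWS account needed)."""
    # Imported here so the other helpers can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_site_files(client, site):
    """Yield (key, size_in_bytes) for every object under one site folder."""
    # A paginator is used because each site folder holds hundreds of files
    # and a single list request returns at most 1000 keys.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=site_prefix(site)):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


def download(client, key, destination):
    """Download one object (sound file) to a local path."""
    client.download_file(BUCKET, key, destination)
```

A typical session would create the client once with `anonymous_client()`, loop over `list_site_files(client, "nn01a")` to see what is available, and then call `download(client, key, "local_name.wav")` for the files of interest. Since each file is large, it is probably worth checking the reported sizes before downloading many of them.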
Recorder Description and Placement
- all recordings were made using a SWIFT acoustic recorder, designed and built by the Center for Conservation Bioacoustics at Cornell University.
- the standard SWIFT electronics board and microphone were housed in a custom-designed ‘ruggedized’ housing to perform better in the Central African rainforest. The electronics and power supply were housed in a small Pelican© case. The waterproof microphone was embedded into the wall of the Pelican case and protected by a stainless steel screen to prevent termite damage and destruction of the microphone by primates.
- stratified random sampling was used to place one recorder within each 25 sq km grid cell.
- recorders were placed 7-10m high, suspended beneath a tree limb.
- recordings were 8kHz, 16 bit, at a SWIFT gain setting of 47.5 dB.
- the frequency response of each unit was characterized using an Audio Precision® APx-520 audio analyzer.
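The recording format listed above (8 kHz, 16-bit, roughly 24 hours per file) fixes both the expected file size and the highest frequency that can be analyzed. The short calculation below is only a sketch: it assumes single-channel (mono) uncompressed PCM audio and ignores the small WAV header, neither of which is stated explicitly here.

```python
# Rough size and usable bandwidth of one recording.
SAMPLE_RATE_HZ = 8_000           # from the text: 8 kHz
BYTES_PER_SAMPLE = 2             # from the text: 16-bit samples
CHANNELS = 1                     # assumption: mono recordings
SECONDS_PER_FILE = 24 * 60 * 60  # nominal 24-hour file


def pcm_bytes(seconds, sample_rate=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the size in bytes of the raw audio payload (header excluded)."""
    return seconds * sample_rate * bytes_per_sample * channels


size_bytes = pcm_bytes(SECONDS_PER_FILE)
nyquist_hz = SAMPLE_RATE_HZ / 2  # highest frequency representable at this rate

print(f"{size_bytes:,} bytes")                   # 1,382,400,000 bytes
print(f"{size_bytes / 1e9:.2f} GB (decimal)")    # 1.38 GB (decimal)
print(f"{size_bytes / 2**30:.2f} GiB (binary)")  # 1.29 GiB (binary)
print(f"usable bandwidth up to {nyquist_hz:.0f} Hz")
```

Under these assumptions a full 24-hour file comes to roughly 1.3 to 1.4 GB, and only sounds below 4 kHz are captured, which matters when deciding whether a species of interest can be detected in these recordings.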
Archive structure and filenames
- the sound archive is organized by recording site, containing all available sound files recorded at that location.
- the recording site is prefixed onto each sound file name and some description is available in the table below.
- each sound file name includes the date (yyyymmdd) and the starting time of the recording (hhmmss).
- most sounds are ~24 hours long, but there are some oddballs.
- there are irregular gaps in the days covered at each site:
Recorder maintenance occurred on approximate 3- or 4-month cycles and it took at least 25 days to visit all recorders for collecting data and refreshing the power supply. Variation in battery capacity sometimes resulted in a recorder stopping recording before others, and occasional failures occurred that produced gaps in recording coverage. These gaps are not coincident across recorders.
- typically the file initiated after maintenance was not included in the archive because of human disturbance. Similarly, the last file was also typically omitted unless it was a full 24 hrs.
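Because the site, date and start time are encoded in each file name, a listing of names is enough to check coverage before downloading anything. The sketch below is illustrative only: the pattern is inferred from the single example name `nn01a_20201209_000100.wav`, and the helper names `parse_filename` and `missing_days` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Pattern assumed from the one example name: <site>_<yyyymmdd>_<hhmmss>.wav
FILENAME_RE = re.compile(
    r"^(?P<site>[a-z0-9]+)_(?P<date>\d{8})_(?P<time>\d{6})\.wav$"
)


def parse_filename(name):
    """Return (site, start_datetime) parsed from a sound file name."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    start = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return match["site"], start


def missing_days(names):
    """Return calendar dates with no file between the first and last recording."""
    recorded = {parse_filename(name)[1].date() for name in names}
    if not recorded:
        return []
    day, last = min(recorded), max(recorded)
    gaps = []
    while day <= last:
        if day not in recorded:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

For example, `parse_filename("nn01a_20201209_000100.wav")` gives the site `nn01a` and a start time of 00:01:00 on 9 December 2020. Passing all file names from one site to `missing_days` lists the days without a recording, which is one way to see the maintenance and battery gaps described above. Names that do not follow the assumed pattern raise a `ValueError`, so any oddly named files will show up rather than being silently skipped.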
Geographic Extent: latitude 1.89020 to 2.32474, longitude 16.44315 to 16.74016 (WGS84)
|Site|Habitat (at recorder site)|Protection|
|---|---|---|
|nn01a|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn01c|mixed forest-closed|national park|
|nn01d|mixed forest-closed|national park|
|nn01e|mixed forest-closed|national park|
|nn01f|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn01g|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn02a|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn02b|mixed forest-closed|national park|
|nn02c|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn02d|mixed forest-closed|national park|
|nn02e|mixed forest-closed|national park|
|nn02f|mixed forest-closed|national park|
|nn02g|mixed forest-open|national park|
|nn03a|mixed forest-closed|national park|
|nn03b|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn03c|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn03d|mixed forest-closed|national park|
|nn03e|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn03f|swamp forest|national park|
|nn03g|mixed forest-closed|national park|
|nn04a|mixed forest|national park|
|nn04b|mixed forest-closed|national park|
|nn04c|mixed forest-closed|national park|
|nn04d|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|logging – active 2017-2018|
|nn04e|mixed forest-open|logging – active 2017-2018|
|nn04f|mixed forest-closed|logging – active 2017-2018|
|nn05a|mixed forest-closed|national park|
|nn05b|mixed forest-closed|national park|
|nn05c|mixed forest-closed|logging – active 2017-2018|
|nn05d|mixed forest-closed|logging – active 2017-2018|
|nn05e|swamp forest|logging – active 2017-2018|
|nn05f|mixed forest-closed|logging – active 2017-2018|
|nn06a|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|logging – long inactive|
|nn06b|mixed forest-closed|national park|
|nn06c|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn06d|mixed forest-closed (Marantaceae)|logging – active 2017-2018|
|nn06e|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|logging – active 2017-2018|
|nn06f|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|logging – active 2017-2018|
|nn07a|mixed forest-closed|logging – long inactive|
|nn07b|mixed forest-closed|logging – long inactive|
|nn07c|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|national park|
|nn08a|mixed forest-open|logging – long inactive|
|nn08b|mixed forest-closed|logging – long inactive|
|nn08c|swamp forest|logging – long inactive|
|nn09a|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|logging – long inactive|
|nn09b|mixed forest-closed|logging – long inactive|
|nn09c|mixed forest-closed|logging – long inactive|
|nn10a|mixed forest-open|logging – long inactive|
|nn10b|Gilbertiodendron (mono-dominant); very tall canopy, sparse understory|logging – long inactive|