SLX Corpus of Classic Sociolinguistic Interviews

Resource

URL

https://dss2.princeton.edu/data/308/

Blurb

Comprises 8 sociolinguistic interviews with a total of 9 speakers, recorded in the 1960s and 1970s. All of the interviews were conducted by William Labov or by one of his students. Labov notes that these interviews are not classic in the sense of forming part of a systematic sociolinguistic study of a speech community. What makes them classic is that they represent classic solutions to the problems of achieving cross-cultural contact, reducing the effect of the Observer's Paradox and approximating the vernacular of everyday life. Most importantly, they are interviews with extraordinarily gifted, memorable and fluent speakers.

These particular interviews were also selected for the corpus because of their sound quality, and because publishing the audio data with the corresponding transcripts and annotations does not violate any agreement the interviewer made with the speakers regarding data distribution.

The corpus includes the complete interview recordings plus time-aligned verbatim transcripts for each speaker. Also included in the publication is a sociolinguistic variable survey that represents an overview of the intra- and inter-speaker variation attested in the corpus, highlighting a broad range of phonological, phonetic, grammatical, lexical and stylistic variables. Finally, the publication includes a number of annotation tools that allow users to listen to each interview while browsing the corresponding transcripts, and to display and hear each token identified in the variable survey. These tools can be extended to create new time-aligned transcripts or tag additional variables within the existing corpus.

The SLX Corpus was developed as part of the Data and Annotations for Sociolinguistics (DASL) Project, an investigation of best practices in the use of digital speech corpora for the study of language variation. Containing classic interview material in the Labovian tradition, it is a valuable teaching tool for linguists. The recordings demonstrate successful interviewing techniques, the sound quality is high, and the digitization, segmentation and transcription of the data represent best practice in these areas. The variable survey highlights over 150 sociolinguistic variables attested in the corpus and suggests avenues for further research. Most importantly, the SLX Corpus provides both an example of a digital speech corpus developed specifically to support sociolinguistic research, and a stable benchmark for training in sociolinguistic data collection, digitization, segmentation, transcription, analysis and publication.

Data

The 17 speech files are 22,050 Hz, 16-bit, single-channel files in the MS WAV (RIFF) format, for a total of 575 minutes (approximately 1.5 GB).
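As a rough check on the 1.5 GB figure: the size of uncompressed PCM audio is sample rate × bytes per sample × channels × duration in seconds. The sketch below uses only Python's standard `wave` module to compute that total and to show one way of confirming that a given file has the stated format. It assumes the files hold uncompressed PCM (implied, but not stated outright, by the format description) and ignores the small per-file WAV header. The helper names are illustrative only and are not part of the corpus tools.

```python
import wave

# Format stated in the catalog entry; uncompressed PCM is an assumption.
SAMPLE_RATE_HZ = 22050
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
CHANNELS = 1             # single-channel
TOTAL_MINUTES = 575


def pcm_bytes(sample_rate: int, sample_width: int, channels: int, seconds: float) -> int:
    """Size of the raw PCM payload in bytes, excluding WAV header bytes."""
    return int(sample_rate * sample_width * channels * seconds)


def matches_slx_format(path: str) -> bool:
    """Return True if the WAV file at `path` has the rate, width and channel count above."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == SAMPLE_RATE_HZ
            and wf.getsampwidth() == SAMPLE_WIDTH_BYTES
            and wf.getnchannels() == CHANNELS
        )


def duration_minutes(path: str) -> float:
    """Duration of a WAV file in minutes, derived from its frame count."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate() / 60


if __name__ == "__main__":
    total = pcm_bytes(SAMPLE_RATE_HZ, SAMPLE_WIDTH_BYTES, CHANNELS, TOTAL_MINUTES * 60)
    # prints "1,521,450,000 bytes = 1.52 GB = 1.42 GiB"
    print(f"{total:,} bytes = {total / 1e9:.2f} GB = {total / 2**30:.2f} GiB")
```

The result, about 1.52 GB in decimal units, is consistent with the approximate size given above. To check a downloaded file, one would call `matches_slx_format` and `duration_minutes` on its local path (any path used here is hypothetical) and compare the summed durations against the 575-minute total.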

The audio data reflects a broad spectrum of speaking styles, including spontaneous speech, narratives, responses and formal linguistic tasks. The interviews touch on a multitude of topics, and corpus users should note that the language of the interviews represents the uncensored opinions of the speakers, reflecting their daily concerns and personal histories.

Taken as a whole, the speakers exemplify a wide variety of regional and social dialects. Demographic information for each main speaker in the corpus is displayed in the table below.

Speaker	Age	Speech Community	Occupation	Ethnicity	Education
Adolphus H.	81	Near Hillsboro, NC	Farmer	African American	Very little
Bobbie A.	22	Ayr, Scotland	Saw doctor	Scottish/Italian	Some technical college
Henry G.	60	E. Atlanta, GA (DeKalb Co.)	Railroad foreman	European American	High school graduate
Jerry T.	19	Near Leakey, TX	Gas station attendant	European American	Some high school
Joe D. (interviewed with Eddie M.)	21	Liverpool, England	Docker	English	Some high school
Eddie M. (interviewed with Joe D.)	19	Liverpool, England	Docker	English	Some high school
Kathy D.	15	Rochester, NY	Student	European American	In 11th grade
Louise A.	53	Knoxville, TN	Mother	European American	Unknown
Rose B.	43	New York, NY (Lower East Side)	Factory seamstress	Italian American	Sixth grade

The corpus also contains transcripts, annotations, annotation tools and documentation.

The documentation includes the complete segmentation and transcription guidelines, descriptions of the variables and style codes used in the variable survey, demographic information plus Labov's notes about each speaker, and an instruction manual for using the corpus tools.

Link time

2022-11-01 14:02:00 UTC

Resource type

Single study

Subjects

Qualitative Data

Regions

Europe
North America

Countries

United Kingdom
United States