Corpora containing 7,849 Mandarin Chinese "utterances" and their phonetic segmentation and tone labels separated into training and test sets. The utterances were derived from 1997 Mandarin Broadcast News Speech and Transcripts. That collection consists of approximately 30 hours of Chinese broadcast news recordings from Voice of America, China Central TV and KAZN-AM, a commercial radio station based in Los Angeles, CA.