This chapter introduces character encoding and the fundamental relationship between characters and binary data. It discusses how computers store information and how characters are represented in binary form, and it examines the principles and characteristics of the most common encodings for Western European languages. The chapter closes with recommendations for corpus construction and an explanation of typical problems that arise when character encodings are handled improperly.
Christian Wartena (Tue,) studied this question.
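As a small illustration of the relationship between characters and binary data described above, the following Python sketch encodes one word in two encodings commonly used for Western European languages and then decodes it with the wrong one. The sample word, the choice of Latin-1 and UTF-8, and the helper name `to_bits` are assumptions made for this example and are not taken from the chapter.

```python
# Illustrative sketch: the sample word, the encodings, and the helper
# name to_bits are assumptions chosen for this example.

def to_bits(data: bytes) -> str:
    """Render each byte as an 8-digit binary string, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)

text = "Käse"

# Latin-1 (ISO 8859-1) stores every character in exactly one byte.
latin1_bytes = text.encode("latin-1")

# UTF-8 stores ASCII characters in one byte and "ä" in two bytes.
utf8_bytes = text.encode("utf-8")

print(to_bits(latin1_bytes))  # 01001011 11100100 01110011 01100101
print(to_bits(utf8_bytes))    # 01001011 11000011 10100100 01110011 01100101

# Decoding UTF-8 bytes as Latin-1 does not fail; it silently yields
# the wrong characters, a typical symptom of mishandled encodings.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # KÃ¤se
```

The same four characters occupy four bytes in one encoding and five in the other, so a byte sequence can only be interpreted correctly when its encoding is known.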