Parse-O-Matic

The term "text file" generally means a human-readable file in which each line is terminated by a specific character or pair of characters. There are three main types of text files; the main characteristic that distinguishes them is the way that each line is terminated.

The Windows/DOS text file format ends each line with the characters "Carriage Return" and "Linefeed", in that order. The Linux/Unix format terminates each line with a "Linefeed" only. Finally, the classic Mac (short for "Macintosh") format puts a single "Carriage Return" character at the end of each line; Mac OS X and later follow the Unix convention and use "Linefeed".
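
As a rough illustration of how the three formats can be told apart, the following Python sketch counts each kind of terminator in a file's raw bytes and reports the most common one. The function name detect_line_ending is hypothetical (it is not part of any product described here), and a file with mixed endings would need more careful handling.

```python
def detect_line_ending(data: bytes) -> str:
    """Guess the dominant line terminator in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so that lone CR and lone LF are counted separately.
    cr_only = data.count(b"\r") - crlf
    lf_only = data.count(b"\n") - crlf
    counts = {"CRLF": crlf, "LF": lf_only, "CR": cr_only}
    best = max(counts, key=lambda name: counts[name])
    return best if counts[best] > 0 else "none"


print(detect_line_ending(b"one\r\ntwo\r\n"))  # CRLF
print(detect_line_ending(b"one\ntwo\n"))      # LF
print(detect_line_ending(b"one\rtwo\r"))      # CR
```

When trying this on a real file, read it in binary mode, for example open(path, "rb").read(), because Python's text mode translates line endings while reading.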

Technical Notes

"Carriage Return" is often abbreviated as CR. It has the decimal value of 13 which equates to the hexadecimal value $0D. The CR character is occasionally known as Ctrl-M or ^M.

"Linefeed" is often abbreviated as LF. It has the decimal value of 10 and the hex value $0A. The LF character is occasionally referred to as Ctrl-J or ^J.

You will sometimes see the abbreviation CRLF to mean "a carriage return followed by a linefeed".
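
The values above can be checked in a few lines of Python. The sketch also shows where the Ctrl-M and Ctrl-J names come from: a control character's code is the letter's ASCII code with only the low five bits kept. The helper name ctrl is an assumption made for this example.

```python
CR = "\r"
LF = "\n"


def ctrl(letter: str) -> str:
    """Return the control character written as ^letter (for example ^M)."""
    # Keeping only the low five bits maps 'A'..'Z' (65..90) onto 1..26.
    return chr(ord(letter.upper()) & 0x1F)


print(ord(CR), hex(ord(CR)))  # 13 0xd
print(ord(LF), hex(ord(LF)))  # 10 0xa
print(ctrl("M") == CR)        # True
print(ctrl("J") == LF)        # True
print(repr(CR + LF))          # '\r\n'
```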

The End-of-File Character

The text file itself is sometimes, but not always, terminated with the "Substitute" character, also known as SUB, Ctrl-Z or ^Z (character value: Decimal 26, Hex $1A). However, you are rarely aware of this, since the character is usually removed when you load the text file into a program.

Programs that use text files should allow for the fact that some text files end with SUB and some do not. A program can ask the operating system how long the file is, then inspect the last character after loading the file. If it is SUB, it can be ignored. The upshot of this is that a text file may sometimes appear to be one byte longer than you think it should be.

In some cases, the end of a text file may be marked with the NUL (Null) character (decimal value 0, hex value $00).
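
The check described in this section might be sketched in Python as follows. The function name strip_eof_marker is hypothetical, and treating a trailing NUL the same way as a trailing Ctrl-Z is an assumption that will not suit every file.

```python
# Ctrl-Z (decimal 26, hex 1A) and NUL (decimal 0, hex 00).
EOF_MARKERS = (b"\x1a", b"\x00")


def strip_eof_marker(data: bytes) -> bytes:
    """Drop a single trailing end-of-file marker, if the data has one."""
    if data.endswith(EOF_MARKERS):
        return data[:-1]
    return data


raw = b"last line\r\n\x1a"
clean = strip_eof_marker(raw)
print(len(raw), len(clean))  # 12 11
print(clean)                 # b'last line\r\n'
```

The length difference of one byte is the "one byte longer" effect mentioned above.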

Editing a Text File

Text files can generally be loaded by a text editor program (such as Windows Notepad, or NoteTab from Fookes Software), and most word-processing programs can load them as well. However, when you save a text file loaded this way it may lose its original format.

For example, you might load a classic Mac text file (in which each line ends with CR), but if you edit and save it using a Windows text editor you might find that each line in the file now ends with CRLF. This might cause problems later, if the next program to use the file does not know how to deal with CRLF-delimited files. In such a case, the extra LF may appear in the program as a strange-looking character at the beginning of each line (starting with the second line).
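
A small Python sketch can show both the problem and one way to undo it. The first part splits CRLF-terminated data on CR alone, as a program expecting CR-only files would, and the stray LF turns up at the start of the second line. The function convert_line_endings is a hypothetical helper written for this example, not a feature of any particular editor.

```python
def convert_line_endings(data: bytes, newline: bytes = b"\n") -> bytes:
    """Rewrite every CRLF, CR or LF terminator as the requested one."""
    # Collapse everything to LF first so that CRLF is not converted twice.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


saved_by_windows_editor = b"one\r\ntwo\r\n"

# A CR-only reader sees the leftover LF at the start of the second line.
print(saved_by_windows_editor.split(b"\r"))  # [b'one', b'\ntwo', b'\n']

# Converting back to CR-only line endings removes the stray character.
restored = convert_line_endings(saved_by_windows_editor, b"\r")
print(restored.split(b"\r"))                 # [b'one', b'two', b'']
```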

Worse problems can arise if you edit a text file in a word-processing program. When you save the file, you must ensure that you save it as a text file rather than a word-processing file. In Word 2002 you can select "File/Save As", and then select "Plain Text (*.txt)". If you should inadvertently save a text file in a word-processing format, it will now contain a lot of additional information it did not have before. This will probably render it useless to the next program that tries to use it, since that program expects an ordinary text file. Fortunately, it will probably be easy to load the file back into the word-processing program and save it again, this time making sure to specify a text file format.

Some Examples of Text File Extensions

A file whose name ends with the characters .txt is almost certainly a text file. Other extensions typical of text files include .me (as in a file named Read.Me) and .htm, which denotes an HTML file, as used by web pages.

Windows files with the .ini extension are also text files, so they could be loaded into a text editor program. However, just because you can do this does not mean that you should do this. An ini file typically contains the settings for a program, and if you alter the file the program might stop working.

Files with the .csv extension are comma-separated-value files. These can be loaded into a text editor, but your operating system may be configured to open them in a spreadsheet if you double-click on them.

To summarize the foregoing: many programs save data in text files, but not all text files are supposed to be loaded into a text editor program.

EBCDIC Text Files

Some mainframe computers use the EBCDIC character set instead of ASCII. As it happens, the CR character has the same value (hex $0D) in EBCDIC as in ASCII. However, the Linefeed character in EBCDIC is hex $25 rather than $0A. An EBCDIC text file might also use the "New Line" character (hex $15).

In any case, most programs designed to work with ASCII text files cannot cope with EBCDIC files. So if you receive a file that you have been assured is "text", yet it looks like nonsense when you load it into your text editor, it might be an EBCDIC file. (Either that, or the file has been encrypted in some way.)
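
If you suspect a file is EBCDIC, one way to test the idea is to decode it with an EBCDIC codec and see whether readable text comes out. The sketch below assumes the file uses code page 037, which is only one of several EBCDIC variants; Python ships a codec for it under the name cp037. The function name decode_ebcdic_lines and the sample bytes are made up for this example.

```python
def decode_ebcdic_lines(data: bytes, codec: str = "cp037") -> list[str]:
    """Decode EBCDIC bytes and split the result into lines."""
    text = data.decode(codec)
    # With cp037, hex 25 decodes to Linefeed and hex 15 to the Unicode
    # "Next Line" control; str.splitlines() treats both as line breaks.
    return text.splitlines()


# "HI" ended by EBCDIC Linefeed (hex 25), "BYE" ended by New Line (hex 15).
sample = bytes([0xC8, 0xC9, 0x25, 0xC2, 0xE8, 0xC5, 0x15])
print(decode_ebcdic_lines(sample))  # ['HI', 'BYE']
```

If the output is still nonsense, the file may use a different EBCDIC code page, or it may not be EBCDIC at all.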




 


Copyright © 1986-2011 National Data Parsing Canada Corporation. All rights reserved.