What are binary and text files? (2024)

Introduction

On a computer, every file is a long string of ones and zeros. Specifically, a file is a finite-length sequence of bytes, where each byte is an integer between 0 and 255 inclusive (represented in binary as 00000000 to 11111111). Files can be broadly classified as either binary or text. These categories have different characteristics and need different tools to work with such files. Knowing the differences between binary and text files can save you time and mistakes when reading or writing data.

Here is the primary difference: Binary files have no inherent constraints (can be any sequence of bytes), and must be opened in an appropriate program that knows the specific file format (such as Media Player, Photoshop, Office, etc.). Text files must represent reasonable text (explained later), and can be edited in any text editor program.

Remember that all files, whether binary or text, are composed of bytes. The difference between binary and text files is in how these bytes are interpreted. Every text file is indeed a binary file, but this interpretation gives us no useful operations to work with. The reverse is not true, and treating a binary file as a text file can lead to data corruption. As a method of last resort, a hex editor can always be used to view and edit the raw bytes in any file.

File extensions

We can usually tell if a file is binary or text based on its file extension. This is because by convention the extension reflects the file format, and it is ultimately the file format that dictates whether the file data is binary or text.

Common extensions that are binary file formats:

Common extensions that are text file formats:

Binary file characteristics

Binary file in application (good)

Binary file in text editor (bad)

For most software that people use in their daily lives, the software consumes and produces binary files. Examples of such software include Microsoft Office, Adobe Photoshop, and various audio/video/media players. A typical computer user works with mostly binary files and very few text files.

A binary file always needs a matching software to read or write it. For example, an MP3 file can be produced by a sound recorder or audio editor, and it can be played in a music player or audio editor. But an MP3 file cannot be played in an image viewer or a database software.

Some binary formats are popular enough that a wide variety of programs can produce or consume it. Image formats like JPEG are the best example – not only can they be used in image viewers and editors, they can be viewed in web browsers, audio players (for album art), and document software (such as adding a picture into a Word doc). But other binary formats, especially for niche proprietary software, might have only one program in the world that can read and write it. For example, a high-end video editing software might let you save your project to a file, but this software is the only one that can understand its own file format; the binary file will never be useful anywhere else.

If you use a text editor to open a binary file, you will see copious amounts of garbage, seemingly random accented and Asian characters, and long lines overflowing with text – this exercise is safe but pointless. However, editing or saving a binary file in a text editor will corrupt the file, so never do this. The reason corruption happens is because applying a text mode interpretation will change certain byte sequences – such as discarding NUL bytes, converting newlines, discarding sequences that are invalid under a certain character encoding, etc. – which means that opening and saving a binary file will almost surely produce a file with different bytes.

Text file characteristics

Text file in text editor (good)

Text file in hex editor (inconvenient)

By convention, the data in every text file obeys a number of rules:

  • The text looks readable to a human or at least moderately sane. Even if it contains a heavy proportion of punctuation symbols (like HTML, RTF, and other markup formats), there is some visible structure and it’s not seemingly random garbage.

  • The data format is usually line-oriented. Each line could be a separate command, or a list of values could put each item on a different line, etc. The maximum number of characters in each line is usually a reasonable value like 100, not like 1000.

  • Non-printable ASCII characters are discouraged or disallowed. Examples include the NUL byte (0x00), DEL byte (0x7F), and most of the range 0x01 to 0x1F (except tab, carriage return, newline, etc.). Some text editors silently convert or discard these bytes, which is why binary files should never be edited in a text editor.

  • The reading of newline sequences is usually universal – namely, CR (classic Mac OS), LF (Unix), or CR+LF (Windows) all mean the same thing, which is to end the current line and start the next one. The writing of newline sequences usually normalizes to the preferred one on the current platform, regardless of which variant was read. (For example, a text file “Hello CR LF world CR Lorem LF ipsum” read in a Unix text editor would likely be written out as “Hello LF world LF Lorem LF ipsum”.)

  • There is some character encoding that governs how extended-ASCII bytes are handled. Byte values from 0x80 to 0xFF are not covered by the universally accepted ASCII standard, and the interpretation of these bytes depends on the choice of character encoding – such as UTF-8, ISO-8859-1, Shift JIS. Thus the interpretation of a text file depends on the character encoding used (unless the file format is known to be pure-ASCII), whereas a binary file is just a sequence of plain bytes with no inherent notion of character encoding.

  • In most text file formats, some flexibility is given to whitespace characters. For example, using one space or two space might not change the meaning of a command. And in C-like programming languages, one whitespace character has the same meaning as any positive number of whitespace or newline characters (except within strings).

Observations regarding the general computing environment around text files:

  • Any text file can be viewed or edited in any text editor. The creator of a text-based file format doesn’t need to know or care about what editor program a future user will use. This contrasts with binary formats, where each format is generally coupled to a specific program that can handle it.

  • Software developers work with many kinds of text files all the time – program source code, setup scripts, technical documentation, program-generated output and logs, configuration files, you name it. Much of the world of computer programming revolves around text files, since they are easy to create, edit, and consume. This contrasts with how an average computer user mainly works with binary files.

  • Text files are usually viewed and edited in a monospaced font. The historical reason is that the earliest computer terminals could only display monospaced text. A modern reason is that monospaced allows an author to visually align characters in different lines for ASCII art, tabular data, etc.

  • On the Unix platform, text files are so ubiquitous that they often have no file name extension at all. For example, “README” and “config” are the full names of popular files that exist in many places.

  • Beyond text editors, there exist many standard Unix tools to generate, manipulate, or view textual data: ls, more, sort, grep, awk, etc.

  • Version control software (Git, Mercurial, SVN, Perforce, etc.) work best with text files. For binary files they can only tell you whether a file has changed or not. For text files, they can show line-by-line differences, perform automatic merges, and do many other useful things.

Programming considerations

Every practical programming language provides separate facilities for working with binary versus text files. Generally speaking, if you read a binary file in text mode you will get unhelpful data that looks like garbage, if you write a binary file in text mode it will probably be corrupt, if you read a text file in binary mode you can’t perform any useful text operations on the bytes, and if you write a text file in binary mode you will need to manually convert characters to bytes. So it pays to use the right tools for the right job. To illustrate with concrete examples, let’s look briefly at how binary vs. text files work in three popular programming languages.

  • In C, you specify binary or text mode when opening a file stream with fopen(). The only difference this makes is that in text mode, reading a newline sequence always converts from universal newlines on disk to '\n' in memory, and writing '\n' always produces the platform's preferred newline sequence for the file on disk (such as "\r\n"). C only supports 8-bit characters, so text mode does not actually help with interpreting file bytes as Unicode code points.

  • In Python, you specify binary or text mode when opening a file stream with open(). In Python 3, if a file is opened in binary mode then read() and write() work with byte sequences of the bytes type. Otherwise a file as opened in text mode, then read() and write() apply a character encoding to convert the underlying bytes to/from a Unicode string of the str type. (The behavior is somewhat different in Python 2.)

  • In Java, InputStream and OutputStream work with bytes, whereas Reader and Writer work with Unicode characters (as UTF-16). You can open a FileInputStream (which works with bytes), and create an InputStreamReader on top of it with a specific character encoding (such as UTF-8). Using this Reader, you can get the text characters in the file. An analogous process applies to writing text to a file.

What are binary and text files? (2024)
Top Articles
A Guide to Handling Travel Money in India
The richest person who ever lived - BBC Reel
Best Pizza Novato
Monthly Forecast Accuweather
J & D E-Gitarre 905 HSS Bat Mark Goth Black bei uns günstig einkaufen
Big Spring Skip The Games
Us 25 Yard Sale Map
Nikki Catsouras Head Cut In Half
Category: Star Wars: Galaxy of Heroes | EA Forums
Planets Visible Tonight Virginia
Large storage units
Miami Valley Hospital Central Scheduling
Morocco Forum Tripadvisor
Gmail Psu
Les Rainwater Auto Sales
Used Sawmill For Sale - Craigslist Near Tennessee
Skyward Login Jennings County
The Ultimate Style Guide To Casual Dress Code For Women
CANNABIS ONLINE DISPENSARY Promo Code — $100 Off 2024
Zoe Mintz Adam Duritz
Craigslist Southern Oregon Coast
China’s UberEats - Meituan Dianping, Abandons Bike Sharing And Ride Hailing - Digital Crew
FDA Approves Arcutis’ ZORYVE® (roflumilast) Topical Foam, 0.3% for the Treatment of Seborrheic Dermatitis in Individuals Aged 9 Years and Older - Arcutis Biotherapeutics
Fsga Golf
Air Traffic Control Coolmathgames
Winco Employee Handbook 2022
What Are The Symptoms Of A Bad Solenoid Pack E4od?
Barista Breast Expansion
Acurafinancialservices Com Home Page
Grave Digger Wynncraft
Airg Com Chat
Barbie Showtimes Near Lucas Cinemas Albertville
Dairy Queen Lobby Hours
Dentist That Accept Horizon Nj Health
Mrstryst
Elanco Rebates.com 2022
Frommer's Belgium, Holland and Luxembourg (Frommer's Complete Guides) - PDF Free Download
Matlab Kruskal Wallis
Old Peterbilt For Sale Craigslist
Metro By T Mobile Sign In
The Closest Walmart From My Location
Has any non-Muslim here who read the Quran and unironically ENJOYED it?
Gt500 Forums
SF bay area cars & trucks "chevrolet 50" - craigslist
Uc Davis Tech Management Minor
Lady Nagant Funko Pop
Marcal Paper Products - Nassau Paper Company Ltd. -
Whitney Wisconsin 2022
Bonecrusher Upgrade Rs3
The Goshen News Obituary
Swissport Timecard
Gainswave Review Forum
Latest Posts
Article information

Author: Lakeisha Bayer VM

Last Updated:

Views: 5603

Rating: 4.9 / 5 (49 voted)

Reviews: 80% of readers found this page helpful

Author information

Name: Lakeisha Bayer VM

Birthday: 1997-10-17

Address: Suite 835 34136 Adrian Mountains, Floydton, UT 81036

Phone: +3571527672278

Job: Manufacturing Agent

Hobby: Skimboarding, Photography, Roller skating, Knife making, Paintball, Embroidery, Gunsmithing

Introduction: My name is Lakeisha Bayer VM, I am a brainy, kind, enchanting, healthy, lovely, clean, witty person who loves writing and wants to share my knowledge and understanding with you.