[ 2009-November-01 11:52 ]
This is a partial list of data interchange formats that attempt to make it easy to exchange data between applications that may be written in different languages. The most common use is to build RPC-like systems, where one application submits a request over the network and receives a response. Unfortunately, there are tons of these formats already, as you can see from this list. If you think you should invent your own, please think again: it would help our industry if there were broad support for a small number of interchange formats. Other people agree with me on this. I recommend you pick one from the "somewhat broadly used" list. If you have any comments or updates, let me know and I'll keep this up to date.
Somewhat Broadly Used
- XDR (eXternal Data Representation): An IETF standard. Define messages using a language that looks like C structures. Compile that to generate serialization/deserialization code. Used for NFS. Python includes xdrlib. A GPL C implementation is included in sfslite. The GNU C library includes a version, but apparently it is very bad, so Portable XDR is a re-implementation under an LGPL/GPL license. FreeBSD includes a version somewhere.
- Abstract Syntax Notation One (ASN.1): An ITU standard that is used for cryptographic and telecom standards, as well as LDAP. Apparently has poor open source tool support, although asn1c, asnparser, Erlang's Asn1, and a Java ASN.1 framework exist, among others. I've never used this.
- SOAP: A complex XML-based format. Huge wad of specifications, standardized by the W3C. I've never used this, and recommend avoiding it unless you absolutely must. Used by Salesforce.com's Web Services API.
- XML-RPC: A simple XML-based format. There is some confusion about using Unicode strings. Python includes the xmlrpclib module (renamed xmlrpc.client in Python 3). Used by Sun's Storage appliance (code named Fishworks), among others.
- JavaScript Object Notation (JSON): Originally used to communicate with JavaScript programs, but it is now used for other applications, since it is a simple text-based format. There are two incompatible RPC specifications (JSON-RPC 1.0 and 2.0).
- Google Protocol Buffers: Uses variable length encoding for space efficiency. Supports optional fields to permit upgrades. Messages are defined in a .proto file, which is used to generate parsers. Provides an RPC interface, but no implementation.
- Facebook/Apache Thrift: Very similar design to Google Protocol Buffers. Provides an RPC implementation. Probably more widely used than Protocol Buffers.
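To make a few of the differences in the list above concrete, here is a rough Python sketch that encodes the same small integer and the same call in several of these formats. It is an illustration under assumptions, not a reference implementation: the helper names (xdr_int, varint, jsonrpc_v1, jsonrpc_v2, xmlrpc_call) and the remote method name "add" are made up for this example, the varint function is hand-written rather than taken from the Protocol Buffers library, and it only handles non-negative integers. It uses only the Python 3 standard library.

```python
import json
import struct
import xmlrpc.client


def xdr_int(value):
    """XDR encodes a signed integer as exactly 4 bytes, big-endian."""
    return struct.pack(">i", value)


def varint(value):
    """Base-128 variable length integer, in the style Protocol Buffers uses.

    Each byte carries 7 bits of the value, least significant group first;
    the high bit is set on every byte except the last.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # final byte, high bit clear
            return bytes(out)


def jsonrpc_v1(method, params, request_id):
    """A JSON-RPC 1.0 request: method, positional params, and id."""
    return json.dumps({"method": method, "params": params, "id": request_id})


def jsonrpc_v2(method, params, request_id):
    """A JSON-RPC 2.0 request: adds the mandatory "jsonrpc" version member."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def xmlrpc_call(method, params):
    """An XML-RPC <methodCall> document built by the standard library."""
    return xmlrpc.client.dumps(tuple(params), methodname=method)


if __name__ == "__main__":
    # The same integer: fixed width in XDR, variable width as a varint.
    print("XDR    300:", xdr_int(300).hex())  # 0000012c (always 4 bytes)
    print("varint 300:", varint(300).hex())   # ac02 (2 bytes)
    print("varint   1:", varint(1).hex())     # 01 (1 byte)

    # The same call, "add" with arguments 1 and 2, as text-based requests.
    print(jsonrpc_v1("add", [1, 2], 7))
    print(jsonrpc_v2("add", [1, 2], 7))
    print(xmlrpc_call("add", [1, 2]))
```

The integer comparison shows why variable length encoding saves space for small values, and the two JSON-RPC requests show one visible difference between the versions: a 2.0 request must carry the "jsonrpc": "2.0" member, which a 1.0 request does not have.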
Not Widely Used?
Not So Honourable Mentions
- CORBA: Define object interfaces using the CORBA interface description language (IDL), which is intended to support multiple programming languages. Unfortunately, the wire protocol (IIOP) was not standardized until much later, and so it does not fit in this list, where I am trying to include formats that support multiple programming languages. CORBA has developed a reputation for being excessively complex and hard to use. However, it is still used inside many companies.
- Java Serialization: Java includes support for serializing and deserializing objects. Unfortunately, this is again mostly Java-specific, and it is also quite slow.