
Properly Format Multipart/form-data Body

I'm writing a script to upload stuff including files using the multipart/form-data content type defined in RFC 2388. In the long run, I'm trying to provide

Solution 1:

This is a placeholder answer, describing what I did while waiting for some authoritative input to some of my questions. I'll be happy to accept a different answer if it demonstrates that this approach is wrong or unsuitable in at least one of the design decisions.

Here is the code I used to make this work according to my taste for now. I made the following decisions:

Can I use 8 bit data for my text fields and still conform to the specification?

I decided to do so. At least for this application, it does work.

Can I get the email package to serialize my text fields as 8 bit data without extra encoding?

I found no way, so I'm doing my own serialization, just like all the other recipes I have seen for this.

Can I avoid base64 encoding for binary file content as well?

Simply sending the file content in binary seems to work well enough, at least in my single application.

If I can avoid it, should I write the Content-Transfer-Encoding as 8bit or as binary?

Since RFC 2045 Section 2.8 states that 8bit data is subject to a line length limit of 998 octets between CRLF pairs, I decided that binary is the more general and thus the more appropriate description here.
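The decisions so far (raw UTF-8 text fields, raw file bytes, and a Content-Transfer-Encoding of binary) can be sketched roughly as follows. This is a minimal illustration and not the code referred to above: the function name `encode_multipart` and its argument layout are made up for this example, field names and file names are assumed to be plain ASCII without quotes, and nothing checks that the boundary is absent from the payload.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body_bytes) for a multipart/form-data body.

    fields: iterable of (name, text) pairs
    files:  iterable of (name, filename, mime_type, data_bytes) tuples
    Hypothetical helper; names and file names are assumed to be ASCII.
    """
    if boundary is None:
        # Random boundary; a real implementation should also make sure
        # it does not occur anywhere in the payload.
        boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, text in fields:
        chunks += [
            delimiter,
            'Content-Disposition: form-data; name="{}"'.format(name).encode("ascii"),
            b"Content-Type: text/plain; charset=utf-8",
            b"Content-Transfer-Encoding: binary",
            b"",
            text.encode("utf-8"),  # 8 bit text, no extra encoding
        ]
    for name, filename, mime_type, data in files:
        disposition = 'Content-Disposition: form-data; name="{}"; filename="{}"'
        chunks += [
            delimiter,
            disposition.format(name, filename).encode("ascii"),
            "Content-Type: {}".format(mime_type).encode("ascii"),
            b"Content-Transfer-Encoding: binary",
            b"",
            data,  # raw file bytes, no base64
        ]
    # Closing delimiter, followed by a final CRLF
    chunks += [delimiter + b"--", b""]
    body = b"\r\n".join(chunks)
    content_type = "multipart/form-data; boundary={}".format(boundary)
    return content_type, body
```

The returned `content_type` string goes into the Content-Type header of the HTTP request, and `body` is sent as the request body unchanged.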

If I had to serialize the body myself, how could I use the email.header package on its own to just format header values?

As already edited into my question, email.utils.encode_rfc2231 is very useful for this. I try to encode the value as ASCII first, and fall back to that function if the value contains either non-ASCII data or ASCII characters that are forbidden inside a double-quoted string.
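A minimal sketch of that fallback logic, assuming UTF-8 as the charset for the extended form. The helper name `format_param` is hypothetical, and the set of rejected characters (double quote, backslash, CR, LF) is a simplification of what a quoted string forbids.

```python
from email.utils import encode_rfc2231


def format_param(name, value):
    """Format one header parameter such as name="field1".

    Plain ASCII values are emitted as a quoted string; anything else
    uses the RFC 2231 extended form (note the * after the name).
    """
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        ascii_ok = False
    else:
        # Simplified check for characters unsafe in a quoted string
        ascii_ok = not any(c in '"\\\r\n' for c in value)
    if ascii_ok:
        return '{}="{}"'.format(name, value)
    return "{}*={}".format(name, encode_rfc2231(value, "utf-8"))
```

For instance, `format_param("filename", "naïve.txt")` yields `filename*=utf-8''na%C3%AFve.txt`, while `format_param("name", "field1")` stays as `name="field1"`.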

Is there some implementation that already did all I'm trying to do?

Not that I'm aware of. Other implementations are invited to adopt ideas from my code, though.


Edit:

Thanks to this comment I'm now aware that the use of RFC 2231 for headers is not universally accepted: the current draft of HTML 5 forbids its use. It has also been seen to cause problems in the wild. But since POST requests do not always correspond to a specific HTML document (think of web APIs, for example), I'm not sure I'd trust that draft in this regard either. Perhaps the right way to go is to give both the encoded and the unencoded name, the way RFC 5987 Section 4.2 suggests. But that RFC is for HTTP headers, while a multipart/form-data header is technically part of the HTTP body. That RFC therefore doesn't apply, and I do not know of any RFC which would explicitly allow (or even encourage) the use of both forms simultaneously for multipart/form-data.
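For illustration only, giving both forms side by side could look like the sketch below. Whether servers accept this inside a multipart/form-data part is exactly the open question above, so treat it as an experiment rather than a recommendation. The helper name `both_forms` and the choice of "_" as the replacement character for the ASCII fallback are assumptions of this example.

```python
from email.utils import encode_rfc2231


def both_forms(name, value):
    """Return 'name="<ascii fallback>"; name*=<RFC 2231 form>'."""
    # Fallback: keep printable ASCII except the quoted-string specials,
    # and replace everything else with "_".
    fallback = "".join(
        c if " " <= c <= "~" and c not in '"\\' else "_" for c in value
    )
    return '{0}="{1}"; {0}*={2}'.format(
        name, fallback, encode_rfc2231(value, "utf-8")
    )
```

With this, `both_forms("filename", "€ rates.txt")` produces `filename="_ rates.txt"; filename*=utf-8''%E2%82%AC%20rates.txt`.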

Solution 2:

You might want to look at the question "Send file using POST from a Python script", which points to the Requests library, which is becoming the most widely used Python library for HTTP. If you don't find all the functionality you need there and decide to implement it yourself, I encourage you to contribute it to that project.
