I have a web server based on TIdHTTPServer. It is built in Delphi Sydney. From a webpage I'm receiving following multipart/form-data post stream:
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="d"
83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="dir"
Upload
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="file_name"; filename="česká tečka.png"
Content-Type: image/png
PNG_DATA
-----------------------------16857441221270830881532229640--
Problem is that text parts are not received correctly. I read the Indy MIME decoding of Multipart/Form-Data Requests returns trailing CR/LF and changed transfer encoding to 8bit which helps to receive file correctly, but received file name is still wrong (dir should be Upload
and filename should be česká tečka.png
).
d=83AAAFUaVVs4Q07z
dir=UploadW
??esk?? te??ka.png 75
To demonstrate the issue I simplified my code to a console app (please note that the MIME.txt file contains the same as is in post stream above):
program MIMEMultiPartTest;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.Classes, System.SysUtils,
IdGlobal, IdCoder, IdMessage, IdMessageCoder, IdGlobalProtocols, IdCoderMIME, IdMessageCoderMIME,
IdCoderQuotedPrintable, IdCoderBinHex4;
procedure ProcessAttachmentPart(var Decoder: TIdMessageDecoder; var MsgEnd: Boolean);
var
MS: TMemoryStream;
Name: string;
Value: string;
NewDecoder: TIdMessageDecoder;
begin
MS := TMemoryStream.Create;
try
// http://stackoverflow.com/questions/27257577/indy-mime-decoding-of-multipart-form-data-requests-returns-trailing-cr-lf
TIdMessageDecoderMIME(Decoder).Headers.Values['Content-Transfer-Encoding'] := '8bit';
TIdMessageDecoderMIME(Decoder).BodyEncoded := False;
NewDecoder := Decoder.ReadBody(MS, MsgEnd);
MS.Position := 0; // nutne?
if Decoder.Filename <> EmptyStr then // je to atachment
begin
try
Writeln(Decoder.Filename ' ' IntToStr(MS.Size));
except
FreeAndNil(NewDecoder);
Writeln('Error processing MIME');
end;
end
else // je to parametr
begin
Name := ExtractHeaderSubItem(Decoder.Headers.Text, 'name', QuoteHTTP);
if Name <> EmptyStr then
begin
Value := string(PAnsiChar(MS.Memory));
try
Writeln(Name '=' Value);
except
FreeAndNil(NewDecoder);
Writeln('Error processing MIME');
end;
end;
end;
Decoder.Free;
Decoder := NewDecoder;
finally
MS.Free;
end;
end;
function ProcessMultiPart(const ContentType: string; Stream: TStream): Boolean;
var
Boundary: string;
BoundaryStart: string;
BoundaryEnd: string;
Decoder: TIdMessageDecoder;
Line: string;
BoundaryFound: Boolean;
IsStartBoundary: Boolean;
MsgEnd: Boolean;
begin
Result := False;
Boundary := ExtractHeaderSubItem('multipart/form-data; boundary=---------------------------16857441221270830881532229640', 'boundary', QuoteHTTP);
if Boundary <> EmptyStr then
begin
BoundaryStart := '--' Boundary;
BoundaryEnd := BoundaryStart '--';
Decoder := TIdMessageDecoderMIME.Create(nil);
try
TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
Decoder.SourceStream := Stream;
Decoder.FreeSourceStream := False;
BoundaryFound := False;
IsStartBoundary := False;
repeat
Line := ReadLnFromStream(Stream, -1, True);
if Line = BoundaryStart then
begin
BoundaryFound := True;
IsStartBoundary := True;
end
else
begin
if Line = BoundaryEnd then
BoundaryFound := True;
end;
until BoundaryFound;
if BoundaryFound and IsStartBoundary then
begin
MsgEnd := False;
repeat
TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
Decoder.SourceStream := Stream;
Decoder.FreeSourceStream := False;
Decoder.ReadHeader;
case Decoder.PartType of
mcptText,
mcptAttachment:
begin
ProcessAttachmentPart(Decoder, MsgEnd);
end;
mcptIgnore:
begin
Decoder.Free;
Decoder := TIdMessageDecoderMIME.Create(nil);
end;
mcptEOF:
begin
Decoder.Free;
MsgEnd := True;
end;
end;
until (Decoder = nil) or MsgEnd;
Result := True;
end
finally
Decoder.Free;
end;
end;
end;
var
Stream: TMemoryStream;
begin
Stream := TMemoryStream.Create;
try
Stream.LoadFromFile('MIME.txt');
ProcessMultiPart('multipart/form-data; boundary=---------------------------16857441221270830881532229640', Stream);
finally
Stream.Free;
end;
Readln;
end.
Could someone help me what is wrong with my code? Thank you.
CodePudding user response:
The stream data you have shown is malformed, as most of the required line breaks are missing, as is the ending MIME boundary after the PNG data. The data should look more like this:
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="d"
83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="dir"
Upload
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="file_name"; filename="Äeská teÄka.png"
Content-Type: image/png
[PNG_DATA]
-----------------------------16857441221270830881532229640--
Your call to ExtractHeaderSubItem()
in ProcessMultiPart()
is wrong, it needs to pass in the ContentType
string parameter, not a hard-coded string literal.
Your call to ExtractHeaderSubItem()
in ProcessAttachmentPart()
is also wrong, it needs to pass in only the content of just the Content-Disposition
header, not the entire Headers
list.
Regarding the dir
MIME part, there is no reason why Indy should be returning the body data as UploadW
instead of Upload
. I don't know where that W
is coming from, you are going to have to debug that one for yourself.
But, regarding the Decoder.FileName
, that value is not affected by the Content-Transfer-Encoding
header at all. MIME headers simply do not allow unencoded Unicode characters. Currently, Indy's MIME decoder supports RFC2047-style encodings for Unicode characters in headers, per RFC 7578 Section 5.1.3, but your stream data is not using that format. It looks like your data is using raw UTF-8 octets (which 5.1.3 also mentions as a possible encoding, but Indy does not currently support that), except that česká tečka.png
is not the correct UTF-8 form of česká tečka.png
, it looks like that data has possibly been double-encoded (ie, česká tečka.png
was UTF-8 encoded, and then the resulting bytes were UTF-8 encoded again). That is not a fault of Indy, your source data is just wrong to begin with. So, you are going to have to decode the original filename
value manually.
CodePudding user response:
Nowadays the filename
parameter should only be added for fallback reasons, while filename*
should be added to clearly tell which text encoding the filename has. Otherwise each client only guesses and supposes. Which may go wrong.
RFC 5987 §3.2 defines the format of that
filename*
parameter:charset
'
[language
]'
value-chars
...whereas:
charset
can beUTF-8
orISO-8859-1
or any MIME-charset...and the language is optional.
RFC 6266 §4.3 defines that
filename*
should be used and comes up with examples in §5:Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''