Can you turn an Iterator<T> into an Iterator<&T> efficiently?

Time:08-20

I've run into two frustrating problems while trying to send messages on a Unix socket using sendmmsg from the nix crate.

I have some given messages which may or may not contain fds. Nix does most things zero-copy, which sometimes makes it tricky to work with and has you battling the borrow checker and the type system. Both problems come from this function signature:

pub fn sendmmsg<'a, I, C, S>(
    fd: RawFd,
    data: impl std::iter::IntoIterator<Item=&'a SendMmsgData<'a, I, C, S>>,
    flags: MsgFlags
) -> Result<Vec<usize>>
    where
        I: AsRef<[IoSlice<'a>]> + 'a,
        C: AsRef<[ControlMessage<'a>]> + 'a,
        S: SockaddrLike + 'a

Where SendMmsgData is defined as:

pub struct SendMmsgData<'a, I, C, S>
    where
        I: AsRef<[IoSlice<'a>]>,
        C: AsRef<[ControlMessage<'a>]>,
        S: SockaddrLike + 'a
{
    pub iov: I,
    pub cmsgs: C,
    pub addr: Option<S>,
    pub _lt: std::marker::PhantomData<&'a I>,
}

And here's the code interfacing with it:

...
#[inline]
    fn exec_write_many<M>(&mut self, messages: Vec<M>) -> Result<usize, Error>
        where
            M: SocketMessage,
    {
        let mut sent = 0;
        let mut send_message_data = vec![];
        for msg in messages.iter() {
            let mmsg_data = if msg.borrow_fds().is_empty() {
                SendMmsgData {
                    iov: [IoSlice::new(msg.borrow_buf()); 1],
                    cmsgs: vec![],
                    addr: NONE_ADDR,
                    _lt: std::marker::PhantomData::default(),
                }
            } else {
                SendMmsgData {
                    iov: [IoSlice::new(msg.borrow_buf()); 1],
                    cmsgs: vec![ControlMessage::ScmRights(msg.borrow_fds())],
                    addr: NONE_ADDR,
                    _lt: std::marker::PhantomData::default(),
                }
            };
            send_message_data.push(mmsg_data);
        }
        match nix::sys::socket::sendmmsg(self.sock_fd, &send_message_data, MsgFlags::MSG_DONTWAIT) {
            ...

Both problems are manageable but come at a performance cost. Starting with the major one: I want to pass sendmmsg an iterator created like this instead:

...
#[inline]
    fn exec_write_many<M>(&mut self, messages: Vec<M>) -> Result<usize, Error>
        where
            M: SocketMessage,
    {
        let mut sent = 0;
        let sendmmsgs = messages.iter()
            .map(|msg| {
                if msg.borrow_fds().is_empty() {
                    SendMmsgData {
                        iov: [IoSlice::new(msg.borrow_buf()); 1],
                        cmsgs: vec![],
                        addr: NONE_ADDR,
                        _lt: std::marker::PhantomData::default(),
                    }
                } else {
                    SendMmsgData {
                        iov: [IoSlice::new(msg.borrow_buf()); 1],
                        cmsgs: vec![ControlMessage::ScmRights(msg.borrow_fds())],
                        addr: NONE_ADDR,
                        _lt: std::marker::PhantomData::default(),
                    }
                }
            });
        match nix::sys::socket::sendmmsg(self.sock_fd, sendmmsgs, MsgFlags::MSG_DONTWAIT) {
            ...

But since SendMmsgData is owned by the iterator I get this:

error[E0271]: type mismatch resolving `<[closure@socks/src/buffered_writer.rs:146:18: 162:14] as FnOnce<(&M,)>>::Output == &SendMmsgData<'_, _, _, _>`
    --> socks/src/buffered_writer.rs:163:56
     |
163  |         match nix::sys::socket::sendmmsg(self.sock_fd, sendmmsgs, MsgFlags::MSG_DONTWAIT) {
     |               --------------------------               ^^^^^^^^^ expected reference, found struct `SendMmsgData`
     |               |
     |               required by a bound introduced by this call
     |
     = note: expected reference `&SendMmsgData<'_, _, _, _>`
                   found struct `SendMmsgData<'_, [IoSlice<'_>; 1], Vec<ControlMessage<'_>>, ()>`
     = note: required because of the requirements on the impl of `Iterator` for `Map<std::slice::Iter<'_, M>, [closure@socks/src/buffered_writer.rs:146:18: 162:14]>`
note: required by a bound in `nix::sys::socket::sendmmsg`
    --> /home/gramar/.cargo/registry/src/github.com-1ecc6299db9ec823/nix-0.24.2/src/sys/socket/mod.rs:1456:40
     |
1456 |     data: impl std::iter::IntoIterator<Item=&'a SendMmsgData<'a, I, C, S>>,
     |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `nix::sys::socket::sendmmsg`

It's pretty frustrating. With an Option, for example, I can just call as_ref() to turn the inner T into &T, but I can't figure out how to do the same with an iterator. As a result I have to allocate another vector and loop through all the messages to be sent.

The second, smaller issue is the cmsgs. The type system disallows using an array, since one branch will have the type [_; 1] and the other [_; 0]. The empty Vec causes no allocation, while the Vec with one item does.

Both problems have me running into the same issue. I can't figure out how to create a wrapper struct and just implement IntoIterator<Item=&'a SendMmsgData<'a, I, C, S>> and AsRef<[ControlMessage<'a>]> respectively, because both require me to return a reference to something created in the function body. My data structure is not of the form SendMmsgData or ControlMessage, and both of those reference some other piece of memory. In my case I would have to create a struct with an owned buffer and an internal reference to it (a self-referential struct), which creates other problems.

Any ideas of how I can do this without the extra loop/allocations?

P.S. In my measurements, doing the following costs around 10% in performance when messages do not have fds, because of how that syscall works, and gains around 2% when all messages have fds:

if msg.borrow_fds().is_empty() {
                SendMmsgData {
                    iov: [IoSlice::new(msg.borrow_buf()); 1],
                    cmsgs: [ControlMessage::ScmRights(&[])],
                    addr: NONE_ADDR,
                    _lt: std::marker::PhantomData::default(),
                }
            } else {
                SendMmsgData {
                    iov: [IoSlice::new(msg.borrow_buf()); 1],
                    cmsgs: [ControlMessage::ScmRights(msg.borrow_fds())],
                    addr: NONE_ADDR,
                    _lt: std::marker::PhantomData::default(),
                }
            };

I could just use the libc crate directly, and if I can't solve this in a better way I'll have to do that instead.

CodePudding user response:

The trouble with your first problem is that sendmmsg could call next on the iterator n times, getting references to n SendMmsgData values that must all be alive at the same time. Because you can't know what n is in advance, they all have to live somewhere, so you'll have to buffer them in a Vec. This could be fixed by changing the API of sendmmsg to take either owned or borrowed SendMmsgData values, but you obviously don't have control over that.
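The buffering described above can be sketched without nix at all. In this minimal example, `Packet` and `send_all` are hypothetical stand-ins for `SendMmsgData` and `sendmmsg`; the only thing they share with the real API is that the function accepts an iterator of references. The owned values produced by `map` are collected first, and the resulting Vec is then borrowed.

```rust
// Hypothetical stand-in for `SendMmsgData`; not part of nix.
struct Packet {
    payload_len: usize,
}

// Hypothetical stand-in for `sendmmsg`: it only accepts borrowed items,
// so every `Packet` must outlive the call.
fn send_all<'a>(data: impl IntoIterator<Item = &'a Packet>) -> Vec<usize> {
    data.into_iter().map(|p| p.payload_len).collect()
}

fn buffer_then_send(messages: &[Vec<u8>]) -> Vec<usize> {
    // `map` yields owned `Packet`s, so they need a home before being borrowed.
    let packets: Vec<Packet> = messages
        .iter()
        .map(|m| Packet { payload_len: m.len() })
        .collect();

    // `&Vec<Packet>` implements `IntoIterator<Item = &Packet>`.
    send_all(&packets)
}

fn main() {
    let messages = vec![vec![1u8, 2, 3], vec![4u8, 5]];
    let sent = buffer_then_send(&messages);
    assert_eq!(sent, vec![3, 2]);
}
```

If the allocation itself is the concern rather than the extra loop, `Vec::with_capacity(messages.len())` at least avoids reallocation while the buffer is filled.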

The cmsgs issue, I think, can be helped though. You can create your own Option-like wrapper that lives purely on the stack and that implements AsRef based on whether it contains a value or not:

// Stand-in for nix's ControlMessage so the snippet compiles on its own.
struct ControlMessage<'a>(std::marker::PhantomData<&'a ()>);

enum CMsgWrapper<'a> {
    Empty,
    Msg(ControlMessage<'a>),
}

impl<'a> AsRef<[ControlMessage<'a>]> for CMsgWrapper<'a> {
    fn as_ref(&self) -> &[ControlMessage<'a>] {
        match self {
            CMsgWrapper::Empty => &[],
            CMsgWrapper::Msg(cmsg) => std::slice::from_ref(cmsg),
        }
    }
}
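As a rough usage sketch of the wrapper above: both branches of the `if` now produce the same type, so no Vec is needed for `cmsgs`. The `ControlMessage` enum here is a simplified stand-in that models only an `ScmRights` variant over `i32` descriptors, and `cmsgs_for` is a hypothetical helper, not something nix provides.

```rust
// Simplified stand-in for nix's `ControlMessage`; only one variant is modelled.
enum ControlMessage<'a> {
    ScmRights(&'a [i32]),
}

// The wrapper from the answer, repeated so the sketch is self-contained.
enum CMsgWrapper<'a> {
    Empty,
    Msg(ControlMessage<'a>),
}

impl<'a> AsRef<[ControlMessage<'a>]> for CMsgWrapper<'a> {
    fn as_ref(&self) -> &[ControlMessage<'a>] {
        match self {
            CMsgWrapper::Empty => &[],
            CMsgWrapper::Msg(cmsg) => std::slice::from_ref(cmsg),
        }
    }
}

// Hypothetical helper: both branches have the same type and live on the stack.
fn cmsgs_for(fds: &[i32]) -> CMsgWrapper<'_> {
    if fds.is_empty() {
        CMsgWrapper::Empty
    } else {
        CMsgWrapper::Msg(ControlMessage::ScmRights(fds))
    }
}

fn main() {
    let fds = [3, 4];

    // No fds: the slice view is empty.
    assert_eq!(cmsgs_for(&[]).as_ref().len(), 0);

    // With fds: the slice view has exactly one control message.
    let with_fds = cmsgs_for(&fds);
    match with_fds.as_ref() {
        [ControlMessage::ScmRights(f)] => assert_eq!(*f, &fds[..]),
        _ => unreachable!(),
    }
}
```

In the question's code, the `cmsgs` field would then hold a `CMsgWrapper` in both branches instead of a `Vec`.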

CodePudding user response:

As the accepted answer says, your temporary structs have to live somewhere. As far as I can see, you take the input parameter by value, so you can probably modify the messages. Yes, I'm proposing a "dirty" solution that doesn't look good even to me, but when performance really matters, it's worth a try.

So the idea is to put SendMmsgData structs into your message parameters as Option<SendMmsgData> and make SocketMessage trait have fn get_send_mmsg_data(&mut self) -> &SendMmsgData.

Here is example code

And yes, it would be better to open a PR against the nix crate to change the sendmmsg interface so that it accepts owned SendMmsgData values as well as references.
