At this point you're not even reading what's written, though I may not have done a perfect job of reading your post myself.
I'm not saying that a long mouse press while holding food does nothing. I'm saying that I didn't find out that was a way to eat food. Which was exactly what I wrote, and it follows from my point that there are too many combinations of inputs and it's a challenge to keep them all straight. EDIT: If what you're saying is that long-press LMB actually does nothing, that's kind of my point. Long-pressing other buttons does things, so if you're searching for an input to do something, a long press on the mouse is going to be one of the things you try. I may have mixed up long-press LMB and long-press grab for food eating - I've remapped LMB and the grab key to controller buttons and I'm trying to translate everything back to M&K for your benefit, so between the input soup and trying to keep track of the controller map I might have mixed that up with grab - which still goes to my point.
You're also not even engaging with the full facts and overall discussion here. Four inputs for an object you're looking at is more than what I suggested is actually required for an object in front of you, and the user has to deal with more than just looking at objects. More than that, they have the experience of having to determine which of the different possible input combinations performs the intended real-world action, and they aren't operating from a place of already knowing what every possible input is. I think you're withdrawing into a simplified mental concept of the controls you have that isn't what's being discussed. What's being discussed is the total number of distinct input combinations across every input a player has to enter at different times, multiplied against the various conditional states that change what some of the inputs mean, as well as the user interface modifiers that also change what some of the inputs mean.
One possible thing that could be tripping you up is that not every conditional state (holding, looking at, grabbing) and every user interface modifier (such as long-pressing the input or scrolling or whatever sprint is) always has a valid button combination with every other one, but this is not inherently perceivable by a new player who is learning how to play the game. To know what combinations aren't valid inputs, one must know all of the valid inputs. If you have an action you want to perform in mind but you are trying to figure out how to perform it because you haven't Matrix-style uploaded all of the valid input combinations into your brain, but you DO know that looking at an object and holding an object and grabbing an object can each influence the action that LMB performs, that's three different inputs the action you have in mind could be. If you know that long-pressing an input and quick-pressing an input have performed different actions, now you have six different ways of interacting with an object with LMB that could be your action. But if LMB is one of let's say 4 different keys, now there are 24 possible combinations. There's also mental logic to filter these down, but the point is that the sheer number of combinations you are facing prior to having the inputs completely memorized, or god forbid you forget any of them, is significantly problematic for exactly everybody who isn't someone who has long since figured them out and put learning them into the forgotten past.
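To put rough numbers on that, here's a quick sketch of the count. The labels are placeholders I made up for illustration, not the game's actual bindings, and it deliberately counts every combination whether or not the game uses it, since that's the search space a new player faces before they know which ones are valid.

```python
from itertools import product

# Hypothetical labels, for illustration only; not the game's real bindings.
states = ["looking at", "holding", "grabbing"]    # conditional states
press_types = ["quick press", "long press"]       # input modifiers
keys = ["LMB", "key 2", "key 3", "key 4"]         # assume 4 candidate keys

# Every (state, press type) pair a player might try with LMB alone.
lmb_only = list(product(states, press_types))

# The same pairs again for each of the candidate keys.
all_keys = list(product(states, press_types, keys))

print(len(lmb_only))   # 3 states * 2 press types
print(len(all_keys))   # 3 states * 2 press types * 4 keys
```

The exact total matters less than how fast it grows: every extra state, modifier, or key multiplies the number of things you have to try or remember.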
If you can extend a modicum of empathy for what it's like to not know how to play the game, and for how the true amount a person has to learn is not actually captured by your control summary, that's what will get you to understand what I'm talking about.
The tooltips while looking at an object, in some instances, have been EXTREMELY helpful - I did not have to struggle to figure out how to put things into the drone bag, or how to start the broomba. But they do not include words like "eat", and they only explain what the inputs mean while an object is being looked at. They also get MUCH more clumsy and overly wordy with a controller map. More importantly, they don't cover the different sets of inputs and input meanings in the grabbed and held states.
And again, if I'm getting some of the input combinations wrong.... that's kind of my point!!!