Before I start, I am quite new to Keras and machine learning. I know the theory quite well but the syntax less so.
I am trying to create a reinforcement learning neural network using Keras. The problem to be solved is essentially the travelling salesman problem. The issue is that the network is fed its location and the environment, which is a randomly created set of points such as [[0,5],[30,17],[19,83]..., and as the agent travels through this network, the environment changes, since a point cannot be visited again. So if the agent goes from [0,0] to [0,5] and then [30,17], the input would shrink from [0,5],[30,17],[19,83] to [30,17],[19,83] to [19,83]. There is a similar issue with the output, which is just the index of the possible locations to move to, meaning there could be any number of outputs.
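To illustrate with made-up coordinates, the set of remaining points behaves like a list that shrinks as the agent visits them:

points = [[0, 5], [30, 17], [19, 83]]

points.remove([0, 5])    # agent moves to [0,5]; it can never be revisited
print(points)            # [[30, 17], [19, 83]]

points.remove([30, 17])  # agent moves on to [30,17]
print(points)            # [[19, 83]]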
The size of the input is initially 100, and the output could also be anywhere between 0 and 100. Methods like padding the inputs with a fixed number would not work, as the network would then be fed locations that are impossible to reach, and there is a similar problem with padding the output - the network could just stay in the same position whilst 'moving' ([0,0] to [0,0] etc.). The agent also has limited time, so even if the padding used random numbers, the agent could waste its moves travelling to locations which don't actually exist, which doesn't solve the problem at hand.
How would I dynamically change the input and output sizes? Is it even possible, and if not, how should this be handled instead?
Edit: here is the code, since someone asked for it. It's quite unintelligible, but in essence it is a class containing the actions that can be taken, the input in the form of self.state, and the environment in self.point_space. The reward is the distance travelled at each step and, when the episode is complete, the total distance compared to that of a random loop. The more important question is whether I can change the input and output sizes.
import math
from gym import Env
from gym.spaces import Discrete

class GraphEnv(Env):
    def __init__(self):
        self.point_space = createpoints()                  # randomly created [x, y] points
        self.action_space = Discrete(len(self.point_space))
        self.observation_space = self.point_space.copy()
        self.state = [[0, 0]]                              # current location first,
        for i in self.observation_space:                   # then the remaining points
            self.state.append(i)
        self.length = len(self.point_space)                # steps remaining
        self.totallen = 0                                  # running total of step rewards
        self.unchangedpoint_space = self.point_space.copy()

    def step(self, action):
        oldstate = self.state[0]
        self.state = []
        self.state.append(list(self.point_space[action - 1]))
        try:
            del self.point_space[action - 1]               # a visited point can't be revisited
        except IndexError:
            pass
        self.observation_space = self.point_space.copy()
        for i in self.observation_space:
            self.state.append(i)
        self.action_space = Discrete(len(self.point_space))
        # Reward for this step: negative Euclidean distance of the move.
        reward = int(-math.sqrt((oldstate[0] - self.state[0][0]) ** 2
                                + (oldstate[1] - self.state[0][1]) ** 2))
        self.totallen += reward
        self.length -= 1
        if self.length <= 0:
            # Episode over: score the tour against a random loop over the same points.
            randomscore = scoreforrandom(self.unchangedpoint_space)
            reward = self.totallen - randomscore
            done = True
        else:
            done = False
        info = {}
        return self.state, reward, done, info

    def render(self):
        pass

    def reset(self):
        self.point_space = createpoints()
        self.state = [[0, 0]]
        for i in self.point_space:                         # append points individually so
            self.state.append(i)                           # the state matches __init__
        self.length = len(self.point_space)
        self.observation_space = self.point_space.copy()
        self.unchangedpoint_space = self.point_space.copy()
        self.action_space = Discrete(len(self.point_space))
        self.totallen = 0
        return self.state
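For reference, the environment can be exercised with a simple random-policy loop like this (createpoints() and scoreforrandom() are defined elsewhere in my script):

env = GraphEnv()
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # pick a random remaining point
    state, reward, done, info = env.step(action)
print("final reward vs. random loop:", reward)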
The video I used for help: https://www.youtube.com/watch?v=bD6V3rcr_54&t=77s&ab_channel=NicholasRenotte
CodePudding user response:
Found a solution: to fix the changing outputs, I instead used 4 actions (up, down, left and right). This also helped to solve the changing inputs, as padding with 9999 was now an option.
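Roughly, the fixed-size setup looks like the following (the helper name and constants are illustrative, not the exact code):

import numpy as np
from gym.spaces import Discrete

PAD = 9999                     # filler marking visited/removed points
MAX_POINTS = 100               # upper bound on the number of points

action_space = Discrete(4)     # 0 = up, 1 = down, 2 = left, 3 = right

def encode_observation(points):
    """Pad the remaining points so the network always sees the same shape."""
    obs = np.full((MAX_POINTS, 2), PAD, dtype=np.int32)
    if points:
        obs[:len(points)] = points
    return obs.flatten()       # constant-size input vector

Because the action space no longer depends on how many points remain, the network's output layer can stay at a fixed size of 4, and the padded observation keeps the input layer at a fixed size as well.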