clarion.common
Class QBPNet

java.lang.Object
  extended by clarion.common.BPNet
      extended by clarion.common.QBPNet

public class QBPNet
extends BPNet


Field Summary
protected  int CONTROL_OUTPUT_DIM_NUM
          the # of NACS-control dims.
protected  double discount
          the Q-learning discount.
protected  short[] fullControlOutputDVs
          the dim-val info of NACS-control outputs.
protected  short[] fullControlOutputOffsets
          the start offset of each NACS-control dim in a one-dimensional array.
protected  short[] fullOutputDVs
          D-V info of bottom (IDN) outputs.
protected  short[] fullOutputOffsets
          start position of each (bottom) output dim in a one-dimensional array.
protected  int netIdx
          the net index
protected  int netType
          network type : EX, GS or WM.
protected  int OUTPUT_DIM_NUM
           
protected  short[] outputLocations
          used to locate each type of output in the overall outputs.
protected  short[] outputNums
          #s of each type of output in its physical full length.
protected  short[][][] performedAction
          action performed.
protected  double[] preState
          the previous state.
protected  double reinforcement
          the currently received reinforcement.
protected  double[] state
          the current state.
protected  int subsysIdx
          subsystem: ACS or MCS.
 
Fields inherited from class clarion.common.BPNet
BLAT, BLDT, BLPT, BLRT, debug, DesiredOutput, Errors, Eta, global, Hidden, HiddenDeriv, HiddenErrors, HiddenMomentum, HiddenThresholds, HiddenToOutputMomentum, HiddenToOutputWeights, HINITTHRESHOLD, HINITWEIGHT, HtoOWeights, Input, inputDVs, InputToHiddenMomentum, InputToHiddenWeights, ItoHWeights, LINITTHRESHOLD, LINITWEIGHT, Momentum, nHidden, nInput, NM, nOutput, Output, OutputDeriv, outputDVs, OutputMomentum, outputOffsets, OutputThresholds, PM, RZero, SUCC_C3, SUCC_C4, succRate, sumSqErrors
 
Constructor Summary
QBPNet(int inputNum, int nIdx, Global g)
          constructor.
 
Method Summary
 void backwardPass(double[][][] qValue)
          Updates the neural network using the simplified QBP version.
 void backwardPass(double[][][] qValue1, double[][][] qValue2)
          Updates the neural network using the QBP version.
protected  void computeErrors(double[][][] qValue)
          computes the output-layer errors of the simplified QBP network.
protected  void computeErrors(double[][][] qValue1, double[][][] qValue2)
          computes the output-layer errors of the QBP network.
 void getMaxQVals(double[] fillMe)
          Fills the given array with the max Q-value of each output dimension.
 double getQDiscount()
          returns the discount.
 double getQVal(int index)
          returns the Q-value of a particular output unit.
 void getQVals(double[] fillMe)
          fills the given array with all of the Q-values (the outputs).
 double getReinforcement()
          returns the reinforcement.
 void getState(int[] fillMe)
          fills the given array with the current state.
 void reinit()
          reinitializes this network.
 void setInput()
          Sets the state as input to BPNet.
 void setPerformedAction(short[][][] action)
          Sets the performed action.
 void setPreState()
          Sets the previous state as input to BPNet.
 void setReinforcement(double value)
          sets currently received reinforcement.
 void setState(double[] curState)
          sets current state using an array of activations of all of the dimensional values.
 void setState(int[] curState)
          sets current state using an array of active dimensional values.
 void update(short[][][] action, double feedback, double[][][] qValue)
          update routine used for simplified Q-Learning.
 void update(short[][][] action, double feedback, double[][][] qVal1, double[][][] qVal2)
          update routine used for Q-Learning.
 
Methods inherited from class clarion.common.BPNet
backwardPass, calcRT, computeErrors, computeHiddenActivation, computeHiddenErrors, computeOutputActivation, forwardPass, getInput, getnHidden, getnInput, getNM, getnOutput, getOutput, getOutput, getOutput, getOutput, getOutput, getPM, getResponseTime, getSuccRate, getSumSqErrors, modifyHiddenToOutput, modifyInputToHidden, reinitWeights, resetMatches, restoreInitWeights, setDesiredOutput, setDesiredOutput, setDesiredOutput, setDesiredOutput, setInput, setInput, setInput, setLearningRate, setMomentum, updateMatches
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

subsysIdx

protected int subsysIdx
subsystem: ACS or MCS.


netType

protected int netType
network type : EX, GS or WM.


netIdx

protected int netIdx
the net index


OUTPUT_DIM_NUM

protected int OUTPUT_DIM_NUM

CONTROL_OUTPUT_DIM_NUM

protected int CONTROL_OUTPUT_DIM_NUM
the # of NACS-control dims.


fullOutputDVs

protected short[] fullOutputDVs
D-V info of bottom (IDN) outputs.


fullOutputOffsets

protected short[] fullOutputOffsets
start position of each (bottom) output dim in a one-dimensional array.
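
The offset layout described above can be illustrated with a short sketch. It assumes that the D-V array stores the number of values in each dimension, which this page does not confirm; the class and method names are hypothetical and are not part of clarion.common.

```java
// Illustrative sketch only; not the clarion.common implementation.
class OffsetSketch {
    /**
     * Computes the start offset of each dimension in a flat array.
     * Assumption: valuesPerDim[d] is the number of values in dimension d.
     */
    static short[] startOffsets(short[] valuesPerDim) {
        short[] offsets = new short[valuesPerDim.length];
        int next = 0; // running position in the flat array
        for (int d = 0; d < valuesPerDim.length; d++) {
            offsets[d] = (short) next;
            next += valuesPerDim[d];
        }
        return offsets;
    }

    public static void main(String[] args) {
        short[] offsets = startOffsets(new short[] {3, 2, 4});
        System.out.println(java.util.Arrays.toString(offsets)); // prints [0, 3, 5]
    }
}
```

With offsets of this kind, unit v of dimension d sits at position offsets[d] + v in the flat output array.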


fullControlOutputDVs

protected short[] fullControlOutputDVs
the dim-val info of NACS-control outputs.


fullControlOutputOffsets

protected short[] fullControlOutputOffsets
the start offset of each NACS-control dim in a one-dimensional array.


outputNums

protected short[] outputNums
#s of each type of output in its physical full length.


outputLocations

protected short[] outputLocations
used to locate each type of output in the overall outputs: EX, WM, GS, and NACS-CONTROL, if any.


performedAction

protected short[][][] performedAction
action performed.


discount

protected double discount
the Q-learning discount.


reinforcement

protected double reinforcement
the currently received reinforcement.


state

protected double[] state
the current state.


preState

protected double[] preState
the previous state.

Constructor Detail

QBPNet

public QBPNet(int inputNum,
              int nIdx,
              Global g)
constructor.

Parameters:
inputNum - the number of input units.
nIdx - the net index.
Method Detail

reinit

public void reinit()
reinitializes this network.


getState

public void getState(int[] fillMe)
fills the given array with the current state.

Parameters:
fillMe - the array to store current state.

getQDiscount

public double getQDiscount()
returns the discount.

Returns:
the discount.

getReinforcement

public double getReinforcement()
returns the reinforcement.

Returns:
the reinforcement.

getQVal

public double getQVal(int index)
returns the Q-value of a particular output unit.

Parameters:
index - the index of the particular output unit.
Returns:
the Q-value of a particular output unit.

getQVals

public void getQVals(double[] fillMe)
fills the given array with all of the Q-values (the outputs).

Parameters:
fillMe - the array to store the outputs.
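
Several accessors on this page return void and write into a caller-supplied fillMe array. A minimal sketch of that calling pattern follows; the class FillMeSketch, its output field, and the method copyOutputs are hypothetical stand-ins, not QBPNet code.

```java
// Illustrative sketch of the "fillMe" calling pattern; not the clarion.common implementation.
class FillMeSketch {
    // Hypothetical stand-in for the network's output activations.
    private final double[] output = {0.2, 0.5, 0.1};

    /** Copies the outputs into a buffer allocated by the caller. */
    void copyOutputs(double[] fillMe) {
        System.arraycopy(output, 0, fillMe, 0, output.length);
    }

    int size() {
        return output.length;
    }

    public static void main(String[] args) {
        FillMeSketch net = new FillMeSketch();
        double[] buffer = new double[net.size()]; // caller owns and may reuse the buffer
        net.copyOutputs(buffer);
        System.out.println(java.util.Arrays.toString(buffer));
    }
}
```

The caller must size the buffer to hold all outputs; reusing one buffer across calls avoids a new allocation on every query.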

getMaxQVals

public void getMaxQVals(double[] fillMe)
Fills the given array with the max Q-value of each output dimension.

Parameters:
fillMe - the array to store the maximal Q-values.
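
Taking a maximum per output dimension can be sketched as follows, assuming the Q-values sit in one flat array with each dimension occupying a contiguous block. The offsets and sizes arrays, the class name, and the method name are hypothetical and only illustrate the idea.

```java
// Illustrative sketch only; not the clarion.common implementation.
class MaxQSketch {
    /**
     * Writes the largest Q-value of each dimension into fillMe.
     * Assumption: dimension d occupies q[offsets[d]] .. q[offsets[d] + sizes[d] - 1].
     */
    static void maxPerDim(double[] q, int[] offsets, int[] sizes, double[] fillMe) {
        for (int d = 0; d < offsets.length; d++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int v = 0; v < sizes[d]; v++) {
                best = Math.max(best, q[offsets[d] + v]);
            }
            fillMe[d] = best;
        }
    }

    public static void main(String[] args) {
        double[] q = {0.1, 0.7, 0.3, 0.9, 0.2};
        double[] max = new double[2];
        maxPerDim(q, new int[] {0, 3}, new int[] {3, 2}, max);
        System.out.println(java.util.Arrays.toString(max)); // prints [0.7, 0.9]
    }
}
```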

setState

public void setState(int[] curState)
sets current state using an array of active dimensional values.

Parameters:
curState - the array used to set current state.

setState

public void setState(double[] curState)
sets current state using an array of activations of all of the dimensional values.

Parameters:
curState - the array used to set current state.
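
The two setState overloads accept either the active value of each dimension or an activation for every dimension value. One plausible relationship between the two forms is a one-hot expansion, sketched below. This is an assumption about the encoding, and the offsets array, class name, and method name are hypothetical.

```java
// Illustrative sketch only; not the clarion.common implementation.
class StateEncodingSketch {
    /**
     * Expands one active value per dimension into a flat activation array.
     * Assumption: offsets[d] is the start position of dimension d, and the
     * active unit of each dimension gets activation 1.0, all others 0.0.
     */
    static double[] encode(int[] activeValues, int[] offsets, int totalUnits) {
        double[] activations = new double[totalUnits]; // Java zero-initializes the array
        for (int d = 0; d < activeValues.length; d++) {
            activations[offsets[d] + activeValues[d]] = 1.0;
        }
        return activations;
    }

    public static void main(String[] args) {
        // Two dimensions with 3 and 2 values; value 1 and value 0 are active.
        double[] state = encode(new int[] {1, 0}, new int[] {0, 3}, 5);
        System.out.println(java.util.Arrays.toString(state));
    }
}
```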

setPreState

public void setPreState()
Sets the previous state as input to BPNet. Used for normal Q-Learning.


setInput

public void setInput()
Sets the state as input to BPNet. Used for normal Q-Learning.


setReinforcement

public void setReinforcement(double value)
sets currently received reinforcement.

Parameters:
value - the value used to set reinforcement.

setPerformedAction

public void setPerformedAction(short[][][] action)
Sets the performed action.

Parameters:
action - the performed action to set.

update

public void update(short[][][] action,
                   double feedback,
                   double[][][] qValue)
update routine used for simplified Q-Learning.

Parameters:
action - the performed action.
feedback - the currently received reinforcement.
qValue - the array of Q-values the action has.

update

public void update(short[][][] action,
                   double feedback,
                   double[][][] qVal1,
                   double[][][] qVal2)
update routine used for Q-Learning.

Parameters:
action - the performed action.
feedback - the currently received reinforcement.
qVal1 - the array of maximal Q-values after the action is performed.
qVal2 - the array of Q-values the action has before the action is performed.
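
The parameter descriptions above (maximal Q-values after the action, Q-values before it) match the shape of the standard one-step Q-learning error. The sketch below computes that error for a single output unit. Whether QBPNet uses exactly this rule is not stated on this page, so treat it as an assumption; the class and method names are hypothetical.

```java
// Illustrative sketch of the standard one-step Q-learning error; not the clarion.common implementation.
class QTargetSketch {
    /**
     * Standard rule: error = r + discount * max Q(next state) - Q(current state, action).
     *
     * @param reinforcement the reinforcement received after the action
     * @param discount      the discount applied to the next state's best Q-value
     * @param maxNextQ      the maximal Q-value after the action is performed
     * @param currentQ      the Q-value of the action before it is performed
     */
    static double tdError(double reinforcement, double discount, double maxNextQ, double currentQ) {
        return reinforcement + discount * maxNextQ - currentQ;
    }

    public static void main(String[] args) {
        System.out.println(tdError(1.0, 0.5, 2.0, 0.5)); // prints 1.5
    }
}
```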

backwardPass

public void backwardPass(double[][][] qValue1,
                         double[][][] qValue2)
Updates the neural network using the QBP version.


backwardPass

public void backwardPass(double[][][] qValue)
Updates the neural network using the simplified QBP version.

Parameters:
qValue - the array of Q-values the action has.

computeErrors

protected void computeErrors(double[][][] qValue1,
                             double[][][] qValue2)
computes the output-layer errors of the QBP network.


computeErrors

protected void computeErrors(double[][][] qValue)
computes the output-layer errors of the simplified QBP network.

Parameters:
qValue - the array of Q-values the action has.
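
A common arrangement when Q-learning is combined with backpropagation is to give a nonzero error only to the output units that belong to the performed action. The sketch below assumes that arrangement for a flat output layer. This page does not confirm that either computeErrors overload works this way, and all names in the sketch are hypothetical.

```java
// Illustrative sketch only; not the clarion.common implementation.
class OutputErrorSketch {
    /**
     * Builds an output-layer error array.
     * Assumption: units of the performed action get (target - current);
     * every other unit gets an error of 0.0 and is left unchanged by learning.
     */
    static double[] outputErrors(double[] target, double[] current, boolean[] performed) {
        double[] errors = new double[current.length];
        for (int i = 0; i < current.length; i++) {
            errors[i] = performed[i] ? target[i] - current[i] : 0.0;
        }
        return errors;
    }

    public static void main(String[] args) {
        double[] errors = outputErrors(
                new double[] {1.0, 2.0, 3.0},
                new double[] {0.5, 0.5, 0.5},
                new boolean[] {false, true, false});
        System.out.println(java.util.Arrays.toString(errors)); // prints [0.0, 1.5, 0.0]
    }
}
```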