public class RoleHistory extends Object
Synchronization policy: all public operations are synchronized. Protected methods are in place for testing; no guarantees are made for them.
Inner classes have no synchronization guarantees; they should be manipulated only within these classes, not externally.
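
To illustrate this policy, here is a minimal sketch, not the RoleHistory code: every public method is synchronized on the instance, the mutable inner record type is changed only while that lock is held, and readers receive a deep copy (in the spirit of cloneNodemap()). The names SynchronizedHistorySketch, NodeRecord, containerStarted() and cloneNodes() are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the synchronization policy; not the RoleHistory code.
class SynchronizedHistorySketch {

  // Inner record type: NOT thread-safe by itself. It is only mutated
  // while the enclosing SynchronizedHistorySketch's lock is held.
  static final class NodeRecord {
    int active;

    NodeRecord copy() {
      NodeRecord r = new NodeRecord();
      r.active = active;
      return r;
    }
  }

  private final Map<String, NodeRecord> nodes = new HashMap<>();

  // Public operation: synchronized, so the inner record changes under the lock.
  public synchronized void containerStarted(String hostname) {
    nodes.computeIfAbsent(hostname, h -> new NodeRecord()).active++;
  }

  // Public operation: returns a deep copy, so callers can read it without
  // holding the lock and cannot mutate the internal state.
  public synchronized Map<String, NodeRecord> cloneNodes() {
    Map<String, NodeRecord> copy = new HashMap<>();
    for (Map.Entry<String, NodeRecord> e : nodes.entrySet()) {
      copy.put(e.getKey(), e.getValue().copy());
    }
    return copy;
  }
}
```

Because the snapshot is a copy, a caller that modifies it (or iterates it while other threads update the history) cannot corrupt the synchronized internal state.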

In addition to the methods marked visible for testing, the time generator method now() can be overridden so that a repeatable time series can be used.
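
A minimal sketch of such an override, assuming all time-dependent logic reads the clock only through now(). TimedHistorySketch, RepeatableTimedHistory and markSaved() are hypothetical names, not part of RoleHistory.

```java
// Hypothetical sketch: the clock is read only via the protected now() method.
class TimedHistorySketch {
  private long saveTime;

  // Production clock; subclasses may override it.
  protected long now() {
    return System.currentTimeMillis();
  }

  public synchronized void markSaved() {
    saveTime = now();
  }

  public synchronized long getSaveTime() {
    return saveTime;
  }
}

// Test double: returns 1000, 2000, 3000, ... on successive calls.
class RepeatableTimedHistory extends TimedHistorySketch {
  private long tick = 0;

  @Override
  protected long now() {
    tick += 1000;
    return tick;
  }
}
```

A test can then assert exact timestamps instead of tolerating clock drift.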

| Modifier and Type | Field | Description |
|---|---|---|
| protected static org.slf4j.Logger | log | |

| Constructor | Description |
|---|---|
| RoleHistory(Collection<RoleStatus> roles, AbstractClusterServices recordFactory) | Instantiate. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | addNewRole(RoleStatus roleStatus) | Add a new role. |
| Map<CharSequence,Integer> | buildMappingForHistoryFile() | Build the mapping entry for persisting to the role history. |
| void | buildRecentNodeLists() | (After the start) rebuild the availability data structures. |
| List<AbstractRMOperation> | cancelOutstandingAARequests() | Cancel any outstanding AA requests. |
| List<AbstractRMOperation> | cancelRequestsForRole(RoleStatus role, int toCancel) | Cancel a number of outstanding requests for a role: not actual containers, just requests for new ones. |
| boolean | canPlaceAANodes() | Does the RoleHistory have enough information about the YARN cluster to start placing AA (anti-affinity) requests? That is, does it have the node map and any label information needed? |
| NodeMap | cloneNodemap() | Get a clone of the node map. |
| List<NodeInstance> | cloneRecentNodeList(int role) | Get a clone of the available list for a role. |
| void | dump() | Print the history to the log. |
| List<AbstractRMOperation> | escalateOutstandingRequests() | Escalate operation as triggered by external timer. |
| List<NodeInstance> | findNodeForNewAAInstance(RoleStatus role) | Find a node for use by a new AA instance. |
| NodeInstance | findRecentNodeForNewInstance(RoleStatus role) | Find a recent node for use by a new instance. |
| int | getClusterSize() | Get the total size of the cluster: the number of NodeInstances. |
| NodeInstance | getExistingNodeInstance(org.apache.hadoop.yarn.api.records.Container container) | Get the node instance of a container, if there's an entry in the history. |
| NodeInstance | getExistingNodeInstance(String hostname) | Get the node instance of a host, if defined. |
| org.apache.hadoop.fs.Path | getHistoryPath() | Get the path used for history files. |
| NodeInformation | getNodeInformation(String hostname, Map<Integer,String> naming) | Get the information on a node. |
| Map<String,NodeInformation> | getNodeInformationSnapshot(Map<Integer,String> naming) | Get a snapshot of the node map. |
| long | getNodesUpdatedTime() | Get the last time the nodes were updated from YARN. |
| NodeEntry | getOrCreateNodeEntry(org.apache.hadoop.yarn.api.records.Container container) | Get the node entry of a container. |
| NodeInstance | getOrCreateNodeInstance(org.apache.hadoop.yarn.api.records.Container container) | Get the node instance of a container; always returns an instance. |
| NodeInstance | getOrCreateNodeInstance(String hostname) | Get the node instance for the specific node, creating it if needed. |
| List<NodeInstance> | getRecentNodesForRoleId(int id) | Get the nodes for a role ID; may be null. |
| int | getRoleSize() | |
| long | getSaveTime() | |
| long | getStartTime() | |
| long | getThawedDataTime() | |
| void | insert(Collection<NodeInstance> nodes) | Insert a list of nodes into the map, overwriting any with the same name. |
| boolean | isDirty() | |
| List<NodeInstance> | listActiveNodes(int role) | Get the list of active nodes ... |
| List<OutstandingRequest> | listOpenRequests() | Get a snapshot of the outstanding open request list. |
| List<OutstandingRequest> | listPlacedRequests() | Get a snapshot of the outstanding placed request list. |
| ProviderRole | lookupRole(int roleId) | Look up a role by ID. |
| protected boolean | markContainerFinished(org.apache.hadoop.yarn.api.records.Container container, boolean wasReleased, boolean shortLived, ContainerOutcome outcome) | Mark a container finished; if it was released, that is treated differently. |
| protected long | now() | Get the current time. |
| void | onBootstrap() | Handler for the bootstrap event: there was no history to thaw. |
| ContainerAllocationResults | onContainerAllocated(org.apache.hadoop.yarn.api.records.Container container, long desiredCount, long actualCount) | A container has been allocated on a node: update the data structures. |
| void | onContainerAssigned(org.apache.hadoop.yarn.api.records.Container container) | A container has been assigned to a role instance on a node: update the data structures. |
| void | onContainerReleaseSubmitted(org.apache.hadoop.yarn.api.records.Container container) | A container release request was issued. |
| void | onContainerStarted(org.apache.hadoop.yarn.api.records.Container container) | Container start event. |
| void | onContainerStartSubmitted(org.apache.hadoop.yarn.api.records.Container container, RoleInstance instance) | Event: a container start has been submitted. |
| boolean | onFailedContainer(org.apache.hadoop.yarn.api.records.Container container, boolean shortLived, ContainerOutcome outcome) | App state notified of a completed container; as it wasn't being released, it is marked as failed. |
| boolean | onNodeManagerContainerStartFailed(org.apache.hadoop.yarn.api.records.Container container) | A container failed to start: update the node entry state and return the container to the queue. |
| boolean | onNodesUpdated(List<org.apache.hadoop.yarn.api.records.NodeReport> updatedNodes) | Update failedNodes and the node map based on the node state. |
| boolean | onReleaseCompleted(org.apache.hadoop.yarn.api.records.Container container) | App state notified of a completed container. |
| boolean | onStart(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path historyDir) | Start up. |
| boolean | onThaw() | Handle the start process after the history has been rebuilt, and after any gc/purge. |
| List<org.apache.hadoop.yarn.api.records.Container> | prepareAllocationList(List<org.apache.hadoop.yarn.api.records.Container> allocatedContainers) | Perform any pre-allocation operations on the list of allocated containers based on knowledge of system state. |
| protected void | putRole(RoleStatus roleStatus) | Safety check: make sure the role is unique amongst the role stats, which are then extended with the new role. |
| int | rebuild(LoadedRoleHistory loadedRoleHistory) | Rebuild the placement history from the loaded role history. |
| void | register(MetricsAndMonitoring metrics) | Register all metrics with the metrics infrastructure. |
| OutstandingRequest | requestContainerForAARole(RoleStatus role) | Find a node for an AA role and request an instance on that node (or a location-less instance). |
| OutstandingRequest | requestContainerForRole(RoleStatus role) | Find a node for a role and request an instance on that node (or a location-less instance). |
| OutstandingRequest | requestInstanceOnNode(NodeInstance node, RoleStatus role, org.apache.hadoop.yarn.api.records.Resource resource) | Request an instance on a given node. |
| protected void | reset() | Reset the variables. This does not adjust the fixed attributes of the history, but the node map and failed node map are cleared. |
| void | resetFailedRecently() | Reset the failed-recently counters. |
| void | saved(long timestamp) | Tell the history that it has been saved; it marks itself as clean. |
| org.apache.hadoop.fs.Path | saveHistory(long time) | Save the history to its location, using the timestamp as part of the filename. |
| org.apache.hadoop.fs.Path | saveHistoryIfDirty() | Save the history with the current timestamp if it is dirty; return the path saved to if this is the case. |
| void | setDirty(boolean dirty) | |
| void | setThawedDataTime(long thawedDataTime) | |
| void | touch() | Mark the history as dirty. |
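
The request-tracking entries above, notably cancelRequestsForRole(), which cancels outstanding requests rather than actual containers, can be sketched as below. This is a hypothetical model, not Slider's OutstandingRequest API: the Request type, integer role IDs, a null hostname standing for "any node" (mirroring the node parameter of requestInstanceOnNode()), and the newest-first cancellation order are all assumptions. The real method returns a list of AbstractRMOperation rather than the cancelled requests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request tracker; not Slider's OutstandingRequest API.
class RequestTrackerSketch {

  static final class Request {
    final int roleId;
    final String hostname; // null means "any node" (location-less)

    Request(int roleId, String hostname) {
      this.roleId = roleId;
      this.hostname = hostname;
    }
  }

  private final List<Request> outstanding = new ArrayList<>();

  // Issue and track a request; a null hostname asks for any node.
  public synchronized Request request(int roleId, String hostname) {
    Request r = new Request(roleId, hostname);
    outstanding.add(r);
    return r;
  }

  // Cancel up to toCancel outstanding requests for a role, newest first
  // (an assumed policy). Only requests are cancelled, never containers.
  public synchronized List<Request> cancelRequestsForRole(int roleId, int toCancel) {
    List<Request> cancelled = new ArrayList<>();
    for (int i = outstanding.size() - 1; i >= 0 && cancelled.size() < toCancel; i--) {
      if (outstanding.get(i).roleId == roleId) {
        cancelled.add(outstanding.remove(i));
      }
    }
    return cancelled;
  }

  public synchronized int outstandingCount() {
    return outstanding.size();
  }
}
```

Iterating backwards lets the loop remove entries by index without skipping any of the remaining ones.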

public RoleHistory(Collection<RoleStatus> roles, AbstractClusterServices recordFactory) throws BadConfigException
Parameters: roles - initial role list; recordFactory - YARN record factory
Throws: BadConfigException

protected void reset() throws BadConfigException
Throws: BadConfigException

public void register(MetricsAndMonitoring metrics)
Parameters: metrics - metrics

protected void putRole(RoleStatus roleStatus) throws BadConfigException
Parameters: roleStatus - role
Throws: ArrayIndexOutOfBoundsException, BadConfigException

public void addNewRole(RoleStatus roleStatus) throws BadConfigException
Parameters: roleStatus - new role
Throws: BadConfigException

public ProviderRole lookupRole(int roleId)
Parameters: roleId - role ID

public int rebuild(LoadedRoleHistory loadedRoleHistory) throws BadConfigException
Parameters: loadedRoleHistory - loaded history
Throws: BadConfigException - if there is a problem rebuilding the state

public long getStartTime()
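
The putRole() safety check can be sketched as a registry that refuses a second role with an already-registered ID. This is a hypothetical illustration: roles are reduced to an integer ID and a name, RoleRegistrySketch is an invented name, and IllegalArgumentException stands in for BadConfigException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the putRole() uniqueness check.
// IllegalArgumentException stands in for Slider's BadConfigException.
class RoleRegistrySketch {
  private final Map<Integer, String> rolesById = new HashMap<>();

  // Register a role, refusing a second role with the same ID.
  public synchronized void putRole(int roleId, String name) {
    if (rolesById.containsKey(roleId)) {
      throw new IllegalArgumentException("Duplicate role ID " + roleId
          + ": " + name + " clashes with " + rolesById.get(roleId));
    }
    rolesById.put(roleId, name);
  }

  // Look up a role by ID; null if unknown.
  public synchronized String lookupRole(int roleId) {
    return rolesById.get(roleId);
  }

  public synchronized int getRoleSize() {
    return rolesById.size();
  }
}
```

Failing fast on a clash keeps a misconfigured role definition from silently replacing an existing one.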

public long getSaveTime()

public long getThawedDataTime()

public void setThawedDataTime(long thawedDataTime)

public int getRoleSize()

public int getClusterSize()

public boolean isDirty()

public void setDirty(boolean dirty)

public void saved(long timestamp)
Parameters: timestamp - timestamp; updates the saveTime field

public NodeMap cloneNodemap()

public Map<String,NodeInformation> getNodeInformationSnapshot(Map<Integer,String> naming)
Parameters: naming - naming map of priority to entry name; entries must be unique. The map may be incomplete: for missing entries, the list falls back to numbers.

public NodeInformation getNodeInformation(String hostname, Map<Integer,String> naming)
Parameters: hostname - hostname; naming - naming map of priority to entry name; entries must be unique. The map may be incomplete: for missing entries, the list falls back to numbers.

public NodeInstance getOrCreateNodeInstance(String hostname)
Parameters: hostname - node address

public void insert(Collection<NodeInstance> nodes)
Parameters: nodes - collection of nodes

protected long now()
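
The naming rule for getNodeInformationSnapshot() and getNodeInformation() can be sketched as follows: each role priority is shown by its name when the naming map has one, and otherwise falls back to the number. RoleNamingSketch and its methods are hypothetical, and the per-priority counts merely stand in for whatever per-role data NodeInformation carries.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "name, else number" labelling rule.
class RoleNamingSketch {

  // Use the mapped name if present; otherwise fall back to the number.
  static String nameFor(int priority, Map<Integer, String> naming) {
    String name = naming.get(priority);
    return name != null ? name : Integer.toString(priority);
  }

  // Re-key per-priority values by display name.
  static Map<String, Integer> label(Map<Integer, Integer> countsByPriority,
                                    Map<Integer, String> naming) {
    Map<String, Integer> labelled = new TreeMap<>();
    for (Map.Entry<Integer, Integer> e : countsByPriority.entrySet()) {
      labelled.put(nameFor(e.getKey(), naming), e.getValue());
    }
    return labelled;
  }
}
```

This also shows why the names must be unique: if two priorities mapped to the same name, the second would overwrite the first in the result.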

public void touch()

public void resetFailedRecently()

public org.apache.hadoop.fs.Path getHistoryPath()

public org.apache.hadoop.fs.Path saveHistory(long time) throws IOException
Parameters: time - timestamp to use as the save time
Throws: IOException - IO problems

public org.apache.hadoop.fs.Path saveHistoryIfDirty() throws IOException
Throws: IOException - failed to save for some reason

public boolean onStart(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path historyDir) throws BadConfigException
Parameters: fs - filesystem; historyDir - path in FS for history
Throws: BadConfigException

public void onBootstrap()
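
The dirty-flag contract of touch(), saved(), saveHistory() and saveHistoryIfDirty() can be sketched as below. Assumptions: java.nio.file.Path stands in for org.apache.hadoop.fs.Path, the history-%016x.json file-name pattern is invented for illustration, and the actual serialization (and therefore the IOException) is omitted. DirtyHistorySketch is a hypothetical name.

```java
import java.nio.file.Path;

// Hypothetical sketch of dirty tracking around history saves.
class DirtyHistorySketch {
  private final Path historyDir;
  private boolean dirty;
  private long saveTime;

  DirtyHistorySketch(Path historyDir) {
    this.historyDir = historyDir;
  }

  // Overridable clock, as with RoleHistory.now().
  protected long now() {
    return System.currentTimeMillis();
  }

  // Mark the history as dirty.
  public synchronized void touch() {
    dirty = true;
  }

  public synchronized boolean isDirty() {
    return dirty;
  }

  // Record a successful save: update the save time and mark clean.
  public synchronized void saved(long timestamp) {
    saveTime = timestamp;
    dirty = false;
  }

  // Put the timestamp into the file name (assumed pattern); writing is omitted.
  public synchronized Path saveHistory(long time) {
    Path path = historyDir.resolve(String.format("history-%016x.json", time));
    // ... serialize and write the history to 'path' here ...
    saved(time);
    return path;
  }

  // Save with the current time only if dirty; return the path, or null if clean.
  public synchronized Path saveHistoryIfDirty() {
    return dirty ? saveHistory(now()) : null;
  }

  public synchronized long getSaveTime() {
    return saveTime;
  }
}
```

Java monitors are reentrant, so saveHistoryIfDirty() can call the other synchronized methods while already holding the lock.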

public boolean onThaw() throws BadConfigException
Throws: BadConfigException

public void buildRecentNodeLists()

public List<NodeInstance> getRecentNodesForRoleId(int id)
Parameters: id - role ID

public NodeInstance findRecentNodeForNewInstance(RoleStatus role)
Parameters: role - role

public List<NodeInstance> findNodeForNewAAInstance(RoleStatus role)
Parameters: role - role

public OutstandingRequest requestInstanceOnNode(NodeInstance node, RoleStatus role, org.apache.hadoop.yarn.api.records.Resource resource)
The role status entries will also be tracked.
Returns the request that is now being tracked. If the node instance is not null, its details about the role are incremented.
Parameters: node - node to target, or null for "any"; role - role to request

public OutstandingRequest requestContainerForRole(RoleStatus role)
Parameters: role - role status

public OutstandingRequest requestContainerForAARole(RoleStatus role)
Parameters: role - role status

public List<NodeInstance> listActiveNodes(int role)
This is O(nodes).
Parameters: role - role index

public NodeEntry getOrCreateNodeEntry(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container to look up
Throws: RuntimeException - if the container has no hostname

public NodeInstance getOrCreateNodeInstance(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container to look up
Throws: RuntimeException - if the container has no hostname

public NodeInstance getExistingNodeInstance(String hostname)
Parameters: hostname - hostname to look up
Throws: RuntimeException - if the container has no hostname

public NodeInstance getExistingNodeInstance(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container to look up
Throws: RuntimeException - if the container has no hostname

public List<org.apache.hadoop.yarn.api.records.Container> prepareAllocationList(List<org.apache.hadoop.yarn.api.records.Container> allocatedContainers)
Parameters: allocatedContainers - list of allocated containers

public ContainerAllocationResults onContainerAllocated(org.apache.hadoop.yarn.api.records.Container container, long desiredCount, long actualCount)
Parameters: container - container; desiredCount - desired number of instances; actualCount - current count of instances

public void onContainerAssigned(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container

public void onContainerStartSubmitted(org.apache.hadoop.yarn.api.records.Container container, RoleInstance instance)
Parameters: container - container being started; instance - instance bound to the container

public void onContainerStarted(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container that just started

public boolean onNodeManagerContainerStartFailed(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container that failed

public boolean canPlaceAANodes()
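
The difference between the get-or-create lookups and the getExistingNodeInstance() lookups can be sketched with a hostname-keyed map. NodeMapSketch and Node are hypothetical stand-ins for NodeMap and NodeInstance; a missing hostname is rejected with a RuntimeException, as the entries above describe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: node instances keyed by hostname.
class NodeMapSketch {

  static final class Node {
    final String hostname;

    Node(String hostname) {
      this.hostname = hostname;
    }
  }

  private final Map<String, Node> nodes = new HashMap<>();

  // Reject null or empty hostnames.
  private static String requireHostname(String hostname) {
    if (hostname == null || hostname.isEmpty()) {
      throw new RuntimeException("No hostname");
    }
    return hostname;
  }

  // Always returns an instance, creating and registering one if needed.
  public synchronized Node getOrCreateNodeInstance(String hostname) {
    return nodes.computeIfAbsent(requireHostname(hostname), Node::new);
  }

  // Returns the existing instance, or null if the host is unknown; never creates.
  public synchronized Node getExistingNodeInstance(String hostname) {
    return nodes.get(requireHostname(hostname));
  }

  public synchronized int getClusterSize() {
    return nodes.size();
  }
}
```

Only the get-or-create path grows the map, so read-only queries cannot inflate the cluster size.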

public long getNodesUpdatedTime()

public boolean onNodesUpdated(List<org.apache.hadoop.yarn.api.records.NodeReport> updatedNodes)
Parameters: updatedNodes - list of updated nodes

public void onContainerReleaseSubmitted(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - container submitted

public boolean onReleaseCompleted(org.apache.hadoop.yarn.api.records.Container container)
Parameters: container - completed container

public boolean onFailedContainer(org.apache.hadoop.yarn.api.records.Container container, boolean shortLived, ContainerOutcome outcome)
Parameters: container - completed container; shortLived - was the container short-lived?; outcome

protected boolean markContainerFinished(org.apache.hadoop.yarn.api.records.Container container, boolean wasReleased, boolean shortLived, ContainerOutcome outcome)
The history is touch()-ed.
Parameters: container - completed container; wasReleased - was the container released?; shortLived - was the container short-lived?; outcome

public void dump()
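
The distinction drawn by markContainerFinished(), where a released container is treated differently from one that failed, together with resetFailedRecently(), can be sketched with per-node counters. This is a hypothetical model: NodeCountersSketch, its counter names and its rules are assumptions, and the shortLived and outcome parameters are ignored.

```java
// Hypothetical per-node counters; not Slider's NodeEntry.
class NodeCountersSketch {
  int active;          // containers currently running on the node
  int releasing;       // containers we asked YARN to release
  int failed;          // lifetime failure count
  int failedRecently;  // failure count since the last reset

  // A container on this node finished. A release we requested is not a failure.
  // Returns true if the finish was counted as a failure.
  synchronized boolean containerFinished(boolean wasReleased) {
    if (active > 0) {
      active--;
    }
    if (wasReleased) {
      if (releasing > 0) {
        releasing--;
      }
      return false;
    }
    failed++;
    failedRecently++;
    return true;
  }

  // Clear the short-term failure count; the lifetime count is kept.
  synchronized void resetFailedRecently() {
    failedRecently = 0;
  }
}
```

Keeping a separate "recent" counter lets placement logic forgive old failures after a periodic reset without losing the lifetime total.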

public Map<CharSequence,Integer> buildMappingForHistoryFile()

public List<NodeInstance> cloneRecentNodeList(int role)
Parameters: role - role index

public List<OutstandingRequest> listPlacedRequests()

public List<OutstandingRequest> listOpenRequests()

public List<AbstractRMOperation> escalateOutstandingRequests()

public List<AbstractRMOperation> cancelOutstandingAARequests()

public List<AbstractRMOperation> cancelRequestsForRole(RoleStatus role, int toCancel)
Parameters: role - role; toCancel - number to cancel

Copyright © 2014–2015 The Apache Software Foundation. All rights reserved.