I have an HDFS server that I am currently streaming data to.
I also regularly run a command of the following type against this server to check for certain conditions: hdfs dfs -find /user/cdh/streameddata/ -name *_processed
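For reference, a minimal sketch of that periodic check wrapped in a shell function. The function name `find_processed` and the `HDFS_BIN` override are hypothetical and only there for illustration. The pattern is quoted here, unlike in the command above, so that the local shell cannot expand it before `hdfs` sees it. Whether that has any bearing on the CPU usage is not established.

```shell
# Hypothetical wrapper around the check described above.
# HDFS_BIN is an assumed override for the hdfs executable (defaults to "hdfs").
find_processed() {
    # $1: HDFS directory to search, e.g. /user/cdh/streameddata/
    # The single quotes keep the local shell from globbing the pattern.
    "${HDFS_BIN:-hdfs}" dfs -find "$1" -name '*_processed'
}
```

It would be called as `find_processed /user/cdh/streameddata/`.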
However, I have started to see this command taking up a massive portion of my CPU when monitoring in top:
cdh 16919 1 99 13:03 ? 00:43:45 /opt/jdk/bin/java -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop -Dhadoop.id.str=cdh -Dhadoop.root.logger=ERROR,DRFA -Djava.library.path=/opt/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.fs.FsShell -find /user/cdh/streameddata/ -name *_processed
This is causing other applications to stall and is having a massive impact on my application as a whole.
My server has 48 cores, so I did not expect this to be an issue.
Currently, I have not configured any additional heap in Hadoop, so it is using the 1000 MB default.