We have multiple big data jobs. Some of these big data jobs create multiple folders in /tmp/.
These files and folders are created by hive/spark jobs.
The folder names end in "_resources", so they match the pattern "*_resources".
Our platform runs into problems if these folders and files are not deleted.
We want to delete them to avoid those problems.
How can we do this with either Python or a shell script? A shell script would be the better option for us.
Thank you.
CodePudding user response:
You can use the find command:
find /tmp -mtime +30 -user username -exec rm -r {} \;
The arguments are:
/tmp: the path of the parent directory to search.
+30: the age threshold in days. For example, -mtime +5 matches everything last modified more than 5 days ago (find counts whole 24-hour periods, so in practice at least 6 days). A bare 5, without the plus sign, matches only files modified exactly 5 days ago. If you need a threshold in hours, see -mmin below.
username: the name of the user who owns the files to delete.
You can also use -group instead of -user if all of those users belong to one group.
I suggest printing the paths the command finds before doing the actual delete, just to make sure they are what you want:
find /tmp -mtime +30 -user username
If you need a threshold in hours rather than days, use -mmin instead of -mtime. It takes minutes instead of days, e.g.:
find /tmp -mmin +120 -user username -exec rm -r {} \;
This command deletes anything owned by user username that was last modified more than 120 minutes (2 hours) ago.
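The difference between N, +N and -N is easy to get wrong, so here is a minimal sketch that shows it in a throwaway sandbox. It assumes GNU find and GNU touch; the directory names and the make_dir helper are made up for illustration, and nothing outside the sandbox is touched.

```shell
# Sketch: how find interprets 30, +30 and -30 for -mtime.
set -eu

scratch=$(mktemp -d)             # sandbox, stands in for /tmp
trap 'rm -rf "$scratch"' EXIT    # remove the sandbox when the script ends

now=$(date +%s)

# make_dir NAME DAYS: hypothetical helper that creates a directory whose
# modification time is DAYS days plus one hour in the past.
make_dir() {
  mkdir "$scratch/$1"
  touch -d "@$(( now - $2 * 86400 - 3600 ))" "$scratch/$1"
}

make_dir old_resources 40
make_dir exact_resources 30
make_dir new_resources 2

# Look only at the direct children of the sandbox and print their base names.
exact=$(find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime 30 -printf '%f\n')
older=$(find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime +30 -printf '%f\n')
newer=$(find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime -30 -printf '%f\n')

echo "-mtime 30  matches: $exact"   # exact_resources
echo "-mtime +30 matches: $older"   # old_resources
echo "-mtime -30 matches: $newer"   # new_resources
```

Only the +30 form selects the directory that is older than 30 days, which is the one a cleanup job wants.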
CodePudding user response:
I suggest dry-running the find command and checking its output before executing irreversible deletions. Test it a few times to gain confidence, then automate it by scheduling it with cron.
find /tmp -mtime +30 -user username -type d -name "*_resources" -printf "rm -rf %p\n"
When you are ready, and find has located the correct directories and constructed the correct rm commands, execute all of the constructed rm commands at once with:
bash <<< "$(find /tmp -mtime +30 -user username -type d -name "*_resources" -printf "rm -rf %p\n")"
Note that the shell splits the generated commands on whitespace, so this approach assumes the paths contain no spaces or shell metacharacters.
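One possible way to combine the dry run and the real deletion in a single script is sketched below. It assumes GNU find. The function name clean_resources and its arguments are hypothetical, and it also assumes the "*_resources" folders sit directly under the scanned directory. By default it only prints the matches. Deletion happens only when --delete is passed, and find hands the paths to rm directly, so no shell re-parses them.

```shell
# Sketch: dry run by default, delete only on request.
# clean_resources DIR DAYS [--delete]   (hypothetical helper)
#   DIR   directory to scan, for example /tmp
#   DAYS  directories last modified more than DAYS days ago are selected
clean_resources() {
  if [ "${3:-}" = "--delete" ]; then
    # -exec ... {} + passes the matched paths to rm as separate arguments,
    # so names with spaces are handled correctly.
    find "$1" -mindepth 1 -maxdepth 1 -type d -name '*_resources' \
         -mtime "+$2" -exec rm -rf -- {} +
  else
    # Dry run: list what would be removed.
    find "$1" -mindepth 1 -maxdepth 1 -type d -name '*_resources' \
         -mtime "+$2" -print
  fi
}
```

Calling clean_resources /tmp 30 lists the candidates, and clean_resources /tmp 30 --delete removes them. A -user username test can be added to both find calls if the cleanup should be limited to one owner. Once the output looks right, a crontab entry such as "0 3 * * * /path/to/cleanup.sh" (a placeholder path) would run the script daily at 03:00.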
CodePudding user response:
man find
-daystart Measure times (for -amin, -atime, -cmin, -ctime, -mmin, and -mtime) from the beginning of today rather than from 24 hours ago. This option only affects tests which appear later on the command line.
-mmin n: the file's data was last modified n minutes ago (+n means more than n minutes ago, -n means less than n minutes ago).
hours=2
find /tmp -type d -name "*_resources" -mmin +$((60*hours)) -user username -print0 | xargs -0 -r rm -rf --
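To check that the -print0 and xargs -0 pipeline copes with awkward names, the sketch below runs it in a sandbox instead of /tmp. It assumes GNU find, xargs and touch; the directory names are invented for the test, and the -user test is left out so that it runs for any account.

```shell
# Sketch: hour-based cleanup that tolerates spaces in directory names.
set -eu

hours=2
sandbox=$(mktemp -d)             # stands in for /tmp
trap 'rm -rf "$sandbox"' EXIT    # remove the sandbox when the script ends

mkdir "$sandbox/job 1_resources" "$sandbox/job2_resources"
# Backdate the first directory by three hours; the second one stays fresh.
touch -d "@$(( $(date +%s) - 3 * 3600 ))" "$sandbox/job 1_resources"

# +N selects entries modified more than N minutes ago. -print0 and -0 pass
# each path NUL-terminated, so the embedded space is not split.
# -r stops xargs from running rm when nothing matches.
find "$sandbox" -mindepth 1 -maxdepth 1 -type d -name '*_resources' \
     -mmin "+$(( 60 * hours ))" -print0 | xargs -0 -r rm -rf --

remaining=$(ls "$sandbox")
echo "$remaining"                # job2_resources
```

Only the directory older than two hours is removed, even though its name contains a space.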