I have a folder lets say "Word Assignment 1". It contains 50 files each belonging to different student in my class. They are pdf files. They are named as xxxxxxxxxxx-name of the student-PHASE 1-MS WORD-ASSIGNMENT 1. The xxxxx represent the register number of the student and name of the file changes for each file. I have a Excel file that has register numbers and their corresponding student name. The names given by each student during submission of the pdfs are different from the required format. I want the filename as mentioned above. I
I am in need of a script either in Python or Bash to rename the files by comparing the register number (which is in the first position of all files) with excel sheet and fetching the name and renaming the file according to the format
I tried to use bash but I have bo idea how to search with the excel sheet available and for different files in the folder.
CodePudding user response:
In the following solution, I've made certain assumptions that you may not satisfy.
- I've supposed the students' IDs are only numeric characters. If that is not the case, please change
df["id"] == int(student_id)
todf["id"] == student_id
- I've assumed the column name where you store the students' IDs is
id
, if that is not the case, please changedf["id"]
todf["your_column_name"]
. - Similarly for the students' names column name, if it is not
name
, please changedf.iloc[id_]["name"]
todf.iloc[id_]["your_column_name"]
- Here, the folder named
Word Assignment 1
is located in the same folder as the script. If that is not the case, please change thepath
variable to the absolute (or relative) path to said folder.
Solution:
import os
import pandas as pd
from typing import List
filename: str = "your_file.xlsx"
path: str = "./Word Assignment 1"
df: pd.DataFrame = pd.DataFrame(pd.read_excel(filename, sheet_name=0))
files: List[str] = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
for file in files:
student_id: str = file.split("-")[0]
id_: int = df.index[df["id"] == int(student_id)].tolist()[0]
name: str = df.iloc[id_]["name"]
os.rename(os.path.join(path, file), os.path.join(path, f"{student_id}-{name}-PHASE 1-MS WORD-ASSIGNEMENT 1.pdf"))
CodePudding user response:
Keep it simple. Save the sheet as a csv - I assume you can create a tabbed file with just ID/Name. If I call it ID.csv -
while read id name; do mv "$id"* "$id-$name-PHASE 1-MS WORD-ASSIGNMENT 1.pdf"; done < ID.csv
This assumes ID is a known and fixed length and none of the files have a following character that can misidentify and/or return multiple entries.
If the name has embedded whitespace, best to avoid including subsequent fields. If that's not an option, make sure your delimiter is distinct, such as a pipe character which doesn't usually show up in names, and set it in front of the read
.
while IFS="|" read id name _ # everything after the second comma goes into _
If you have quotes in the fields it may take some more tweaking...
Just as commentary, I recommend you avoid embedding spaces and such in filenames, but only as a habit to make things like this easier.