Hi Erin,
You're facing multiple issues here: a problem related to torch.distributed.launch
and another concerning data loading and formatting. Let's tackle them one by one.
torch.distributed.launch vs. torchrun
Both torch.distributed.launch and torchrun are used for distributed training, but torchrun is newer and generally simpler to use.
- torch.distributed.launch: It's an older utility. It passes a --local_rank argument to your script, so your script has to parse that argument itself.
- torchrun: Provides a simpler interface and doesn't require manual handling of --local_rank; it exposes the local rank through the LOCAL_RANK environment variable instead.
If you're facing issues with torch.distributed.launch, switching to torchrun might simplify things. The equivalent command would be:
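The exact command depends on your setup, so treat this as a sketch under assumptions: with a script named train.py and two processes per node (the process count is my assumption, not something from your logs), the launch line would look like `torchrun --nproc_per_node=2 train.py`. On the script side, a small helper like the one below can read the local rank no matter which launcher is used. The name `resolve_local_rank` is hypothetical, and the dashed `--local-rank` spelling is included because, as far as I know, newer PyTorch releases pass that form.

```python
import argparse
import os

# Launch lines this sketch is meant to work with (a process count of 2 is an assumption):
#   python -m torch.distributed.launch --nproc_per_node=2 train.py
#   torchrun --nproc_per_node=2 train.py


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank from --local_rank if given, else LOCAL_RANK, else 0."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes the rank as a command-line argument;
    # accept both the underscored and the dashed spelling.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    # parse_known_args ignores any other arguments train.py may define elsewhere
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun sets the LOCAL_RANK environment variable instead
    return int(environ.get("LOCAL_RANK", 0))


if __name__ == "__main__":
    print("local rank:", resolve_local_rank())
```

Falling back to 0 means the same script also runs as a plain single-process job, which is handy for a first test on CPU.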
Dataset Formatting
Your load_catalyst800 function seems to read a CSV file into a Pandas DataFrame, convert it to a NumPy array, and finally convert it to a PyTorch tensor. While this approach does produce a tensor, it's not quite a PyTorch Dataset, which is the object a DataLoader expects when it handles batching and shuffling for you.
To convert your tensor into a Dataset, you can use the TensorDataset class from PyTorch:
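Since I can't see the exact layout of your data, here is a minimal sketch rather than a drop-in fix. The tensor below is a made-up stand-in for what load_catalyst800 returns, and treating the last column as the label is purely an assumption; adjust the slicing to match your CSV.

```python
import torch
from torch.utils.data import TensorDataset

# Stand-in for the tensor returned by load_catalyst800 (shape and values are made up)
data = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Assumption: the last column holds the label, the remaining columns are features
features = data[:, :-1]
labels = data[:, -1]

# TensorDataset pairs the tensors row by row; all must share the same first dimension
dataset = TensorDataset(features, labels)

# Indexing returns one (features, label) pair
x0, y0 = dataset[0]
print(len(dataset), tuple(x0.shape), y0.item())
```

If your data has no labels, TensorDataset also accepts a single tensor, in which case each item is a one-element tuple.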
After this, you can create a DataLoader:
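As a rough sketch, assuming random stand-in data in place of your real tensors, the DataLoader step could look like this. The batch_size and shuffle values are illustrative choices, not recommendations for your model.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 10 samples, 3 features each, binary integer labels
features = torch.randn(10, 3)
labels = torch.randint(0, 2, (10,))
dataset = TensorDataset(features, labels)

# The DataLoader handles batching and shuffling on top of the Dataset
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Each iteration yields one batch of features and the matching labels
batch_sizes = []
for xb, yb in loader:
    batch_sizes.append(xb.shape[0])
    print(tuple(xb.shape), tuple(yb.shape))
```

With 10 samples and a batch size of 4, the last batch is smaller than the others. For multi-process training, a DistributedSampler passed through the sampler argument is typically used in place of shuffle=True so that each process sees a different slice of the data.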
Debugging Steps
- Error Messages: The error logs indicate that local_rank is causing problems. Ensure that your train.py script can accept --local_rank as an argument.
- Data Format: Verify if your data is formatted correctly for PyTorch processing. Is it a classification problem? Are labels included in your data tensor?
- Library Compatibility: Make sure that the PyTorch version you're using is compatible with the GitHub repo you're trying to modify. Library inconsistencies often lead to unexpected errors.
If you found this answer useful, I would sincerely appreciate a positive review as I am new on Wyzant and reviews are incredibly helpful for me.
Thank you,
Benjamin M.
Erin O.
ok so I think the problem was ultimately that I was trying to run using NVIDIA GPUs which require installing the NVIDIA CUDA toolkit. My research indicates that this tool is no longer compatible with Apple computers? I do still need to try running on CPUs. I’m not seeing where I’m able to leave you a review? I might only be able to do that after a lesson? I’ve only seen that prompt after a lesson with a tutor.
11/05/23
Erin O.
Absolutely! Let me take a look at your suggestions and I would be happy to leave a review :)
10/27/23