Welcome back to to MOTW! This week, we'll be continuing from part 1, exploring some more of std::fs. Let's see what Rudgal has been up to!

The module

std::fs is the standard Rust module for interacting with a Filesystem. It handles a lot of the messy work when it comes to all the different ways that the underlying OS likes to handle files. Creating, Reading, Updating, Deleting, it's all there! Not every operation is supported by every OS, but this guide will try to remain as platform agnostic as possible.

The story so far

Rudgal The Delirious is the local Ornithologist. Armed with his trusty camera he's been tasked by the local Government to catalog and identify all of the birds in his local area. As he has to identify not only the species but the individual birds he's ended up with quite a few photos!

Last week Rudgal had laid the groundwork for his photo-reoganizing project. Files were filtered, directories iterated and recursed. But not that kind of curse. Yet.

Grouping files by modification time

Rudgal had decided that the most optimal and efficient way to sort these photos was to group them by the year, month and day that they were taken. After perusing the metadata docs the closest thing he could find was the modified function, which returns a SystemTime struct.

After consulting Suwyss the Mathematical's Guide to Time Magic, he was able to put together a pretty decent function to convert from SystemTime to something far more useful.

First! The far more useful part.

// Let Rust derive all the things I need for BTreeMap
#[derive(PartialEq, Eq, PartialOrd, Ord, Debug)]
struct RudgalDate {
    year: u64,
    month: u64,
    day: u64
}

Meet RudgalDate, the totally original and useful structure for grouping files together. Note especially the derive line. PartialEq, Eq, PartialOrd, Ord are useful when you want to use a struct as a key in order to sort some things. Say, in something that could group things together by said key. More on that later! First, Rudgal needs to convert from SystemTime to his RudgalDate:

Warning

In Rudgal's world there are no such things as leap days, leap seconds, or timezones. They do exist in our world so taking this approach is not recommended. Use the chrono crate for real-world scenarios.

const YEAR_SECONDS: u64 = 31556926;
const MONTH_SECONDS: u64 = 2629743;
const DAY_SECONDS: u64 = 86400;

fn systime_to_tuple(t: SystemTime) -> RudgalDate {
    // This is an approximate, do not use in any sort of production
    let secs = t.duration_since(SystemTime::UNIX_EPOCH).unwrap().as_secs();
    let years = secs / YEAR_SECONDS;
    let months = (secs % YEAR_SECONDS) / MONTH_SECONDS;
    let days = ((secs % YEAR_SECONDS) % MONTH_SECONDS) / DAY_SECONDS;
    RudgalDate {
        year:1970+years, 
        month: 1+months, 
        day: 1+days
    }
}

Good enough for me!

Said Rudgal, without having written any unit tests.

Now that he had his RudgalDate, it was time to group the files together by said RudgalDates. After poking around std::collections he came across BTreeMap. Digging deep into his dusty memories of Structures and Algorithms, he was able to remember that BTreeMaps act like regular Maps, but can keep the Keys in order. Nifty when you're dealing with time!

fn group_files_by_date(paths: Vec<PathBuf>) -> 
    io::Result<BTreeMap<RudgalDate, HashSet<PathBuf>>>
{
    let mut result = BTreeMap::new();
    paths.clone().into_iter()
        .for_each(|p| {
            let modified = fs::metadata(&p).and_then(|m| m.modified()).unwrap();
            let map = result.entry(systime_to_tuple(modified))
                .or_insert(HashSet::new());
            map.insert(p);
        });
    Ok(result)
}

Rudgal sat back in his sustainably-sourced Vegan leather office throne, walking through the code one line at a time to make sure he understood what he had written. He was particularly fond of entry.

Most excellent! It acts like a pointer to a bucket, even if there's nothing in it yet!

// Rudgal had decided a constant path isn't very flexible
let image_source = "./images";
// See the last part of std::fs for iter_dirs
let files = iter_dirs(&Path::new(image_source))?;
let files_count = files.len();
println!("There are {} images!", files_count);

let grouped = group_files_by_date(files)?;
let files_per_key = grouped.iter()
        .map(|(k, v)| (k, v.len()))
        .collect::<Vec<_>>();
println!("Some dates from the files are {:?}...", &files_per_key[0..3]);

Considering this a good enough test to start with, Rudgal runs the code.

There are 10245 images!
Some dates from the files are [(RudgalDate { year: 2015, month: 4, day: 3 }, 84), (RudgalDate { year: 2015, month: 7, day: 30 }, 85), (RudgalDate { year: 2016, month: 1, day: 9 }, 76)]...

Success!

Hard linking one path to another

It's hard for Rudgal to shake his Bureaucratic training. The years studying the tenets of Efficiency with the venerable Monks of Productivity had left their mark.

I can't just have duplicate photos flying around! That would cost me valuable disk space.

Gasp, Rudgal. Gasp.

After a thorough several minutes worth of research, he comes across std::fs::hard_link. Files in two places?! This is olde Filesystem magic, but it would do. At least while he's testing.

fn link_files(link_root: &Path, files: &BTreeMap<RudgalDate, HashSet<PathBuf>>)
    -> io::Result<()>
{
    // More to add here later

    let mut link_root = link_root.to_path_buf();
    for (date, date_files) in files {
        link_root.push(format!("{}/{}/{}", date.year, date.month, date.day));
        for file in date_files {
            link_root.push(file.file_name().and_then(|f|f.to_str()).unwrap());
            fs::hard_link(file, &link_root)?;
            link_root.pop();
        }
        // Because we add three levels (year, month and day),
        // we have to remove three levels as well!
        link_root.pop();
        link_root.pop();
        link_root.pop();
    }

    Ok(())
}

Running his new code with the highest expectation of excellence, Rudgal soon had his mood crushed like the skulls of so many enemies.

Error: Os { code: 2, kind: NotFound, message: "No such file or directory" }

No such file or directory?! Impossible!

Easy there, champ. You can fix that.

Creating directories recursively

Of course! He simply forgot to create the directories before trying to link the file into that directory. A simple modification to his code (and a touch of create_dir_all) gave him the fix he needed!

for file in date_files {
    // Ensure the destination exists already!
    fs::create_dir_all(&link_root)?;

    link_root.push(file.file_name().and_then(|f|f.to_str()).unwrap());
    fs::hard_link(file, &link_root)?;
    link_root.pop();
}

Perfect! Nothing could possibly go wrong this time.

Error: Os { code: 17, kind: AlreadyExists, message: "File exists" }

Oops. Well, at least there's an easy fix to this.

Deleting a folder recursively

Rudgal had been running his code repeatedly as he was testing, but it had never occurred to him that he wouldn't be able to have two files occupy the same place.

Burn it! Raze the directories to the ground!

Well, electronically speaking. Another modification!

...
if link_root.exists() {
    println!("Removing directory {:?} and everything below it", link_root);
    fs::remove_dir_all(&link_root)?; 
}

let mut link_root = link_root.to_path_buf();
...

And with a little code to run it:

println!("Linking the pictures");
link_files(&image_link_dest, &grouped)?;

let link_count = iter_dirs(&image_link_dest)?.len();
println!("{} images linked", link_count);
Linking the pictures
Removing directory "./images_links" and everything below it
10245 images linked

Success! Rudgal is now free to explore the folders and ensure that the images were sorted as he desired. Once he was sure, it was time to make the sorting permanent.

Copying a file

Now that he was sure he was pleased with the future layout of his photos, Rudgal decided to use precious bytes to copy the files into their new layout. After a few minutes of toying with std::fs::copy, he realized that his code looked very similar to his approach to linking the file!

fn copy_files(copy_root: &Path, files: &BTreeMap<RudgalDate, HashSet<PathBuf>>)
    -> io::Result<()>
{
    if copy_root.exists() {
        println!("Removing directory {:?} and everything below it", copy_root);
        fs::remove_dir_all(&copy_root)?;
    }

    let mut copy_root = copy_root.to_path_buf();
    for (date, date_files) in files {
        copy_root.push(format!("{}/{}/{}", date.year, date.month, date.day));
        for file in date_files {
            // Ensure the destination exists already!
            fs::create_dir_all(&copy_root)?;
            copy_root.push(file.file_name().and_then(|f|f.to_str()).unwrap());

            // Copy from the original to the newly-sorted path
            fs::copy(file, &copy_root)?;

            copy_root.pop();
        }
        copy_root.pop();
        copy_root.pop();
        copy_root.pop();
    }

    Ok(())
}

Having learned his lesson about creating directories and deleting files before running his code again, he was able to skip right to copying the files. Let's see if he was able to make it work!

println!("Copying the pictures");
copy_files(&image_dest, &grouped)?;
let copied_count = iter_dirs(&image_dest)?.len();
println!("{} images copied", copied_count);
Copying the pictures
Removing directory "./images_sorted" and everything below it
10245 images copied

Perfect!

Conclusion

This is end of part 2 of std::fs! In this part we covered working with file creation times, directory manipulation, hardlinking files and copying files to new locations. If you'd like to see Rudgal's code for this part, it's posted here.

Next week

Stay tuned next week for the conclusion of std::fs:

File. Writing.