How much do you nitpick other people's code?

johnnylarner

johnnylarner

Posted on September 28, 2023

How much do you nitpick other people's code?

I recently watched a video posted by Prime Reacts based on this blog by Dan Lew on why nitpicking colleagues' code is bad. For those not familiar with the term, Dan Lew defines it as flagging small, insignificant problems in a code review that are not wrong but suboptimal. He also argues that fixing the "problems" will make you, the grouchy reviewer, feel smug but won't benefit your code base or your team's opinion of you.

Watching this video made me feel a petty criminal listening to a convict showing genuine remorse for a crime that I too had committed. Fortunately for me, I haven't been coding long enough to piss off hundreds of colleagues yet. But to those of you whose code I nitpicked - I apologise.

Upon reflection I realised that my previous career paths before software engineering had baked into me a culture of nitpicking. First as an editor, my task was to locate and remove language inconsistencies as well as create a coherent tone across an article. Then, as a compliance analyst I was tasked with nitpicking colleagues' work for typos, document dates and other trivialities that make you question the point of life.

But from here on out I'm a changed man. For the reasons clearly laid out in Dan's blog, I am turning a new leaf and accepting that code perfection does not make code better!

Now it's your turn

The reason you probably opened this blog was because you sought an answer to the question in the title. I've collected 4 code snippets and for each of them provided one or multiple issues. For relative consistency, there are no possible other issues (though please leave a comment if you want). Select things you would seriously considering leaving as feedback in a code review. At the end the blog I'll leave my two scores: my old self versus my new self. See how you fare.

Quiz

Question 1

def add_features(trip_df: DataFrame, zone_df: DataFrame) -> DataFrame:
    """Returns a pandas DataFrame containing
    all the features derived from NYC
    Taxi dataset to answer all business questions.
    """
    # Rename columns
    trip_df = rename_columns_as_lowercase(trip_df)
    zone_df = rename_columns_as_lowercase(zone_df)
    trip_df = update_payment_type_as_string_values(trip_df)

    # Add features
    trip_df = add_borough_and_zone(trip_df, zone_df, "pulocationid")
    trip_df = add_borough_and_zone(trip_df, zone_df, "dolocationid")
    return trip_df
Enter fullscreen mode Exit fullscreen mode

Possible issues

  • Duplicated function calls can be refactored into loops
  • Comments are redundant or misleading
  • Function name is not descriptive

Question 2

def build_payload(
    train_type: str, 
    train_age: int, 
    train_length: float, 
    train_is_diesel: bool
) -> dict[str, str]:

    input_parameters = [
        train_type,
        train_age,
        train_length,
        train_is_diesel,
    ]

    payload_names = [
        "train_type",
        "train_age",
        "train_length",
        "train_is_diesel",
    ]
    payload = {}

    # Set payload defaults
    current_time = datetime.datetime.now()
    source = "pipeline"

    for i, p_name in enumerate(payload_names):
        if p_name == "train_is_diesel":
            if train_is_diesel:
                payload["engine_type"] = "diesel"

        else:
            payload[p_name] = input_parameters[i]

    return payload
Enter fullscreen mode Exit fullscreen mode

Possible issues

  • Declare default values as global values for transparency
  • Zipping is a more intuitive way to iterate over the lists, so replace enumerate with zip
  • Refactoring the boolean function argument would make the payload mapping static
  • Current time makes the function indeterministic, it should be parameterised

Question 3

def write_df(df: DataFrame) -> None:
    # Select columns
    small_df = df.select("name", "age", "dob", "gender")

    # Cache df
    df.cache()

    df.write.csv("full_data.csv", df)
    small_df.write.csv("subset_data.csv", df)
Enter fullscreen mode Exit fullscreen mode

Possible issues

  • This function should do something generic but lack of parameterisation makes it not possible to reuse
  • Small df isn't an expressive name
  • Comments don't give us useful information

Question 4

class DataRowExtractor:
    def __init__(self):
        pass

    def extract_rows(self, data) -> dict[str, Any]:
        extracted_rows = []
        for i, row in data.iterrows()
            extracted_data.append(row)
        return extracted_rows

def main():
    csv_path = os.path.join("/working_dir", "data", "table_data.csv")
    write_path = os.path.join("/working_dir", "data", "row_data.json")

    data = pd.read_csv(csv_path)
    data_extractor = DataRowExtractor()

    row_data = data_extractor.extract_rows(data)
    with open(write_path) as f:
        json_data = f"{row_data}"
        f.write(row_data)

Enter fullscreen mode Exit fullscreen mode

Possible issues

  • OS Path module is verbose and should be replaced with the new pathlib Path API
  • The RowExtractor doesn't add any real abstraction and overcomplicates the code
  • Pandas has a built in method to create records from a dataframe
  • Data is not a descriptive name

Have I really changed?

As promised, I'd give you my scores:

question old_me new_me
1 1 1
2 3 1
3 2 1
4 4 2
Total 10 5

Conclusion

Now obviously I rigged the results to make my improvement look and feel real. Jokes aside, time (and colleagues) will tell whether I really improve at not nitpicking!

One instructive aspect I tried to get across in the examples is that - perhaps in a production environment - the bugs in question 2 and 4 (good work if you found them) may have gone unseen by a reviewer focussed on scrutinising irrelevant banalities.

This is particularly relevant for question two: if we focus on reviewing the design of the function as a whole, we see that by replacing a boolean argument with a string, we can simply return a dictionary or maybe even just drop the function all together. Simple design is less bug prone, and this case we'd avoid the bug in our nested if statement.

Now we can see concretely why nitpicking is dangerous. But what if (like me) - despite having only picked out critical issues in a code review - you still found a bunch of nits want to pick ( I'm thinking here about code simplification through better library knowledge i.e. with extract_rows) ? I spoke to a family friend with 25> years experience coding about this and he suggested offering your nits in a non-committal to your colleague (I found nits, if you want to discuss them, we can). This makes the feedback loop something your colleague controls and could turn nits into valuable insights.

💖 💪 🙅 🚩
johnnylarner
johnnylarner

Posted on September 28, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related