How much do you nitpick other people's code?

I recently watched a video posted by Prime Reacts based on this blog by Dan Lew on why nitpicking colleagues' code is bad. For those not familiar with the term, Dan Lew defines it as flagging small, insignificant problems in a code review that are not wrong but suboptimal. He also argues that fixing the "problems" will make you, the grouchy reviewer, feel smug but won't benefit your code base or your team's opinion of you.

Watching this video made me feel a petty criminal listening to a convict showing genuine remorse for a crime that I too had committed. Fortunately for me, I haven't been coding long enough to piss off hundreds of colleagues yet. But to those of you whose code I nitpicked - I apologise.

Upon reflection I realised that my previous career paths before software engineering had baked into me a culture of nitpicking. First as an editor, my task was to locate and remove language inconsistencies as well as create a coherent tone across an article. Then, as a compliance analyst I was tasked with nitpicking colleagues' work for typos, document dates and other trivialities that make you question the point of life.

But from here on out I'm a changed man. For the reasons clearly laid out in Dan's blog, I am turning a new leaf and accepting that code perfection does not make code better!

Now it's your turn

The reason you probably opened this blog was because you sought an answer to the question in the title. I've collected 4 code snippets and for each of them provided one or multiple issues. For relative consistency, there are no possible other issues (though please leave a comment if you want). Select things you would seriously considering leaving as feedback in a code review. At the end the blog I'll leave my two scores: my old self versus my new self. See how you fare.

Quiz

Question 1

def add_features(trip_df: DataFrame, zone_df: DataFrame) -> DataFrame:
    """Returns a pandas DataFrame containing
    all the features derived from NYC
    Taxi dataset to answer all business questions.
    """
    # Rename columns
    trip_df = rename_columns_as_lowercase(trip_df)
    zone_df = rename_columns_as_lowercase(zone_df)
    trip_df = update_payment_type_as_string_values(trip_df)

    # Add features
    trip_df = add_borough_and_zone(trip_df, zone_df, "pulocationid")
    trip_df = add_borough_and_zone(trip_df, zone_df, "dolocationid")
    return trip_df

Possible issues

Duplicated function calls can be refactored into loops
Comments are redundant or misleading
Function name is not descriptive

Question 2

def build_payload(
    train_type: str, 
    train_age: int, 
    train_length: float, 
    train_is_diesel: bool
) -> dict[str, str]:

    input_parameters = [
        train_type,
        train_age,
        train_length,
        train_is_diesel,
    ]

    payload_names = [
        "train_type",
        "train_age",
        "train_length",
        "train_is_diesel",
    ]
    payload = {}

    # Set payload defaults
    current_time = datetime.datetime.now()
    source = "pipeline"

    for i, p_name in enumerate(payload_names):
        if p_name == "train_is_diesel":
            if train_is_diesel:
                payload["engine_type"] = "diesel"

        else:
            payload[p_name] = input_parameters[i]

    return payload

Possible issues

Declare default values as global values for transparency
Zipping is a more intuitive way to iterate over the lists, so replace enumerate with zip
Refactoring the boolean function argument would make the payload mapping static
Current time makes the function indeterministic, it should be parameterised

Question 3

def write_df(df: DataFrame) -> None:
    # Select columns
    small_df = df.select("name", "age", "dob", "gender")

    # Cache df
    df.cache()

    df.write.csv("full_data.csv", df)
    small_df.write.csv("subset_data.csv", df)

Possible issues

This function should do something generic but lack of parameterisation makes it not possible to reuse
Small df isn't an expressive name
Comments don't give us useful information

Question 4

class DataRowExtractor:
    def __init__(self):
        pass

    def extract_rows(self, data) -> dict[str, Any]:
        extracted_rows = []
        for i, row in data.iterrows()
            extracted_data.append(row)
        return extracted_rows

def main():
    csv_path = os.path.join("/working_dir", "data", "table_data.csv")
    write_path = os.path.join("/working_dir", "data", "row_data.json")

    data = pd.read_csv(csv_path)
    data_extractor = DataRowExtractor()

    row_data = data_extractor.extract_rows(data)
    with open(write_path) as f:
        json_data = f"{row_data}"
        f.write(row_data)

Possible issues

OS Path module is verbose and should be replaced with the new pathlib Path API
The RowExtractor doesn't add any real abstraction and overcomplicates the code
Pandas has a built in method to create records from a dataframe
Data is not a descriptive name

Have I really changed?

As promised, I'd give you my scores:

question	old_me	new_me
1	1	1
2	3	1
3	2	1
4	4	2
Total	10	5

Conclusion

Now obviously I rigged the results to make my improvement look and feel real. Jokes aside, time (and colleagues) will tell whether I really improve at not nitpicking!

One instructive aspect I tried to get across in the examples is that - perhaps in a production environment - the bugs in question 2 and 4 (good work if you found them) may have gone unseen by a reviewer focussed on scrutinising irrelevant banalities.

This is particularly relevant for question two: if we focus on reviewing the design of the function as a whole, we see that by replacing a boolean argument with a string, we can simply return a dictionary or maybe even just drop the function all together. Simple design is less bug prone, and this case we'd avoid the bug in our nested if statement.

Now we can see concretely why nitpicking is dangerous. But what if (like me) - despite having only picked out critical issues in a code review - you still found a bunch of nits want to pick ( I'm thinking here about code simplification through better library knowledge i.e. with extract_rows) ? I spoke to a family friend with 25> years experience coding about this and he suggested offering your nits in a non-committal to your colleague (I found nits, if you want to discuss them, we can). This makes the feedback loop something your colleague controls and could turn nits into valuable insights.