Filtering data frame through command line

2 years ago

#74183

Tom Lehrte

I am currently writing a program to analyze data from an external file. I have already implemented some basic functions. Now I want to filter the data frame (by carrier, origin, and date). I know how to use loc and similar functions. However, I am trying to pass the filter values through the command line (e.g. python flights.py --origin STR max distance flights.tsv), so I cannot hard-code the values in the program. Does anyone know how I could extend my basic functions to make that work?

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("statistic", choices=["avg", "max"], help="Which statistic should be run?")
parser.add_argument("variable", choices=["distance", "delay"], help="What variable should be used for the calculation?")
parser.add_argument("tsvfile", help="Name of data file to be analyzed")
parser.add_argument("--carrier", dest="carrier", help="Comma-separated list of airline codes for those airlines whose flights should be included")
parser.add_argument("--date", dest="date", help="Departure dates for flights to be included")
parser.add_argument("--origin", dest="origin", help="Origin airports for flights to be included")
args = parser.parse_args()

import pandas as pd
df = pd.read_csv(args.tsvfile, sep="\t")  # read the file named on the command line
df["ARRIVAL_DELAY"] = df["DEPARTURE_DELAY"] + df["ACTUAL_DURATION"] - df["PLANNED_DURATION"]
df2 = df.loc[df.ARRIVAL_DELAY > 0, :]

s = args.statistic
v = args.variable
t = args.tsvfile


if s == "avg" and v == "distance" and t == "flights.tsv":
    distance=(df["DISTANCE"].mean())
    print(round(distance,1))
elif s == "avg" and v == "delay" and t == "flights.tsv":
    delay = (df2['ARRIVAL_DELAY'].sum() / 280384)
    print(round(delay,1))   
elif s == "max" and v == "delay":
    print(df["DEPARTURE_DELAY"].max())
elif s == "max" and v == "distance" and t == "flights.tsv":
    print(df["DISTANCE"].max())
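
One possible direction, sketched here under assumptions rather than as a definitive solution: since argparse leaves an optional argument as None when it is not given, each filter can be applied only when its value is present, and the statistics can then be computed on the filtered frame. The helper names build_parser and apply_filters are hypothetical, and the column names CARRIER, ORIGIN and DATE are guesses, because the question does not show the file's header.

```python
import argparse

import pandas as pd


def build_parser():
    """Same arguments as in the question, collected in one function."""
    parser = argparse.ArgumentParser()
    parser.add_argument("statistic", choices=["avg", "max"])
    parser.add_argument("variable", choices=["distance", "delay"])
    parser.add_argument("tsvfile")
    parser.add_argument("--carrier", help="comma-separated airline codes")
    parser.add_argument("--origin", help="comma-separated origin airport codes")
    parser.add_argument("--date", help="departure date")
    return parser


def apply_filters(df, carrier=None, origin=None, date=None):
    """Keep only the rows that match every filter that was actually given."""
    # ASSUMPTION: the columns are called CARRIER, ORIGIN and DATE;
    # adjust these names to the real header of the TSV file.
    mask = pd.Series(True, index=df.index)  # start by keeping every row
    if carrier is not None:
        mask &= df["CARRIER"].isin(carrier.split(","))
    if origin is not None:
        mask &= df["ORIGIN"].isin(origin.split(","))
    if date is not None:
        # Compare as text so the command-line string matches the stored value.
        mask &= df["DATE"].astype(str) == date
    return df.loc[mask]
```

In the script this would be called once after loading the data, for example df = apply_filters(df, args.carrier, args.origin, args.date), and the existing avg/max branches would then run on the filtered frame. Note that a hard-coded row count such as 280384 would no longer match once rows are filtered out.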

python

pandas

dataframe

command-line-arguments

0 Answers
