2 years ago
#74183
Tom Lehrte
Filtering a data frame through command-line arguments
I am currently writing a program to analyze data from an external file and have already implemented some basic functions. Now I want to filter the data frame by carrier, origin, and date. I know how to filter with .loc and similar functions, but I want the filter values to come from the command line (e.g. python flights.py --origin STR max distance flights.tsv), so I cannot hard-code them in the program. Does anyone know how I could extend my basic functions to make that work? Here is my current code:
```python
import argparse

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("statistic", choices=["avg", "max"], help="Which statistic should be run?")
parser.add_argument("variable", choices=["distance", "delay"], help="What variable should be used for the calculation?")
parser.add_argument("tsvfile", help="Name of data file to be analyzed")
parser.add_argument("--carrier", dest="carrier", help="Comma-separated list of airline codes for those airlines whose flights should be included")
parser.add_argument("--date", dest="date", help="Departure dates for flights to be included")
parser.add_argument("--origin", dest="origin", help="Departure airport for flights to be included")
args = parser.parse_args()

# Read whichever file was named on the command line instead of a fixed name.
df = pd.read_csv(args.tsvfile, sep="\t")
df["ARRIVAL_DELAY"] = df["DEPARTURE_DELAY"] + df["ACTUAL_DURATION"] - df["PLANNED_DURATION"]
df2 = df.loc[df.ARRIVAL_DELAY > 0, :]  # flights that arrived late

s = args.statistic
v = args.variable

if s == "avg" and v == "distance":
    distance = df["DISTANCE"].mean()
    print(round(distance, 1))
elif s == "avg" and v == "delay":
    # Sum of positive delays over all flights; len(df) replaces the hard-coded
    # 280384 (assumed to be the row count) so the average stays correct after filtering.
    delay = df2["ARRIVAL_DELAY"].sum() / len(df)
    print(round(delay, 1))
elif s == "max" and v == "delay":
    print(df["DEPARTURE_DELAY"].max())
elif s == "max" and v == "distance":
    print(df["DISTANCE"].max())
```
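The command-line side can be checked on its own, without a data file. A minimal sketch, assuming the same arguments as the question: an optional flag such as --origin ends up as an attribute of the parsed namespace (args.origin) and is None when it is omitted, so each filter can simply be skipped when its value is None. split_codes is a hypothetical helper for the comma-separated --carrier value, and the airline codes are example values only.

```python
import argparse


def build_parser():
    # Same positionals and options as in the question (help texts shortened).
    parser = argparse.ArgumentParser()
    parser.add_argument("statistic", choices=["avg", "max"])
    parser.add_argument("variable", choices=["distance", "delay"])
    parser.add_argument("tsvfile")
    parser.add_argument("--carrier", help="Comma-separated airline codes")
    parser.add_argument("--date", help="Departure date")
    parser.add_argument("--origin", help="Departure airport code")
    return parser


def split_codes(value):
    # Hypothetical helper: "LH,EW" -> ["LH", "EW"]; None (option omitted) stays None.
    if value is None:
        return None
    return [code.strip() for code in value.split(",") if code.strip()]


parser = build_parser()

# Passing a list instead of reading sys.argv makes the example runnable as-is;
# options may appear before or after the positional arguments.
args = parser.parse_args(["--origin", "STR", "max", "distance", "flights.tsv"])
print(args.origin, args.carrier, args.date)  # STR None None

args2 = parser.parse_args(["avg", "delay", "flights.tsv", "--carrier", "LH,EW"])
print(split_codes(args2.carrier))  # ['LH', 'EW']
```

In the real script, parser.parse_args() with no argument reads sys.argv as before; the explicit list is only there to make the example self-contained.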
Tags: python, pandas, dataframe, command-line-arguments
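On the pandas side, one way to make every filter optional is to start from a boolean mask that keeps all rows and combine only the conditions whose option was actually given. A minimal sketch, assuming the file has columns named CARRIER, ORIGIN and DATE (hypothetical names; the question does not show the real headers) and using a few made-up rows in place of flights.tsv:

```python
import pandas as pd


def apply_filters(df, carriers=None, origin=None, date=None):
    """Keep only the rows that match the filters that were actually given.

    CARRIER, ORIGIN and DATE are assumed column names; replace them with
    the real headers of flights.tsv.
    """
    mask = pd.Series(True, index=df.index)  # start by keeping every row
    if carriers is not None:
        mask &= df["CARRIER"].isin(carriers)
    if origin is not None:
        mask &= df["ORIGIN"] == origin
    if date is not None:
        # Compared as text here; convert both sides if the column holds real dates.
        mask &= df["DATE"] == date
    return df.loc[mask]


# Made-up sample rows standing in for flights.tsv.
flights = pd.DataFrame({
    "CARRIER": ["LH", "EW", "LH", "FR"],
    "ORIGIN": ["STR", "STR", "FRA", "STR"],
    "DATE": ["2019-01-01", "2019-01-02", "2019-01-01", "2019-01-01"],
    "DISTANCE": [500, 700, 300, 900],
})

# Equivalent of: python flights.py --origin STR max distance flights.tsv
selected = apply_filters(flights, origin="STR")
print(selected["DISTANCE"].max())  # 900

# Equivalent of adding --carrier LH,EW to the same command
print(apply_filters(flights, carriers=["LH", "EW"], origin="STR")["DISTANCE"].max())  # 700
```

In the script, the filter would be applied right after reading the file and before ARRIVAL_DELAY and df2 are computed, e.g. df = apply_filters(df, args.carrier.split(",") if args.carrier else None, args.origin, args.date), so every statistic branch works on the selected flights. Because the mask only includes the options that are present, no separate branch is needed for each combination of filters.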