Problem

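The data needs to be filtered by date, but the date column read into the DataFrame has the object dtype. Its values are compared as strings rather than as dates, which makes date-based sorting and filtering unreliable.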

Solution

Convert the object-typed date column to a datetime type, then filter, sort, and so on.

import pandas as pd
from datetime import datetime, timedelta

# Read the CSV
df = pd.read_csv('test.csv', low_memory=False)
# >>> df
# id quote author date
# 0 656222 Every moment is a fresh beginning. T.S Eliot 2020-11-11 11:22
# 1 346927 Everything you can imagine is real. Pablo Picasso 2020-11-12 12:23
# 2 995443 Whatever you do, do it well. Walt Disney 2020-11-13 13:24
# 3 281277 When words fail, music speaks. Shakespeare 2020-11-14 14:25
# 4 603115 If God did not exist, it would be necessary to... Voltaire 2020-11-15 15:26
# 5 605452 Brevity is the soul of wit. Shakespeare 2020-11-16 16:27
# >>> df.dtypes
# id int64
# quote object
# author object
# date object
# dtype: object


# Convert column types
df['date'] = pd.to_datetime(df['date'])
df[['quote', 'author']] = df[['quote', 'author']].astype(str)
# >>> df.dtypes
# id int64
# quote object
# author object
# date datetime64[ns]
# dtype: object


# Filter
start = datetime.strptime('2020-11-11', '%Y-%m-%d')  # 2020-11-11 00:00
end = datetime.strptime('2020-11-12', '%Y-%m-%d') + timedelta(days=1)  # i.e. 2020-11-13 00:00 (exclusive)
mask = (df['date'] >= start) & (df['date'] < end)  # half-open range [start, end)
df.loc[mask]
# >>> df.loc[mask]
# id quote author date
# 0 656222 Every moment is a fresh beginning. T.S Eliot 2020-11-11 11:22:00
# 1 346927 Everything you can imagine is real. Pablo Picasso 2020-11-12 12:23:00


# Sort
df.sort_values(by=['date'], ignore_index=True)
# >>> df.sort_values(by=['date'], ignore_index=True)
# id quote author date
# 0 656222 Every moment is a fresh beginning. T.S Eliot 2020-11-11 11:22:00
# 1 346927 Everything you can imagine is real. Pablo Picasso 2020-11-12 12:23:00
# 2 995443 Whatever you do, do it well. Walt Disney 2020-11-13 13:24:00
# 3 281277 When words fail, music speaks. Shakespeare 2020-11-14 14:25:00
# 4 603115 If God did not exist, it would be necessary to... Voltaire 2020-11-15 15:26:00
# 5 605452 Brevity is the soul of wit. Shakespeare 2020-11-16 16:27:00
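
The same idea can be sketched with pandas types only. The example below is a minimal illustration, not part of the original script: it uses an in-memory CSV in place of test.csv, one deliberately invalid date value, and the hypothetical names `start` and `end` (all assumptions). The optional `errors='coerce'` argument turns values that cannot be parsed into `NaT` instead of raising an error.

```python
import io

import pandas as pd

# In-memory stand-in for test.csv (assumption: a subset of the sample columns,
# plus one invalid date to show how errors='coerce' behaves).
csv_text = """id,author,date
656222,T.S Eliot,2020-11-11 11:22
346927,Pablo Picasso,2020-11-12 12:23
995443,Walt Disney,2020-11-13 13:24
281277,Shakespeare,not a date
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to datetime; unparseable values become NaT.
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Half-open interval [start, end): every timestamp on 2020-11-11 and 2020-11-12.
start = pd.Timestamp('2020-11-11')
end = pd.Timestamp('2020-11-12') + pd.Timedelta(days=1)
mask = (df['date'] >= start) & (df['date'] < end)
filtered = df.loc[mask]

# Comparisons against NaT are False, so the invalid row is never selected.
print(filtered['id'].tolist())
```

Using `>=` on the lower bound and `<` on the upper bound keeps a row stamped exactly at midnight of the following day out of the result.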

See also

Other

How test.csv was generated 🧐

import pandas as pd

# Data
data = {
    'id': [656222, 346927, 995443, 281277, 603115, 605452],
    'quote': ['Every moment is a fresh beginning.', 'Everything you can imagine is real.', 'Whatever you do, do it well.', 'When words fail, music speaks.', 'If God did not exist, it would be necessary to invent Him.', 'Brevity is the soul of wit.'],
    'author': ['T.S Eliot', 'Pablo Picasso', 'Walt Disney', 'Shakespeare', 'Voltaire', 'Shakespeare'],
    'date': ['2020-11-11 11:22', '2020-11-12 12:23', '2020-11-13 13:24', '2020-11-14 14:25', '2020-11-15 15:26', '2020-11-16 16:27']
}

# Save
df = pd.DataFrame(data, columns=['id', 'quote', 'author', 'date'])
df.to_csv('test.csv', index=False, encoding='utf-8')
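
To see why the conversion step is needed, a quick round-trip check can help. This is a minimal sketch under a few assumptions: it writes a two-row sample to a temporary directory rather than to test.csv, and the variable names `plain` and `parsed` are illustrative. It also shows `parse_dates`, a `read_csv` option that converts the column while the file is being read.

```python
import os
import tempfile

import pandas as pd

# Two-row sample in the same date format as test.csv.
data = {
    'id': [656222, 346927],
    'date': ['2020-11-11 11:22', '2020-11-12 12:23'],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    pd.DataFrame(data).to_csv(path, index=False, encoding='utf-8')

    # Default read: the date column comes back as text.
    plain = pd.read_csv(path)
    # parse_dates converts the listed columns to datetime during the read.
    parsed = pd.read_csv(path, parse_dates=['date'])

plain_is_datetime = pd.api.types.is_datetime64_any_dtype(plain['date'])
parsed_is_datetime = pd.api.types.is_datetime64_any_dtype(parsed['date'])
print(plain_is_datetime, parsed_is_datetime)  # prints: False True
```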