Add some content to txt files using Python

Posted on 2022-07-16, updated on 2022-10-30, filed under Miscellaneous

Using Python to add content to, or read content from, a large number of text files

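During some recent research experiments, the notebook code I had written was not very clean, which made it awkward to set up multiple groups of parameters inside it. So I exported the notebook as a Python file, used a bash script to set the parameters and call the .py in a loop, and finally exported the parameter settings and experiment results to a txt file. The bash script is recorded here for reference: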

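#!/bin/bash
# Note: in a non-interactive script, `conda activate` may need conda's shell hook to be loaded first.
conda activate ch
attacks="a b c d e f g"
a=(0.005 0.01 0.05 0.1 0.2)
# Outer loop: one run per attack; inner loop: one run per value in a
for attack in $attacks
do
    for i in "${a[@]}"
    do
        CUDA_VISIBLE_DEVICES=6 python my_analysis.py "$attack" "$i" --gpu=6
    done
done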
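
The same parameter sweep can also be described in Python, which makes it easy to inspect the commands before anything is launched. The following is only a sketch: the function name build_commands is hypothetical, the values mirror the bash script above, and nothing is actually executed.

```python
import itertools

def build_commands(attacks, rates, gpu=6, script="my_analysis.py"):
    """Return one argument list per (attack, rate) pair, in loop order."""
    commands = []
    for attack, rate in itertools.product(attacks, rates):
        commands.append(["python", script, attack, str(rate), f"--gpu={gpu}"])
    return commands

attacks = "a b c d e f g".split()
rates = [0.005, 0.01, 0.05, 0.1, 0.2]
commands = build_commands(attacks, rates)
print(len(commands))   # 7 attacks x 5 rates
print(commands[0])
```

Each list could then be handed to subprocess.run(cmd, check=True), with CUDA_VISIBLE_DEVICES set through the env argument.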

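After generating the results, however, I found that only the results had been recorded: I had forgotten to write the parameters into the txt files, although the parameters had been used as the storage path of each file. In the end I wanted to draw several figures from these results, with the data for each figure coming from different text files. Recovering the parameters from the file paths at plotting time seemed too cumbersome, so I decided to write the parameters into the files first.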

Setting the directory to traverse

First, set the folder to traverse. Here osp is an alias for os.path and is used to build the path.

import os  # os.path.isdir and os.listdir are used by the functions below
import os.path as osp

work_dir = osp.join('your-dir', 'your-sub-dir')

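Writing to a txt file

Next, write a function that inserts content into a text file.
The position I wanted to insert at happened to be right before a row of "---", so I simply used find; for a more complicated insertion position, a regular expression can be used.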

def add2file(file, insert):
    '''Insert `insert` into `file` just before the first "---".'''
    with open(file, 'r') as f:
        content = f.read()
    pos = content.find("---")
    print(pos)
    if pos == -1:
        # find() returns -1 when there is no "---"; leave the file untouched
        return
    with open(file, 'w') as f:
        f.write(content[:pos] + insert + content[pos:])
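
For the more complicated case mentioned above, the insertion point can be located with a regular expression instead of find. This is a minimal sketch that works on a string rather than a file; the function name insert_before_match and the sample text are made up for illustration.

```python
import re

def insert_before_match(content, pattern, insert):
    """Return content with `insert` placed before the first regex match.

    The text is returned unchanged when the pattern does not match.
    """
    match = re.search(pattern, content, flags=re.MULTILINE)
    if match is None:
        return content
    pos = match.start()
    return content[:pos] + insert + content[pos:]

sample = "acc: 0.93\n-----\nloss: 0.2\n"
# Match a whole line made of three or more dashes.
result = insert_before_match(sample, r"^-{3,}$", "ratio: 20%\n")
print(result)
```

Anchoring the pattern to a whole line means that dashes appearing inside other text (a negative number, a hyphenated word) are not mistaken for the separator row.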

Traversing the folder

Then write a function that walks through this folder and inserts the content into every text file.

def add_iter_file(dir, insert):
    if not os.path.isdir(dir):
        # dir is a file: build the line to insert and write it
        s = 'xxxxx: ' + str(insert) + '%\n'
        add2file(dir, s)
    else:
        files = os.listdir(dir)  # names of all entries in the folder
        for file in files:
            add_iter_file(os.path.join(dir, file), insert)
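
As an alternative to writing the recursion by hand, the standard library's os.walk visits every subfolder on its own. The sketch below only collects file paths; the helper name list_files and the throwaway directory tree are assumptions for the demonstration.

```python
import os
import tempfile

def list_files(root):
    """Return the paths of all regular files under root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return sorted(paths)

# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "20", "run1"))
    for rel in (("20", "a.txt"), ("20", "run1", "b.txt")):
        with open(os.path.join(root, *rel), "w") as fh:
            fh.write("-----\n")
    found = [os.path.relpath(p, root) for p in list_files(root)]
    print(found)
```

One difference from the recursive version: when root is a file rather than a folder, os.walk yields nothing, so that case would need separate handling.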

Then just run it, for example:

ratio = [0, 20, 40, 60, 80, 100]
for pr in ratio:
    x_dir = osp.join(work_dir, str(pr))
    add_iter_file(x_dir, pr)

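Reading file contents into a DataFrame

As mentioned above, plotting directly from the text files is inconvenient, so I wanted to read everything into pandas DataFrames first, export them as csv files, and then read those csv files for plotting.
First, write a function that reads the required content from a file: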

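def get_results(file_name):
    '''Get the results for one file.'''
    # fh, not f: the name f is reused below for one of the results
    with open(file_name, "r") as fh:
        data = fh.readlines()
    # Fixed line numbers and character offsets; these depend on the
    # exact layout of the result files
    a = data[0][31:-1]
    b = data[2][13:-1]
    c = data[3][13:-1]
    d = data[4][14:-2]
    e = data[5][19:-2]
    f = data[34][36:-1]

    return a, b, c, d, e, f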
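
Fixed slices such as [31:-1] stop working as soon as a label changes length. If the lines follow a "label: value" pattern, as the 'xxxxx: 20%' line inserted earlier does, splitting at the first colon is a more forgiving option. This is a sketch under that assumption; the labels in the sample are invented and are not the real contents of the result files.

```python
def parse_labelled_lines(lines):
    """Map each 'label: value' line to {label: value}, skipping other lines."""
    results = {}
    for line in lines:
        label, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, e.g. a separator row of dashes
        results[label.strip()] = value.strip()
    return results

sample = ["method: a\n", "ratio: 20%\n", "-----\n", "accuracy: 0.931\n"]
parsed = parse_labelled_lines(sample)
print(parsed["accuracy"])
```

Looking values up by label also removes the dependence on a result sitting at a particular line number such as data[34].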

Then create the DataFrames with the row and column indices I need:

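import pandas as pd

# c holds the column labels and i the row labels; both must be defined beforehand
df_1 = pd.DataFrame(columns=c, index=i)
df_2 = pd.DataFrame(columns=c, index=i)
df_3 = pd.DataFrame(columns=c, index=i)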

After that, all that is left is to write a similar function that walks through all the files in the folder:

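def iter_file(dir):
    if not os.path.isdir(dir):
        a, b, c, d, e, f = get_results(dir)
        f = cr[f]  # cr: lookup defined elsewhere that maps f to a row label
        # .loc[row, column] avoids chained assignment such as df_1[a][f] = b
        df_1.loc[f, a] = b
        df_2.loc[f, a] = c
        df_3.loc[f, a] = d
    else:
        files = os.listdir(dir)  # names of all entries in the folder
        for file in files:
            iter_file(os.path.join(dir, file))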
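
The export to csv is not shown in this post, so here is a small sketch of filling a DataFrame cell by cell and round-tripping it through csv. The labels and values are hypothetical, and an in-memory buffer stands in for a real file so that the example is self-contained.

```python
import io
import pandas as pd

columns = ["a", "b"]          # hypothetical column labels
index = [0, 20, 40]           # hypothetical row labels

df = pd.DataFrame(columns=columns, index=index, dtype=float)
# .loc[row, column] writes into the frame itself in one step.
df.loc[20, "a"] = 0.93
df.loc[40, "b"] = 0.5

buffer = io.StringIO()
df.to_csv(buffer)             # a file path such as 'log/a.csv' works the same way
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Writing through .loc matters because chained indexing of the form df[col][row] = value may operate on a temporary copy and leave the original frame unchanged, depending on the pandas version.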

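Plotting

A few notes on the plotting code.

Reading the csv

When reading the csv files created above, the first column has to be set as the index.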

df_1 = pd.read_csv('log/a.csv',index_col=0)
df_2 = pd.read_csv('log/b.csv',index_col=0)
df_3 = pd.read_csv('log/c.csv',index_col=0)

Iterating over a DataFrame by column

def get_fig_all(df, name):
    # items() yields (column name, column) pairs;
    # iteritems() was removed in pandas 2.0
    for col_name, col in df.items():
        values = [float(v) for v in col]
        get_fig(col_name, df.index, values, name)

Plot settings

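import matplotlib.pyplot as plt

def get_fig(name, x, y, y_l):
    # The y-axis range depends on which quantity is plotted
    if y_l == 'a':
        lim = (0.92, 0.94)
    elif y_l == 'b':
        lim = (-0.1, 1.1)
    else:
        lim = (-1.2, 1)
    plt.figure(figsize=(8, 6))
    plt.title(name)
    plt.ylim(lim)
    plt.xticks(x)
    plt.xlabel('xlabel')
    plt.ylabel(y_l)
    plt.plot(x, y)
    # plt.show()
    # fig_dir must be defined beforehand and end with a path separator
    save_path = fig_dir + y_l + '/' + name + '.png'
    plt.savefig(save_path, bbox_inches='tight')
    plt.close()  # close the figure so repeated calls do not accumulate open figures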
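
Two details of get_fig can be factored out into small helpers: the choice of y-axis range, and the construction of the save path. The sketch below uses a lookup table in place of the if/elif chain and creates the target folder first, since savefig does not create missing directories. The helper names y_limit and figure_path are hypothetical, and no plotting is done here.

```python
import os
import tempfile

# y-axis ranges copied from get_fig; other labels fall back to the default.
Y_LIMITS = {"a": (0.92, 0.94), "b": (-0.1, 1.1)}
DEFAULT_LIMIT = (-1.2, 1)

def y_limit(y_label):
    """Return the y-axis range for a label, or the default range."""
    return Y_LIMITS.get(y_label, DEFAULT_LIMIT)

def figure_path(fig_dir, y_label, name):
    """Build <fig_dir>/<y_label>/<name>.png and create the folder if needed."""
    folder = os.path.join(fig_dir, y_label)
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, name + ".png")

with tempfile.TemporaryDirectory() as fig_dir:
    path = figure_path(fig_dir, "a", "demo")
    folder_exists = os.path.isdir(os.path.dirname(path))
    print(y_limit("a"), y_limit("c"), os.path.basename(path))
```

With os.path.join, it no longer matters whether fig_dir ends with a slash.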